KNIME
PRESENTED BY-
JAIMINI SOLANKI
SUCHITA MISHRA
STUTI SMART
What is KNIME?
KNIME stands for Konstanz Information Miner.
KNIME is a free and open-source data analytics, reporting and
integration platform.
KNIME integrates various components for machine learning and data
mining through its modular data pipelining concept.
A graphical user interface and the use of JDBC allow the assembly of
nodes that blend different data sources, including pre-processing (ETL:
Extraction, Transformation, Loading), for modelling, data analysis and
visualization with little or no programming.
As an advanced analytics tool, KNIME can to some extent be considered
an alternative to SAS.
Since 2006, KNIME has been used in pharmaceutical research. It is also
used in other areas such as CRM customer data analysis, business
intelligence and financial data analysis.
History of KNIME
The development of KNIME was started in January 2004 by a team of software
engineers at the University of Konstanz as a proprietary product.
The original developer team headed by Michael Berthold came from a
company in Silicon Valley providing software for the pharmaceutical industry.
The initial goal was to create a modular, highly scalable and open data
processing platform which allowed for the easy integration of different data
loading, processing, transformation, analysis and visual exploration modules
without the focus on any particular application area.
The platform was intended to be a collaboration and research platform and
was also meant to serve as an integration platform for various other data
analysis projects.
In 2006 the first version of KNIME was released and several
pharmaceutical companies started using KNIME and a number of life
science software vendors began integrating their tools into KNIME.
As of 2012, KNIME was in use by over 15,000 actual users (i.e. not
counting downloads, but users who regularly retrieve updates when they
become available). These users were not only in the life sciences but
also at banks, publishers, car manufacturers, telcos, consulting firms
and various other industries, as well as in a large number of research
groups worldwide.
The latest updates to KNIME Server and KNIME Big Data Extensions
provide support for Apache Spark 2.0.
KNIME
KNIME allows users to visually create data flows (or pipelines),
selectively execute some or all analysis steps, and later inspect the
results, models, and interactive views.
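The idea of a data flow whose steps can be executed selectively can be
sketched in a few lines of Python. This is only an illustration of the
concept, not KNIME's implementation: the Workflow and Node classes and the
node names ("read", "filter", "mean", "maximum") are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """One processing step; `inputs` names the upstream nodes it depends on."""
    name: str
    func: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)


class Workflow:
    """Hypothetical mini-workflow: a set of named nodes plus cached results."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.results: Dict[str, Any] = {}  # outputs of nodes already executed

    def add(self, name, func, inputs=()):
        self.nodes[name] = Node(name, func, list(inputs))

    def execute(self, name):
        """Run `name` and only the upstream nodes it needs; reuse cached results."""
        if name in self.results:
            return self.results[name]
        node = self.nodes[name]
        args = [self.execute(dep) for dep in node.inputs]
        self.results[name] = node.func(*args)
        return self.results[name]


wf = Workflow()
wf.add("read", lambda: [3, -1, 4, -1, 5, 9])
wf.add("filter", lambda rows: [r for r in rows if r > 0], inputs=["read"])
wf.add("mean", lambda rows: sum(rows) / len(rows), inputs=["filter"])
wf.add("maximum", lambda rows: max(rows), inputs=["filter"])

# Executing "mean" runs read -> filter -> mean; "maximum" is left untouched.
mean_value = wf.execute("mean")
executed_so_far = sorted(wf.results)
```

Because results are cached per node, a later call to wf.execute("maximum")
reuses the already computed "filter" output, and the intermediate results
stay available for inspection in wf.results.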
KNIME is written in Java and based on Eclipse. It uses the Eclipse
extension mechanism to add plugins that provide additional functionality.
The core version already includes hundreds of modules for data
integration (file I/O, database nodes supporting all common database
management systems through JDBC or native connectors: SQLite, SQL
Server, MySQL, PostgreSQL, Vertica and H2), data transformation
(filter, converter, splitter, combiner, joiner) as well as the commonly
used methods of statistics, data mining, analysis and text analytics.
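As a rough illustration of what a database reader, a filter and a joiner
do when chained together, the following Python sketch uses the standard
sqlite3 module with an in-memory SQLite database. The tables, columns and
values are made up for the example, and the code stands in for the
corresponding KNIME nodes rather than reproducing them.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada', 'DE'), (2, 'Bob', 'US'), (3, 'Cy', 'DE');
    INSERT INTO orders VALUES (1, 20.0), (1, 5.0), (2, 7.5), (3, 12.0);
    """
)

# "Database reader" step: pull both tables into plain Python rows.
customers = conn.execute("SELECT id, name, country FROM customers").fetchall()
orders = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
conn.close()

# "Filter" step: keep only customers from one country.
german = [c for c in customers if c[2] == "DE"]

# "Joiner" step: inner join on customer id, then total the amounts per name.
totals = {}
for cust_id, name, _country in german:
    for order_cust_id, amount in orders:
        if order_cust_id == cust_id:
            totals[name] = totals.get(name, 0.0) + amount
```

In a KNIME workflow each of these steps would be a separate node on the
canvas, configured through dialogs instead of written as code.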
Visualization is supported through the free Report Designer extension.
KNIME workflows can be used as data sets to create report templates
that can be exported to document formats like doc, ppt, xls, pdf and
others.
KNIME is implemented in Java but also allows wrappers that call other
code, and it provides nodes that can run Java, Python, Perl and other
code fragments.
Capabilities of KNIME
KNIME's core architecture allows the processing of large data volumes
that are limited only by the available hard disk space, not by the
available RAM. For example, KNIME allows the analysis of 300 million
customer addresses, 20 million cell images and 10 million molecular
structures.
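The general idea behind processing data that does not fit into RAM is to
stream it from disk in pieces and keep only small running results in
memory. The Python sketch below shows that idea on a tiny CSV file. It is
an assumption-laden illustration, not a description of how KNIME manages
its tables internally; the function names, column names and chunk size are
hypothetical.

```python
import csv
import os
import tempfile
from itertools import islice


def iter_chunks(path, chunk_size):
    """Yield lists of at most `chunk_size` rows so the whole file is never in memory."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                return
            yield chunk


def count_by_city(path, chunk_size=1000):
    """Aggregate chunk by chunk; only the running counts stay in memory."""
    counts = {}
    for chunk in iter_chunks(path, chunk_size):
        for row in chunk:
            counts[row["city"]] = counts.get(row["city"], 0) + 1
    return counts


# Small hypothetical file standing in for a table too large for RAM.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["customer", "city"])
    writer.writerows([[f"c{i}", "Konstanz" if i % 3 else "Zurich"] for i in range(10)])
    csv_path = tmp.name

city_counts = count_by_city(csv_path, chunk_size=4)
os.remove(csv_path)
```

Memory use here depends on the chunk size and the number of distinct
cities, not on the number of rows in the file.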
Additional plugins allow the integration of methods for text mining,
image mining and time series analysis.
KNIME integrates various other open-source projects, e.g. machine
learning algorithms from Weka and the R statistics project, as well as
LIBSVM, JFreeChart, ImageJ and the Chemistry Development Kit.