Introduction to Data Mining Concepts

The document provides an introduction to data mining and knowledge discovery. It discusses the motivation for data mining due to the large amount of data being collected. It defines data mining as the process of extracting interesting and useful patterns and knowledge from large amounts of data. The document also outlines the typical steps involved in a knowledge discovery process including data cleaning, transformation, mining, pattern evaluation, and knowledge presentation. Finally, it discusses the different types of data that data mining can be applied to such as relational databases, data warehouses, transactional databases, and other advanced databases.

Uploaded by

api-27259648

Chapter 1. Introduction

• Motivation: Why data mining?
• What is data mining?
• Data mining: On what kind of data?
• Data mining functionality
• Are all the patterns interesting?
• Classification of data mining systems
• Major issues in data mining

Motivation: “Necessity Is the Mother of Invention”

• Data explosion problem
  - Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses, and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing (OLAP)
  - Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution of Database Technology (see Fig. 1.1)

• 1960s:
  - Data collection, database creation, IMS and network DBMS
• 1970s:
  - Relational data model, relational DBMS implementation
• 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s-2000s:
  - Data mining and data warehousing, multimedia databases, and Web databases

What Is Data Mining?

• Data mining (knowledge discovery in databases):
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases
• Alternative names and their “inside stories”:
  - Data mining: a misnomer?
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• What is not data mining?
  - (Deductive) query processing

Relation to Statistics

• Statistical concepts, such as determining a data distribution and calculating a mean and a variance, can be viewed as data mining techniques.
• Sampling is an often-used tool in data mining and machine learning.
• Statistics research has produced many of the proposed data mining algorithms.
• Some data mining applications determine correlations among data.
• Statistical inference techniques can be viewed as special estimators and prediction methods.
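As a small illustration of the statistical ideas above (summarizing a distribution, sampling, and correlation), here is a standard-library Python sketch. The toy data and the function names `describe`, `simple_random_sample`, and `pearson` are hypothetical and chosen only for illustration; `pearson` is a hand-written Pearson coefficient, not a library call.

```python
import random
import statistics


def describe(values):
    """Summarize a numeric attribute by its mean and population variance."""
    return statistics.mean(values), statistics.pvariance(values)


def simple_random_sample(values, k, seed=0):
    """Draw k values without replacement; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return rng.sample(values, k)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: customer age vs. yearly spend.
    ages = [23, 35, 31, 45, 52, 28, 39, 60]
    spend = [200, 340, 310, 420, 500, 260, 380, 590]
    print("mean, variance:", describe(ages))
    print("sample of 3:", simple_random_sample(ages, 3))
    print("correlation:", round(pearson(ages, spend), 3))
```

Sampling first and then summarizing the sample is a common way to keep such statistics cheap on very large tables.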

Difference Between DM & Statistics

“Data mining is meant to be used by the business user, not by the statistician.”

Empirical Life Cycle for Any Scientific Research

[Figure: cyclic diagram of scientific research with stages labelled Analysis, Hypothesis, Theorem, Data Collection, and Prediction]

Why Data Mining? Potential Applications

• Database analysis and decision support
  - Market analysis and management: target marketing, customer relationship management, market basket analysis, cross-selling, market segmentation
  - Risk analysis and management: forecasting, customer retention, improved underwriting, quality control, competitive analysis
  - Fraud detection and management
• Other applications
  - Text mining (newsgroups, email, documents) and Web analysis
  - Intelligent query answering

Data Mining: A KDD Process

• Data mining: the core of the knowledge discovery process.

[Figure: the KDD process, from bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Data Selection and Data Transformation → Task-relevant Data → Data Mining → Pattern Evaluation]

Steps of a KDD Process (1)

• Learning the application domain:
  - Relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation:
  - Find useful features, dimensionality/variable reduction, invariant representation
• Choosing the functions of data mining:
  - Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation:
  - Visualization, transformation, removing redundant patterns

Steps of a KDD Process (2)

• Data cleaning: remove noise and inconsistent data.
• Data integration: combine multiple data sources.
• Data selection: retrieve the data relevant to the analysis task from the database.
• Data transformation: transform or consolidate the data into forms suitable for mining, e.g., by summary or aggregation.
• Data mining: an essential process where intelligent methods are applied to extract data patterns.
• Pattern evaluation: identify the truly interesting patterns.
• Knowledge presentation: present the mined knowledge to the user.
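The seven steps above can be sketched as a chain of small Python functions over toy in-memory data. Everything here (`SALES_DB`, `CUSTOMER_DB`, the function names, and the 60% support threshold) is hypothetical, and the "mining" step is only an item-frequency count standing in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw sources: two "databases" keyed by customer id (cid).
SALES_DB = [
    {"cid": 1, "items": ["milk", "bread"]},
    {"cid": 2, "items": ["milk", "bread", "butter"]},
    {"cid": 3, "items": None},  # noisy record: missing items
    {"cid": 4, "items": ["bread", "butter"]},
]
CUSTOMER_DB = {1: {"age": 25}, 2: {"age": 34}, 4: {"age": 41}}


def clean(records):
    """Data cleaning: drop records with missing or empty item lists."""
    return [r for r in records if r["items"]]


def integrate(sales, customers):
    """Data integration: join sales with customer attributes on cid."""
    return [dict(r, **customers[r["cid"]]) for r in sales if r["cid"] in customers]


def select(records):
    """Data selection: keep only the attribute relevant to this task."""
    return [r["items"] for r in records]


def transform(baskets):
    """Data transformation: normalize each basket to a sorted list of distinct items."""
    return [sorted(set(b)) for b in baskets]


def mine(baskets):
    """Data mining (stand-in): count how many baskets contain each item."""
    return Counter(item for b in baskets for item in b)


def evaluate(counts, n, min_support):
    """Pattern evaluation: keep items whose support meets the threshold."""
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


def present(patterns):
    """Knowledge presentation: render patterns as readable lines."""
    return [f"{item}: support={s:.0%}" for item, s in sorted(patterns.items())]


def kdd(min_support=0.6):
    baskets = transform(select(integrate(clean(SALES_DB), CUSTOMER_DB)))
    return present(evaluate(mine(baskets), len(baskets), min_support))


if __name__ == "__main__":
    for line in kdd():
        print(line)
```

Keeping each step as its own function mirrors the process view: any single step, such as cleaning, can be replaced without touching the others.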

Data Mining and Business Intelligence

[Figure: pyramid of increasing potential to support business decisions; layers from top to bottom, with typical user in parentheses:
  - Making Decisions (End User)
  - Data Presentation: Visualization Techniques (Business Analyst)
  - Data Mining: Information Discovery (Data Analyst)
  - Data Exploration: Statistical Analysis, Querying and Reporting (Data Analyst)
  - Data Warehouses / Data Marts: OLAP, MDA (DBA)
  - Data Sources: Paper, Files, Information Providers, Database Systems, OLTP (DBA)]

Architecture of a Typical Data Mining System

[Figure: layered architecture, from top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning & data integration, Filtering; Databases and Data Warehouse. A Knowledge-base is connected to the middle layers.]

Architecture Major Components (1)

• Database, data warehouse, or other information repository
• Database or data warehouse server: fetches the relevant data
• Knowledge base
  - Domain knowledge, such as:
    . Interestingness measures
    . Constraints
    . Thresholds
    . Metadata
  - Concept hierarchies
• Data mining engine: functional modules for tasks such as
  - Characterization
  - Association
  - Classification
  - Cluster analysis
  - Evolution analysis
  - Deviation analysis

Architecture Major Components (2)

• Pattern evaluation module
  - Employs interestingness measures
  - Focuses the search toward interesting patterns
• Graphical user interface
  - Communicates between users and the DM system
  - Lets users interact with the system through queries
• Discovered knowledge can be applied to:
  - Decision making
  - Process control
  - Information management
  - Query processing

Data Mining: On What Kind of Data?

• Relational databases
• Data warehouses
• Transactional databases
• Advanced databases and information repositories
  - Object-oriented and object-relational databases
  - Spatial databases
  - Time-series data and temporal data
  - Text databases and multimedia databases
  - Heterogeneous and legacy databases

• Relational database
  - A collection of tables, each of which is assigned a unique name.
  - Each table consists of attributes (columns or fields) and tuples (records or rows).
  - Data mining in relational databases searches for trends or data patterns.
  - Relational databases are among the most commonly available and richest information repositories, and they hold the major data for data mining.
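A minimal sketch of searching a relational table for a trend, assuming a hypothetical `sales` table with `cust_id`, `region`, and `amount` attributes, using Python's built-in `sqlite3` module and an in-memory database:

```python
import sqlite3


def average_amount_by_region(rows):
    """Load (cust_id, region, amount) tuples into a table and average amount per region."""
    conn = sqlite3.connect(":memory:")
    try:
        # A table with a unique name; columns are attributes, inserted rows are tuples.
        conn.execute("CREATE TABLE sales (cust_id INTEGER, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # An aggregate query is one simple way to surface a trend across groups.
        return conn.execute(
            "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [(1, "north", 100.0), (2, "north", 300.0), (3, "south", 50.0)]
    print(average_amount_by_region(data))
```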

• Transactional database
  - Consists of a file in which each record represents a transaction.
  - Each transaction includes a unique transaction identity number (TID) and a list of the items in the transaction.
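A transactional database can be modelled, as a simplifying assumption, as a list of (TID, item list) records. The sketch below counts how often each pair of items appears together, the kind of co-occurrence statistic that market basket analysis builds on. The transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactional database: each record is (TID, list of items).
TRANSACTIONS = [
    ("T100", ["computer", "software", "printer"]),
    ("T200", ["computer", "software"]),
    ("T300", ["printer", "paper"]),
    ("T400", ["computer", "software", "paper"]),
]


def pair_counts(transactions):
    """Count in how many transactions each unordered pair of items co-occurs."""
    counts = Counter()
    for _tid, items in transactions:
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    print(pair_counts(TRANSACTIONS).most_common(1))
```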

• Object-oriented database
  - Based on the object-oriented programming paradigm.
  - Each object is associated with a set of variables and methods, and is an instance of a class.
  - Classes support inheritance.

• Object-relational database: extends the basic relational data model by adding the power to handle
  - Complex data types
  - Class hierarchies
  - Object inheritance
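The object-oriented ideas of variables, methods, instances, and inheritance can be sketched with Python dataclasses. The `Employee` and `Manager` classes are hypothetical; `total_cost` shows how a query over a class also covers instances of its subclasses.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Hypothetical class: variables (name, salary) plus a method."""
    name: str
    salary: float

    def annual_cost(self) -> float:
        return self.salary * 12


@dataclass
class Manager(Employee):
    """Subclass: inherits Employee's variables and methods, adds its own."""
    reports: int = 0


def total_cost(objects, cls=Employee):
    """Sum annual cost over every instance of cls, including subclass instances."""
    return sum(o.annual_cost() for o in objects if isinstance(o, cls))


if __name__ == "__main__":
    staff = [Employee("alice", 100.0), Manager("bob", 200.0, reports=3)]
    print(total_cost(staff), total_cost(staff, Manager))
```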

• Spatial database: contains spatial-related information.
  - Examples: geographic (map) databases, VLSI chip design databases, and medical and satellite image databases.
  - Data may be represented in vector format; e.g., maps represent roads, bridges, buildings, and lakes.
  - Mining can be used, e.g., to describe the characteristics of houses located near a specified kind of location, such as a park.
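A rough sketch of the "houses near a park" example, assuming points in a plane stand in for vector-format map objects and straight-line distance within a chosen radius defines "near". The coordinates, prices, and radius are hypothetical.

```python
import math

# Hypothetical vector-format map objects reduced to points.
PARKS = [(0.0, 0.0)]
HOUSES = [
    {"x": 0.5, "y": 0.5, "price": 300},
    {"x": 1.0, "y": 0.0, "price": 280},
    {"x": 5.0, "y": 5.0, "price": 200},
]


def near_any(point, locations, radius):
    """True if point lies within radius of at least one location."""
    return any(math.dist(point, loc) <= radius for loc in locations)


def describe_houses_near(houses, locations, radius):
    """Characterize houses near the locations by count and mean price."""
    near = [h for h in houses if near_any((h["x"], h["y"]), locations, radius)]
    if not near:
        return 0, None
    return len(near), sum(h["price"] for h in near) / len(near)


if __name__ == "__main__":
    print(describe_houses_near(HOUSES, PARKS, radius=1.5))
```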

• Temporal database or time-series database: stores time-related data.
  - Usually stores relational data that include time-related attributes.
  - Records may involve several timestamps.
  - Time may be decomposed into fiscal, academic, or calendar years.
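Decomposing time into different year notions can be sketched as follows. The fiscal-year convention used here (starting in April, labelled by its starting calendar year) is an assumption for illustration; organizations define fiscal years differently.

```python
from collections import defaultdict
from datetime import date


def calendar_year(d):
    return d.year


def fiscal_year(d, start_month=4):
    """Assumed convention: the fiscal year starts in start_month and is labelled by its starting year."""
    return d.year if d.month >= start_month else d.year - 1


def totals_by(records, period):
    """Sum amounts of (date, amount) records per period computed by the period function."""
    totals = defaultdict(float)
    for d, amount in records:
        totals[period(d)] += amount
    return dict(totals)


# Hypothetical time-stamped sales records.
SALES = [(date(2023, 3, 15), 100.0), (date(2023, 4, 1), 50.0), (date(2024, 1, 10), 70.0)]

if __name__ == "__main__":
    print(totals_by(SALES, calendar_year))
    print(totals_by(SALES, fiscal_year))
```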

• Text database: contains word descriptions for objects.
  - Examples: error or bug reports, warning messages, summary reports, notes.
  - Unstructured: Web pages.
  - Semi-structured: e-mail messages.
  - Well structured: library databases.
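A minimal text-mining sketch over hypothetical bug reports: lowercase the text, split it into alphabetic tokens, drop a small hand-picked stop-word list (an assumption, not a standard list), and count the most frequent terms.

```python
import re
from collections import Counter

# A small hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "on", "in", "is", "of", "to", "when"}

# Hypothetical bug reports.
BUG_REPORTS = [
    "Crash on startup when config is missing",
    "Startup crash after update",
    "Warning: config file deprecated",
]


def top_terms(documents, k=3):
    """Return the k most frequent non-stop-word terms across all documents."""
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    print(top_terms(BUG_REPORTS))
```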

• Multimedia database: stores image, audio, and video data.
  - Used in content-based picture retrieval, voice-mail systems, and video-on-demand systems.

• Heterogeneous and legacy databases
  - The result of a long history of information technology development.
  - A legacy database is a group of heterogeneous databases combining different kinds of data systems, such as:
    . Relational or object databases
    . Hierarchical databases
    . Network databases
    . Spreadsheets
    . Multimedia databases
    . File systems

Data Mining Functionalities (1)

• Concept description: characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
  - Multi-dimensional vs. single-dimensional association
  - age(X, “20..29”) ∧ income(X, “20..29K”) ⇒ buys(X, “PC”) [support = 2%, confidence = 60%]
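The support and confidence attached to the rule above can be computed as the fraction of all tuples satisfying both sides, and the fraction of tuples satisfying the left-hand side that also satisfy the right-hand side. The customer tuples below are hypothetical, so the resulting numbers differ from the 2% and 60% quoted on the slide.

```python
# Hypothetical customer tuples (income in thousands).
CUSTOMERS = [
    {"age": 23, "income_k": 25, "buys": "PC"},
    {"age": 27, "income_k": 22, "buys": "PC"},
    {"age": 25, "income_k": 28, "buys": "TV"},
    {"age": 41, "income_k": 60, "buys": "PC"},
    {"age": 35, "income_k": 45, "buys": "TV"},
]


def in_20s_with_20k_income(t):
    """Antecedent: age(X, "20..29") and income(X, "20..29K")."""
    return 20 <= t["age"] <= 29 and 20 <= t["income_k"] <= 29


def buys_pc(t):
    """Consequent: buys(X, "PC")."""
    return t["buys"] == "PC"


def rule_stats(tuples, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent."""
    matches_lhs = [t for t in tuples if antecedent(t)]
    matches_both = [t for t in matches_lhs if consequent(t)]
    support = len(matches_both) / len(tuples)
    confidence = len(matches_both) / len(matches_lhs) if matches_lhs else 0.0
    return support, confidence


if __name__ == "__main__":
    print(rule_stats(CUSTOMERS, in_20s_with_20k_income, buys_pc))
```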

Data Mining Functionalities (2)

• Classification and prediction
  - Finding models (functions) that describe and distinguish classes or concepts for future prediction
  - E.g., classify countries based on climate, or classify cars based on gas mileage
  - Presentation: decision tree, classification rule, neural network
  - Prediction: predict some unknown or missing numerical values
• Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
  - Clustering is based on the principle of maximizing the intra-class similarity and minimizing the inter-class similarity
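The sketch below pairs the two ideas: plain k-means clustering groups hypothetical house locations into new classes, and a nearest-centroid rule then assigns a label to a new point. Both the data and the choice of k-means (with the first k points as initial centroids) are assumptions for illustration; nearest-centroid assignment is a very simple stand-in for the decision trees, rules, or neural networks listed above.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


def nearest_centroid(point, centroids):
    """Classify a new point by the index of its closest centroid."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


if __name__ == "__main__":
    # Hypothetical house locations on a city grid.
    houses = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
    centroids, clusters = kmeans(houses, k=2)
    print(centroids)
    print(nearest_centroid((2, 2), centroids))
```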

Data Mining Functionalities (3)

• Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - It can be considered noise or an exception, but is quite useful in fraud detection and rare-event analysis
• Trend and evolution analysis
  - Trend and deviation: regression analysis
  - Sequential pattern mining, periodicity analysis
  - Similarity-based analysis
• Other pattern-directed or statistical analyses
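Two of the analyses above can be sketched with the standard library: a z-score test that flags values far from the mean (one simple outlier criterion, with an assumed threshold of 2 standard deviations), and a least-squares line fitted against time steps as a basic trend estimate. The data are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]


def linear_trend(ys):
    """Least-squares slope and intercept of ys against time steps 0, 1, 2, ... (needs len >= 2)."""
    xs = list(range(len(ys)))
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    daily_amounts = [100, 102, 98, 101, 99, 100, 500]  # hypothetical transactions
    print(zscore_outliers(daily_amounts))
    print(linear_trend([1, 3, 5, 7]))
```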

Are All the “Discovered” Patterns Interesting?

• A data mining system or query may generate thousands of patterns, not all of which are interesting.
  - Suggested approach: human-centered, query-based, focused mining
• Interestingness measures: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
• Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's beliefs about the data, e.g., unexpectedness, novelty, actionability

Can We Find All and Only Interesting Patterns?

• Find all the interesting patterns: completeness
  - Can a data mining system find all the interesting patterns?
  - Association vs. classification vs. clustering
• Search for only interesting patterns: optimization
  - Can a data mining system find only the interesting patterns?
  - Approaches:
    . First generate all the patterns and then filter out the uninteresting ones.
    . Generate only the interesting patterns: mining query optimization
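The two approaches can be contrasted on frequent itemsets, using support as the interestingness measure (an assumption for illustration). `generate_then_filter` enumerates every itemset and then filters; `generate_only_frequent` is an Apriori-style level-wise search that only extends itemsets already known to be frequent. Both should return the same patterns, but the second avoids generating most of the uninteresting candidates.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def generate_then_filter(transactions, min_support):
    """Approach 1: enumerate every itemset, then discard the uninteresting ones."""
    items = sorted(set().union(*transactions))
    result = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                result.add(candidate)
    return result


def generate_only_frequent(transactions, min_support):
    """Approach 2: level-wise search. A superset of an infrequent itemset is never
    frequent, so only unions of frequent itemsets are tried at the next level."""
    items = sorted(set().union(*transactions))
    level = {frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support}
    result = set(level)
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = {c for c in candidates if support(c, transactions) >= min_support}
        result |= level
    return result


if __name__ == "__main__":
    T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    print(sorted(sorted(s) for s in generate_only_frequent(T, 0.5)))
```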

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Information Science, Data Visualization, Image Processing, Neural Networks, Information Retrieval, Pattern Recognition, Signal Processing, and Spatial Data Analysis]
Data Mining: Classification Schemes

• General functionality
  - Descriptive data mining
  - Predictive data mining
• Different views, different classifications
  - Kinds of databases to be mined
  - Kinds of knowledge to be discovered
  - Kinds of techniques utilized

A Multi-Dimensional View of Data Mining Classification

• Databases to be mined
  - Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined
  - Characterization, discrimination, association, classification, clustering, trend, deviation, and outlier analysis, etc.
  - Multiple/integrated functions and mining at multiple levels
• Techniques utilized
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
• Applications adapted

OLAP Mining: An Integration of Data Mining and Data Warehousing

• Coupling of data mining systems with DBMS and data warehouse systems
  - No coupling, loose coupling, semi-tight coupling, tight coupling
• On-line analytical mining of data
  - Integration of mining and OLAP technologies
• Interactive mining of multi-level knowledge
  - Necessity of mining knowledge and patterns at different levels of abstraction by drilling/rolling, pivoting, slicing/dicing, etc.
• Integration of multiple mining functions
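Roll-up, drill-down, and slicing can be sketched over a tiny hypothetical fact table by grouping on fewer or more dimensions and filtering on a dimension value. The table, dimension names, and `aggregate` helper are illustrative assumptions, not an OLAP engine's API.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, product, sales).
DIMS = ("year", "quarter", "region", "product")
FACTS = [
    (2023, "Q1", "north", "PC", 10),
    (2023, "Q1", "north", "TV", 5),
    (2023, "Q2", "south", "PC", 7),
    (2024, "Q1", "south", "PC", 4),
]


def aggregate(facts, group_by, where=None):
    """Sum sales grouped by the named dimensions, keeping only rows matching `where` (a slice)."""
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        record = dict(zip(DIMS, row[:-1]))
        if all(record[d] == v for d, v in where.items()):
            totals[tuple(record[d] for d in group_by)] += row[-1]
    return dict(totals)


if __name__ == "__main__":
    print(aggregate(FACTS, ["year"]))                          # rolled up to year
    print(aggregate(FACTS, ["year", "quarter"]))               # drilled down to quarter
    print(aggregate(FACTS, ["region"], where={"product": "PC"}))  # slice on product
```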
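
An OLAM Architecture

[Figure: four-layer OLAM architecture.
 Layer 4, user interface: mining queries in and mining results out, through a user GUI API.
 Layer 3, OLAP/OLAM: an OLAM engine and an OLAP engine, accessed through a data cube API.
 Layer 2, MDDB: multidimensional database with metadata.
 Layer 1, data repository: databases and data warehouse, populated through data cleaning, data integration, and filtering & integration via a database API.]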

Major Issues in Data Mining (1)

• Mining methodology and user interaction
  - Mining different kinds of knowledge in databases
    . Since different users can be interested in different kinds of knowledge, data mining should cover a wide spectrum of data analysis and knowledge discovery tasks.
  - Interactive mining of knowledge at multiple levels of abstraction
    . Interaction is needed because it is difficult to know exactly what can be discovered within a database.
  - Incorporation of background knowledge
    . Background knowledge, such as integrity constraints and deduction rules, can help focus and speed up a data mining process.
  - Data mining query languages and ad hoc data mining
    . Such a language should be integrated with a database or data warehouse query language and optimized for efficient and flexible data mining.
  - Presentation and visualization of data mining results
    . Discovered knowledge should be expressed in high-level languages and visual representations that are easily understood and directly usable by humans, such as trees, tables, rules, graphs, charts, crosstabs, matrices, and curves.
  - Handling noise and incomplete data
    . Without it, the accuracy of the discovered patterns can be poor.
  - Pattern evaluation: the interestingness problem

Major Issues in Data Mining (2)

• Performance and scalability
  - Efficiency and scalability of data mining algorithms
    . The running time of a data mining algorithm must be predictable and acceptable in large databases.
  - Parallel, distributed, and incremental mining methods
    . Parallel and distributed algorithms divide the data into partitions, which are processed in parallel.
    . Incremental algorithms do not start from scratch; they amend and strengthen what was previously discovered.
• Issues relating to the diversity of data types
  - Handling relational and complex types of data
    . One may expect to have different data mining systems for different kinds of data.
  - Mining information from heterogeneous databases and global information systems (WWW)
    . This has become a very challenging and highly dynamic field in data mining.
• Issues related to applications and social impacts
  - Application of discovered knowledge
    . Domain-specific data mining tools
    . Intelligent query answering
    . Process control and decision making
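Partitioned and incremental counting can be sketched with `collections.Counter`: each partition is counted independently and the partial counts are merged, and a new batch updates existing counts instead of recounting from scratch. The data are hypothetical; threads are used here only to show the structure, since CPython threads do not speed up pure-Python counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count in how many transactions of one partition each item occurs."""
    return Counter(item for t in partition for item in set(t))


def partitioned_counts(transactions, n_parts=2):
    """Split the data into partitions, count each one concurrently, then merge."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def incremental_update(existing_counts, new_transactions):
    """Amend previously discovered counts with a new batch instead of recounting everything."""
    updated = Counter(existing_counts)
    updated.update(count_items(new_transactions))
    return updated


if __name__ == "__main__":
    tx = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
    counts = partitioned_counts(tx)
    print(counts)
    print(incremental_update(counts, [["c"], ["d"]]))
```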

Summary

• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Classification of data mining systems
