Data Mining:
Knowledge discovery in databases
Presented by:
ANKITA AGARWAL
DEEPIKA RAIPURIA
MODY INSTITUTE OF TECHNOLOGY AND
SCIENCE, LAXMANGARH
December 3, 2009
Introduction
Motivation: Why data mining?
What is data mining?
Classification of data mining systems
Architecture: Typical Data Mining System
Data mining functionality
Necessity Is the Mother of Invention
Data explosion problem
Automated data collection tools and mature database
technology lead to tremendous amounts of data
accumulated and/or to be analyzed in databases, data
warehouses, and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical processing
Mining interesting knowledge (rules, regularities, patterns,
constraints) from data in large databases
Evolution of Database Technology
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive,
etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
Data mining, data warehousing, multimedia databases, and Web
databases
2000s:
Stream data management and mining
Data mining with a variety of applications
Web technology and global information systems
What Is Data Mining?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge
from huge amounts of data
Data mining: a misnomer?
Alternative names
Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
Watch out: Is everything “data mining”? No. These are not data mining:
(Deductive) query processing
Expert systems or small ML/statistical programs
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Data Warehouse → Data Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
Steps of a KDD Process
Learning the application domain
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation
Find useful features, dimensionality/variable reduction, invariant
representation.
Choosing functions of data mining
summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
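The steps above can be sketched as a small pipeline. This is only an illustration, not part of the original slides: the sample records, the field names, and the helper functions (`clean`, `select`, `mine`, `evaluate`, `kdd`) are all hypothetical, and the "mining" step is a simple summarization chosen for brevity.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"region": "north", "item": "beer", "qty": 2},
    {"region": "north", "item": "diaper", "qty": 1},
    {"region": "south", "item": "beer", "qty": None},
    {"region": "north", "item": "beer", "qty": 3},
]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, region):
    """Data selection: keep only the task-relevant records."""
    return [r for r in records if r["region"] == region]

def mine(records):
    """Data mining (summarization): total quantity per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["qty"]
    return totals

def evaluate(patterns, min_total):
    """Pattern evaluation: keep only totals that reach the threshold."""
    return {item: total for item, total in patterns.items() if total >= min_total}

def kdd(records, region, min_total):
    """Chain the steps: clean, select, mine, evaluate."""
    return evaluate(mine(select(clean(records), region)), min_total)

result = kdd(RAW, region="north", min_total=3)
print(result)  # {'beer': 5}
```

Each stage takes and returns plain lists or dictionaries, so a stage can be swapped (for example, a different mining function) without touching the others.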
Architecture: Typical Data Mining System
[Figure: layered architecture, top to bottom]
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Data cleaning & data integration; filtering
Databases; data warehouse
Knowledge base (alongside the server layer)
Data Mining Functionalities
Concept description: Characterization and discrimination
Generalize, summarize, and contrast data characteristics, e.g., dry vs.
wet regions
Association (correlation and causality)
Diaper → Beer [support = 0.5%, confidence = 75%]
Classification and Prediction
Construct models (functions) that describe and distinguish classes or
concepts for future prediction
E.g., classify countries based on climate, or classify cars based on
gas mileage
Presentation: decision-tree, classification rule, neural network
Predict some unknown or missing numerical values
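For an association rule such as the diaper and beer rule above, the two bracketed figures are conventionally read as support and confidence. A minimal sketch of how they could be computed follows; the basket data and the function names `support` and `confidence` are hypothetical, chosen so that the confidence comes out at 75%.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market baskets.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "bread"},
    {"diaper", "beer", "bread"},
    {"milk", "bread"},
    {"beer"},
    {"milk"},
    {"bread", "milk"},
]

s = support(baskets, {"diaper", "beer"})       # 3 of 8 baskets
c = confidence(baskets, {"diaper"}, {"beer"})  # 3 of the 4 diaper baskets
print(s, c)  # 0.375 0.75
```

Support measures how often the rule applies at all; confidence measures how often it holds when it applies. A rule is usually reported only if both exceed user-chosen thresholds.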
Data Mining Functionalities (2)
Cluster analysis
Class label is unknown: Group data to form new
classes, e.g., cluster houses to find distribution
patterns
Maximizing intra-class similarity and minimizing
inter-class similarity
Outlier analysis
Outlier: a data object that does not comply with
the general behavior of the data
Noise or exception? Not necessarily: outliers are useful in
fraud detection and rare-event analysis
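Both ideas on this slide can be illustrated with a short sketch. The slides do not name specific algorithms, so the choices here are assumptions: plain k-means on one-dimensional data for clustering, and a z-score test for outliers. The data, the function names `kmeans_1d` and `zscore_outliers`, and the threshold of 2.0 are hypothetical.

```python
import statistics

def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            statistics.mean(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def zscore_outliers(values, threshold=2.0):
    """Values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical house prices (in thousands): no class labels are given.
prices = [100, 110, 120, 300, 310, 320]
centroids, clusters = kmeans_1d(prices, centroids=[100, 320])

# Hypothetical card transaction amounts with one unusual charge.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = zscore_outliers(amounts)

print(centroids, clusters)
print(suspicious)
```

On this data the prices split into a low group and a high group with centroids at 110 and 310, and only the 500 charge is flagged. In a fraud-detection setting the flagged value is the interesting result, not noise to be discarded.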
Summary
Data mining: discovering interesting patterns from large amounts
of data
A natural evolution of database technology, in great demand, with
wide applications
A KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation, and
knowledge presentation
Mining can be performed in a variety of information repositories
Data mining functionalities: characterization, discrimination,
association, classification, clustering, outlier and trend analysis,
etc.
Data mining systems and architectures
Major issues in data mining
Recommended Reference Books
R. Agrawal, J. Han, and H. Mannila, Readings in Data
Mining: A Database Perspective, Morgan Kaufmann
(in preparation)
J. Han and M. Kamber. Data Mining: Concepts and
Techniques. Morgan Kaufmann, 2001
Where to Find the Set of Slides?
Book page (MS PowerPoint files):
[Link]/~hanj/dmbook
Updated course presentation slides (.ppt):
[Link]/~cs497jh/
Research papers, DBMiner system, and other
related information:
[Link]/~hanj or [Link]
Thank you!