Data Mining Overview at MITS Laxmangarh

The document is a presentation on data mining that was given on December 3, 2009. It introduces data mining, discussing why it is necessary due to data explosion problems. It defines data mining as the extraction of interesting patterns from large amounts of data and outlines the typical steps in a knowledge discovery process, including data cleaning, transformation, mining, and pattern evaluation. Finally, it summarizes several common data mining functionalities such as concept description, association, classification, clustering, and outlier analysis.

Copyright
© Attribution Non-Commercial (BY-NC)
Data Mining:

Knowledge discovery in databases


Presented by:
ANKITA AGARWAL
DEEPIKA RAIPURIA
MODY INSTITUTE OF TECHNOLOGY AND SCIENCE, LAXMANGARH

December 3, 2009 1
Introduction

• Motivation: Why data mining?
• What is data mining?
• Classification of data mining systems
• Architecture: Typical Data Mining System
• Data mining functionality

Necessity Is the Mother of Invention
• Data explosion problem
  - Automated data collection tools and mature database technology lead to tremendous amounts of data accumulated in, and waiting to be analyzed in, databases, data warehouses, and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: Data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Evolution of Database Technology
• 1960s:
  - Data collection, database creation, IMS and network DBMS
• 1970s:
  - Relational data model, relational DBMS implementation
• 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  - Stream data management and mining
  - Data mining with a variety of applications
  - Web technology and global information systems
What Is Data Mining?
• Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
• Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Watch out: Is everything “data mining”? No; the following are not:
  - (Deductive) query processing
  - Expert systems or small ML/statistical programs
Data Mining: A KDD Process

• Data mining: the core of the knowledge discovery process

[Figure: KDD process flow. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Steps of a KDD Process

• Learning the application domain
  - Relevant prior knowledge and goals of application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of effort!)
• Data reduction and transformation
  - Find useful features, dimensionality/variable reduction, invariant representation
• Choosing functions of data mining
  - Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
  - Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge

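The KDD steps listed above can be sketched as a chain of small functions. This is only a minimal illustration, not a method taken from the slides: the records in `RAW`, the age bands, and every function name are hypothetical, and the "mining" step is a simple summarization (mean spend per age band).

```python
from statistics import mean

# Hypothetical raw records: (customer_id, age, monthly_spend); None marks a missing value.
RAW = [
    (1, 25, 120.0),
    (2, None, 80.0),
    (3, 41, None),
    (4, 38, 310.0),
    (4, 38, 310.0),  # exact duplicate of the previous row
    (5, 52, 295.0),
    (6, 29, 100.0),
    (7, 47, 305.0),
]

def clean(records):
    """Data cleaning: drop rows with missing values and exact duplicates."""
    seen, kept = set(), []
    for row in records:
        if row in seen or any(v is None for v in row):
            continue
        seen.add(row)
        kept.append(row)
    return kept

def select(records):
    """Data selection: keep only the task-relevant attributes (age, spend)."""
    return [(age, spend) for _id, age, spend in records]

def transform(rows):
    """Transformation: discretize age into two coarse bands."""
    return [("under_40" if age < 40 else "40_plus", spend) for age, spend in rows]

def mine(rows):
    """Data mining (summarization): mean spend and record count per age band."""
    groups = {}
    for band, spend in rows:
        groups.setdefault(band, []).append(spend)
    return {band: (mean(vals), len(vals)) for band, vals in groups.items()}

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only summaries backed by enough records."""
    return {band: p for band, p in patterns.items() if p[1] >= min_count}

patterns = evaluate(mine(transform(select(clean(RAW)))))
for band, (avg, n) in sorted(patterns.items()):
    print(f"{band}: mean spend {avg:.2f} over {n} records")
```

Each function maps to one step of the process, so a step can be swapped (for example, a different mining function) without touching the others.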
Architecture: Typical Data Mining System

[Figure: layered architecture, top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning & data integration, and Filtering; Databases and Data Warehouse. A Knowledge base sits alongside the layers.]
Data Mining Functionalities
• Concept description: Characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
  - Diaper → Beer [support 0.5%, confidence 75%]
• Classification and prediction
  - Construct models (functions) that describe and distinguish classes or concepts for future prediction
  - E.g., classify countries based on climate, or classify cars based on gas mileage
  - Presentation: decision tree, classification rule, neural network
  - Predict some unknown or missing numerical values

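The two numbers attached to the Diaper/Beer association rule are its support (the fraction of all transactions containing both items) and its confidence (the fraction of diaper transactions that also contain beer). The sketch below computes both on a small, made-up set of baskets; the data and function names are hypothetical and the resulting values differ from the figures on the slide.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"milk", "bread"},
    {"beer", "bread"},
    {"bread"},
    {"milk"},
]

rule_support = support(baskets, {"diaper", "beer"})
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})
print(f"Diaper -> Beer [{rule_support:.1%}, {rule_confidence:.0%}]")
```

Here 3 of 8 baskets contain both items and 3 of the 4 diaper baskets also contain beer, so the rule has support 37.5% and confidence 75%.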
Data Mining Functionalities (2)
• Cluster analysis
  - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
  - Maximize intra-class similarity and minimize inter-class similarity
• Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - Noise or exception? No! Outliers are useful in fraud detection and rare-event analysis
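As one possible illustration of cluster analysis, the sketch below groups hypothetical house locations with a basic k-means loop. The slides do not name a clustering algorithm; k-means, the coordinates, and the starting centroids are all assumptions chosen for the example.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = coordinate-wise mean of its cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical house locations (x, y) with no class labels.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(houses, centroids=[(0.0, 0.0), (10.0, 10.0)])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Points in the same group end up close to their shared centroid (high intra-class similarity) and far from the other centroid (low inter-class similarity).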

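For outlier analysis, a simple way to flag objects that do not follow the general behavior of the data is the interquartile-range rule (Tukey's fences). This is one common technique, not one prescribed by the slides; the transaction amounts and the function name are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
suspicious = iqr_outliers(amounts)
print(suspicious)
```

In a fraud-detection setting the flagged value is the interesting result rather than noise to discard.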
Summary
• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Data mining systems and architectures
• Major issues in data mining

Recommended Reference Books

• R. Agrawal, J. Han, and H. Mannila. Readings in Data Mining: A Database Perspective. Morgan Kaufmann (in preparation)
• J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001

Where to Find the Set of Slides?

• Book page (MS PowerPoint files):
  - [Link]/~hanj/dmbook
• Updated course presentation slides (.ppt):
  - [Link]/~cs497jh/
• Research papers, DBMiner system, and other related information:
  - [Link]/~hanj or [Link]

Thank you !!!