Introduction to Data Mining Concepts

The document discusses data mining, including its motivation due to increasing data volumes, definitions of data mining, the types of data it can be applied to such as databases and data warehouses, and data mining functionalities including classification, clustering, and association rule discovery.

Uploaded by

addis alemayhu

Bahir Dar Institute of Technology
Faculty of Computing
Department of Information System
Data Mining and Data Warehousing

By: Belete B.

Chapter one
Introduction
 This chapter briefly covers the following issues:
– Motivation: Why data mining?
– What is data mining?
– Data Mining: On what kind of data?
– Data mining functionalities

Motivation:
“Necessity is the Mother of Invention”
 Our capacity to generate and collect data has increased
rapidly over the last several decades
 A huge amount of data is available at our fingertips
 It is predicted that more data will be produced in the next
year than has been generated during the entire existence of
humankind!
 According to Witten and Frank, the amount of data stored in
the world's databases is estimated to double every twenty
months

 Contributing factors include
– Widespread use of bar codes for most commercial products,
– Computerization of many business, scientific, and governmental
transactions,
– Advances in data collection tools (audio, video, satellite remote
sensing, scanning, image capturing tools),
– Use of the WWW as a global information system (the Internet in
general),
– Development of comprehensive application software,
– New computing and storage technologies

 All of this has made it easier to create, collect, and store all
types of data
 As a result, it has created a problem known as data explosion
 Data explosion is the problem of an enterprise holding huge
amounts of data, generated by automated data collection tools,
in databases, data warehouses and other information
repositories
 As the volume of data increases, the proportion of it that
people can understand decreases substantially; in other words,
as the data gets larger, analyzing it becomes very difficult
 This shows that people's understanding of the data at hand
cannot keep pace with the rate at which data is generated in
various forms, which results in a widening information gap
 Consequently, scholars began to recognize this bottleneck and
to look into possible remedies
 Current technological progress permits the storage of and
access to large amounts of data at virtually no cost
 The true value lies not in storing the data, but in our ability to
extract useful reports and to find interesting trends and correlations
that support the decisions and policies made by businesses
 We are drowning in data, but starving for knowledge!
 To bridge the gap between collecting large volumes of data and
extracting useful information and knowledge from them for decision
making, a new generation of computerized methods known as Data
Mining (DM) has emerged in recent years
What is Data Mining?
 Different scholars have provided different definitions of DM
 According to Berry and Linoff (2000) and Han and Kamber (2006), DM
is the process of extracting or mining knowledge from large amounts
of data in order to discover meaningful patterns and rules
 Data mining is the extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful) information or patterns
from data in large databases (e.g. data warehouses)
 The term data mining is a misnomer, as it does not directly describe
what the process does
 It would be better described as knowledge mining from data
 Nevertheless, we will use the term with this understanding
 Alternative names
– Knowledge discovery (mining) from databases (KDD),
– knowledge extraction,
– data/pattern analysis,
– data archeology,
– data dredging,
– information harvesting,
– business intelligence, etc.

 DM involves the use of sophisticated data analysis tools
to discover previously unknown, valid patterns and
relationships in large datasets
– These tools can include statistical models, mathematical
algorithms, and machine learning methods
 According to Han and Kamber (2006), the major reasons
that DM has attracted a great deal of attention in the
information industry in recent years are
– the wide availability of huge amounts of data and
– the imminent need for turning such data into useful
information and knowledge
 The information and knowledge gained can be used for
applications ranging from market analysis, fraud detection,
and customer retention, to production control and science
exploration
Data Mining: On What Kind of Data?
 In principle, data mining is not specific to one type of media
or data
– Data mining should be applicable to any kind of information
repository
– Data mining is being put into use and studied for databases,
including the following
 Relational databases
– A relational database is a collection of tables, each of which is
assigned a unique name
– Relational databases are one of the most commonly available and rich
information repositories, and thus they are a major data form in our
study of data mining
– DM algorithms using relational databases can be more versatile than
DM algorithms specifically written for flat files, since they can take
advantage of the structure inherent to relational databases
– While DM can benefit from SQL for data selection, transformation and
consolidation, it goes beyond what SQL can provide, with tasks such as
predicting, comparing, detecting deviations, etc.
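As one illustration of an analysis that goes beyond plain data selection, deviation detection could be sketched as below. This is only a minimal sketch: the z-score rule, the threshold of 2.0, the function name find_deviations and the sample readings are all assumptions made for illustration, not something the slides prescribe.

```python
import statistics

def find_deviations(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    The z-score rule and the default threshold are illustrative choices.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical readings with one value far away from the rest
readings = [10, 12, 11, 13, 12, 11, 95]
deviations = find_deviations(readings)
print(deviations)
```

A selection query can return the rows that match a condition the analyst already knows; here the condition itself (what counts as unusual) is derived from the data.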
 Data warehouses
– A data warehouse is a repository of information collected from
multiple sources, stored under a unified schema, and usually residing
at a single site
– Data warehouses are constructed via a process of data cleaning,
data integration, data transformation, data loading, and periodic
data refreshing
– To facilitate decision making, the data in a data warehouse are
organized around major subjects, such as customer, item, supplier,
and activity
– The data are stored to provide information from a historical
perspective and are typically summarized
 Transactional databases
– A transactional database consists of a file where each record
represents a transaction
– One typical data mining analysis on such data is the so-called market
basket analysis
 Advanced databases and information repositories
– Spatial databases: store geographical information such as maps,
and global or regional positioning
• Such spatial databases present new challenges to data mining
algorithms
– Multimedia databases: include video, images, audio, and text
media
• Multimedia data is characterized by its high dimensionality, which
makes data mining even more challenging
– WWW: the most heterogeneous and dynamic repository
available
• Conceptually, the World Wide Web comprises three major
components: the content of the Web, the structure of the Web, and the
usage of the Web
• Data mining on the WWW, or web mining, is often divided into web
content mining, web structure mining and web usage mining
Data Mining Functionalities
 Data mining functionalities are used to specify the kind of
patterns to be found in a data mining task
 Generally, data mining tasks can be broadly classified as
– Descriptive (unsupervised)
– Predictive (supervised)
 Descriptive data mining tasks characterize the general
properties of the data in a database
 Predictive data mining tasks perform inference on the
current data in order to make predictions
– They permit the value of one variable to be predicted from the known
values of other variables
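Predicting the value of one variable from the known values of another can be sketched with a straight line fitted by least squares. This is a minimal sketch under stated assumptions: the choice of simple linear regression, the function names fit_line and predict, and the spend/sales figures are all hypothetical and only meant to make the idea concrete.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x value."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical current data: advertising spend (x) and sales (y)
spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.0]

model = fit_line(spend, sales)
forecast = predict(model, 6)  # predicted sales for an unseen spend value
print(model, forecast)
```

The model is learned from the current data only; the prediction for a spend of 6 is an inference about a case that is not in the data.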
 The supervised (predictive) data mining functionalities include
 Classification
 Regression
 Time series analysis
 Prediction
 The unsupervised (descriptive) data mining functionalities include
 Association rule discovery
 Cluster analysis
 Summarization
 Sequence discovery

Data Mining Functionalities:
Classification

 Classification is the process of finding a set of models that
describe and distinguish data classes, so that the model can be
used to predict the class of an object whose class is unknown
– The derived model is based on a training data set and can be represented in
various forms, such as classification IF-THEN rules, decision trees,
mathematical formulae or neural networks
– Classification approaches normally use a training set in which all objects are
already associated with known class labels
– The classification algorithm learns from the training set and builds a model
– The model is used to classify new objects
 Different algorithms are used for classification, such as decision
trees, neural networks, genetic algorithms, naïve Bayes, etc.
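The train-then-classify cycle described above can be sketched with a one-level decision tree (a "decision stump"), whose learned model reads as a single IF-THEN rule. This is a minimal sketch: the stump is a deliberately simplified stand-in for the algorithms named above, and the function names, the income figures and the class labels are hypothetical.

```python
def train_stump(samples):
    """Learn a threshold rule from (value, label) training pairs.

    Tries every midpoint between consecutive distinct values and every
    ordered pair of labels, keeping the rule with the fewest training errors.
    """
    values = sorted({v for v, _ in samples})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    labels = sorted({label for _, label in samples})
    best = None
    for t in thresholds:
        for low in labels:
            for high in labels:
                if low == high:
                    continue
                errors = sum(
                    1 for v, label in samples
                    if (low if v <= t else high) != label
                )
                if best is None or errors < best[0]:
                    best = (errors, t, low, high)
    _, t, low, high = best
    return t, low, high

def classify(model, value):
    """IF value <= threshold THEN low label ELSE high label."""
    t, low, high = model
    return low if value <= t else high

# Hypothetical training set: (monthly income, known class label)
training_set = [
    (18, "reject"), (22, "reject"), (25, "reject"),
    (40, "approve"), (52, "approve"), (60, "approve"),
]

model = train_stump(training_set)       # learning step
new_label = classify(model, 30)         # classify an object of unknown class
print(model, new_label)
```

Every object in the training set carries a known class label, which is what makes this a supervised technique.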
Data Mining Functionalities
Cluster analysis
 Clustering is a DM technique that finds similarities between
data objects according to the characteristics found in the data and
groups similar data objects into one cluster
 In cluster analysis, class labels are unknown and a set of
data is given to be grouped
 The objective of clustering is to distribute cases (people,
objects, events, etc.) into groups, so that the degree of
association is strong between members of the same
cluster (high intra-cluster similarity) and weak between members
of different clusters (low inter-cluster similarity)
 Clustering tools assign records to the same cluster if
they have something in common, making it easier to discover
meaningful patterns in the dataset
 Clustering often serves as a starting point for some supervised
DM techniques or modeling
 Generally, like classification, clustering organizes data
into classes
 However, unlike classification, in clustering the class labels are
unknown, and it is up to the clustering algorithm to discover
acceptable classes
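Grouping unlabeled objects by similarity can be sketched with k-means, one common clustering algorithm. The slides do not name a specific algorithm, so k-means, the Euclidean distance measure, the function name kmeans, the starting centroids and the sample points are all assumptions chosen for illustration.

```python
import math

def kmeans(points, centroids, max_iterations=100):
    """Cluster points around caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabeled 2-D data with two visible groups
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
]

centroids, clusters = kmeans(points, [points[0], points[3]])
print(centroids, clusters)
```

No class labels are supplied at any point; the two groups are discovered from the distances between the points alone.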
Data Mining Functionalities:
Association Rule Mining

 Association rule mining aims to extract interesting
correlations, frequent patterns, associations or causal
structures among sets of items in transaction databases
or other data repositories
 It studies the frequency of items occurring together in
transactional databases and, based on a threshold called
support, identifies the frequent item sets
 Another threshold, confidence, which is the conditional
probability that an item appears in a transaction when
another item appears, is used to pinpoint association rules
 Association analysis is commonly used for market basket
analysis
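Support and confidence can be computed directly from a list of transactions. The sketch below is illustrative only: the function names, the basket contents and the 0.5 support threshold are hypothetical, and the brute-force pair enumeration is not one of the optimized association rule mining algorithms.

```python
from itertools import combinations

def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Conditional probability of the consequent given the antecedent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / count(transactions, antecedent)

def frequent_pairs(transactions, min_support):
    """All item pairs whose support meets the threshold (brute force)."""
    items = sorted(set().union(*transactions))
    return [
        pair for pair in combinations(items, 2)
        if support(transactions, pair) >= min_support
    ]

# Hypothetical market baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

pair_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
pairs = frequent_pairs(transactions, min_support=0.5)
print(pair_support, rule_confidence, pairs)
```

In this data, bread and milk appear together in 3 of 5 baskets (support 0.6), and milk appears in 3 of the 4 baskets that contain bread (confidence 0.75 for the rule bread → milk).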
Quiz 1
1. What is data mining and what is it used for?
2. What is the association rule mining technique? Give
two examples of association rule mining algorithms
3. What are the main reasons for DM attracting a
great deal of attention in the information
industry in recent years, according to Han and
Kamber?
4. What do we call DM that is applied to the WWW, and what are the
three aspects of DM that can be applied to the WWW?
