0% found this document useful (0 votes)

6 views18 pages

Data Mining Notes

Bayesian classification uses Bayes' theorem to predict class membership probabilities based on observed data. The naïve Bayesian classifier simplifies the process by assuming attribute independence, allowing for efficient computation of posterior probabilities. The document also covers k-means clustering, the EM algorithm, agglomerative clustering, and k-NN classification as additional methods for data classification and clustering.

Uploaded by

cse viji2021

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views18 pages

Data Mining Notes

Uploaded by

cse viji2021

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

BAYESIAN CLASSIFICATION

o Bayesian classifiers are statistical classifiers. They can predict class

membership probabilities, such as the probability that a given tuple belongs to a
particular class.

o Bayesian classification is based on Bayes’ theorem. Bayes’ theorem is named

after Thomas Bayes.

Bayes’ Theorem

o Bayes theorem is:

P (H|X) = P (X|H) P (H)

-------------------------------
P(X)

o Let X be a data tuple.

o X is considered “evidence.” It is described by measurements made on a set of n

attributes.

o Let H be some hypothesis, such as that the data tuple X belongs to a specified
class C.

o P (H|X) is the probability that the hypothesis H holds given the “evidence” or
observed data tuple X.

o P (H|X) is the posterior probability, or a posteriori probability, of H

conditioned on X.

o P (H) is the priori probability of H.

o P (X|H) is the posteriori probability of X conditioned on H.

o P(X) is the prior probability of X.

o Example :
Suppose the data tuples are customers described by the attributes age and
income, X is a 35-year old customer with an income of $40,000; H is the
hypothesis that our customer will buy a computer.
P (H|X) represents the probability that customer X will buy a computer given
that we know the customer’s age and income.

P (H) is the probability that any given customer will buy a computer,
regardless of age, income or any other information.

P (X|H) is the probability that a customer X, is 35 years old and earns

$40,000, given that we know that a customer will buy a computer.

P(X) is the probability that a person from our set of customers is 35 years old
and earns $40,000.

Example:

Consider a football game between two rival teams Team 0 and Team 1. Suppose
Team 0 wins 65% of the time and Team 1 wins the remaining matches. Among the
games won by Team 0, only 30% of them come from paying on Team 1’s football
field. On the other hand, 75% of the victories for team 1 are obtained while playing
at home. If Team 1 is to host the next match between the two teams, which team will
most likely emerge as the winner?

1. Probability Team 0 wins is P(Y=0)=0.65

2. Probability Team 1 wins is P(Y=1)=1-0.65 = 0.35
3. Probability Team 1 hosted the match, it wins is P(X=1|Y=0)=0.75
4. Probability Team 1 hosted the match won by team 0 is P(X=1|Y=0)=0.3

Compute P(Y=1|X=1) is the conditional probability that Team 1 wins the next match
it will be hosting, and compares it against P(Y=0|X=1).

Bayes Theorem,

P(Y=1|X=1) = P(X=1|Y=1) * P(Y = 1)

------------------------------------
P(X=1)

= 0.75 X 0.35 / 0.75 x0.35 + 0.3 X0.65 = 0.5738

P(Y=0|X =1) = 1-0.5738 = 0.4262

P(Y=1|X =1) > P(Y=0|X =1), Team 1 has a better chance than Team 0 of winning the
next match.

Naïve Bayesian Classification

The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:

Step 1:
Let D be a training set of tuples and their associated class labels.
 Each tuple is represented by an n-dimensional attribute vector, X =
(x1, x2, : : : , xn), depicting n measurements made on the tuple from n
attributes, respectively, A1, A2, : : : , An.
Step 2:

Suppose that there are m classes, C1, C2, : : : , Cm. Given a tuple, X, the classifier
will predict that X belongs to the class having the highest posterior
probability, conditioned on X.

 The naïve Bayesian classifier predicts that tuple X belongs to the

class Ci if and only if,

 The class Ci for which P(CijX) is maximized is called the maximum

posteriori hypothesis.

Step 3:
As P(X) is constant for all classes, only P(XjCi)P(Ci) need be maximized. If the class
prior probabilities are not known, then it is commonly assumed that the classes are
equally likely, that is, P(C1) = P(C2) = … = P(Cm), and we would therefore maximize
P(X|Ci). Otherwise, we maximize P(X|Ci)P(Ci).

Step 4:
A simplified assumption: attributes are conditionally independent:
Where V are the data samples, vi is the value of attribute i on the sample and C j is
the j-th class.

 Greatly reduces the computation cost, only count the class

distribution.
 xk refers to the value of attribute Ak for tuple X.
 For each attribute, we look at whether the attribute is categorical or
continuous-valued.
o For instance, to compute P(X|Ci), we consider the following:
(a) If Ak is categorical, then P(xk|Ci) is the number of tuples
of class Ci in D having the value xk for Ak, divided by |
Ci,D|, the number of tuples of class Ci in D.

(b) If Ak is continuous-valued, it is assumed to have a

Gaussian distribution with a mean μ and standard
deviation s.
Step 5:
In order to predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The
classifier predicts that the class label of tuple X is the class Ci if and only if,

Advantages:

 Have the minimum error rate in comparison to all other classifiers.

 Easy to implement
 Good results obtained in most of the cases.
 They provide theoretical justification for other classifiers that do not
explicitly use Bayes theorem.
Disadvantages:

 Lack of available probability data.

 Inaccuracies in the assumption.
Example 1: Predicting a class label using naïve Bayesian classification.
The training data are described by the attributes age, income, student, and credit
rating.

The class label attribute, buys computer, has two distinct values (namely, {yes, no}).

Let, C1 correspond to the class buys computer = yes and

C2 correspond to buys computer = no.

The tuple is to classify is

X = (age = youth, income = medium, student = yes, credit rating = fair)

Solution:
Example 2:

Predicting a class label using naïve Bayesian classification. The training data set is
given below. The data tuples are described by the attributes Owns Home?, Married,
Gender and Employed. The class label attribute Risk Class has three distinct values.
Let C1 corresponds to the class A, and C2 corresponds to the class B and C3
corresponds to the class C.

The tuple is to classify is,

X = (Owns Home = Yes, Married = No, Gender = Female, Employed = Yes0

Owns Home Married Gender Employed Risk Class

Yes Yes Male Yes B

No No Female Yes A
Yes Yes Female Yes C
Yes No Male No B
No Yes Female Yes C

No No Female Yes A
No No Male No B
Yes No Female Yes A
No Yes Female Yes C
Yes Yes Female Yes C
Solution:

There are 10 samples and three classes.

Risk class A = 3

Risk class B = 3

Risk class C = 4

The prior probabilities are obtained by dividing these frequencies by the total
number in the training data,

P(A) = 3/10 = 0.3

P(B) = 3/10 = 0.3

P(C) = 4/10 = 0.4

To compute P(X/Ci) =P {yes, no, female, yes}/Ci) for each of the classes, the
conditional probabilities for each:

P(Owns Home = Yes/A) = 1/3 =0.33

P(Married = No/A) = 3/3 =1

P(Gender = Female/A) = 3/3 = 1

P(Employed = Yes/A) = 3/3 = 1

P(Owns Home = Yes/B) = 2/3 =0.67

P(Married = No/B) = 2/3 =0.67

P(Gender = Female/B) = 0/3 = 0

P(Employed = Yes/B) = 1/3 = 0.33

P(Owns Home = Yes/C) = 2/4 =0.5

P(Married = No/C) = 0/4 =0

P(Gender = Female/C) = 4/4 = 1

P(Employed = Yes/C) = 4/4 = 1

Using the above probabilities, we obtain

P(X/A)= P(Owns Home = Yes/A) X P(Married = No/A) x P(Gender = Female/A) X

P(Employed = Yes/A)

= 0.33 x 1 x 1 x 1 = 0.33

Similarly, P(X/B), P(X/C)

To find the class, G, that maximizes,

P(X/Ci)P(Ci), we compute,

P(X/A) P(A) = 0.33 X 0.3 = 0

P(X/B) P(B) =0 X 0.3 = 0

P(X/C) P(C) = 0 X 0.4 = 0.0

Therefore x is assigned to class A

K-MEANS ALGORITHM

The k-means algorithm, where each cluster is represented by the mean value of the
objects in the cluster.

Steps of the algorithm:

 Determine K centroids (randomly?)

 Iterate until stable (= no object move group)
o Determine the distance of each object to the centroids
o Group the object based on minimum distance
o When all objects have been assigned, recalculate the positions of the K
centroids

Centroid-Based Technique: The k-Means Method

Example:

Object attribut attribute 2

e1 (Y):
Medicine A 1 1
Medicine B 2 1
Medicine C 4 3
Medicine 5 4

Iteration 0:
Iteration 1:
EM (EXPECTATION-MAXIMIZATION) ALGORITHM

 EM algorithm is an iterative estimation algorithm that can derive the maximum

likelihood (ML) estimates in the presence of missing/hidden data (“incomplete
data”)
Example:

AGGLOMERATIVE CLUSTERING

There are two natural algorithms for clustering:

 In divisive clustering, the entire data set is regarded as a cluster, and then
clusters are recursively split to yield a good clustering.
 In agglomerative clustering, each data item is regarded as a cluster, and
clusters are recursively merged to yield a good clustering.

 Algorithm :
Make each point a separate cluster
Until the clustering is satisfactory
Merge the two clusters with the
smallest inter-cluster distance
end
AGNES (AGglomerative NESting)

Consider a data set of five objects, {a, b, c, d, e}.

AGNES places each object into a cluster of its own. The clusters are then
merged step-by-step according to some criterion.

Clusters C1 and C2 may be merged if an object in C1 and an object in C2 form

the minimum Euclidean distance between any two objects from different
clusters.

This is a single-linkage approach in that each cluster is represented by all of

the objects in the cluster, and the similarity between two clusters is measured
by the similarity of the closest pair of data points belonging to different
clusters.

DIANA (DIvisive ANAlysis),

The cluster merging process repeats until all of the objects are eventually
merged to form one cluster.
All of the objects are used to form one initial cluster.
The cluster is split according to some principle, such as the maximum
Euclidean distance between the closest neighboring objects in the cluster.
The cluster splitting process repeats until, eventually, each new cluster
contains only a single object.
CLASSIFICATION BY DECISION TREES

Example:
K-NN CLASSIFICATION

- Lazy Learning Algorithm

- Whenever we have a new point to classify, classify, we find its K nearest neighbors
from the training data.
- The distance is calculated using one of the following measures :
 Euclidean Distance
 Minkowski Distance
 Mahalanobis Distance

Algorithm
For each training example <x,f(x)>, add the example to the list of training_examples.
Given a query instance xq , Given a query instance x to be classified,
- Let x1,x2….xk denote the k instances from training_ examples that are
nearest to xq
- Return the class that represents the maximum of the k instances.

Machine Learning: Classification & Clustering
No ratings yet
Machine Learning: Classification & Clustering
44 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
27 pages
Bayesian Classifiers and Data Mining Techniques
No ratings yet
Bayesian Classifiers and Data Mining Techniques
7 pages
Bayesian Classification Techniques Explained
No ratings yet
Bayesian Classification Techniques Explained
22 pages
Mod 3 Aiml
No ratings yet
Mod 3 Aiml
84 pages
Naïve Bayes Classification Explained
No ratings yet
Naïve Bayes Classification Explained
22 pages
Bayesian Classification in Data Mining
No ratings yet
Bayesian Classification in Data Mining
15 pages
Naïve Bayesian Classification Explained
No ratings yet
Naïve Bayesian Classification Explained
17 pages
Bayesian Classification Techniques
No ratings yet
Bayesian Classification Techniques
16 pages
Naive Bayesian Classification Overview
No ratings yet
Naive Bayesian Classification Overview
16 pages
Naïve Bayes Classifier Explained
No ratings yet
Naïve Bayes Classifier Explained
19 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
19 pages
Naive Bayesian Classifiers Overview
No ratings yet
Naive Bayesian Classifiers Overview
43 pages
Bayes Classification Explained
No ratings yet
Bayes Classification Explained
41 pages
Bayesian Classification
No ratings yet
Bayesian Classification
11 pages
Machine Learning: Classification Techniques
No ratings yet
Machine Learning: Classification Techniques
37 pages
Bayesian Classification of Computer Buyers
No ratings yet
Bayesian Classification of Computer Buyers
4 pages
Bayesian Classification in Data Mining
No ratings yet
Bayesian Classification in Data Mining
6 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
21 pages
Bayesian Classification Techniques Explained
No ratings yet
Bayesian Classification Techniques Explained
8 pages
Naïve Bayes Classifier Explained
No ratings yet
Naïve Bayes Classifier Explained
9 pages
Bayesian Classification Explained
No ratings yet
Bayesian Classification Explained
18 pages
K-Nearest Neighbors Overview
No ratings yet
K-Nearest Neighbors Overview
35 pages
Understanding Bayesian Classification Techniques
No ratings yet
Understanding Bayesian Classification Techniques
15 pages
Naive Bayes Work Type Prediction
No ratings yet
Naive Bayes Work Type Prediction
9 pages
Bayesian Concept Learning Overview
No ratings yet
Bayesian Concept Learning Overview
40 pages
Understanding Bayesian Classification
No ratings yet
Understanding Bayesian Classification
25 pages
Naive Bayes Classification Explained
No ratings yet
Naive Bayes Classification Explained
18 pages
Bayes' Theorem in Data Classification
No ratings yet
Bayes' Theorem in Data Classification
10 pages
Naive Bayesian Classifier Overview
No ratings yet
Naive Bayesian Classifier Overview
21 pages
Bayesian Classification in Data Mining
No ratings yet
Bayesian Classification in Data Mining
7 pages
Classification Algorithms Overview
No ratings yet
Classification Algorithms Overview
169 pages
Bayesian Classification Methods Explained
No ratings yet
Bayesian Classification Methods Explained
46 pages
Naive Bayesian Classification Overview
No ratings yet
Naive Bayesian Classification Overview
15 pages
Understanding Bayesian Classification Techniques
No ratings yet
Understanding Bayesian Classification Techniques
23 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
26 pages
Bayesian Concept Learning Overview
No ratings yet
Bayesian Concept Learning Overview
40 pages
Naïve Bayes Classification Explained
No ratings yet
Naïve Bayes Classification Explained
23 pages
10 Naive Bayesian Classifier
No ratings yet
10 Naive Bayesian Classifier
20 pages
Decision Trees vs Naïve Bayes Classifiers
No ratings yet
Decision Trees vs Naïve Bayes Classifiers
15 pages
Understanding Naïve Bayes Classifiers
No ratings yet
Understanding Naïve Bayes Classifiers
31 pages
Bayesian Classification Overview
No ratings yet
Bayesian Classification Overview
66 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
80 pages
Probability and Bayesian Concepts Explained
No ratings yet
Probability and Bayesian Concepts Explained
48 pages
Naive Bayes Classifier Overview and Applications
No ratings yet
Naive Bayes Classifier Overview and Applications
23 pages
ML Lecture - 4 Classification (Naive Bayes)
No ratings yet
ML Lecture - 4 Classification (Naive Bayes)
34 pages
Naive Bayes Algorithm Course Overview
No ratings yet
Naive Bayes Algorithm Course Overview
8 pages
Bayesian Classification Techniques Explained
No ratings yet
Bayesian Classification Techniques Explained
24 pages
Naive Bayes Classification Basics
No ratings yet
Naive Bayes Classification Basics
24 pages
Understanding Bayesian Classification Techniques
No ratings yet
Understanding Bayesian Classification Techniques
49 pages
Bayes Classification Techniques Overview
No ratings yet
Bayes Classification Techniques Overview
30 pages
Data-Mining-lecture-Eight-Naïve Bayes Algorithm
No ratings yet
Data-Mining-lecture-Eight-Naïve Bayes Algorithm
24 pages
Bayesian Classification Techniques
No ratings yet
Bayesian Classification Techniques
7 pages
Naive Bayesian Classification Explained
No ratings yet
Naive Bayesian Classification Explained
48 pages
Bayesian Learning and Naïve Bayes Classifier
No ratings yet
Bayesian Learning and Naïve Bayes Classifier
40 pages
Bayesian Classification Techniques Explained
No ratings yet
Bayesian Classification Techniques Explained
22 pages
Naïve Bayes Classifier Overview
No ratings yet
Naïve Bayes Classifier Overview
19 pages
Understanding DSA Selection Sort
No ratings yet
Understanding DSA Selection Sort
9 pages
APCSP 2024 Free Response Questions Guide
No ratings yet
APCSP 2024 Free Response Questions Guide
2 pages
LPP Objective Function Insights
No ratings yet
LPP Objective Function Insights
2 pages
Prelim Exam Results for UGRD-CYBS6101
No ratings yet
Prelim Exam Results for UGRD-CYBS6101
17 pages
Device-Independent Randomness Expansion
No ratings yet
Device-Independent Randomness Expansion
7 pages
Understanding Heap Data Structures
No ratings yet
Understanding Heap Data Structures
14 pages
Introduction to AI Search Algorithms
No ratings yet
Introduction to AI Search Algorithms
48 pages
Recursion Techniques and Examples
No ratings yet
Recursion Techniques and Examples
23 pages
BFS Algorithm Implementation in Python
No ratings yet
BFS Algorithm Implementation in Python
5 pages
Turing Test, Agents, and AI Algorithms
No ratings yet
Turing Test, Agents, and AI Algorithms
8 pages
B.Tech Question Bank: Algorithms 2025-26
No ratings yet
B.Tech Question Bank: Algorithms 2025-26
3 pages
Properties of Functions and Graphs
No ratings yet
Properties of Functions and Graphs
4 pages
CM Exercise1 2026
No ratings yet
CM Exercise1 2026
2 pages
3-Bit Odd Parity Error Generator Design
No ratings yet
3-Bit Odd Parity Error Generator Design
33 pages
CRC-32 Checksum Implementations in C
No ratings yet
CRC-32 Checksum Implementations in C
13 pages
CAT 1 Attempt Review Summary
No ratings yet
CAT 1 Attempt Review Summary
8 pages
Meta-Learning for Robust Quantum Control
No ratings yet
Meta-Learning for Robust Quantum Control
10 pages
Canny Edge Detection Techniques Explained
No ratings yet
Canny Edge Detection Techniques Explained
58 pages
Global State Recording Algorithms
No ratings yet
Global State Recording Algorithms
14 pages
Lower Bounds on ReLU Network Depth
No ratings yet
Lower Bounds on ReLU Network Depth
13 pages
C Programming Assignment Overview
No ratings yet
C Programming Assignment Overview
2 pages
STPM 2021 PDPC Semester 3 Marking Scheme
No ratings yet
STPM 2021 PDPC Semester 3 Marking Scheme
5 pages
Book Reviews: Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp
No ratings yet
Book Reviews: Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp
3 pages
RPA Flowcharts vs Sequences Explained
No ratings yet
RPA Flowcharts vs Sequences Explained
28 pages
Eceg2102 CM Notes - Ch123
No ratings yet
Eceg2102 CM Notes - Ch123
46 pages
Python Dictionary Exercises and Functions
No ratings yet
Python Dictionary Exercises and Functions
3 pages
Introduction to Data Compression
No ratings yet
Introduction to Data Compression
32 pages
Essential Coding Interview Problems
No ratings yet
Essential Coding Interview Problems
2 pages
Linked List and Queue Algorithms Guide
No ratings yet
Linked List and Queue Algorithms Guide
2 pages
Understanding Algorithmic Thinking
No ratings yet
Understanding Algorithmic Thinking
24 pages

Data Mining Notes

Uploaded by

Data Mining Notes

Uploaded by

BAYESIAN CLASSIFICATION

o Bayesian classifiers are statistical classifiers. They can predict class

o Bayesian classification is based on Bayes’ theorem. Bayes’ theorem is named

o Bayes theorem is:

P (H|X) = P (X|H) P (H)

o Let X be a data tuple.

o X is considered “evidence.” It is described by measurements made on a set of n

o P (H|X) is the posterior probability, or a posteriori probability, of H

o P (H) is the priori probability of H.

o P (X|H) is the posteriori probability of X conditioned on H.

o P(X) is the prior probability of X.

P (X|H) is the probability that a customer X, is 35 years old and earns

1. Probability Team 0 wins is P(Y=0)=0.65

P(Y=1|X=1) = P(X=1|Y=1) * P(Y = 1)

= 0.75 X 0.35 / 0.75 x0.35 + 0.3 X0.65 = 0.5738

P(Y=0|X =1) = 1-0.5738 = 0.4262

Naïve Bayesian Classification

The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:

 The naïve Bayesian classifier predicts that tuple X belongs to the

 The class Ci for which P(CijX) is maximized is called the maximum

 Greatly reduces the computation cost, only count the class

(b) If Ak is continuous-valued, it is assumed to have a

 Have the minimum error rate in comparison to all other classifiers.

 Lack of available probability data.

Let, C1 correspond to the class buys computer = yes and

C2 correspond to buys computer = no.

The tuple is to classify is

X = (age = youth, income = medium, student = yes, credit rating = fair)

The tuple is to classify is,

Owns Home Married Gender Employed Risk Class

Yes Yes Male Yes B

There are 10 samples and three classes.

P(A) = 3/10 = 0.3

P(B) = 3/10 = 0.3

P(C) = 4/10 = 0.4

P(Owns Home = Yes/A) = 1/3 =0.33

P(Married = No/A) = 3/3 =1

P(Gender = Female/A) = 3/3 = 1

P(Employed = Yes/A) = 3/3 = 1

P(Owns Home = Yes/B) = 2/3 =0.67

P(Married = No/B) = 2/3 =0.67

P(Gender = Female/B) = 0/3 = 0

P(Employed = Yes/B) = 1/3 = 0.33

P(Owns Home = Yes/C) = 2/4 =0.5

P(Married = No/C) = 0/4 =0

P(Employed = Yes/C) = 4/4 = 1

Using the above probabilities, we obtain

P(X/A)= P(Owns Home = Yes/A) X P(Married = No/A) x P(Gender = Female/A) X

Similarly, P(X/B), P(X/C)

To find the class, G, that maximizes,

P(X/A) P(A) = 0.33 X 0.3 = 0

P(X/B) P(B) =0 X 0.3 = 0

P(X/C) P(C) = 0 X 0.4 = 0.0

Therefore x is assigned to class A

Steps of the algorithm:

 Determine K centroids (randomly?)

Centroid-Based Technique: The k-Means Method

Object attribut attribute 2

 EM algorithm is an iterative estimation algorithm that can derive the maximum

There are two natural algorithms for clustering:

Consider a data set of five objects, {a, b, c, d, e}.

Clusters C1 and C2 may be merged if an object in C1 and an object in C2 form

This is a single-linkage approach in that each cluster is represented by all of

DIANA (DIvisive ANAlysis),

- Lazy Learning Algorithm

You might also like