Classification
• Classification : is a form of data analysis that
extracts models describing important data
classes
– Models: predict categorical (discrete, unordered)
class labels
• E.g classification model to categorize bank loan
application as either safe or risky
• Marketing manager needs to know if customer would
buy new computer or not
• Medical Researcher may want to know which class of
treatment to be given for the patient
Prediction
• Prediction: marketing manager needs to
predict how much the customer is likely to
spend on items
– Numeric prediction: model constructed predicts a
continuous – valued function as opposed to a
class label
Classification and Prediction
Classification vs Prediction
Decision Tree Induction
• It is the learning of decision trees from
class-labeled training tuples
– Decision tree: is a tree structure where each
internal node (non leaf node) denotes a test on
an attribute
– each branch represents an outcome of the test
– Each leaf node holds a class label
Decision Tree - Example
ID3
• Algorithm
• The core algorithm for building decision trees
called ID3 by J. R. Quinlan which employs a
top-down, greedy search through the space of
possible branches with no backtracking.
• ID3 uses Entropy and Information Gain to
construct a decision tree.
Attribute Selection Measures
• Information Gain
• Gain Ratio
• Gini Index
Information Gain Method
Example
Information Gain Sequence
• IG (Age) = 0.246 bits
• IG (Student)=0.151 bits
• IG (Credit_rating)=0.048 bits
• IG (Income)=0.029 bits
Final Induction Tree based on Age
(Splitting Factor)
Final Decision Tree
Gain Ratio Method
• It splits the training data set D, into v
partitions corresponding to the v outcomes of
a data set for each subset of each attribute
• Constructs the best possible value with
respect to Gain(D) and Split(Attribute) ratio
Gain Ratio Method
Gain Ratio - Example
Gini Index Method
• It measures the impurity of D (transactions)
• It performs the binary split with respect to
each attribute value for all subset of classes
– Based on total number of combinations possible
to be made
• Reduction in impurity
– Constructs the best subset combination among
the attribute that gives best possible higher value
in terms of reduction in impurity
Gini Index Method
Gini Index -Example
• We can clearly see that IG(S, Outlook) has the
highest information gain of 0.246, hence we
chose Outlook attribute as the root node. At
this point, the decision tree looks like.
• Now that we’ve used Outlook,
• we’ve got three of them remaining Humidity,
Temperature, and Wind.
• we had three possible values of Outlook: Sunny,
Overcast, Rain.
• Where the Overcast node already ended up having
leaf node ‘Yes’, so we’re left with two subtrees to
compute: Sunny and Rain.
Bayes’ Classification Methods
• Bayesian Classifiers: are statistical classifiers
– they can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class
– Based on Bayes’ theorem
• Posterior probability
– Checks the class type based on the given predictor
» E.g class yes or no
• Likelihood
– Checks how many times given predictor value yes from total yes class
• Class Prior probability
– Checks how many total times yes out of total tuple D
• Predictor Prior Probability
– Checks how many total times no out of total tuple D
Naïve Bayes Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data to be classified:
X = (age <=30,
Income = medium,
Student = yes
Credit_rating = Fair)
37
Naïve Bayes Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore,
38 X belongs to class (“buys_computer = yes”)
Rule-Based Classification
• Using IF-THEN Rules for Classification
– IF condition (antecedent) THEN conclusion
(consequent)
– E.g IF age=youth AND student = yes THEN
buys_computer = yes
– R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
– Coverage: total number of tuples covered by Rule
– Accuracy: out of the covered tuples how many are
correct
Coverage & Accuracy
Coverage & Accuracy - Example
• R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
Rule based Classification to predict
class label
• R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
More than one Rule Situation – more
classes
• Conflict Resolution Strategy: to resolve the conflict
when more than one rules are satisfied by the tuple X
– Size Ordering technique
• Highest priority is given to that rule that has toughest
requirements i.e rule with more antecedent (more attributes
)size
– Rule Ordering
• Class based
– Based on the priority of class
– Rule that gives the priority class is selected
• Rule based
– Based on priority list of rules
– Rule priority is checked based on accuracy, coverage or antecedent
Rule without class satisfying tuple X
• Default Rule: can be set up to specify a
default class
– May be the class in Majority
– Or the majority class of the tuples that were not
covered by any rule
Rule Extraction from a Decision Tree
Classification: Advanced Methods
Bayesian Belief Networks
• Bayesian Belief Network: it specifies joint
conditional probability distributions
– it provides a graphical model of causal
relationships
– It is defined by two components
• A directed acyclic graph
– Each arc represents a probabilistic dependence; if arc is drawn
from a node Y to a node Z, then Y is a parent or immediate
predecessor & Z is a descendant
• A set of conditional probability tables
Bayesian Belief Network -
Example
Bayesian Belief Network - Analysis
• In the diagram it shows that lung cancer
– is influenced by a person’s family history
– smoker or not
Classification by Backpropagation
• Backpropagation: is a neural network learning
algorithm
– Multilayer Feed-Forward Neural Network
• It iteratively learns a set of weights for prediction of the
class label of tuples
• It consists of an input layer, one or more hidden layers,
and an output layer
Multilayer Feed-Forward Neural
Network
Multilayer Feed-Forward Neural
Network with Weights
Net Input & Output Calculation
Error Backpropagation Calculation
Updated Weights & Biases Calculation
Backpropagation – Example (learning
rate is 0.9)
Net Input, Output & Backpropagated
Error Calculation
Updated Weights & Biases Values
Calculation (learning rate is 0.9)