0% found this document useful (0 votes)
8 views58 pages

Classification and Prediction Techniques

The document discusses classification and prediction in data analysis, highlighting the use of decision trees, Bayesian classifiers, and rule-based classification methods. It explains various algorithms like ID3 for decision tree induction and Naïve Bayes for class membership probabilities. Additionally, it covers advanced methods such as Bayesian Belief Networks and neural network algorithms like backpropagation for classification tasks.

Uploaded by

dge363qfh
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views58 pages

Classification and Prediction Techniques

The document discusses classification and prediction in data analysis, highlighting the use of decision trees, Bayesian classifiers, and rule-based classification methods. It explains various algorithms like ID3 for decision tree induction and Naïve Bayes for class membership probabilities. Additionally, it covers advanced methods such as Bayesian Belief Networks and neural network algorithms like backpropagation for classification tasks.

Uploaded by

dge363qfh
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Classification

• Classification : is a form of data analysis that


extracts models describing important data
classes
– Models: predict categorical (discrete, unordered)
class labels
• E.g classification model to categorize bank loan
application as either safe or risky
• Marketing manager needs to know if customer would
buy new computer or not
• Medical Researcher may want to know which class of
treatment to be given for the patient
Prediction
• Prediction: marketing manager needs to
predict how much the customer is likely to
spend on items

– Numeric prediction: model constructed predicts a


continuous – valued function as opposed to a
class label
Classification and Prediction
Classification vs Prediction
Decision Tree Induction
• It is the learning of decision trees from
class-labeled training tuples
– Decision tree: is a tree structure where each
internal node (non leaf node) denotes a test on
an attribute
– each branch represents an outcome of the test
– Each leaf node holds a class label
Decision Tree - Example
ID3
• Algorithm
• The core algorithm for building decision trees
called ID3 by J. R. Quinlan which employs a
top-down, greedy search through the space of
possible branches with no backtracking.
• ID3 uses Entropy and Information Gain to
construct a decision tree.
Attribute Selection Measures
• Information Gain

• Gain Ratio

• Gini Index
Information Gain Method
Example
Information Gain Sequence
• IG (Age) = 0.246 bits
• IG (Student)=0.151 bits
• IG (Credit_rating)=0.048 bits
• IG (Income)=0.029 bits
Final Induction Tree based on Age
(Splitting Factor)
Final Decision Tree
Gain Ratio Method
• It splits the training data set D, into v
partitions corresponding to the v outcomes of
a data set for each subset of each attribute
• Constructs the best possible value with
respect to Gain(D) and Split(Attribute) ratio
Gain Ratio Method
Gain Ratio - Example
Gini Index Method
• It measures the impurity of D (transactions)
• It performs the binary split with respect to
each attribute value for all subset of classes
– Based on total number of combinations possible
to be made
• Reduction in impurity
– Constructs the best subset combination among
the attribute that gives best possible higher value
in terms of reduction in impurity
Gini Index Method
Gini Index -Example
• We can clearly see that IG(S, Outlook) has the
highest information gain of 0.246, hence we
chose Outlook attribute as the root node. At
this point, the decision tree looks like.
• Now that we’ve used Outlook,
• we’ve got three of them remaining Humidity,
Temperature, and Wind.

• we had three possible values of Outlook: Sunny,


Overcast, Rain.
• Where the Overcast node already ended up having
leaf node ‘Yes’, so we’re left with two subtrees to
compute: Sunny and Rain.
Bayes’ Classification Methods
• Bayesian Classifiers: are statistical classifiers
– they can predict class membership probabilities such as the
probability that a given tuple belongs to a particular class
– Based on Bayes’ theorem
• Posterior probability
– Checks the class type based on the given predictor
» E.g class yes or no
• Likelihood
– Checks how many times given predictor value yes from total yes class
• Class Prior probability
– Checks how many total times yes out of total tuple D
• Predictor Prior Probability
– Checks how many total times no out of total tuple D
Naïve Bayes Classifier: Training Dataset

Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’

Data to be classified:
X = (age <=30,
Income = medium,
Student = yes
Credit_rating = Fair)

37
Naïve Bayes Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore,
38 X belongs to class (“buys_computer = yes”)
Rule-Based Classification
• Using IF-THEN Rules for Classification
– IF condition (antecedent) THEN conclusion
(consequent)
– E.g IF age=youth AND student = yes THEN
buys_computer = yes
– R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
– Coverage: total number of tuples covered by Rule
– Accuracy: out of the covered tuples how many are
correct
Coverage & Accuracy
Coverage & Accuracy - Example
• R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
Rule based Classification to predict
class label
• R1: (age=youth)^ (student = yes)=
(buys_computer=yes)
More than one Rule Situation – more
classes
• Conflict Resolution Strategy: to resolve the conflict
when more than one rules are satisfied by the tuple X
– Size Ordering technique
• Highest priority is given to that rule that has toughest
requirements i.e rule with more antecedent (more attributes
)size
– Rule Ordering
• Class based
– Based on the priority of class
– Rule that gives the priority class is selected
• Rule based
– Based on priority list of rules
– Rule priority is checked based on accuracy, coverage or antecedent
Rule without class satisfying tuple X
• Default Rule: can be set up to specify a
default class
– May be the class in Majority
– Or the majority class of the tuples that were not
covered by any rule
Rule Extraction from a Decision Tree
Classification: Advanced Methods
Bayesian Belief Networks
• Bayesian Belief Network: it specifies joint
conditional probability distributions
– it provides a graphical model of causal
relationships
– It is defined by two components
• A directed acyclic graph
– Each arc represents a probabilistic dependence; if arc is drawn
from a node Y to a node Z, then Y is a parent or immediate
predecessor & Z is a descendant
• A set of conditional probability tables
Bayesian Belief Network -
Example
Bayesian Belief Network - Analysis
• In the diagram it shows that lung cancer
– is influenced by a person’s family history
– smoker or not
Classification by Backpropagation
• Backpropagation: is a neural network learning
algorithm
– Multilayer Feed-Forward Neural Network
• It iteratively learns a set of weights for prediction of the
class label of tuples
• It consists of an input layer, one or more hidden layers,
and an output layer
Multilayer Feed-Forward Neural
Network
Multilayer Feed-Forward Neural
Network with Weights
Net Input & Output Calculation
Error Backpropagation Calculation
Updated Weights & Biases Calculation
Backpropagation – Example (learning
rate is 0.9)
Net Input, Output & Backpropagated
Error Calculation
Updated Weights & Biases Values
Calculation (learning rate is 0.9)

You might also like