0% found this document useful (0 votes)
28 views28 pages

Decision Tree Classification Overview

Decision Trees are a supervised machine learning algorithm used for classification and regression, structured as a tree with decision nodes and leaf nodes representing outcomes. The ID3 algorithm is a classic method for constructing Decision Trees, utilizing information gain to select the best features for splitting data. Overfitting can occur if the tree becomes too complex, and techniques like pre-pruning and post-pruning are used to mitigate this issue.

Uploaded by

nihar44203
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views28 pages

Decision Tree Classification Overview

Decision Trees are a supervised machine learning algorithm used for classification and regression, structured as a tree with decision nodes and leaf nodes representing outcomes. The ID3 algorithm is a classic method for constructing Decision Trees, utilizing information gain to select the best features for splitting data. Overfitting can occur if the tree becomes too complex, and techniques like pre-pruning and post-pruning are used to mitigate this issue.

Uploaded by

nihar44203
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd

Classification : Decision Tree

Decision Trees are a popular supervised machine learning algorithm used for
both classification and regression tasks. They work by learning simple
decision rules inferred from the data features to predict a target label.
• A Decision Tree is structured as a tree where each node represents a
decision based on a feature of the data, and each branch represents an
outcome of that decision.
• The root node is the topmost node, representing the first decision.

• The internal nodes are decision points, splitting based on feature values.

• Leaf nodes are the endpoints representing the final prediction or output.
Classification : Decision Tree
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Basic algorithm (a greedy algorithm):

• Tree is constructed in a top-down recursive divide-and-conquer manner


• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are discretized in advance)
• Examples are partitioned recursively based on selected attributes
• Test attributes are selected based on a heuristic or statistical measure (e.g., information
gain)
Classification : Decision Tree
ID3 (Iterative Dichotomiser 3) is a classic algorithm used to construct a
Decision Tree, primarily for classification tasks. It was developed by Ross
Quinlan and serves as one of the earliest algorithms in Decision Tree-based
models.
• D3 aims to create a Decision Tree by selecting the most significant feature
at each step that best separates the data for classification.
• It does this by using information gain (based on entropy) as the criterion to
determine the best feature to split on.

Conditions for stopping partitioning:


• All records have the same target class (pure leaf node).
• No more features to split on (terminate as a leaf node).
• Reaching a specified maximum depth or minimum samples threshold to
prevent overfitting.
Classification : Decision Tree
Attribute Selection Measure: age income student credit_rating buys_computer
Information Gain <=30 high no fair no
<=30 high no excellent no
= set of data instances
31…40 high no fair yes
= number of data instances (14)
>40 medium no fair yes
>40 low yes fair yes
=the probability that an arbitrary tuple in
belongs to class >40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
Example: class yes is 1 <=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Expected information (entropy) age income student credit_rating buys_computer
needed to classify a tuple in D: <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
For the given data
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Example
>40 low yes fair yes
>40 low yes excellent no
means “age <=30” has 5 out of 14
31…40 low yes excellent yes
samples, with 2 yes’es and 3 no’s. <=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Example
>40 low yes fair yes
>40 low yes excellent no
means “age <=30” has 5 out of 14
31…40 low yes excellent yes
samples, with 2 yes’es and 3 no’s. <=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Example
>40 low yes fair yes
>40 low yes excellent no
means “age <=30” has 5 out of 14
31…40 low yes excellent yes
samples, with 2 yes’es and 3 no’s. <=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Similarly
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Similarly
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information needed (after using A to age income student credit_rating buys_computer
split D into v partitions) <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
Similarly
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information gained by branching on age income student credit_rating buys_computer
attribute A <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
Similarly, >40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Information gained by branching on age income student credit_rating buys_computer
attribute A <=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
Similarly, >40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Classification : Decision Tree
Next, repeat the same
So, the first split is on the age attribute process with the other
nodes with smaller dataset.
Classification : Decision Tree
Next split is on student at node <=30

All are yes


All are no So, no further split here
So, no further split here
Classification : Decision Tree
No split is required at node 31..40

All are yes


So, no further split here
Classification : Decision Tree
Classification : Decision Tree
Next split is on credit_rating at node >40
Classification : Decision Tree
Classification : Decision Tree
Classify a new instance
(age: <=30, income: low, student: yes, credit_rating: fair) = buys_computer?
Classification : Decision Tree
Exercise
Classification : Decision Tree

[Link]
1YsHFA55DZ5iQBNq7S11qoS-_WVYtxUYH?usp=sharing
Classification : Decision Tree
Computing Information-Gain for Continuous-Valued Attributes
• Let attribute A be a continuous-valued attribute
• Must determine the best split point for A
• Sort the value A in increasing order
• Typically, the midpoint between each pair of adjacent values is
considered as a possible split point
• is the midpoint between the values of and
• The point with the minimum expected information requirement
for A is selected as the split-point for A
• Split:
• D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set
of tuples in D satisfying A > split-point
Classification : Decision Tree
Gain Ratio for Attribute Selection (C4.5)
• Information gain measure is biased towards attributes with many values
• C4.5 (a successor of ID3) uses gain ratio to overcome the problem
(normalization to information gain) age income student credit_rating buys_computer
<=30 high no fair no
• GainRatio(A) = Gain(A)/SplitInfo(A) <=30 high no excellent no
31…40 high no fair yes
v | Dj | | Dj |
SplitInfo A ( D )  
>40 medium no fair yes
log 2 ( ) >40 low yes fair yes
j 1 |D| |D| >40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
• gain_ratio(income) = 0.029/1.557 = 0.019 31…40
31…40
medium
high
no
yes
excellent
fair
yes
yes
>40 medium no excellent no
Classification : Decision Tree
Gini Index (CART-Classification and Regression Trees)
Information Gain and Gain Ratio can sometimes exhibit a bias toward
features with many unique values. For example, if a feature has many unique
categories, Information Gain might prefer it because it creates more distinct
subsets.
To address this, Gain Ratio (an adaptation of Information Gain) is used in
algorithms like C4.5 to counter this bias, but this additional step adds
complexity.
Gini Index, on the other hand, is less sensitive to the number of categories in a
feature, reducing the need for an additional adjustment like Gain Ratio.
Classification : Decision Tree
Overfitting in Decision Trees
• Overfitting happens when a Decision Tree model becomes too complex,
fitting too closely to the noise and specific details of the training data,
resulting in poor generalization to new data.
• Signs of Overfitting: A decision tree that overfits will often have a high
accuracy on the training set but performs poorly on the test set.
• Causes of Overfitting:
• Allowing the tree to grow too deep, with many nodes and branches.
• Having very specific branches (splits) that capture the peculiarities of the
training data rather than general patterns.
• Consequences: Overfitting leads to a lack of model generalization, where
the model captures random fluctuations instead of meaningful trends.
Classification : Decision Tree
Pre-Pruning (Early Stopping):
This approach stops the tree from growing once it meets certain conditions,
such as reaching a maximum depth, minimum number of samples per leaf, or
minimum information gain threshold. This prevents the tree from developing
complex branches that might lead to

Post-Pruning (Cost Complexity Pruning):


This technique first grows the tree to its full depth, then prunes back branches
that provide minimal value. Post-pruning can be based on measures like cross-
validation to determine the optimal structure that minimizes errors on new
data.

You might also like