Artificial Intelligence and Machine Learning
Decision Tree – CART
Dr. Rajashree Nayak
DECISION TREE – CART
CART (Classification and Regression Trees) is an
alternative decision tree building algorithm that
can handle both classification and regression tasks.
For classification tasks, this algorithm uses a
metric called the Gini index to create decision points.
At each node, select the feature with the lowest
Gini index for splitting.
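The selection rule above can be sketched in a few lines. This is a minimal illustration, assuming the standard definition Gini = 1 - sum(p_i^2) over the class proportions p_i of a node, and a size-weighted average over the branches of a split; the slides do not spell out the formula. The function names `gini` and `weighted_gini` are illustrative, and the class counts are those of the Outlook split that appears later in these slides.

```python
def gini(counts):
    """Gini index of one node, given its class counts, e.g. [2, 3]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(branches):
    """Gini index of a split: size-weighted mean of the branch Gini indices."""
    total = sum(sum(counts) for counts in branches)
    return sum(sum(counts) / total * gini(counts) for counts in branches)


# Outlook split: Sunny (2+, 3-), Overcast (4+, 0-), Rain (3+, 2-)
outlook = [[2, 3], [4, 0], [3, 2]]
outlook_gini = weighted_gini(outlook)
print(round(outlook_gini, 3))  # 0.343
```

Computing `weighted_gini` for every candidate feature and keeping the smallest value gives the feature to split on.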
OBSERVATION
ID3 is the most common conventional
decision tree algorithm, but it has
bottlenecks: attributes must be nominal
values, the dataset must not include
missing data, and the algorithm tends
to overfit.
WHEN TO STOP SPLITTING?
Real-world datasets usually have a large
number of features, which results in a
large number of splits, which in turn gives
a huge tree.
Such trees take time to build and can lead
to overfitting: the tree gives very good
accuracy on the training dataset but poor
accuracy on the test data.
HYPERPARAMETER TUNING
There are many ways to tackle this problem through
hyperparameter tuning.
We can set the maximum depth of our decision tree
using the max_depth parameter. The larger the value
of max_depth, the more complex the tree will
be.
Another way is to set the minimum number of
samples for each split, denoted
by min_samples_split. Here we specify the minimum
number of samples required to perform a split.
For example, with min_samples_split = 10, a node
that has fewer than 10 samples is not split any
further and becomes a leaf node.
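The two stopping rules above can be sketched as a single check that a tree builder would run before splitting a node. This is an illustrative sketch, not a library implementation: `should_stop` is a hypothetical helper, the parameter names are taken from the slide, and the root is assumed to be at depth 0.

```python
def should_stop(n_samples, depth, max_depth=None, min_samples_split=2):
    """Return True if a node must become a leaf instead of being split.

    n_samples: number of training samples that reach the node
    depth: depth of the node (assumption: the root has depth 0)
    """
    # Rule 1: the tree has already reached its maximum allowed depth.
    if max_depth is not None and depth >= max_depth:
        return True
    # Rule 2: the node holds too few samples to justify another split.
    return n_samples < min_samples_split


print(should_stop(8, depth=1, min_samples_split=10))   # True
print(should_stop(10, depth=1, min_samples_split=10))  # False
print(should_stop(100, depth=3, max_depth=3))          # True
```

Smaller values of max_depth and larger values of min_samples_split both make the check fire earlier, which yields a smaller tree.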
DECISION TREE – PROS AND CONS
OVERFITTING
A hypothesis h is said to overfit the training data if there is
another hypothesis h’, such that h has a smaller error than
h’ on the training data but h has larger error on the test data
than h’.
[Figure: accuracy on training data and on testing data, plotted against the complexity of the tree]
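The definition above can be written as a simple predicate over four error rates. This is only a restatement of the definition in code; the function name `overfits` and the error values in the usage lines are made up for illustration and are not results from any experiment.

```python
def overfits(h_train_err, h_test_err, alt_train_err, alt_test_err):
    """True if h overfits relative to an alternative hypothesis h'.

    h overfits when it has a smaller error than h' on the training data
    but a larger error than h' on the test data.
    """
    return h_train_err < alt_train_err and h_test_err > alt_test_err


# Hypothetical numbers: a deep tree h versus a shallower tree h'.
print(overfits(0.02, 0.30, 0.10, 0.15))  # True
print(overfits(0.02, 0.12, 0.10, 0.15))  # False
```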
OVERFITTING
New example:
Outlook = Sunny, Temp = Hot, Humidity = Normal, Wind = Strong, label: NO
This example doesn’t exist in the tree.

Outlook
  Sunny (examples 1,2,8,9,11; 2+,3-): Humidity
    High: No
    Normal: Yes
  Overcast (examples 3,7,12,13; 4+,0-): Yes
  Rain (examples 4,5,6,10,14; 3+,2-): Wind
    Strong: No
    Weak: Yes
OVERFITTING
• Outlook = Sunny
• Temp = Hot
• Humidity = Normal
• Wind = Strong
• label: NO
• this example doesn’t exist in the tree

Extended tree (Humidity = Normal is now split on Wind):

Outlook
  Sunny (examples 1,2,8,9,11; 2+,3-): Humidity
    High: No
    Normal: Wind
      Strong: No
      Weak: Yes
  Overcast (examples 3,7,12,13; 4+,0-): Yes
  Rain (examples 4,5,6,10,14; 3+,2-): Wind
    Strong: No
    Weak: Yes

This can always be done, but it may fit noise or other coincidental regularities.
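To make the two overfitting slides concrete, the sketch below encodes both Outlook trees and classifies the new example with each. The nested-tuple representation and the `classify` helper are illustrative choices, and the tree structures are as read from the diagrams above.

```python
# Each inner node is (feature, {value: subtree}); each leaf is a label string.
simple_tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

# Same tree, but the Humidity = Normal branch is split again on Wind.
extended_tree = ("Outlook", {
    "Sunny": ("Humidity", {
        "High": "No",
        "Normal": ("Wind", {"Strong": "No", "Weak": "Yes"}),
    }),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})


def classify(tree, example):
    """Follow the branches that match the example until a leaf is reached."""
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[example[feature]]
    return tree


new_example = {"Outlook": "Sunny", "Temp": "Hot",
               "Humidity": "Normal", "Wind": "Strong"}
print(classify(simple_tree, new_example))    # Yes (but the given label is No)
print(classify(extended_tree, new_example))  # No
```

The extended tree fits the new example, but only by adding a split that is justified by a single example, which is exactly how noise gets memorized.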
REASONS FOR OVERFITTING
Too much variance in the training data
  Training data is not a representative sample
  of the instance space
  We split on features that are actually irrelevant
Too much noise in the training data
  Noise = some feature values or class labels are
  incorrect
  We learn to predict the noise
In both cases, overfitting results from our aim to
minimize the empirical error when we learn, and from
the ability of decision trees to do so
PREVENTING OVERFITTING
PRUNING
Pruning is another method that can help us avoid
overfitting. It improves the performance
of the decision tree by cutting the nodes or
sub-nodes that are not significant.
It also removes the branches that
have very low importance.
There are two main ways to prune:
Pre-pruning – we stop growing the tree
early, which means we prune/remove/cut a
node of low importance while growing the
tree.
Post-pruning – once the tree is grown to its
full depth, we prune the nodes based on
their significance.
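Post-pruning can be sketched as follows. The slides do not say how significance is measured, so this example swaps in one common choice, reduced-error pruning: a subtree is replaced by a leaf whenever the leaf is at least as accurate on a held-out validation set. The dict-based tree layout, the `majority` field, and the tiny validation set are all hypothetical.

```python
def classify(tree, example):
    """Leaves are label strings; inner nodes are dicts."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["feature"]]]
    return tree


def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)


def prune(tree, validation):
    """Bottom-up reduced-error pruning. Mutates and returns the tree."""
    if not isinstance(tree, dict):
        return tree
    # Prune each child on the validation examples that reach it.
    for value, child in list(tree["branches"].items()):
        subset = [(x, y) for x, y in validation if x[tree["feature"]] == value]
        tree["branches"][value] = prune(child, subset)
    if not validation:
        return tree  # no evidence either way: keep the subtree
    # Replace the node by its majority-class leaf if that is no worse.
    leaf = tree["majority"]
    leaf_accuracy = sum(y == leaf for _, y in validation) / len(validation)
    return leaf if leaf_accuracy >= accuracy(tree, validation) else tree


# Hypothetical subtree: Humidity = Normal was split on Wind to fit one noisy example.
tree = {"feature": "Humidity", "majority": "No", "branches": {
    "High": "No",
    "Normal": {"feature": "Wind", "majority": "Yes",
               "branches": {"Strong": "No", "Weak": "Yes"}},
}}
validation = [
    ({"Humidity": "Normal", "Wind": "Strong"}, "Yes"),
    ({"Humidity": "Normal", "Wind": "Weak"}, "Yes"),
    ({"Humidity": "High", "Wind": "Weak"}, "No"),
    ({"Humidity": "High", "Wind": "Strong"}, "No"),
]
pruned = prune(tree, validation)
print(pruned["branches"])  # {'High': 'No', 'Normal': 'Yes'}
```

Here the Wind split is cut because a plain "Yes" leaf does better on the validation examples, while the Humidity split survives because it is still useful.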
DECISION TREE TO RANDOM FOREST
Instead of applying the decision tree algorithm to the whole dataset,
the dataset is separated into subsets and the decision tree
algorithm is applied to each of these subsets
The final decision is the result given by the highest number of
subset trees (a majority vote)
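The subset-and-vote idea can be sketched as below. The slides only say that the dataset is separated into subsets; drawing each subset by sampling with replacement (a bootstrap sample) is an assumption made here, and the three stand-in "trees" are hypothetical functions rather than trained models.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement to form one training subset."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees.

    Ties go to the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, example):
    """Ask every tree for a prediction and return the majority label."""
    return majority_vote([tree(example) for tree in trees])


# One subset per tree; each tree would be trained on its own subset.
rng = random.Random(0)
subset = bootstrap_sample(list(range(14)), rng)
print(len(subset))  # 14

# Stand-ins for three trees trained on different subsets.
trees = [
    lambda x: "Yes" if x["Outlook"] == "Overcast" else "No",
    lambda x: "No" if x["Wind"] == "Strong" else "Yes",
    lambda x: "Yes" if x["Humidity"] == "Normal" else "No",
]
example = {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Strong"}
print(forest_predict(trees, example))  # No
```

Because each tree sees a different subset, an irregularity memorized by one tree tends to be outvoted by the others.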
GRAPHICAL ABSTRACT
HOW DOES IT WORK ?
ADVANTAGES