Lecture 5-Decision Tree Based Learning

This document provides an overview of decision tree-based learning, including their construction, handling of missing values, and various algorithms such as CART and C4.5. It covers essential concepts like information gain, entropy, and pruning techniques to avoid overfitting. Additionally, it discusses specialized trees like random forests and evolutionary trees, along with implementation methods using different programming languages.

Uploaded by

nzimasesiphesihle19

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views14 pages

Lecture 5-Decision Tree Based Learning

Uploaded by

nzimasesiphesihle19

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

DECISION TREE BASED

LEARNING

CSC 518
Lecture notes – Chapter 5
LEARNING OUTCOMES
On completion of this topic, learners should be able to: -
a) Identify Decision trees
b) Construct a Decision tree
c) Handle missing values
d) Choose the appropriate attribute(s), Information gain and Entropy,
Gini, index, Gain ratio
e) Identify the termination Criteria / Pruning Decision trees
f) Induce Decision trees – Decision tree algorithms, CART, C4.5, Greedy
Decision trees.
g) Identify benefits of Decision trees, Specialized trees, Oblique trees,
Random forests, Evolutionary trees, Hellinger trees
h) Implement Decision trees Using Mahout, Using R, Using Spark, Using
Python (scikit-learn), Using Julia
DECISION TREES

Decision trees are known to be one of the most powerful and widely used
modeling techniques in the field of Machine learning. Decision trees
naturally induce rules that can be used in data classification and prediction.
Following is an example of a rule definition derived from building a
Decision tree:
If (laptop model is x) and (manufactured by y) and (is z years old) and (with
some owners being k) then (the battery life is n hours).
TERMINOLOGY

Decision trees classify instances by representing in a tree structure starting from the
root to a leaf. Most importantly, at a high level, there are two representations of a
Decision tree—a node and an arc that connects nodes. To make a decision, the flow
starts at the root nodes, navigates to the arcs until it has reached a leaf node, and then
makes a decision. Each node of the tree denotes testing of an attribute, and the
branches denote the possible values that the attribute can take.
Some of the characteristics of a Decision tree representation:
▪ Every non-leaf node (for example, a decision node) denotes a representation of the
attribute value
▪ Every branch denotes the rest of the value representation
▪ Every leaf (or terminal) node represents the value of the target attribute
▪ The starting node is called the root node
PURPOSE AND USES

Decision trees are used for classification and regression.

Two types of trees are used in this context:
1. Classification trees
2. Regression trees
❑ Classification trees are used to classify the given data set into categories. To
use classification trees, the response of the target variable needs to be a
categorical value such as yes/no, true/false.
❑ On the other hand, regression trees are used to address prediction
requirements and are always used when the target or response variable is a
numeric or discrete value such as stock value, commodity and price.
CONSTRUCTING A DECISION TREE
Core rules for building Decision trees:
➢ We usually start building Decision trees with one attribute, split the data based on the
attribute, and continue with the same process for other attributes.
➢ There can be many Decision trees for the given problem.
➢ The depth of the tree is directly proportional to the number of attributes chosen.
➢ There needs to be a Termination Criteria that will determine when to stop further
building the tree. In the case of no termination criteria, the model will result in the
over-fitting of the data.
➢ Finally, the output is always in the form of simple rule(s) that can be stored and
applied to different datasets for classification and/or prediction.
One of the reasons why Decision trees are preferred in the field of Machine learning is
because of their robustness to errors; they can be used when there are some unknown
values in the training datasets too (for example, the data for income is not available for all
the records).
HANDLING MISSING VALUES
One of the interesting ways of assigning values to some unknowns is to see that the most common value in terms
of occurrence is assigned and in some cases they can belong to the same class, if possible we should bring it closer
to accuracy.
There is another probabilistic way of doing this where the prediction is distributed proportionately:
Assign a probability pi for every value vi of x.
Now, assign the fraction pi of x to each of the descendants. These probabilities can be estimated again based on
the observed frequencies of the various values for A, among the examples at node n.
For example, let's consider a Boolean attribute A. Let there be 10 values for A out of which three have a value of True and
the rest 7 have a value of False. So, the probability of A(x) = True is 0.3, and the probability that A(x) = False is 0.7.
➢ A fractional 0.3 of this is distributed down the branch for A = True, and a fractional 0.7 is distributed down the
other.
➢ These probability values are used for computing the information gain, and can be used if a second missing
attribute value needs to be tested.
➢ The same methodology can be applied in the case of learning when we need to fill any unknowns for the new
branches.
➢ The C4.5 algorithm uses this mechanism for filling the missing values.
CONSIDERATIONS FOR CONSTRUCTING DECISION TREES
The key to constructing Decision trees is knowing where to split them. To do this, we need to be clear on the following:
1. Which attribute to start and which attribute to apply subsequently?
2. When do we stop building the Decision tree (that is avoid over-fitting)?
Choosing the appropriate attribute(s)
There are three different ways to identify the best-suited attributes:
1. Information Gain and Entropy
2. Gini index
3. Gain ratio
Information gain and Entropy
This entity is used in an algorithm known as C4.5. Entropy is a measure of uncertainty in the data. Let us take an intuitive approach to understand the concepts
of Information gain and Entropy. In the original version of the C5.0 and C4.5 algorithms (ID3), a root node was chosen on the basis of how much of the total
Entropy was reduced if this node was chosen. This is called information gain.
Information gain = Entropy of the system before split - Entropy of the system after split

Gini index
Gini index is a general splitting criterion. Gini Index is used to measure the probability of two random items belonging to the same class. In the case of a real
dataset, this probability value is 1. The Gini measure of a node is the sum of the squares of the proportions of the classes.
A node with two classes each has a score of 0.52 + 0.52 = 0.5. This is because the probability of picking the same class at random is 1 out of 2.
Gain ratio
Another improvement in C4.5 compared to ID3 is that the factor that decides the attribute is the gain ratio. The gain ratio is the ratio of information gain and
information content. The attribute that gives the maximum amount of gain ratio is the attribute that is used to split it. The gain ratio requires measure for
Information content. Information content is defined as -f i log2 f i . Note that here, we do not take the value of the dependent variable into account.
TERMINATION CRITERIA / PRUNING DECISION TREES
There are many ways of avoiding over-fitting in Decision tree learning.
The Following are the two different cases:
1. One case where the Decision tree is terminated for growth way before a perfect classification of the training data is done
2. Another case where the over-fitting of data is done and then the tree is pruned to recover
Following are a couple of approaches to find the correct tree size:
1. Identify a separate and different dataset to that of the target training data set to be used, and evaluate the correctness of post-
pruning nodes in the tree. This is a common approach and is called training and validation set approach.
2. Instead of having a subset of data in the training set, use up all the data in the training set, and apply probabilistic methods to
check if pruning a particular node has any likelihood to produce any improvement over and above the training dataset. Use all the
available data for training. For example, the chi-square test can be used to check this probability. Reduced-Error-Pruning (D): We
prune at a node by removing the subtree that is rooted at the node. We make that node a leaf.
Following are the steps of the Rule Post-Pruning process:
1. Construct a Decision tree from the training set by growing it until there is an obvious over-fitting seen.
2. Generate rules from the constructed Decision tree with every path, starting from the root node to a particular leaf node mapping to
a rule.
3. Apply pruning to each rule for removing identified preconditions and help improve the probabilistic accuracy.
4. Next, use the pruned rules in the order of their increased accuracy on the subsequent instances.
Following are the advantages of rule-based pruning and its need for converting into rules:
➢ Improving the readability of the rules
➢ A consistent testing can be done at both the root and leaf level nodes
➢ There is a clear decision that can be made of either removing the decision node or retaining it
INDUCING DECISION TREES – DECISION TREE ALGORITHMS
There are many Decision tree inducing methods. Among all the methods, C4.5 and CART are the most adopted or popular
ones.
CART
▪ CART stands for Classification and Regression Trees (Breiman et al., 1984).
▪ CART creates binary trees. This means there are always two branches that can emerge from a given node. The philosophy of
the CART algorithm is to follow a goodness criterion, which is all about choosing the best possible partition.
▪ Moreover, as the tree grows, a cost-complexity pruning mechanism is adopted. CART uses the Gini index to select
appropriate attributes or the splitting criteria.
▪ Using CART, the prior probability distribution can be provided. We can generate Regression trees using CART that in turn
help in predicting real numbers against a class. The prediction is done by applying the weighted mean for the node. CART
identifies splits that minimize the prediction squared error (that is, the least-squared deviation).
C4.5
➢ Similar to CART, C4.5 is a Decision tree algorithm with a primary difference that it can generate more than binary trees,
which means support for multiway splits.
➢ For attribute selection, C4.5 uses the information gain measure. As explained in the previous section, an attribute with the
largest information gain (or the lowest Entropy reduction) value helps to achieve closer to accurate classification with the
least quantity of data. One of the key drawbacks of C4.5 is the need for large memory and CPU capacity for generating rules.
➢ The C5.0 algorithm is a commercial version of C4.5 that was presented in 1997. C4.5 is an evolution of the ID3 algorithm.
The gain ratio measure is used for identifying the splitting criteria.
➢ The splitting process stops when the number of splits reaches a boundary condition definition that acts as a threshold. Post
this growing phase of the tree, pruning is done, and an error-based pruning method is followed.
GREEDY DECISION TREES

A vital characteristic of Decision trees is that they are Greedy! A greedy algorithm targets achieving optimal
solutions globally by achieving local optimums at every stage. Though the global optimum is not always
guaranteed, the local optimums help in achieving global optimum to a maximum extent.
Every node is greedily searched to reach the local optimum, and the possibility of getting stuck at achieving
local optima is high. Most of the time, targeting local optima might help in providing a good enough
solution.
Benefits of Decision trees
Some of the advantages of using Decision trees are listed here:
➢ Decision trees are fast and easy to build and require little experimentation
➢ They are robust
➢ They are easy to understand and interpret
➢ Decision trees do not require complex data preparation
➢ They can handle both categorical and numerical data
➢ They are supported using statistical models for validation
➢ They can handle highly dimensional data and also operate large datasets
SPECIALIZED TREES
1. Oblique trees
12
Oblique trees are used in cases where the data is extremely complex. If the attributes are x1, x2, AND x3…xn, then the C4.5 and CART tests the criteria as
x1>some value or x2< some other value, and so on. The goal in such cases is to find an attribute to test at each node.
2. Random forests
▪ These specialized trees are used when there are too many dimensions. The basic premise of the curse of dimensionality is that high dimensional data
brings in complexity. With more dimensions and features, the possibility of errors is also high. In the case of Random forests, the application of boosting
is about how single tree methods are brought together to see a boost in the result regarding accuracy. A Random forest extends Decision trees by
including more number of Decision trees. These Decision trees are built by a combination of random selection of data (samples) and a random selection
of a subset of attributes. Another variable input required for the making of multiple Decision trees are random subsets of the attributes.
▪ Since each tree is built using random dataset and random variable set, these trees are called Random trees. Moreover, many such Random trees define a
Random forest. The result of a Random tree is based on two radical beliefs. One is that each of the trees make an accurate prediction for maximum part
of the data. Second, mistakes are encountered at different places. So, on an average, a poll of results is taken across the Decision trees to conclude a result.
There are not enough observations to get good estimates, which leads to sparsity issues. There are two important causes for exponential increase on spatial
density, one, is increase in dimensionality and the other is increase in the equidistant points in data. Most of the data is in the tails. Random forests are a
vital extension of the Decision trees that are very simple to understand and are extremely efficient, particularly when one is dealing with high
dimensional spaces. For prediction, a new sample is pushed down the tree. A new label of the training sample is assigned to the terminal node, where it
ends up. This procedure is iterated over all the trees in the group, and the average vote of all trees is reported as the Random forest prediction.
3. Evolutionary trees
When achieving the global optima seems almost impossible, Evolutionary trees are used. As you learned, Decision trees are greedy. So sometimes, we may be
constructing much bigger trees just because we are stuck in local optima. So, if your tree length is just too much, try oblique trees or evolutionary trees. The
concept of evolutionary trees is originated from a very exciting concept called genetic algorithms. Instead of mathematically computing the best attribute at
every node, an Evolutionary tree randomly picks a node at each point and creates a tree. It then iterates and creates a collection of trees (forest). Now, it
identifies the best trees in the forest for the data. It then creates the next generation of the forest by combining these trees randomly. Evolutionary trees, on
the other hand, choose a radically different top node and produce a much shorter tree, which has the same efficiency. Evolutionary algorithms take more time
to compute.
4. Hellinger trees
There have been attempts to identify impurity measures that are less sensitive to the distribution of dependent variable values than Entropy or Gini index. A
very recent paper suggested Hellinger distance as a measure of impurity that does not depend on the distribution of the target variable.
IMPLEMENTING DECISION TREES
Using Mahout
Refer to the folder .../mahout/chapter5/decisiontreeexample/.
Refer to the folder.../mahout/chapter5/randomforestexample/.
Using R
Refer to the folder .../r/chapter5/decisiontreeexample/.
Refer to the folder .../r/chapter5/randomforestexample/.
Using Spark
Refer to the folder .../spark/chapter5/decisiontreeexample/.
Refer to the folder .../spark/chapter5/randomforestexample/.
Using Python (scikit-learn)
Refer to the folder .../python scikitlearn/chapter5/decisiontreeexample/.
Refer to the folder .../python scikitlearn/chapter5/randomforestexample/.
Using Julia
Refer to the folder .../julia/chapter5/decisiontreeexample/.
Refer to the folder .../julia/chapter5/randomforestexample/
THANK YOU

Decision Trees in Classification Systems
No ratings yet
Decision Trees in Classification Systems
25 pages
Understanding Decision Trees in Predictive Modeling
No ratings yet
Understanding Decision Trees in Predictive Modeling
27 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
43 pages
Decision Trees and Probabilistic Models
No ratings yet
Decision Trees and Probabilistic Models
25 pages
Understanding Decision Tree Learning
No ratings yet
Understanding Decision Tree Learning
21 pages
Decision Trees in Machine Learning Explained
No ratings yet
Decision Trees in Machine Learning Explained
13 pages
DWDM Unit4 Notes
No ratings yet
DWDM Unit4 Notes
9 pages
Understanding Decision Trees in Data Mining
No ratings yet
Understanding Decision Trees in Data Mining
13 pages
ML Unit 3
No ratings yet
ML Unit 3
35 pages
DTSC Sem4 Eng
No ratings yet
DTSC Sem4 Eng
100 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
42 pages
Understanding CART: Decision Trees Explained
No ratings yet
Understanding CART: Decision Trees Explained
25 pages
Decision Tree Algorithm Overview
No ratings yet
Decision Tree Algorithm Overview
63 pages
Understanding Decision Trees and Algorithms
No ratings yet
Understanding Decision Trees and Algorithms
15 pages
Decision Tree Uodated Final
No ratings yet
Decision Tree Uodated Final
28 pages
Decision Tree Algorithm in Machine Learning
No ratings yet
Decision Tree Algorithm in Machine Learning
26 pages
Understanding Classification and Decision Trees
No ratings yet
Understanding Classification and Decision Trees
24 pages
ML Unit-IV Complete
No ratings yet
ML Unit-IV Complete
23 pages
Decision Trees: Overview and Examples
No ratings yet
Decision Trees: Overview and Examples
22 pages
Decision Tree Classification Explained
No ratings yet
Decision Tree Classification Explained
13 pages
Data Classification and Decision Trees
No ratings yet
Data Classification and Decision Trees
34 pages
Decision Tree Algorithms in Machine Learning
No ratings yet
Decision Tree Algorithms in Machine Learning
54 pages
Decision Tree Induction Explained
No ratings yet
Decision Tree Induction Explained
6 pages
5-Classification Part 2
No ratings yet
5-Classification Part 2
39 pages
Data Mining Course Material
No ratings yet
Data Mining Course Material
36 pages
Decision Trees for Banknote Classification
67% (3)
Decision Trees for Banknote Classification
14 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
52 pages
Decision Tree Learning Overview
No ratings yet
Decision Tree Learning Overview
141 pages
Decision Tree Classification Overview
No ratings yet
Decision Tree Classification Overview
11 pages
Understanding Decision Trees in AI
No ratings yet
Understanding Decision Trees in AI
37 pages
Building and Optimizing Decision Trees
No ratings yet
Building and Optimizing Decision Trees
16 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
67 pages
Unit 3
No ratings yet
Unit 3
32 pages
Unit 3
No ratings yet
Unit 3
36 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
82 pages
Supervised Classifiers in Education Data Mining
No ratings yet
Supervised Classifiers in Education Data Mining
40 pages
Understanding Decision Trees in AI
No ratings yet
Understanding Decision Trees in AI
13 pages
Decision Trees and Ensemble Methods
No ratings yet
Decision Trees and Ensemble Methods
69 pages
1.decision Trees Concepts
No ratings yet
1.decision Trees Concepts
70 pages
Decision Tree Classification Overview
No ratings yet
Decision Tree Classification Overview
46 pages
Decision Tree Implementation Guide
No ratings yet
Decision Tree Implementation Guide
3 pages
Understanding Decision Tree Learning
No ratings yet
Understanding Decision Tree Learning
16 pages
Decision Tree Splitting Techniques
No ratings yet
Decision Tree Splitting Techniques
56 pages
Decision Tree Learning Overview
No ratings yet
Decision Tree Learning Overview
33 pages
Decision Tree Algorithm Explained
No ratings yet
Decision Tree Algorithm Explained
14 pages
Overview of Decision Tree Algorithms
No ratings yet
Overview of Decision Tree Algorithms
47 pages
ML Unit-Iii
No ratings yet
ML Unit-Iii
31 pages
Understanding Decision Trees in Classification
No ratings yet
Understanding Decision Trees in Classification
17 pages
Decision Tree Learning and Prediction Techniques
No ratings yet
Decision Tree Learning and Prediction Techniques
19 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
11 pages
Machine Learning: Decision Tree Overview
No ratings yet
Machine Learning: Decision Tree Overview
24 pages
Decision Tree Analysis in AI & ML
No ratings yet
Decision Tree Analysis in AI & ML
29 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
11 pages
Unit 3
No ratings yet
Unit 3
80 pages
Unit 4 DWM
No ratings yet
Unit 4 DWM
31 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
24 pages
Pa Unit-3 Part1
No ratings yet
Pa Unit-3 Part1
49 pages
Item Image Management in Company Folder
No ratings yet
Item Image Management in Company Folder
8 pages
Biology Exam: Cell Structure & Function
No ratings yet
Biology Exam: Cell Structure & Function
21 pages
Philippine Airlines E-Ticket Receipt
50% (4)
Philippine Airlines E-Ticket Receipt
3 pages
Splinting Teeth After Trauma: Guidelines
No ratings yet
Splinting Teeth After Trauma: Guidelines
73 pages
Mathematical Modeling of Cyber Resilience
No ratings yet
Mathematical Modeling of Cyber Resilience
7 pages
July 2025 Current Affairs Overview
No ratings yet
July 2025 Current Affairs Overview
1 page
Year 7 Holiday Homework
No ratings yet
Year 7 Holiday Homework
6 pages
Matthew Aucoin: Composer and Conductor
No ratings yet
Matthew Aucoin: Composer and Conductor
5 pages
Accounting Principles and Practices Overview
No ratings yet
Accounting Principles and Practices Overview
20 pages
Effective Organizational Communication
No ratings yet
Effective Organizational Communication
18 pages
00 Performance Grades Qual Spec Generic Parts 230728
No ratings yet
00 Performance Grades Qual Spec Generic Parts 230728
28 pages
Understanding Parallel Circuits
No ratings yet
Understanding Parallel Circuits
3 pages
Introduction to Museology Concepts
No ratings yet
Introduction to Museology Concepts
32 pages
Class 9 Registration Form Dhanbad
No ratings yet
Class 9 Registration Form Dhanbad
3 pages
Hadoop 2.8.0 Installation on Windows 10
No ratings yet
Hadoop 2.8.0 Installation on Windows 10
4 pages
Domestic Travel Insurance Policy Terms
No ratings yet
Domestic Travel Insurance Policy Terms
3 pages
Effective Engineering Presentation Techniques
No ratings yet
Effective Engineering Presentation Techniques
30 pages
Yealink SIP-T48G User Guide V80 60
No ratings yet
Yealink SIP-T48G User Guide V80 60
230 pages
Digital Workflow in Implant Dentistry
No ratings yet
Digital Workflow in Implant Dentistry
4 pages
Essential ACT Math Formulas Guide
100% (1)
Essential ACT Math Formulas Guide
12 pages
Basic Derivatives Practice Worksheet
No ratings yet
Basic Derivatives Practice Worksheet
2 pages
TED Talks for Business English Teaching
No ratings yet
TED Talks for Business English Teaching
18 pages
Knee Joint Stability and Structure
No ratings yet
Knee Joint Stability and Structure
16 pages
ASR9000 Upgrade Procedure 431 PDF
No ratings yet
ASR9000 Upgrade Procedure 431 PDF
12 pages
High-Yield Topics for UPSC EPFO/APFC Exam
No ratings yet
High-Yield Topics for UPSC EPFO/APFC Exam
3 pages
Systemair Saudi Arabia Company Profile
100% (1)
Systemair Saudi Arabia Company Profile
417 pages
Bio Data and Labour Clearance Requests
No ratings yet
Bio Data and Labour Clearance Requests
10 pages
Dorian Scale Basics: 5 Boxes Guide
No ratings yet
Dorian Scale Basics: 5 Boxes Guide
1 page
Levi Ackerman: A Comprehensive Guide
No ratings yet
Levi Ackerman: A Comprehensive Guide
1 page
Document Control in Quality Assurance
No ratings yet
Document Control in Quality Assurance
2 pages

Lecture 5-Decision Tree Based Learning

Uploaded by

Lecture 5-Decision Tree Based Learning

Uploaded by

DECISION TREE BASED

Decision trees are used for classification and regression.

You might also like