Lecture 3

The document provides an overview of Decision Trees, a machine learning tool used for decision support that models decisions and their potential outcomes. It discusses the structure of decision trees, the process of learning from data, and methods for selecting the best attributes for splitting data. Additionally, it covers concepts such as information gain and the ID3 algorithm for constructing decision trees.

Machine Learning

Decision Trees

Zhibin Yu
2021.08.25
Contents
Introduction to Decision Tree

Algorithm

Overfitting

Summary
Part I: Introduction to Decision Tree
What is Decision Tree
● A decision tree is a decision support tool that uses a tree-like
model of decisions and their possible consequences, including
chance event outcomes, resource costs, and utility. It is one way
to display an algorithm that only contains conditional control
statements.
A possible decision tree
● Each internal node: test one attribute 𝑋𝑖
● Each branch from a node: selects one value for 𝑋𝑖
● Each leaf node: predict 𝑌 or 𝑝(𝑌 | 𝑥 ∈ 𝑙𝑒𝑎𝑓 )
Sample Dataset
● Columns denote features Xi
● Rows denote labeled instances < 𝒙𝑖, 𝑦𝑖 >
● Class label denotes whether a tennis game was played


Function Approximation
● Problem Setting
● Set of possible instances X
● Set of possible labels Y
● Unknown target function 𝑓: 𝑋 → 𝑌
● Set of function hypotheses 𝐻 = {h | h: 𝑋 → 𝑌}

● Input: Training examples of unknown target function 𝑓

$\{\langle \mathbf{x}_i, y_i \rangle\}_{i=1}^{n} = \{\langle \mathbf{x}_1, y_1 \rangle, \ldots, \langle \mathbf{x}_n, y_n \rangle\}$
● Output: Hypothesis h ∈ 𝐻 that best approximates 𝑓
Decision Tree
● A possible decision tree for the data:

● What prediction would we make for <outlook=sunny, temperature=hot, humidity=high, wind=weak>?
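
To make the prediction question concrete, here is a minimal Python sketch. The tree itself is an assumption: the slide's figure did not survive in the text, so the sketch uses the tree commonly drawn for this tennis dataset (outlook at the root, then humidity or wind). The names `TREE` and `predict` are illustrative only.

```python
# Hypothetical tree for the tennis data (assumed; the slide's figure is not in the text).
# Internal node: (attribute, {value: subtree}); leaf: a class label string.
TREE = ("outlook", {
    "sunny": ("humidity", {"high": "No", "normal": "Yes"}),
    "overcast": "Yes",
    "rain": ("wind", {"strong": "No", "weak": "Yes"}),
})


def predict(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree


query = {"outlook": "sunny", "temperature": "hot", "humidity": "high", "wind": "weak"}
print(predict(TREE, query))
```

Under this assumed tree the query follows outlook=sunny, then humidity=high, and reaches a "No" leaf. The temperature attribute is never tested.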
Decision Tree
● If features are continuous, internal nodes can test the value of a
feature against a threshold
Decision Tree Learning
● Problem Setting:
● Set of possible instances X
● each instance x in X is a feature vector
● e.g., <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
● Unknown target function 𝑓: 𝑋 → 𝑌
● Y is discrete valued
● Set of function hypotheses 𝐻 = {h | h: 𝑋 → 𝑌}
● each hypothesis h is a decision tree
● the tree sorts x to a leaf, which assigns y
Stages of (Batch) Machine Learning
● Given: labeled training data $X, Y = \{\langle \mathbf{x}_i, y_i \rangle\}_{i=1}^{n}$
● Assumes each $\mathbf{x}_i \sim D(X)$ with $y_i = f_{target}(\mathbf{x}_i)$
● Train the model:
● model ← classifier.train(X, Y)

● Apply the model to new data:

● Given: new unlabeled instance x ~ D(X)
● y_prediction ← model.predict(x)
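
The two stages (train, then apply) can be sketched in Python as follows. The learner here is a deliberately trivial stand-in, not a decision tree: `MajorityClassifier` and its data are hypothetical, chosen only to show the train/predict interface.

```python
from collections import Counter


class MajorityClassifier:
    """Placeholder learner (assumption): always predicts the most common training label."""

    def train(self, X, Y):
        # "Training" here only memorizes the majority label in Y.
        self.label = Counter(Y).most_common(1)[0][0]
        return self

    def predict(self, x):
        # The instance x is ignored by this trivial model.
        return self.label


# Stage 1: train the model on labeled data (toy data, invented for illustration).
X = [{"wind": "weak"}, {"wind": "strong"}, {"wind": "weak"}]
Y = ["Yes", "No", "Yes"]
model = MajorityClassifier().train(X, Y)

# Stage 2: apply the model to a new unlabeled instance.
y_prediction = model.predict({"wind": "strong"})
print(y_prediction)
```

A decision tree learner would plug into the same two calls; only the body of `train` and `predict` changes.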
Example Application: A Tree to Predict
Caesarean Section Risk
Decision Tree Induced Partition
Decision Tree – Decision Boundary
● Decision trees divide the feature space into axis-parallel
(hyper-)rectangles
● Each rectangular region is labeled with one label
● or a probability distribution over labels
Expressiveness
● Decision trees can represent any Boolean function of the
input attributes
● In the worst case, the tree will require exponentially many
nodes

Truth table row → path to leaf

Expressiveness
● Decision trees have a variable-sized hypothesis space
● As the #nodes (or depth) increases, the hypothesis space grows
● Depth 1 (“decision stump”): can represent any Boolean function of one
feature
● Depth 2: any Boolean function of two features; some involving three features (e.g.,
(𝑥1 ∧ 𝑥2) ∨ (¬𝑥1 ∧ ¬𝑥3))
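
As a quick check of the depth-2 claim, the following Python sketch compares a hand-written depth-2 tree against the three-feature example function on all eight truth table rows. The function names are illustrative assumptions.

```python
from itertools import product


def target(x1, x2, x3):
    """The example Boolean function: (x1 AND x2) OR (NOT x1 AND NOT x3)."""
    return (x1 and x2) or ((not x1) and (not x3))


def depth2_tree(x1, x2, x3):
    """A depth-2 decision tree: the root tests x1, each child tests one more feature."""
    if x1:
        return x2       # x1 = True branch: test x2
    return not x3       # x1 = False branch: test x3


rows = list(product([False, True], repeat=3))
agrees = all(target(*row) == depth2_tree(*row) for row in rows)
print(agrees)
```

Each truth table row corresponds to exactly one root-to-leaf path, which is why agreement on all eight rows shows the tree represents the function.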
Another Example: Restaurant
● Model a patron’s decision of whether to wait for a table at a
restaurant
Another Example: Restaurant

Is this the best decision tree?

Preference bias: Ockham’s Razor
● Principle stated by William of Ockham (1285-1347)
● entities are not to be multiplied beyond necessity
● AKA Occam’s Razor, Law of Economy, or Law of Parsimony

● Idea: The simplest consistent explanation is the best!
● Therefore, the smallest decision tree that correctly classifies all
of the training examples is best
● Finding the provably smallest decision tree is NP-hard
● ...So instead of constructing the absolute smallest tree consistent
with the training examples, construct one that is pretty small
ID-3 Decision Tree
● node = root of decision tree
● Main loop:
● A ← the “best” decision attribute for the next node.
● Assign A as decision attribute for node.
● For each value of A, create a new descendant of node.
● Sort training examples to leaf nodes.
● If training examples are perfectly classified, stop. Else, recurse
over new leaf nodes.

How do we choose which attribute is best?

Choosing the Best Attribute
● Key problem: choosing which attribute to split a given set of examples on
● Random: Select any attribute at random
● Least-Values: Choose the attribute with the smallest number of possible
values
● Most-Values: Choose the attribute with the largest number of possible
values
● Max-Gain: Choose the attribute that has the largest expected information
gain
● i.e., attribute that results in smallest expected size of subtrees rooted at its
children
● The ID3 algorithm uses the Max-Gain method of selecting the best
attribute
Choosing an Attribute
● Idea: a good attribute splits the examples into subsets that are
(ideally) “all positive” or “all negative”
● Which split is more informative: Patrons? or Type?
ID3-induced Decision Tree
Compare the Two Decision Trees
Information Gain
● Which test is more informative?
[Figure: two candidate splits. Left: split over whether Balance exceeds 50K (branches: Less or equal 50K, Over 50K). Right: split over whether applicant is employed (branches: Unemployed, Employed).]

Information Gain
● Impurity/Entropy (informal)
● Measures the level of impurity in a group of examples
Information Gain
● Entropy H(X) of a random variable X, where n is the number of possible values for X:
$H(X) = -\sum_{i=1}^{n} P(X = i) \log_2 P(X = i)$
● H(X) is the expected number of bits needed to encode a randomly drawn value of X (under the most efficient code)
● Why? Information theory:
● The most efficient code assigns $-\log_2 P(X = i)$ bits to encode the message X = i
● So, the expected number of bits to code one random X is:
$\sum_{i=1}^{n} P(X = i) \left(-\log_2 P(X = i)\right)$
2-Class Cases:
● What is the entropy of a group in which all examples belong
to the same class?
● $entropy = -1 \log_2 1 = 0$
● not a good training set for learning

● What is the entropy of a group with 50% in either class?
● $entropy = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1$
● good training set for learning
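
The two cases above can be reproduced with a short Python sketch. The function name `entropy` and its list-of-probabilities interface are assumptions made for illustration; terms with zero probability are skipped, following the usual convention that 0 log 0 = 0.

```python
from math import log2


def entropy(probabilities):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # p * log2(1 / p) equals -p * log2(p); zero-probability terms contribute nothing.
    return sum(p * log2(1 / p) for p in probabilities if p > 0)


pure = entropy([1.0])            # all examples in one class
balanced = entropy([0.5, 0.5])   # 50% in either class
print(pure, balanced)
```

Any other two-class mix falls between these extremes, for example `entropy([0.9, 0.1])` is strictly between 0 and 1.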
Sample Entropy
Information Gain
● We want to determine which attribute in a given set of training feature
vectors is most useful for discriminating between the classes to be
learned.

● Information gain tells us how important a given attribute of the feature
vectors is.

● We will use it to decide the ordering of attributes in the nodes of a
decision tree.
From Entropy to Information Gain
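● Entropy H(X) of a random variable X with n possible values:
$H(X) = -\sum_{i=1}^{n} P(X = i) \log_2 P(X = i)$

● Specific conditional entropy H(X|Y=v) of X given Y=v:
$H(X \mid Y = v) = -\sum_{i=1}^{n} P(X = i \mid Y = v) \log_2 P(X = i \mid Y = v)$

● Conditional entropy H(X|Y) of X given Y:
$H(X \mid Y) = \sum_{v \in values(Y)} P(Y = v) \, H(X \mid Y = v)$

● Mutual information (aka Information Gain) of X and Y:
$I(X, Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$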
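
These quantities can be estimated from paired samples by counting. The Python sketch below is one way to do that; the function names and the toy sequences are assumptions, and probabilities are taken as empirical frequencies.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical entropy H(X) in bits of a list of observed values."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(X|Y): average of H(X|Y=v), weighted by the frequency of each v."""
    n = len(xs)
    total = 0.0
    for v in set(ys):
        subset = [x for x, y in zip(xs, ys) if y == v]
        total += (len(subset) / n) * entropy(subset)
    return total


def mutual_information(xs, ys):
    """I(X, Y) = H(X) - H(X|Y)."""
    return entropy(xs) - conditional_entropy(xs, ys)


# Toy samples (invented): ys determines xs exactly, zs is unrelated to xs.
xs = ["a", "a", "b", "b"]
ys = ["u", "u", "v", "v"]
zs = ["u", "v", "u", "v"]
print(mutual_information(xs, ys), mutual_information(xs, zs))
```

When Y fully determines X, the conditional entropy is zero and the mutual information equals H(X); when the two are unrelated in the sample, the mutual information is zero.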

Information Gain
● Information Gain is the mutual information between input attribute A
and target variable Y

● Information Gain is the expected reduction in entropy of target
variable Y for data sample S, due to sorting on variable A
Calculating Information Gain

Information Gain = 0.996 - 0.615 = 0.38
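
The arithmetic behind such a figure can be sketched in Python. The class counts below are an assumption: the slide's diagram is not in the text, so the counts (30 examples split 16/14, sent to children with 13/4 and 1/12) were chosen because they give values close to 0.996 and 0.615.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a group described by its per-class example counts."""
    total = sum(counts)
    return sum((c / total) * log2(total / c) for c in counts if c > 0)


# Assumed counts (not from the source): [class A, class B] in each group.
parent = [16, 14]
children = [[13, 4], [1, 12]]

n = sum(parent)
parent_entropy = entropy(parent)
# Weighted average of child entropies, weighted by child size.
children_entropy = sum(sum(child) / n * entropy(child) for child in children)
gain = parent_entropy - children_entropy
print(round(parent_entropy, 3), round(children_entropy, 3), round(gain, 2))
```

The gain is the parent entropy minus the size-weighted average entropy of the children, which is the same subtraction shown on the slide.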

Entropy-Based Automatic Decision Tree
Construction

● Quinlan suggested information gain in his ID3
system and later the gain ratio, both based on
entropy.
Example
Selecting the next attribute
Selecting the next attribute
Slide by Tom Mitchell
Which Tree Should We Output?
● ID3 performs heuristic search through the space of
decision trees
● It stops at the smallest acceptable tree. Why?
● Occam’s razor: prefer the simplest hypothesis that
fits the data
ID-3 Decision Tree Pseudo Code
● ID3 (Examples, Target_Attribute, Attributes)
    ● Create a root node for the tree
    ● If all examples are positive, return the single-node tree Root, with label = +
    ● If all examples are negative, return the single-node tree Root, with label = -
    ● If the set of predicting attributes is empty, then return the single-node tree Root, with label = most common value of the target attribute in the examples
    ● Otherwise begin
        ● A ← the attribute that best classifies examples
        ● Decision tree attribute for Root = A
        ● For each possible value, vi, of A:
            ● Add a new tree branch below Root, corresponding to the test A = vi
            ● Let Examples(vi) be the subset of examples that have the value vi for A
            ● If Examples(vi) is empty
                ● Then below this new branch add a leaf node with label = most common target value in the examples
                ● Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A})
    ● End
    ● Return Root
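
The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: examples are dictionaries, the tree is a nested tuple, the `domains` argument (all possible values of each attribute) is added so that empty subsets can occur, "best" means maximum information gain, and labels may be any values rather than only + and -. The toy dataset is invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(examples, target, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on attribute."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def id3(examples, target, attributes, domains):
    labels = [e[target] for e in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf cases: all examples share one label, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # A <- the attribute that best classifies the examples (max gain).
    best = max(attributes, key=lambda a: information_gain(examples, target, a))
    branches = {}
    for value in domains[best]:
        subset = [e for e in examples if e[best] == value]
        if not subset:
            # Empty subset: leaf with the most common label of the parent.
            branches[value] = majority
        else:
            remaining = [a for a in attributes if a != best]
            branches[value] = id3(subset, target, remaining, domains)
    return (best, branches)


# Toy data (invented for illustration).
examples = [
    {"outlook": "sunny", "wind": "weak", "play": "No"},
    {"outlook": "sunny", "wind": "strong", "play": "No"},
    {"outlook": "overcast", "wind": "weak", "play": "Yes"},
    {"outlook": "rain", "wind": "weak", "play": "Yes"},
    {"outlook": "rain", "wind": "strong", "play": "No"},
]
domains = {"outlook": ["sunny", "overcast", "rain"], "wind": ["weak", "strong"]}
tree = id3(examples, "play", ["outlook", "wind"], domains)
print(tree)
```

On this toy data, outlook has the larger gain and becomes the root; only the rain branch is still mixed, so it is split again on wind.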
Questions?
