Lect9 Decision Tree

The document provides an overview of decision trees for classification, detailing their structure, generation process, and application in various scenarios. It explains the phases of tree construction and pruning, the algorithm used for creating decision trees, and methods for selecting attributes based on information gain. Additionally, it discusses the importance of avoiding overfitting and enhancements to improve decision tree induction.

Uploaded by

SIDDHARTH DASH


Classification: Decision Tree

Classification by Decision Tree Induction
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
– The topmost node in the tree is the root node.
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree
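The construction phase described above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own algorithm listing: it assumes categorical attributes, picks the splitting attribute by information gain, and omits pruning. The function names and the toy rows are invented for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by partitioning the examples on attr."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a dict describing an internal node."""
    if len(set(labels)) == 1:            # all examples share one class: leaf
        return labels[0]
    if not attrs:                        # nothing left to test: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attribute": best, "branches": {}}
    for value in sorted({row[best] for row in rows}):
        # Partition the examples that reach this node by the value of `best`.
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return node

# Toy rows invented for illustration only (not data from the lecture).
rows = [
    {"Outlook": "Sunny", "Humidity": "High"},
    {"Outlook": "Sunny", "Humidity": "Normal"},
    {"Outlook": "Overcast", "Humidity": "High"},
    {"Outlook": "Overcast", "Humidity": "Normal"},
    {"Outlook": "Overcast", "Humidity": "High"},
]
labels = ["No", "Yes", "Yes", "Yes", "Yes"]

tree = build_tree(rows, labels, ["Outlook", "Humidity"])
print(tree)
```

All examples start at the root call; each recursive call receives only the examples that follow one branch, which mirrors the "partition examples recursively" step.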
Decision Tree for PlayTennis
Decision trees classify instances or examples by starting at the root of the tree
and moving through it until a leaf node is reached.

Outlook                      (1) Which node to start at? (the root)
  Sunny
    Humidity                 (2) Which node to proceed to?
      High:   No
      Normal: Yes            (3) When to stop and come to a conclusion?
  Overcast
  Rain

Each internal node tests an attribute.
Each branch corresponds to an attribute value.
Each leaf node assigns a classification.

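The three questions in the figure (where to start, which node to proceed to, when to stop) map directly onto a short traversal loop. The sketch below is an illustration under stated assumptions: the nested-dict layout and the name `classify` are invented here, and only the Sunny branch is encoded because the Overcast and Rain subtrees are not visible in the slide.

```python
# Fragment of the PlayTennis tree, as far as the slide shows it.
# The Overcast and Rain subtrees are deliberately left out rather than guessed.
PLAY_TENNIS_FRAGMENT = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {
            "attribute": "Humidity",
            "branches": {"High": "No", "Normal": "Yes"},
        },
    },
}

def classify(node, instance):
    """Walk from the root to a leaf; return (label, list of tests taken)."""
    path = []
    while isinstance(node, dict):              # (3) stop once a leaf is reached
        attribute = node["attribute"]          # (1)/(2) the attribute tested at this node
        value = instance[attribute]
        if value not in node["branches"]:
            raise KeyError(f"no branch for {attribute}={value} in this fragment")
        path.append((attribute, value))
        node = node["branches"][value]         # follow the branch for this value
    return node, path

label, path = classify(PLAY_TENNIS_FRAGMENT, {"Outlook": "Sunny", "Humidity": "High"})
print(label, path)
```

Returning the path alongside the label makes it easy to see which tests led to the classification.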
When to consider Decision Trees

• Instances describable by attribute-value pairs

• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy training data
• Missing attribute values
• Examples:
– Medical diagnosis
– Credit risk analysis
– Object classification for robot manipulator (Tan 1993)
Basketball data
What do we know?
• The game will be away, at 9pm, and Joe will play center on offense…

• A classification problem
• Generalizing the learned rule to new examples
• What you don't know, of course, is who will win this game.
• It is reasonable to assume that this future game will resemble the past
games. Note, however, that no previous game matches these specific values,
i.e., no previous game was exactly
[Where=Away, When=9pm, FredStarts=No, JoeOffense=Center,
JoeDefends=Forward, OppC=Tall].

We therefore need to generalize, using the known examples to infer the likely
outcome of this new situation. But how?
Use a Decision Tree to determine who should win the game

Because we did not indicate the outcome of this game, we call it an
"unlabeled instance"; the goal of a classifier is to find the class label for
such unlabeled instances.
An instance that also includes the outcome is called a "labeled instance";
e.g., the first row of the table corresponds to a labeled instance.

Decision Trees
In general, a decision tree is a tree structure; see the left-hand
figure below.
Example of a Decision Tree

Training Data

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree

Refund
  Yes: NO
  No: MarSt
    Single, Divorced: TaxInc
      < 80K: NO
      > 80K: YES
    Married: NO
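As a quick check that the tree above fits the ten training records, the model can be written as nested conditionals and run over the table. This is a sketch: the function name `predict_cheat` is invented for this example, and because the slide labels the income branches only "< 80K" and "> 80K", sending exactly 80K to the upper branch is an assumption made here.

```python
# Training records from the slide:
# (Tid, Refund, Marital Status, Taxable Income in thousands, Cheat)
TRAINING = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def predict_cheat(refund, marital_status, taxable_income_k):
    """Follow the Refund -> MarSt -> TaxInc tree and return the predicted class."""
    if refund == "Yes":                  # root test: Refund
        return "No"
    if marital_status == "Married":      # second test: MarSt
        return "No"
    # Single or Divorced: test taxable income.
    # Assumption: an income of exactly 80K falls on the "> 80K" side.
    return "No" if taxable_income_k < 80 else "Yes"

# Tids of training records that the tree would misclassify.
mismatches = [
    tid
    for tid, refund, status, income, cheat in TRAINING
    if predict_cheat(refund, status, income) != cheat
]
print("misclassified training records:", mismatches)
```

An empty list of mismatches means the tree is consistent with every row of the training table.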

Apply Model to Test Data
Start at the root of the tree.

Test Data

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Refund

Yes No

NO MarSt
Single, Divorced Married

TaxInc NO
< 80K > 80K