0% found this document useful (0 votes)

11 views11 pages

33 Decision Tree Notes

The document provides comprehensive notes on the Decision Tree algorithm, detailing its structure, types (classification and regression), and key concepts such as entropy, information gain, and Gini impurity. It outlines the decision tree construction algorithm, stopping criteria, advantages and disadvantages, and its role in ensemble methods like Random Forest and Boosting. Additionally, it includes implementation details in Python using scikit-learn, highlighting important classes and hyperparameters.

Uploaded by

swarup.bigshot01

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views11 pages

33 Decision Tree Notes

Uploaded by

swarup.bigshot01

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Comprehensive Notes on Decision Tree Algorithm

Based on Krish Naik Day 5 Live Session and Intellipaat Decision Tree Lectures

Introduction
Decision Trees are supervised learning models used for both classification and
regression tasks. They model decisions and their possible consequences in a
tree-like structure. Internal nodes represent tests on features, branches repre-
sent decision rules, and leaf nodes represent outcomes (class labels or predicted
values).
These notes synthesize explanations from:
• Krish Naik – Live Day 5: Ensemble Techniques (Bagging, Boosting, Ran-
dom Forest, AdaBoost, XGBoost), where Decision Trees appear as base
learners and Random Forest is explained as a bagging-based ensemble of
trees.
• Intellipaat – Decision Tree Algorithm lectures (concepts of entropy, in-
formation gain, Gini impurity, and practical implementation in Python /
R).

Basic Intuition of Decision Trees

Real-world Motivation
A decision tree mimics human decision-making: at each step you ask a question
about the input features and branch according to the answer until you reach a
decision.
Examples:
• Telecom Churn: Predict whether a customer will churn based on fea-
tures like gender, tenure, and monthly charges.
• Restaurant Dessert Prediction: Predict whether a customer will order
dessert based on age and whether they ordered a main course.
• Play Tennis: Predict whether a person will play tennis based on weather
conditions like outlook, temperature, humidity, and wind.

1
In each case, the decision tree splits the feature space into regions where the
target is as homogeneous as possible.

Terminology
• Root Node: The top-most node of the tree. It represents the first feature
and condition used to split the dataset. Choosing this node is a key
optimization problem (based on Information Gain or Gini decrease).
• Internal / Decision Node: A node that splits into further branches
based on a test on a feature (e.g. Price < 90).
• Leaf / Terminal Node: A node with no further children. It represents
a final prediction: a class label (for classification) or a real value (for
regression).
• Branch: The edge connecting two nodes, representing the outcome of a
test (e.g. Yes, No, or an interval condition like x < 5).
• Subtree: Any node together with all its descendants can be viewed as a
subtree.
• Depth of Tree: Length of the longest path from the root node to any
leaf node.

Types of Decision Trees

Classification Trees
Used when the target variable is categorical (e.g. churn vs no-churn, play vs
not-play, drug A/B/C). The objective is to maximize the purity of class labels
in each leaf.
The most common impurity measures for classification are:
• Entropy
• Gini impurity

Regression Trees
Used when the target variable is continuous (e.g., price, sales, salary, house
value). Instead of measuring class impurity, we typically minimize variance or
squared error in each leaf.
Common criteria for regression trees:
• Mean Squared Error (MSE)
• Mean Absolute Error (MAE)

2
Core Idea: Recursive Partitioning of Feature
Space
The decision tree recursively partitions the feature space:
1. Start with the entire training dataset at the root node.
2. At each node, consider a set of possible splits of the data using feature
thresholds.
3. Choose the split that most reduces impurity (for classification) or error
(for regression).
4. Recurse on each child node until a stopping criterion is met (e.g., maxi-
mum depth, minimum samples per leaf, or pure node).
Each path from root to leaf corresponds to a conjunction of conditions and thus
defines a decision rule (e.g., Outlook = Sunny & Humidity = High & Windy
= False ⇒ Don’t Play).

Mathematical Foundations: Entropy, Informa-

tion Gain, and Gini Impurity
Entropy
Entropy measures the amount of disorder or uncertainty in a set of labels. For
a node containing examples of 𝐾 classes, with class probabilities 𝑝1 , 𝑝2 , … , 𝑝𝐾 ,
the entropy is:
𝐾
𝐻(𝑆) = − ∑ 𝑝𝑖 log2 𝑝𝑖 .
𝑖=1

Properties:
• 𝐻(𝑆) = 0 when all samples belong to a single class (node is pure).
• 𝐻(𝑆) is maximized when all classes are equally likely, i.e. 𝑝𝑖 = 1/𝐾 for
all 𝑖.

Information Gain
Information Gain (IG) measures how much a feature 𝐹 reduces entropy when
we split the node 𝑆 based on 𝐹 .
Let the original node be 𝑆 with entropy 𝐻(𝑆). Suppose feature 𝐹 splits 𝑆 into
subsets 𝑆1 , 𝑆2 , … , 𝑆𝑚 (for example, for Outlook we might have Sunny, Overcast,
and Rainy). Then the Information Gain of splitting on feature 𝐹 is:
𝑚
|𝑆𝑗 |
Gain(𝑆, 𝐹 ) = 𝐻(𝑆) − ∑ 𝐻(𝑆𝑗 ).
𝑗=1
|𝑆|

3
Interpretation:
• 𝐻(𝑆): entropy before the split (parent node).
|𝑆𝑗 |
• ∑𝑗 𝐻(𝑆𝑗 ): weighted average entropy after the split (children nodes).
|𝑆|
• Higher Gain(𝑆, 𝐹 ) means greater reduction in uncertainty, so 𝐹 is a better
splitting feature at node 𝑆.

Example: Play Tennis Dataset

Consider the classical tennis dataset with target Play having values Yes or No.
Suppose at the root we have 9 Yes and 5 No out of 14 samples.
Entropy at root:
9 9 5 5
𝐻(𝑆) = − log2 − log2 ≈ 0.94.
14 14 14 14

If we split by Outlook with categories Sunny, Overcast, and Rain, we com-

pute entropy for each subset and then compute Information Gain. The feature
(Outlook, Temperature, Humidity, Wind) with the highest gain becomes the
root node.

Gini Impurity
Gini impurity is an alternative measure of impurity used primarily in the CART
(Classification and Regression Trees) algorithm. For a node with class probabil-
ities 𝑝1 , … , 𝑝𝐾 :
𝐾
𝐺(𝑆) = 1 − ∑ 𝑝𝑖2 .
𝑖=1

Properties:
• 𝐺(𝑆) = 0 when the node is pure (only one class present).
• Gini impurity is maximized when the classes are equally likely.
• The range of Gini impurity is [0, 1 − 1/𝐾].
Splitting criterion using Gini:
𝑚
|𝑆𝑗 |
Δ𝐺(𝑆, 𝐹 ) = 𝐺(𝑆) − ∑ 𝐺(𝑆𝑗 ).
𝑗=1
|𝑆|

The feature and threshold that maximizes Δ𝐺 is chosen at each split.

4
Entropy vs. Gini Impurity
• Both are measures of node impurity; in practice they often lead to very
similar trees.
• Entropy involves 𝑝 log 𝑝 and has an information-theoretic interpretation.
• Gini is computationally simpler (no logarithms) and is the default in many
libraries (e.g. scikit-learn).

Decision Tree Construction Algorithm

The high-level algorithm (Top-Down Induction of Decision Trees) is as follows:
1. Input: Training set 𝐷 with features 𝑋1 , … , 𝑋𝑑 and target 𝑌 .
2. If all examples in 𝐷 belong to the same class 𝑐, create a leaf node labeled
𝑐 and stop.
3. If 𝐷 is empty or no features remain, create a leaf node labeled with the
majority class in the parent node and stop.
4. Otherwise:
(a) For each candidate feature 𝐹 (and possible split thresholds, if nu-
meric), compute the impurity reduction (Information Gain or Gini
decrease).
(b) Select feature 𝐹 ∗ and threshold 𝑡∗ that maximize impurity reduction.
(c) Create a decision node that splits on 𝐹 ∗ (e.g., 𝐹 ∗ ≤ 𝑡∗ vs 𝐹 ∗ > 𝑡∗ or
categorical branches).
(d) Recurse on each child subset 𝐷𝑗 to build subtrees.
This greedy algorithm does not guarantee a globally optimal tree, but in practice
works well.

Root Node Selection

From the Intellipaat explanation, the key interview-ready statement is:
“In decision trees, the root node is selected based on the concept
of entropy or Information Gain (or equivalently, by minimizing Gini
impurity). The feature with the highest Information Gain (or highest
Gini decrease) is chosen as the root.”
Formally:
• Compute entropy 𝐻(𝑆) at the root using the target distribution.
• For each feature 𝐹 , compute Gain(𝑆, 𝐹 ).

5
• Choose 𝐹 ∗ = arg max𝐹 Gain(𝑆, 𝐹 ).
Subsequent internal nodes are chosen similarly but on the respective subsets.

Stopping Criteria and Pruning

Stopping Criteria (Pre-pruning)
To prevent the tree from growing too deep (and overfitting), we may stop split-
ting when:
• Node depth reaches a maximum (max_depth).
• Number of samples at a node is below a threshold (min_samples_split
or min_samples_leaf).
• Impurity decrease from any split is below a minimum threshold.
• Node is already suﬀiciently pure (e.g., above a purity threshold).

Post-pruning (Cost-Complexity Pruning)

Krish Naik’s R example demonstrates post-pruning:
• First, grow a full tree (possibly overfitted) on the training data.
• Use cross-validation to compute misclassification error for different subtree
sizes.
• Plot tree size vs. error and identify the subtree size with minimal error
(or where error stops decreasing significantly).
• Prune back the full tree to this optimal size (using functions like
[Link] in R or cost-complexity pruning in CART).
Conceptually, cost-complexity pruning minimizes:

𝑅𝛼 (𝑇 ) = 𝑅(𝑇 ) + 𝛼|𝑇 |,

where 𝑅(𝑇 ) is the empirical error of tree 𝑇 , |𝑇 | is the number of leaf nodes, and
𝛼 ≥ 0 is a complexity penalty. Increasing 𝛼 prefers smaller trees.

Decision Trees for Regression

For regression, the idea is similar but with continuous targets:
• At each node, choose the split that minimizes the sum of squared errors
(SSE) or variance in the child nodes.
• The prediction at a leaf is typically the mean (or median) of target values
in that leaf.

6
Split quality (using MSE) can be defined as:

|𝑆𝑗 |
ΔMSE(𝑆, 𝐹 ) = Var(𝑆) − ∑ Var(𝑆𝑗 ).
𝑗
|𝑆|

Advantages and Disadvantages

Advantages
• Interpretability: Easy to visualize and interpret; paths from root to leaf
correspond to explicit decision rules.
• Handles Mixed Data Types: Can handle both numerical and categor-
ical features without much preprocessing.
• Non-linear Decision Boundaries: Can capture complex, non-linear
relationships by hierarchical splitting.
• Little Data Preparation: Requires less feature scaling or normalization
compared to models like SVM or logistic regression.
• Versatility: Works for both classification and regression tasks (Decision-
TreeClassifier and DecisionTreeRegressor).

Disadvantages
• Overfitting: Unpruned trees tend to overfit training data, learning noise
and leading to poor generalization.
• Instability: Small changes in data can lead to very different tree struc-
tures (high variance model).
• Bias towards Features with More Levels: With some criteria, fea-
tures with many distinct values may be favored, which can lead to subop-
timal splits.
• Limited Expressiveness for Very Complex Problems: A single tree
may not perform well on highly complex tasks or high-dimensional contin-
uous spaces; ensembles like Random Forest and Gradient Boosted Trees
perform better.
Most disadvantages are mitigated by ensemble methods, where trees are used
as base learners (weak learners) and combined into a stronger model.

Decision Trees in Ensemble Methods

From Krish Naik’s Day 5 session:

7
Bagging and Random Forest
• Bagging (Bootstrap Aggregation):
– Draw multiple bootstrap samples (with replacement) from the train-
ing data.
– Train a separate model (e.g., decision tree) on each bootstrap sample
in parallel.
– For classification: combine predictions by majority voting.
– For regression: combine predictions by averaging.
• Random Forest:
– A bagging method where each tree is trained on a bootstrap sample.
– At each split, only a random subset of features is considered, intro-
ducing additional randomness.
– Reduces correlation between trees and often improves generalization
compared to plain bagging.
In Random Forest, decision trees are used as base learners and are typically
grown deep without pruning (high variance individually, but variance is reduced
by averaging).

Boosting (AdaBoost, Gradient Boosting, XGBoost)

• Boosting: Sequentially combine many weak learners (often shallow deci-
sion trees called decision stumps) into a strong learner.
• AdaBoost:
– Start with uniform weights on training samples.
– Train a weak learner (e.g., depth-1 tree) and compute its error.
– Increase weights of misclassified points and train the next weak
learner focused on those hard examples.
– Final prediction is a weighted sum (or vote) of all weak learners.
• Gradient Boosting / XGBoost:
– Fit successive trees to the residual errors (negative gradients) of pre-
vious trees.
– Optimizes a differentiable loss function using gradient descent in func-
tion space.

8
Implementation Details in Python (scikit-learn)
Key Classes
• [Link]
• [Link]

Important Hyperparameters
For DecisionTreeClassifier:
• criterion: "gini" (default) or "entropy" (or "log_loss" in newer ver-
sions).
• max_depth: Maximum depth of the tree.
• min_samples_split: Minimum number of samples required to split an
internal node.
• min_samples_leaf: Minimum samples required at a leaf node.
• max_features: Number of features to consider when looking for the best
split.
• random_state: Seed for reproducibility.

Typical Workflow (Classification)

1. Load data using pandas.
2. Handle missing values, duplicates, and encode categorical variables (e.g.,
using LabelEncoder or one-hot encoding).
3. Split into features 𝑋 and target 𝑦.
4. Perform train–test split: train_test_split(X, y, test_size=0.2,
random_state=...).
5. Initialize classifier, e.g.:
from [Link] import DecisionTreeClassifier
clf = DecisionTreeClassifier(criterion='gini', max_depth=None,
random_state=0)
6. Fit model: [Link](X_train, y_train).
7. Predict on test data: y_pred = [Link](X_test).
8. Evaluate performance using accuracy, confusion matrix, F1-score, etc.
9. Optionally visualize the tree using [Link].plot_tree or export to
Graphviz.

9
Example Sketch
import pandas as pd
from sklearn.model_selection import train_test_split
from [Link] import DecisionTreeClassifier, plot_tree
from [Link] import accuracy_score

# 1. Load data
df = pd.read_csv('drug_data.csv') # example from Intellipaat

# 2. Preprocess (encode categorical features, handle missing values)

# ... (LabelEncoder or OneHotEncoder for 'Sex', 'BP', 'Cholesterol', 'Drug')

X = [Link]('Drug', axis=1)
y = df['Drug']

# 3. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=0
)

# 4. Create and train classifier

clf = DecisionTreeClassifier(criterion='gini', random_state=0)
[Link](X_train, y_train)

# 5. Predict and evaluate

y_pred = [Link](X_test)
print('Accuracy:', accuracy_score(y_test, y_pred))

# 6. Visualize the tree

import [Link] as plt
[Link](figsize=(12, 8))
plot_tree(clf, filled=True, feature_names=[Link], class_names=clf.classes_)
[Link]()

Common Interview Points

• How is the root node selected in a decision tree?
– By choosing the feature (and threshold) that maximizes Information
Gain (entropy-based) or Gini impurity decrease at the root.
• Difference between entropy and Gini impurity?
– Both measure impurity; entropy comes from information theory, Gini
is simpler computationally. Both often lead to similar splits.

10
• Why do decision trees overfit?
– They can grow deep and memorize training data, capturing noise and
variance. Unpruned trees have low bias but very high variance.
• How to prevent overfitting?
– Pre-pruning (max_depth, min_samples_leaf, impurity thresholds)
and post-pruning (cost-complexity pruning). Also, use ensembles
(Random Forest, Gradient Boosting).
• Can decision trees handle both regression and classification?
– Yes. For regression, the split criterion is based on variance/MSE; leaf
prediction is often the mean of targets.
• What are weak learners in boosting?
– Typically shallow decision trees (e.g. depth-1 or depth-2) that indi-
vidually have performance slightly better than random but combine
to form a strong learner.

Summary
Decision Trees are fundamental models in machine learning, both as standalone
interpretable models and as building blocks of powerful ensembles like Ran-
dom Forests and Gradient Boosted Trees. Understanding entropy, Information
Gain, Gini impurity, and pruning is essential not only for theoretical mastery
but also for practical tuning and interpretation. These notes consolidate con-
ceptual explanations, mathematical foundations, and implementation patterns
as presented in Krish Naik’s ensemble-learning session (where trees act as base
learners) and Intellipaat’s detailed decision tree tutorials.

Decision Tree Guide
No ratings yet
Decision Tree Guide
20 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
40 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
42 pages
Understanding Decision Tree Algorithms
No ratings yet
Understanding Decision Tree Algorithms
21 pages
Machine Learning: Decision Tree Overview
No ratings yet
Machine Learning: Decision Tree Overview
24 pages
Understanding Decision Trees in Data Science
No ratings yet
Understanding Decision Trees in Data Science
61 pages
Understanding Confusion Matrix & KNN
No ratings yet
Understanding Confusion Matrix & KNN
31 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
4 pages
Data Mining Course Material
No ratings yet
Data Mining Course Material
36 pages
Aiml
No ratings yet
Aiml
16 pages
Understanding Decision Trees in AI
No ratings yet
Understanding Decision Trees in AI
37 pages
Machine Learning Unit III Notes
No ratings yet
Machine Learning Unit III Notes
46 pages
Understanding Decision Trees in Machine Learning
No ratings yet
Understanding Decision Trees in Machine Learning
39 pages
Understanding Decision Trees in Classification
No ratings yet
Understanding Decision Trees in Classification
5 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
8 pages
Decision Tree Algorithm in Machine Learning
No ratings yet
Decision Tree Algorithm in Machine Learning
26 pages
Decision Tree Classification Explained
No ratings yet
Decision Tree Classification Explained
15 pages
Decision Tree Classifier Tutorial With Python
No ratings yet
Decision Tree Classifier Tutorial With Python
16 pages
Decision Trees and Probabilistic Models
No ratings yet
Decision Trees and Probabilistic Models
25 pages
Nonlinear Models in Supervised Learning
No ratings yet
Nonlinear Models in Supervised Learning
30 pages
Tree-Based Models - Decision Trees. Random Forests
No ratings yet
Tree-Based Models - Decision Trees. Random Forests
73 pages
L7 Supervised Learning Decision Tree
No ratings yet
L7 Supervised Learning Decision Tree
27 pages
A Decision Tree Is A Supervised Learning Algorithm
No ratings yet
A Decision Tree Is A Supervised Learning Algorithm
9 pages
Understanding Decision Tree Classification
No ratings yet
Understanding Decision Tree Classification
25 pages
Understanding Decision Trees and Algorithms
No ratings yet
Understanding Decision Trees and Algorithms
34 pages
Understanding Decision Tree Algorithms
No ratings yet
Understanding Decision Tree Algorithms
5 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
31 pages
Decision Tree Complexity and Splits
No ratings yet
Decision Tree Complexity and Splits
18 pages
Understanding Decision Trees in AI
No ratings yet
Understanding Decision Trees in AI
13 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
11 pages
Decision Tree Algorithm Overview
No ratings yet
Decision Tree Algorithm Overview
63 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
10 pages
Lecture 9 - Decision Tree
No ratings yet
Lecture 9 - Decision Tree
17 pages
Decision Tree Dr. Iman Sayed
No ratings yet
Decision Tree Dr. Iman Sayed
22 pages
K-Fold CV: Pros and Cons Explained
No ratings yet
K-Fold CV: Pros and Cons Explained
30 pages
ML Unit 3
No ratings yet
ML Unit 3
35 pages
Decision Tree Algorithm Explained
No ratings yet
Decision Tree Algorithm Explained
14 pages
Understanding Decision Trees in Data Analysis
No ratings yet
Understanding Decision Trees in Data Analysis
9 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
40 pages
Supervised Learning: Nonlinear Models Overview
No ratings yet
Supervised Learning: Nonlinear Models Overview
30 pages
Overview of Decision Tree Algorithms
No ratings yet
Overview of Decision Tree Algorithms
19 pages
Decision Trees in Machine Learning Explained
No ratings yet
Decision Trees in Machine Learning Explained
13 pages
Understanding Decision Trees in Machine Learning
No ratings yet
Understanding Decision Trees in Machine Learning
9 pages
Decision Tree Algorithm Overview
No ratings yet
Decision Tree Algorithm Overview
17 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
14 pages
1.decision Trees Concepts
No ratings yet
1.decision Trees Concepts
70 pages
Decision Tree
No ratings yet
Decision Tree
15 pages
DTree Notes English
No ratings yet
DTree Notes English
20 pages
Supervised Learning: Nonlinear Models
No ratings yet
Supervised Learning: Nonlinear Models
40 pages
Python Decision Tree Pruning Guide
No ratings yet
Python Decision Tree Pruning Guide
5 pages
Unit 4 DWM
No ratings yet
Unit 4 DWM
31 pages
DTSC Sem4 Eng
No ratings yet
DTSC Sem4 Eng
100 pages
Sec 6 - Decision Trees
No ratings yet
Sec 6 - Decision Trees
26 pages
Data Mining: Classifiers and Decision Trees
No ratings yet
Data Mining: Classifiers and Decision Trees
83 pages
Decision Tree Learning and Prediction Techniques
No ratings yet
Decision Tree Learning and Prediction Techniques
19 pages
Decision Tree Uodated Final
No ratings yet
Decision Tree Uodated Final
28 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
15 pages
Decision Tree
No ratings yet
Decision Tree
10 pages
Unit IIDecision Tree
No ratings yet
Unit IIDecision Tree
30 pages
Probability Theory and Statistics Course
No ratings yet
Probability Theory and Statistics Course
3 pages
Type II Error Probability Calculation Guide
No ratings yet
Type II Error Probability Calculation Guide
4 pages
Autocorrelation in Linear Regression
No ratings yet
Autocorrelation in Linear Regression
18 pages
Ithaca Rainfall Analysis and Statistics
No ratings yet
Ithaca Rainfall Analysis and Statistics
14 pages
Research Data Presentation Techniques
No ratings yet
Research Data Presentation Techniques
34 pages
TMDB Box Office Revenue Prediction
No ratings yet
TMDB Box Office Revenue Prediction
7 pages
Normal and Binomial Questions
No ratings yet
Normal and Binomial Questions
9 pages
Business Statistics Assignment Overview
No ratings yet
Business Statistics Assignment Overview
5 pages
Box and Whisker Plot Template
No ratings yet
Box and Whisker Plot Template
10 pages
How to Analyze Correlation Effectively
No ratings yet
How to Analyze Correlation Effectively
2 pages
Cox PH Model for Time-Dependent Variables
No ratings yet
Cox PH Model for Time-Dependent Variables
31 pages
Understanding Binomial and Normal Distributions
No ratings yet
Understanding Binomial and Normal Distributions
8 pages
Statistical Process Control Overview
No ratings yet
Statistical Process Control Overview
0 pages
Variogram Model Types and Fitting
No ratings yet
Variogram Model Types and Fitting
17 pages
Unit 8: Probability & Statistics Assessment
0% (1)
Unit 8: Probability & Statistics Assessment
5 pages
MCA Projects in Mobile App Development & AI
No ratings yet
MCA Projects in Mobile App Development & AI
5 pages
Cost of Living and Price Index Analysis
No ratings yet
Cost of Living and Price Index Analysis
98 pages
Standard Deviation and Variance Explained
No ratings yet
Standard Deviation and Variance Explained
9 pages
Edexcel AS Level Statistics Exam Paper
No ratings yet
Edexcel AS Level Statistics Exam Paper
12 pages
WTP Analysis for PDAM Tirta Raharja
No ratings yet
WTP Analysis for PDAM Tirta Raharja
12 pages
Stationary vs Nonstationary Time Series
No ratings yet
Stationary vs Nonstationary Time Series
6 pages
Heteroskedasticity: (Chapter 10.1 - 10.4, 10.7)
No ratings yet
Heteroskedasticity: (Chapter 10.1 - 10.4, 10.7)
83 pages
KDS Lab: Data Analysis Techniques
No ratings yet
KDS Lab: Data Analysis Techniques
1 page
Central Limit Theorem Explained
No ratings yet
Central Limit Theorem Explained
22 pages
Hypothesis Testing for Population Mean
No ratings yet
Hypothesis Testing for Population Mean
12 pages
ASTSA Package for Time Series Analysis
No ratings yet
ASTSA Package for Time Series Analysis
55 pages
Measures of Central Tendency Answers
100% (2)
Measures of Central Tendency Answers
11 pages
Missing Data DAGS R448-Reprint
No ratings yet
Missing Data DAGS R448-Reprint
12 pages
2015-2016 ECON1003 Practice Exam
No ratings yet
2015-2016 ECON1003 Practice Exam
7 pages

33 Decision Tree Notes

Uploaded by

33 Decision Tree Notes

Uploaded by

Comprehensive Notes on Decision Tree Algorithm

Basic Intuition of Decision Trees

Types of Decision Trees

Mathematical Foundations: Entropy, Informa-

Example: Play Tennis Dataset

If we split by Outlook with categories Sunny, Overcast, and Rain, we com-

The feature and threshold that maximizes Δ𝐺 is chosen at each split.

Decision Tree Construction Algorithm

Root Node Selection

Stopping Criteria and Pruning

Post-pruning (Cost-Complexity Pruning)

Decision Trees for Regression

Advantages and Disadvantages

Decision Trees in Ensemble Methods

Boosting (AdaBoost, Gradient Boosting, XGBoost)

Typical Workflow (Classification)

# 2. Preprocess (encode categorical features, handle missing values)

# 4. Create and train classifier

# 5. Predict and evaluate

# 6. Visualize the tree

Common Interview Points

You might also like