0% found this document useful (0 votes)

10 views60 pages

Week 10

The document discusses ensemble methods in machine learning, focusing on decision trees and random forests. It explains the concepts of bagging and boosting techniques, their advantages and disadvantages, and how they improve predictive performance. Additionally, it covers the structure and building of decision trees, including algorithms like ID3 and CART, along with their applications in classification tasks.

Uploaded by

smishra8094

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

10 views60 pages

Week 10

Uploaded by

smishra8094

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Ensemble Methods, Decision Tree

and Random forest

Prof. Gajendra P.S. Raghava

Head, Center for Computational Biology

Web Site: [Link]

These slides were created with using various resources so

no claim of authorship on any slide
Ensemble learning
Ensembles of Classifiers

 Idea
 Combine the classifiers to improve the performance
 Ensembles of Classifiers
 Combine the classification results from different
classifiers to produce the final output
 Unweighted voting
 Weighted voting
Build Ensemble Classifiers
• Basic idea:
Build different “experts”, and let them vote
• Advantages:
Improve predictive performance
Other types of classifiers can be directly included
Easy to implement
No too much parameter tuning
• Disadvantages:
The combined classifier is not so transparent (black box)
Not a compact representation
Ensemble-based methods
 Classifier combination/Aggregator is a general concept
 Creation of large set of classifiers, here called the “base
learners”
 Building large number of base classifiers
 These sets and classifies should be diverse
 Aim to maximize accuracy using combination
 These methods will be:
 Bagging techniques (Random Forest)
 Boosting techniques (AdaBoost)
Ensemble Techniques
Bagging Techniques Boosting Techniques

Bootstrap Aggregation • AdaBoost (Adaptive Boosting)

• Random Forest • Gradient Boosting Machines
• Extra-Tree • XGBoost (Extreme Gradient Boosting)
(Extremely Randomized Trees) • CatBoost (Category Boosting)
• LightGBM

Bagging techniques are a type of ensemble learning method that involves training multiple models in
parallel and then combining their predictions to improve the overall performance and robustness of the
predictive model.
Boosting is an ensemble learning technique that works by sequentially training a series of weak models,
each trying to correct the errors made by its predecessors.
Bias/Variance Tradeoff
 Minimize variance Low Variance High Variance
– Bagging .

– Random Forests Low Bias ... . . . .

• Minimize bias
.
– Functional Gradient Descent .
High Bias .... .
– Boosting .
– Ensemble Selection
Bias: Measure closeness between predicted and actual value, higher difference higher
biasness.
Variance: Predicted values are scattered or dense; variance in predicted values
Bagging
 Bagging = Bootstrap + aggregating
 It uses bootstrap resampling to generate L different training sets
 On the L training sets it trains L base learners
 During testing it aggregates the L learners by taking their average
 In case of classification, it uses majority voting
 The diversity or complementarity of the base learners is not controlled
 The ensemble model is almost always better than the unique base learners
if the base learners are unstable (which means that a small change in the
training dataset may cause a large change in the result of the training)
Bootstrap resampling
 Suppose we have a training set with n samples
 Create L different training sets
 Bootstrap resampling takes random samples
 Randomness is required to obtain different sets for L rounds
 Allowing replacement is required
 As the L training sets are different, the result of the training over
these set will be different
 Works better with unstable learners (e.g. neural nets, decision
trees)
 Not really effective with stable learners (e.g. k-NN, SVM)
Aggregation methods
 There are several methods to combine (aggregate) the outputs of the various classifiers
 When the output is a class label:
 Majority voting
 Weighted majority voting (e.g we can weight each classifier
by ts reliability (which also has to be estimated somehow, of course…)
 When the output is numeric (e.g. a probability
estimate for each class ci):
 We can combine the dj scores by taking their (weighted)
mean, product, minimum, maximum, …
 Stacking
 Instead of using the above simple aggregation rules, we can
train yet another classifier on the output values of the base
classifiers
Decision Tree for PlayTennis

Outlook

Sunny Overcast Rain

Humidity Each internal node tests an attribute

High Normal Each branch corresponds to an

attribute value node
No Yes Each leaf node assigns a classification
16
When to consider Decision Trees
 Instances describable by attribute-value pairs
 Target function is discrete valued
 Disjunctive hypothesis may be required
 Possibly noisy training data
 Missing attribute values
 Examples:
 Medical diagnosis
 Credit risk analysis
 Object classification for robot manipulator (Tan 1993) 17
Converting a Tree to Rules
Outlook

Sunny Overcast Rain

Humidity Yes Wind

High Normal Strong Weak

No Yes No Yes

R1: If (Outlook=Sunny)  (Humidity=High) Then PlayTennis=No

R2: If (Outlook=Sunny)  (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain)  (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain)  (Wind=Weak) Then PlayTennis=Yes 18
Basic Concept
 A Decision Tree is an important data structure known to
solve many computational problems
Binary Decision Tree
A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7

19
Basic Concept
 In last example, we have considered a decision tree where
values of any attribute if binary only. Decision tree is also
possible where attributes are of continuous data type
Decision Tree with numeric data

20
Some Characteristics
 Decision tree may be n-array, n ≥ 2.

 There is a special node called root node.

 All nodes drawn with circle (ellipse) are called internal nodes.
 All nodes drawn with rectangle boxes are called terminal nodes or leaf nodes.
 Edges of a node represent the outcome for a value of the node.
 In a path, a node with same label is never repeated.
 Decision tree is not unique, as different ordering of internal nodes can give
different decision tree.

21
Decision Tree and Classification Task
 Decision tree helps us to classify data.
 Internal nodes are some attribute

 Edges are the values of attributes

 External nodes are the outcome of classification

 Such a classification is, in fact, made by posing questions starting from the root
node to each terminal node.

22
Decision Tree and Classification Task
Vertebrate Classification
Name Body Temperature Skin Cover Gives Birth Aquatic Aerial Has Legs Hibernates Class
Creature Creature

Human Warm hair yes no no yes no Mammal

Python Cold scales no no no no yes Reptile
Salmon Cold scales no yes no no no Fish
Whale Warm hair yes yes no no no Mammal
Frog Cold none no semi no yes yes Amphibian
Komodo Cold scales no no no yes no Reptile
Bat Warm hair yes no yes yes yes Mammal
Pigeon Warm feathers no no yes yes no Bird
Cat Warm fur yes no no yes no Mammal
Leopard Cold scales yes yes no no no Fish
Turtle Cold scales no semi no yes no Reptile
Penguin Warm feathers no semi no yes no Bird
Porcupine Warm quills yes no no yes yes Mammal
Eel Cold scales no yes no no no Fish
Salamander Cold none no semi no yes yes Amphibian

23
What are the class label of Dragon and Shark?
Decision Tree and Classification Task
Vertebrate Classification
 Suppose, a new species is discovered as follows.
Name Body Skin Gives Aquatic Aerial Has Hibernates Class
Temperature Cover Birth Creature Creature Legs

Gila Monster cold scale no no no yes yes

?
 Decision Tree that can be inducted based on the data is as follows.

24
Building Decision Tree
 Many decision tree can be constructed from a dataset
 Some of the tree may not be optimum
 Some of them may give inaccurate result
 Two approaches are known
 Greedy strategy
 A top-down recursive divide-and-conquer
 Modification of greedy strategy
 ID3
 C4.5
 CART, etc.
Unsupervised decision trees are structurally similar to hierarchical clustering
25
Built Decision Tree Algorithm
 Algorithm BuiltDT
 Input: D : Training data set
 Output: T : Decision tree
Steps
1. If all tuples in D belongs to the same class Cj
Add a leaf node labeled as Cj
Return // Termination condition
2. Select an attribute Ai (so that it is not selected twice in the same branch)
3. Partition D = { D1, D2, …, Dp} based on p different values of Ai in D
4. For each Dk ϵ D
Create a node and add an edge between D and Dk with label as the Ai’s attribute value in Dk
5. For each Dk ϵ D
BuildTD(Dk) // Recursive call
6. Stop 26
Node Splitting in BuildDT Algorithm
 BuildDT algorithm must provides a method for expressing an attribute test condition and
corresponding outcome for different attribute type

 Case: Binary attribute

 This is the simplest case of node splitting
 The test condition for a binary attribute generates only two outcomes

27
Node Splitting in BuildDT Algorithm
 Case: Nominal attribute
 Since a nominal attribute can have many values, its test condition can be expressed
in two ways:
 A multi-way split
 A binary split

 Muti-way split: Outcome depends on the number of distinct values for the
corresponding attribute

 Binary splitting by grouping attribute values

CS 40003: Data Analytics 28

Node Splitting in BuildDT Algorithm
 Case: Ordinal attribute
 It also can be expressed in two ways:
 A multi-way split
 A binary split

 Muti-way split: It is same as in the case of nominal attribute

 Binary splitting attribute values should be grouped maintaining the order property of the attribute
values

29
Node Splitting in BuildDT Algorithm
 Case: Numerical attribute
 For numeric attribute (with discrete or continuous values), a test condition can be expressed
as a comparison set

 Binary outcome: A >v or A ≤ v

 In this case, decision tree induction must consider all possible split positions

 Range query : vi ≤ A < vi+1 for i = 1, 2, …, q (if q number of ranges are chosen)

 Here, q should be decided a priori

 For a numeric attribute, decision tree induction is a combinatorial optimization problem

30
Illustration : BuildDT Algorithm
Person Gender Height Class Attributes:
1 F 1.6 S Gender = {Male(M), Female (F)} // Binary attribute
2 M 2.0 M Height = {1.5, …, 2.5} // Continuous
3 F 1.9 M attribute
4 F 1.88 M
5 F 1.7 S Class = {Short (S), Medium (M), Tall (T)}
6 M 1.85 M
7 F 1.6 S
8 M 1.7 S
9 M 2.2 T Given a person, we are to test in which class s/he
10 M 2.1 T belongs
11 F 1.8 M
12 M 1.95 M
13 F 1.9 M
14 F 1.8 M
15 F 1.75 S
31
Illustration : BuildDT Algorithm
 To built a decision tree, we can select an attribute in two different orderings:
<Gender, Height> or <Height, Gender>

 Further, for each ordering, we can choose different ways of splitting

 Different instances are shown in the following.

 Approach 1 : <Gender, Height>

CS 40003: Data Analytics 32

Illustration : BuildDT Algorithm

33
Illustration : BuildDT Algorithm
 Approach 2 : <Height, Gender>

34
Concept of Entropy

More ordered Less ordered

less entropy higher entropy
More organized or Less organized or
ordered (less probable) disordered (more probable)

35
Information Gain

Examples (Gini Index) Examples (Entropy)

Let a bag have 4 red balls Let a bag have 4 red balls
Gini Index (Impurity) = 1 – 1 = 0 Entropy = -1 log2 (1) = 0
Purity is 100% Let a bag have 2 red and 2 blue balls
Let a bag have 2 red and 2 blue balls Entropy = -1/2 log2 (1/2) - 1/2 log2 (1/2)
Gini Index = 1 – (1/4 + 1/4) = 0.5 =1
Let a bag have 3 red and 1 blue balls Let a bag have 3 red and 1 blue balls
Gini Index = 1 – (1/16 + 9/16) = 6/16 Entropy = 0.811
ID3: Decision Tree Induction Algorithms

 Iterative Dichotomizer 3 for decision trees,

 In ID3, each node corresponds to a splitting attribute and each arc is a possible value of that
attribute.

 At each node, the splitting attribute is selected to be the most informative

 In ID3, entropy is used to measure how informative a node is.
 It is observed that splitting on any attribute has the property that average entropy of the resulting training subsets
will be less than or equal to that of the previous training set.

 ID3 algorithm defines a measurement of a splitting called Information Gain to determine the goodness of a
split.
 The attribute with the largest value of information gain is chosen as the splitting attribute and

 it partitions into a number of smaller training sets based on the distinct values of attribute under split.

37
CART Algorithm
 It is observed that information gain measure used in ID3 is biased towards test with many
outcomes, that is, it prefers to select attributes having a large number of values.

 L. Breiman, J. Friedman, R. Olshen and C. Stone in 1984 proposed an algorithm to build a

binary decision tree also called CART decision tree.
 CART stands for Classification and Regression Tree
 In fact, invented independently at the same time as ID3 (1984).
 ID3 and CART are two cornerstone algorithms spawned a flurry of work on decision tree induction.

 CART is a technique that generates a binary decision tree; That is, unlike ID3, in CART, for
each node only two children is created.

 ID3 uses Information gain as a measure to select the best attribute to be splitted, whereas CART
do the same but using another measurement called Gini index . It is also known as Gini Index
of Diversity and is denote as 𝛾.

38
Gini Index of Diversity
Gini Index

Suppose, D is a training set with size |D| and 𝐶 = 𝑐1, 𝑐2, … , 𝑐𝑘 be the set of k classifications and 𝐴 =
𝑎1 , 𝑎2 , … , 𝑎𝑚 be any attribute with m different values of it. Like entropy measure in ID3, CART proposes
Gini Index (denoted by G) as the measure of impurity of D. It can be defined as follows.
𝑘
𝐺 𝐷 = 1 − ෍ 𝑝𝑖2
𝑖=1
where 𝑝𝑖 is the probability that a tuple in D belongs to class 𝑐𝑖 and 𝑝𝑖 can be estimated as
|𝐶𝑖 , 𝐷|
𝑝𝑖 =
𝐷
where |𝐶𝑖 , 𝐷| denotes the number of tuples in D with class 𝑐𝑖 .

39
Algorithm C 4.5
 J. Ross Quinlan, a researcher in machine learning developed a decision tree induction algorithm
in 1984 known as ID3 (Iterative Dichotometer 3).

 Quinlan later presented C4.5, a successor of ID3, addressing some limitations in ID3.

 ID3 uses information gain measure, which is, in fact biased towards splitting attribute having a
large number of outcomes.

 For example, if an attribute has distinct values for all tuples, then it would result in a large
number of partitions, each one containing just one tuple.
 In such a case, note that each partition is pure, and hence the purity measure of the partition, that is
𝐸𝐴 𝐷 = 0

40
Algorithm C4.5 : Introduction
Limitation of ID3

In the following, each tuple belongs to a unique class. The splitting on A is shown.

𝑛 𝑛
𝐷𝑗 1
𝐸𝐴 𝐷 = ෍ . 𝐸 𝐷𝑗 = ෍ .0 = 0
𝐷 𝐷
𝑗=1 𝑗=1

Thus, 𝛼 𝐴, 𝐷 = 𝐸 𝐷 − 𝐸𝐴 𝐷 is maximum in such a situation. 41

Algorithm: C 4.5 : Introduction
 Although, the previous situation is an extreme case, intuitively, we can infer
that ID3 favours splitting attributes having a large number of values
 compared to other attributes, which have a less variations in their values.

 Such a partition appears to be useless for classification.

 This type of problem is called overfitting problem.

Note:
Decision Tree Induction Algorithm ID3 may suffer from overfitting problem.

42
Algorithm: C 4.5 : Introduction
 The overfitting problem in ID3 is due to the measurement of information gain.

 In order to reduce the effect of the use of the bias due to the use of information gain, C4.5 uses a
different measure called Gain Ratio, denoted as 𝛽.

 Gain Ratio is a kind of normalization to information gain using a split information.

Notes on Decision Tree Induction algorithms
1. Optimal Decision Tree: Finding an optimal decision tree is an NP-complete problem. Hence, decision tree induction
algorithms employ a heuristic based approach to search for the best in a large search space. Majority of the algorithms
follow a greedy, top-down recursive divide-and-conquer strategy to build decision trees.

2. Missing data and noise: Decision tree induction algorithms are quite robust to the data set with missing values and
presence of noise. However, proper data pre-processing can be followed to nullify these discrepancies.

3. Redundant Attributes: The presence of redundant attributes does not adversely affect the accuracy of decision trees.
It is observed that if an attribute is chosen for splitting, then another attribute which is redundant is unlikely to chosen
for splitting.

4. Computational complexity: Decision tree induction algorithms are computationally inexpensive, in particular, when
the sizes of training sets are large, Moreover, once a decision tree is known, classifying a test record is extremely fast,
with a worst-case time complexity of O(d), where d is the maximum depth of the tree.

44
RANDOM FOREST
➢Random forest is a supervised learning algorithm can
be used for classification as well as regression.
➢It is mainly used for classification problems.
➢The "forest" it builds is an ensemble of decision trees,
usually trained with the “bagging” method.
➢The general idea of the bagging method is to
increases the overall performance.
Working of Random Forest Algorithm
 Step 1: First, start with the selection of random samples from a
given dataset.
 Step 2: Next, this algorithm will construct a decision tree for
every sample. Then it will get the prediction result from every
decision tree.
 Step 3: In this step, voting will be performed for every predicted
result.
 Step 4: At last, select the most voted prediction result as the
final prediction result.
Random Forest Classifier
Training Data

M features
N examples
Random Forest Classifier

M features
N examples

....…
Random Forest Classifier
Construct a decision tree

M features
N examples

....…
Random Forest Classifier
At each node in choosing the split feature
choose only among m<M features

M features
N examples

....…
Random Forest Classifier
Create decision tree
from each bootstrap sample

M features
N examples

....…
....…
Random Forest Classifier

M features
N examples

Take he
majority
vote

....…
....…
Important Points
➢Random forest has nearly the same hyperparameters as a decision tree or
a bagging classifier.
➢Random forest adds additional randomness to the model, while growing
the trees. Instead of searching for the most important feature while splitting
a node, it searches for the best feature among a random subset of
features. This results in a wide diversity that generally results in a better
model.
➢Therefore, in random forest, only a random subset of the features is taken
into consideration by the algorithm for splitting a node. You can even make
trees more random by additionally using random thresholds for each
feature rather than searching for the best possible thresholds.
Important Points
 Random forest is based decision trees that are created by randomly
splitting the data.
 Generating decision trees is also known as forest.
 Each decision tree is formed using feature selection indicators like
information gain.
 Each tree is dependent on an independent sample.
 Considering it to be a classification problem, then each tree computes
votes and the highest votes class is chosen.
 If it's regression, the average of all the tree's outputs is declared as
the result.
 It is the most powerful algorithm compared to all others.
FEATURE IMPORTANCE
 It is very easy to measure the relative importance of each feature on the
prediction.
 Sklearn provides a great tool for this that measures a feature's importance.
 Tree nodes that use that feature reduce impurity across all trees in the forest.
 By looking at the feature importance you can decide which features to possibly
drop
 This is important because a general rule in machine learning is that the more
features you have the more likely your model will suffer from overfitting and vice
versa.
 Random forests make use of Gini importance or MDI (Mean decrease impurity)
to compute the importance of each attribute.
 The amount of total decrease in node impurity is also called Gini importance.
 This is the method through which accuracy or model fit decreases when there is a
drop of feature.
Features and Advantages
The advantages of random forest are:
 One of the most accurate learning algorithms available.
 Runs efficiently on large databases.
 It can handle thousands of input variables.
 Identification of importance of variables in classification
 Highly effective in estimating missing data.
 It has methods for balancing error in class population unbalanced data sets.
 Generated forests can be saved for future use on other data.
 It computes proximities between pairs of cases (clustering, locating outliers)
 The capabilities can be used for unsupervised learning
 It offers an experimental method for detecting variable interactions.

58/14
Implementation in Python
 import numpy as np
 import [Link] as plt
 import pandas as pd
 from sklearn.model_selection import train_test_split X_train, X_test,
y_train, y_test = train_test_split(X, y, test_size = 0.30)
 from [Link] import RandomForestClassifier
 classifier = RandomForestClassifier(n_estimators = 50)
 [Link](X_train, y_train)
 y_pred = [Link](X_test)
Thank You

Decision Trees and Ensemble Learning Guide
No ratings yet
Decision Trees and Ensemble Learning Guide
47 pages
Decision Trees and Ensemble Learning Guide
No ratings yet
Decision Trees and Ensemble Learning Guide
70 pages
Decision Trees and Ensemble Methods Guide
No ratings yet
Decision Trees and Ensemble Methods Guide
13 pages
ML Unit 3.
No ratings yet
ML Unit 3.
35 pages
Decision Trees and Ensemble Learning - CART, Random Forest, Bagging, AdaBoost, and Gradient Boosti
No ratings yet
Decision Trees and Ensemble Learning - CART, Random Forest, Bagging, AdaBoost, and Gradient Boosti
21 pages
Decision Trees and Ensemble Learning Guide
No ratings yet
Decision Trees and Ensemble Learning Guide
46 pages
Unit - 3-1
No ratings yet
Unit - 3-1
12 pages
Decision Trees and Ensemble Methods Overview
No ratings yet
Decision Trees and Ensemble Methods Overview
14 pages
Decision Trees and Ensemble Learning Guide
No ratings yet
Decision Trees and Ensemble Learning Guide
40 pages
Topic 8
No ratings yet
Topic 8
36 pages
Understanding Decision Tree Classification
No ratings yet
Understanding Decision Tree Classification
6 pages
Understanding Decision Trees in Machine Learning
No ratings yet
Understanding Decision Trees in Machine Learning
14 pages
KNN and Decision Trees in Data Science
No ratings yet
KNN and Decision Trees in Data Science
52 pages
Tree and Probabilistic Models Overview
No ratings yet
Tree and Probabilistic Models Overview
46 pages
Ensemble Learning Techniques Explained
No ratings yet
Ensemble Learning Techniques Explained
32 pages
Classification and Prediction Methods
No ratings yet
Classification and Prediction Methods
64 pages
Decision Trees and Ensemble Methods
No ratings yet
Decision Trees and Ensemble Methods
69 pages
Understanding Ensemble Methods in ML
No ratings yet
Understanding Ensemble Methods in ML
19 pages
DSA 5102: Decision Trees & Ensembling
No ratings yet
DSA 5102: Decision Trees & Ensembling
34 pages
Decision Trees and Random Forests Overview
No ratings yet
Decision Trees and Random Forests Overview
55 pages
3 Module DWM
No ratings yet
3 Module DWM
16 pages
Classification and Prediction Techniques
No ratings yet
Classification and Prediction Techniques
93 pages
Decision Trees and Ensemble Learning Guide
No ratings yet
Decision Trees and Ensemble Learning Guide
13 pages
Chap4 Ensemble 28
No ratings yet
Chap4 Ensemble 28
27 pages
Bagging, Boosting, and Random Forests
No ratings yet
Bagging, Boosting, and Random Forests
27 pages
DM Module III 1 of 2
No ratings yet
DM Module III 1 of 2
30 pages
Understanding Decision Trees in ML
No ratings yet
Understanding Decision Trees in ML
56 pages
Decision Tree Classification Overview
No ratings yet
Decision Tree Classification Overview
72 pages
Decision Tree Ensemble Techniques
No ratings yet
Decision Tree Ensemble Techniques
38 pages
Overview of Ensemble Learning Methods
No ratings yet
Overview of Ensemble Learning Methods
49 pages
Decision Tree Induction Overview
No ratings yet
Decision Tree Induction Overview
74 pages
Les 3 DWM
No ratings yet
Les 3 DWM
21 pages
Decision Trees and Random Forests Guide
No ratings yet
Decision Trees and Random Forests Guide
27 pages
Hypothesis Space in Decision Trees
No ratings yet
Hypothesis Space in Decision Trees
55 pages
Tree and Probabilistic Models in ML
No ratings yet
Tree and Probabilistic Models in ML
15 pages
Classification Concepts and Techniques
No ratings yet
Classification Concepts and Techniques
103 pages
Data Classification and Decision Trees
No ratings yet
Data Classification and Decision Trees
34 pages
Decision Tree Classification Process
No ratings yet
Decision Tree Classification Process
130 pages
Decision Trees in Data Mining Techniques
No ratings yet
Decision Trees in Data Mining Techniques
21 pages
Classification Methods in DWDM
No ratings yet
Classification Methods in DWDM
14 pages
Understanding CART: Decision Trees Explained
No ratings yet
Understanding CART: Decision Trees Explained
25 pages
Decision Tree Induction in DWDM
No ratings yet
Decision Tree Induction in DWDM
11 pages
Decision Trees in Machine Learning
No ratings yet
Decision Trees in Machine Learning
68 pages
Understanding Decision Tree Learning
No ratings yet
Understanding Decision Tree Learning
52 pages
Decision Tree Construction and Pruning
No ratings yet
Decision Tree Construction and Pruning
37 pages
Classification Techniques in Data Analysis
No ratings yet
Classification Techniques in Data Analysis
11 pages
Classification Techniques Overview
No ratings yet
Classification Techniques Overview
32 pages
Data Mining: Classification & Clustering Techniques
No ratings yet
Data Mining: Classification & Clustering Techniques
81 pages
Understanding Decision Tree Nodes
No ratings yet
Understanding Decision Tree Nodes
18 pages
Decision Tree Learning Overview
100% (1)
Decision Tree Learning Overview
41 pages
Decision Tree Learning Overview
No ratings yet
Decision Tree Learning Overview
11 pages
Understanding Bagging and Boosting
No ratings yet
Understanding Bagging and Boosting
32 pages
Decision Trees and Ensemble Methods Overview
No ratings yet
Decision Trees and Ensemble Methods Overview
26 pages
Chapter 03 Decision Tree
No ratings yet
Chapter 03 Decision Tree
43 pages
Decision Tree Classification Overview
No ratings yet
Decision Tree Classification Overview
34 pages
Decision Trees and Ensemble Learning Techniques
No ratings yet
Decision Trees and Ensemble Learning Techniques
32 pages
Data Mining: Decision Tree Techniques
No ratings yet
Data Mining: Decision Tree Techniques
8 pages
Decision Tree Learning Techniques
No ratings yet
Decision Tree Learning Techniques
26 pages
Lec 10
No ratings yet
Lec 10
27 pages
Interpretability & Fairness in AI Models
No ratings yet
Interpretability & Fairness in AI Models
273 pages
Cervical Cancer Prediction Analysis
No ratings yet
Cervical Cancer Prediction Analysis
4 pages
28 1 Paper
No ratings yet
28 1 Paper
5 pages
Energy-Efficient Wideband Antenna Design
No ratings yet
Energy-Efficient Wideband Antenna Design
3 pages
AI-Driven Medical Imaging for Disease Detection
No ratings yet
AI-Driven Medical Imaging for Disease Detection
6 pages
Intelligent Chatter Detection in Milling
No ratings yet
Intelligent Chatter Detection in Milling
24 pages
Inductive Learning in AI Explained
No ratings yet
Inductive Learning in AI Explained
45 pages
Duplicate Question Detection with NLP
No ratings yet
Duplicate Question Detection with NLP
4 pages
Parallel Computing for Machine Learning Efficiency
No ratings yet
Parallel Computing for Machine Learning Efficiency
17 pages
Final Exam: Simulation Models in IE
No ratings yet
Final Exam: Simulation Models in IE
8 pages
AI Micro-Project Proposal for ICT
No ratings yet
AI Micro-Project Proposal for ICT
17 pages
Conceição Etal RS2021 - SAR Oil Spill Detection System Through Random Forest Classifiers
No ratings yet
Conceição Etal RS2021 - SAR Oil Spill Detection System Through Random Forest Classifiers
22 pages
HSI-Drive v2.0: Enhanced Dataset for ADS
No ratings yet
HSI-Drive v2.0: Enhanced Dataset for ADS
8 pages
Android Malware Detection via ML Techniques
No ratings yet
Android Malware Detection via ML Techniques
6 pages
In-The-Wild Deepfake Detection Using CNN
No ratings yet
In-The-Wild Deepfake Detection Using CNN
6 pages
Sentiment Analysis in Nepali Texts
No ratings yet
Sentiment Analysis in Nepali Texts
5 pages
CardioPredict AI: Heart Disease Risk Model
No ratings yet
CardioPredict AI: Heart Disease Risk Model
36 pages
Double Descent in Particle Physics Data
No ratings yet
Double Descent in Particle Physics Data
6 pages
Data-Centric AI Tabular Data Synthesis With Deep Generative Models
No ratings yet
Data-Centric AI Tabular Data Synthesis With Deep Generative Models
287 pages
B.E Ece Batchno 181
No ratings yet
B.E Ece Batchno 181
41 pages
Midterm Exam Solutions for ML Course
No ratings yet
Midterm Exam Solutions for ML Course
11 pages
House Pricing Prediction with Python
No ratings yet
House Pricing Prediction with Python
39 pages
Brain Tumor Classification Techniques
No ratings yet
Brain Tumor Classification Techniques
16 pages
AI for Predicting Depression Crises
No ratings yet
AI for Predicting Depression Crises
11 pages
JPMorgan's $5.3M TalentReady Initiative
No ratings yet
JPMorgan's $5.3M TalentReady Initiative
11 pages
AI Approaches and Rational Agents Explained
No ratings yet
AI Approaches and Rational Agents Explained
21 pages
Iris Flower Species Classification AI
No ratings yet
Iris Flower Species Classification AI
12 pages
Bird Strike Data Analysis Project
No ratings yet
Bird Strike Data Analysis Project
10 pages
MachBank Credit Card Approval Insights
No ratings yet
MachBank Credit Card Approval Insights
21 pages
Applications and Concepts of AI
No ratings yet
Applications and Concepts of AI
18 pages

Week 10

Uploaded by

Week 10

Uploaded by

Ensemble Methods, Decision Tree

and Random forest

Prof. Gajendra P.S. Raghava

Web Site: [Link]

These slides were created with using various resources so

Bootstrap Aggregation • AdaBoost (Adaptive Boosting)

– Random Forests Low Bias ... . . . .

Sunny Overcast Rain

Humidity Each internal node tests an attribute

High Normal Each branch corresponds to an

Sunny Overcast Rain

Humidity Yes Wind

High Normal Strong Weak

R1: If (Outlook=Sunny)  (Humidity=High) Then PlayTennis=No

 There is a special node called root node.

 Edges are the values of attributes

 External nodes are the outcome of classification

Human Warm hair yes no no yes no Mammal

Gila Monster cold scale no no no yes yes

 Case: Binary attribute

 Binary splitting by grouping attribute values

CS 40003: Data Analytics 28

 Muti-way split: It is same as in the case of nominal attribute

 Binary outcome: A >v or A ≤ v

 Here, q should be decided a priori

 For a numeric attribute, decision tree induction is a combinatorial optimization problem

 Further, for each ordering, we can choose different ways of splitting

 Different instances are shown in the following.

 Approach 1 : <Gender, Height>

CS 40003: Data Analytics 32

More ordered Less ordered

Examples (Gini Index) Examples (Entropy)

 Iterative Dichotomizer 3 for decision trees,

 At each node, the splitting attribute is selected to be the most informative

 L. Breiman, J. Friedman, R. Olshen and C. Stone in 1984 proposed an algorithm to build a

Thus, 𝛼 𝐴, 𝐷 = 𝐸 𝐷 − 𝐸𝐴 𝐷 is maximum in such a situation. 41

 Such a partition appears to be useless for classification.

 This type of problem is called overfitting problem.

 Gain Ratio is a kind of normalization to information gain using a split information.

You might also like