Ensemble Methods: Bagging, Random Forests, Boosting

Uploaded by

Param Andharia

Ensemble Methods

(Bagging, Random Forest, Boosting)

Bagging
• Bagging = Bootstrap Aggregating (A new application of the bootstrap)
• A general-purpose procedure for reducing the variance of a statistical
learning method
• In the Bootstrap, we replicate our dataset by sampling with replacement:
• Original dataset: x = c(x_1, x_2, ..., x_100)
• Bootstrap samples:
boot_1 = sample(x, 100, replace = TRUE), ...,
boot_B = sample(x, 100, replace = TRUE).
• Build separate models ($\hat{f}^{*1}(x), \hat{f}^{*2}(x), \ldots, \hat{f}^{*B}(x)$) using $B$ training sets
(boot_1, ..., boot_B)
• Average them to obtain a single low-variance statistical learning model:
$$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
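The bagging recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`bootstrap_sample`, `fit_1nn`, `bagged_predictor`) are hypothetical, the toy data are synthetic, and a 1-nearest-neighbour regressor stands in for a generic high-variance base learner (the slides do not prescribe one).

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def fit_1nn(train):
    """Assumed base learner: 1-nearest-neighbour regression on (x, y) pairs."""
    def predict(x):
        nearest = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

def bagged_predictor(data, B, rng, fit=fit_1nn):
    """Fit B models on B bootstrap samples and average their predictions."""
    models = [fit(bootstrap_sample(data, rng)) for _ in range(B)]
    def predict(x):
        return sum(model(x) for model in models) / B
    return predict

# Synthetic data (an assumption for illustration): y = x^2 plus small noise.
rng = random.Random(0)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 0.1)) for i in range(100)]

f_bag = bagged_predictor(data, B=50, rng=rng)
print(f_bag(5.0))  # averaged prediction near x = 5
```

Any fitting routine that returns a prediction function can be passed as `fit`; the averaging step is the same regardless of the base learner.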
Bagging for Trees
• For Regression Tree:
• Construct B regression trees using B bootstrapped training sets, and average
the resulting predictions
• Trees are grown deep, and are not pruned
• Each individual tree has high variance, but low bias
• Averaging these B trees reduces the variance

• For Classification Tree:

• Record the class predicted by each of the B trees, and take a majority vote
• Alternatively, average the class label proportions from each replicate tree and predict the
label with the highest average proportion
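The two aggregation rules for classification can give different answers. The sketch below uses made-up class proportions from three hypothetical trees (not data from the slides) to show a case where the majority vote and the averaged proportions disagree; the function names are illustrative.

```python
from collections import Counter

def majority_vote(tree_labels):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_labels).most_common(1)[0][0]

def average_proportions(tree_proportions):
    """Average per-class proportions over trees; return (best label, averages)."""
    B = len(tree_proportions)
    classes = tree_proportions[0].keys()
    avg = {c: sum(p[c] for p in tree_proportions) / B for c in classes}
    return max(avg, key=avg.get), avg

# Hypothetical class proportions in the terminal node reached in each of 3 trees.
proportions = [
    {"yes": 0.45, "no": 0.55},
    {"yes": 0.45, "no": 0.55},
    {"yes": 0.90, "no": 0.10},
]
# Each tree's own label is its most frequent class.
labels = [max(p, key=p.get) for p in proportions]

vote = majority_vote(labels)
label, avg = average_proportions(proportions)
print(vote, label, avg)
```

Here two of the three trees narrowly favour "no", so the vote picks "no", while the one confident tree pulls the averaged proportion for "yes" to 0.6.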
Test Error of a Bagged Model
• Cross-validation is one method for estimating the test error (it might be computationally expensive)
• Key to bagging is that trees are repeatedly fit to bootstrapped subsets of the
observations
• On average, each bagged tree makes use of around two-thirds of the observations
• The remaining one-third of the observations are referred to as the out-of-bag (OOB)
observations.
• Predict the response for the ith observation using each of the trees in which that
observation was OOB
• This yields around B/3 predictions for the ith observation
• To obtain a single prediction for the ith observation, we can average these predicted
responses (if regression is the goal) or can take a majority vote (if classification is the
goal)
• An OOB prediction can be obtained in this way for each of the n observations
• Use OOB prediction to measure the OOB MSE or classification error
• The resulting OOB error is a valid estimate of the test error for the bagged model
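The "one-third" figure comes from the fact that an observation is left out of a bootstrap sample of size n with probability (1 - 1/n)^n, which is close to e^(-1) ≈ 0.368. A minimal sketch of the OOB procedure for regression follows. It assumes synthetic data and a 1-nearest-neighbour base learner in place of a tree, and the name `oob_mse` is hypothetical.

```python
import random

def fit_1nn(train):
    """Assumed base learner: 1-nearest-neighbour regression on (x, y) pairs."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict

def oob_mse(data, B, rng, fit=fit_1nn):
    """Return the OOB mean squared error and per-observation OOB counts."""
    n = len(data)
    sums = [0.0] * n   # running sum of OOB predictions for each observation
    counts = [0] * n   # number of models for which observation i was OOB
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap indices
        in_bag = set(idx)
        model = fit([data[i] for i in idx])
        for i in range(n):
            if i not in in_bag:                      # predict only when OOB
                sums[i] += model(data[i][0])
                counts[i] += 1
    used = [i for i in range(n) if counts[i] > 0]
    mse = sum((sums[i] / counts[i] - data[i][1]) ** 2 for i in used) / len(used)
    return mse, counts

rng = random.Random(0)
data = [(i / 10, (i / 10) ** 2 + rng.gauss(0, 0.1)) for i in range(100)]
B = 300
mse, counts = oob_mse(data, B=B, rng=rng)
avg_oob_fraction = sum(counts) / (len(counts) * B)
print(mse, avg_oob_fraction)  # fraction should be close to one-third
```

For classification, the averaging of OOB predictions would be replaced by a majority vote and the squared error by the misclassification rate.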
Random Forests
• Bagging has a weakness: The trees produced by different Bootstrap
samples can be very similar.
• Random Forests:
• We fit a decision tree to each of the different Bootstrap samples
• When growing the tree, we select a random sample of m < p predictors to
consider at each branch of the tree.
• This will lead to less similar trees with less correlated predictions.
• Finally, average the prediction of each tree
• A fresh sample of m predictors is taken at each split
• The number of predictors considered at each split is typically chosen to be approximately equal to
the square root of the total number of predictors (m ≈ √p)
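The only change random forests make to tree growing is the candidate set at each split. The sketch below shows a single split under stated assumptions: synthetic data, an RSS criterion for regression, and hypothetical helper names (`rss`, `best_split`, `random_forest_split`). A full implementation would call this recursively on each child node, drawing a fresh candidate set every time.

```python
import math
import random

def rss(values):
    """Residual sum of squares of values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(X, y, features):
    """Best (rss, feature, threshold), searching only the candidate features."""
    best = None
    for j in features:
        cuts = sorted(set(row[j] for row in X))
        for lo, hi in zip(cuts, cuts[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            total = rss(left) + rss(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best

def random_forest_split(X, y, rng):
    """Draw a fresh sample of m ~ sqrt(p) predictors, then split on the best one."""
    p = len(X[0])
    m = max(1, round(math.sqrt(p)))
    candidates = rng.sample(range(p), m)
    return best_split(X, y, candidates), candidates

# Synthetic data (an assumption): 9 predictors, only predictor 0 matters.
rng = random.Random(0)
X = [[rng.random() for _ in range(9)] for _ in range(40)]
y = [3 * row[0] + rng.gauss(0, 0.1) for row in X]

split, candidates = random_forest_split(X, y, rng)
print(candidates, split)
```

Passing all p predictors as candidates (m = p) reduces this to the split used in ordinary bagging.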
Test error w.r.t number of trees

Figure: Bagging and random forest results for the Heart data. The test error (black and orange) is shown as a function of B, the number of bootstrapped training sets used. Random forests were applied with m = √p. The dashed line indicates the test error resulting from a single classification tree. The green and blue traces show the OOB error, which in this case is considerably lower.

Figure: Results from random forests for the 15-class gene expression data set with p = 500 predictors. A single classification tree has an error rate of 45.7 %.
Boosting
• An approach for improving the predictions resulting from a decision tree
• Works similarly to bagging, except that the trees are grown sequentially
• Each tree is grown using information from previously grown trees
• Boosting does not involve bootstrap sampling; instead each tree is fit on a
modified version of the original data set.
• Given the current model, we fit a decision tree to the residuals from the
model
• We then add this new decision tree into the fitted function in order
to update the residuals
Boosting for Regression Trees
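A minimal sketch of the residual-fitting loop for regression, assuming a single predictor, depth-1 trees (stumps), and a shrinkage factor `lam` applied to each new tree. The function names and the synthetic data are illustrative, not taken from the slides.

```python
import math
import random

def fit_stump(xs, rs):
    """Depth-1 regression tree on one predictor: choose the threshold minimizing RSS."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        total = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or total < best[0]:
            best = (total, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost(xs, ys, B, lam):
    """Start from f(x) = 0 and repeatedly fit a stump to the current residuals."""
    residuals = list(ys)          # with f = 0 the residuals equal the responses
    stumps = []
    for _ in range(B):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a shrunken copy of the new tree, i.e. update the residuals.
        residuals = [r - lam * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(lam * s(x) for s in stumps)

# Synthetic data (an assumption): a noisy sine curve.
rng = random.Random(0)
xs = [i / 20 for i in range(40)]
ys = [math.sin(3 * x) + rng.gauss(0, 0.1) for x in xs]

f_hat = boost(xs, ys, B=200, lam=0.1)
mse_zero = sum(y ** 2 for y in ys) / len(ys)                       # error of f = 0
mse_boost = sum((f_hat(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_zero, mse_boost)
```

The errors printed here are training errors, which only decrease as trees are added; choosing B in practice requires held-out data or cross-validation.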
Tuning Parameters
• The number of trees B. Unlike bagging and random forests, boosting can
overfit if B is too large, although this overfitting tends to occur
slowly if at all. We use cross-validation to select B.
• The shrinkage parameter λ, a small positive number. This controls the
rate at which boosting learns. Typical values are 0.01 or 0.001, and
the right choice can depend on the problem. Very small λ can require
using a very large value of B in order to achieve good performance.
• The number d (interaction depth) of splits in each tree, which controls the
complexity of the boosted ensemble. Often d = 1 works well, in which case
each tree is a stump, consisting of a single split. In this case, the boosted
stump ensemble is fitting an additive model, since each term involves only
a single variable.
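The additive-model claim for stumps can be made explicit by grouping the trees by the predictor they split on. The notation j(b), for the single predictor used by the b-th stump, and g_j, for the resulting component functions, is introduced here for illustration only.

```latex
% Assumed notation: j(b) is the one predictor that stump b splits on.
\hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^{b}(x)
           = \sum_{j=1}^{p} \Bigl( \sum_{b \,:\, j(b) = j} \lambda \hat{f}^{b}(x_j) \Bigr)
           = \sum_{j=1}^{p} g_j(x_j)
```

Each g_j depends on x_j alone, so no interactions between predictors can appear; with d > 1 a single tree can involve up to d predictors, which is why d is called the interaction depth.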
Results of Boosting
Figure: Results from performing boosting and random forests on the 15-class gene expression data set in order to predict cancer versus normal. The test error is displayed as a function of the number of trees. For the two boosted models, λ = 0.01. Depth-1 trees slightly outperform depth-2 trees, and both outperform the random forest, although the standard errors are around 0.02, making none of these differences significant. The test error rate for a single tree is 24 %.
The Quest for Optimal
Margin Classifier
