Ensemble Methods: Bagging, Random Forests, Boosting

Uploaded by

Param Andharia
Copyright
© All Rights Reserved

Ensemble Methods

(Bagging, Random Forest, Boosting)


Bagging
• Bagging = Bootstrap Aggregating (A new application of the bootstrap)
• A general-purpose procedure for reducing the variance of a statistical
learning method
• In the Bootstrap, we replicate our dataset by sampling with replacement:
• Original dataset: $x = c(x_1, x_2, \dots, x_{100})$
• Bootstrap samples:
boot_1 = sample(x, 100, replace = True), ...,
boot_B = sample(x, 100, replace = True).
• Build separate models $\hat{f}^{*1}(x), \hat{f}^{*2}(x), \dots, \hat{f}^{*B}(x)$ using the B training sets
(boot_1, ..., boot_B)
• Average them to obtain a single low-variance statistical learning model:
$$\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$$
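The bootstrap-and-average recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base learner `fit_1nn` (1-nearest-neighbour regression) is a hypothetical stand-in for any high-variance model, and the function names, the toy data, and the default B = 100 are not from the slides.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data, sampling with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def fit_1nn(train):
    """Hypothetical base learner: 1-nearest-neighbour regression on (x, y) pairs."""
    def predict(x):
        # Return the response of the training point closest to x.
        return min(train, key=lambda pt: abs(pt[0] - x))[1]
    return predict

def bagged_predict(data, x, B=100, seed=0):
    """Average the predictions of B models, each fit to its own bootstrap sample."""
    rng = random.Random(seed)
    models = [fit_1nn(bootstrap_sample(data, rng)) for _ in range(B)]
    return sum(f(x) for f in models) / B

# Toy data (assumed for illustration): y = 2x on a grid of 20 points.
data = [(float(i), 2.0 * i) for i in range(20)]
print(bagged_predict(data, x=7.3))
```

Any learner that returns a prediction function could replace `fit_1nn`; the averaging step does not depend on the choice.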
Bagging for Trees
• For Regression Tree:
• Construct B regression trees using B bootstrapped training sets, and average
the resulting predictions
• Trees are grown deep, and are not pruned
• Each individual tree has high variance, but low bias
• Averaging these B trees reduces the variance
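To see why averaging helps, and where it stops helping, consider an idealized setting that the slides do not state explicitly: suppose each tree's prediction at a point x has the same variance σ² and every pair of trees has correlation ρ. Under those assumptions the variance of the bagged prediction is:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}^{*b}(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

With uncorrelated trees (ρ = 0) this is σ²/B, which shrinks as B grows. With correlated trees the first term ρσ² does not shrink, no matter how many trees are averaged.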

• For Classification Tree (two ways to combine the B trees):
• Record the class predicted by each of the B trees, and take a majority vote, or
• Average the class label proportions from each replicate tree and predict the
label with the highest average proportion
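The two aggregation rules for classification can disagree. The sketch below shows both on made-up per-tree class proportions; the function names and the numbers are assumptions for illustration only.

```python
from collections import Counter

def majority_vote(tree_labels):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_labels).most_common(1)[0][0]

def highest_average_proportion(tree_proportions):
    """Average each class's proportion over the trees; return the top class."""
    B = len(tree_proportions)
    classes = tree_proportions[0].keys()
    avg = {c: sum(p[c] for p in tree_proportions) / B for c in classes}
    return max(avg, key=avg.get)

# Hypothetical class proportions reported by B = 3 trees for one observation.
proportions = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.4, "no": 0.6},
]
# Each tree's own predicted label is its most probable class.
labels = [max(p, key=p.get) for p in proportions]

print(majority_vote(labels))                    # two of three trees say "no"
print(highest_average_proportion(proportions))  # average "yes" proportion is higher
```

Here one very confident tree outweighs two mildly confident ones under proportion averaging, while the vote counts every tree equally.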
Test Error of a Bagged Model
• Cross-validation is one way to estimate the test error of a bagged model (it might be computationally expensive)
• Key to bagging is that trees are repeatedly fit to bootstrapped subsets of the
observations
• On average, each bagged tree makes use of around two-thirds of the observations
• The remaining one-third of the observations are referred to as the out-of-bag (OOB)
observations.
• Predict the response for the ith observation using each of the trees in which that
observation was OOB
• This yields around B/3 predictions for the ith observation
• To obtain a single prediction for the ith observation, we can average these predicted
responses (if regression is the goal) or can take a majority vote (if classification is the
goal)
• An OOB prediction can be obtained in this way for each of the n observations
• Use OOB prediction to measure the OOB MSE or classification error
• The resulting OOB error is a valid estimate of the test error for the bagged model
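The OOB procedure can be sketched as follows for regression. The one-third figure comes from the chance that a given observation is missed by all n draws of a bootstrap sample, (1 - 1/n)^n, which is close to e^(-1) ≈ 0.368 for large n. The 1-nearest-neighbour base learner, the function name `oob_error`, and the toy data are assumptions made to keep the example self-contained; they are not from the slides.

```python
import random

def oob_error(xs, ys, B=200, seed=0):
    """Return (OOB mean squared error, average fraction of observations that are OOB)."""
    rng = random.Random(seed)
    n = len(xs)
    pred_sums = [0.0] * n   # running sum of OOB predictions per observation
    pred_counts = [0] * n   # number of models for which the observation was OOB
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap indices
        in_bag = set(idx)
        train = [(xs[i], ys[i]) for i in idx]
        for i in range(n):
            if i not in in_bag:
                # Hypothetical base learner: 1-nearest-neighbour regression.
                pred = min(train, key=lambda pt: abs(pt[0] - xs[i]))[1]
                pred_sums[i] += pred
                pred_counts[i] += 1
    # Average the OOB predictions for each observation, then compute the MSE.
    sq_errors = [(pred_sums[i] / pred_counts[i] - ys[i]) ** 2
                 for i in range(n) if pred_counts[i] > 0]
    oob_fraction = sum(pred_counts) / (n * B)
    return sum(sq_errors) / len(sq_errors), oob_fraction

# Toy data (assumed): y = 2x on 50 grid points.
xs = [i / 10 for i in range(50)]
ys = [2 * x for x in xs]
mse, frac = oob_error(xs, ys)
print(mse, frac)   # frac should be near one-third
```

For classification, the averaging step would be replaced by a majority vote over the OOB predictions.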
Random Forests
• Bagging has a weakness: The trees produced by different Bootstrap
samples can be very similar.
• Random Forests:
• We fit a decision tree to each of B bootstrap samples
• When growing each tree, only a random sample of m < p predictors is
considered at each split
• A fresh sample of m predictors is taken at each split
• This leads to less similar trees with less correlated predictions
• Finally, average the predictions of the trees
• Typically, m is chosen to be approximately the square root of the total
number of predictors p
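The per-split predictor sampling can be sketched on its own, separate from tree growing. The helper name `candidate_predictors` and the rounding of √p are assumptions for illustration; a tree-growing routine would call something like this once per split and search only the returned columns.

```python
import math
import random

def candidate_predictors(p, rng, m=None):
    """Pick m of the p predictor indices at random, without replacement.

    If m is not given, use m = round(sqrt(p)), floored at 1 (an assumed default).
    """
    if m is None:
        m = max(1, round(math.sqrt(p)))
    return sorted(rng.sample(range(p), m))

rng = random.Random(0)
split_1 = candidate_predictors(500, rng)   # 22 of 500 predictors
split_2 = candidate_predictors(500, rng)   # a fresh draw for the next split
print(len(split_1), split_1[:5])
print(len(split_2), split_2[:5])
```

Setting m = p makes every predictor a candidate at every split, which reduces the procedure to plain bagging.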
Test error w.r.t number of trees

Bagging and random forest results for the Heart data. The test error (black and orange) is shown as a function of B, the number of bootstrapped training sets used. Random forests were applied with m = √p. The dashed line indicates the test error resulting from a single classification tree. The green and blue traces show the OOB error, which in this case is considerably lower.

Results from random forests for the 15-class gene expression data set with p = 500 predictors. A single classification tree has an error rate of 45.7 %.
Boosting
• An approach for improving the predictions resulting from a decision tree
• Works similarly to bagging, except that the trees are grown sequentially
• Each tree is grown using information from previously grown trees
• Boosting does not involve bootstrap sampling; instead each tree is fit on a
modified version of the original data set.
• Given the current model, we fit a decision tree to the residuals from the
model
• We then add this new decision tree into the fitted function in order
to update the residuals
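The fit-to-residuals loop described above can be sketched for a single predictor using stumps (one split per tree). This is a minimal illustration, not a reference implementation: the helper `fit_stump`, the shrinkage argument `lam`, and the toy step-function data are assumptions.

```python
def fit_stump(xs, rs):
    """Least-squares regression stump on 1-D inputs: one split, two leaf means."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2   # candidate threshold between adjacent x values
        left = [r for x, r in zip(xs, rs) if x < t]
        right = [r for x, r in zip(xs, rs) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:        # all x identical: predict the overall mean
        mean = sum(rs) / len(rs)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def boost(xs, ys, B=200, lam=0.1):
    """Grow B stumps sequentially, each fit to the current residuals."""
    residuals = list(ys)    # start from f(x) = 0, so residuals equal y
    stumps = []
    for _ in range(B):
        h = fit_stump(xs, residuals)
        stumps.append(h)
        # Add a shrunken copy of the new tree and update the residuals.
        residuals = [r - lam * h(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(lam * h(x) for h in stumps)

# Toy data (assumed): a step from 0 to 10 between x = 4 and x = 5.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
f = boost(xs, ys)
print(f(2), f(7))
```

Because each stump is scaled by `lam` before it is added, no single tree can fit the residuals completely; the fit improves gradually over many trees.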
Boosting for Regression Trees
Tuning Parameters
• The number of trees B. Unlike bagging and random forests, boosting can
overfit if B is too large, although this overfitting tends to occur
slowly if at all. We use cross-validation to select B.
• The shrinkage parameter λ, a small positive number. This controls the
rate at which boosting learns. Typical values are 0.01 or 0.001, and
the right choice can depend on the problem. Very small λ can require
using a very large value of B in order to achieve good performance.
• The number d (interaction depth) of splits in each tree, which controls the
complexity of the boosted ensemble. Often d = 1 works well, in which case
each tree is a stump, consisting of a single split. In this case, the boosted
stump ensemble is fitting an additive model, since each term involves only
a single variable.
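The additive-model remark for d = 1 can be made explicit. Writing j(b) for the predictor on which the b-th stump splits (a notation introduced here, not in the slides), the boosted model regroups by predictor:

```latex
\hat{f}(x) \;=\; \sum_{b=1}^{B} \lambda\,\hat{f}^{b}(x)
          \;=\; \sum_{j=1}^{p} \Bigl(\, \sum_{b \,:\, j(b) = j} \lambda\,\hat{f}^{b}(x_j) \Bigr)
          \;=\; \sum_{j=1}^{p} f_j(x_j)
```

Each f_j depends on x_j alone, so no interaction between predictors can be represented. With d > 1, a single tree can split on several predictors, which allows interactions of up to d variables.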
Results of Boosting
Results from performing boosting and random
forests on the 15-class gene expression data set in
order to predict cancer versus normal. The test
error is displayed as a function of the number of
trees. For the two boosted models, λ = 0.01. Depth-
1 trees slightly outperform depth-2 trees, and both
outperform the random forest, although the
standard errors are around 0.02, making none of
these differences significant. The test error rate for
a single tree is 24 %.
The Quest for Optimal
Margin Classifier
