Ensemble Methods and Random Forest

Uploaded by

Ashu Kumar

Introduction
• Random Forest is a widely used machine learning
algorithm developed by Leo Breiman and Adele
Cutler. It combines the output of multiple
decision trees to reach a single result.
• It handles both classification and regression
problems.
• A Random Forest is like a group decision-making
team in machine learning. It combines the
opinions of many “trees” (individual models) to
make better predictions, creating a more robust
and accurate overall model.
Ensemble Learning
• Ensemble simply means combining multiple
models.
• Thus a collection of models is used to make
predictions rather than an individual model.
Types of Ensemble
• Two common types of ensemble methods are:
– Bagging
– Boosting
• Bagging: It creates different training subsets
by sampling the training data with replacement, and
the final output is based on majority voting. For
example, Random Forest.
• Boosting: It combines weak learners into a strong
learner by building models sequentially, so that
each new model corrects the errors of the previous
ones. For example, AdaBoost and XGBoost.
Bagging
• Bagging, also known as Bootstrap Aggregation, is the ensemble technique used
by the Random Forest algorithm. The steps involved in bagging are:
• Selection of Subset: Bagging starts by drawing random samples, or subsets, from
the entire dataset.
• Bootstrap Sampling: Each model is built from one of these samples, called
Bootstrap Samples, which are drawn from the original data with replacement. This
process is known as row sampling.
• Bootstrapping: The step of row sampling with replacement is referred to as
bootstrapping.
• Independent Model Training: Each model is trained independently on its
own Bootstrap Sample. Each trained model then produces its own
result.
• Majority Voting: The final output is determined by combining the results of all
models through majority voting. The outcome predicted by the most models is
selected (for regression, the results are averaged instead).
• Aggregation: This step of combining all the results and generating the
final output based on majority voting is known as aggregation.
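The bagging steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the toy data, the function names, and the use of a one-split threshold rule in place of a full decision tree are all assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Row sampling with replacement; the sample has the same size as the data."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Fit a one-feature threshold rule (a stand-in for a full decision tree)."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        for left, right in (("sad", "happy"), ("happy", "sad")):
            errors = sum((left if x <= threshold else right) != y for x, y in rows)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging_predict(models, x):
    """Aggregation: every model votes and the most common answer wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, label)
data = [(1, "sad"), (2, "sad"), (3, "sad"), (4, "sad"),
        (6, "happy"), (7, "happy"), (8, "happy"), (9, "happy")]

rng = random.Random(0)
# Each model is trained independently on its own bootstrap sample.
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]

print(bagging_predict(models, 9), bagging_predict(models, 0.5))
```

Because every model sees a slightly different sample, individual models may disagree near the class boundary, and the vote smooths those disagreements out.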
Bagging Example
• Now let’s break this down with an example, using
the following figure. Here the bootstrap
samples (Bootstrap sample 01, Bootstrap sample 02,
and Bootstrap sample 03) are drawn from the actual
data with replacement, which means a sample is likely
to contain repeated records rather than only unique
ones. The models (Model 01, Model 02, and Model 03)
are each trained independently on one of these
bootstrap samples. Each model generates a result as
shown. The Happy emoji has a majority over the Sad
emoji, so based on majority voting the final output
is the Happy emoji.
Boosting

• Boosting is one of the techniques that use the
concept of ensemble learning.
• A boosting algorithm combines multiple
simple models (also known as weak learners
or base estimators) to generate the final
output.
• The weak models are built in series, with each
new model trained to correct the errors made by the
models before it.
Adaboost
• There are several boosting algorithms.
AdaBoost was the first really successful
boosting algorithm and was developed for
binary classification. AdaBoost is
short for Adaptive Boosting and is a
prevalent boosting technique that combines
multiple “weak classifiers” into a single
“strong classifier.”
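To make the idea of combining weak classifiers concrete, here is a minimal AdaBoost sketch for binary labels (+1/-1) on one-dimensional data. The threshold-rule weak learner, the toy data, and all function names are assumptions chosen for illustration; real libraries offer more general and more careful implementations.

```python
import math

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule.

    The rule predicts `polarity` when x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest, renormalize.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = polarity if x <= threshold else -polarity
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak learners."""
    score = sum(a * (p if x <= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]   # no single threshold separates these

weak = adaboost(xs, ys, rounds=1)
strong = adaboost(xs, ys, rounds=3)
print(sum(predict(weak, x) != y for x, y in zip(xs, ys)),
      sum(predict(strong, x) != y for x, y in zip(xs, ys)))
```

On this toy data a single threshold rule cannot fit all points, while the weighted combination of three such rules can, which is the sense in which weak classifiers are combined into a strong one.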
Other Boosting Algorithms
• GBM
• LightGBM
• XGBoost
• CatBoost
Working of Random Forest Algorithm

• Step 1: In the Random Forest model, a subset of data
points and a subset of features are selected for
constructing each decision tree. Simply put, n random
records and m features are taken from a data set
having k records. (In the standard algorithm, the random
feature subset is redrawn at every split of a tree rather
than once per tree.)
• Step 2: Individual decision trees are constructed for
each sample.
• Step 3: Each decision tree will generate an output.
• Step 4: Final output is considered based on Majority
Voting or Averaging for Classification and regression,
respectively.
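The four steps above can be sketched as follows. This is an illustrative simplification under stated assumptions: the fruit measurements are made up, each "tree" is a single split, and the feature subset is drawn once per tree as described in Step 1. The helper names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def sample_records_and_features(records, n, num_features, m, rng):
    """Step 1: draw n records with replacement and m distinct feature indices."""
    rows = [rng.choice(records) for _ in range(n)]
    features = rng.sample(range(num_features), m)
    return rows, features

def train_stump(rows, features):
    """Step 2: build a one-split 'tree' that may only use the given features."""
    best = None
    for f in features:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return lambda x: left_label if x[f] <= threshold else right_label

def aggregate(outputs, task="classification"):
    """Step 4: majority vote for classification, mean for regression."""
    if task == "classification":
        return Counter(outputs).most_common(1)[0][0]
    return mean(outputs)

# Hypothetical records: (length_cm, roundness, yellowness) -> fruit
records = [((7.0, 0.90, 0.10), "apple"), ((8.0, 0.95, 0.20), "apple"),
           ((7.5, 0.85, 0.15), "apple"), ((6.5, 0.90, 0.10), "apple"),
           ((18.0, 0.30, 0.90), "banana"), ((20.0, 0.25, 0.95), "banana"),
           ((19.0, 0.35, 0.85), "banana"), ((17.0, 0.30, 0.90), "banana")]

rng = random.Random(0)
forest = []
for _ in range(15):
    rows, features = sample_records_and_features(records, n=6, num_features=3, m=2, rng=rng)
    forest.append(train_stump(rows, features))

def forest_predict(x):
    # Step 3: every tree produces its own output, then the outputs are aggregated.
    return aggregate([tree(x) for tree in forest])

print(forest_predict((6.0, 0.99, 0.05)), forest_predict((21.0, 0.20, 0.90)))
```

For a regression forest the same structure applies, except that each tree returns a number and `aggregate(outputs, task="regression")` returns their mean.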
Random Forest Example
• Consider the fruit basket as the data, as shown in
the figure below. Several samples are
taken from the fruit basket, and an individual
decision tree is constructed for each sample. Each
decision tree generates an output, as shown in
the figure. The final output is decided by
majority voting. In the figure, you can
see that the majority of the decision trees output
an apple rather than a banana, so the
final output is taken as an apple.
Important Features of Random Forest

• Diversity: Not all attributes/variables/features are considered while
making an individual tree, so each tree is different.
• Less affected by the curse of dimensionality: Since each tree does not
consider all the features, the feature space each tree works with is reduced.
• Parallelization: Each tree is created independently from different
data and attributes. This means the trees can be built in parallel across
all available CPU cores.
• Train-Test split: Each tree is trained on a bootstrap sample, so
roughly one-third (about 37%) of the data is never seen by that tree. These
unseen records can be used to estimate performance without setting aside a
separate validation set, although a held-out test set is still useful for a
final check.
• Stability: Stability arises because the result is based on majority
voting or averaging.
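The share of records that a single tree never sees follows from sampling with replacement: when n records are drawn from n, a given record is missed with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows. The short check below compares that formula with one simulated bootstrap sample; the function name and sample size are arbitrary choices for the illustration.

```python
import math
import random

def out_of_bag_fraction(n, rng):
    """Draw one bootstrap sample of size n; return the share of records never drawn."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

n = 10_000
theory = (1 - 1 / n) ** n    # probability that a given record is never picked
simulated = out_of_bag_fraction(n, random.Random(0))

print(f"theory={theory:.4f}  limit={math.exp(-1):.4f}  simulated={simulated:.4f}")
```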
Difference Between Decision Tree and Random Forest
• Decision trees
– 1. Decision trees normally suffer from overfitting if they are allowed to
grow without any control.
– 2. A single decision tree is faster in computation.
– 3. When a data set with features is taken as input by a decision tree, it
formulates a set of rules to make predictions.
• Random Forest
– 1. Random forests are created from subsets of data, and the final output is
based on averaging or majority voting, which reduces the problem of overfitting.
– 2. It is comparatively slower.
– 3. A random forest randomly selects observations, builds several decision
trees, and combines their results by averaging or voting.
Important Hyperparameters in Random Forest
• Hyperparameters are used in random forests
either to enhance the performance and
predictive power of the model or to make the
model faster.
Hyperparameters to Increase the Predictive Power
• n_estimators: Number of trees the algorithm
builds before voting or averaging the predictions.
• max_features: Maximum number of features the
random forest considers when splitting a node.
• min_samples_leaf: Determines the minimum
number of samples that must remain in a leaf
node after a split.
• criterion: The function used to measure the quality
of a split in each tree (Gini impurity, entropy, or log loss).
• max_leaf_nodes: Maximum number of leaf nodes in each
tree.
Hyperparameters to Increase the Speed
• n_jobs: Tells the engine how many processors it is
allowed to use. If the value is 1, it can use only one
processor, but if the value is -1, it uses all available processors.
• random_state: Controls the randomness of the sampling.
The model will always produce the same results if it
has a fixed value of random_state and has been
given the same hyperparameters and training data.
• oob_score: OOB means out-of-bag. It is a validation
method built into random forests that plays a role similar
to cross-validation. Roughly one-third (about 37%) of the
records are left out of each tree's bootstrap sample; these
out-of-bag samples are not used to train that tree and are
instead used to evaluate the model's performance.
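The parameter names above match those of scikit-learn's RandomForestClassifier, so the sketch below assumes that library is installed. The synthetic data set and the specific values chosen are illustrative only, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

def fit_forest(seed):
    model = RandomForestClassifier(
        n_estimators=100,       # number of trees
        max_features="sqrt",    # features considered at each split
        min_samples_leaf=2,     # minimum samples required at a leaf
        criterion="gini",       # split quality measure
        max_leaf_nodes=None,    # no cap on leaf nodes per tree
        n_jobs=-1,              # use all available processors
        random_state=seed,      # fixed seed for reproducible results
        oob_score=True,         # evaluate on out-of-bag samples
    )
    return model.fit(X, y)

model_a = fit_forest(42)
model_b = fit_forest(42)

# Same seed, hyperparameters, and data give the same out-of-bag accuracy.
print(model_a.oob_score_, model_a.oob_score_ == model_b.oob_score_)
```

The `oob_score_` attribute is only available after fitting with `oob_score=True`; it reports accuracy measured on the records each tree did not see during training.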
