UNIT 2
Naive Bayes Classifiers
What are Naive Bayes Classifiers?
The Naïve Bayes algorithm is used for classification problems, and especially
for text classification. In text classification tasks the data is
high-dimensional, because each word represents one feature. Typical uses are
spam filtering, sentiment detection, and rating classification. The main
advantage of Naïve Bayes is its speed: training is fast, and prediction remains
cheap even with high-dimensional data.
The model predicts the probability that an instance belongs to a class, given a
set of feature values, so it is a probabilistic classifier. It uses Bayes'
theorem for both training and prediction.
Why is it Called Naive Bayes?
It is called naive because it assumes that each feature is independent of every
other feature, given the class. In other words, each feature contributes to the
prediction on its own, with no interaction between features. In real-world data
this condition is rarely satisfied.
Bayes' Theorem
Bayes' theorem describes the probability of an event based on prior knowledge
of conditions that may be related to that event. It also gives a way to compute
a conditional probability.
Assume we have a hypothesis (H) and evidence (E).
According to Bayes' theorem, the relationship between the probability of
the hypothesis before getting the evidence, P(H), and the probability of
the hypothesis after getting the evidence, P(H|E), is:
P(H|E) = P(E|H)*P(H)/P(E)
Prior probability = P(H) is the probability before getting the evidence
Posterior probability = P(H|E) is the probability after getting evidence
In general,
P(class|data) = (P(data|class) * P(class)) / P(data)
Bayes' Theorem Example
Assume we have to find the probability that a randomly picked card is a king,
given that it is a face card.
There are 4 kings in a deck of 52 cards, which implies that
P(King) = 4/52
All kings are face cards, so
P(Face|King) = 1
There are 3 face cards in a suit of 13 cards and there are 4 suits in total, so
P(Face) = 12/52
Therefore,
P(King|Face) = P(Face|King)*P(King)/P(Face) = 1 * (4/52) / (12/52) = 1/3
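The card calculation can be checked with a few lines of Python. This is a minimal sketch; the helper name posterior is chosen here for illustration and is not part of any library. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence

p_king = Fraction(4, 52)          # P(King): 4 kings in 52 cards
p_face_given_king = Fraction(1)   # P(Face|King): every king is a face card
p_face = Fraction(12, 52)         # P(Face): 3 face cards in each of 4 suits

p_king_given_face = posterior(p_face_given_king, p_king, p_face)
print(p_king_given_face)  # 1/3
```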
Here are some details about Bayes classifiers:
Naive Bayes Classifier: A Simple Yet Powerful Algorithm
Naive Bayes is a probabilistic machine learning algorithm that's particularly
effective for classification tasks. It's based on Bayes' theorem with a strong
independence assumption between features. Despite this "naive" assumption,
Naive Bayes often performs surprisingly well in practice, especially with text
data.
How Naive Bayes Works
1. Bayes' Theorem:
o The core of Naive Bayes is Bayes' theorem, which relates
conditional probabilities:
P(A|B) = P(B|A) * P(A) / P(B)
o In the context of classification, this translates to:
P(Class|Features) = P(Features|Class) * P(Class) / P(Features)
2. Naive Assumption:
o Naive Bayes assumes that all features are independent of each
other given the class. This simplifies the calculation of
probabilities:
P(Features|Class) = P(Feature1|Class) * P(Feature2|Class) * ... * P(FeatureN|Class)
3. Classification:
o To classify a new instance, Naive Bayes calculates the probability
of the instance belonging to each class.
o The class with the highest probability is assigned to the instance.
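The three steps above can be sketched for categorical features as follows. This is an illustrative sketch, not a reference implementation: the function names and the small fruit dataset are made up for the example, probabilities are estimated by plain relative frequencies, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def predict_proba(row, class_counts, value_counts):
    """Return P(Class|Features) for every class under the naive assumption."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(Class)
        for i, value in enumerate(row):
            # likelihood P(Feature_i|Class), estimated by relative frequency
            score *= value_counts[(i, label)][value] / count
        scores[label] = score
    evidence = sum(scores.values())  # P(Features), the normalising constant
    return {label: s / evidence for label, s in scores.items()}

# Hypothetical training data: (color, shape) -> fruit
rows = [("red", "round"), ("red", "round"), ("green", "round"), ("red", "long"),
        ("yellow", "long"), ("yellow", "long"), ("green", "long"), ("yellow", "round")]
labels = ["apple"] * 4 + ["banana"] * 4

class_counts, value_counts = train(rows, labels)
probs = predict_proba(("green", "round"), class_counts, value_counts)
prediction = max(probs, key=probs.get)
print(probs, prediction)  # {'apple': 0.75, 'banana': 0.25} apple
```

Dividing by the evidence only rescales the scores, so the predicted class would be the same without it. The sketch fails with a division by zero if every class scores zero, which is one reason smoothing is used in practice.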
Types of Naive Bayes Classifiers
Gaussian Naive Bayes: Assumes that features are continuous and
normally distributed.
Multinomial Naive Bayes: Suitable for discrete features, often used for
text classification.
Bernoulli Naive Bayes: Similar to Multinomial but treats features as
binary (present or absent).
Advantages of Naive Bayes
Simplicity: Easy to understand and implement.
Efficiency: Fast training and prediction times.
Handles high-dimensional data: Works well with many features.
Effective for text classification: Often achieves high accuracy in tasks
like spam filtering and sentiment analysis.
Disadvantages of Naive Bayes
Naive assumption: The independence assumption may not always hold in
real-world data.
Zero-frequency problem: If a feature value doesn't appear in the training
data, its probability will be zero, affecting the overall probability.
Smoothing techniques like Laplace smoothing can help mitigate this
issue.
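A minimal sketch of Laplace (add-one) smoothing is shown below. The function name and its parameters are chosen for this illustration. The counts come from the weather example in this unit, where Overcast never occurs together with Play = No (0 of 4 cases) and Outlook has 3 possible values.

```python
def smoothed_likelihood(value_count, class_count, n_values, alpha=1.0):
    """Additive-smoothed estimate of P(feature = value | class).

    value_count: times the value was seen together with this class
    class_count: number of training examples in this class
    n_values:    number of distinct values the feature can take
    alpha:       smoothing strength (alpha = 1 is Laplace smoothing)
    """
    return (value_count + alpha) / (class_count + alpha * n_values)

# P(Overcast | No) without smoothing is 0/4 = 0, which would zero out the product.
unsmoothed = 0 / 4
smoothed = smoothed_likelihood(0, 4, 3)  # (0 + 1) / (4 + 3) = 1/7

# The smoothed estimates over all three outlook values still sum to 1.
total = sum(smoothed_likelihood(c, 4, 3) for c in (0, 2, 2))
print(unsmoothed, smoothed, total)
```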
Applications of Naive Bayes
Text classification: Spam filtering, sentiment analysis, topic modeling
Image classification: Facial recognition, object detection
Medical diagnosis: Disease prediction
Recommendation systems: Product recommendations
Bayes Theorem Formula
The formula for the Bayes theorem can be written in a variety of ways. The
following is the most common version:
P(A ∣ B) = P(B ∣ A)P(A) / P(B)
P(A ∣ B) is the conditional probability of event A occurring, given that B is true.
P(B ∣ A) is the conditional probability of event B occurring, given that A is true.
P(A) and P(B) are the marginal probabilities of A and B, each considered
without reference to the other.
Bayes Theorem Formula Solved Examples
Example 1. A certain disease affects 2% of the population. A diagnostic test for
the disease has an accuracy rate of 95% (i.e., the probability of testing positive
given that the disease is present is 0.95), and the false positive rate is 3% (i.e.,
the probability of testing positive given that the disease is absent is 0.03). If a
randomly selected person tests positive, what is the probability that they
actually have the disease?
Example 2. A box contains 3 fair coins and 1 biased coin with two heads. A
coin is randomly selected and tossed, and it shows heads. What is the
probability that the chosen coin is biased?
Example 3. A certain drug test correctly identifies drug users 98% of the time
and gives false positives for 3% of non-drug users. If 1% of the population are
drug users, what is the probability that a person who tests positive is actually a
drug user?
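The three examples can be worked with one helper that expands P(E) by the law of total probability. This is a sketch under stated assumptions: the function name is made up for the illustration, and in Example 3 the 3% figure is read as the rate of positive tests among non-users.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H|E) for a binary hypothesis, with P(E) expanded by total probability."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence

# Example 1: P(disease | positive test)
example_1 = posterior(prior=0.02, p_evidence_given_h=0.95, p_evidence_given_not_h=0.03)
# Example 2: P(biased coin | heads); 1 of 4 coins is biased, fair coins show heads half the time
example_2 = posterior(prior=1 / 4, p_evidence_given_h=1.0, p_evidence_given_not_h=0.5)
# Example 3: P(drug user | positive test)
example_3 = posterior(prior=0.01, p_evidence_given_h=0.98, p_evidence_given_not_h=0.03)

print(round(example_1, 4), round(example_2, 4), round(example_3, 4))  # 0.3926 0.4 0.2481
```

In Examples 1 and 3 the posterior stays well below 50% even though the test is accurate, because the condition is rare in the population.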
Naïve Bayes Classifier Algorithm
o Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional
training dataset.
o The Naïve Bayes classifier is one of the simplest and most effective
classification algorithms. It helps in building fast machine
learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability that an object belongs to each class.
o Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
The name Naïve Bayes is made up of two words, Naïve and Bayes, which
can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For
example, if a fruit is identified on the basis of color, shape, and taste, then
a red, spherical, and sweet fruit is recognized as an apple. Each feature
individually contributes to identifying it as an apple, without depending
on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used
to determine the probability of a hypothesis with prior knowledge. It
depends on the conditional probability.
o The formula for Bayes' theorem is given as:
P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the
observed event B.
P(B|A) is the likelihood: the probability of the evidence given that the
hypothesis is true.
P(A) is the prior probability: the probability of the hypothesis before
observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
Working of Naïve Bayes' Classifier:
Working of Naïve Bayes' Classifier can be understood with the help of the
below example:
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether or not we should
play on a particular day according to the weather conditions. To solve
this problem, we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the Weather Conditions:
Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4
Likelihood table for the weather conditions:
Weather    No            Yes             P(Weather)
Overcast   0             5               5/14 = 0.36
Rainy      2             2               4/14 = 0.29
Sunny      2             3               5/14 = 0.36
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 5/14 = 0.36
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = (3/10) * (10/14) / (5/14) = 3/5 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.36
So P(No|Sunny) = (2/4) * (4/14) / (5/14) = 2/5 = 0.40
As the calculation shows, P(Yes|Sunny) > P(No|Sunny).
Hence on a sunny day, the player can play the game.
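The same result can be reproduced by counting directly from the 14 records of the dataset. The sketch below uses only the Python standard library; the helper name posterior is chosen for the illustration, and exact fractions avoid the rounding in the hand calculation.

```python
from collections import Counter
from fractions import Fraction

# The 14 (Outlook, Play) records from the dataset above.
records = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]

n = len(records)
outlook_counts = Counter(outlook for outlook, _ in records)  # frequency of each outlook
play_counts = Counter(play for _, play in records)           # frequency of Yes / No
joint_counts = Counter(records)                              # frequency of each pair

def posterior(play, outlook):
    """P(play|outlook) = P(outlook|play) * P(play) / P(outlook)."""
    likelihood = Fraction(joint_counts[(outlook, play)], play_counts[play])
    prior = Fraction(play_counts[play], n)
    evidence = Fraction(outlook_counts[outlook], n)
    return likelihood * prior / evidence

p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
print(p_yes, p_no)  # 3/5 2/5
```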
Types of Naïve Bayes Model:
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal
distribution. This means if predictors take continuous values instead of
discrete, then the model assumes that these values are sampled from the
Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the
data is multinomially distributed. It is primarily used for document
classification problems, that is, deciding which category a particular
document belongs to, such as Sports, Politics, or Education.
The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial
classifier, but the predictor variables are independent Boolean
variables, such as whether a particular word is present or not in a document.
This model is also popular for document classification tasks.
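To make the Gaussian variant concrete, here is a from-scratch sketch in plain Python. It is an illustration under stated assumptions rather than the scikit-learn implementation: the function names and the tiny (age, salary in thousands) dataset are hypothetical, and a small constant is added to each variance for numerical safety.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for every class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small constant avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(model, row):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, mean, var) for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=score)

# Hypothetical data: (age, salary in thousands) -> 0 = not purchased, 1 = purchased
rows = [(22, 20), (25, 30), (30, 25), (45, 90), (50, 80), (55, 100)]
labels = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(rows, labels)
print(predict(model, (24, 28)), predict(model, (52, 95)))  # 0 1
```

Working with log probabilities turns the product of likelihoods into a sum, which avoids numerical underflow when there are many features.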
Python Implementation of the Naïve Bayes algorithm:
Steps to implement:
o Data Pre-processing step
o Fitting Naive Bayes to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.
1) Data Pre-processing step:
In this step, we will pre-process/prepare the data so that we can use it efficiently
in our code. It is similar to what we did in data pre-processing. The code for this is
given below:
# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
In the above code, we have loaded the dataset into our program using dataset =
pd.read_csv('user_data.csv'). The loaded dataset is divided into a training set and a
test set, and then the feature variables are scaled.
The output for the dataset is given as:
2) Fitting Naive Bayes to the Training Set:
After the pre-processing step, now we will fit the Naive Bayes model to the
Training set. Below is the code for it:
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
In the above code, we have used the GaussianNB classifier to fit it to the
training dataset. We can also use other classifiers as per our requirement.
Output:
Out[6]: GaussianNB(priors=None, var_smoothing=1e-09)
3) Prediction of the test set result:
Now we will predict the test set result. For this, we will create a new predictor
variable y_pred, and will use the predict function to make the predictions.
# Predicting the Test set results
y_pred = classifier.predict(x_test)
Output:
The above output shows the result for the prediction vector y_pred and the real
vector y_test. We can see that some predictions are different from the real values;
these are the incorrect predictions.
4) Creating Confusion Matrix:
Now we will check the accuracy of the Naive Bayes classifier using the
Confusion matrix. Below is the code for it:
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Output:
As we can see in the above confusion matrix output, there are 7+3= 10 incorrect
predictions, and 65+25=90 correct predictions.
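To show what the confusion matrix counts, here is a small standard-library sketch. The two helper functions and the label vectors are hypothetical and are not the tutorial's data. Rows stand for the actual class and columns for the predicted class, which is the layout scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix

def accuracy(matrix):
    """Share of predictions that fall on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels, not the tutorial's data.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1, 0, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm, accuracy(cm))  # [[3, 1], [1, 3]] 0.75
```

Applied to the counts reported above, 90 correct out of 100 test predictions corresponds to an accuracy of 0.90.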
5) Visualizing the training set result:
Next we will visualize the training set result using the Naïve Bayes classifier.
Below is the code for it:
# Visualising the Training set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# Grid of points covering the range of both scaled features
X1, X2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by the class the classifier predicts for it
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
In the above output we can see that the Naïve Bayes classifier has separated
the data points with a smooth boundary. The boundary is a curve because we have
used the GaussianNB classifier in our code.
6) Visualizing the Test set result:
# Visualising the Test set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
# Grid of points covering the range of both scaled features
X1, X2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by the class the classifier predicts for it
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Naive Bayes (test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
Random Forest Hyperparameters
1. n_estimators
A Random Forest is a set of decision trees. It extends the single Decision
Tree into an ensemble. One question that arises is how many trees need to be
created. n_estimators is the hyperparameter that defines the number of trees
to be used in the model. Each tree can be understood as one sub-model of the
forest.
By default: n_estimators=100
2. max_features
To train a machine learning model, the given dataset should contain
multiple features/variables to predict the label/target. max_features limits
the number of features considered when looking for the best split in each tree.
By default: max_features="sqrt" (available: "sqrt", "log2", None)
3. max_depth
A tree grows by splitting nodes into child nodes. max_depth sets the maximum
depth of each tree, that is, the largest number of successive splits between the
root and a leaf. If max_depth is too low, the model learns too little and has
high bias, so it underfits. In the same way, if max_depth is too high, the
model fits the training data too closely and has high variance, so it overfits.
By default: max_depth=None
4. max_leaf_nodes
We have a tree and know what max_depth is used for. Each tree is split into
multiple nodes. How many leaf nodes a tree may end up with is specified by
max_leaf_nodes, which restricts the growth of each tree.
By default: max_leaf_nodes = None (an unlimited number of leaf nodes)
5. max_samples
Apart from the features, we also have a set of training samples. max_samples
determines how much of the dataset is drawn for each individual tree.
By default: max_samples = None (this means X.shape[0] samples are drawn)
6. min_samples_split
An ensemble combines many weak learners into a stronger one, and in a Random
Forest those learners are decision trees. min_samples_split determines the
minimum number of observations a node must contain in order to be split.
By default: min_samples_split = 2 (a node needs at least 2 samples to be split)
What are Attribute Selection Measures?
An attribute selection measure is a heuristic for choosing the splitting
criterion that best separates a data partition D into individual classes.
The tree node generated for partition D is labeled with the splitting criterion,
branches are grown for each outcome of the criterion, and the tuples are
partitioned accordingly. Three popular attribute selection measures are
information gain, gain ratio, and Gini index.
Information gain − Information gain is used to decide which
features/attributes provide the most information about a class. It is based on
entropy and aims to reduce the level of entropy going from
the root node to the leaf nodes.
Let node N represent or hold the tuples of partition D. The attribute with the
largest information gain is selected as the splitting attribute for node N. This
attribute minimizes the information needed to classify the tuples in the resulting
partitions and reflects the least randomness or “impurity” in these partitions.
Gain ratio − The information gain measure is biased toward tests with
many outcomes; that is, it prefers attributes that have a large number of values.
For instance, consider an attribute that acts as a unique identifier, such as
product ID.
A split on product ID would result in a large number of partitions, each one
containing only one tuple. Because each partition is pure, the information needed
to classify data set D based on this partitioning would be Info_product_ID(D) = 0.
The information gain of this split is therefore maximal, even though such a
partitioning is useless for classification. Gain ratio corrects this bias by
dividing the information gain by the split information of the attribute.
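Gini index − The Gini index is used in CART. The Gini index measures
the impurity of D, a data partition or collection of training tuples, as
Gini(D) = 1 − (p_1^2 + p_2^2 + ... + p_m^2)
where p_i is the probability that a tuple in D belongs to class i, and m is the
number of classes.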
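The three measures can be sketched in a few lines of Python. The function names are chosen for this illustration. The data is the Outlook/Play table from the Naïve Bayes section of this unit, plus a made-up unique ID column that plays the role of the product ID.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the classes in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum(p_i ** 2)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group class labels by the attribute value of each tuple."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())

def information_gain(values, labels):
    """Info(D) minus the weighted entropy of the partitions."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return information_gain(values, labels) / split_info if split_info else 0.0

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]
record_id = list(range(len(play)))  # hypothetical unique identifier per tuple

print(information_gain(outlook, play), information_gain(record_id, play))
print(gain_ratio(outlook, play), gain_ratio(record_id, play))
print(gini(play))
```

The unique identifier reaches the largest possible information gain, equal to Info(D), although it says nothing useful about new tuples. Gain ratio divides that gain by log2(14), the split information of a 14-way split, which lowers its score considerably. On a dataset this small the penalty does not necessarily push the identifier below the other attributes.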
Decision Tree Classification Algorithm
Decision Tree is a Supervised learning technique that can be used for
both classification and Regression problems, but mostly it is preferred for
solving Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the
outcome.
In a Decision tree, there are two types of nodes, which are the Decision
Node and the Leaf Node. Decision nodes are used to make a decision and
have multiple branches, whereas Leaf nodes are the output of those
decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the
given dataset.
It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like
structure.
In order to build a tree, we use the CART algorithm, which stands
for Classification and Regression Tree algorithm.
A decision tree simply asks a question, and based on the answer
(Yes/No), it further splits the tree into subtrees.
Below diagram explains the general structure of a decision tree:
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best
algorithm for the given dataset and problem is the main point to remember
while creating a machine learning model. Below are the two reasons for using
the Decision tree:
Decision Trees usually mimic human thinking ability while making a
decision, so it is easy to understand.
The logic behind the decision tree can be easily understood because it
shows a tree-like structure.
Decision Tree Terminologies
Root Node: Root node is from where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be
segregated further after reaching a leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into
sub-nodes according to the given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the
tree.
Parent/Child node: A node that is split into sub-nodes is called the parent node
of those sub-nodes, and the sub-nodes are called its child nodes.
How does the Decision Tree algorithm Work?
In a decision tree, for predicting the class of the given dataset, the algorithm
starts from the root node of the tree. This algorithm compares the values of root
attribute with the record (real dataset) attribute and, based on the comparison,
follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the
other sub-nodes and moves further. It continues the process until it reaches a
leaf node of the tree. The complete process can be better understood using the
algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the
complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection
Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best
attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where the nodes cannot be classified further; these final nodes are called
leaf nodes.
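The five steps can be sketched as a short recursive function. This is an illustration only: it uses information gain as the attribute selection measure and one branch per attribute value (an ID3-style tree), whereas CART uses binary splits. The function names and the six job-offer records are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split(rows, labels, attribute):
    """Step 3: group the records by their value of the chosen attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[attribute], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return subsets

def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the largest information gain."""
    def gain(attribute):
        subsets = split(rows, labels, attribute)
        remainder = sum(len(ls) / len(labels) * entropy(ls) for _, ls in subsets.values())
        return entropy(labels) - remainder
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Steps 1, 4 and 5: create a node and recurse until nodes are pure."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    attribute = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attribute]
    return {
        "attribute": attribute,
        "branches": {
            value: build_tree(sub_rows, sub_labels, remaining)
            for value, (sub_rows, sub_labels) in split(rows, labels, attribute).items()
        },
    }

def classify(tree, row):
    """Follow branches from the root until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree

# Hypothetical job-offer records
rows = [
    {"salary": "low", "distance": "near", "cab": "yes"},
    {"salary": "low", "distance": "far", "cab": "no"},
    {"salary": "high", "distance": "near", "cab": "no"},
    {"salary": "high", "distance": "near", "cab": "yes"},
    {"salary": "high", "distance": "far", "cab": "yes"},
    {"salary": "high", "distance": "far", "cab": "no"},
]
labels = ["decline", "decline", "accept", "accept", "accept", "decline"]

tree = build_tree(rows, labels, ["salary", "distance", "cab"])
print(tree["attribute"])  # salary
print(classify(tree, {"salary": "high", "distance": "far", "cab": "yes"}))  # accept
```

The sketch raises a KeyError for attribute values that never occurred in training, and it does no pruning, so it is not meant for real data.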
Example: Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not. So, to solve this problem, the
decision tree starts with the root node (Salary attribute by ASM). The root node
splits further into the next decision node (distance from the office) and one leaf
node based on the corresponding labels. The next decision node further gets
split into one decision node (Cab facility) and one leaf node. Finally, the
decision node splits into two leaf nodes (Accepted offer and Declined offer).
Consider the below diagram:
Advantages of the Decision Tree
It is simple to understand as it follows the same process that a human
follows while making any decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
There is less requirement of data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
A decision tree can contain many layers, which makes it complex.
It may have an overfitting issue, which can be reduced using
the Random Forest algorithm.
For more class labels, the computational complexity of the decision tree
may increase.
Why use the decision tree algorithm?
The decision tree algorithm is used in machine
learning because it's an effective way to make decisions by:
Laying out possible outcomes: Decision trees help developers analyze
the potential consequences of a decision.
Predicting outcomes: As the algorithm accesses more data, it can
predict outcomes for future data.
Producing accurate models: Decision trees can produce accurate and
interpretable models with minimal user intervention.
Handling data: Decision trees can handle both categorical and
numerical data.
Being fast: The decision tree algorithm is fast at both build time and
apply time.
Python Implementation of Decision Tree
Steps will also remain the same, which are given below:
o Data Pre-processing step
o Fitting a Decision-Tree algorithm to the Training set
o Predicting the test result
o Test accuracy of the result(Creation of Confusion matrix)
o Visualizing the test set result.
1. Data Pre-Processing Step:
Below is the code for the pre-processing step:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
#importing datasets
data_set= pd.read_csv('user_data.csv')
#Extracting Independent and dependent Variable
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values
# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25,
random_state=0)
#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
In the above code, we have loaded and pre-processed the data. The loaded
dataset is given as:
2. Fitting a Decision-Tree algorithm to the Training set
Now we will fit the model to the training set. For this, we will import the
DecisionTreeClassifier class from the sklearn.tree library. Below is the code for it:
#Fitting Decision Tree classifier to the training set
from sklearn.tree import DecisionTreeClassifier
classifier= DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(x_train, y_train)
In the above code, we have created a classifier object, to which we have passed
two main parameters:
o criterion='entropy': The criterion is used to measure the quality of a split,
which is calculated by the information gain given by entropy.
o random_state=0: Seeds the random number generator so that the results are
reproducible.
Below is the output for this:
Out[8]:
DecisionTreeClassifier(class_weight=None, criterion='entropy',
max_depth=None,
max_features=None, max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, presort=False,
random_state=0, splitter='best')
3. Predicting the test result
Now we will predict the test set result. We will create a new prediction vector
y_pred. Below is the code for it:
#Predicting the test set result
y_pred= classifier.predict(x_test)
Output:
In the below output image, the predicted output and real test output are given.
We can clearly see that there are some values in the prediction vector, which are
different from the real vector values. These are prediction errors.
4. Test accuracy of the result (Creation of Confusion matrix)
In the above output, we have seen that there were some incorrect predictions, so
if we want to know the number of correct and incorrect predictions, we need to
use the confusion matrix. Below is the code for it:
#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)
Output:
In the above output image, we can see the confusion matrix, which has 6+3= 9
incorrect predictions and 62+29=91 correct predictions. With 91 of 100 test
predictions correct, the Decision Tree classifier does slightly better on this
dataset than the Naive Bayes classifier, which made 90 correct predictions.
5. Visualizing the training set result:
Here we will visualize the training set result. To visualize the training set result
we will plot a graph for the decision tree classifier. The classifier will predict
yes or No for the users who have either Purchased or Not purchased the SUV
car as we did in Logistic Regression. Below is the code for it:
#Visualizing the training set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# Grid of points covering the range of both scaled features
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by the class the classifier predicts for it
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Decision Tree Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
The above output is completely different from that of the other classification
models. It has both vertical and horizontal lines that split the dataset according
to the age and estimated salary variables.
As we can see, the tree is trying to capture each data point, which is a case of
overfitting.
6. Visualizing the test set result:
Visualization of test set result will be similar to the visualization of the training
set except that the training set will be replaced with the test set.
#Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
# Grid of points covering the range of both scaled features
x1, x2 = nm.meshgrid(nm.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step = 0.01),
                     nm.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by the class the classifier predicts for it
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha = 0.75, cmap = ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Overlay the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color = ListedColormap(('purple', 'green'))(i), label = j)
mtp.title('Decision Tree Algorithm(Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
Random Forest Algorithm
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble learning,
which is a process of combining multiple classifiers to solve a complex problem
and to improve the performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of
decision trees on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset." Instead of relying on one
decision tree, the random forest takes the prediction from each tree and
predicts the final output based on the majority vote of those predictions.
A greater number of trees in the forest generally leads to higher accuracy and
reduces the risk of overfitting, although the improvement levels off as more
trees are added.
The below diagram explains the working of the Random Forest algorithm:
Note: To better understand the Random Forest Algorithm, you should have
knowledge of the Decision Tree Algorithm.
Assumptions for Random Forest
Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct output,
while others may not. But together, all the trees predict the correct output.
Therefore, below are two assumptions for a better Random forest classifier:
o There should be some actual values in the feature variable of the dataset
so that the classifier can predict accurate results rather than a guessed
result.
o The predictions from each tree must have very low correlations.
Why use Random Forest?
Below are some points that explain why we should use the Random Forest
algorithm:
o It often takes less training time than many other algorithms, since the trees
can be built independently of one another.
o It predicts output with high accuracy and runs efficiently even on large
datasets.
o It can also maintain accuracy when a large proportion of data is missing.
How does Random Forest algorithm work?
Random Forest works in two phases: the first is to create the random forest by
combining N decision trees, and the second is to make predictions with each tree
created in the first phase.
The Working process can be explained in the below steps and diagram:
Step-1: Select K random data points from the training set.
Step-2: Build the decision tree associated with the selected data points
(subset).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2 until N trees have been built.
Step-5: For new data points, find the predictions of each decision tree, and
assign the new data points to the category that wins the majority of votes.
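The steps above can be sketched with the standard library alone. To keep the example short, each "tree" is a one-split decision stump, and the K points are drawn with replacement (bootstrap sampling); both are simplifications assumed for this illustration. The function names and the small (age, salary in thousands) dataset are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list of votes."""
    return Counter(labels).most_common(1)[0][0]

def build_stump(rows, labels):
    """A one-split decision tree: the threshold with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:  # every row identical: fall back to a constant prediction
        label = majority(labels)
        return lambda row: label
    _, feature, threshold, left_label, right_label = best
    return lambda row: left_label if row[feature] <= threshold else right_label

def build_forest(rows, labels, n_trees, k, seed=0):
    """Steps 1 to 4: draw K random points and fit a tree on them, N times."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in range(k)]
        forest.append(build_stump([rows[i] for i in picks], [labels[i] for i in picks]))
    return forest

def predict(forest, row):
    """Step 5: every tree votes and the majority class wins."""
    return majority([tree(row) for tree in forest])

# Hypothetical data: (age, salary in thousands) -> 0 = not purchased, 1 = purchased
rows = [(20, 20), (22, 25), (25, 22), (27, 30), (30, 28),
        (45, 80), (48, 90), (50, 85), (52, 95), (55, 100)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

forest = build_forest(rows, labels, n_trees=25, k=10)
print(predict(forest, (18, 15)), predict(forest, (60, 120)))  # 0 1
```

Using an odd number of trees avoids tied votes in a two-class problem. A full random forest would also grow deeper trees and consider a random subset of features at each split.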
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. This
dataset is given to the Random Forest classifier. The dataset is divided into
subsets, and each subset is given to one decision tree. During the training
phase, each decision tree learns from its own subset. When a new data point
arrives, each tree produces a prediction, and the Random Forest classifier
takes the majority of these results as the final decision. Consider the below
image:
Applications of Random Forest
There are mainly four sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification
of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the
disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
o Random Forest is capable of performing both Classification and
Regression tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and reduces the risk of overfitting.
Disadvantages of Random Forest
o Although Random Forest can be used for both classification and regression
tasks, it is less suitable for regression tasks.
Python Implementation of Random Forest Algorithm
Now we will implement the Random Forest algorithm using Python. For this, we
will use the same dataset "user_data.csv", which we have used in previous
classification models. By using the same dataset, we can compare the Random
Forest classifier with other classification models such as the Decision Tree
classifier, KNN, SVM, Logistic Regression, etc.
Implementation Steps are given below:
o Data Pre-processing step
o Fitting the Random forest algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.
1. Data Pre-Processing Step:
Below is the code for the pre-processing step:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
# importing datasets
data_set = pd.read_csv('user_data.csv')
# Extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values
# Splitting the dataset into training and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# Feature scaling: fit on the training set only, then apply to the test set
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)
In the above code, we have loaded the dataset and pre-processed the data. The
dataset is given as:
2. Fitting the Random Forest algorithm to the training set:
Now we will fit the Random Forest algorithm to the training set. To fit it, we
will import the RandomForestClassifier class from the sklearn.ensemble library.
The code is given below:
# Fitting Random Forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion="entropy")
classifier.fit(x_train, y_train)
In the above code, the classifier object takes the below parameters:
o n_estimators= The required number of trees in the Random Forest. The
default value is 100 in scikit-learn 0.22 and later (it was 10 in earlier
versions). We can choose any number, but larger values increase the
training and prediction time.
o criterion= It is the function used to measure the quality of a split. Here
we have taken "entropy" for the information gain.
Output:
RandomForestClassifier(bootstrap=True, class_weight=None,
criterion='entropy',
max_depth=None, max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators=10,
n_jobs=None, oob_score=False, random_state=None,
verbose=0, warm_start=False)
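To make the "entropy" criterion concrete, the sketch below computes the entropy of a set of class labels and the information gain of a candidate split. It is a hand-written illustration, not the library's code: the helper names entropy and information_gain and the label lists are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = [0, 0, 0, 0, 1, 1, 1, 1]

# A split that separates the classes completely removes all uncertainty.
perfect = information_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1])
# A split that leaves both children as mixed as the parent gains nothing.
useless = information_gain(parent, [0, 0, 1, 1], [0, 0, 1, 1])

print(perfect, useless)  # 1.0 0.0
```

When growing a tree, the split with the highest information gain is the one preferred at each node.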
3. Predicting the Test Set result
Since our model is fitted to the training set, we can now predict the test
result. For prediction, we will create a new prediction vector y_pred. Below is
the code for it:
# Predicting the test set result
y_pred = classifier.predict(x_test)
Output:
The prediction vector is given as:
By comparing the above prediction vector with the real test set vector, we can
determine the incorrect predictions made by the classifier.
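The comparison can also be done in code rather than by eye. The sketch below uses two short, made-up label lists in place of the real y_test and y_pred vectors, which are not reproduced in these notes.

```python
# Hypothetical stand-ins for the real test labels and predictions
y_test = [0, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 0, 0, 1]

# Positions where the prediction disagrees with the true label
wrong = [i for i, (t, p) in enumerate(zip(y_test, y_pred)) if t != p]

print(wrong)       # [2, 4]
print(len(wrong))  # 2
```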
4. Creating the Confusion Matrix
Now we will create the confusion matrix to determine the correct and incorrect
predictions. Below is the code for it:
# Creating the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Output:
As we can see in the above matrix, there are 4+4= 8 incorrect predictions and
64+28= 92 correct predictions.
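The accuracy follows directly from these counts. In the sketch below, the matrix layout [[64, 4], [4, 28]] is an assumption reconstructed from the numbers quoted above (rows as true classes, columns as predicted classes); the original matrix image is not reproduced here.

```python
# Assumed layout, rebuilt from the counts in the text:
# rows = true class, columns = predicted class
cm = [[64, 4],
      [4, 28]]

correct = sum(cm[i][i] for i in range(len(cm)))  # diagonal entries
total = sum(sum(row) for row in cm)              # all test samples
accuracy = correct / total

print(correct, total - correct, accuracy)  # 92 8 0.92
```

So the classifier is right on 92 of the 100 test samples, an accuracy of 92%.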
5. Visualizing the training set result
Here we will visualize the training set result by plotting a graph for the
Random Forest classifier. The classifier will predict Yes or No for the users
who have either purchased or not purchased the SUV car, as we did in Logistic
Regression. Below is the code for it:
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
# Grid covering the range of both (scaled) features
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
# Colour every grid point by the class the forest predicts for it
mtp.contourf(x1, x2,
             classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Plot the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:
The above image is the visualization result for the Random Forest classifier
working with the training set. It is very similar to the result of the Decision
Tree classifier. Each data point corresponds to one user in user_data, and the
purple and green regions are the prediction regions. The purple region is
classified for the users who did not purchase the SUV car, and the green region
is for the users who purchased the SUV.
So, in the Random Forest classifier, we have taken 10 trees that have predicted
Yes or No for the Purchased variable. The classifier took the majority of the
predictions and provided the result.
6. Visualizing the test set result
Now we will visualize the test set result. Below is the code for it:
# Visualizing the test set result
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
# Grid covering the range of both (scaled) features
x1, x2 = nm.meshgrid(
    nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
    nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
# Colour every grid point by the class the forest predicts for it
mtp.contourf(x1, x2,
             classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
# Plot the actual data points, one colour per class
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                color=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Random Forest Algorithm (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output: