Naive Bayes Classifier Overview and Applications

The document discusses the Naive Bayes Classifier, a statistical classifier based on Bayes' Theorem that predicts class membership probabilities. It highlights the algorithm's assumptions of class conditional independence and equal feature importance, along with its advantages in speed and accuracy for large datasets. Additionally, it addresses the Laplacian correction to avoid zero probabilities and mentions the limitations of the classifier due to its independence assumptions.

Uploaded by

2023.gargi.dhuri
Copyright
© All Rights Reserved

Module 3

Classification
Naive Bayes Classifier

Bayesian Classification: Why?
■ Bayesian classifiers are statistical classifiers.
■ They can predict class membership probabilities, such as the probability
that a given tuple belongs to a particular class.

■ Bayesian classification is based on Bayes’ Theorem.

■ Performance:
■ Comparable in performance to decision tree classifiers.
■ Exhibits high accuracy and speed when applied to large databases.

■ Assumptions:
1. Naïve Bayesian classifiers assume that the effect of an attribute value on a
given class is independent of the values of the other attributes. This
assumption is called class conditional independence.
2. Features are equally important: all features are assumed to contribute
equally to the prediction of the class label.
Bayes’ Theorem

■ Bayes’ Theorem gives the probability of an event occurring given that another
event has already occurred. It is stated mathematically as the following equation:

P(A|B) = P(B|A) * P(A) / P(B)

where A and B are events and P(B) ≠ 0

● We are trying to find the probability of event A, given that event B is true.
Event B is also termed the evidence.
● P(A) is the prior probability of A, i.e. the probability of the event before the
evidence is seen. The evidence is an attribute value of an unknown instance
(here, it is event B).
● P(B) is the marginal probability: the probability of the evidence.
● P(A|B) is the posterior probability of A, i.e. the probability of the event after
the evidence is seen.
● P(B|A) is the likelihood, i.e. the probability of observing the evidence B given
that the hypothesis A is true.
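As a minimal sketch of the theorem above, the function below combines a likelihood, a prior, and an evidence probability into a posterior. The function name and the numbers passed to it are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def posterior(likelihood: float, prior: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence == 0:
        # The theorem is only defined for P(B) != 0.
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# Hypothetical inputs: P(B|A) = 0.8, P(A) = 0.1, P(B) = 0.2
p_a_given_b = posterior(likelihood=0.8, prior=0.1, evidence=0.2)
print(f"P(A|B) = {p_a_given_b:.2f}")
```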
Naïve Bayesian Classification Algorithm
● The Naïve Bayes algorithm is a supervised learning algorithm that is
based on Bayes’ theorem and used for solving classification problems.

● The Naïve Bayes classifier is one of the simplest and most effective
classification algorithms; it helps build fast machine learning models
that can make quick predictions.

● It is a probabilistic classifier, which means it predicts on the basis
of the probability of an object belonging to each class.

● Popular applications of the Naïve Bayes algorithm include spam filtering,
sentiment analysis, and article classification.
Naïve Bayesian Classification

Why is it called Naïve Bayes?

The name Naïve Bayes combines two words, Naïve and Bayes,
which can be described as:
● Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features.

● Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Naïve Bayesian Classification
■ For the given dataset, we can apply Bayes’ theorem as:

P(y|X) = P(X|y) * P(y) / P(X)

where y is the class variable and X is a feature vector of size n:

X = (x1, x2, ..., xn)

■ Because the denominator P(X) remains constant for a given input, we can
remove that term:

P(y|X) ∝ P(X|y) * P(y)

■ Under class conditional independence, P(X|y) = P(x1|y) * P(x2|y) * ... * P(xn|y).

■ To create a Naive Bayes classifier model, we compute this quantity for the
given input for all possible values of the class variable y and pick
the output with the maximum probability,
i.e. we need to maximize P(X|yi) x P(yi) over all classes yi.
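The decision rule above can be sketched as follows. This is an illustrative sketch, not a reference implementation: the function `predict`, the layout of the `priors` and `likelihoods` tables, and the toy class names and probabilities are all assumptions made for the example.

```python
from math import prod


def predict(priors, likelihoods, x):
    """Score each class y by P(y) * product of P(x_i | y) and return the best.

    priors:      {class: P(class)}
    likelihoods: {class: [ {value: P(x_i = value | class)} for each feature i ]}
    x:           tuple of observed feature values
    """
    scores = {
        y: priors[y] * prod(likelihoods[y][i][value] for i, value in enumerate(x))
        for y in priors
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-class, two-feature tables (not taken from the slides).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{"yes": 0.7, "no": 0.3}, {"yes": 0.2, "no": 0.8}],
    "ham": [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}

label, scores = predict(priors, likelihoods, ("yes", "no"))
print(label, scores)
```

The scores are unnormalized because P(X) was dropped; only their ranking matters for the prediction.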
Naïve Bayesian Classification Example
Problem: If the weather is sunny, should the player play or not?

Frequency table for weather conditions:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.36
P(Yes) = 10/14 = 0.71

So P(Yes|Sunny) = (3/10)*(10/14)/(5/14) = 3/5 = 0.60

P(No|Sunny) = P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 5/14 = 0.36

So P(No|Sunny) = (2/4)*(4/14)/(5/14) = 2/5 = 0.40

As the calculation shows, P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
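The same calculation can be reproduced directly from the frequency table. The sketch below uses the per-weather counts from the table and exact fractions, which avoids rounding in the intermediate steps; the helper name `posterior_given_weather` is an assumption made for illustration.

```python
from fractions import Fraction

# Per-weather counts of Yes/No outcomes, copied from the frequency table.
counts = {
    "Overcast": {"Yes": 5, "No": 0},
    "Rainy": {"Yes": 2, "No": 2},
    "Sunny": {"Yes": 3, "No": 2},
}


def posterior_given_weather(weather, label):
    """Return P(label | weather) via Bayes' theorem on the table counts."""
    total = sum(sum(row.values()) for row in counts.values())
    class_total = sum(row[label] for row in counts.values())
    weather_total = sum(counts[weather].values())

    likelihood = Fraction(counts[weather][label], class_total)  # P(weather | label)
    prior = Fraction(class_total, total)                        # P(label)
    evidence = Fraction(weather_total, total)                   # P(weather)
    return likelihood * prior / evidence


p_yes = posterior_given_weather("Sunny", "Yes")
p_no = posterior_given_weather("Sunny", "No")
print(f"P(Yes|Sunny) = {float(p_yes):.2f}, P(No|Sunny) = {float(p_no):.2f}")
```

Because the two classes are exhaustive, the two posteriors sum to 1, which is a quick sanity check on the arithmetic.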


Example: Predicting a class label using Naïve
Bayesian classification

• The data tuples are described by the attributes age, income, student, and credit rating.
• The class label attribute, buys computer, has two distinct values, namely {yes, no}.
• Let C1 correspond to the class buys computer = yes and C2 correspond to buys
computer = no.

• The tuple we wish to classify is denoted X.

Example: Predicting a class label using Naïve
Bayesian classification

• We need to maximize P(X|Ci) x P(Ci), for i = 1, 2.

• Let's compute P(Ci), the prior probability of each class, based on the training tuples:
P(C1) = P(buys computer = yes) = 9/14 = 0.643
P(C2) = P(buys computer = no) = 5/14 = 0.357
Example: Predicting a class label using Naïve
Bayesian classification

• We need to maximize P(X|Ci) P(Ci), for i = 1, 2.

• Compute P(Ci), the prior probability of each class, based on the training tuples:
P(C1) = P(Class=P) = 9/14
P(C2) = P(Class=N) = 5/14
• To compute P(X|Ci), for i = 1, 2, compute the conditional probabilities:
P(Outlook=rain | class=P) =
P(Outlook=rain | class=N) =
P(Temp=Hot | class=P) =
P(Temp=Hot | class=N) =
P(Humidity=high | class=P) =
P(Humidity=high | class=N) =
P(Windy=false | class=P) =
P(Windy=false | class=N) =

• Use these probabilities to find P(X|Class=P) and P(X|Class=N).

• To find the class Ci that maximizes P(X|Ci)P(Ci),
compute P(X|Class=P).P(Class=P) and P(X|Class=N).P(Class=N).
Example: Predicting a class label using Naïve
Bayesian classification

• We need to maximize P(X|Ci) P(Ci), for i = 1, 2.

• Compute P(Ci), the prior probability of each class, based on the training tuples:
P(C1) = P(Class=P) = 9/14
P(C2) = P(Class=N) = 5/14
• To compute P(X|Ci), for i = 1, 2, compute the conditional probabilities:
P(Outlook=rain | class=P) = 3/9
P(Outlook=rain | class=N) = 2/5
P(Temp=Hot | class=P) = 2/9
P(Temp=Hot | class=N) = 2/5
P(Humidity=high | class=P) = 3/9
P(Humidity=high | class=N) = 4/5
P(Windy=false | class=P) = 6/9
P(Windy=false | class=N) = 2/5

• To find the class Ci that maximizes P(X|Ci)P(Ci),
compute P(X|Class=P).P(Class=P) and P(X|Class=N).P(Class=N):
P(X|Class=P).P(Class=P) = 0.010582 and
P(X|Class=N).P(Class=N) = 0.018286
Hence X is classified as Class = N.
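The two products above can be checked with a short script. This is a sketch under two stated assumptions: the conditional probabilities are entered in the order Outlook=rain, Temp=Hot, Humidity=high, Windy=false, and P(Class=N) is taken as 5/14 (five of the fourteen training tuples). The variable names are illustrative only.

```python
from fractions import Fraction as F
from math import prod

# Class priors over the 14 training tuples.
priors = {"P": F(9, 14), "N": F(5, 14)}

# Conditional probabilities per class, in the order:
# Outlook=rain, Temp=Hot, Humidity=high, Windy=false
cond = {
    "P": [F(3, 9), F(2, 9), F(3, 9), F(6, 9)],
    "N": [F(2, 5), F(2, 5), F(4, 5), F(2, 5)],
}

# P(X|Ci) * P(Ci) for each class, using the naive independence assumption.
scores = {c: priors[c] * prod(cond[c]) for c in priors}
predicted = max(scores, key=scores.get)

for c, s in scores.items():
    print(f"P(X|Class={c}).P(Class={c}) = {float(s):.6f}")
print("Predicted class:", predicted)
```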
Applications of Naive Bayes Algorithm

● It is used for credit scoring.
● It is used in medical data classification.
● It can be used in real-time predictions because the Naïve Bayes classifier is
an eager learner.
● It is used in text classification, such as spam filtering and sentiment
analysis.
Advantages of Naive Bayes Algorithm

● Naïve Bayes is one of the fastest and easiest ML algorithms for predicting
the class of a data tuple.
● It can be used for binary as well as multi-class classification.
● It often performs well in multi-class prediction compared with other
algorithms.
● It is a popular choice for text classification problems, where
the data is high-dimensional.
Avoiding the 0-Probability Problem
■ Naïve Bayesian prediction requires each conditional probability to be non-zero.
Otherwise, the predicted probability will be zero.

■ We can assume that our training dataset, D, is so large that adding one to each
count that we need would only make a negligible difference in the estimated
probability value, yet would conveniently avoid the case of probability values of
zero. This technique for probability estimation is known as the Laplacian
correction or Laplace estimator.

Ex. Suppose a dataset with 1000 tuples has income = low (0 tuples), income = medium
(990 tuples), and income = high (10 tuples).
■ Use the Laplacian correction (or Laplace estimator)
■ Add 1 to each count:
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
■ The “corrected” probability estimates are close to their “uncorrected”
counterparts, yet the zero probability value is avoided.
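The income example can be reproduced with a few lines of code. This is a minimal sketch: the function name `laplace_estimates` and its `k` parameter (the amount added to each count, 1 in the example above) are assumptions made for illustration.

```python
from fractions import Fraction


def laplace_estimates(counts, k=1):
    """Add k to every count and renormalize, so no estimate is zero."""
    total = sum(counts.values()) + k * len(counts)
    return {value: Fraction(count + k, total) for value, count in counts.items()}


# Counts from the example: 1000 tuples split across three income values.
income = {"low": 0, "medium": 990, "high": 10}

corrected = laplace_estimates(income)
for value, p in corrected.items():
    uncorrected = income[value] / sum(income.values())
    print(f"income = {value}: uncorrected {uncorrected:.4f}, corrected {float(p):.4f}")
```

The denominator grows by one for each distinct value (1000 + 3 = 1003), which keeps the corrected estimates summing to 1.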
Naïve Bayesian Classifier: Comments
■ Disadvantages
■ Assumption: class conditional independence, which can cause a
loss of accuracy
■ In practice, dependencies exist among variables
■ E.g., hospital patients: Profile: age, family history, etc.;
Symptoms: fever, cough, etc.; Disease: lung cancer, diabetes, etc.
■ Dependencies among these cannot be modeled by a Naïve
Bayesian classifier

■ How to deal with these dependencies?
■ Bayesian belief networks, which allow a subset of the
variables to be conditionally independent
