Importance of Bayesian Methods in ML

Uploaded by

Manali Patel
6. Bayesian Concept Learning


Subject: Machine Learning (3170724)
Faculty: Dr. Ami Tusharkant Choksi
Associate professor, Computer Engineering Department,
Navyug Vidyabhavan Trust
[Link] College of Engineering and Technology,
Surat, Gujarat State, India.
Website: [Link]

Dr. Ami T. Choksi @CKPCET Machine Learning (3170724)


Contents
- Importance of Bayesian methods,
- Bayesian theorem,
- Bayes’ theorem and concept learning,
- Bayesian Belief Network

05 hours

CO-3 Evaluate the various Supervised Learning algorithms using appropriate Dataset.



Introduction
● Bayes’ theorem was introduced by the 18th-century mathematician Thomas Bayes.
● He developed the foundational mathematical principles, known as Bayesian
methods, which describe the probability of events and, more importantly, how
probabilities should be revised when additional information becomes available.



WHY ARE BAYESIAN METHODS IMPORTANT?
● Bayesian learning algorithms, such as the naive Bayes classifier, are highly practical
approaches to certain types of learning problems because they calculate explicit
probabilities for hypotheses.
● In many cases, they are competitive with, or even outperform, other
learning algorithms, including decision tree and neural network algorithms.
● Bayesian classifiers rest on a simple idea: the training data are used to
calculate an observed probability of each class based on feature values.
● When the same classifier is later applied to unclassified data, it uses the observed
probabilities to predict the most likely class for the new features.
● Applying the observations from the training data can also be thought of
as applying our prior knowledge or prior belief to the probability of an outcome,
so that the prediction is more likely to match the actual, real-life outcome.
WHY ARE BAYESIAN METHODS IMPORTANT?
● This simple concept underlies Bayes’ rule and is what is applied when training a machine, in
machine learning terms.
● Some real-life uses of Bayesian classifiers are as follows:
○ Text-based classification, such as spam or junk mail filtering, author identification, or topic categorization
○ Medical diagnosis, such as estimating the probability that a new patient has a disease given
a set of observed symptoms
○ Network security, such as detecting illegal intrusions or anomalies in computer networks
● A strength of Bayesian classifiers is that they utilize all available parameters, whereas
many other algorithms ignore features that have weak effects.
● Bayesian classifiers assume that even if a few individual parameters each have a small
effect on the outcome, the collective effect of those parameters could be quite
large. For such learning tasks, the naive Bayes classifier is most effective.
Features of Bayesian learning methods
● Prior knowledge of the candidate hypotheses is combined with the observed data
to arrive at the final probability of a hypothesis.
● So, the two important components are:
○ the prior probability of each candidate hypothesis
○ the probability distribution over the observed data set for each possible hypothesis.
● The Bayesian approach to learning is more flexible than other approaches
because each observed training pattern can increase or decrease the estimated
probability of a hypothesis, whereas most other algorithms tend to eliminate a
hypothesis outright if it is inconsistent with even a single training pattern.



Features of Bayesian learning methods
● Bayesian methods can perform better than other methods when evaluating
hypotheses that make probabilistic predictions.
○ For example, when starting a new software project, on the basis of the demographics of the
project, we can predict the probability of encountering challenges during execution of the project.
● Bayesian methods make it straightforward to classify new
instances by combining the predictions of multiple hypotheses, weighted by
their respective probabilities.
● Even in cases where Bayesian methods are too expensive to compute in practice,
they provide a standard of optimal decision making
against which the performance of other methods can be measured.
● Bayesian methods depend on the prior probabilities of the hypotheses. If these
probabilities are not known in advance, they have to be estimated from background knowledge.



BAYES’ THEOREM
● Concept learning can be illustrated by how a child learns the meaning of a new word, e.g. ‘ball’.
● The child is provided with positive examples of objects that are balls.
● At first, the child may be confused by the many different colours, shapes and sizes
of balls, and may also confuse a ball with objects that look similar,
like a balloon or a globe.
● The child’s parent continuously feeds her positive examples like ‘that is a ball’,
‘this is a green ball’, ‘bring me that small ball’, etc.
● Negative examples such as ‘this is a non-ball’ are seldom used for such concept teaching,
but when the child points to a balloon and calls it a ball, the parent may clear up
the confusion by saying ‘that is not a ball’.



[Figure: three example objects labelled ‘Ball’]
Ball 1: Shape: round; Color: red; Size: small; Material: rubber
Ball 2: Shape: round; Color: red; Size: big; Material: rubber-like plastic
Ball 3: Shape: round; Color: red, blue, white, yellow, orange; Size: small; Material: plastic



BAYES’ THEOREM
● However, it is observed that learning is influenced more by positive examples
than by negative examples, and the expectation is that, once the concept of a ball
is clear to her, the child will be able to identify the object ‘ball’ among a wide
variety of objects and different types of balls kept together.
● We can extend this example to explain how we can expect machines to learn
from positive examples, which forms the basis for concept
learning.
● Let us relate concept learning to the Bayesian model.
● Learning the ‘meaning of a word’ can be treated as equivalent to learning a concept using binary
classification.
● Let us define a concept set C and a corresponding function f(k), with
f(k) = 1 when k is within the set C and f(k) = 0 otherwise.
BAYES’ THEOREM
● Our aim is to learn the indicator function f that defines which elements are
within the set C. Using the function f, we can classify an
element as either inside or outside our concept set.
● With Bayes’ theorem, we use standard probability calculus to
quantify the uncertainty about the function f, and we can validate the
classification by feeding positive examples.
● Bayes’ theorem states that

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

● where A and B are conditionally related events and P(A|B) denotes the
probability of event A occurring given that event B has already occurred.



BAYES’ THEOREM
● Let us assume that we have a training data set D containing some
observed data. Our task is to determine the best hypothesis in the space H by using
the knowledge of D.
● For this, we need three quantities: (1) the prior probability, (2) the posterior probability and (3) the
likelihood.
● The prior knowledge or belief about the probabilities of the various hypotheses in H
is called the Prior in the context of Bayes’ theorem.
● For example, if we have to determine whether a particular type of tumour is
malignant for a patient, our prior knowledge of how often such tumours are
malignant can be used to validate our current hypothesis; this is the prior
probability, or simply the Prior.



BAYES’ THEOREM
● Let us introduce a few notations to explain the concepts. We will assume that
○ P(h) is the initial probability of the hypothesis ‘h’ that the patient has a malignant tumour,
held before the result of the malignancy test (the so-called training data) and the correctness of the
test process are taken into account.
○ Similarly, P(T) is the prior probability that the training data will be observed or, in this case, the
probability of a positive malignancy test result.
○ P(T|h) is the probability of observing data T in a world where ‘h’ holds true, that is,
the probability of the test showing a positive result when the tumour is actually
malignant.



Posterior probability
● The probability that a particular hypothesis holds for a data set, based on the
Prior, is called the posterior probability or simply the Posterior.
● In the previous example, the probability of the hypothesis that the patient has a
malignant tumour, taking into account the Prior and the correctness of the malignancy test, is
a posterior probability. In our notation, we are interested in
finding P(h|T), the probability that the hypothesis holds true given the
observed training data T.
● So, the prior probability P(h), which represents the probability of the hypothesis
independent of the training data, is refined by the influence of the training data
into P(h|T).
● Bayes’ theorem relates the prior and posterior probabilities:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$



Bayes’ theorem
● From the above equation, we can deduce that P(h|T) increases as P(h) and
P(T|h) increase, and also as P(T) decreases.
● The simple explanation is that the more probable it is that T occurs
independently of h, the less support T provides for h.
● A common question in machine learning problems is to find the most
probable hypothesis h from a set of hypotheses H (h∈H) given the observed
training data T.
● This maximally probable hypothesis is called the maximum a posteriori (MAP)
hypothesis. By using Bayes’ theorem, we can identify the MAP hypothesis from
the posterior probability of each candidate hypothesis:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid T) = \arg\max_{h \in H} \frac{P(T \mid h)\,P(h)}{P(T)}$$



Bayes’ theorem
● and as P(T) is a constant independent of h, in this case, we can write

$$h_{MAP} = \arg\max_{h \in H} P(T \mid h)\,P(h)$$

● In certain machine learning problems, we can simplify this further. If
every hypothesis in H is equally probable a priori, i.e. P(hi) = P(hj) for all hi and hj in H
[as in flipping a fair coin, where P(head) = P(tail)],
then we can determine P(h|T) from the probability P(T|h) only. P(T|h) is called
the likelihood of data T given h, and any hypothesis that
maximizes P(T|h) is called the maximum likelihood (ML) hypothesis, hML:

$$h_{ML} = \arg\max_{h \in H} P(T \mid h)$$
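
The difference between the MAP and the ML hypothesis can be sketched in a few lines of Python. The hypothesis names, priors and likelihoods below are made-up illustrative numbers, not values from the lecture; the function names are likewise hypothetical.

```python
# Hypothetical priors P(h) and likelihoods P(T|h) for three candidate hypotheses.
priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.6}


def map_hypothesis(priors, likelihoods):
    """Return the h maximizing P(T|h) * P(h).

    P(T) is the same for every h, so it is dropped from the comparison.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])


def ml_hypothesis(likelihoods):
    """Return the h maximizing P(T|h) alone (the ML hypothesis)."""
    return max(likelihoods, key=likelihoods.get)


h_map = map_hypothesis(priors, likelihoods)
h_ml = ml_hypothesis(likelihoods)
print("MAP:", h_map, "ML:", h_ml)

# With equal priors the two criteria pick the same hypothesis.
uniform = {h: 1.0 / len(likelihoods) for h in likelihoods}
print("MAP under uniform prior:", map_hypothesis(uniform, likelihoods))
```

With these numbers the prior pulls the MAP choice away from the hypothesis with the largest likelihood, while a uniform prior makes the MAP and ML choices coincide.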



Bayes’ theorem

[Figures: conceptual and mathematical representation of Bayes’ theorem and the
relationship of Prior, Posterior and Likelihood.]



Example
● Let us take the identification of malignancy in a particular patient’s
tumour as an application of Bayes’ rule.
● We will calculate how the prior knowledge of the percentage of cancer cases in a
sample population and the probability of the test result being correct influence the
probability of a correct diagnosis.
● We have two alternative hypotheses: (1) a particular tumour is of malignant
type and (2) a particular tumour is of non-malignant type. The available prior knowledge is:
1. only 0.5% of the population has this kind of tumour in malignant form;
2. the laboratory test is imperfect: it correctly detects malignancy when it is
present in only 98% of cases, and it correctly reports that
malignancy is absent in only 97% of cases.



Example: Solution
● This means that the test missed a real malignant tumour in 2% of the cases, and
predicted malignancy that was actually a false alarm in 3% of the cases.
● Let us denote Malignant Tumour = MT,
● Positive Lab Test = PT,
● Negative Lab Test = NT
● h1 = the particular tumour is of malignant type = MT in our example
● h2 = the particular tumour is not malignant type = !MT in our example
● P(MT) = 0.005
● P(!MT) = 0.995
● P(PT|MT) = 0.98
● P(PT|!MT) = 0.03
● P(NT|!MT) = 0.97
● P(NT|MT) = 0.02


Example: Solution
● So, for the new patient, if the laboratory test report shows a positive result, let us see whether
we should declare this a malignancy case or not:

$$P(h_1 \mid PT) \propto P(PT \mid MT)\,P(MT) = 0.98 \times 0.005 = 0.0049$$
$$P(h_2 \mid PT) \propto P(PT \mid !MT)\,P(!MT) = 0.03 \times 0.995 = 0.02985$$

Normalizing by P(PT) = 0.0049 + 0.02985 = 0.03475 gives P(h1|PT) ≈ 0.14 and P(h2|PT) ≈ 0.86.

● As P(h2|PT) is higher than P(h1|PT), it is clear that the hypothesis h2 has a higher
probability of being true. So, hMAP = h2 = !MT.



Example: Solution
● This indicates that even though the posterior probability of malignancy given a positive
test is significantly higher than its prior probability of 0.005, the probability of this
patient not having malignancy is still higher, because malignancy is so rare in the population.
● Also, it should be noted that through Bayes’ theorem, we identified that the
probability of one hypothesis is higher than that of the other, and we did
not completely accept or reject either hypothesis.
● Furthermore, successful application of Bayes’ theorem depends heavily on the
availability of the prior data.
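
The tumour example can be re-computed with a short Python sketch. It uses only the stated inputs (0.5% prevalence, 98% detection when malignancy is present, 97% correct rejection when it is absent); taking the false-positive rate as the complement of the 97% figure is an assumption made here, and the function name `posterior` is hypothetical.

```python
def posterior(prior_h, p_data_given_h, p_data_given_not_h):
    """P(h | data) for a binary hypothesis, via Bayes' theorem."""
    joint_h = p_data_given_h * prior_h
    joint_not_h = p_data_given_not_h * (1.0 - prior_h)
    evidence = joint_h + joint_not_h  # P(data), by the total probability theorem
    return joint_h / evidence


p_mt = 0.005         # prior P(MT)
sensitivity = 0.98   # P(PT | MT)
specificity = 0.97   # P(NT | !MT)
# Assumption: P(PT | !MT) is the complement of P(NT | !MT).
false_positive_rate = 1.0 - specificity

p_mt_given_pt = posterior(p_mt, sensitivity, false_positive_rate)
print(f"P(MT | PT)  = {p_mt_given_pt:.3f}")
print(f"P(!MT | PT) = {1.0 - p_mt_given_pt:.3f}")
```

Under these inputs the posterior probability of malignancy after a positive test is roughly 0.14, far above the 0.005 prior but still well below one half, so the non-malignant hypothesis remains the MAP hypothesis.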



BAYES’ THEOREM AND CONCEPT LEARNING
● One simplistic view of concept learning is that if we feed the machine with
the training data, it can calculate the posterior probability of each
hypothesis and output the most probable one. This is called the brute-force
Bayesian learning algorithm.
● It is also observed that the consistency with which this algorithm provides the
right probable hypothesis is comparable to that of other algorithms.



Brute-force Bayesian algorithm
● The MAP (maximum a posteriori) hypothesis can be used to design a simple
learning algorithm called the brute-force MAP learning algorithm.
● Let us assume that the learner considers a finite hypothesis space H in which the
learner will try to learn some target concept c: X → {0,1}, where X is the instance space
corresponding to H. The sequence of training examples is {(x1, t1), (x2, t2), …, (xm,
tm)}, where xi is an instance of X and ti is the target value of xi, defined as ti =
c(xi).
● Without impacting the efficiency of the algorithm, we can assume that the sequence
of instances {x1, …, xm} is held fixed, and then the sequence of target values
becomes T = {t1, …, tm}.
● To find the hypothesis with the highest posterior probability, we use Bayes’ theorem as
discussed earlier. First, calculate the posterior probability of each hypothesis h in H:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$

● Then, output the hypothesis hMAP with the highest posterior probability:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid T)$$



[Figure: ‘Buy computer or not?’ example relating the attributes student and credit_rating to the class buys_computer]



Brute-force Bayesian algorithm
● Let us try to connect the concept learning problem with the problem of
identifying hMAP.
● To do so, we have to choose the probability distributions P(h) and P(T|h), which
encode the prior knowledge about the learning task. A few important assumptions are
made, as follows:
● 1. The training data or target sequence T is noise free, which means that it is a
direct function of X only (i.e. ti = c(xi))
● 2. The concept c lies within the hypothesis space H
● 3. Each hypothesis is equally probable and independent of each other.
● On the basis of assumption 3, we can say that each hypothesis h within the space
H has equal prior probability, and because of assumption 2, we can say that
these prior probabilities sum to 1. So, we can write

$$P(h) = \frac{1}{|H|} \quad \text{for all } h \in H$$
Brute-force Bayesian algorithm
● P(T|h) is the probability of observing the target values ti for the fixed set of
instances {x1, …, xm} in a world where h holds true and describes the concept
c correctly.
● Using assumption 1, we can say that if T is consistent with h, then the probability
of data T given the hypothesis h is 1, and it is 0 otherwise:

$$P(T \mid h) = \begin{cases} 1 & \text{if } t_i = h(x_i) \text{ for all } t_i \in T \\ 0 & \text{otherwise} \end{cases}$$

● Using Bayes’ theorem to identify the posterior probability:

$$P(h \mid T) = \frac{P(T \mid h)\,P(h)}{P(T)}$$

● For the cases when h is inconsistent with the training data T, we get

$$P(h \mid T) = \frac{0 \times P(h)}{P(T)} = 0$$


Brute-force Bayesian algorithm
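● When h is consistent with T,

$$P(h \mid T) = \frac{1 \times \frac{1}{|H|}}{P(T)} = \frac{1}{|H|\,P(T)}$$

● Now, if we define the subset of the hypothesis space H that is consistent
with T as HD, then by using the total probability equation, we get

$$P(T) = \sum_{h_i \in H} P(T \mid h_i)\,P(h_i) = \sum_{h_i \in H_D} 1 \times \frac{1}{|H|} + \sum_{h_i \notin H_D} 0 \times \frac{1}{|H|} = \frac{|H_D|}{|H|}$$

● Substituting this value of P(T), the posterior probability of a hypothesis h that
is consistent with T is

$$P(h \mid T) = \frac{1}{|H|} \times \frac{|H|}{|H_D|} = \frac{1}{|H_D|}$$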



Brute-force Bayesian algorithm
● So, with our set of assumptions about P(h) and P(T|h), we get the posterior probability
P(h|T) as

$$P(h \mid T) = \begin{cases} \frac{1}{|H_D|} & \text{if } h \text{ is consistent with } T \\ 0 & \text{otherwise} \end{cases}$$

● where |HD| is the number of hypotheses from the space H which are consistent with the target data
set T.
● The interpretation of this evaluation is that initially, each hypothesis has equal probability
and, as we introduce the training data, the posterior probability of inconsistent hypotheses
becomes zero and the total probability, which sums to 1, is distributed equally among
the consistent hypotheses in the set. So, under this condition, each consistent hypothesis is a
MAP hypothesis with posterior probability 1/|HD|.
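
A minimal sketch of brute-force MAP learning under the three assumptions above (noise-free targets, target concept inside H, uniform prior). The tiny hypothesis space over two Boolean attributes and the names `H`, `examples` and `brute_force_map` are illustrative assumptions, not part of the lecture.

```python
# Hypothetical finite hypothesis space H: each hypothesis maps an
# instance x = (x0, x1) of two Boolean attributes to 0 or 1.
H = {
    "always_0": lambda x: 0,
    "always_1": lambda x: 1,
    "x0": lambda x: x[0],
    "x1": lambda x: x[1],
    "x0_and_x1": lambda x: x[0] & x[1],
    "x0_or_x1": lambda x: x[0] | x[1],
}


def brute_force_map(hypotheses, examples):
    """Return P(h|T) for every h, assuming noise-free data and a uniform prior.

    Assumes at least one hypothesis is consistent with the data
    (assumption 2); otherwise P(T) would be zero.
    """
    prior = 1.0 / len(hypotheses)  # P(h) = 1/|H|
    # P(T|h) is 1 if h reproduces every target value, else 0.
    likelihood = {
        name: float(all(h(x) == t for x, t in examples))
        for name, h in hypotheses.items()
    }
    evidence = sum(likelihood[name] * prior for name in hypotheses)  # P(T) = |H_D|/|H|
    return {name: likelihood[name] * prior / evidence for name in hypotheses}


# Training examples (x_i, t_i); here the targets come from c(x) = x0 OR x1.
examples = [((0, 0), 0), ((1, 1), 1)]
posteriors = brute_force_map(H, examples)
for name, p in posteriors.items():
    print(f"P({name} | T) = {p:.2f}")
```

The two constant hypotheses contradict one example each and drop to zero, while the four remaining hypotheses share the probability mass equally, which is the 1/|HD| result derived above.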



Concept of consistent learners
● From the discussion, we understand the behaviour of a general class of learners
that we call consistent learners.
● Learners that commit zero error over the training data when they output
a hypothesis are called consistent learners.
● If the training data is noise free and deterministic (i.e. P(T|h) = 1 if T and h are
consistent and 0 otherwise) and
if there is a uniform prior probability distribution over H (so, P(hm) = P(hn) for all
m, n), then every consistent learner outputs a MAP hypothesis.
● An important implication of this conclusion is that Bayes’ theorem can
characterize the behaviour of learning algorithms even when the algorithm does
not explicitly manipulate probabilities.
● As it can help to identify the distributions P(h) and P(T|h) under which
an algorithm outputs the MAP hypothesis, this knowledge can be used to
characterize the assumptions under which the algorithm behaves optimally.
● The theorem can be used just as effectively for noisy training data, given
additional assumptions about the probability distribution governing the noise.
Bayes optimal classifier
● The Bayes optimal classifier is a probabilistic model that finds the most probable
prediction for a new data instance, using the training data and the space of
hypotheses.
● Why the Bayes optimal classifier: it can be shown that, of all classifiers, the
Bayes optimal classifier is the one that has the lowest probability of misclassifying
an observation, i.e. the lowest probability of error. So if we know the posterior
distribution, then using the Bayes optimal classifier is as good as it gets.



Bayes optimal classifier
● To illustrate the concept, let us assume three hypotheses h1, h2 and h3 in the
hypothesis space H. Let the posterior probabilities of these hypotheses be 0.4, 0.3
and 0.3, respectively.
● There is a new instance x, which is classified as true by h1, but false by h2 and h3.
● The most probable classification of the new instance x is obtained by
combining the predictions of all hypotheses, weighted by their corresponding
posterior probabilities.
● Denoting a possible classification of the new instance as ci from the set C, the
probability P(ci|T) that the correct classification for the new instance is ci is

$$P(c_i \mid T) = \sum_{h_j \in H} P(c_i \mid h_j)\,P(h_j \mid T)$$

● The optimal classification is the one for which P(ci|T) is maximum:

$$\arg\max_{c_i \in C} \sum_{h_j \in H} P(c_i \mid h_j)\,P(h_j \mid T)$$
Bayes optimal classifier
● So, extending the example, the set of possible outcomes for the new instance x is
C = {True, False} and
● P(h1 | T) = 0.4, P(False | h1) = 0, P(True | h1) = 1
● P(h2 | T) = 0.3, P(False | h2) = 1, P(True | h2) = 0
● P(h3 | T) = 0.3, P(False | h3) = 1, P(True | h3) = 0
● Then,

$$\sum_{h_j \in H} P(\text{True} \mid h_j)\,P(h_j \mid T) = 0.4$$
$$\sum_{h_j \in H} P(\text{False} \mid h_j)\,P(h_j \mid T) = 0.6$$
$$\arg\max_{c_i \in \{\text{True}, \text{False}\}} \sum_{h_j \in H} P(c_i \mid h_j)\,P(h_j \mid T) = \text{False}$$

This method maximizes the probability that the new instance is classified
correctly, given the available training data, hypothesis space and
prior probabilities of the hypotheses. It is therefore called the Bayes optimal
classifier.
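
The weighted vote in this three-hypothesis example can be checked with a small Python sketch. The posteriors and predictions are the ones listed above; the function name `bayes_optimal` and the dictionary layout are illustrative choices.

```python
# Posterior P(h|T) of each hypothesis and its prediction for the new instance x.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": True, "h2": False, "h3": False}


def bayes_optimal(posterior, prediction, classes=(True, False)):
    """Return (best class, P(c|T) for each class).

    Each hypothesis predicts deterministically here, so P(c|h) is 1 for the
    class it predicts and 0 for the other class.
    """
    votes = {
        c: sum(p for h, p in posterior.items() if prediction[h] == c)
        for c in classes
    }
    return max(votes, key=votes.get), votes


label, votes = bayes_optimal(posterior, prediction)
print("P(True | T)  =", votes[True])
print("P(False | T) =", votes[False])
print("Bayes optimal classification:", label)
```

Note that the result differs from the prediction of the single most probable hypothesis h1, which is what distinguishes the Bayes optimal classifier from simply picking the MAP hypothesis.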
Next Lecture



Naïve Bayes classifier



Bayes’ Theorem: Basics
■ Total probability theorem (for events A1, …, AM that partition the sample space):

$$P(B) = \sum_{i=1}^{M} P(B \mid A_i)\,P(A_i)$$

■ Bayes’ theorem:

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

■ Let X be a data sample (“evidence”) whose class label is unknown
■ Let H be the hypothesis that X belongs to class C
■ Classification is to determine P(H|X) (the posterior probability): the probability
that the hypothesis holds given the observed data sample X
■ P(H) (prior probability): the initial probability
■ E.g., X will buy computer, regardless of age, income, …
■ P(X): the probability that the sample data is observed
■ P(X|H) (likelihood): the probability of observing the sample X, given that the
hypothesis holds
■ E.g., given that X will buy computer, the probability that X is 31..40 with medium
income
Naïve Bayes classifier

■ 1. A prior probability of hypothesis h, or P(h): this is the
probability of an event or hypothesis before the
evidence is observed.
■ 2. A posterior probability of h, or P(h|D): this is the
probability of an event after the evidence is observed
within the population D.

Naïve Bayes Classifier
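■ A simplifying assumption: attributes are conditionally
independent given the class (i.e., no dependence relation between attributes):

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$$

■ This greatly reduces the computation cost: only the class
distribution needs to be counted
■ If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk
for Ak divided by |Ci, D| (# of tuples of Ci in D)
■ If Ak is continuous-valued, P(xk|Ci) is usually computed based on a
Gaussian distribution with mean μ and standard deviation σ

$$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

and P(xk|Ci) is

$$P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$$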
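
For a continuous attribute, the Gaussian class-conditional density can be evaluated as in the sketch below. The class statistics (mean 38, standard deviation 12 for an age attribute) are hypothetical numbers chosen for illustration, and `gaussian_likelihood` is an illustrative name.

```python
import math


def gaussian_likelihood(x, mu, sigma):
    """Gaussian density g(x, mu, sigma), used as P(x_k | C_i) for a continuous attribute."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical statistics of the attribute 'age' within one class.
mu_age, sigma_age = 38.0, 12.0

p_near = gaussian_likelihood(35, mu_age, sigma_age)  # value close to the class mean
p_far = gaussian_likelihood(70, mu_age, sigma_age)   # value far from the class mean
print(p_near, p_far)
```

Values close to the class mean receive a larger density than values far from it; the density then takes the place of the count ratio in the product over attributes.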

Naïve Bayes classifier

Naïve Bayes Classifier: Training Dataset

Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data to be classified: X = (age <= 30, income = medium, student = yes, credit_rating = fair)
Naïve Bayes Classifier: An Example
■ P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
■ Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
■ X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
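
The arithmetic of this example can be reproduced from the counts quoted above (9 ‘yes’ and 5 ‘no’ tuples out of 14, plus the per-attribute counts in each class). The dictionary layout and variable names in this sketch are illustrative; the underlying training table itself is not reproduced here.

```python
from math import prod

# Class counts and, for each class, the number of tuples matching X's value
# for each attribute, as quoted in the example (14 training tuples in total).
class_counts = {"yes": 9, "no": 5}
match_counts = {
    "yes": {"age<=30": 2, "income=medium": 4, "student=yes": 6, "credit_rating=fair": 6},
    "no": {"age<=30": 3, "income=medium": 2, "student=yes": 1, "credit_rating=fair": 2},
}

total = sum(class_counts.values())
scores = {}
for c, n_c in class_counts.items():
    prior = n_c / total                                            # P(Ci)
    likelihood = prod(k / n_c for k in match_counts[c].values())   # P(X|Ci)
    scores[c] = likelihood * prior                                 # P(X|Ci) * P(Ci)

predicted = max(scores, key=scores.get)
print({c: round(s, 3) for c, s in scores.items()}, "->", predicted)
```

Because P(X) is the same for both classes, comparing P(X|Ci) * P(Ci) is enough to pick the class; the scores are not normalized posteriors.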
One more numerical example

Find the fruit for X = {Yellow, Sweet, Long}

Fruit    Yellow  Sweet  Long  Total
Mango       350    450     0    650
Banana      400    300   350    400
Others       50    100    50    150
Total       800    850   400   1200

One more numerical example

Find the fruit for X = {Yellow, Sweet, Long}

P(A|B) = P(B|A).P(A)/P(B)
(i) Mango
P(X|Mango) = P(Y|M)*P(S|M)*P(L|M)
P(Y|M) = P(M|Y).P(Y)/P(M) = 350/650 = 0.54
P(S|M) = 450/650 = 0.69
P(L|M) = 0/650 = 0
P(X|Mango) = P(Y|M)*P(S|M)*P(L|M) = 0

One more numerical example

Find the fruit for X = {Yellow, Sweet, Long}

P(A|B) = P(B|A).P(A)/P(B)
(ii) Banana
P(X|Banana) = P(Y|B)*P(S|B)*P(L|B)
P(Y|B) = 400/400 = 1
P(S|B) = 300/400 = 0.75
P(L|B) = 350/400 = 0.875
P(X|B) = 1 * 0.75 * 0.875 = 0.656

One more numerical example
Find the fruit for X = {Yellow, Sweet, Long}
P(A|B) = P(B|A).P(A)/P(B)
(iii) Others
P(Y|O) = 50/150 = 0.33
P(S|O) = 100/150 = 0.67
P(L|O) = 50/150 = 0.33
P(X|O) = (50/150)*(100/150)*(50/150) = 0.074

P(X|Mango) = 0, P(X|B) = 0.656, P(X|O) = 0.074

The maximum probability is P(X|B). So, the fruit is Banana.
Weighting each likelihood by its class prior (P(M) = 650/1200, P(B) = 400/1200,
P(O) = 150/1200) leaves Banana as the most probable class.
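
The fruit example can be recomputed directly from the count table. The sketch below reports both the likelihood P(X|fruit) and the likelihood weighted by the class prior P(fruit) = Total/1200; the prior weighting and the names `counts` and `score` are additions made here for illustration.

```python
# Feature counts per fruit, copied from the table (Yellow, Sweet, Long, Total).
counts = {
    "Mango": {"Yellow": 350, "Sweet": 450, "Long": 0, "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50, "Sweet": 100, "Long": 50, "Total": 150},
}
features = ["Yellow", "Sweet", "Long"]
n_all = sum(row["Total"] for row in counts.values())  # 1200 fruits in total


def score(fruit):
    """Return (P(X|fruit), P(X|fruit) * P(fruit)) under the naive independence assumption."""
    row = counts[fruit]
    likelihood = 1.0
    for feature in features:
        likelihood *= row[feature] / row["Total"]
    return likelihood, likelihood * row["Total"] / n_all


results = {fruit: score(fruit) for fruit in counts}
best = max(results, key=lambda fruit: results[fruit][1])
for fruit, (lik, joint) in results.items():
    print(f"{fruit}: P(X|fruit) = {lik:.3f}, P(X|fruit)*P(fruit) = {joint:.4f}")
print("Predicted fruit:", best)
```

The zero count for long mangoes forces P(X|Mango) to 0 regardless of the other attributes, which is the usual motivation for smoothing the counts in practice.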

Naïve Bayes classifier



Applications of Naïve Bayes classifier



References
1. Why bayes optimal classifier, [Link]
optimal-classifier/
