0% found this document useful (0 votes)
3 views18 pages

Module 3 Chapter 3

Chapter 3 introduces the basics of learning theory and machine learning, emphasizing the process of acquiring knowledge through experiences and the importance of concept learning. It outlines learning objectives, types of learning, and the design of learning systems, including the significance of hypothesis representation and the tradeoff between bias and variance. The chapter also discusses computational learning theory and the design of learning systems, highlighting the role of training experiences and target functions in developing effective machine learning models.

Uploaded by

Raghavendra gs
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views18 pages

Module 3 Chapter 3

Chapter 3 introduces the basics of learning theory and machine learning, emphasizing the process of acquiring knowledge through experiences and the importance of concept learning. It outlines learning objectives, types of learning, and the design of learning systems, including the significance of hypothesis representation and the tradeoff between bias and variance. The chapter also discusses computational learning theory and the design of learning systems, highlighting the role of training experiences and target functions in developing effective machine learning models.

Uploaded by

Raghavendra gs
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Chapter 3

iteMo

Anuttöt
Basics of
Learning Theory
"Predicting the future isn't magic, it's artificial intelligence."
- Dave Waters

Learning is a process by which one can acquire knowledge and


concepts based on the experiences. Machine learning is an construct new ideas or
general concept from training examples without writing a intelligent way of learning
machine learning algorithms through which computers can program. There are many
data or experiences, identify patterns, and make intelligently learn from past
predictions when new data is fed. This
chapter deals with an introduction of concept learning
of machine learning. of hypotheses space and modelling

Learning Objectives
Understand the basics of learning theory and the key elements of
machine learning
Introduce Concept learning to obtain an abstraction and
Learn about hypothesis representation language arnd to generalization
from the data
thesis space using Find-S algorithm and its limitations explore searching the hvpo
Study about version spaces,
ation algorithm List-Then-Eliminate algorithm and Candidate Elimin
Introduce Inductive bias, aset of prior assumptions
considered by alearning algorithm
beyond the training data in order to perform induction
Discuss the tradeoff between the two factors
called bias and variance which evist
when modelling a machine learning
algorithm
Understand different challenges in model selection and
the
model evaluation
Study popular re-sampling model selection
LOOCV, etc.) to tune machine learning modelmethod called Cross-Validation (A-told.
Introduce the various learning frameworks and learn to
evaluate hvpothesis
INTRODUCTION TOLEARNINGAND ITS TYPES
3.1
through study, experience,
The
The process of acquiring knowledge and expertise
Generally, humans learn in different ways. To make
called as learning. machines bei
learn,
ng t
we
augh,
ne d
simulate the strategiesof human learning in machines. But, will the computers learn?
and logicians,
This
questi
has been raised over manycenturies by philosophers, mathematicians
First let us address the question-What sort of tasks can the computers learn? This depends
the nature of problems that the computers can solve. There are two kinds of problems -
and ill-posed. Computers can solve only well-posed problems, as these have
Specifications and have the following components inherent to it.
-wwelelll--dpefosin,
1. Class of learning tasks (T)
2. A measure of performance (P)
3. A source of experience (E)
The standard definition of learning proposed by Tom
from Efor the task T, andPimproves with experience E. LetMitchell is that a program can lear
us formalize the concept of learning
as follows:
Let x be the input and Xbe the input space,
space, which is the set of all possible outputs, thatwhich
is,
is the set of all inputs, and Y is the
output
Let Dbe the input dataset with yes/no.
Let the unknown target examples, (x,,y,).(x, y,),*.(x, y,) for n inputs.
The objective of the learning function isbe f: X ’ Y, that maps the input space to output spae
program to pick a function, g: 2 ’ Yto
All the possible formulae form a hypothesis space. In
which the learning algorithm chooses. short, let H be theapproximate hypothesis
set of all formulae from
samples. This is shown in Figure 3.1. The choice is good when the hypothesisg replicates ffor all

Ideal target
hypothesis f)

Training
samples
Learning Generated
algorithms hypothesis g(x)
Candidate Error
formulae Difference}
Hypothesis
space H

Figure 3.1: Learning Environment


It can be observed that training samples and target function are
problem. The learning algorithm and hypothesis set are dependent on the given
independent the given problem. Thus
learning model is informally the hypothesis set and learning of
be stated as follows: algorithm. Thus, learning model can
Learning Model = Hypothesis Set + Learning Algorithm
Basics of Learning Theory 79

Let us assume a problem of predicting a label for a given input data. Let Dbe the input dataset
with bothpositive and negative examples. Let ybe the output with clas 0or 1. The simple learning
model can be given as:
2x,w, > Threshold, belongs toclass land
i=l
D

Exw <Threshold, belongs to another class


Thiscan be put into a single equation as follows:
h(x) =

where, ,,x,",x,are the components of the input vector, ,w,...w, are the weights and +1
d

and -1 represent the class. This simple model is called perception model. One can simplify this by
making w, =band fixing it as 1, then the model can further be simplified as:
H(x) = sign(w' x).
This is called perception learning algorithm. The formal learning modelsare later discussed in
Section 3.7 of this chapter.

Classical and Adaptive Machine Learning Systems


A
classical machine learning system has components such as Input, Process and Output. The input
values are taken from the environment directly. These values are processed and a hypothesis is
generated as output model. This model is then used for making predictions. The predicted values
are consumed by the environment.
In contrast to the classical systems, adaptive systems interact with the input for getting
labelled data as direct inputs are not available. This process is called reinforcement learning.
In reinforcement learning, a learning agent interacts with the environment and in return gets
feedback. Based on the feedback, the learning agent generates input samples for learning, which
are used for generating the learning model. Such learning agents are not staticand change their
behaviour according to the external signalreceived from the environment. The feedback is known
as reward and learning here is the ability of the learning agent adapting to the environment
based on the reward. These are the characteristics of an adaptive system.

Learning Types
There are different types of learning. Some of the different learning methods are as follows:
1. Learn by memorization or learn by repetition also called as rote learning is done by
memorizing without understanding the logic or concept. Although rote learning is
basically learning by repetition, in machine learning perspective, the learning ocurs by
simply comparing with the existing knowledge for the same input data and producing the
output if present.
en
2. Learn byexamples also called as learn by experience or previous knowledge acquired
at some time, is like finding an analogy, which means pertorming inductive learning
from observations that formulate a general concept. Here, the learner learns by inferring a
general rule fromn the set of observations or examples. Therefore, inductive learning is also
called as discovery learning.
80 Machine Learning
expert or a teacher, generally called as passive learmi
3. Learn by being taught by an learning called active learning where the can learner
However, there is a special kind of data instances with the desired
interactively query a teacher/expert to label unlabelled
outputs.
critical thinking, also called as deductive learning, deduces new facte
4. Learning by
conclusion from related knownfacts and informatiorn.

3. Self learning, also called as reinforcement learning, is a self-directed learning that


normally learns from mistakes punishments and rewards.
6. Learning to solve problems is atype of cognitive learning where learning happens
in the mind and is possible by devising a methodology to achieve a goal. Here, the
learner initially is not aware of the solution or the way to achieve the goal but only knows
the goal. The learning happens either directly from the initial state by following the
steps to achieve the goal or indirectly by inferring the behaviour.
7. Learning by generalizing explanations, also called as explanation-based learning (EBL,
is another learning method that exploits domain knowledge from experts to improve
the accuracy of learned concepts by supervised learning.
Acquiring general concept from specific instances of the training dataset is the main challenge
of machine learning.

Scan for the figure showing Types of Learning'

3.2 INTRODUCTION TO COMPUTATION LEARNING THEORY


There are many questions that have been
raised by mathematicians and logicians over the
taken by computers to learn. Some of the questions are as follows: tine
1. How can a learning system
predict an unseen instance?
How do the hypothesis h is close to f,
when hypothesis fitself is
3. How many samples are required? unknown?
4. Can we measure the performance
of a
5. Is the solution obtained local or global? learning system?
These questions are the basis of a field
called
(COLT). It is aspecialized field of study of
machine'Computational
short
learning. COLTLearning
Or in
used for learning systems. It deals with deals Theory
formal methods
algorithms. It provides afundamental frameworks
basis for
for study of quantifying learning
with
tasks and learning
Approximate Learning (PAC) and machine learning. It deals with Probably
the quantification of the computationalVapniofk-Chervonenki
tation capacity quantification is the focusdifficulty
of s (VC)
VC learning tasks and
dimensions. PACis
The focus Oof compu-
and the
Computational Learning Theory uses many dimension. algorithms
Computer Science, Artificial Intelligence and concepts from diverse areas Theoretical
Statistics. The core such as conceptof
concept COLT is the
of
Basics of LearningTheory 81
learning framework. One such important framework is PAC. The learning framework is discussed
in adetailed manner in Section 3.7. COLT focuses on supervised learning [Link] the complexity
of analyzing is difficult, normally, binary classification tasks are considered for analysis.

3.3 DESIGN OF A LEARNING SYSTEM


Asystem that is built around alearning algorithm is called alearning system. The design of systems
focuses on these steps:
1. Choosing a training experience
2. Choosing a target function
3. Representation of a target function
4. Function approximation

Training Experience
Let us consider designing of a chess game. In direct experience, individual board states and correct
moves of the chess game are given directly. In indirect systerm, the move sequences and results are
only given. The training experience also depends on the presence of a supervisor who can label all
valid moves for a board state. In the absence of a supervisor, the game agent plays against itself and
learns the good moves, if the training samples cover all scenarios, or in other words, distributed
enough for performance computation. If the training samples and testing samples have the same
distribution, the results would be good.

Determine the Target Function


The next step is the determination of a target function. In this step, the type of knowledge that
needs to be learnt is determined. In direct experience, a board move is selected and is determined
whether it is a good move or not against all other moves. If it is the best move, then it is chosen as:
B-> M, where, Band Mare legal moves. In indirect experience, chosen
alllegaland
moves are accepted and
executed.
a score is generated for each. The move with largest score is then

Determinethe Target Function Representation


network. The linear
Ihe representation of knowledge may be a table, collection of rules or a neural
Combination of these factors can be coined as:

V=Wo + w,x, + w,x, + w,s


Where, x,, x, and x, represent different board features and w, W, w, and w, represent weights.

Ghoosing an Approximation Algorithm for the Target Function


reduce
The focus is to choose weights and fit the given training samples effectively. The aim is to
the error given as:
E=
Trainins
Sanples
The approximation is
82 " Machine Learning
predicted hypothesis. carried
Here, bis the sample and Vb) is the
hypothesis. Leto
out as:
difference between trained
and expected
Computing the error as the
be error(b).
weights are updated
as:

Then, for every board feature x, the


error(b) Xx,
w, = W, +u x
is the constant that moderates the size of the weight update.
Here, u
Thus, the learning system has the following components:
play against itselt.
A Performance system to allow the game to
A Criticsystem to generate the samples.
A Generalizer system to generate a hypothesis based on samples.
learnt function.
An Experimenter system to generate a new system based on the currently
This is sent as input to the performance system.

3.4 INTRODUCTIONTO CONCEPT LEARNING


Concept learming is a learning strategy of acquiring abstract knowledge or inferring a genera
concept or deriving acategory from the given training samples. It is a process of abstraction and
generalization from the data.
Concept learning helps to classify an object that has a set of common, relevant features. Thus
it helps a learner compare and contrast
categories based on the similarity and association
positive and negative instances in the training data to
simplify by observing the common features from the classify an object. The learner trie
simplified model to the future samples. This task is also training samples and then apply
Each concept or category obtained by
known as learning from
experience.
or false value. For example, humans can learning is a Boolean valued function which takes atrue
relevant features and categorize all animalsidentify
based ondifferent kinds of animals based on features
commot
that distinguish one animal from
another can be called specific
as a
sets of features. The special
categoris
for object and to recognize new
instances concept. This way of
Boolean those categories is called aslearning learning
It is formally defined as inferring a of
Concept learning requires three things: valued function by concept
1. Input- Training dataset which
processing training instances.
is a set of
of a concept or category to
the model. which it training instances,
belongs. Use this past each with thenane
labeled andbuild
2. Output - Target
concept or Target experience to trat
output y. It is to
In other words, itdetermine
function f. It is a mapping
the specific inputr tv
is to
specific set of features tofind the
features or
common functiontof() tron ane.g.,the
object

3. Test- Newinstances to test identify hypot


an hesis determine the
elephant
to features
target identy For
the
learned model. from all
animals. concep
Basics of Learning Theory 83

Formally, Concept learning is defined as-"Given aset of hypotheses, the learner searches
through the hypothesis space to identify the best hypothesis that matches the target concept':.
Consider the following set of training instances shown in Table 3.1.
Table 3.1: Sample Training Instances
[Link]. Horns Tail Tusks Paws Fur Color Hooves Size Elephant
1. No Short Yes No No Black No B1g Yes
Yes Short No No No Brown Yes Mediumn No

3 No Short Yes No No Black No Medium Yes

4. No
Long No Yes Yes White No Medium No
5 No Short Yes Yes Yes Black No Big Yes

Here, in this set of training instances, the independent attributes considered are 'Horns', Tail',
Tusks',Paws', 'Fur', 'Color, Hooves' and 'Size'. The dependent attribute is 'Elephant'. The target
concept is to identify the animal to be an Elepharnt.
Let us now take this example and understand further the concept of hypothesis.
Target Concept: Predict the type of animal - For exampleElephant'.

3.4.1 Representation of a Hypothesis


A hypothesis 'h' approximates a target function f' to represent the relationship between the
independent attributes and the dependent attribute of the training instances. The hypothesis is the
predicted approximate model that best maps the inputs to outputs. Each hypothesis is represented
as a conjunction of attribute conditions in the antecedent part.
For example, (Tail =Short) a (Color = Black)...
The set of hypothesis in the search space is called as hypotheses. Hypotheses are the plural
form of [Link] 'H is used to represent the hypotheses and '*' is used to represent
a candidate hypothesis.
Each attribute condition is the constraint on the attribute which is represented as attribute- value
pair. In the antecedent of an attribute condition of a hypothesis, each attribute can take value as
either ? or '@' or can hold a single value.
"" denotes that the attribute can take anyvalue <e.g.,Color
"o" denotes that the attribute cannot take any value, i.e., it represents a null value
[e.g., Horns = p)
Single value denotes a specific single value from aceptable values of the attribute, ie, the
attribute*Tail' can take a value as 'short |e.g.. Tail =Short)
For example, a hypothesis 'h' will look like,
Horns Tail Tusks Paws Fur Color Hooves Size
h= <No Yes ? Black No Medium>
Given a test instance x, we say h(x) = 1, if the test instance xsatisfies this hypothesis h.
84 Machine Learning
training instances with 8
The training
and one dependentdataset
Concept are,
given above has 5
attribute. different hypotheses that can be
Here, the
predicted independentfor thaterib
Color Hooves Size
Hons Tail Tusks P'aws Fur
h= Black No
<No Yes 2
(or)
?
Medium
h= Black No
<No Yes Big
The task is to predict the best hypothesis for thetarget concept(an elephant). The
hypothesis can allow any value for each of the attribute. mOst gen
It is represented as:
<?, , ?, ?, ?,?,?, ?>. This hypothesis indicates that any animal can be an elephant.
The most specific hypothesis willnot allow any value for each of the attribute <o, o.o a.
9.. This hypothesis indicates that no animal can be an elephant.
The target concept mentioned in this example is to identify the conjunction of speciñc kea
from the training instances to correctly identify an elephant.

Example 3.1: Explain Concept Learning Task of an Elephant from the dataset given in Table
Given,
Input: 5 instances each with 8 attributes
Target concept/function 'c: Elephant ’ (Yes, No}
Hypotheses H: Set of hypothesis each with conjunctions of literals as propositions [ie. ea
literal is represented as an attribute-value pair]
Solution: The hypothesis 'h' for the concept learning task of an Elephant is given as:
h= <No Short Yes ? ? Black No ?>
This hypothesis h is expressed in propositional logic form as below:
(Horns = No) a (Tail= Short) a (Tusks =Yes) a (Paws=?) A(Fur=)a (Color = Black) A(Hne
No) a (Size =?)
Output: Learn the hypothesis 'h to predict an 'Elephant' such that for agiven test instance
h(x) = c(r)
This hypothesis produced is also called as concept description which is a model that can
used to classify subsequent instances.

Thus, concept learning can also be called as Inductive Learning that tries to induce aget
function from specifictraining instances. This way of learning ahypothesis that can produe
approximate target function with a sufficiently large set of training instances Can also appr
ca
mately classify other unobserved instances and is called as inductive learning hypothesis, We
targt
only determine an approximate target function because it is very difficult to find an exact
approximate targe
function with the observed training instances. That is why a hypothesis is an
function that best maps the inputs to outputs.
85 as algorithm
besthypothesis
different
example, reduced
by classification. more set
hvpotheses
negative [Link]
cost in
function decision called of =384 including
defined or set empty solution
the a
Theory function is possible x2x2) or or
a instances oneas efficientfrom a function
be learning
target larger
a
in For whereas further when the to abe
Learning x3 instance
containing
distinct space
can space
variables. the representing ordered hypothesis/solution
target
machinea
thefunction be for following
training 2 hypotheses need
can
heuristic
represents x hypothesis
of
Basics approximates the hypothesis functioncan used 2x 8,749
every that
output algorithm are x2 hypothesis we hypotheses. are
targeta describe
space,
the
all-observed 2
=
classifies hernce hypotheses
hypothesis
linear that has distinct
x dataset. 1) hypothesis given
language
theto (2 x3+ the
the best hypotheses
3.1 are and searching optimized
a
that hypothesisrepresent a
variables
as
learning x4) anyit x3
there =81,920
x5x4
training of on possible
of would with Table is, x4onelarger
However, set the
approximations space
hypotheses hypothesis
representation tree. based
that
the attribute, x3and the wherein restrict
thethat would input a generated
aby consistent
hypothesis
as
only the instances; much
from an a
hypothesis/soluti
in >x3 finds generate
in hypothesis space the given in attribute. attribute to
order
hypotheses the White the instances x3 is hypothesis important that
possible algorithnm represents space
possible maps hypothesis isthat attribute of x4 of 3
Search will
the be each x4 the x the in
one
general
Black, set (3 strategy
of possible thatrepresents can space Big for 5 be of hypothesis methods
Spaceofall all set
3.4.2
Hypothesis learning space No
Short
No No
No
Brown, Medium,all 4
the 4 of empty
x each will each
best also
a the that
function the
hypothesis No Yes, values x the is Space the
search
set of this Generally,
best representsbias. Version
of
Yes, Long,Yes, Yes, (4
covering for the therefor ordering search
hypotheses generate thefor most improving
the set From machinealgorithm language each -Horns - - Yes, - -Hooves these
the -
Tail TusksPaws- Color - ol represents
'?"
Therefore, search Heuristic
a
is the space. Fur Size [?, only Thus, is Heuristic
spaewords, determine the of [Link], instances
Considering values the
Hypothesis search
[Link] algorithm
of a subset
set
The can to to
algorithms iteratively
Every regression including
instances. one
Hypothesis hypothesis specifyingVersion we symbols
other would manner The For values. distinct
So,
more instance., specific3.4.3Heuristicmeasure.
thespace.
tree two by of by
In fit a as
tested
[Link] the
86 Machine Learning hypothesis will be
initial [Link] hypothesis is a targu
the hypothesis space or a path [Link] tested
itis guaranteedto then
function or the goal state to see if it is a increases the efficiency because tough
solution, find
a
it will be selected. This method [Link] is useful for solving heuristic problems
whid
better hypothesis but may not bethe bestThe typical example problem solved by is search
could not solved by any other method.
the travelling salesman problem. climbing methods,
search methods are hill
Several commonly used heuristicsimulated-annealing, A* algorithm, andIgenetic algorithm
Constrairt
satisfaction problems, best-first search,
3.4.4 Generalization and Specialization
this concept hierarchy, let us apply this genei
In order to understand about how we construct
generalization of the most specific hypothes
principle of generalization/specialization relation. By the hypothesis space can be searched for
and by specialization of the most general hypothesis,
but does not match any negate
an approximate hypothesis that matches all positive instances
instance.

Searching the Hypothesis Space


There are two ways of learning the hypothesis,consistent with alltraining instances from the larz
hypothesis space.
1. Specialization - General to Specific learning
2. Generalization -Specific to General learning

Scon for information on 'Additional Examples on Generalization and


Specialization'

Generalization - Specific to General


the hypothesis space for an Learning This learning methodology will search throu
approximate hypothesis by generalizing the most specific hypothesis
Example 3.2: Consider the training instances shown
General Learning.
in Table 3.1 and illustrate Speciict
Solution: We will start from all false or
restrictive specialization. Consider only thethe most specific hypothesis to the mos
hypothesis. Ignore the negative instances. positive instances and determine
the most spea
This learning isillustrated as follows: generalize
The most specific hypothesis is
h= <p
taken now, which will not
Read the first instance I1,
to
classify any instance to true.
classified by the hypothesis hl. generalize the hypothesis hsothat instance canb
11: No Short Yes
No No this positive
hl= <No Short Yes No
No Black No Big Yes (Positive
innstancel
Black No
Big
Basics of Learning Theory " 87

at When reading the second instance I2 it is a negative instance, so ignore t


ind 2: Yes
h2 = <No
Short No
Short
No
Yes No
No Brown Yes
No Black
Medium No (Negative instance)
No Big
Similarly. when reading the thirdinstance 3, itis apositive instance so generalize h2 to h3 to
accommodate it. The resulting h3 is generalized.
I3: No Short Yes No No Black No Medium Yes (Positive instance)
thms h3 = <No Short les No No Black No

Ignore 14 sine it is a negative instance.


14: No Long No Yes Yes White No Medium No(Negative instance)
ener h4=<No Short Yes No No Black No
thess
ed fr When reading the fith instance 15,h4 is further generalized to h5.
I5: No Short Yes Yes Yes Black No Big Yes (Positive instance)
gative h5 = <No Short Yes Black No

Now, after observing all the positive instances, an approximate hypothesis h5 is generated
which can now classify any subsequent positive instance to true.

larg: Specialization - General to Specific Learning This learning methodology will search through
the hypothesis space for an approximate hypothesis by specializing the most general hypothesis.
Example 3.3: lMlustrate learning by Specialization - General to Specific Learning for the data
instances shown in Table 3.1.

Solution: Start from the most general hypothesis which willmake true allpositive and negative
instances.
Initially,
h=<? ? ? ?>
roug
his more general to classify all instances to true.
hesis
I1: No Short Yes No No Black No Big Yes (Positive instance)
hl =<? 2 ?

I2: Yes Short No No No Brown Yes Medium No (Negative instance)


h2 = <No ?
<? 2 Yes ? A.
<? Black ?

<? ? No

<? ? ? Big
h2 imposes constraints so that it willnot classity a negative instance to true.
I3: No Short Yes No No Black No Mediunn Yes (Positive instance)
h3 = <No ? ?
tame

Yes .
88 Machine Learning
Black ? ?>
<? ?
No ?>
<? ?
Big>
<? ?
White No Medium No (Negative instance)
14: No No Yes Yes
Long ? ?>
h4 = <? 2 Yes 2
? ?>
<? Black
<? ? Big> ?
this negative instance.
Kemove any hypothesis inconsistent with
15: No Short Yes
Yes Yes Black No Big Yes (Positive instance)
? ?>
h5 = <? Yes
<? ? Black ?>

<? 2 ? ? ? Big>
Thus, h5 is the hypothesis space generated which will classify the positive instances to true and
negative instances to false.

3.4.5 Hypothesis Space Search by Find-S Algorithm


Find-S algorithm is guaranteed to converge to the most specific hypothesis in H that is consistent
with the positive instances in the training dataset. Obviously, it will also be consistent with the
negative instances. Thus, this algorithm considers only the positive instances and eliminates negative
instances while generating the hypothesis. It initially starts with the most specific
hypothesis.
Algorithm 3.1: Find-S
Input: Positive instances in the Training dataset
Output: Hypothesis 'h
1. Initialize 'h' to the most
specific hypothesis.
h=< ....>
2. Generalize the
initial
3. For each subsequent hypothesis the first positive instance [Since 'h' is more
for
instances: speCt
If it is a positive instance,
Check for each attribute value in
the instance with
If the attribute the
Else if the
value is the same as
the hypothesis"h'.
to '?' in 'h'. attribute value is
different hypothesisvalue, then do nothing
than the
Else if it is a negative instance, hypothesisvalue, change
Ignore it.

Example 3.4: Consider the


details of the performance of : training
dataset of 4
students
and
semester. Apply the Find-S algorithm, their instancesof shown in Table 3.2. 1t Containsthe
likelihood getting ajob offer or notintheir
final
89
Theory
Basics of Learning

Table 3.2: Training Dataset


Interest Job
CGPA nteractiveness Practical Communication Logical Offer
Knowledge Skills Thinking
Yes Yes
29 Yes Excellent Good Fast
29 Yes Good Good Fast Yes Yes

28 No Good Good Fast No No


29 Yes Good Good Slow No Yes

Solution:
Step 1: Initialize 'h' to the most specific hypothesis. There are 6 attributes, so for each attribute,
we initially fill'ã inthe initial hypothesis ht.
h=<0
Step 2: Generalize the initial hypothesis for the first positive instance. I1 is a positive instance,
so generalize the most specific hypothesis 'h to include this positive instance. Hence,
I1: 29 Yes Excellent Good Fast Yes Positive instance
h=<9 Yes Excellent Good Fast Yes>
Step 3: Scan the next instance I2, since 2 is a positive instance. Generalize 'h' to include positive
instance 12. For each of the non-matching attribute value in h' put a ? to include this positive
instance. The third attribute value ismismatching in 'h with I2, so put a '?.
I2: 29 Yes Good Good Fast Yes Positive instance
h=<9 Yes Good Fast Yes>
Now,scan [Link] it is a negative instance, ignore it. Hence, the hypothesis remains the same
without any changeafter scanning I3.
I3: 28 No Good Good Fast No Negative instance
h=<9 Yes Good Fast Yes>
Now scan 14. Since it is a positive instance, check for mismatch in the
The 5th and 6th attribute value are mismatching, so add '? to those hypothesis 'h' with 14.
attributes in 'h'.
14: 9 Yes Good Good Slow No Positive instance
h=<9 Yes 2 Good ?>
Now, the final hypothesis generated with Find-S algorithm is:
h=<9 Yes Good ?>
It includes all positive instances and obviously ignores any negative instance.

Limitations of Find-S Algorithm


1. Find-S algorithm tries to find a
hypothesis that is consistent with positive
all negative instances. As long as the training instances, ignoring
dataset is consistent, the hypothesis found by this
algorithm may be consistent.
2. The algorithm finds only one unique
hypothesis,
that are consistent with the training dataset. wherein there may be nmany other hypotheses
90 Machine Learning hencesuch
inconsistent data instances
some errors; it
dataset nnaycontain consistent hypothesis since ignores negative
3. Many tinws,the training determining the
can mislead this algorithm in
consistent with the training data
instances.
hypotheses that are
Hence, it is necessary to findIthe set
of
the limitations of Find-S algorithm, Candidat.
including the negative examples. To
overcome
hypotheses consisternt with the training
theset of all
Elimination algorithm was proposedtooutput
dataset.

3.4.6 Version Spaces


space contains the subset of hypotheses from the hypothesis space that is consisBen
The version
with alltraining instances in the training dataset.

Scan for information on 'Additional Examples on Version Spaces'

List-Then-Eliminate Algorithm
The principle idea of this learning algorithm is to initialize the version space to contain all
and then eliminate any hypothesis that is found inconsistent with any hypotheses
the algorithm starts with a version space to contain all training instances. Initially
The hypotheses that are inconsistent with the hypotheses scanning each training instane
training
outputs the list of remaining hypotheses that are all instance are eliminated. Finally, the algorithm
consistent.
Algorithm 3.2: List-Then-Eliminate
Input: Version Space - a list of all
Output: Set of consistent hypotheses hypotheses
1. Initialize the
version space with alist of
2. For each training instance, hypotheses,
remove from version space any
hypothesis that is inconsistent.
This algorithm works fine
deploy this algorithm. Hence,, aif the
algorithm. variationhypotof this
hesisideaspaceis is finite but diticult '

VersionSpaces and the introduced in pract


the ically
it is Eliminatit
Candidate
Candidate
generate all Elimination
Version space learning is to
combination of theconsitwostentcaseshypotAlhesesgorithm
the version space by the
Specific to General
Sto namely, around. This
learning,-GSpecieneraalliizzee
General to Specific learning
G to the
exclude the negative
algorithm comput
include positive example
Basics of Learning Theory " 91

Using the Candidate Elimination algorithm, we can compute the version space containing
all(and only those) hypotheses from H that are consistent with the given observed sequence of
training instances. The algorithm defines two boundariescalled 'general boundary which is a set of
all hypotheses that are the most general and 'specific boundary' which is a set of all hypotheses that
are the most specific. Thus, the algorithm limits the version space to contain only those hypotheses
that are most general and most specific. Thus, it provides a compact representation of List-thern
algorithm.

Algorithm 3.3: Candidate Elimination


Input: Set of instances in the Training dataset
Output: Hypothesis G and S
1. Initialize G, to the maximally general hypotheses.
2. Initialize S, to the maximally specific hypotheses.
Generalize the initial hypothesis for the first positive instance.
3. For each subsequent new training instance,
If the instance is positive,
Generalize S to include the positive instance,
Check the attribute value of the positive instance and S,
If the attribute value of positive instance and S are different, fill that
field value with '?'.
If the attribute value of positive instance and S are same, then do no
change.
Prune G toexclude all inconsistent hypotheses in G with the positive instance.
If the instance is negative,
Specialize G to exclude the negative instance,
Addto G all minimal specializations to exclude the negative example and
be consistent with S.
If the attribute value of S and the negative instance are different, then
fill that attribute value with S value.
If the attribute value of S and negative instance are
same, no need to
update 'G and fill that attribute value with '?.
Remove from S all inconsistent hypotheses with the negative instance.

Generating Positive Hypothesis 'S If it is a positive example, refine S to include the


instance. We need to generalize S to include the positive instance. The hypothesis is the positive
of "S' and positive instance. When generalizing, for the first conjunction
positive instance, add to S minimal
generalizations such that S is filled with attribute values of the positive instance. For the all
subsequent
Positive instances scanned, check the attribute value of the positive
instance and S obtained in the
fill that
S are different,
92 Machine Learning
positive instance and field val
values of same, no change is required.
previous iteration. Ifthe attributee instance and S are
positive
with a"?. If the attribute values of
If it is a negative instance, it skips.
G' If it is a negative instance, refine Gto exclude the
Generating Negative Hypothesis G to exclude all inconsistent hypotheses in G with the positi.
negative instance. Then, prune specializations to exclude the negative instance
ax
to G all minimal
instance. The idea is to add hypothesis indicates general hypothesis.
with the positive instance. Negative
be consistent negative instances are different, then fillthat field wit
Ifthe attribute values of positive and
negative instance as true. If the
positive instance value so that the hypothesis does not classify that
need to update 'G and fll that
attribute values of positive and negative instances are same, then no
attribute value with a ?.
Generating Version Space - [Consistent Hypothesis] We need to take the combination of
sets in 'G' and check that with '[Link] the combined set fields are matched with fields in 'S, then
only that is included in the version space as consistent hypothesis.

Example 3.4: Consider the same set of instances from the training dataset shown in Table 3.3
and generate version space as consistent hypothesis.
Solution:
Step 1: Initialize 'G' boundary to the maximally general
G=<? ? ?
hypotheses,
?>
Step 2: Initialize 'S boundary to the
each attribute, we initially fill '¢ in themaximally specific hypothesis. There are 6 attributes, so or
S= <0
hypothesis 'S.
Generalize the initial hypothesis for the
so generalize the most specific hypothesis 'S' to first positive instance. I1 is a positive instane
11: >9 Yes
Excellent Good
include this positive instance. Hence,
S1 =<9 Yes Fast Yes
G1 = <?
Excellent Good Fast
Positive instance
Yes>
?
?
Step 3: ?>
Iteration 1
Scan the next instance 12.
instance /2. For each of Since I2 is a
the positive instance,
instance. The third attribute non-mat
value isching attribute value in 'S1'.
generalize positive
'S1' to include Positivw
12:
S2 =<9
29 Yes
Yes
Good
?
miGoodsmatchingFastin 'S1' with I2, so include tni
Duta'?' to
put a '?'.
Prune G1 to
exclude all Good Fast Yes Positive instance
consistent
G2 = <?
withthis positive
?
iinstnconsiste nt hy Yes>
ance, there is nopotheses with the Since GI `
?
change. The resulpositive
?>
ting G2 instance.
is,
Basics of Learning Theory 93

Iteration 2
Now Scan 3,
No Good Good Fast No Negative instance
Since it is anegative instance, specialize G2 to exclude the negative example but stay
consistent with S. Generate hvpothesis for each of the non-matching attribute value in S2
and fill with the attribute value of S2. In those generated hypotheses, for all
matching attribute
values, put a ?. The first, second and 6th attribute values do not match, hence '3 hypotheses
are generated in G3.
There is no inconsistent hypothesis in S2 with the negative instance, hence S3
same.
remains the
G3= <9 ?
<? Yes ? ?
<? ? Yes>
S3 =<>9 Yes Good Fast Yes>
Iteration 3
Now Scan 14. Since it is a positive instance, check for mismatch in the
The 5th and 6h attribute value are mismatching, so add ? to those hypothesis 'S3 with 14.
attributes in 'S4.
14: >9 Yes Good Good Slow No Positive
instance
$4 =<9 Yes ? Good ? ?>
Prune G3 to exclude all inconsistent hypotheses with the positive
instance 14.
G3 =<9 ? ? ? ?>
<? Yes ? ? ?>
<? ? ? ? Yes> Inconsistent
Since the third hypothesis in G3 is inconsistent with this positive
one. The resulting G4 is, instance, remove the third
G4 = <9 ? ? ?>
<? Yes ? ? ?>
Using the two boundary sets, S4 and G4, the version space is
consistent hypotheses. converged to contain the set of
The final version space is,
<9 Yes ? ? ?>
<9 ? Good ? ?>
<? Yes ? Good ?
Thus, the algorithm finds the version space to contain
general and most specific. only those hypotheses that are most
The diagrammatic representation of deriving the
version space is shown in Figure 3.2.
94 " Machine Learning

Exc Good Fast Yes


S1: >9 Yes

S2: >9 Yes Good Fast Yes

S3: >9 Yes Good Fast Yes

S4: 29 Yes ? Good ?

Version space
Yes ? ? >9 ? Good ? Yes Good

G4: >9 ? ? ? ? Yes ?

G3: 29 ? ? ? ? ? Yes ? ?
? ? Yes

G2:? ? ? ?

G1:? ?

G:?
?
?
Figure 3.2: Deriving the
Version Space
3.5 INDUCTION BIASES
Induction is a process of
model. Inductive bias is learning a target
the set of prior function or
the training data, in
similar to prior knowledge used
order to
perform assumpt
for induction. iItons
general
consi
is also d izibyng atraining dataalgorithm bevo
ered
into agenerd

For example, linear learning new called as the learning


bias of [Link]
regression
related to the target variable. learning concept
assumes s. the
choose greedy best attributes as
neighbour classifier assumes the split
Similarly, C4.5 that the
predictors (independent
deciwhension tree learning variables)art
criterion assumes to alwa"
neighbourall input
instances in const r ucting thealgorithm
and Naive Bayes classifier k-neares
assumes that The
There are two
types bias: of
variables Eucl i
ared ean dista decision
nces tree. tothe
same clas
1. belong
2.
Constraint Restriction
Preference
or
-impose
ordering
-limit the
on hypothesis space independent variables, and so on.

You might also like