0% found this document useful (0 votes)
12 views53 pages

Basic Learning Theory Overview

The document discusses basic learning theory, focusing on the processes of learning, types of problems, and the design of learning systems. It covers concepts such as hypothesis representation, generalization, specialization, and various algorithms like Find-S and Candidate Elimination. Additionally, it outlines the machine learning process, including model training, evaluation, and re-sampling methods like cross-validation.

Uploaded by

shahidhhkhan58
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views53 pages

Basic Learning Theory Overview

The document discusses basic learning theory, focusing on the processes of learning, types of problems, and the design of learning systems. It covers concepts such as hypothesis representation, generalization, specialization, and various algorithms like Find-S and Candidate Elimination. Additionally, it outlines the machine learning process, including model training, evaluation, and re-sampling methods like cross-validation.

Uploaded by

shahidhhkhan58
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

.

IN
S. J. P. N. TRUST’S
HIRASUGAR INSTITUTE OF TECHNOLOGY, NIDASOSHI

C
Accredited at 'A+' Grade by NAAC
Programmes Accredited by NBA: CSE & ECE

N
Department of Computer Science & Engineering

SY
U
VT

Module 2:
Chapter 3: Basic Learning Theory
Dr. S. V. Manjaragi
Asso. Prof. , Dept. of Computer Science & Engg.,
Hirasugar Institute of Technology, Nidasoshi
6/2/2025
.IN
Learning

C
• Learning is a process by which one can acquire

N
knowledge and construct new ideas or concepts
based on the experiences.

SY
• The standard definition of learning proposed by tom
mitchell is that a program can learn from E for the task
U
T, and P improves with experience E.
VT
• There are two kinds of problems – well-posed and ill-
posed.
• Computers can solve only well-posed problems, as
these have well-defined specifications and have the
following components inherent to it.
• Class of learning tasks (T)
• A measure of performance (P)
• A source of experience (E)
.IN
Learning Environment

C
N
SY
U
VT
.IN
Learning Types

C
N
SY
U
VT
.IN
Introduction to Computation Learning

C
Theory

N
SY
U
VT
.IN
Design of a Learning System

C
• A system that is built around a learning

N
algorithm is called a learning system.

SY
• The design of systems focuses on these steps:
U
– Choosing a training experience
VT
– Choosing a target function
– Representation of a target function
– Function approximation
Introduction to Concept Learning

.IN
• Concept learning is a learning strategy of acquiring

C
abstract knowledge or inferring a general concept or

N
deriving a category from the given training samples.

SY
• It is a process of abstraction and generalization from the
data.
U
• Concept learning helps to classify an object that has a set
VT
of common, relevant features.
• Concept learning requires three things:
– Input - Training dataset which is a set of training instances, each
labeled with the name of a concept or category to which it
belongs. Use this past experience to train and build the model.
– Output - Target concept or Target function f. It is a mapping
function f(x) from input x to output y. It is to determine the
specific features or common features to identify an object.
– In other words, it is to find the hypothesis to determine the
target concept. For e.g., the specific set of features to identify
an elephant from all animals.
– Test - New instances to test the learned model.
Introduction to Concept Learning

.IN
• Concept learning is defined as "Given a set of

C
hypotheses, the learner searches through the

N
hypothesis space to identify the best hypothesis that

SY
matches the target concept".

U
VT

• Here, in this set of training instances, the independent


attributes considered are 'Horns', 'Tail', 'Tusks', 'Paws',
'Fur', 'Color', 'Hooves' and 'Size'. The dependent attribute
is 'Elephant'.
• The target concept is to identify the animal to be an
Elephant.
.IN
Representation of a Hypothesis

C
• A hypothesis 'h' approximates a target function 'f' to

N
represent the relationship between the independent

SY
attributes and the dependent attribute of the training
instances.
U
• The hypothesis is the predicted approximate model that
VT
best maps the inputs to outputs.
• Each hypothesis is represented as a conjunction of
attribute conditions in the antecedent part.

• The set of hypothesis in the search space is called as


hypotheses.
• Generally 'H' is used to represent the hypotheses and 'h'
is used to represent a candidate hypothesis.
.IN
Representation of a Hypothesis

C
• In the antecedent of an attribute condition of a hypothesis,

N
each attribute can take value as either '?' or 'Ø' or can hold a

SY
single value.
• "?" denotes that the attribute can take any value [e.g., Color =
?] U
VT
• "Ø" denotes that the attribute cannot take any value, i.e., it
represents a null value [e.g., Horns = Ø ]
• Single value denotes a specific single value from acceptable
values of the attribute, i.e., the attribute 'Tail' can take a value
as 'short' [e.g., Tail = Short]
VT
U
SY
N
C
.IN
VT
U
SY
N
C
.IN
.IN
Hypothesis Space
• Hypothesis space is the set of all possible hypotheses that

C
approximates the target function f.

N
• The subset of hypothesis space that is consistent with all-

SY
observed training Instances is called as version space.
• From this set of hypotheses in the hypothesis space, a
U
machine learning algorithm would determine the best
VT
possible hypothesis that would best describe the target
function or best fit the outputs.
• Every machine learning algorithm would represent the
hypothesis space in a different manner about the function
that maps the input variables to output variables.
• For example, a regression algorithm represents the
hypothesis space as a linear function whereas a decision tree
algorithm represents the hypothesis space as a tree.
.IN
Hypothesis Space

C
N
SY
U
VT
.IN
Heuristic Space Search

C
• Heuristic search is a search strategy that finds an

N
optimized hypothesis/solution to a problem by

SY
iteratively improving the hypothesis/solution
• Heuristic search methods will generate a possible
U
hypothesis that can be a solution
VT
• This hypothesis will be tested with the target function or
the goal state to see if it is a real solution. If the tested
hypothesis is a real solution, them it will be selected.
• commonly used heuristic search methods are hill
climbing methods, constraint satisfaction problems,
best-first search, simulated-annealing, A* algorithm,
and genetic algorithms.
.IN
Generalization and Specialization

C
• By generalization of the most specific hypothesis and

N
by specialization of the most general hypothesis, the

SY
hypothesis space can be searched for an
U
approximate hypothesis that matches all positive
VT
instances but does not match any negative instance.
• Searching the Hypothesis Space
1. Specialization — General to Specific learning
2. Generalization — Specific to General learning
.IN
Generalization – Specific to general learning

C
• This learning methodology will search through the

N
hypothesis space for an approximate hypothesis by

SY
generalizing the most specific hypothesis.

U
VT
VT
U
SY
N
C
.IN
.IN
Specialization – general to specific learning

C
N
• This learning methodology will search through the

SY
hypothesis space for an approximate hypothesis by
U
specializing the most general hypothesis.
VT
Dataset Example

.IN
• Let’s say we’re trying to learn the concept of a "Fruit

C
that is safe to eat" based on some features.

N
• Attributes:

SY
• Color: Red, Green, Yellow
U
• Size: Small, Medium, Large
VT
• Shape: Round, Oval
Training Examples:

Example Color Size Shape Safe to Eat?


1 Red Small Round Yes
2 Green Small Round Yes
3 Red Large Oval No
.IN
• Start with: H0 = (?, ?, ?) This accepts everything.
• After Example 1 (Red, Small, Round, Yes): We specialize H to

C
fit this positive example:

N
• H1 = (Red, Small, Round)

SY
• After Example 2 (Green, Small, Round, Yes): To generalize H1
to also include this example:
U
• H2 = (?, Small, Round)
VT
• After Example 3 (Red, Large, Oval, No):
• We specialize H2 to exclude this negative example.
• H2 = (?, Small, Round) → includes Example 3 because it
doesn't constrain Size or Shape enough.
• But Example 3 has Large size and Oval shape.
• So we need to specialize H2 to exclude this combination.
Possible specializations of H2:
.IN
C
N
SY
(Color, Size, Shape):
1. (?, Small, Round) ← current (too general)
U
VT
→ Try specializing to:
2. (?, Small, ?) → still includes negative
3. (?, ?, Round) → still includes negative
→ Best: (?, Small, Round) already excludes
(Large, Oval)
.IN
Hypothesis Space Search by Find-S Algorithm

C
• Find-S algorithm is guaranteed to converge to

N
the most specific hypothesis in H that is

SY
consistent with the positive instances in the
U
training dataset.
VT
• Thus, this algorithm considers only the positive
instances and eliminates negative instances
while generating the hypothesis.
• It initially starts with the most specific
hypothesis.
VT
U
SY
N
C
.IN
.IN
• Consider the training dataset of 4 instances

C
N
shown in Table 3.2.

SY
• It contains the details of the performance of
students and their likelihood of getting a job
U
VT
offer or not in their final semester.
• Apply the Find-S algorithm
VT
U
SY
N
C
.IN
.IN
Limitations of Find-S Algorithm
• Find-S algorithm tries to find a hypothesis that is

C
consistent with positive instances, ignoring all

N
negative instances. As long as the training

SY
dataset is consistent, the hypothesis found by
this algorithm may be consistent.
U
• The algorithm finds only one unique hypothesis,
VT
wherein there may be many other hypotheses
that are consistent with the training dataset.
• Many times, the training dataset may contain
some errors; hence such inconsistent data
instances can mislead this algorithm in
determining the consistent hypothesis since it
ignores negative instances.
.IN
Version Spaces

C
N
• The version space contains the subset of

SY
hypotheses from the hypothesis space that is
U
consistent with all training instances in the
VT
training dataset.
.IN
List-Then-Eliminate Algorithm

C
• The principle idea of this learning algorithm is to initialize the

N
version space to contain all hypotheses and then eliminate

SY
any hypothesis that is found inconsistent with any training
instances.
U
VT
Version Spaces and the Candidate

.IN
Elimination Algorithm

C
• Version space learning is to generate all consistent hypotheses

N
around.

SY
• This algorithm computes the version space by the combination of
the two cases namely,
– Specific to General learning — Generalize S to include the positive
example
U
VT
– General to Specific learning — Specialize G to exclude the negative
example
• The algorithm defines two boundaries called 'general
boundary' which is a set of all hypotheses that are the most
general
• and 'specific boundary' which is a set of all hypotheses that
are the most specific.
• Thus, the algorithm limits the version space to contain only
those hypotheses that are most general and most specific.
Candidate Elimination Algorithm

.IN
C
N
SY
U
VT
VT
U
SY
N
C
.IN
VT
U
SY
N
C
.IN
VT
U
SY
N
C
.IN
VT
U
SY
N
C
.IN
VT
U
SY
N
C
.IN
.IN
Modelling in Machine Learning

C
N
• A machine learning model is an abstraction of the training dataset

SY
that can perform a prediction on new data.
• Training the model means feeding instances to the machine
U
learning algorithm.
VT
• Training datasets are used to fit and tune the model.
• After training a machine learning algorithm with the training data, a
predictive model is generated as output to which a new data is fed
to make predictions.
• The process of modelling means training a machine learning
algorithm with the training dataset, tuning it to increase
performance, validating it and making predictions for a new
unseen data.
• The major concern in machine learning is what model to select,
how to train the model, time required to train, the dataset to be
used, what performance to expect, and so on.
.IN
Machine Learning Process

C
N
• The four basic steps in the machine learning

SY
process are:
U
1. Choose a machine learning algorithm to suit the
VT
training data and the problem domain
2. Input the training dataset and train the machine
learning algorithm to learn from the data and
capture the patterns in the data
3. Tune the parameters of the model to improve
the accuracy of learning of the algorithm
4. Evaluate the learned model once the model is
built
.IN
Re-sampling Methods

C
N
• Re-sampling is a technique to select a model by

SY
reconstructing the training dataset and test
dataset by randomly choosing instances by
U
some method from the given dataset.
VT
• This method involves selecting different instances
repeatedly from a training dataset to tune a
model.
• It is done to improve the accuracy of a model.
• The common re-sampling model selection
methods are Random tram/test splits, Cross-
Validation (K-fold, LOOCV, etc.) and Bootstrap.
.IN
Cross-Validation

C
N
• Cross-Validation is a method by which we can tune the

SY
model with only training dataset.
U
• It is a model evaluation approach by which we can set
VT
aside some data of the training dataset for validation
and fit the rest of the data to train the model.
• The best model is found by estimating the average of
errors on different test data.
• The popular cross-validation family of methods
includes Holdout method, K-fold cross-validation,
Stratified cross-validation and Leave-One-Out Cross-
Validation (LOOCV).
.IN
Holdout Method

C
N
• This is the simplest method of cross-validation.

SY
• The dataset is split into two subsets called training dataset
and test dataset. The model is trained using the training
U
dataset and then evaluated using the test dataset.
VT
• This holdout method can be applied for a single time which
is called as single holdout method
• or it can be repeated for more than once which is called as
repeated holdout method.
• The average performance on the test dataset is estimated
to evaluate the model.
• Even though this model is very simple, it can exhibit high
variance and the performance largely depends on how the
dataset is split.
.IN
K-fold Cross-Validation

C
N
• It will split the training dataset into k equal

SY
folds/parts creating k — 1 subsets of training set and
U
one test subset.
VT
• Out of the k folds, k — 1 folds are used for training and
one fold is used for testing the model.
• This has to be performed for k iterations and during
each iteration a different fold is selected for testing.
• The average performance of the model on k iterations
is the final estimate of the model performance.
.IN
K-fold Cross-Validation

C
N
SY
U
VT
.IN
K-fold Cross-Validation

C
N
SY
U
VT
.IN
K-fold Cross-Validation

C
N
SY
U
VT
.IN
Stratified K-fold Cross-Validation

C
N
• This method is similar to k-fold cross-

SY
validation but with a slight difference.
U
• Here, it is ensured that while splitting the
VT
dataset into k folds, each fold should contain
the same proportion of instances with a
given categorical value. This is called
stratified cross-validation.
Leave-One-Out Cross-Validation

.IN
(LOOCV)

C
• This method repeatedly splits the n data

N
SY
instances of the dataset into training dataset
containing n — 1 data instances and leaving
U
one data instance for evaluating the model.
VT
• This process is repeated n times and average
test error is then estimated for the model.
• Even though this model is expensive and time
consuming because it has to run for n times
(i.e., n data instances in the dataset), it has
less bias.
Leave-One-Out Cross-Validation

.IN
(LOOCV)

C
N
SY
U
VT
Leave-One-Out Cross-Validation

.IN
(LOOCV)

C
N
SY
U
VT
Model Performance

.IN
• One way to compute the metrics is to form a table called

C
contingency table.

N
SY
U
VT
• True Positive (TP) = Number of cancer patients who are
classified by the test correctly,
• True Negative (TN) = Number of normal patients who do not
have cancer are correctly detected.
• The two errors that are involved in this process:
• False Positive (FP) that is an alarm that indicates that the tests
show positive when the patient has no disease
• and False Negative (FN) : another error that says a patient has
cancer when tests says negative or normal.
• FP and FN are costly errors in this classification process.
Model Performance

.IN
C
N
SY
U
VT
Model Performance

.IN
C
N
SY
U
VT
.IN
Visual Classifier Performance

C
N
SY
U
VT

You might also like