0% found this document useful (0 votes)

12 views28 pages

8 Kernel Methods

Kernel methods are algorithms used for pattern analysis in computer science, with the Support Vector Machine (SVM) being a prominent example. These methods utilize kernel functions to enable non-linear decision boundaries by mapping data into high-dimensional feature spaces, allowing for efficient computation of inner products. The choice of kernel can significantly influence the performance of algorithms, and challenges remain in selecting the most effective kernel for specific datasets.

Uploaded by

f20221231

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views28 pages

8 Kernel Methods

Uploaded by

f20221231

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Kernel Methods

Prof. Navneet Goyal,

CS Dept.
BITS-Pilani, Pilani Campus
INDIA

Figure source:
[Link]
[Link]/thbio/group/neuralnet
/index_p.html
Kernel Methods
• In computer science, kernel methods (KMs) are a
class of algorithms for pattern analysis, whose best
known element is the Support Vector Machine
(SVM) (Wikipedia)
• Transformations
• Feature Spaces
• Kernel Functions
• Kernel Tricks
• Inner Products
Kernel Methods
Algorithms capable of operating with kernels include:
• Support vector machine (SVM)
• Gaussian processes
• Fisher's linear discriminant analysis (LDA)
• Principal components analysis (PCA) (Kernel PCA)
• Canonical correlation analysis
• Ridge regression
• Spectral clustering
• Linear adaptive filters
•…
Kernel Methods
• Kernels are non-linear generalizations of
inner products
Kernel Methods
• Any kernel-based method comprises of
two modules:
• Mapping into embedding or feature space
• Learning algorithm designed to discover
linear patterns in that space
Kernel Methods
• Why this approach works?
• Detecting linear patterns has been the focus of
much research in statistics and machine learning
• Resulting algorithms are well understood and
efficient
• Computational shortcut: makes it possible to
represent linear patterns efficiently in high
dimensional space to ensure adequate
representational power
• The shortcut is nothing but the KERNEL FUNCTION
Kernel Methods
• Kernel methods allow us to extend algorithms such as
SVMs to define non-linear decision boundaries
• Other algorithms that only depend on inner products
between data points can be extended similarly
• Kernel functions which are symmetric and positive
definite allows us to implicitly define inner products in
high dimensional
• Replacing inner products in input space with positive
definite kernels immediately extends algorithms like SVM
to
• Linear separation in high dimensional space
• Or equivalently to a non-linear separation in input space
Kernel Methods
• Input space, χ
• High dimensional space, ℍ
• ℍ can be really large!!
• Document classification
• Trigrams
• Vocabulary of 100000 words
• Dimension of feature space reaches 1015
Kernel Functions
• A function K: 𝟀 x 𝟀 → ℝ is called a kernel over 𝟀
if for any two points x, x’ ∈ 𝟀,
K(x,x’) = 〈 ϕ (x), ϕ (x’)〉
For some mapping ϕ : 𝟀 → ℍ to a Hilbert space ℍ,
called a feature space
• K is efficient!
• K(x,x’) is O(N)
• 〈 ϕ (x), ϕ (x’)〉 is O(dim ℍ) with dim ℍ ≫ N
• K is flexible!
• No need to explicitly define or compute ϕ
• Kernel K can be arbitrarily chosen so long as the existense of
ϕ is guaranteed, i.e. K satisfies Mercer’s condition
Kernel Trick: Computational saving of the kernel trick
Example Quadratic Basis function (Andrew Moore Tut.)

The cost of
computation is:
O(m 2 )

(m is the dimension of input)

Whereas the corresponding Kernel is :
K (a, b) = (a × b + 1) 2
The cost of computation is: O (m)

To believe that it is
really the real Kernel :
Mercer’s Theorem

A function K ( xi , x j ) is a kernel (there exists a f ( x) such

that K (xi , x j ) º f (xi )T f (x j )) The Kernel matrix is
Symmetric Positive Semi-definite.
Another version of Mercer’s theorem that isn’t
related to the kernel matrix is: K ( x , x ) function
i j
is a kernel for any g (u ) such that

ò
2
g (u ) du is finite then

ò K (u, v) g (u ) g (v)dudv ³ 0
This version of Mercer’s condition does not require us to
know ϕ (x)!!
Kernels satisfying Mercer’s condition are called Positive
Definite Kernel Functions!
Examples of Kernels

-Some common choices (the first two always satisfying

Mercer’s condition):
-Polynomial kernel K ( xi , x j ) = ( xi t x j + 1) p
-Gaussian* Kernel (data is lifted to infinite dimension):
1 2
K ( xi , x j ) = exp( - xi - x j )
2s 2

*The Gaussian kernel is also known as the squared exponential kernel – SE kernel – or radial basis
function (RBF) kernel
K ( xi , x j ) = tanh(kxi × x j - d )
-Sigmoidal :
(it is not a kernel for every k and 𝞭 ).
-In fact, SVM model using a sigmoid kernel function is
equivalent to a two-layer, feed-forward neural network!!
Some Valid Kernels
Boser, Guyon and Vapnik, 1992

• Polynomial (Splines etc.)

K (x i , x) = (1 + x i × x)n
• Gaussian (Radial Basis Function Networks)
2
xi - x
K (x i , x) = exp(- 2s 2 )
• Sigmoid (Two-Layer Perceptron)
K (x i , x) = tanh( L + x i × x) only for certain L

x1 k a1 y1
sign
x y
x2
k
a 2 y2
Feature Spaces
F : x ® F( x), R ® F d

example: ( x, y ) ® ( x , y , 2 xy )
2 2

14
Kernel Trick
Kernel Trick: Examples

Consider the mapping:

The points in feature space corresponding to

x1 and x2:

Find:
Kernel Trick: Examples
Kernel Functions: Example
§ Given x = (x1, x2, x3); y = (y1, y2, y3). Find the
kernel for the function:
f(x)=(x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3)
Also find f(x).f(y) without finding f(x) & f(y).
Verify your result using x = (1, 2, 3) & y = (4, 5, 6).
Kernel Functions: Example
§ Consider a two-dimensional input space 𝑋 ⊆ ℝE
together with the feature map:
∅: 𝒙 = 𝑥F , 𝑥E ⟼ ∅ 𝒙 = 𝑥FE , 𝑥EE , 2 𝑥F 𝑥E ∈ 𝐹
= ℝG
§ Find the Kernel corresponding to ∅ and 𝐹
§ Also find the Kernel function corresponding to:
∅: 𝒙 = 𝑥F , 𝑥E ⟼ ∅ 𝒙 = 𝑥FE , 𝑥EE , 𝑥F 𝑥E , 𝑥E 𝑥F ∈ 𝐹
= ℝH
§ What conclusions you can draw?
Kernel Functions: Example
§ Find a mapping f from ℝ𝟐 → ℝ𝟔 corresponding to
the kernel 𝐾 𝑿, 𝒀 = [2 𝑿. 𝒀 + 3]E
§ For a given kernel, is the mapping unique?
Kernel Functions
§ How to choose the mapping ϕ ?
ú Several common non-linear mappings can be used
ú In fact we do not even need to know what the mapping
is J
§ How to avoid CoD?
ú Dot product in feature space can be represented as a
kernel function of input points:
K(x,x’) = ϕ (x).ϕ (x’)
ú SVMs do not suffer from CoD
Complexity of SVMs depend on the number of Support
Vectors (SVs)
Kernel Functions
§ Which Kernel to choose?
ú Properties of the kinds of kernels that can be used
to replace the dot product have been studiedJ
ú Admissible Kernels functions are:
Polynomial Kernel
Gaussian Radial Basis Function Kernel
Sigmoid Kernel
ú Each of these Kernels result in a different nonlinear
classifier in the input space
Kernel Functions
§ Which Kernel to choose?
ú No golden rules for determining which admissible
kernel will result in the most accurate SVM L
ú Kernel chosen does not make a large difference in
resulting accuracy
ú SVM training always finds a global solution
§ Challenges in SVM
ú Scalability
Need to improve the speed of training/testing so that SVM
scales to very large data sets (millions of SVs)
ú Determining best Kernel for a given data set
ú More efficient implementations of SVM for multiclass
problems
Kernel Functions: Examples
§ Decision boundary a circle cantered at
(0.5,0.5) with r=0.2
§ Points on circles with centre at origin and radii
1 & 2 respectively.
Kernel Trick
• Simple trick to kernelize existing algorithms that are
based on inner products

• Replace inner products with Kernels

• Explicit feature representation not required

25
Making New Kernels
Important Kernel Issues

How to know which Kernel to use?

- This is a good question and actually still an open question,
many researches have been working to deal with this issue
but still we don’t have a firm answer. It is one of the weakness
of SVM. We will see an approach to this issue latter.
- How to verify that rising to higher dimension using a specific
kernel will map the data to a space in which they are linearly
separable?

- For most of the kernel function we don’t know the

corresponding mapping function ϕ (x) so we don’t know to
which dimension we rose the data. So even though rising to
higher dimension increases the likelihood that they will be
separable we can’t guarantee that.
Important Kernel Issues

Gaussian Radial Basis Kernel lifts the data to infinite dimension

so our data is always separable in this space so why don’t we
always use this kernel?
First of all we should decide which s to use in this kernel
1 2
( exp(- xi - x j )).
2s 2

Secondly, a strong kernel ,which lifts the data to infinite

dimension, sometimes may lead us the severe problem of
Overfitting

Symptoms of overfitting:
1. Low margin à poor classification performance
2. Large number of support vectorsà Slows down the
computation

Lecture No 7
No ratings yet
Lecture No 7
7 pages
Understanding SVM Kernel Functions
No ratings yet
Understanding SVM Kernel Functions
12 pages
Understanding Kernel Models in Machine Learning
No ratings yet
Understanding Kernel Models in Machine Learning
56 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
25 pages
Understanding Kernels in SVM
No ratings yet
Understanding Kernels in SVM
5 pages
Kernel Methods in Data Modeling
No ratings yet
Kernel Methods in Data Modeling
49 pages
Non-linear Learning with Kernels
No ratings yet
Non-linear Learning with Kernels
23 pages
Kernelization in Machine Learning Models
No ratings yet
Kernelization in Machine Learning Models
32 pages
Understanding Kernel Trick in SVMs
No ratings yet
Understanding Kernel Trick in SVMs
32 pages
Overview of Kernel Methods in ML
No ratings yet
Overview of Kernel Methods in ML
8 pages
Kernels
No ratings yet
Kernels
7 pages
Understanding Kernels in SVMs
No ratings yet
Understanding Kernels in SVMs
13 pages
Inbound 4219268206608678432
No ratings yet
Inbound 4219268206608678432
36 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
23 pages
Understanding Kernel Methods in ML
No ratings yet
Understanding Kernel Methods in ML
5 pages
SVM Kernel Functions Explained
No ratings yet
SVM Kernel Functions Explained
9 pages
SVM Kernel Functions Explained
No ratings yet
SVM Kernel Functions Explained
5 pages
Nonlinear SVMs and Kernel Functions
No ratings yet
Nonlinear SVMs and Kernel Functions
16 pages
High Dimensional Kernel Methods Explained
No ratings yet
High Dimensional Kernel Methods Explained
33 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
6 pages
SVM Kernel Types: Linear, Polynomial, Gaussian
No ratings yet
SVM Kernel Types: Linear, Polynomial, Gaussian
8 pages
Introduction to Kernel Methods in ML
No ratings yet
Introduction to Kernel Methods in ML
5 pages
Understanding Kernel Functions in ML
No ratings yet
Understanding Kernel Functions in ML
19 pages
SVM Kernels in Machine Learning
No ratings yet
SVM Kernels in Machine Learning
15 pages
Kernel Functions and Their Properties
No ratings yet
Kernel Functions and Their Properties
3 pages
Kernel Functions in Support Vector Machines
No ratings yet
Kernel Functions in Support Vector Machines
51 pages
Kernel Methods in Data Mining
No ratings yet
Kernel Methods in Data Mining
24 pages
Understanding Kernel Methods in Data Analysis
No ratings yet
Understanding Kernel Methods in Data Analysis
24 pages
Understanding Kernel Methods in ML
No ratings yet
Understanding Kernel Methods in ML
3 pages
Understanding Linear Kernels in SVM
No ratings yet
Understanding Linear Kernels in SVM
19 pages
Understanding Kernel Methods in ML
No ratings yet
Understanding Kernel Methods in ML
31 pages
Kernel Methods for Pattern Analysis
No ratings yet
Kernel Methods for Pattern Analysis
77 pages
Understanding Kernel Machines in SVM
No ratings yet
Understanding Kernel Machines in SVM
36 pages
Kernel Machines and SVMs: Techniques & Future
No ratings yet
Kernel Machines and SVMs: Techniques & Future
58 pages
Kernel Methods for Nonlinear Models
No ratings yet
Kernel Methods for Nonlinear Models
26 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
760 pages
SVMs and Kernel Methods Tutorial
No ratings yet
SVMs and Kernel Methods Tutorial
85 pages
Svms Kernels
No ratings yet
Svms Kernels
16 pages
Understanding Mercer Kernels in Machine Learning
No ratings yet
Understanding Mercer Kernels in Machine Learning
21 pages
Types of Machine Learning Kernels
No ratings yet
Types of Machine Learning Kernels
5 pages
Kernelized Machine Learning Algorithms
No ratings yet
Kernelized Machine Learning Algorithms
31 pages
SVM Kernels
No ratings yet
SVM Kernels
1 page
Kernel PCA and Matrix Computations
No ratings yet
Kernel PCA and Matrix Computations
23 pages
Understanding the Kernel Trick in SVMs
No ratings yet
Understanding the Kernel Trick in SVMs
40 pages
Understanding Kernel Functions in ML
No ratings yet
Understanding Kernel Functions in ML
6 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
16 pages
Dual Problems and Kernels in SVMs
No ratings yet
Dual Problems and Kernels in SVMs
14 pages
Kernel Methods in Machine Learning
No ratings yet
Kernel Methods in Machine Learning
15 pages
Introduction To SVM
No ratings yet
Introduction To SVM
29 pages
Support Vectors
No ratings yet
Support Vectors
10 pages
SVM and Kernel Methods in Deep Learning
No ratings yet
SVM and Kernel Methods in Deep Learning
15 pages
SVM and Kernel Methods Overview
No ratings yet
SVM and Kernel Methods Overview
4 pages
Support Vector Machines Explained
No ratings yet
Support Vector Machines Explained
26 pages
Lecture29 Kernel Kmeans
No ratings yet
Lecture29 Kernel Kmeans
64 pages
Mercer’s Theorem in Machine Learning
No ratings yet
Mercer’s Theorem in Machine Learning
31 pages
Lecture20 Annotated
No ratings yet
Lecture20 Annotated
38 pages
Understanding Kernel Machines in ML
No ratings yet
Understanding Kernel Machines in ML
16 pages
About I 2 I Solutions
No ratings yet
About I 2 I Solutions
14 pages
TC Electronic M-100 User Manual
No ratings yet
TC Electronic M-100 User Manual
21 pages
Drying Rate and Moisture Content Analysis
No ratings yet
Drying Rate and Moisture Content Analysis
4 pages
Holistic Academy: GPSC & Engineering Prep
No ratings yet
Holistic Academy: GPSC & Engineering Prep
34 pages
Setting Personal Fitness Goals
No ratings yet
Setting Personal Fitness Goals
25 pages
Calculating Limits: Exercises 21-74
No ratings yet
Calculating Limits: Exercises 21-74
31 pages
Mid-Atlantic Diction Pronunciation Guide
No ratings yet
Mid-Atlantic Diction Pronunciation Guide
2 pages
Awwa M45
50% (2)
Awwa M45
6 pages
Test Review - Interchange 1
No ratings yet
Test Review - Interchange 1
6 pages
Understanding Marketing's Role in Business
No ratings yet
Understanding Marketing's Role in Business
23 pages
Montalvo Salon & Spa Work Plan Analysis
No ratings yet
Montalvo Salon & Spa Work Plan Analysis
23 pages
Photonic Circuits - Interferometers and Applications: Dr. Lukas Chrostowski
No ratings yet
Photonic Circuits - Interferometers and Applications: Dr. Lukas Chrostowski
29 pages
Elon Musk's Twitter Acquisition Saga
No ratings yet
Elon Musk's Twitter Acquisition Saga
4 pages
C Programming Practical Exercises
No ratings yet
C Programming Practical Exercises
14 pages
Understanding Sentence Flow Charts
No ratings yet
Understanding Sentence Flow Charts
9 pages
Tennis Speed and Agility Training Guide
No ratings yet
Tennis Speed and Agility Training Guide
3 pages
Zenith Chris Heintz Aircraft Manual 1976
No ratings yet
Zenith Chris Heintz Aircraft Manual 1976
144 pages
Enphase Technology Overview and Impact
No ratings yet
Enphase Technology Overview and Impact
31 pages
Minimizing Hood Jam in IRUT & LF
No ratings yet
Minimizing Hood Jam in IRUT & LF
14 pages
Optimal Nitrogen Removal in Natural Gas
No ratings yet
Optimal Nitrogen Removal in Natural Gas
11 pages
Statement of Purpose: Electrical Engineering
100% (6)
Statement of Purpose: Electrical Engineering
2 pages
Introduction to Programming Basics
No ratings yet
Introduction to Programming Basics
67 pages
Maths Lab Equipment & Kits Catalogue
No ratings yet
Maths Lab Equipment & Kits Catalogue
2 pages
Functional Training and Phase Play in Soccer
100% (4)
Functional Training and Phase Play in Soccer
3 pages
Effective Reading for Dyslexia
No ratings yet
Effective Reading for Dyslexia
2 pages
Key Insights on Death of a Salesman
No ratings yet
Key Insights on Death of a Salesman
19 pages
Bad Conversation Habits in English
No ratings yet
Bad Conversation Habits in English
6 pages
Mechanical Systems: Types & Modeling
No ratings yet
Mechanical Systems: Types & Modeling
28 pages
Marieb 4e Anatomy & Physiology Test Bank
No ratings yet
Marieb 4e Anatomy & Physiology Test Bank
28 pages
Understanding Single Phase Transformers
No ratings yet
Understanding Single Phase Transformers
73 pages

8 Kernel Methods

Uploaded by

8 Kernel Methods

Uploaded by

Kernel Methods

Prof. Navneet Goyal,

(m is the dimension of input)

A function K ( xi , x j ) is a kernel (there exists a f ( x) such

-Some common choices (the first two always satisfying

• Polynomial (Splines etc.)

Consider the mapping:

The points in feature space corresponding to

• Replace inner products with Kernels

• Explicit feature representation not required

How to know which Kernel to use?

- For most of the kernel function we don’t know the

Gaussian Radial Basis Kernel lifts the data to infinite dimension

Secondly, a strong kernel ,which lifts the data to infinite

You might also like