ECE 449: Machine Learning Concepts

The document provides an overview of various machine learning algorithms including K-Nearest Neighbors, Perceptron, Naive Bayes, Logistic Regression, and Support Vector Machines, detailing their methodologies, advantages, and limitations. It discusses key concepts such as distance metrics, probability estimation, optimization techniques, and the importance of model assumptions. Additionally, it covers parameter estimation methods like Maximum Likelihood Estimation and Maximum A Posteriori Estimation, highlighting their applications in different contexts.

Uploaded by

hzz121600


ECE 449 Machine Learning: Eric Ji

1 K-Nearest Neighbors (Non-linear)

• Find the top K nearest neighbors under a metric d and return the most common label (classification) or the average value (regression) among them
– $d(x, z) = \left( \sum_{j} |x_j - z_j|^p \right)^{1/p}$

– L1 (Manhattan) Distance: p = 1
– L2 (Euclidean) Distance: p = 2
• Determine K using validation set
– Small K is sensitive to noise and will overfit
– Large K includes examples that are too far away and will underfit
• Simple to implement, but has several issues
– Requires memory to store the entire dataset
– Inference is computationally expensive
– Sensitive to outliers
– Curse of dimensionality: in high dimensions, data points lie far from one another, which degrades performance
• Nonparametric models place mild assumptions on the data distribution and are good for complex data, but require storing and computing over the entire dataset
• Parametric models place strong modeling assumptions and require fitting the model, in exchange for more efficient storage/computation
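
The K-nearest-neighbors procedure above can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: the function names (`minkowski`, `knn_predict`) and the toy data are made up for the example, and ties between labels are broken arbitrarily.

```python
from collections import Counter

def minkowski(x, z, p=2):
    """Minkowski distance: p=1 is Manhattan (L1), p=2 is Euclidean (L2)."""
    return sum(abs(a - b) ** p for a, b in zip(x, z)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=3, p=2):
    """Return the most common label among the k nearest training points."""
    # Sort training indices by distance to the query point.
    order = sorted(range(len(train_X)),
                   key=lambda i: minkowski(train_X[i], query, p))
    top_labels = [train_y[i] for i in order[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Toy data (made up for illustration): two clusters in 2-D.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_near_origin = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
pred_near_five = knn_predict(train_X, train_y, (5.5, 5.5), k=3, p=1)
```

Note that every prediction scans the whole training set, which is the memory and inference cost listed among the issues above.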

2 Perceptron (Linear)
• Only applies to linearly separable data
• Perceptrons are linear classifiers trying to learn a hyperplane
• A hyperplane in $\mathbb{R}^d$ is represented as $w_0 + w^T x = 0$ where $w \in \mathbb{R}^d$
– w is orthogonal to the hyperplane and points to the positive half-space
• Predicted label: $y(x) = \mathrm{sign}(w^T x + b)$, where the bias $b = w_0$
• The perceptron algorithm iterates through the data and updates w on each misclassified sample until all data is correctly labeled
– Update rule: $w_{new} = w + yx$ when $y(w^T x) \le 0$ (with the bias absorbed into w by appending a constant 1 to x)
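• Theorem: Given $w^*$ that perfectly separates the data and $\gamma = \min_i |w^{*T} x^{(i)}|$ over all $x^{(i)}$ in D, the perceptron algorithm makes at most $1/\gamma^2$ updates before converging (assuming the usual normalization $\|w^*\|_2 = 1$ and $\|x^{(i)}\|_2 \le 1$)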
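
As a rough sketch of the perceptron update rule above, the following assumes labels in {-1, +1} and absorbs the bias by appending a constant 1 to every input. The names `perceptron` and `predict`, the `max_epochs` cap, and the toy data are illustrative assumptions, not part of the notes.

```python
def perceptron(X, y, max_epochs=100):
    """Train a perceptron; X is a list of feature tuples, y has labels in {-1, +1}.

    The bias is absorbed by appending a constant 1 to each input, so the
    update w <- w + y*x covers both the weights and w_0.
    """
    data = [tuple(x) + (1.0,) for x in X]
    w = [0.0] * len(data[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in zip(data, y):
            score = sum(wj * xj for wj, xj in zip(w, x))
            if label * score <= 0:  # misclassified (or exactly on the boundary)
                w = [wj + label * xj for wj, xj in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: converged
            break
    return w

def predict(w, x):
    """Sign of the score, using the same appended-1 convention as training."""
    score = sum(wj * xj for wj, xj in zip(w, tuple(x) + (1.0,)))
    return 1 if score > 0 else -1

# Toy linearly separable data (made up for illustration).
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
w = perceptron(X, y)
```

The `max_epochs` cap is there because the loop would never terminate on data that is not linearly separable.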

3 Probability and Estimation
• Useful Probability Properties:
– Conditional Probability: $P(A \mid B) = \frac{P(A, B)}{P(B)}$
– Bayes Rule: $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$

• Useful Log Rules:

– $\log(A^B) = B \cdot \log(A)$
– $\log(AB) = \log(A) + \log(B)$
– $\log(A/B) = \log(A) - \log(B)$

• Useful Derivative Rules:

– $\frac{d}{dx} \log(x) = \frac{1}{x}$

• From the joint probabilities $P(X_1, X_2, X_3, ..., X_d, Y)$ we can calculate $P(Y \mid X_1, X_2, X_3, ..., X_d)$

– Learning $P(Y \mid X)$ from the joint distribution is intuitive, but an accurate model requires a large amount of data that may not be attainable
• Estimate parameters from sparse data using Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) Estimation

• MLE chooses the parameter θ that maximizes the probability of observing dataset D

– $\hat{\theta} = \arg\max_\theta P(D \mid \theta)$ where $P(D \mid \theta)$ is the likelihood function
• Steps for solving MLE:
– Take log of likelihood
– Take the derivative with respect to θ and set it equal to 0
– Solve for θ that maximizes the likelihood
• MAP chooses parameter θ that is most probable given prior P (θ) and dataset D

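– $\hat{\theta} = \arg\max_\theta P(\theta \mid D)$
– $\hat{\theta} = \arg\max_\theta \frac{P(D \mid \theta) P(\theta)}{P(D)}$ according to Bayes Rule
– $\hat{\theta} = \arg\max_\theta P(D \mid \theta) P(\theta)$ as $P(D)$ does not depend on θ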
• MAP is better than MLE when the dataset has few samples and the prior is accurate
• As the number of samples in the dataset approaches infinity, the prior becomes irrelevant and the MAP estimate converges to the MLE
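
To make the MLE steps and the MLE/MAP comparison concrete, here is a small sketch for a coin with unknown heads probability θ. For h heads in n flips, the log-likelihood is $h \log\theta + (n - h)\log(1 - \theta)$; setting its derivative $h/\theta - (n - h)/(1 - \theta)$ to 0 gives $\hat{\theta} = h/n$. The Beta(a, b) prior and its default values are an assumption chosen for illustration; with that prior the MAP estimate is the mode of the Beta posterior.

```python
def bernoulli_mle(heads, n):
    """MLE for a coin's heads probability: argmax_theta P(D | theta) = heads / n."""
    return heads / n

def bernoulli_map(heads, n, a=2.0, b=2.0):
    """MAP estimate under an assumed Beta(a, b) prior (mode of the posterior)."""
    return (heads + a - 1) / (n + a + b - 2)

# Small sample: 3 heads in 3 flips. MLE says the coin never lands tails.
mle_small = bernoulli_mle(3, 3)
map_small = bernoulli_map(3, 3)

# Large sample: the prior's pseudo-counts become negligible.
mle_large = bernoulli_mle(7000, 10000)
map_large = bernoulli_map(7000, 10000)
```

With three flips the two estimates differ noticeably (1.0 versus 0.8), while with ten thousand flips they nearly coincide, matching the last two bullets above.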

4 Naive Bayes (Probabilistic)
• Aims to learn $P(Y \mid X)$ through $P(X \mid Y)$ and $P(Y)$ using Bayes rule, with a conditional independence assumption to reduce the number of parameters to estimate
– $P(Y \mid X_1, ..., X_d) = \frac{P(X_1, ..., X_d \mid Y) P(Y)}{P(X_1, ..., X_d)} \propto P(X_1, ..., X_d \mid Y) P(Y)$, ignoring normalization

• Conditional Independence: $P(X_1, ..., X_d \mid Y) = \prod_j P(X_j \mid Y)$

– Requires estimating $2(2^d - 1) + 1$ parameters without assuming conditional independence (for binary features and a binary label)

– Requires estimating $2d + 1$ parameters when assuming conditional independence
• Use MLE or MAP to estimate the parameters needed to learn $P(Y \mid X)$
– With a suitable prior, MAP keeps $P(Y \mid X)$ from being 0 when one factor of the product would otherwise be estimated as 0

• If X is continuous, it is common to assume $P(X \mid Y)$ follows a normal distribution

– Variance can be independent of class, feature, or both
• Gaussian Naive Bayes yields a linear decision boundary under additional assumptions about the data’s distribution (for example, variances that do not depend on the class)
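
A minimal sketch of Naive Bayes for binary features and a binary label, assuming a smoothing pseudo-count `alpha` as the MAP-style prior (a common choice, not something the notes specify). The function names and the toy data are illustrative. For d = 3 features it estimates 2d + 1 = 7 free parameters: one class prior and three conditionals per class.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(Y) and P(X_j = 1 | Y) for binary features.

    alpha is a smoothing pseudo-count: alpha=0 gives plain MLE counts,
    alpha>0 acts like a MAP estimate and avoids zero probabilities.
    """
    d = len(X[0])
    prior, cond = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        cond[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                   for j in range(d)]
    return prior, cond

def predict_nb(prior, cond, x):
    """Return argmax_c of log P(Y=c) + sum_j log P(X_j = x_j | Y=c)."""
    best_class, best_score = None, -math.inf
    for c in prior:
        score = math.log(prior[c])
        for j, xj in enumerate(x):
            p = cond[c][j]
            score += math.log(p if xj == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data (made up): feature 0 tends to be on for class 1, feature 2 for class 0.
X = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
y = [1, 1, 1, 0, 0, 0]
prior, cond = train_bernoulli_nb(X, y)
pred_a = predict_nb(prior, cond, (1, 1, 0))
pred_b = predict_nb(prior, cond, (0, 0, 1))
```

Working in log space turns the product over features into a sum and avoids underflow. With `alpha=0`, feature 0 would get probability exactly 1 under class 1, so any input with that feature off would hit a zero factor; the smoothing is what prevents this.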

5 Logistic Regression (Linear)

• Discriminative counterpart to Naive Bayes that directly learns $P(Y \mid X)$
– Discriminative models directly calculate the weights
– Generative models first estimate all the probabilities/parameters and then calculate the weights from them

• Learn a set of weights for each class

• $P(Y \mid X)$ can be represented by the sigmoid function
– $P(Y = c \mid X) = \frac{1}{1 + \exp(w_0 + \sum_j w_j X_j)}$

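• Calculate weights using Maximum Conditional Likelihood Estimation (MCLE)

– $w_{MCLE} = \arg\max_w \prod_i P(y^{(i)} \mid x^{(i)}, w)$

– The objective (the conditional log-likelihood) is concave but has no closed-form solution, so it requires iterative optimization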

• Can apply MAP by placing a prior on the weights themselves

– $w_{MAP} = \arg\max_w P(w) \prod_i P(y^{(i)} \mid x^{(i)}, w)$

• Logistic regression typically gives a better solution than Naive Bayes, especially with lots of data and when conditional independence does not hold
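
Because the conditional likelihood has no closed-form maximizer, the weights are usually found iteratively. The sketch below uses plain gradient ascent on the mean conditional log-likelihood, assuming labels in {0, 1} and the common convention $P(Y = 1 \mid x) = 1 / (1 + \exp(-(w_0 + \sum_j w_j x_j)))$. The learning rate, epoch count, `lam` penalty (standing in for a Gaussian prior on the weights), and toy data are all illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """P(Y = 1 | x) under the convention stated above; w[0] is the bias w_0."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic(X, y, lr=0.3, epochs=2000, lam=0.0):
    """Gradient ascent on the mean conditional log-likelihood.

    lam > 0 subtracts lam * w_j from the gradient (an L2 penalty, i.e. a
    MAP estimate under an assumed Gaussian prior); the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, label in zip(X, y):
            err = label - predict_proba(w, x)  # d(log-likelihood)/d(score)
            grad[0] += err / n
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj / n
        for j in range(1, d + 1):
            grad[j] -= lam * w[j]
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy 1-D data (made up), centered so the boundary sits at x = 0.
X = [(-2.5,), (-1.5,), (-0.5,), (0.5,), (1.5,), (2.5,)]
y = [0, 0, 0, 1, 1, 1]
w_mcle = train_logistic(X, y)
w_map = train_logistic(X, y, lam=0.1)
```

On separable data like this the unregularized weights keep growing with more epochs, while the prior keeps the MAP weights bounded.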

6 Optimization
• Gradient Descent uses a first-order Taylor expansion to approximate the objective function l as linear around the weights w
– $l(w + s) \approx l(w) + g(w)^T s$ where $g(w) = \nabla l(w)$
• Gradient Descent update rule: $w_{new} = w - \alpha g(w)$ to minimize $l(w)$

– Step size α should decrease at a constant rate with each update for good convergence
• Batch gradient descent computes the gradient over the entire training set for each update of w
• Stochastic gradient descent computes the gradient over a single sample for each update of w
• Newton’s Method uses a second-order Taylor expansion approximation

– $l(w + s) \approx l(w) + g(w)^T s + \frac{1}{2} s^T H(w) s$

• Newton’s update rule: $w_{new} = w - H(w)^{-1} g(w)$
– $H(w)$ is the Hessian matrix, whose entries are the second partial derivatives of $l(w)$ with respect to w: $H_{jk} = \frac{\partial^2 l}{\partial w_j \partial w_k}$

• Incorporating a prior for a MAP estimate results in a regularization term when updating weights
– $w_{new} = w - \alpha g(w) - \alpha \lambda w$
– Helps reduce overfitting by keeping weights near 0
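
The three update rules above can be compared on a one-dimensional toy objective. The sketch assumes $l(w) = (w - 3)^2$, which is not from the notes; it is chosen because the answers are easy to check by hand. A fixed step size is used for simplicity rather than a decaying one.

```python
def grad(w):
    """Gradient of the toy objective l(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def hessian(w):
    """Second derivative of l(w); constant for a quadratic."""
    return 2.0

def gradient_descent(w, alpha=0.1, lam=0.0, steps=200):
    """Repeat w <- w - alpha*g(w) - alpha*lam*w (lam=0 is plain gradient descent)."""
    for _ in range(steps):
        w = w - alpha * grad(w) - alpha * lam * w
    return w

def newton_step(w):
    """One Newton update: w <- w - H(w)^{-1} g(w)."""
    return w - grad(w) / hessian(w)

w_gd = gradient_descent(0.0)            # approaches the minimizer w = 3
w_reg = gradient_descent(0.0, lam=1.0)  # approaches 6 / (2 + lam) = 2
w_newton = newton_step(0.0)             # a quadratic is solved in one step
```

The regularized run settles where $2(w - 3) + \lambda w = 0$, which is closer to 0 than the unregularized minimizer. Newton's method lands on the minimizer in a single step here only because the second-order Taylor expansion of a quadratic is exact.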

7 Linear Regression
• Used to learn a function that linearly maps X onto Y, where Y is continuous
– First choose a parameterized form for $P(Y \mid X, w)$
– Then derive the MLE or MAP objective and estimate w

• MLE produces the squared loss objective: $l(w) = \frac{1}{N} \sum_i (y^{(i)} - w^T x^{(i)})^2$

– The closed-form solution that minimizes $l(w)$ gives $w = (X^T X)^{-1} X^T y$

• MAP produces the squared loss plus sum of squared weights objective: $l(w) = \frac{1}{N} \sum_i (y^{(i)} - w^T x^{(i)})^2 + \lambda \|w\|_2^2$

– The closed-form solution that minimizes $l(w)$ gives $w = (X^T X + \lambda I)^{-1} X^T y$
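
Both closed-form solutions can be computed with one helper. This sketch assumes NumPy and a design matrix X with one row per sample; the function name `fit_least_squares` and the noise-free toy data are illustrative. It solves the linear system directly instead of forming the matrix inverse, which is the usual numerically safer route to the same w.

```python
import numpy as np

def fit_least_squares(X, y, lam=0.0):
    """Solve (X^T X + lam*I) w = X^T y; lam=0 is the MLE form, lam>0 the MAP form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data generated from y = 1 + 2x with no noise. The first column of X is
# a constant 1 so that w[0] acts as the intercept.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w_mle = fit_least_squares(X, y)
w_map = fit_least_squares(X, y, lam=1.0)
```

On this data the MLE solution recovers the generating coefficients, and the MAP solution is pulled toward 0 by the penalty. Note that this simple version also penalizes the intercept.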

8 Support Vector Machine
• Separate positive and negative samples with as wide a margin as possible
• Hard margin SVM is for linearly separable data and expects perfect separation
– Objective is to minimize $\frac{1}{2} \|w\|_2^2$ such that $y^{(i)}(w^T x^{(i)} + b) \ge 1$ for all i
– Only need support vectors for inference
∗ Support vectors satisfy $y^{(i)}(w^T x^{(i)} + b) = 1$
• Soft margin SVM allows for misclassified samples in non-linearly separable data
– Objective is to minimize $\frac{1}{2}\|w\|_2^2 + C \sum_i \xi_i$ such that $y^{(i)}(w^T x^{(i)} + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$

– C is the trade-off parameter, where $C = \infty$ recovers the hard margin

– $\xi_i$ is the slack variable, where $\xi_i = \max(0, 1 - y^{(i)}(w^T x^{(i)} + b))$
• In both objectives $\|w\|_2^2$ is the regularizer
• In the soft margin objective $\sum_i \max(0, 1 - y^{(i)}(w^T x^{(i)} + b))$ is the hinge loss

• Use Lagrange multipliers to fold the constraints into the quadratic objective

– Want to solve the dual problem: $\max_{\alpha \ge 0} \min_{w,b} L(w, b, \alpha)$

• Hard margin objective: $L(w, b, \alpha) = \frac{1}{2}\|w\|_2^2 + \sum_i \alpha_i (1 - y^{(i)}(w^T x^{(i)} + b))$

– $w^* = \sum_i \alpha_i^* y^{(i)} x^{(i)}$

• Can also be applied to soft margin
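
The notes solve the SVM through the Lagrangian dual. As a simpler illustration of the soft margin objective itself, the sketch below swaps in subgradient descent on the primal form $\frac{1}{2}\|w\|_2^2 + C \sum_i \max(0, 1 - y^{(i)}(w^T x^{(i)} + b))$. This is a different technique from the dual approach, and the step size, epoch count, function names, and toy data are assumptions made for the example.

```python
def decision(w, b, x):
    """Signed score w^T x + b; its sign is the predicted label."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Subgradient descent on the soft margin primal; labels in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5*||w||^2 is w
        for x, label in zip(X, y):
            if label * decision(w, b, x) < 1:  # hinge term is active
                grad_w = [g - C * label * xj for g, xj in zip(grad_w, x)]
                grad_b -= C * label
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy linearly separable data (made up for illustration).
X = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
y = [1, 1, -1, -1]
w, b = train_svm(X, y)
margins = [label * decision(w, b, x) for x, label in zip(X, y)]
```

With a constant step size the weights hover near the max-margin solution rather than converging exactly. The points closest to the boundary, here (2, 2) and (-2, -2), end up with margins close to 1, which is the role support vectors play in the notes.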
