
Machine Learning Basics and Examples

The document provides an overview of machine learning, focusing on types such as supervised and unsupervised learning, along with examples like predicting housing prices using regression models. It discusses the K-Nearest Neighbor (KNN) algorithm, its workings, advantages, and disadvantages, as well as methods for selecting the optimal number of neighbors (K). Additionally, it covers linear regression, logistic regression, and the importance of regularization techniques in model training.

Uploaded by

tranlam021102eee
Copyright
© All Rights Reserved

Machine Learning 101
Types of Machine Learning

Supervised Learning
● Input data is labeled
● Uses a labeled training dataset
● Used for prediction
● Tasks: classification, regression

Unsupervised Learning
● Input data is unlabeled
● Uses only the input data
● Used for analysis
● Tasks: clustering, density estimation, dimensionality reduction
Machine Learning Example: Predicting Housing Prices


We might use a regression model, such as linear regression, to predict house prices from the houses' features. The model learns the relationship between the features and the house prices from a dataset of historical housing data.
In this example, "data" refers to the information used to train the machine learning model, and "features" are the specific characteristics of the houses that are used as input to make predictions about their prices.

● Features: These are the characteristics or attributes of the house that we use to
make predictions. Features can include the number of bedrooms, square footage,
neighborhood, proximity to schools, year built, number of bathrooms, etc.

● Labels: The label is what we are trying to predict; in this case, the price of the house.
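A minimal sketch of how features and labels map onto a regression fit, assuming a single feature (square footage) and a made-up four-house dataset. The function name `fit_simple_linear_regression` is hypothetical, and the closed-form least-squares formulas are used instead of a library:

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = b0 + b1 * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical toy data, not a real dataset.
square_footage = [1000, 1500, 2000, 2500]        # feature (input)
prices = [200_000, 275_000, 350_000, 425_000]    # label (what we predict)

b0, b1 = fit_simple_linear_regression(square_footage, prices)
predicted_price = b0 + b1 * 1800  # estimate for an unseen 1,800 sq ft house
print(b0, b1, predicted_price)
```

With more features (bedrooms, year built, and so on) the same idea extends to multiple linear regression, where one coefficient is learned per feature.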
Machine Learning Methods

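Continuous data can take any value within a range. These values are measurable, and there are infinitely many possible values between any two points.

Discrete data consists of distinct, separate values. These values are countable, and there are no intermediate values between them.

Predicting a continuous target (such as a house price) is a regression task; predicting a discrete target (such as a class label) is a classification task.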
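K-Nearest Neighbor (KNN)
Motivation

Lazy learning? KNN is called a lazy learner because it does not build a model during training; it simply stores the training data and defers all computation to prediction time.
How does KNN work?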

Step 1: Select the value of K.
Step 2: Calculate the distance from the query point to every training point.
Step 3: Find the K nearest neighbors.
Step 4: Vote for the majority class (classification) OR take the average of the neighbors' values (regression).
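A brute-force sketch of the four steps, assuming Euclidean distance and plain majority voting. The function name `knn_predict` and the toy points are hypothetical:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict for one query point with plain (brute-force) KNN."""
    # Step 1: k is chosen by the caller.
    # Step 2: Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest training points.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    labels = [label for _, label in neighbors]
    # Step 4: majority vote for classification, mean for regression.
    if task == "classification":
        return Counter(labels).most_common(1)[0][0]
    return sum(labels) / len(labels)


train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_classes = ["A", "A", "A", "B", "B", "B"]
train_values = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]

print(knn_predict(train_X, train_classes, (1.5, 1.5), k=3))
print(knn_predict(train_X, train_values, (6.5, 6.5), k=3, task="regression"))
```

Note that nothing happens before prediction except storing `train_X` and `train_y`, which is exactly the "lazy learning" behavior.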
Calculating Distance

Minkowski distance
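For two points x and y with n features, the Minkowski distance is D(x, y) = (Σ_{i=1..n} |x_i - y_i|^p)^(1/p). Setting p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and as p grows without bound it approaches the Chebyshev distance max_i |x_i - y_i|. A small sketch (function names are illustrative):

```python
def minkowski(a, b, p=2):
    """Minkowski distance (sum |a_i - b_i| ** p) ** (1 / p); requires p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for Minkowski to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def chebyshev(a, b):
    """Limit of the Minkowski distance as p grows without bound."""
    return max(abs(x - y) for x, y in zip(a, b))


point_a, point_b = (0, 0), (3, 4)
print(minkowski(point_a, point_b, p=1))  # Manhattan
print(minkowski(point_a, point_b, p=2))  # Euclidean
print(chebyshev(point_a, point_b))
```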
Example: Use the Iris dataset to build a simple KNN classification model.

Why do we need to standardize the data?

Why does KNN need a fit method here?
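Because KNN compares raw distances, a feature measured on a large scale can dominate one measured on a small scale. Below is a minimal z-score standardization sketch, assuming the per-feature mean and standard deviation are learned ("fit") from the training rows only and then reused on any new rows. The helper names and the two-feature toy rows are hypothetical:

```python
import math
import statistics


def fit_standardizer(rows):
    """Learn per-feature mean and (population) standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std per feature; constant features map to 0."""
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical rows: [annual income in dollars, age in years].
train_rows = [[50_000, 25], [60_000, 35], [70_000, 45]]
means, stds = fit_standardizer(train_rows)
scaled = standardize(train_rows, means, stds)

# The raw distance is driven almost entirely by income; after scaling,
# both features contribute on the same scale.
print(math.dist(train_rows[0], train_rows[1]), math.dist(scaled[0], scaled[1]))
```

In scikit-learn, the same "fit on training data, then transform" pattern is what `StandardScaler` provides, and a KNN estimator's `fit` stores the training data (and, depending on its `algorithm` setting, may build a search structure such as a K-D tree).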


How to select K? Elbow method
Choose K at the "elbow", where the error stops decreasing significantly.

Heuristic method?

K small (k = 1, 2, 3):
- Model becomes very sensitive to noise or outliers.
- Can lead to overfitting (too specific to the training data).

K large:
- Model becomes too general, losing local structure.
- Can lead to underfitting (too simplistic).

Common rule: K ≈ √N, where N is the number of training samples.
Also, use an odd K in binary classification to avoid ties.
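The √N rule of thumb combined with the odd-K advice can be sketched as follows. The helper name is hypothetical, and rounding to the nearest integer (rather than truncating) is an assumption; the result is only a starting point to check with validation:

```python
import math


def heuristic_k(n_train_samples, binary_classification=True):
    """Rule-of-thumb K = round(sqrt(N)), bumped to the next odd number for binary tasks."""
    k = max(1, round(math.sqrt(n_train_samples)))
    if binary_classification and k % 2 == 0:
        k += 1  # an odd K avoids tied votes between two classes
    return k


print(heuristic_k(100), heuristic_k(150), heuristic_k(150, binary_classification=False))
```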
How to select K? K-fold cross validation
How to select K? K-fold cross-validation for time series

Ensures:
● No shuffling
● No leakage
● Respect for time order

"No leakage" means that the model only sees past or allowed data, and no information from the future (or the test set) is leaked into training.

- This approach respects the chronological order of the data.

How it works:
- Each fold trains on past data and tests on future data.
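A minimal expanding-window splitter, similar in spirit to scikit-learn's `TimeSeriesSplit` but written from scratch here (the function name is hypothetical). Indices are never shuffled, and every test block comes strictly after its training window:

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs where training always precedes testing."""
    test_size = n_samples // (n_splits + 1)
    if test_size < 1:
        raise ValueError("too many splits for this many samples")
    # Any leftover samples go into the first (earliest) training window.
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(0, test_start))                      # only the past
        test_idx = list(range(test_start, test_start + test_size))  # the next block in time
        yield train_idx, test_idx


for train_idx, test_idx in time_series_splits(n_samples=10, n_splits=4):
    print(train_idx, "->", test_idx)
```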


How to select K? K-fold cross-validation

● For each candidate K (number of neighbors), run KNN with k-fold cross-validation.
● Calculate the average accuracy for each K.
● Plot K vs. average accuracy.
● Choose the K with the highest average accuracy.
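A from-scratch sketch of this procedure, assuming ordinary shuffled k-fold splits (unlike the time-series variant described above) and a hypothetical two-blob toy dataset. To avoid confusing the two "k"s, the number of folds is called `n_folds` and the number of neighbors is `k`:

```python
import math
import random
from collections import Counter


def knn_classify(train_X, train_y, query, k):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle indices once, then deal them into n_folds roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def cv_accuracy(X, y, k, n_folds=5):
    """Average accuracy of KNN with this k across the folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_classify(train_X, train_y, X[i], k) == y[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / len(fold_scores)


def select_k(X, y, candidates, n_folds=5):
    """Return the K with the highest mean CV accuracy (ties go to the earliest candidate)."""
    scores = {k: cv_accuracy(X, y, k, n_folds) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: two Gaussian blobs in 2-D.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
X += [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(20)]
y = [0] * 20 + [1] * 20

best_k, scores = select_k(X, y, candidates=[1, 3, 5, 7])
print(scores, "best K:", best_k)
```

Plotting `scores` against the candidate K values gives the "K vs. average accuracy" curve from the steps above.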


Question

Given the same:
- Dataset
- Algorithm
- Distance metric

why is there a difference between the two graphs?
Space to enhance: Speed up, Fine tuning

Parameters: values that the model learns from the data during training.
Hyperparameters: values that are set before training; they control the learning process or model behavior.

KNN
Parameters: none
● KNN is a non-parametric model: it doesn't learn internal parameters from data.
● It just stores the training data.
Hyperparameters:
● n_neighbors (K): how many neighbors to consider.
● metric: distance function (e.g., 'euclidean', 'manhattan', 'chebyshev').
● weights: uniform or distance-based weighting.

Linear Regression
Parameters:
● Coefficients β1, β2, …, βn
● Intercept β0
These are learned during training to minimize the error.
Hyperparameters:
If using regularization (e.g., Ridge/Lasso):
● alpha or lambda: regularization strength.
If using gradient descent to train:
● learning_rate
● Number of iterations or epochs
Weights in KNN

● Uniform Weights

● Distance Weights

● User-Defined Weights
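A sketch of the three weighting schemes applied to the final vote, assuming the K nearest neighbors have already been found as (distance, label) pairs. Distance weighting uses w = 1 / d with a small `eps` guard (libraries may handle zero distances differently), and a "user-defined" weight is modeled as any callable from distance to weight; the Gaussian kernel is only an illustrative choice:

```python
import math
from collections import defaultdict


def weighted_vote(neighbors, weighting="uniform", eps=1e-9):
    """Vote among (distance, label) pairs for the K nearest neighbors.

    weighting: "uniform", "distance" (w = 1 / d), or a callable d -> w (user-defined).
    """
    scores = defaultdict(float)
    for distance, label in neighbors:
        if weighting == "uniform":
            weight = 1.0
        elif weighting == "distance":
            weight = 1.0 / (distance + eps)  # eps guards against division by zero
        elif callable(weighting):
            weight = weighting(distance)
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
        scores[label] += weight
    return max(scores, key=scores.get)


neighbors = [(0.5, "A"), (3.0, "B"), (3.5, "B")]  # hypothetical K = 3 neighbors
print(weighted_vote(neighbors, "uniform"))
print(weighted_vote(neighbors, "distance"))
print(weighted_vote(neighbors, lambda d: math.exp(-d ** 2)))  # Gaussian kernel
```

With uniform weights the two farther "B" neighbors outvote the single close "A"; with distance-based weights the much closer "A" wins.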
Pros/Cons

Pros
● Simple to use: easy to understand and implement.
● No training step: there is no need to train, as it just stores the data and uses it during prediction.
● Few hyperparameters: you only need to set the number of neighbors (k) and a distance method.
● Versatile: works for both classification and regression problems.

Cons
● Slow with large data: needs to compare the query against every training point during prediction.
● Struggles with many features: accuracy drops when the data has too many features.
● Can overfit: especially when the data is high-dimensional or not clean.
Space to enhance: Speed up with a K-D Tree

A K-D Tree is a binary tree used to organize points in a k-dimensional space (here k is the number of features, not the K of KNN). It enables efficient operations like:
● Nearest neighbor search
● Range search
● Spatial partitioning

How does this work?
1. Pick one feature to split on.
2. Find the median of that feature.
3. Split the dataset into approximately equal halves at the median.
4. Pick the next feature and repeat steps 2 and 3 on each half.
5. Continue until all data points are partitioned.
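A compact sketch of building a K-D tree by median splits and using it for nearest-neighbor search. It cycles through the features in order by tree depth, which is one common choice; the class and function names and the six example points are illustrative:

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KDNode:
    point: Tuple[float, ...]
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through features by depth."""
    if not points:
        return None
    axis = depth % len(points[0])                  # next feature in turn
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                        # median index
    return KDNode(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest_neighbor(node, query, best=None):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest_neighbor(near, query, best)
    # Only search the other half if the splitting plane is closer than the best so far.
    if abs(diff) < best[0]:
        best = nearest_neighbor(far, query, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest_neighbor(tree, (9, 2)))
```

The pruning test is what makes the search faster than brute force: whole subtrees on the far side of a split are skipped when they cannot contain a closer point.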
Quiz

1. Which of the following is NOT a step in the KNN algorithm?
A. Choose the number of neighbors (K)
B. Calculate the distance between the test point and training points
C. Train a model to learn weights
D. Assign the label based on majority vote of neighbors

2. What is a potential drawback of the KNN algorithm?
A. It requires a lot of training time
B. It does not work with numerical features
C. It's sensitive to feature scaling
D. It cannot be used for classification problems

3. What happens if K is set to 1?
A. It always chooses the most frequent class
B. It becomes very sensitive to noise in the data
C. It averages the labels of 3 neighbors
D. It ignores the closest point

4. How does increasing the value of K affect the KNN algorithm?
A. It makes the model more complex and likely to overfit
B. It makes the model less sensitive to noise
C. It increases the risk of underfitting
D. Both B and C
Linear Regression
Linear regression relies on the assumption that the hidden true relationship between the features and the target is linear.
Train / Test / Validation
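A minimal three-way split sketch, assuming independent samples (so shuffling is allowed; time-ordered data should not be shuffled) and an illustrative 70/15/15 ratio. The training set fits the parameters, the validation set is used to tune hyperparameters such as the regularization strength or learning rate, and the test set is held back for one final evaluation. The function name is hypothetical:

```python
import random


def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then carve off test and validation sets; the rest is training."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```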
Example: Use the Diabetes dataset to build a linear regression model.

Scale data

Inverse scale
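A sketch of scaling a target and inverting the scaling afterwards, assuming z-score scaling and made-up target values. The class name `TargetScaler` is hypothetical; scikit-learn's `StandardScaler` offers the same `fit` / `transform` / `inverse_transform` idea. If a model is trained on scaled targets, its predictions come out in scaled units and must be mapped back before they are reported or compared with the original values:

```python
import statistics


class TargetScaler:
    """Hypothetical helper: z-score scaling with an exact inverse."""

    def fit(self, values):
        self.mean = statistics.mean(values)
        self.std = statistics.pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def inverse_transform(self, scaled_values):
        return [z * self.std + self.mean for z in scaled_values]


y_train = [100.0, 150.0, 200.0, 250.0]   # hypothetical target values
scaler = TargetScaler().fit(y_train)
y_scaled = scaler.transform(y_train)      # what the model would be trained on
# Predictions made in scaled units are mapped back to original units.
y_back = scaler.inverse_transform(y_scaled)
print(y_scaled, y_back)
```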
How does this work?

Mathematical model

Simplify

Linear regression finds the coefficients β that minimize the error between the predicted values ŷ and the actual values y.

The most common way to measure this error is the Mean Squared Error (MSE).
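A sketch of the standard formulation, assuming m training samples and n features and reusing the β0, …, βn notation from the parameters slide (the learning rate is written η here to avoid a clash with the regularization strength alpha):

```latex
% Model for sample i
\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_n x_{in}, \qquad i = 1, \dots, m

% Matrix form ("simplify"): prepend a column of ones to X so that \beta_0 is absorbed into \boldsymbol{\beta}
\hat{\mathbf{y}} = X\boldsymbol{\beta}, \qquad \boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_n)^{\top}

% Mean Squared Error
\mathrm{MSE}(\boldsymbol{\beta}) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{m}\,\lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^2

% Gradient and gradient-descent update with learning rate \eta
\nabla_{\boldsymbol{\beta}}\,\mathrm{MSE} = -\frac{2}{m}\,X^{\top}\left(\mathbf{y} - X\boldsymbol{\beta}\right), \qquad \boldsymbol{\beta} \leftarrow \boldsymbol{\beta} - \eta\,\nabla_{\boldsymbol{\beta}}\,\mathrm{MSE}
```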
How does this work? Gradient descent

Learning rate (hyperparameter)

Effect of the learning rate
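A minimal full-batch gradient descent sketch for one feature, minimizing the MSE of ŷ = b0 + b1·x on noise-free toy data generated from y = 1 + 2x. It illustrates the effect of the learning rate: a small enough rate converges, while a rate that is too large makes every step overshoot and the coefficients blow up. The data and the two rates are illustrative choices:

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    """Full-batch gradient descent on MSE for y_hat = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]   # y_hat - y
        grad_b0 = (2 / m) * sum(errors)
        grad_b1 = (2 / m) * sum(e * x for e, x in zip(errors, xs))
        b0 -= learning_rate * grad_b0   # step against the gradient
        b1 -= learning_rate * grad_b1
    return b0, b1


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]    # noise-free toy data from y = 1 + 2x

print(gradient_descent(xs, ys, learning_rate=0.05))             # converges
print(gradient_descent(xs, ys, learning_rate=0.2, epochs=100))  # too large: diverges
```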
Space to enhance

Variants of Gradient Descent for Linear Regression
● Batch Gradient Descent
● Stochastic Gradient Descent
● Mini-Batch Gradient Descent
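One function can cover all three variants through its batch size: `batch_size = len(xs)` is batch gradient descent, `batch_size = 1` is stochastic gradient descent, and anything in between is mini-batch gradient descent. A sketch on the same kind of noise-free toy data (y = 1 + 2x); the function name and hyperparameter values are illustrative:

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.05, epochs=1000, seed=0):
    """Gradient descent on MSE for y_hat = b0 + b1 * x, updating once per batch."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(b0 + b1 * xs[i]) - ys[i] for i in batch]
            grad_b0 = (2 / len(batch)) * sum(errors)
            grad_b1 = (2 / len(batch)) * sum(e * xs[i] for e, i in zip(errors, batch))
            b0 -= learning_rate * grad_b0
            b1 -= learning_rate * grad_b1
    return b0, b1


xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
for batch_size in (1, 2, len(xs)):   # SGD, mini-batch, full batch
    print(batch_size, minibatch_gd(xs, ys, batch_size))
```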


Space to enhance

● Epochs ensure data completeness: an epoch is one complete pass through the entire training dataset; each additional epoch lets the model refine its parameters further.

● Batch size affects training efficiency: the batch size is the number of samples processed in each batch. A larger batch size lets the model process more data at once; smaller batches, on the other hand, provide more frequent updates.

● Iterations update the model: an iteration occurs each time a batch is processed. The model computes the loss on that batch and updates its parameters (weights) based on that loss.
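The relationship between the three terms is simple arithmetic: iterations per epoch = ceil(number of samples / batch size), and total iterations = epochs × iterations per epoch. A sketch, assuming the final partial batch is kept rather than dropped (helper names are hypothetical):

```python
def iterations_per_epoch(n_samples, batch_size):
    """Number of batches (iterations) in one full pass, keeping a final partial batch."""
    return -(-n_samples // batch_size)  # ceiling division with integers


def total_iterations(n_samples, batch_size, epochs):
    return epochs * iterations_per_epoch(n_samples, batch_size)


# Hypothetical setup: 1,000 training samples, batches of 32, 10 epochs.
print(iterations_per_epoch(1000, 32))   # 31 full batches + 1 partial batch of 8
print(total_iterations(1000, 32, 10))
```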

Space to enhance

Regularization is a technique that adds a penalty to the loss function during training to discourage the model from fitting the noise or becoming too complex.

● Ridge: L2 penalty
● Lasso: L1 penalty
● Elastic Net: combines the L1 and L2 penalties
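A sketch of the three penalized losses, assuming the MSE as the base loss, a penalty on β1..βn only (the intercept β0 is usually left unpenalized), and an `l1_ratio` mixing weight for Elastic Net. Constant factors and parameter names differ between libraries, so treat this as an illustration of the shape of each penalty rather than any library's exact objective:

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def regularized_loss(y_true, y_pred, coefs, alpha=1.0, kind="ridge", l1_ratio=0.5):
    """MSE plus a penalty on the coefficients beta_1..beta_n (intercept excluded)."""
    l1 = sum(abs(b) for b in coefs)       # Lasso (L1) penalty term
    l2 = sum(b * b for b in coefs)        # Ridge (L2) penalty term
    if kind == "ridge":
        penalty = alpha * l2
    elif kind == "lasso":
        penalty = alpha * l1
    elif kind == "elastic_net":
        penalty = alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return mse(y_true, y_pred) + penalty


y_true, y_pred = [3.0, 5.0], [2.5, 5.5]   # hypothetical targets and predictions
coefs = [1.0, -2.0]                        # hypothetical beta_1, beta_2
for kind in ("ridge", "lasso", "elastic_net"):
    print(kind, regularized_loss(y_true, y_pred, coefs, alpha=0.5, kind=kind))
```

Larger `alpha` means a stronger penalty; the L1 term tends to push some coefficients exactly to zero, while the L2 term shrinks all of them smoothly.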
Space to enhance

Gradient Descent with Momentum

Further reading: NAG, AdaGrad, Adam, RMSprop
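A minimal sketch of the (heavy-ball) momentum update on a single parameter, using the toy loss f(w) = (w - 3)², whose gradient is 2(w - 3). The velocity keeps a decaying sum of past gradients, so consistent directions build up speed; with `beta = 0` the update reduces to plain gradient descent. The hyperparameter values are illustrative:

```python
def momentum_gd(grad, w0, learning_rate=0.1, beta=0.9, steps=500):
    """Gradient descent with (heavy-ball) momentum on a single parameter."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        # The velocity accumulates a decaying sum of past gradients.
        velocity = beta * velocity - learning_rate * grad(w)
        w = w + velocity
    return w


def grad_f(w):
    """Gradient of the toy loss f(w) = (w - 3) ** 2."""
    return 2 * (w - 3)


print(momentum_gd(grad_f, w0=0.0))             # with momentum
print(momentum_gd(grad_f, w0=0.0, beta=0.0))   # beta = 0 is plain gradient descent
```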


Regression Lab

House Price Regression Dataset

Features:

1. Square_Footage: The size of the house in square feet. Larger homes typically have higher prices.
2. Num_Bedrooms: The number of bedrooms in the house. More bedrooms generally increase the value of a home.
3. Num_Bathrooms: The number of bathrooms in the house. Houses with more bathrooms are typically priced higher.
4. Year_Built: The year the house was built. Older houses may be priced lower due to wear and tear.
5. Lot_Size: The size of the lot the house is built on, measured in acres. Larger lots tend to add value to a property.
6. Garage_Size: The number of cars that can fit in the garage. Houses with larger garages are usually more expensive.
7. Neighborhood_Quality: A rating of the neighborhood’s quality on a scale of 1-10, where 10 indicates a high-quality neighborhood. Better neighborhoods usually command higher prices.
8. House_Price (Target Variable): The price of the house, which is the dependent variable you aim to predict.
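Logistic Regression
Motivation
Types of Logistic Regression

● Binomial: two classes, e.g., Yes/No, True/False, Positive/Negative -> Encode: 0/1
● Multinomial: three or more unordered classes, e.g., Class A, B, C -> Encode: 0/1/2
● Ordinal: three or more ordered classes, e.g., Low/Medium/High -> Encode: Low = 0 | Medium = 1 | High = 2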
How does this work?

The predicted probabilities over all classes sum to 1: Sum(Pi) = 1.
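A sketch of the two functions that turn a linear score into probabilities, assuming the Sum(Pi) = 1 condition refers to the class probabilities a logistic model outputs. The sigmoid gives P(class 1) in the binomial case (P(class 0) = 1 - P(class 1)), and softmax generalizes this to several classes; both are written to avoid overflow for large inputs:

```python
import math


def sigmoid(z):
    """P(class 1) for a binomial model; written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    largest = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for classes A, B, C
print(probs, sum(probs))
print(sigmoid(1.5), softmax([1.5, 0.0])[0])  # two-class softmax matches the sigmoid
```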
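How does this work?

Loss function: cross-entropy (average surprise)

Surprise (S) = 1/P (inverse of probability)

When P = 1 -> S = 1/1 = 1 (nonsense: a certain outcome should cause no surprise at all)

● Use a log to rescale: S = log(1/P) = -log(P)

When P = 1 -> S = log(1/1) = 0 -> No surprise

When P -> 0 -> S = log(1/P) = log(1) - log(P) -> +∞ (infinite surprise; log(0) itself does not exist)

Cross-entropy is the average surprise of the true labels under the model's predicted probabilities.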
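A sketch of surprise and binary cross-entropy, where each sample contributes -[y·log(p) + (1 - y)·log(1 - p)] and the loss is the average over samples. Probabilities are clipped by a small `eps` to avoid log(0); that clipping value is an illustrative choice:

```python
import math


def surprise(p):
    """S = log(1 / p): 0 for a certain event, growing without bound as p -> 0."""
    return math.log(1 / p)


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Average surprise of the true labels under predicted P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(surprise(1.0), surprise(0.5))
print(binary_cross_entropy([1], [0.9]), binary_cross_entropy([1], [0.1]))
```

A confident correct prediction (p = 0.9 for a true label of 1) gives a small loss, while a confident wrong one (p = 0.1) gives a much larger loss.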
Example: Use the Breast Cancer dataset to build a binomial logistic regression model.
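A from-scratch sketch of training a binomial logistic regression model with full-batch gradient descent on the average cross-entropy. To stay self-contained it uses a hypothetical, already-centered one-feature toy dataset rather than the Breast Cancer data; in practice, scikit-learn's `load_breast_cancer` and `LogisticRegression` would be the usual route. Names and hyperparameters here are illustrative:

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """P(y = 1 | x) for weights w, bias b, and one feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Full-batch gradient descent on the average binary cross-entropy."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi   # d(cross-entropy) / d(logit)
            for j in range(n_features):
                grad_w[j] += err * xi[j] / m
            grad_b += err / m
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    return int(predict_proba(w, b, x) >= threshold)


# Hypothetical, already-centered one-feature toy data (class 1 for positive x).
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(X, y)
print(w, b, [predict(w, b, x) for x in X])
```

Adding an L1 or L2 penalty on `w` to this loss is the regularization enhancement listed below.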
Enhancement?

Regularization L1, L2
