Machine Learning 101
Types of Machine Learning
Supervised Learning
● Input data is labelled
● Uses training dataset
● Used for prediction
● Classification | Regression
Unsupervised Learning
● Input data is unlabeled
● Uses just input dataset
● Used for analysis
● Clustering | Density estimation | Dimensionality reduction
Example Machine Learning
Machine Learning Example: Predicting Housing Prices
We might use a regression model, like linear regression, to predict house prices based on data
features. The model learns the relationship between the features and the house prices from a
dataset of historical housing data.
In this example, "data" refers to the information used to train the machine learning model, and
"features" are the specific characteristics of the houses that are used as input to make
predictions about their prices.
● Features: These are the characteristics or attributes of the house that we use to
make predictions. Features can include the number of bedrooms, square footage,
neighborhood, proximity to schools, year built, number of bathrooms, etc.
● Labels: This is what we're trying to predict, in this case, the price of the house.
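To make the features/labels split concrete, here is a minimal sketch that fits a one-feature linear regression with the closed-form least-squares formulas. The data values are made up for illustration, and the helper names (`fit_simple_linear_regression`, `predict_price`) are hypothetical, not part of any library.

```python
# Hypothetical toy data: each position describes one house.
# Feature: square footage. Label: sale price in dollars.
square_footage = [800, 1000, 1200, 1500, 2000]
prices = [160_000, 200_000, 240_000, 300_000, 400_000]


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the relationship between the feature and the label.
slope, intercept = fit_simple_linear_regression(square_footage, prices)


def predict_price(sqft):
    """Predict the label (price) for a new feature value (square footage)."""
    return slope * sqft + intercept


print(predict_price(1800))
```

A real model would use several features at once (bedrooms, year built, and so on); a single feature keeps the arithmetic easy to follow.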
Machine Learning Methods
Continuous data can take any value within a range. These values are measurable, and there are
infinite possible values between any two points.
Discrete data consists of distinct, separate values. These values are countable, and there are no
intermediate values between them.
K-Nearest Neighbor (KNN)
Motivation
Lazy learning?
How does KNN work?
Step 1: Select the value of K
Step 2: Calculate the distance
Step 3: Find the nearest neighbors
Step 4: Vote for classification OR take the average for regression
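The four steps above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance and made-up toy data; the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def k_nearest(train_X, train_y, query, k):
    """Steps 2 and 3: compute all distances, return the targets of the k closest points."""
    distances = [(euclidean(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    return [y for _, y in distances[:k]]


def knn_classify(train_X, train_y, query, k):
    """Step 4 (classification): majority vote among the k nearest labels."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return Counter(neighbors).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k):
    """Step 4 (regression): average of the k nearest target values."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical toy data: two well-separated groups of 2-D points.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
labels = ["A", "A", "A", "B", "B", "B"]
values = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

# Step 1: select K.
K = 3
print(knn_classify(X, labels, (2.0, 2.0), K))
print(knn_regress(X, values, (6.5, 6.5), K))
```

Note that there is no training loop: all the work happens at prediction time, which is why KNN is often called a lazy learner.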
Calculating Distance
Minkowski distance
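As a sketch, the Minkowski distance of order p between points a and b is (Σ |a_i − b_i|^p)^(1/p). With p = 1 it is the Manhattan distance and with p = 2 the Euclidean distance; the Chebyshev distance is the limit as p grows without bound. The function names below are illustrative.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p >= 1) between two equal-length points."""
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)


def chebyshev_distance(a, b):
    """Limit of the Minkowski distance as p goes to infinity: the largest coordinate gap."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


a, b = (0, 0), (3, 4)
print(minkowski_distance(a, b, p=1))  # Manhattan
print(minkowski_distance(a, b, p=2))  # Euclidean
print(chebyshev_distance(a, b))
```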
Example: Use the Iris Dataset to build a simple KNN classification model
Why do we need to standardize the data?
Why does KNN need the fit method here?
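A possible version of this example with scikit-learn is sketched below; the split ratio, random seed, and K = 5 are arbitrary choices, not values taken from the slides. Standardizing matters because KNN relies on distances, so a feature measured on a larger scale would otherwise dominate. In scikit-learn, `fit` does not learn weights for KNN; it stores the training data (and may build a search index) so that `predict` can look up neighbors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out part of the data for testing (30% and the seed are arbitrary choices).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training set only, then apply it to both sets,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# fit() stores the training data; the neighbor search happens in predict()/score().
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

accuracy = knn.score(X_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```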
How to select K? Elbow method
Choose the K at the "elbow", where the error stops decreasing significantly.
How to select K? Heuristic method
K small (k = 1, 2, 3)
- Model becomes very sensitive to noise or outliers.
- Can lead to overfitting (too specific to training data).
K large
- Model becomes too general, losing local structure.
- Can lead to underfitting (too simplistic).
Common rule
- K ≈ √N, where N is the number of training samples.
- Also, use odd K in binary classification to avoid ties.
How to select K? K-fold cross validation
How to select K? K-fold cross validation for Time Series
Ensures:
● No shuffling
● No leakage
● Respect for time order
"No leakage" means the model only sees past or allowed data, and no information from the future
(or test set) is leaked into training.
How it works:
- This approach respects the chronological order of data.
- Each fold trains on past data and tests on future data.
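One way to produce such folds is an expanding training window, sketched below. The helper `time_series_splits` is hypothetical and its fold sizing is a simplification; scikit-learn offers `TimeSeriesSplit` for the same purpose.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs with an expanding training window.

    Indices are assumed to be in chronological order. There is no shuffling,
    and every test index is later than every training index.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for i in range(1, n_splits + 1):
        train_end = i * fold_size
        # The last fold takes whatever samples remain.
        test_end = train_end + fold_size if i < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))


for train_idx, test_idx in time_series_splits(n_samples=10, n_splits=4):
    print("train:", train_idx, "test:", test_idx)
```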
How to select K? K-fold cross validation
● For each candidate K (number of neighbors), run KNN
with k-fold cross-validation.
● Calculate the average accuracy for each K.
● Plot K vs. average accuracy.
● Choose K with the highest average accuracy.
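The procedure above can be sketched without any library. The data here is synthetic (two noisy 2-D clusters), and the fold count, seeds, and candidate list are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k_neighbors, n_folds=5, seed=0):
    """Average accuracy of KNN over n_folds shuffled cross-validation folds."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbors) == y[i] for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / n_folds


# Synthetic toy data: two noisy clusters, 30 points each.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(30)]
y = [0] * 30 + [1] * 30

# Run cross-validation for each candidate K and keep the best average accuracy.
candidates = [1, 3, 5, 7, 9]
avg_accuracy = {k: cross_val_accuracy(X, y, k) for k in candidates}
best_k = max(avg_accuracy, key=avg_accuracy.get)
print(avg_accuracy, "best K:", best_k)
```

The same folds are reused for every candidate K (fixed seed), so differences in accuracy come from K rather than from a different split.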
Question
Given the same
- Dataset
- Algorithm
- Distance metric
why is there a difference between the two graphs?
Space to enhance: Speed up | Fine Tuning
Parameters: Values that the model learns from the data during training.
Hyperparameters: Values that are set before training; they control the learning process or model
behavior.
KNN
Parameters: None
● KNN is a non-parametric model: it doesn't learn internal parameters from data.
● It just stores the training data.
Hyperparameters:
● n_neighbors (K): how many neighbors to consider.
● metric: distance function (e.g., 'euclidean', 'manhattan', 'chebyshev').
● weights: uniform or distance-based weighting.
Linear Regression
Parameters:
● Coefficients β1, β2, …, βn
● Intercept β0
These are learned during training to minimize the error.
Hyperparameters:
If using regularization (e.g., Ridge/Lasso):
● alpha or lambda: regularization strength.
If using gradient descent to train:
● learning_rate
● Number of iterations (epochs)
Weights in KNN
● Uniform Weights
● Distance Weights
● User-Defined Weights
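The three weighting schemes can be compared on the same set of neighbors. This is a minimal sketch: the `weighted_vote` helper, the small epsilon that avoids division by zero, and the Gaussian-style custom weight are all illustrative assumptions.

```python
import math
from collections import defaultdict


def weighted_vote(neighbors, weights="uniform"):
    """Pick a label from (distance, label) pairs.

    weights: "uniform" (every neighbor counts equally), "distance"
    (closer neighbors count more, weight = 1 / distance), or a callable
    mapping a distance to a weight (user-defined).
    """
    totals = defaultdict(float)
    for distance, label in neighbors:
        if weights == "uniform":
            w = 1.0
        elif weights == "distance":
            w = 1.0 / (distance + 1e-12)  # epsilon is an assumption to avoid dividing by zero
        elif callable(weights):
            w = weights(distance)
        else:
            raise ValueError(f"unknown weights option: {weights!r}")
        totals[label] += w
    return max(totals, key=totals.get)


# One very close "A" neighbor and two farther "B" neighbors.
neighbors = [(0.5, "A"), (2.0, "B"), (2.5, "B")]

print(weighted_vote(neighbors, "uniform"))
print(weighted_vote(neighbors, "distance"))
print(weighted_vote(neighbors, lambda d: math.exp(-d ** 2)))
```

With uniform weights the two "B" neighbors outvote the single "A"; once closeness is taken into account, the nearby "A" wins.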
Pros/Cons
Pros
● Simple to use: Easy to understand and implement.
● No training step: No need to train as it just stores the data and uses it during prediction.
● Few parameters: Only needs to set the number of neighbors (k) and a distance method.
● Versatile: Works for both classification and regression problems.
Cons
● Slow with large data: Needs to compare every point during prediction.
● Struggles with many features: Accuracy drops when data has too many features.
● Can overfit: It can overfit especially when the data is high-dimensional or not clean.
Space to enhance: Speed up
K-D Tree
A K-D Tree is a binary tree used to organize points in a k-dimensional space. It enables
efficient operations like:
● Nearest neighbor search
● Range search
● Spatial partitioning
How does this work?
1. Pick any one feature at random
2. Find the median
3. Split the dataset into approximately equal halves
4. Pick the next feature and repeat steps 2 and 3
5. Continue until all data points are partitioned
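The construction steps can be sketched as a short recursive function. For reproducibility this sketch starts from a fixed feature (`start_axis=0`) instead of a random one and then cycles through the features in order; the `KDNode` class and `build_kd_tree` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                      # the median point stored at this node
    axis: int                         # the feature used to split at this node
    left: Optional["KDNode"] = None   # points with a smaller value on `axis`
    right: Optional["KDNode"] = None  # points with a larger or equal value on `axis`


def build_kd_tree(points: Sequence[Point], depth: int = 0, start_axis: int = 0) -> Optional[KDNode]:
    """Recursively split the points at the median of one feature per level."""
    if not points:
        return None
    k = len(points[0])
    axis = (start_axis + depth) % k            # pick the next feature in turn
    ordered = sorted(points, key=lambda p: p[axis])
    median = len(ordered) // 2                 # split into roughly equal halves
    return KDNode(
        point=ordered[median],
        axis=axis,
        left=build_kd_tree(ordered[:median], depth + 1, start_axis),
        right=build_kd_tree(ordered[median + 1:], depth + 1, start_axis),
    )


def count_nodes(node: Optional[KDNode]) -> int:
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(points)
print(tree.point, tree.left.point, tree.right.point)
```

Only tree construction is shown; a nearest-neighbor query would walk this tree and prune branches that cannot contain a closer point.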
Quiz
1. Which of the following is NOT a step in the KNN algorithm?
A. Choose the number of neighbors (K)
B. Calculate the distance between the test point and training points
C. Train a model to learn weights
D. Assign the label based on majority vote of neighbors
2. What is a potential drawback of the KNN algorithm?
A. It requires a lot of training time
B. It does not work with numerical features
C. It's sensitive to feature scaling
D. It cannot be used for classification problems
3. What happens if K is set to 1?
A. It always chooses the most frequent class
B. It becomes very sensitive to noise in the data
C. It averages the labels of 3 neighbors
D. It ignores the closest point
4. How does increasing the value of K affect the KNN algorithm?
A. It makes the model more complex and likely to overfit
B. It makes the model less sensitive to noise
C. It increases the risk of underfitting
D. Both B and C
Linear Regression
Linear regression relies on the assumption that the hidden true pattern is linear.
Train-Test-Validation
Example: Use the Diabetes Dataset to build a linear regression model
Scale data
Inverse scale
How does this work?
Mathematical Model
Simplify
Linear regression finds the coefficients 𝛽 that minimize the error between the predicted values ŷ
and the actual values 𝑦.
The most common way to measure error is the Mean Squared Error (MSE).
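As a small illustration, MSE is the average of the squared differences between actual and predicted values, MSE = (1/n) Σ (y_i − ŷ_i)². The sketch below evaluates it for two candidate coefficient pairs on made-up data; the names `mean_squared_error` and `predict` are defined locally here.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def predict(x, beta0, beta1):
    """One-feature linear model: intercept beta0 plus coefficient beta1 times x."""
    return beta0 + beta1 * x


# Made-up data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

good_fit = mean_squared_error(ys, [predict(x, 1.0, 2.0) for x in xs])
poor_fit = mean_squared_error(ys, [predict(x, 0.0, 2.0) for x in xs])
print(good_fit, poor_fit)
```

The coefficients that generated the data give the lower error, which is exactly what training searches for.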
How does this work? Gradient descent
Effect of the learning rate hyperparameter
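A minimal sketch of batch gradient descent for a one-feature linear regression, run with three learning rates to show their effect. The data, the learning rates, and the epoch count are illustrative assumptions.

```python
def mse(xs, ys, beta0, beta1):
    """Mean squared error of the line beta0 + beta1 * x on the data."""
    return sum(((beta0 + beta1 * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate, epochs):
    """Minimize the MSE by repeatedly stepping against its gradient."""
    beta0, beta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(beta0 + beta1 * x) - y for x, y in zip(xs, ys)]
        grad_beta0 = 2 / n * sum(errors)
        grad_beta1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        beta0 -= learning_rate * grad_beta0
        beta1 -= learning_rate * grad_beta1
    return beta0, beta1


# Made-up data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

final_mse = {}
for lr in (0.001, 0.05, 0.2):
    b0, b1 = gradient_descent(xs, ys, learning_rate=lr, epochs=200)
    final_mse[lr] = mse(xs, ys, b0, b1)
    print(f"learning_rate={lr}: MSE after 200 epochs = {final_mse[lr]:.3g}")

# With a suitable learning rate and enough epochs the true coefficients are recovered.
beta0, beta1 = gradient_descent(xs, ys, learning_rate=0.05, epochs=2000)
```

On this data a very small learning rate makes slow progress, a moderate one converges, and a too-large one overshoots further on every step and diverges.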
Space to enhance
Variants of Linear Regression
● Batch Gradient Descent
● Stochastic Gradient Descent
● Mini-Batch Gradient Descent
Space to enhance
● Epochs ensure data completeness: An epoch represents one complete pass through the entire
training dataset, allowing the model to refine its parameters with each pass.
● Batch size affects training efficiency: The batch size refers to how many samples are processed
in each batch. A larger batch size allows the model to process more data at once; smaller
batches, on the other hand, provide more frequent updates.
● Iterations update the model: An iteration occurs each time a batch is processed, where the model
computes the loss and updates its weights based on that loss.
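The relationship between epochs, batch size, and iterations can be seen in a small mini-batch training loop. This is a sketch on made-up data; the function name, learning rate, and seed are assumptions. Setting the batch size to the dataset size gives batch gradient descent, and setting it to 1 gives stochastic gradient descent.

```python
import random


def minibatch_gradient_descent(xs, ys, batch_size, epochs, learning_rate=0.05, seed=0):
    """Fit y = beta0 + beta1 * x and count how many parameter updates were made."""
    rng = random.Random(seed)
    beta0, beta1 = 0.0, 0.0
    indices = list(range(len(xs)))
    iterations = 0
    for _ in range(epochs):                      # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(beta0 + beta1 * xs[i]) - ys[i] for i in batch]
            grad0 = 2 / len(batch) * sum(errors)
            grad1 = 2 / len(batch) * sum(e * xs[i] for e, i in zip(errors, batch))
            beta0 -= learning_rate * grad0
            beta1 -= learning_rate * grad1
            iterations += 1                      # one iteration = one batch processed
    return beta0, beta1, iterations


# Made-up data: 8 samples generated from y = 1 + 2x.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
ys = [1 + 2 * x for x in xs]

for batch_size in (8, 4, 1):  # batch, mini-batch, stochastic
    b0, b1, n_iter = minibatch_gradient_descent(xs, ys, batch_size, epochs=500)
    print(f"batch_size={batch_size}: iterations={n_iter}, beta0={b0:.3f}, beta1={b1:.3f}")
```

For the same number of epochs, smaller batches mean more iterations: 8 samples with a batch size of 4 give 2 iterations per epoch.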
Space to enhance
Regularization is a technique that adds a penalty to the loss function during
training to discourage the model from fitting the noise or becoming too complex.
● L1: Lasso
● L2: Ridge
● Elastic Net: combines the L1 and L2 penalties
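The penalties can be written as terms added to the data loss. This is a simplified sketch: it uses MSE as the data loss, leaves the intercept unpenalized, and uses `alpha` and `l1_ratio` as illustrative names; the exact scaling of each term varies between libraries.

```python
def mse(y_true, y_pred):
    """Data loss: mean squared error."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)


def l1_penalty(coefs):
    """Sum of absolute coefficient values (used by Lasso)."""
    return sum(abs(b) for b in coefs)


def l2_penalty(coefs):
    """Sum of squared coefficient values (used by Ridge)."""
    return sum(b * b for b in coefs)


def lasso_loss(y_true, y_pred, coefs, alpha):
    return mse(y_true, y_pred) + alpha * l1_penalty(coefs)


def ridge_loss(y_true, y_pred, coefs, alpha):
    return mse(y_true, y_pred) + alpha * l2_penalty(coefs)


def elastic_net_loss(y_true, y_pred, coefs, alpha, l1_ratio):
    """Mix of both penalties: l1_ratio=1 is pure L1, l1_ratio=0 is pure L2."""
    penalty = l1_ratio * l1_penalty(coefs) + (1 - l1_ratio) * l2_penalty(coefs)
    return mse(y_true, y_pred) + alpha * penalty


# Made-up values: the intercept is not included in coefs.
y_true, y_pred = [1.0, 2.0], [1.0, 3.0]
coefs = [3.0, -4.0]

print(lasso_loss(y_true, y_pred, coefs, alpha=0.1))
print(ridge_loss(y_true, y_pred, coefs, alpha=0.1))
print(elastic_net_loss(y_true, y_pred, coefs, alpha=0.1, l1_ratio=0.5))
```

Larger coefficients raise the penalty, so minimizing the total loss pushes the model toward smaller, simpler coefficient values.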
Space to enhance
Gradient Descent with Momentum
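A minimal sketch of the momentum update on a one-dimensional toy function f(x) = (x − 3)²: a velocity term accumulates past gradients, so steps in a consistent direction build up speed. The function, learning rate, and momentum coefficient are illustrative choices.

```python
def gd_with_momentum(grad, start, learning_rate, momentum, steps):
    """Gradient descent where each step blends the previous velocity with the new gradient."""
    x = start
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad(x)
        x += velocity
    return x


def grad_f(x):
    """Gradient of f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2 * (x - 3)


# Same small learning rate and step budget, with and without momentum.
plain = gd_with_momentum(grad_f, start=0.0, learning_rate=0.01, momentum=0.0, steps=100)
with_momentum = gd_with_momentum(grad_f, start=0.0, learning_rate=0.01, momentum=0.9, steps=100)

print(f"plain GD:      x = {plain:.4f}")
print(f"with momentum: x = {with_momentum:.4f}")
```

On this particular toy problem, with a deliberately small learning rate, the momentum run ends much closer to the minimum after the same number of steps; with other settings the comparison can differ.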
Further reading: NAG, AdaGrad, Adam, RMSprop
Regression Lab
House Price Regression Dataset
Features:
1. Square_Footage: The size of the house in square feet. Larger homes typically have higher prices.
2. Num_Bedrooms: The number of bedrooms in the house. More bedrooms generally increase the value of a
home.
3. Num_Bathrooms: The number of bathrooms in the house. Houses with more bathrooms are typically priced
higher.
4. Year_Built: The year the house was built. Older houses may be priced lower due to wear and tear.
5. Lot_Size: The size of the lot the house is built on, measured in acres. Larger lots tend to add value to a
property.
6. Garage_Size: The number of cars that can fit in the garage. Houses with larger garages are usually more
expensive.
7. Neighborhood_Quality: A rating of the neighborhood’s quality on a scale of 1-10, where 10 indicates a
high-quality neighborhood. Better neighborhoods usually command higher prices.
8. House_Price (Target Variable): The price of the house, which is the dependent variable you aim to predict.
Logistic Regression
Motivation
Types of Logistic Regression
● Binomial: Yes/No, True/False, Positive/Negative -> Encode: 0/1
● Multinomial: Class A, B, C -> Encode: 0/1/2
● Ordinal: Low/Medium/High -> Encode: Low = 0 | Medium = 1 | High = 2
How does this work?
The predicted class probabilities sum to 1: Sum(Pi) = 1
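The sum-to-one property can be checked with the functions usually used to turn scores into probabilities: the sigmoid for two classes and the softmax for several. This is a sketch assuming those standard functions are what the slide refers to; the input scores are made up.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)


def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary case: P(class 1) and P(class 0) sum to 1.
p1 = sigmoid(0.8)
p0 = 1 - p1
print(p1, p0, p1 + p0)

# Multiclass case: one probability per class, summing to 1.
probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))
```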
How does this work?
Loss Function - Cross Entropy: Average Surprise
Surprise (S) = 1/P (inverse of probability)
When P = 1 -> S = 1/1 = 1 (nonsense, a certain outcome should give no surprise)
● Using log to scale: S = log(1/P)
When P = 1 -> S = log(1/P) = log(1/1) = 0 -> No surprise
When P = 0 -> S = log(1/0) = log(1) - log(0), where log(0) does not exist -> Infinite surprise
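The "average surprise" reading of cross entropy can be sketched for the binary case: for each sample, take the probability the model assigned to the true label, convert it to surprise with log(1/P), and average. Returning infinity for P = 0 is a convention chosen for this sketch; the labels and probabilities are made up.

```python
import math


def surprise(p):
    """Surprise of an outcome with probability p: log(1/p)."""
    if p <= 0:
        return math.inf  # convention for this sketch: an impossible outcome is infinitely surprising
    return math.log(1 / p)


def binary_cross_entropy(y_true, p_pred):
    """Average surprise of the true labels, given predicted P(label = 1) for each sample."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        prob_of_truth = p if y == 1 else 1 - p
        total += surprise(prob_of_truth)
    return total / len(y_true)


labels = [1, 0, 1]
confident_and_right = binary_cross_entropy(labels, [0.9, 0.1, 0.9])
confident_and_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.1])

print(surprise(1.0), surprise(0.5))
print(confident_and_right, confident_and_wrong)
```

Predictions that put high probability on the true labels give a low loss; confident but wrong predictions are penalized heavily.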
Example: Use the Breast Cancer Dataset to build a binomial logistic regression model
Enhancement?
Regularization L1, L2