
Unit 3 AI&ML

The document covers supervised learning in machine learning, focusing on linear regression models, classification techniques, and various algorithms such as logistic regression, support vector machines, and decision trees. It explains key concepts including data, model training, evaluation, and optimization methods like gradient descent. Additionally, it discusses evaluation metrics for model performance and introduces Bayesian linear regression and random forests.

Uploaded by

23l138

CS3491

ARTIFICIAL INTELLIGENCE
AND
MACHINE LEARNING
SUPERVISED LEARNING Unit - 3
SUPERVISED LEARNING

Introduction to machine learning – Linear Regression Models:


Least squares, single & multiple variables, Bayesian linear
regression, gradient descent, Linear Classification Models:
Discriminant function – Probabilistic discriminative model -
Logistic regression, Probabilistic generative model – Naive
Bayes, Maximum margin classifier – Support vector machine,
Decision Tree, Random forests

Machine Learning
Machine learning is the process of training a model to make useful
predictions or generate content from data.

ML - core concepts
● Data

● Model

● Training

● Evaluating

● Inference
● Data
Related data is stored in datasets.
Datasets are made up of individual examples, each containing
features and a label.
A dataset is characterized by its size and diversity.
● Model
A model is a collection of numbers (its parameters) that defines the
mathematical relationship between input feature patterns and
output label values. The model discovers these patterns
through training.

● Training
Before a supervised model can make predictions, it must be trained.
To train a model, we give the model a dataset with labeled
examples.
Figure: An ML model making a prediction from a labeled example.

Figure: An ML model updating its predicted value.

Figure: An ML model updating its predictions for each labeled example in the training dataset.
● Evaluating
A model is evaluated by comparing its predictions to the actual values.

Supervised Learning
Supervised learning is a type of machine learning where a model learns from labeled
data, meaning the input data comes with corresponding correct output or "label" values.
Regression

A regression model predicts a numeric value.

Classification

A classification model predicts the category (class) to which an example belongs.
Linear Regression
Linear regression is a statistical technique used to find the relationship
between variables. In an ML context, linear regression finds the relationship
between features and a label.
Figure: A model is created by drawing a best-fit line through the data points.


Linear regression equation
❏ In this example, calculate the weight and bias from the line drawn.
❏ The bias is 30 (where the line intersects the y-axis), and the weight is −3.6 (the slope of the line). The model is therefore defined as y′ = 30 + (−3.6)(x1) and can be used to make predictions.
❏ For instance, using this model, a 4,000-pound car (x1 = 4, with weight measured in thousands of pounds) would have a predicted fuel efficiency of 30 + (−3.6)(4) = 15.6 miles per gallon.
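The prediction above can be checked with a few lines of Python. This is only a minimal sketch: the function name `predict_mpg` is illustrative, and it assumes the feature x1 is the car's weight in thousands of pounds, which is what makes the 15.6 figure work out.

```python
def predict_mpg(weight_thousands_lb, bias=30.0, weight=-3.6):
    """Return y' = bias + weight * x1 for the fuel-efficiency model."""
    return bias + weight * weight_thousands_lb


# A 4,000-pound car corresponds to x1 = 4 (assumed unit: thousands of pounds).
prediction = predict_mpg(4.0)
print(round(prediction, 1))  # 15.6
```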
Models with multiple features

For example, a model that predicts gas mileage could additionally use features
such as the following:

● Engine displacement
● Acceleration
● Number of cylinders
● Horsepower
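Mean Squared Error (MSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

● $n$: total number of data points
● $y_i$: the actual (true) value for the ith data point
● $\hat{y}_i$: the value predicted by the model for the ith data point
● $y_i - \hat{y}_i$: the error (residual), i.e. how far off the prediction is
● $(y_i - \hat{y}_i)^2$: the squared error; squaring ensures all errors are positive
● $\sum$: sums up all the squared errors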
Example
Actual: y = [3, 5, 2]
Predicted: ŷ = [2.5, 5.3, 1.7]

1. (3 − 2.5)² = 0.25
2. (5 − 5.3)² = 0.09
3. (2 − 1.7)² = 0.09

MSE = ⅓ (0.25 + 0.09 + 0.09) = 0.43 / 3 ≈ 0.143
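The same calculation can be reproduced in code. The sketch below uses plain Python; the function name `mean_squared_error` is an illustrative choice, not something taken from the slides.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared residuals (actual - predicted)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Values from the worked example above.
mse = mean_squared_error([3, 5, 2], [2.5, 5.3, 1.7])
print(round(mse, 3))  # 0.143
```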


Least Squares Method
The least squares method is a statistical technique used to find the
equation of the best-fitting curve or line for a set of data points by minimizing
the sum of the squared differences between the observed values and the
values predicted by the model.

Regression line / line of best fit

Slope

Intercept
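
The formulas for these three quantities are not reproduced in the text. For reference, the standard closed-form least-squares expressions for a single feature are given here. The symbols $m$ (slope), $b$ (intercept) and $n$ (number of data points) are notation assumed for this sketch and may differ from the slides.

$$\hat{y} = m x + b$$

$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

$$b = \frac{\sum y_i - m\sum x_i}{n} = \bar{y} - m\bar{x}$$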
Example
The table below provides the monthly average petrol prices from April (Month 4) to
September (Month 9).

a. Using linear regression, calculate the best-fit line for the given data.
b. Predict the petrol price in December (Month 12).
c. Interpret the goodness of fit of the regression line using R².

R² measures how well the linear regression line fits the data by quantifying the
proportion of variance in the dependent variable (petrol price) explained by the
independent variable (month).

SST (total sum of squares) ≈ 26.8333
SSR (sum of squared residuals) ≈ 2.8190
R² = 1 − (2.8190/26.8333) ≈ 0.895

An R² of 0.895 means the line explains about 89.5% of the variance in price, indicating a strong fit.

Reference table for the R² calculation (least-squares line ŷ ≈ 72.55 + 1.171x)

Month (x)   Actual (y)   Predicted (ŷ)   Residual (y − ŷ)   Squared residual
4           77           77.24           −0.24              0.06
5           78           78.41           −0.41              0.17
6           81           79.58            1.42              2.01
7           80           80.75           −0.75              0.57
8           82           81.92            0.08              0.01
9           83           83.10           −0.10              0.01

Total SSR ≈ 2.82 (computed from unrounded residuals)
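
Parts (a) to (c) can be recomputed directly from the month and price columns of the reference table. The following is a sketch in plain Python. The variable names are illustrative, and it assumes those two columns are the complete dataset for the exercise.

```python
# Month numbers and actual petrol prices from the reference table.
months = [4, 5, 6, 7, 8, 9]
prices = [77, 78, 81, 80, 82, 83]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(prices) / n

# (a) Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, prices))
s_xx = sum((x - mean_x) ** 2 for x in months)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# (b) Prediction for December (month 12).
december = intercept + slope * 12

# (c) R^2 = 1 - (residual sum of squares / total sum of squares).
predicted = [intercept + slope * x for x in months]
ss_res = sum((y - p) ** 2 for y, p in zip(prices, predicted))
ss_tot = sum((y - mean_y) ** 2 for y in prices)
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"December prediction={december:.2f}")
print(f"R^2={r_squared:.3f}")
```

Run as written, this gives R² ≈ 0.895. Note that December lies outside the range of the observed months, so the prediction is an extrapolation.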
Bayesian linear regression
Bayesian regression is a probabilistic approach to regression in which the model
parameters (such as the weights) are treated as random variables rather than fixed values.

Bayes' theorem is applied to update our beliefs about the parameters after seeing the
data.
Model Assumption (Like Ordinary Linear Regression)

Prior over Weights w

Likelihood Function

Posterior over Weights

Making Predictions
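
The formulas behind these five steps are not reproduced in the text. As an illustration, the sketch below uses one common set of assumptions: a zero-mean Gaussian prior on the weights with precision `alpha`, and Gaussian noise with known precision `beta`. Under those assumptions the posterior is Gaussian with a closed form. The names `alpha`, `beta` and both function names are assumptions of this sketch, not taken from the slides.

```python
import numpy as np


def bayesian_linear_regression(X, y, alpha=1.0, beta=25.0):
    """Posterior N(mean, cov) over weights for prior N(0, I/alpha) and noise precision beta."""
    n_features = X.shape[1]
    precision = alpha * np.eye(n_features) + beta * X.T @ X
    cov = np.linalg.inv(precision)
    mean = beta * cov @ X.T @ y
    return mean, cov


def predict(x_new, mean, cov, beta=25.0):
    """Predictive mean and variance for a single input row x_new."""
    pred_mean = x_new @ mean
    pred_var = 1.0 / beta + x_new @ cov @ x_new
    return pred_mean, pred_var


# Hypothetical data: y = 1 + 2x plus Gaussian noise (std 0.2, i.e. beta = 25).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])  # first column is the bias term
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, size=x.shape)

w_mean, w_cov = bayesian_linear_regression(X, y)
pred_mean, pred_var = predict(np.array([1.0, 0.5]), w_mean, w_cov)
print(w_mean, pred_mean, pred_var)
```

Unlike ordinary least squares, the result is a distribution: `w_cov` and `pred_var` quantify how uncertain the weights and the prediction are.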
Gradient Based Optimization
The goal of optimization in machine learning is to find the parameters of the
model that minimize (or maximize) a loss function. The loss function quantifies
how well the model's predictions match the actual data.
Gradient descent is a widely used optimization algorithm for minimizing the
loss function. It involves the following steps:
1. Compute Loss: At each step, calculate the loss.

2. Compute Gradient: Find the gradient (slope) of the loss function with
respect to the model parameters.

3. Update Parameters: Adjust the parameters in the opposite direction of the
gradient.
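
The three steps above can be sketched for a one-feature linear model trained with MSE loss. This is a minimal illustration and not a tuned implementation. The learning rate, step count and data are hypothetical values chosen so that the loop converges.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y' = b + w * x by minimizing MSE with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(steps):
        errors = [(b + w * x) - y for x, y in zip(xs, ys)]
        # 1. Compute loss (mean squared error).
        history.append(sum(e ** 2 for e in errors) / n)
        # 2. Compute gradient of the loss with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # 3. Update parameters in the direction opposite to the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


# Hypothetical data lying exactly on the line y = 1 + 2x.
w, b, history = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(w, b, history[-1])
```

On this data the parameters should approach w = 2 and b = 1. A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow.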
SIMPLE LINEAR REGRESSION
Example 1
Example 2

Linear Classification Model
Classification
Classification is a fundamental machine learning technique aimed at organizing input data
into distinct classes.

Types of Classification

❖ Binary Classification
Two distinct categories

❖ Multiclass Classification
More than two categories
Characteristics of Classification Models

❖ Class Separation

❖ Decision Boundaries

❖ Sensitivity to Data Quality

❖ Handling Imbalanced Data

Classification Algorithms
Linear Classifiers
Non-Linear Classifiers
Figure: Data points of two classes
Figure: A straight line separating the two classes
How do mean and variance play a role?
Mean
Variance
Variance measures how much each feature spreads.

Correlation measures how features move together.


Predict the Class with argmax function
The argmax function is used to select the class with the highest score or
probability.
Example
import numpy as np

# Example output probabilities from a model (one entry per class)
probabilities = np.array([0.1, 0.8, 0.1])

# Find the index of the maximum probability
predicted_class = np.argmax(probabilities)

print(predicted_class)  # Output: 1
Discriminant Function

A discriminant function is a function used in pattern classifiers
to partition the feature space based on probabilities or
equivalent functions, helping to determine the class to which a
given input belongs.
Linear Discriminant Analysis (LDA)

LDA models the distinctions between groups in order to separate two
or more classes.

LDA operates by projecting features from a higher-dimensional
space into a lower-dimensional one.

In machine learning, LDA serves as a supervised learning algorithm
designed for classification tasks, aiming to identify a
linear combination of features that best separates the classes
within a dataset.
LDA assumes that the data in each class follows a Gaussian distribution and that the
covariance matrices of the different classes are equal.

Under these assumptions the resulting decision boundary is linear, so LDA works
best when the classes are approximately linearly separable.
Two criteria are used by LDA to create a new axis:

1. Maximize the distance between the means of the two classes.
2. Minimize the variation within each class.
Mathematical Intuition Behind LDA

1: Maximize the distance between class means

2: Minimize the variation within each class


Step-by-step

1. Project data onto a line
2. Class means after projection
3. Variance (scatter) within each class after projection
4. Fisher's Criterion (Objective Function)
5. Optimal w
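
The equations for these steps are not reproduced in the text. The sketch below uses the standard two-class Fisher formulation: the criterion J(w) is the squared distance between the projected class means divided by the sum of the projected within-class scatters, and the maximizing direction is proportional to S_W⁻¹(m1 − m2). The function names and the data points are hypothetical.

```python
import numpy as np


def fisher_direction(X1, X2):
    """Return the unit vector w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)


def fisher_criterion(w, X1, X2):
    """J(w) = (difference of projected means)^2 / (sum of projected scatters)."""
    p1, p2 = X1 @ w, X2 @ w  # step 1: project the data onto the line
    between = (p1.mean() - p2.mean()) ** 2  # step 2: projected class means
    within = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()  # step 3
    return between / within  # step 4


# Hypothetical 2-D data for two classes.
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

w = fisher_direction(X1, X2)  # step 5: optimal w
print(w, fisher_criterion(w, X1, X2))
```

Projecting onto any other direction, such as one of the coordinate axes, gives a value of J no larger than the one obtained for `w`.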
Probabilistic generative model
vs
Probabilistic discriminative model
Example 2
Problem in Linear Classification
When the classes cannot be completely separated by a linear equation,
a hard linear classifier is inadequate.

Logistic regression is instead used to predict the probability that the output belongs to a
specific category based on the input features.
Logistic Regression

❏ Logistic regression is a supervised machine learning algorithm used for classification
tasks, where the goal is to predict the probability that an instance belongs to a given
class.

❏ Logistic regression is used for binary classification. It applies the sigmoid function, which
takes a linear combination of the independent variables as input and produces a probability value between 0 and 1.

❏ For example, with two classes, Class 0 and Class 1, an input belongs to Class 1 if the value of the logistic
function for that input is greater than 0.5 (the threshold value); otherwise it belongs to Class 0.
❏ In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function whose output is bounded between the two limiting values 0 and 1.
Equation of Logistic Regression:
❏ Logistic regression is a supervised learning algorithm used to estimate the
probability that a given instance belongs to a particular class.
❏ Logistic regression outputs probabilities between 0 and 1. It achieves this by passing
the linear combination of input features through a sigmoid function, ensuring that
predictions remain within a valid probability range.
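
As a concrete illustration of passing a linear combination through the sigmoid, the sketch below computes a probability and applies the 0.5 threshold. The weights, bias and input values are hypothetical and chosen only for demonstration.

```python
import math


def sigmoid(z):
    """Map a real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return (probability of Class 1, predicted class label)."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear combination
    probability = sigmoid(z)
    return probability, int(probability > threshold)


# Hypothetical parameters and input: z = -0.5 + 0.8 * 2.0 - 0.4 * 1.0 = 0.7
prob, label = predict_class(features=[2.0, 1.0], weights=[0.8, -0.4], bias=-0.5)
print(round(prob, 3), label)
```

Because z is positive here, the probability is above 0.5 and the input is assigned to Class 1. An input with z = 0 sits exactly on the decision boundary.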

Types of Logistic Regression


❏ Binomial Logistic Regression: Applied when the dependent variable has
two outcomes (e.g., spam vs. non-spam).
❏ Multinomial Logistic Regression: Used when the target variable has more
than two categories without a natural order (e.g., product categories).
❏ Ordinal Logistic Regression: Applied when the dependent variable has
ordered categories (e.g., satisfaction levels: low, medium, high).
Steps in Logistic Regression

1. Sigmoid Function and Probability Estimation
The sigmoid function transforms the linear regression output into a probability value between 0 and 1.

2. Logistic Regression Equation

3. Cost Function

4. Gradient descent to optimize parameters


Linear vs Logistic Regression
Example
Classify whether a student gets admitted (1) or not admitted (0) to a college based on
their exam score.
Example
Step 1: Model Assumption

Step 2: Initialize Parameters

Step 3: Define Cost Function (Log Loss)


Step 4: Gradient Descent Optimization

Step 5: After Training

Step 6: Make Predictions
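
The data and intermediate values for this example are not reproduced in the text. The sketch below walks through the same six steps on hypothetical data: exam scores expressed as (score − 50) / 10 and paired with admission labels. The learning rate and number of steps are also assumptions of the sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, xs, ys):
    """Step 3: average binary cross-entropy (log loss)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)  # Step 1: model assumption p = sigmoid(w * x + b)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(xs)


def train(xs, ys, learning_rate=0.1, steps=5000):
    w, b = 0.0, 0.0  # Step 2: initialize parameters
    n = len(xs)
    for _ in range(steps):  # Step 4: gradient descent on the log loss
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical centered exam scores and admission labels (1 = admitted).
scores = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
admitted = [0, 0, 0, 1, 1, 1]

w, b = train(scores, admitted)  # Step 5: parameters after training
p_low = sigmoid(w * -2.0 + b)  # Step 6: predict for a low score
p_high = sigmoid(w * 2.0 + b)  # Step 6: predict for a high score
print(p_low, p_high, log_loss(w, b, scores, admitted))
```

A student with a low score should receive a probability below 0.5 (not admitted) and one with a high score a probability above 0.5 (admitted).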


Support Vector Machine (SVM) Algorithm

Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks.

SVM aims to find the optimal hyperplane in an N-dimensional space to separate
data points into different classes.

The algorithm maximizes the margin between the closest points of different classes.

Hyperplane

Support Vectors

Margin
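
These three terms can be made concrete with a small numeric sketch. The hyperplane and points below are hypothetical and hand-picked. The code does not train an SVM. It only evaluates a given hyperplane w·x + b = 0, assuming the usual scaling in which the closest points satisfy y(w·x + b) = 1, so that the margin width is 2 / ‖w‖.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical separating hyperplane x1 + x2 - 3 = 0 and labeled points.
w, b = [1.0, 1.0], -3.0
points = [([2.0, 2.0], +1), ([3.0, 3.0], +1), ([1.0, 1.0], -1), ([0.0, 0.0], -1)]

# Margin: distance between the planes w.x + b = +1 and w.x + b = -1.
norm_w = math.sqrt(sum(wi ** 2 for wi in w))
margin_width = 2.0 / norm_w

# Support vectors: points lying exactly on the margin, y * (w.x + b) == 1.
support_vectors = [
    x for x, y in points if math.isclose(y * decision_value(w, b, x), 1.0)
]

print(round(margin_width, 3), support_vectors)
```

Only the two support vectors determine the hyperplane here. Moving the other two points farther from the boundary would leave it unchanged.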
Linear SVM

Non-Linear SVM
Decision Tree Classification Algorithm

❏ Internal nodes represent the features of a dataset.
❏ Branches represent the decision rules.
❏ Each leaf node represents the outcome.

Decision Node Leaf Node


○ Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
○ Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
○ Step-3: Divide S into subsets that contain the possible values of the best attribute.
○ Step-4: Generate the decision tree node that contains the best attribute.
○ Step-5: Recursively build new decision trees using the subsets of the dataset created in Step-3. Continue this process until the nodes cannot be classified further; each such final node is called a leaf node.
Example
Random Forest Algorithm
Evaluation Metrics
○ Accuracy
○ Logarithmic Loss
○ Area Under Curve (AUC)
○ Precision
○ Recall
○ F1 Score
○ Confusion Matrix
Accuracy
Accuracy is the proportion of correct predictions made by the model, out of all
predictions:
True Positive Rate

True Negative Rate

False Positive Rate

Precision Recall
Confusion Matrix
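
The formulas for these metrics are not reproduced in the text. The sketch below uses their standard definitions in terms of the four confusion-matrix counts: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The counts are hypothetical, and the function does not guard against zero denominators.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the true positive rate
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "true_negative_rate": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for 100 predictions.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts accuracy is 0.85, but recall is only 0.80. This is why accuracy alone can hide weak performance on the positive class, especially with imbalanced data.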
Decision Tree problems
❏ Selecting the Best Attribute

❏ Creating Tree Nodes

❏ Stopping Criteria

❏ Handling Missing Values

❏ Tree Pruning

Entropy Information Gain
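
Entropy and information gain are the usual attribute selection measures for choosing the best attribute. The sketch below uses their standard definitions: Shannon entropy in bits, and gain as the parent entropy minus the size-weighted entropy of the subsets. The labels and the split are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, subsets):
    """Parent entropy minus the size-weighted entropy of the subsets."""
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


# Hypothetical split of 10 labeled examples on some attribute.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(round(entropy(parent), 3), round(gain, 3))  # 1.0 0.278
```

When building a tree, the attribute whose split yields the highest information gain is chosen for the node.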


Thank You
