AIFinance 2a - Introduction To Supervised Learning

The document provides an overview of supervised learning, detailing parametric models like linear and logistic regression, as well as non-parametric models such as decision trees. It explains the objectives of supervised algorithms, including classification and regression tasks, and discusses the training process and optimization techniques like gradient descent. Additionally, it covers the decision-making process in building decision trees, focusing on information gain and the challenges of selecting attributes and thresholds for splits.


Dr Ho Diep

Date: 27-Aug-2025

AIFinance 2a: Introduction to Supervised Learning
Intro Class: Overview

● Principles of Supervised Learning

● Parametric Models: Linear regression

● Parametric Models: Logistic regression

● Non-Parametric Models: Decision Trees and Random Forests


Introduction to the principles of Supervised Learning - Categorisation of Supervised models
● In some instances, all explanatory features are considered on the same footing. This is typically the case with regressions and related models such as Logit (parametric models).
● Alternatively, the explanatory features may be ordered successively in order to refine the selection effort (non-parametric models).
● In the first instance, fitting a model means finding the optimal weights applied to each feature. In the second instance, finding the optimal model means ordering the most relevant features and finding the best cut-offs at each step.
Parametric Models
Supervised Learning - Parametric Models: Setting up the Objective
● Supervised Learning is the process of learning a function that maps input data to an output, based on a set of input-output pairs. Let's detail the process:
○ First, we have a dataset of pairs {features, target} = $\{(X_i, Y_i)\}_{1 \le i \le n}$ over $\mathcal{X} \times \mathcal{Y}$.
○ Typically: $\mathcal{X} = \mathbb{R}^D$ and $\mathcal{Y} = \{0, 1\}$.
○ The pairs $\{(X_i, Y_i)\}_{1 \le i \le n}$ are assumed to be independent and identically distributed (i.i.d.), following an unknown distribution. Note that this assumes no sequential ordering in the data.
● Example:
○ Let's consider a small dataset: we try to predict whether a student will fail or pass the final exam based on some feature values.
○ $Y_i = 1$ if the student passes, $Y_i = 0$ if the student fails.
○ For each $X_i$, the first coordinate is the number of hours spent on the course, the second coordinate is the average intermediary quiz mark, and the third coordinate is the number of hours spent on the coursework.
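To make the notation concrete, the student example can be stored as a feature matrix and a target vector. This is a minimal sketch: the numeric values below are hypothetical and invented purely for illustration, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical values, invented for illustration only.
# Columns: hours spent on the course, average quiz mark, hours spent on the coursework.
X = np.array([
    [30.0, 12.5, 10.0],
    [55.0, 15.0, 22.0],
    [12.0,  8.0,  4.0],
    [40.0, 11.0, 15.0],
])

# Target: 1 = pass, 0 = fail (one label per row of X).
Y = np.array([1, 1, 0, 1])

n, D = X.shape  # n samples, each a point in R^D
print(n, D)
```

Each row of `X` is one $X_i$ and the matching entry of `Y` is its label $Y_i$.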
● A Supervised Algorithm is an algorithm that aims at building a predictor (i.e., a function $f: \mathcal{X} \to \mathcal{Y}$) which minimizes an error, based on the dataset.

● In the previous example, our objective was to predict a discrete value: pass or fail (1 or 0). This supervised task is called classification.
● We can also try to predict a continuous value, for instance the final exam mark. In that case, the task is called regression.
● To define the error, we first need to define a loss function (i.e., a function $\ell$ which measures the "distance" between the output of the predictor and the true labels $(Y_i)_{1 \le i \le n}$).
● For the loss function, we usually choose for all pairs (output, true label),

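● We then define the risk $R(f)$ associated with a predictor $f$ as the expected loss $\ell(f(X), Y)$ when $(X, Y)$ follows the unknown data distribution $P$:

$$R(f) = \mathbb{E}_{(X, Y) \sim P}\left[\ell(f(X), Y)\right]$$

● Our objective is to find the optimal predictor among all the candidate functions $\mathcal{F}$ defined by the modeler (e.g. Logit, linear regression, etc.):

$$f^* = \arg\min_{f \in \mathcal{F}} R(f)$$

● Since $P$ is unknown, we instead minimize the empirical cumulative risk, i.e. the aggregate loss over the training set:

$$\hat{R}_n(f) = \sum_{i=1}^{n} \ell(f(X_i), Y_i)$$
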

Linear Regression

● Let us start with the simplest regression model: Linear Regression.

● Consider the following (fake) dataset representing the salary (on the y-axis) of some (fake) employees as a function of their number of years of experience (on the x-axis).
Setting up the Objective:

● We would like to find a way to define the red line from the pairs (experience, salary) represented
by the blue points.
● In that way, we could assign an estimated salary to each value of the experience variable.
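Fitting such a line can be sketched with an ordinary least-squares solve. This is a minimal illustration, not the course's own code: the (experience, salary) values are hypothetical and chosen to lie exactly on a line, and `predict_salary` is a hypothetical helper.

```python
import numpy as np

# Hypothetical (experience, salary) pairs; salary in thousands, exactly linear.
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([35.0, 40.0, 45.0, 50.0, 55.0])

# Design matrix with a column of ones so the line has an intercept.
A = np.column_stack([np.ones_like(experience), experience])

# Least-squares solution of A @ [intercept, slope] ~= salary.
(intercept, slope), *_ = np.linalg.lstsq(A, salary, rcond=None)

def predict_salary(years):
    """Estimated salary for a given number of years of experience."""
    return intercept + slope * years

print(predict_salary(6.0))
```

Once the intercept and slope are known, any value of the experience variable can be mapped to an estimated salary.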
Linear Regression: a Mathematical Perspective
The training process:

● In optimization, we usually prefer to minimize functions rather than maximize them.


● Thus, we transform the likelihood maximization problem into the equivalent cost minimization problem, where the cost is the following negative log-likelihood:

● The training problem can then be written as the following equivalent minimization problem:
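As a worked illustration of this step, one standard setting is a linear model with i.i.d. Gaussian noise. That noise model, the weight vector $w$ and the variance $\sigma^2$ are assumptions made here for the sketch, not notation taken from the slides. Under that assumption, maximizing the likelihood is equivalent to minimizing the sum of squared errors:

```latex
% Assumption: Y_i = w^\top X_i + \varepsilon_i, with \varepsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.
\begin{align*}
L(w) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
        \exp\left(-\frac{(Y_i - w^\top X_i)^2}{2\sigma^2}\right) \\
-\log L(w) &= \frac{n}{2}\log(2\pi\sigma^2)
        + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (Y_i - w^\top X_i)^2 \\
\hat{w} &= \arg\max_{w} L(w) = \arg\min_{w} \sum_{i=1}^{n} (Y_i - w^\top X_i)^2
\end{align*}
```

The first term of the negative log-likelihood does not depend on $w$, which is why only the squared-error sum remains in the minimization.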
Matrix Notation and Optimization:
Using a Gradient Descent for Optimization
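Gradient descent for a linear regression can be sketched as follows. This is a minimal illustration under stated assumptions: the cost is taken to be the mean squared error, the learning rate and number of steps are hypothetical values that happen to converge on this toy data, and the data itself is invented.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.05, n_steps=5000):
    """Minimise the mean squared error (1/n) * ||X w - y||^2 by gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        # Gradient of the mean squared error with respect to w.
        grad = (2.0 / n) * X.T @ (X @ w - y)
        # Step in the direction opposite to the gradient.
        w -= learning_rate * grad
    return w

# Hypothetical noiseless data generated from y = 1 + 2 x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # intercept column + feature

w = gradient_descent(X, y)
print(w)  # close to the generating weights (intercept 1, slope 2)
```

Too large a learning rate makes the iterates diverge, and too small a rate makes convergence slow, so in practice this value is tuned.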
Logistic Regression
Introduction:
● Logistic Regression is one of the easiest classification models to implement. It also performs very well on linearly separable classes.
● The decision boundary is the hypersurface that separates the input space into two subsets, one for each class. The classifier assigns all points lying on one side of the decision boundary to one class and all points on the other side to the other class.
● In the case of Logistic Regression, the decision boundary is a hyperplane.
● The following scatterplot of the public Iris dataset shows a linear decision boundary associated with Logistic Regression.
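A hyperplane decision boundary can be illustrated in a few lines. In this sketch the weights `w` and offset `b` are hypothetical fixed values (not fitted to any data), and the convention that points on the boundary go to class 1 is an assumption.

```python
import numpy as np

# Hypothetical hyperplane parameters: the boundary is the set where w . x + b = 0.
w = np.array([1.0, -1.0])
b = 0.0

def classify(X):
    """Assign class 1 on the side where w . x + b >= 0, class 0 on the other side."""
    return (X @ w + b >= 0).astype(int)

points = np.array([
    [2.0, 1.0],  # w . x + b = 1  -> class 1
    [1.0, 3.0],  # w . x + b = -2 -> class 0
])
labels = classify(points)
print(labels)
```

Training a Logistic Regression amounts to choosing `w` and `b` so that this hyperplane separates the classes as well as possible.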
Presenting the logit function:
The sigmoid function and the Logistic Regression
Model
The prediction phase after training:
The Training Process: finding the optimal w
Using a Gradient Descent for Optimization
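The sigmoid, the training loop, and the prediction step can be sketched together. This is an illustrative implementation under stated assumptions: the cost is the mean negative log-likelihood, the learning rate, step count and 0.5 threshold are hypothetical choices, and the toy data is invented.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, learning_rate=0.1, n_steps=2000):
    """Fit the weights w by gradient descent on the mean negative log-likelihood."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        p = sigmoid(X @ w)          # predicted probability of class 1
        grad = X.T @ (p - y) / n    # gradient of the mean negative log-likelihood
        w -= learning_rate * grad
    return w

def predict(X, w, threshold=0.5):
    """Predict class 1 when the estimated probability reaches the threshold."""
    return (sigmoid(X @ w) >= threshold).astype(int)

# Hypothetical linearly separable data: one feature plus an intercept column.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])

w = train_logistic(X, y)
print(predict(X, w))
```

With a threshold of 0.5, the predicted class changes exactly where the score `X @ w` crosses zero, which is the hyperplane decision boundary.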
Non-Parametric Models
A high-level description of the algorithm

● A Decision Tree (DT) is essentially a set of nested if-statements on the input features (also called attributes) of the training dataset.
● The decision algorithm:
○ We start at the tree root (with the whole dataset).
○ Then we split the dataset on the attribute that results in the largest Information Gain (IG).
○ We repeat the splitting procedure at each child node until the leaves are pure (meaning that all the samples at a leaf belong to the same class).
○ A very deep tree is prone to overfitting. To avoid that, we set a limit on the maximal depth of the tree.
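The Information Gain of a candidate split can be sketched as the drop in impurity from the parent node to its children. Using Shannon entropy as the impurity measure is an assumption of this sketch (Gini impurity is another common choice), and the function names and toy labels are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 1, 1])

# A split that separates the classes perfectly: both children are pure.
gain_pure = information_gain(parent, parent[:2], parent[2:])

# A split that leaves both children as mixed as the parent.
gain_none = information_gain(parent, parent[[0, 2]], parent[[1, 3]])

print(gain_pure, gain_none)
```

A pure leaf has zero entropy, so the splitting procedure stops there; a split with zero gain tells the algorithm nothing about the class.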
Building Decision Trees

● First, we need to define an objective function that we want to optimize (Information Gain).
● Then, at each iteration, two challenges arise when trying to choose the best split:
○ How do we choose the best attribute responsible for the split?
○ How do we choose the threshold when splitting based on the "best attribute"?
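One simple way to answer both questions is an exhaustive search: try every attribute and every candidate threshold, and keep the pair with the largest Information Gain. The sketch below assumes entropy as the impurity measure and midpoints between consecutive sorted values as candidate thresholds; both are assumptions, and `best_split` and the toy data are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_split(X, y):
    """Return (attribute index, threshold, information gain) of the best split."""
    n, d = X.shape
    parent_entropy = entropy(y)
    best = (None, None, 0.0)
    for j in range(d):
        values = np.unique(X[:, j])                     # sorted distinct values
        thresholds = (values[:-1] + values[1:]) / 2.0   # midpoints as candidates
        for t in thresholds:
            left = y[X[:, j] <= t]
            right = y[X[:, j] > t]
            children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            gain = parent_entropy - children
            if gain > best[2]:
                best = (j, float(t), gain)
    return best

# Hypothetical data: attribute 0 separates the classes, attribute 1 does not.
X = np.array([[1.0, 7.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]])
y = np.array([0, 0, 1, 1])

attribute, threshold, gain = best_split(X, y)
print(attribute, threshold, gain)
```

Applying `best_split` recursively to the two resulting subsets, until the leaves are pure or a maximal depth is reached, yields the nested if-statements that make up the tree.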
Summary
