Introduction to machine learning
Kh. Aghajani
Machine Learning definition
• Arthur Samuel (1959). Machine Learning: Field of study
that gives computers the ability to learn without being
explicitly programmed.
• Tom Mitchell (1998) Well-posed Learning Problem: A
computer program is said to learn from experience E with
respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with
experience E.
Machine learning algorithms:
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
Supervised learning
• Supervised learning, also known as supervised machine learning, is a subcategory
of machine learning and artificial intelligence. It is defined by its use of labeled
datasets to train algorithms to classify data or predict outcomes accurately.
Unsupervised learning
• Unsupervised learning, also known as unsupervised machine learning, uses
machine learning algorithms to analyze and cluster unlabeled datasets. These
algorithms discover hidden patterns or data groupings without the need for
human intervention.
Semi-supervised learning
Semi-supervised learning is a broad category of machine learning that uses labeled data to
ground predictions, and unlabeled data to learn the shape of the larger data distribution.
Reinforcement learning
Reinforcement learning is a machine learning training method based on rewarding desired behaviors and
punishing undesired ones. In general, a reinforcement learning agent -- the entity being trained -- is able to
perceive and interpret its environment, take actions and learn through trial and error.
Supervised learning-regression
• Purpose: Regression is used when the output variable (also known as the
dependent variable) is continuous. It predicts a numerical value or a real
number. For example, predicting house prices, temperature, stock prices, or
a person's age.
• Output: The output of a regression model is a continuous value. In principle it
can be any real number, although in practice predictions usually fall within the
range that is meaningful for the problem (for example, non-negative prices).
• Algorithms: Algorithms commonly used for regression tasks include linear
regression, polynomial regression, decision trees, support vector regression,
and neural networks, among others.
• Evaluation: Regression models are evaluated using metrics like Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error
(MAE), which measure the accuracy of the model's numerical predictions.
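The three metrics named above can be computed directly from the prediction errors. The following is a minimal sketch, assuming NumPy is available; the function name `regression_metrics` and the predicted values are hypothetical and only serve the illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, MAE) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mse = np.mean(errors ** 2)        # mean of squared errors
    rmse = np.sqrt(mse)               # same units as the target
    mae = np.mean(np.abs(errors))     # mean of absolute errors
    return mse, rmse, mae

# Prices in $1000s: true values and hypothetical model predictions.
y_true = [460, 232, 315, 178]
y_pred = [450, 240, 300, 180]
mse, rmse, mae = regression_metrics(y_true, y_pred)
print(f"MSE={mse:.2f}  RMSE={rmse:.2f}  MAE={mae:.2f}")
```

MSE penalizes large errors more heavily than MAE because the errors are squared before averaging; RMSE brings the result back to the units of the target.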
Supervised learning- classification
• Purpose: Classification is used when the output variable is categorical. It predicts the class
or category to which a data point belongs. Common examples include email spam
classification (spam or not spam), image recognition (cat or dog), or disease diagnosis
(positive or negative).
• Output: The output of a classification model is a discrete category or label. It assigns data
points to predefined classes or categories.
• Algorithms: Algorithms used for classification tasks include logistic regression, decision
trees, support vector machines, k-nearest neighbors, Naive Bayes, and various deep
learning models such as convolutional neural networks (CNNs) and recurrent neural
networks (RNNs).
• Evaluation: Classification models are evaluated using metrics such as accuracy, precision,
recall, F1-score, and the area under the Receiver Operating Characteristic curve (ROC-AUC),
depending on the specific problem and the desired trade-offs between true positives, true
negatives, false positives, and false negatives.
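Accuracy, precision, recall and F1-score all follow from the four confusion-matrix counts. Below is a small sketch in plain Python for the binary case; the function name and the label vectors are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the slides prescribe.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Hypothetical spam labels (1 = spam) and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```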
Linear regression
Housing Prices problem
[Figure: price (in 1000s of dollars) plotted against size (feet²)]
Supervised learning: given the “right answer” for each example in the data.
Regression problem: predict real-valued output.
Training set of housing prices:
Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …
Notation:
m = number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
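Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost Function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$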
[Figures: left panels show $h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x); right panels show $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$)]
Gradient descent method
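Have some function $J(\theta_0, \theta_1)$
Want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$
Outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully
end up at a minimum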
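The outline above can be sketched in a few lines. This example assumes the usual single-feature hypothesis h(x) = theta0 + theta1 * x and the mean squared-error cost with a 1/(2m) factor; the function name, the learning rate, the iteration count and the toy data are all hypothetical choices for illustration.

```python
import numpy as np

def gradient_descent_1d(x, y, alpha=0.1, iterations=2000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(x)
    theta0, theta1 = 0.0, 0.0              # start with some theta0, theta1
    for _ in range(iterations):
        error = theta0 + theta1 * x - y    # h(x) - y for every example
        grad0 = error.sum() / m            # dJ/dtheta0
        grad1 = (error * x).sum() / m      # dJ/dtheta1
        # Both parameters are updated from the same old values.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Hypothetical toy data lying exactly on y = 1 + 2x.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
theta0, theta1 = gradient_descent_1d(x, y)
print(theta0, theta1)
```

Because the data lie exactly on a line, the parameters approach theta0 = 1 and theta1 = 2. The tuple assignment matters: computing both gradients before either parameter changes is what makes the update simultaneous.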
The impact of initial point
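[Figure: $\theta_1$ at a local optimum, with the current value of $\theta_1$ marked. At a local optimum the derivative is zero, so the gradient descent update leaves $\theta_1$ unchanged.]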
The impact of learning rate
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may
fail to converge, or even diverge.
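The effect of the learning rate is easy to see on a one-parameter toy problem. The sketch below minimizes J(theta) = theta², whose gradient is 2 * theta; the cost function, the three values of α and the step count are hypothetical choices, not taken from the slides.

```python
def run_gradient_descent(alpha, theta=1.0, steps=20):
    """Minimize J(theta) = theta**2 (gradient 2*theta) and return the path."""
    path = [theta]
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
        path.append(theta)
    return path

small = run_gradient_descent(alpha=0.01)   # too small: slow progress
good = run_gradient_descent(alpha=0.3)     # reasonable: converges quickly
large = run_gradient_descent(alpha=1.1)    # too large: overshoots and diverges

for name, path in [("small", small), ("good", good), ("large", large)]:
    print(f"{name:>5}: final |theta| = {abs(path[-1]):.4g}")
```

With α = 1.1 each step multiplies theta by -1.2, so the iterate jumps across the minimum and grows in magnitude, which is the divergence described above.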
Linear regression-example
Linear regression with Multiple features (variables)
Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
2104           5                    1                  45                    460
1416           3                    2                  40                    232
1534           3                    2                  30                    315
852            2                    1                  36                    178
…              …                    …                  …                     …
Notation:
$n$ = number of features
$x^{(i)}$ = input (features) of the $i$-th training example.
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example.
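Multivariate linear regression.
Hypothesis: $h_\theta(x) = \theta^T x = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$, with $x_0 = 1$
Parameters: $\theta_0, \theta_1, \dots, \theta_n$
Cost function: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Gradient descent:
Repeat { $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ }
(simultaneously update for every $j = 0, \dots, n$)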
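New algorithm ($n \ge 1$), gradient descent:
Repeat { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }
(simultaneously update $\theta_j$ for $j = 0, \dots, n$)
Previously ($n = 1$):
Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
}
(simultaneously update $\theta_0, \theta_1$)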
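With several features the per-parameter updates collapse into one matrix expression. The following sketch assumes the hypothesis h(x) = theta^T x with x0 = 1 and the squared-error cost with a 1/(2m) factor; the function name, the synthetic data and the hyperparameters are hypothetical.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for h(x) = theta^T x.

    X is an (m, n + 1) matrix whose first column is all ones (x0 = 1).
    """
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        error = X @ theta - y              # h(x^(i)) - y^(i) for all i
        gradient = X.T @ error / m         # all partial derivatives at once
        theta = theta - alpha * gradient   # simultaneous update of every theta_j
    return theta

# Hypothetical noiseless data generated from y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
features = rng.uniform(-1, 1, size=(50, 2))
X = np.column_stack([np.ones(len(features)), features])
y = X @ np.array([1.0, 2.0, -3.0])
theta = gradient_descent(X, y)
print(theta)
```

Computing the whole gradient vector before touching `theta` gives the simultaneous update for free, and the same code covers the n = 1 case.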
Feature Scaling
Idea: Make sure features are on a similar scale.
E.g. $x_1$ = size (0-2000 feet²)
$x_2$ = number of bedrooms (1-5)
Feature Scaling
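Get every feature into approximately a $-1 \le x_i \le 1$ range.
Mean normalization: replace $x_i$ with $(x_i - \mu_i) / \sigma_i$ to make features have approximately zero mean
and a comparable scale, where $\mu_i$ is the mean of feature $i$ and $\sigma_i$ is its spread (e.g. standard deviation or range)
(Do not apply to $x_0 = 1$).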
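As an illustration of the scaling step, the sketch below standardizes the four feature columns of the housing table. Using the standard deviation as the scale is an assumption made here (the range max − min is another common choice), and the variable names are hypothetical.

```python
import numpy as np

# Feature columns from the housing table: size (feet^2), bedrooms, floors, age.
X = np.array([
    [2104, 5, 1, 45],
    [1416, 3, 2, 40],
    [1534, 3, 2, 30],
    [852, 2, 1, 36],
], dtype=float)

mu = X.mean(axis=0)            # per-feature mean
sigma = X.std(axis=0)          # per-feature standard deviation (assumed scale)
X_scaled = (X - mu) / sigma    # every column now has zero mean and unit spread

# The intercept feature x0 = 1 is added after scaling; it is never scaled itself.
X_design = np.column_stack([np.ones(len(X)), X_scaled])
print(X_design.round(2))
```

The same `mu` and `sigma` must be reused when scaling new inputs at prediction time, otherwise the learned parameters no longer match the features.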
Polynomial regression
$m$ training examples, $n$ features.
Gradient Descent:
• Need to choose $\alpha$.
• Needs many iterations.
• Works well even when $n$ is large.
Normal Equation:
• No need to choose $\alpha$.
• Don’t need to iterate.
• Need to compute $(X^T X)^{-1}$.
• Slow if $n$ is very large.
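For comparison with gradient descent, here is a sketch of the normal-equation route, assuming the closed form usually written as θ = (XᵀX)⁻¹Xᵀy. It uses the size and price columns of the housing table; solving the linear system with `np.linalg.solve` instead of forming the inverse explicitly is a design choice made here for numerical stability.

```python
import numpy as np

# Size (feet^2) and price ($1000s) from the housing table.
size = np.array([2104, 1416, 1534, 852], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# Design matrix with the intercept feature x0 = 1.
X = np.column_stack([np.ones(len(size)), size])

# Solve (X^T X) theta = X^T y; no learning rate and no iterations needed.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # [intercept, price change per square foot]
```

Only one feature is used because four examples cannot determine five parameters: with all four features plus the intercept, XᵀX would be singular.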
Logistic Regression
(Classification)
Classification
Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign ?
0: “Negative Class” (e.g., benign tumor)
1: “Positive Class” (e.g., malignant tumor)
Logistic Regression Model
Interpretation of Hypothesis Output
$h_\theta(x)$ = estimated probability that y = 1 on input x
Example: If $h_\theta(x) = 0.7$, tell the patient that there is a 70% chance of the tumor being malignant.
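A short sketch of how such a probability is produced, assuming the standard logistic model h(x) = g(theta^T x) with the sigmoid g(z) = 1 / (1 + e^(-z)). The parameter vector, the input and the 0.5 threshold are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability that y = 1."""
    return sigmoid(np.dot(theta, x))

# Hypothetical parameters and input (x0 = 1 is the intercept feature).
theta = np.array([-3.0, 1.0, 1.0])
x = np.array([1.0, 2.0, 2.0])

p = predict_proba(theta, x)    # theta^T x = 1, so p = g(1)
label = int(p >= 0.5)          # p >= 0.5 exactly when theta^T x >= 0
print(p, label)
```

Thresholding the probability at 0.5 is equivalent to checking the sign of theta^T x, which is why the boundary between the two predicted classes is the set of points where theta^T x = 0.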
Decision Boundary
[Figures: two decision-boundary plots in the $(x_1, x_2)$ plane, each annotated with the condition under which a class is predicted]
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$ ($m$ examples)
How to choose parameters $\theta$?
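Logistic regression cost function
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
To fit parameters $\theta$: $\min_\theta J(\theta)$
To make a prediction given new $x$: output $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$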
Gradient Descent
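Want $\min_\theta J(\theta)$:
Repeat { $\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$ }
(simultaneously update all $\theta_j$)
The update has the same form for both models; only the hypothesis differs:
Linear regression: $h_\theta(x) = \theta^T x$
Logistic regression: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$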
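The following sketch fits a logistic regression classifier by gradient descent, assuming the sigmoid hypothesis and the cross-entropy cost. The function names, the one-feature data set and the hyperparameters are hypothetical; the data are deliberately not perfectly separable so that the parameters stay finite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta)."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def fit_logistic(X, y, alpha=0.3, iterations=5000):
    """Batch gradient descent; X has a leading column of ones."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iterations):
        gradient = X.T @ (sigmoid(X @ theta) - y) / m
        theta = theta - alpha * gradient   # simultaneous update of all theta_j
    return theta

# Hypothetical one-feature data with two overlapping classes.
x1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones(len(x1)), x1])

theta = fit_logistic(X, y)
boundary = -theta[0] / theta[1]   # value of x1 where h_theta(x) = 0.5
print(theta, boundary, cost(theta, X, y))
```

The gradient line is identical to the one used for linear regression apart from the `sigmoid` call, which mirrors the comparison of the two hypotheses.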
Cross-entropy derivative
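A sketch of the derivation for a single training example, assuming the sigmoid hypothesis $h = g(\theta^T x)$ with $g(z) = 1/(1 + e^{-z})$ and using the identity $g'(z) = g(z)(1 - g(z))$. The last line averages the per-example result over the training set.

```latex
\begin{aligned}
\text{cost}(\theta) &= -y \log h - (1 - y) \log(1 - h), \qquad h = g(\theta^T x) \\
\frac{\partial h}{\partial \theta_j} &= g'(\theta^T x)\, x_j = h (1 - h)\, x_j \\
\frac{\partial\, \text{cost}}{\partial \theta_j}
  &= \left( -\frac{y}{h} + \frac{1 - y}{1 - h} \right) h (1 - h)\, x_j \\
  &= \bigl( -y (1 - h) + (1 - y)\, h \bigr)\, x_j \\
  &= (h - y)\, x_j \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
```

The factor $h(1 - h)$ from the sigmoid cancels the denominators of the log terms, which is why the gradient has the same form as in linear regression.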
Multi-class classification: One-vs-all
[Figure: binary classification versus multi-class classification in the $(x_1, x_2)$ plane. One-vs-all (one-vs-rest) turns the three-class problem (Class 1, Class 2, Class 3) into three binary problems, one per class.]
One-vs-all
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each
class $i$ to predict the probability that $y = i$.
On a new input $x$, to make a prediction, pick the
class $i$ that maximizes $h_\theta^{(i)}(x)$.
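The one-vs-all procedure can be sketched as a loop over classes around an ordinary binary logistic regression. Everything named below (`fit_binary`, `fit_one_vs_all`, `predict`, the nine 2-D points and the hyperparameters) is hypothetical and chosen so that the three clusters are easy to separate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, alpha=0.1, iterations=5000):
    """Plain logistic regression trained by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = theta - alpha * (X.T @ (sigmoid(X @ theta) - y)) / len(y)
    return theta

def fit_one_vs_all(X, labels, classes):
    """Train one classifier per class: class c versus everything else."""
    return {c: fit_binary(X, (labels == c).astype(float)) for c in classes}

def predict(thetas, x):
    """Pick the class whose classifier reports the highest probability."""
    return max(thetas, key=lambda c: sigmoid(x @ thetas[c]))

# Hypothetical 2-D points forming three clusters.
points = np.array([[-2, -2], [-2, -1], [-1, -2],
                   [2, 2], [2, 3], [3, 2],
                   [-2, 2], [-2, 3], [-1, 3]], dtype=float)
labels = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
X = np.column_stack([np.ones(len(points)), points])   # x0 = 1

thetas = fit_one_vs_all(X, labels, classes=[1, 2, 3])
print(predict(thetas, np.array([1.0, 2.0, 2.0])))
```

Each classifier only sees "its class versus the rest", so the three probabilities need not sum to one; taking the maximum is what turns them into a single prediction.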
Regularization: Example: regression (housing prices)
[Figure: three plots of price versus size, each with a different fitted curve]
Overfitting: If we have too many features, the learned hypothesis
may fit the training set very well ($J(\theta) \approx 0$), but
fail to generalize to new examples (predict prices on new examples).
Example: Logistic regression
[Figure: three decision boundaries in the $(x_1, x_2)$ plane]
($g$ = sigmoid function)
Addressing overfitting:
size of house
no. of bedrooms
no. of floors
age of house
average income in neighborhood
kitchen size
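Cost function with a regularization term
[Figure: two plots of price versus size of house]
Suppose we penalize $\theta_3$ and $\theta_4$ and make them really small:
$\min_\theta \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 100\theta_3^2 + 100\theta_4^2$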
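In regularized linear regression, we choose $\theta$ to minimize
$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$
What if $\lambda$ is set to an extremely large value (perhaps far too large
for our problem)?
[Figure: price versus size of house]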
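Gradient descent
Repeat {
$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$
$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \right]$ for $j = 1, \dots, n$
}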
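A sketch of gradient descent for regularized linear regression, assuming the penalty (λ / 2m) Σ θ_j² over j = 1..n with the intercept θ_0 left unpenalized. The function name, the synthetic data and the three values of λ are hypothetical; the very large λ needs a smaller learning rate to stay stable.

```python
import numpy as np

def regularized_gradient_descent(X, y, lam, alpha=0.1, iterations=5000):
    """Gradient descent for linear regression with an L2 penalty.

    theta_0 (the intercept) is not penalized; theta_1..theta_n are.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m
        penalty = (lam / m) * theta
        penalty[0] = 0.0                  # do not shrink theta_0
        theta = theta - alpha * (gradient + penalty)
    return theta

# Hypothetical noiseless data generated from y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
features = rng.uniform(-1, 1, size=(50, 2))
X = np.column_stack([np.ones(50), features])
y = X @ np.array([1.0, 2.0, -3.0])

theta_plain = regularized_gradient_descent(X, y, lam=0.0)
theta_mild = regularized_gradient_descent(X, y, lam=1.0)
theta_huge = regularized_gradient_descent(X, y, lam=1e4,
                                          alpha=1e-3, iterations=20000)
print(theta_plain, theta_mild, theta_huge)
```

With the very large λ the penalized parameters are driven close to zero, so the hypothesis collapses to roughly the constant θ_0 and underfits the data.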
Regularized logistic regression.
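[Figure: decision boundary in the $(x_1, x_2)$ plane]
Cost function:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$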
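The regularized cost and its gradient can be written in a few lines. This sketch assumes the cross-entropy cost plus the penalty (λ / 2m) Σ θ_j² for j = 1..n, with θ_0 unpenalized; the function names and the small data set are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Cross-entropy cost plus (lam / 2m) * sum of theta_j^2 for j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    return cross_entropy + lam / (2 * m) * np.sum(theta[1:] ** 2)

def regularized_gradient(theta, X, y, lam):
    """Gradient of regularized_cost with respect to theta."""
    m = len(y)
    grad = X.T @ (sigmoid(X @ theta) - y) / m
    grad[1:] += (lam / m) * theta[1:]     # theta_0 is not penalized
    return grad

# Hypothetical values; the first column of X is the intercept feature x0 = 1.
X = np.array([[1.0, 0.5, 1.5],
              [1.0, 2.0, 0.5],
              [1.0, 3.0, 2.5],
              [1.0, 1.0, 3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.1, -0.2, 0.3])
lam = 1.0

print(regularized_cost(theta, X, y, lam))
print(regularized_gradient(theta, X, y, lam))
```

A quick way to gain confidence in such a gradient is to compare it against central finite differences of the cost, perturbing one parameter at a time.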