Chapter 3:
What is machine learning?
❖ The goal from the start had been to create a machine that could think and learn like a person.
What are machine learning algorithms?
❑ A machine learning algorithm is the procedure and mathematical logic through
which a “machine” learns to identify patterns in training data and apply that
pattern recognition to make accurate predictions on new data.
❑ Machine learning algorithms are the fundamental building blocks of modern AI and data science, ranging from simple linear regression models to cutting-edge deep learning techniques.
❑ Machine learning algorithms are pieces of code that help people explore, analyze, and find meaning in complex data sets.
❑ Each algorithm is a finite set of unambiguous, step-by-step instructions that a machine can follow to achieve a certain goal.
❑ In a machine learning model, the goal is to establish or discover patterns that people can use to make predictions or categorize information.
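To make the idea concrete, here is a minimal Python sketch of a learning algorithm written as a finite, step-by-step procedure. It searches a labeled training set for the single cut-off value on one numeric feature that best separates two classes (a one-feature threshold rule, sometimes called a decision stump), and then applies that pattern to new data. The function names and the study-hours data are hypothetical illustrations and are not part of the original material.

```python
def learn_threshold(xs, ys):
    """Return (t, accuracy) for the rule 'predict 1 if x >= t, else 0'
    that scores best on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):                 # candidate cut-offs: every observed value
        correct = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
        acc = correct / len(xs)
        if acc > best_acc:                    # keep the first threshold with the best accuracy
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(t, x):
    """Apply the learned pattern to a new, unseen value."""
    return 1 if x >= t else 0


# Hypothetical training data: hours studied -> passed the exam (1) or not (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

t, acc = learn_threshold(hours, passed)
print(t, acc)                           # 5 1.0
print(predict(t, 4.5), predict(t, 9))   # 0 1
```

Real algorithms search much richer families of patterns, but the structure is the same: fit on known examples, then predict on unseen ones.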
Applications of Machine learning
Machine learning is used across various sectors to improve efficiency and reduce costs:
✓ Healthcare: Detecting diseases and researching new vaccines.
✓ Industry & Engineering: Enhancing production lines and performance.
✓ Social Media & Marketing:
• Content Moderation: Companies like Meta (Facebook) use ML models to automatically flag comments or posts that violate policies (e.g., hate speech), because humans cannot manually monitor millions of users.
• Recommendation Systems: Powering the algorithms that suggest posts or products based on your behavior.
Why Machine Learning?
Traditional methods cannot handle the massive and complex datasets we have today. Machine learning is essential because:
❑ Handling Big Data: It processes data at scales that are impossible for humans.
❑ Discovering Hidden Patterns: It identifies relationships and "hidden patterns" in raw data that humans might miss, which helps people make informed decisions.
Key Concepts & Terminology
1. Independent & Dependent Variables
❑ Independent Variables (Input data):
• These are the data points you feed into the model.
• Example: in a Facebook ad campaign, the variables include the audience's location, interests, and hobbies.
❑ Dependent Variables (Target outputs):
• This is the goal, or what you want to predict (e.g., total sales or revenue generated from an ad).
2. Feature:
❑ Features are also known as independent variables, predictors, or attributes.
❑ A feature refers to a measurable property or characteristic of a phenomenon being
observed.
❑ Features are the input variables that a machine learning model uses to make predictions.
❖ Feature Engineering
Feature engineering is the process of selecting, transforming, or creating features in
order to improve the performance of a machine learning model.
Examples of Features
➢ In a house price prediction dataset, features may include the number of bedrooms,
square footage, and location.
➢ In a spam email classification system, features may include the presence of specific
keywords, the length of the email, and the sender’s address.
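The spam-email features listed above can be sketched as a small feature-extraction function. It is an illustration under stated assumptions and not a production spam filter: the keyword list, the `TRUSTED_DOMAINS` whitelist, and the function name `email_features` are all hypothetical. The `sender_trusted` feature is a simple case of feature engineering, because it is derived from the raw sender address instead of being read directly.

```python
import re

SPAM_KEYWORDS = {"free", "winner", "prize", "urgent"}   # hypothetical keyword list
TRUSTED_DOMAINS = {"university.edu"}                     # hypothetical whitelist


def email_features(sender, body):
    """Convert one raw email into a dictionary of measurable features."""
    words = re.findall(r"[a-z]+", body.lower())          # very simple tokenization
    domain = sender.rsplit("@", 1)[-1].lower()
    return {
        "keyword_count": sum(w in SPAM_KEYWORDS for w in words),  # presence of specific keywords
        "length_chars": len(body),                                # length of the email
        "sender_trusted": int(domain in TRUSTED_DOMAINS),         # engineered from the sender's address
    }


features = email_features("promo@example.com", "URGENT: you are a winner! Claim your free prize.")
label = 1  # supervised learning also needs the known outcome: 1 = spam, 0 = not spam
print(features)  # {'keyword_count': 4, 'length_chars': 48, 'sender_trusted': 0}
```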
3. Label
❑ A label, also called the target variable or dependent variable, is the output that the model is trained to predict.
❑ In supervised learning, labels represent the known outcomes that the model learns to
associate with the input features during training.
Examples of Labels
➢ In a house price prediction model, the label is the actual price of the house.
➢ In a spam email classifier, the label indicates whether the email is spam or not spam.
4. Data
Data is the collection of information used to train and evaluate machine learning models. It can be numbers, text, images, or any measurable observations.
Types of Data:
❑ Structured: Organized in rows and columns (e.g., spreadsheets, databases)
❑ Unstructured: Not organized in a predefined row-and-column format (e.g., images, audio, free text)
Examples:
➢ House prices dataset: size, number of rooms, location, price
➢ Medical records: age, blood pressure, cholesterol, diagnosis
5. Data Splitting:
The procedure involves taking a dataset and dividing it into two subsets.
1. The first subset is used to fit the model and is referred to as the training
dataset.
2. The second subset is not used to train the model. Instead, its input features are given to the trained model, and the model's predictions are compared to the expected values. This second subset is referred to as the test dataset.
The train-test split is a technique for evaluating the performance of a machine
learning algorithm.
It can be used for classification or regression problems and can be used for any
supervised learning algorithm.
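Here is a minimal sketch of the train-test split that uses only the Python standard library. The function name, the 80/20 ratio, and the toy house data are assumptions for illustration. In practice, libraries such as scikit-learn provide a ready-made `sklearn.model_selection.train_test_split`.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into a training part and a test part."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)            # shuffle so both subsets are representative
    n_test = int(round(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]          # (training set, test set)


# Hypothetical dataset of (house size in m², price) pairs
data = [(50 + 10 * i, 100 + 25 * i) for i in range(10)]
train, test = train_test_split(data, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Shuffling before the cut matters when rows are stored in a systematic order (for example, sorted by price). Without it, the test set might not resemble the training set.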
To build a reliable model that can generalize to new information, data is typically divided into three distinct sets:
❑ The Training Set: The set of data used to train the model and make it learn the hidden features/patterns in the data.
❑ The Validation Set: A set of data, separate from the training set, that is used to validate the model's performance during training.
❑ The Test Set: A separate set of data used to test the model after training is complete.
Why do we need splitting?
If we train a machine learning model and then evaluate it on the same dataset, we cannot tell how well it performs on data it has never seen.
➢ For that reason, we split our source data into training, validation, and test datasets.
Data splitting ensures that a machine learning model truly learns patterns rather than just
memorizing examples.
Training data = Classroom teaching
Validation data = Practice problems
Testing data = Exam questions
❑ Imagine a classroom scenario. The teacher explains an algorithm using examples. These examples represent the training set, and the student represents the model learning from them.
❑ To check understanding, the teacher gives practice problems. This stage is like validation, where the model's performance is evaluated and improved. If mistakes occur, the teacher clarifies concepts, which is similar to hyperparameter tuning. Solving diverse practice problems strengthens learning, just as cross-validation (validating on several different splits of the data) gives a more reliable estimate of how well the model performs.
❑ Finally, during the exam, students must solve new problems without help. This represents
the test set. The exam result reflects the student’s true understanding, just as test accuracy
shows how well a model generalizes to unseen data.
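The classroom analogy can be sketched in code. The training set fits the model, the validation set is used to choose a hyperparameter, and the test set is used exactly once at the end. The model here is an assumed one-feature linear model with an L2 (ridge) penalty `lam`, whose closed-form slope is Sxy / (Sxx + lam). The data, the candidate values, and the 60/20/20 ratio are hypothetical choices.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """One-feature linear model y = w*x + b with an L2 penalty `lam` on the slope w."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w = sxy / (sxx + lam)            # lam = 0 gives ordinary least squares
    return w, my - w * mx            # the intercept b is not penalized


def mse(model, rows):
    """Mean squared error of a (w, b) model on a list of (x, y) rows."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def fit(rows, lam):
    return fit_ridge_1d([x for x, _ in rows], [y for _, y in rows], lam)


# Hypothetical data: y = 3x + Gaussian noise
rng = random.Random(0)
points = []
for _ in range(100):
    x = rng.uniform(0, 10)
    points.append((x, 3 * x + rng.gauss(0, 2)))

rng.shuffle(points)
train, val, test = points[:60], points[60:80], points[80:]   # 60 / 20 / 20 split

# "Practice problems": choose the hyperparameter using the validation set only
candidates = [0.0, 1.0, 10.0, 100.0, 1000.0]
best_lam = min(candidates, key=lambda lam: mse(fit(train, lam), val))

# "Exam": evaluate the chosen model once, on the untouched test set
final_model = fit(train, best_lam)
print(best_lam, mse(final_model, test))
```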
6. Model:
❑ A model is the mathematical or computational function produced when a learning algorithm is applied to data. It captures the learned patterns and uses them to make predictions or decisions.
❑ It maps features (inputs) to labels (outputs).
Example:
➢ Linear Regression: Predicts house prices from size, bedrooms, and location.
➢ Logistic Regression: Classifies emails as spam or not spam.
➢ Decision Tree: Determines whether a patient has a disease based on symptoms.
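To show how a model maps a feature to a label, the snippet below fits the linear-regression example using ordinary least squares for a single feature (price from size only, to keep it short): slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², intercept = ȳ - slope · x̄. The house data is hypothetical and deliberately exact so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Hypothetical training data: house size (m²) -> price (thousands)
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]   # exactly 3 * size, so the fit is easy to verify

slope, intercept = fit_line(sizes, prices)
print(slope, intercept)              # 3.0 0.0
print(slope * 110 + intercept)       # 330.0 (prediction for a new 110 m² house)
```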
7. Generalization:
❑ Generalization is a model’s ability to perform well on new, unseen data, not just the data
it was trained on.
❑ A model that generalizes well captures underlying patterns rather than memorizing the
training examples.
Example:
➢ A house price prediction model trained on one city can accurately predict prices for new
houses in a different neighborhood.
➢ A spam classifier correctly identifies new spam emails it has never seen before.
8. Underfitting and Overfitting in ML
Machine learning models should learn useful patterns from training data.
When a model learns too little or too much, we get underfitting or overfitting.
❑ Underfitting: The model is too simple and does not capture all the real patterns in the data.
❑ Overfitting: The model learns the underlying pattern and also the noise or random quirks in the training data. In effect, the model memorizes the noise.
❑ Good Fit: A good model finds the right balance. It is complex enough to capture real patterns, but not so complex that it "memorizes" the training data.
8.1. What is Underfitting?
▪ Underfitting happens when the model fails to learn important patterns.
▪ It performs poorly on both training and testing data.
Underfitting happens due to:
➢ Model is too simple
➢ Very high regularization
➢ Features are weak or missing
➢ Not enough training
➢ High bias
8.2. What is Overfitting?
▪ Overfitting happens when the model learns too much from the training data,
including noise and outliers.
▪ It performs very well on training data but poorly on test data.
Overfitting happens due to:
➢ Model too complex
➢ Too many features
➢ Very little data
➢ No regularization
➢ High variance
Key Differences and Causes:
❑ Underfitting:
• Characteristics: high training error and high testing error.
• Causes: the model is too simple (e.g., a linear model for non-linear data), training time is insufficient, or there are too few features.
❑ Overfitting:
• Characteristics: low training error but high testing error.
• Causes: the model is too complex, the training data is too noisy, or the training dataset is too small.
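The training-error versus testing-error pattern above can be reproduced with k-nearest-neighbour regression, where k acts as a complexity setting. With k = 1 the model memorizes every training point, so its training error is exactly zero. When k equals the size of the training set, the model predicts the same average everywhere. The choice of k-NN, the noisy y = x² data, and the specific k values are illustrative assumptions and do not come from the slides.

```python
import random


def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, rows, k):
    """Mean squared error over `rows` of the k-NN model built on `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in rows) / len(rows)


rng = random.Random(1)


def sample(n):
    """Hypothetical data: the true pattern is y = x², plus Gaussian noise."""
    rows = []
    for _ in range(n):
        x = rng.uniform(-3, 3)
        rows.append((x, x * x + rng.gauss(0, 1)))
    return rows


train, test = sample(40), sample(40)

for k, regime in [(1, "overfitting"), (5, "closer to a good fit"), (len(train), "underfitting")]:
    print(f"k={k:>2} ({regime}): train MSE = {mse(train, train, k):.2f}, "
          f"test MSE = {mse(train, test, k):.2f}")
```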
Solutions to Improve Performance:
Fixing Underfitting:
❑ Increase model complexity.
❑ Apply feature engineering (add more informative features).
❑ Reduce regularization.
Fixing Overfitting:
❑ Use more training data.
❑ Reduce model complexity.
❑ Apply regularization (L1/L2).
❑ Use dropout for neural networks.
Examples of overfitting and underfitting:
Overfitting Examples (high complexity, poor generalization)
➢ Medical Diagnosis: A model trained on a small dataset memorizes specific noise or artifacts instead of general disease features, and fails on new images.
➢ Stock Prediction: A complex neural network captures random fluctuations instead of trends, which leads to poor performance on future prices.
➢ Customer Churn: Using too many specific demographic details prevents the model from identifying patterns across a broader customer base.
Underfitting Examples (low complexity, poor performance even on the training data)
➢ Housing Prices: A model that uses only square footage fails because it ignores critical factors such as location or age.
➢ Weather Forecasting: Using only temperature and humidity fails to capture complex seasonal or atmospheric relationships.
➢ Image Recognition: A shallow decision tree is too simple to tell apart complex categories such as cats and dogs.
TYPES OF MACHINE LEARNING
Machine learning can be categorized into three main types:
❑ Supervised Learning:
▪ Models are trained on labeled data, where each data point is associated with a known target or output.
▪ The goal is to learn a mapping function that can accurately predict the output for unseen data.
❑ Unsupervised Learning:
▪ Models are trained on unlabeled data, where there are no predefined target outputs.
▪ The objective is to discover hidden patterns, structures, or relationships within the data.
❑ Reinforcement Learning:
▪ Agents learn to make decisions or take actions in an environment to maximize a cumulative reward signal.
▪ Through trial and error, the agent learns which actions lead to favorable outcomes.
Figure: Examples of supervised learning (SL) vs. unsupervised learning (UnSL)
Types of Supervised learning
Supervised learning problems are mainly divided into:
❑ Regression (predicting continuous values): In a regression problem, the output variable is a real value, such as dollars or weight.
❑ Classification (predicting categories): In a classification problem, the output variable is a category, such as "red" or "blue", or "disease" and "no disease".
1. Regression Algorithms (predicting continuous numerical values)
Common regression algorithms include:
Algorithm | Description
Linear Regression | Models a linear relationship between input variables and a continuous target variable.
Polynomial Regression | Extends linear regression by adding polynomial terms to model nonlinear relationships.
Ridge Regression | Linear regression with L2 regularization to reduce overfitting and handle multicollinearity.
Support Vector Regression (SVR) | Regression version of Support Vector Machines that finds a function within a margin of tolerance.
Decision Tree Regression | Tree-based model that splits data into regions to predict continuous values.
Random Forest Regression | Ensemble method that combines multiple decision trees to improve prediction accuracy and reduce variance.
Neural Networks for Regression | Deep learning models capable of modeling complex and highly nonlinear relationships.
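Polynomial regression is still linear regression, applied to expanded features (1, x, x², ...). The sketch below assumes a small dataset and solves the least-squares normal equations (XᵀX)w = Xᵀy with hand-written Gaussian elimination, so it needs only the standard library. The function names and data are hypothetical. For real work, a numerical library's least-squares solver is numerically safer than forming XᵀX explicitly.

```python
def solve(a, b):
    """Solve a·w = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]           # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):                           # eliminate below the pivot
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                            # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y ≈ w0 + w1·x + ... + wd·x^d via the normal equations."""
    feats = [[x ** p for p in range(degree + 1)] for x in xs]  # polynomial terms as features
    d = degree + 1
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(d)] for i in range(d)]
    xty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(d)]
    return solve(xtx, xty)


# Hypothetical nonlinear data: y = 1 + 2x - 0.5x² (no noise, so the fit should recover it)
xs = [0, 1, 2, 3, 4, 5]
ys = [1 + 2 * x - 0.5 * x ** 2 for x in xs]
print([round(w, 6) for w in fit_polynomial(xs, ys, degree=2)])  # [1.0, 2.0, -0.5]
```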
2. Classification Algorithms (predicting categorical or discrete labels)
Algorithm | Description
Logistic Regression | Probabilistic linear classifier that models the probability of class membership using a sigmoid function.
k-Nearest Neighbors (k-NN) | Instance-based method that assigns a class based on the majority label among the k closest samples.
Decision Trees | Rule-based tree structures that split data using feature thresholds to classify samples.
Random Forest | Ensemble technique that combines multiple decision trees to improve generalization and reduce overfitting.
Support Vector Machines (SVM) | Margin-based classifier that finds the optimal hyperplane separating classes with maximum margin.
Naïve Bayes | Probabilistic classifier based on Bayes' theorem, assuming the features are conditionally independent given the class.
Artificial Neural Networks (ANNs) | Multi-layer neural models capable of learning complex nonlinear decision boundaries.
Convolutional Neural Networks (CNNs) | Deep learning models that are particularly effective for classifying image and spatial data.
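Here is a minimal sketch of the k-NN rule from the table (majority label among the k closest samples). It assumes Euclidean distance via `math.dist` (Python 3.8+) and uses a tiny hypothetical two-class dataset.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Assign the majority label among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-feature samples: (feature vector, class label)
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]
print(knn_classify(train, (1.2, 1.4)))  # A
print(knn_classify(train, (6.8, 6.1)))  # B
```

An odd k avoids tied votes when there are two classes.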
Challenges in Supervised learning
Several problems commonly arise when working with supervised learning methods:
Challenge | Description
Insufficient Training Data | A lack of quality training data with appropriate labeled examples can lead to suboptimal model performance.
Overfitting and Underfitting | Overfitting occurs when a model learns the training data too well, which harms its ability to generalize. Conversely, underfitting means the model fails to learn intricate patterns, which results in weak predictions.
Class Imbalance | When one class is heavily underrepresented in the dataset, algorithms may develop a prediction bias toward the majority class, making it difficult to sustain adequate classification performance.
Noisy Data and Outliers | Issues such as data inconsistencies, incorrect labels, or outliers can significantly degrade model performance and call for various pre-processing strategies.
Computational Complexity | Certain supervised learning algorithms require substantial computational resources, which can slow down or constrain model training and deployment, especially in large-scale real-world applications.
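Class imbalance can hide behind a good-looking accuracy score. In this hypothetical example, a model that always predicts the majority class reaches 95% accuracy but catches none of the minority cases. Recall (the share of actual positives that were detected) is a standard metric not named in the table above. It is used here to expose the bias.

```python
# Hypothetical labels: 95 legitimate transactions (0) and 5 fraudulent ones (1)
y_true = [0] * 95 + [1] * 5

# A "model" biased toward the majority class simply predicts 0 every time
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_pos / sum(y_true)          # share of real fraud cases that were caught

print(accuracy, recall)  # 0.95 0.0
```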
Types of Unsupervised learning
Unsupervised learning problems are mainly divided into two categories:
Category | Objective | Description | Example
Clustering | Discover hidden structure in data | Identifies natural groups (clusters) within unlabeled datasets based on similarity between data points. | Grouping customers based on purchasing behavior
Association | Discover relationships between variables | Finds meaningful rules and correlations between variables in large datasets. | Customers who buy a new home are likely to purchase new furniture
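Association rules such as "home → furniture" are usually scored with two measures. Support is how often the items appear together, and confidence is how often the rule holds when its left-hand side appears. These are the standard measures used by algorithms such as Apriori, which the slide does not name. Here is a minimal sketch with hypothetical shopping baskets:

```python
# Hypothetical shopping baskets (each set is one transaction)
transactions = [
    {"home", "furniture", "lamp"},
    {"home", "furniture"},
    {"home", "paint"},
    {"furniture", "lamp"},
    {"home", "furniture", "paint"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions)


def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"home", "furniture"}))       # 0.6
print(confidence({"home"}, {"furniture"}))  # 0.75
```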
Common clustering algorithms include:
Algorithm | Description
K-Means | Divides data into K groups based on distance to cluster centers.
Hierarchical Clustering | Builds clusters step by step in a tree-like structure.
DBSCAN | Groups points based on density and detects outliers.
Mean Shift | Finds clusters by locating high-density regions.
Gaussian Mixture Models (GMM) | Probabilistic method that assumes the data comes from a mixture of several Gaussian distributions.
Spectral Clustering | Uses similarity graphs to cluster complex-shaped data.
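K-Means from the table above can be sketched as Lloyd's algorithm, which alternates an assignment step and an update step until the centers stop moving. This is a simplified illustration under stated assumptions: random initial centers (no k-means++ seeding), Euclidean distance, and hypothetical 2-D customer data.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centers."""
    centers = random.Random(seed).sample(points, k)          # initial centers: k random points
    for _ in range(iterations):
        # 1. Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # 2. Update step: move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:                            # converged: nothing moved
            break
        centers = new_centers
    return centers, clusters


# Hypothetical 2-D customer data: (annual visits, average spend)
customers = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9.5), (8.5, 9)]
centers, clusters = k_means(customers, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```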
Real-World Examples of Supervised and Unsupervised Learning
Learning Type | Subtype / Task | Real-World Example | Practical Application
Supervised Learning | Classification | Email spam detection | Classifying emails as spam or not spam
Supervised Learning | Classification | Medical diagnosis | Predicting whether a tumor is benign or malignant
Supervised Learning | Regression | House price prediction | Estimating property prices based on area and features
Supervised Learning | Regression | Energy consumption forecasting | Predicting future electricity demand
Unsupervised Learning | Clustering | Customer segmentation | Grouping customers based on purchasing behavior
Unsupervised Learning | Clustering | Image grouping | Organizing photos by similarity without labels
Unsupervised Learning | Association | Market basket analysis | Discovering products frequently bought together
Exercises
Exercise 1 – Identify the Learning Type
For each of the following cases, determine:
➢ Is it Supervised or Unsupervised learning?
➢ What is the task type (Classification / Regression / Clustering / Association)?
1. Predicting whether an email is spam or not.
2. Predicting whether a customer will buy a product (Yes/No).
3. Estimating house prices based on size and location.
4. Predicting student final grades based on study hours.
5. Grouping customers based on purchasing behavior (without labels).
6. Grouping product images without predefined categories.
7. Discovering products frequently bought together in supermarkets.
8. Finding relationships between items in transaction data.
9. Predicting car fuel consumption based on engine characteristics.
10. Classifying handwritten digits (0–9).
Exercise 2 – Complete the Table
Complete the following table:
Input (X) | Output (Y) | Learning Type | Task Type | Application
Email content | ? | ? | ? | Spam filtering
Patient medical data | Disease type | ? | ? | Hospital diagnosis
Customer transactions | ? | ? | ? | Market basket analysis
Sensor temperature data | Equipment failure (0/1) | ? | ? | Predictive maintenance
House features (size, location) | House price | ? | ? | Real estate prediction
Student data (hours, attendance) | Final grade | ? | ? | Academic performance
Images of products | ? | ? | ? | Product grouping
Bank transactions | Fraud (0/1) | ? | ? | Fraud detection
Customer demographics | Customer segments | ? | ? | Customer segmentation
Engine data | Fuel consumption | ? | ? | Automotive prediction
Exercise 3 – Electrical Engineering Case Study
In an electrical engineering system, the following data is collected from a power grid:
• Voltage
• Current
• Frequency
And we have a target variable:
• Power Stability (0 = Unstable, 1 = Stable)
Answer the following:
1. What is the type of learning?
2. What is the type of task?
3. Can Logistic Regression be used? Why?
4. If we want to predict the future value of voltage instead of stability, what does the task become?
Exercise 4 – Image Organization without Labels (Think & Analyze)
A company has collected 10,000 product images from an online store. The images belong to different
categories such as shoes, bags, electronics, and clothes, but no labels are available.
The company wants to automatically group similar products together.
Questions
1. What type of machine learning approach should be used?
Explain why this approach is suitable for this problem.
2. Which algorithm could be used to group the images?
Suggest one or two algorithms and justify your choice.
3. Before clustering the images, what feature extraction technique could be applied?
(Hint: using deep learning models trained for image representation)
4. How can we evaluate the quality of the clustering results if labels are not available?
Suggest at least two evaluation methods.
Exercise Solutions
Exercise 1 – Identify the Learning Type (Solution)
Case | Learning Type | Task Type
Predicting whether an email is spam or not | Supervised | Classification
Predicting whether a customer will buy a product (Yes/No) | Supervised | Classification
Estimating house prices based on size and location | Supervised | Regression
Predicting student final grades based on study hours | Supervised | Regression
Grouping customers based on purchasing behavior (no labels) | Unsupervised | Clustering
Grouping images of products without predefined categories | Unsupervised | Clustering
Discovering products frequently bought together in supermarkets | Unsupervised | Association
Finding relationships between items in transaction data | Unsupervised | Association
Predicting car fuel consumption based on engine characteristics | Supervised | Regression
Classifying handwritten digits (0–9) | Supervised | Classification
Exercise 2 – Complete the Table (Solution)
Input (X) | Output (Y) | Learning Type | Task Type | Application
Email content | Spam / Not Spam | Supervised Learning | Classification | Spam filtering
Patient medical data | Disease type | Supervised Learning | Classification | Hospital diagnosis
Customer transactions | Items frequently bought together | Unsupervised Learning | Association | Market basket analysis
Sensor temperature data | Equipment failure (0/1) | Supervised Learning | Classification | Predictive maintenance
House features (size, location) | House price | Supervised Learning | Regression | Real estate prediction
Student data (hours, attendance) | Final grade | Supervised Learning | Regression | Academic performance
Images of products | Product groups / clusters | Unsupervised Learning | Clustering | Product grouping
Bank transactions | Fraud (0/1) | Supervised Learning | Classification | Fraud detection
Customer demographics | Customer segments | Unsupervised Learning | Clustering | Customer segmentation
Engine data | Fuel consumption | Supervised Learning | Regression | Automotive prediction
Exercise 3 – Electrical Engineering Case Study (Solution)
1. Type of learning:
Supervised Learning
(because we have input data and a labeled target variable: Power Stability)
2. Type of task:
Classification
(the output is binary: 0 = Unstable, 1 = Stable)
3. Can Logistic Regression be used? Why?
Yes, Logistic Regression can be used because it is designed for binary classification problems.
It models the probability that the system is stable (1) or unstable (0) based on the input variables (Voltage,
Current, Frequency).
4. If we predict the future value of voltage instead of stability:
The task becomes Regression
(because voltage is a continuous numerical value)
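To illustrate answer 3, here is a minimal logistic-regression sketch trained with batch gradient descent on the log-loss. The readings are hypothetical. They are assumed to be pre-processed into scaled absolute deviations from nominal voltage, current, and frequency, a feature-engineering choice that lets a linear boundary separate stable from unstable cases. In practice, a library class such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Estimated probability that the grid is stable (class 1); w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi   # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Hypothetical scaled |deviation from nominal| for (voltage, current, frequency)
X = [(0.1, 0.0, 0.05), (0.2, 0.1, 0.0), (0.0, 0.1, 0.05),   # stable
     (1.5, 1.2, 0.9), (1.4, 1.6, 1.1), (1.2, 1.3, 1.0)]      # unstable
y = [1, 1, 1, 0, 0, 0]

w = train_logistic(X, y)
print(predict_proba(w, (0.05, 0.05, 0.02)), predict_proba(w, (1.3, 1.4, 1.0)))
```

A predicted probability above 0.5 is read as "Stable (1)" and below 0.5 as "Unstable (0)".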
Exercise 4 – Image Organization without Labels (Solution)
1. Type of Machine Learning Approach
The suitable approach is unsupervised learning.
Explanation:
In this problem, the dataset does not contain labels indicating the category of each image. Therefore, the
algorithm must automatically discover patterns or structures in the data. Unsupervised learning is designed for
such situations because it can identify groups of similar data points without predefined categories.
2. Algorithm to Group the Images
The task is a clustering problem, where similar images are grouped together.
Two suitable algorithms are:
1. K-Means Clustering
• It partitions data into K clusters based on similarity.
• Images with similar features will be assigned to the same cluster.
• It is simple, fast, and widely used for large datasets.
2. DBSCAN
• Groups data based on density of points.
• Can detect outliers (noise).
• Does not require specifying the number of clusters in advance.
3. Feature Extraction Technique
Before clustering, the images must be converted into numerical feature vectors.
A common approach is to use deep learning feature extraction with a pretrained Convolutional Neural Network (CNN).
Procedure:
• Use a pretrained CNN model (e.g., ResNet or VGG).
• Remove the final classification layer.
• Extract the feature vector from one of the last layers.
• Use these vectors as input for the clustering algorithm.
This approach captures visual patterns such as shapes, textures, and colors, which improves clustering
performance.
4. Evaluating Clustering Without Labels
Since labels are not available, we use internal evaluation metrics.
1. Silhouette Score
• Measures how similar a data point is to its own cluster compared to other clusters.
• Values range from -1 to 1.
• A higher score indicates better clustering.
2. Davies–Bouldin Index
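• Measures the average similarity between each cluster and the cluster most similar to it, where similarity compares within-cluster spread to the distance between cluster centers.
• Lower values indicate better separation between clusters.
Both metrics evaluate the compactness and separation of clusters without requiring labeled data.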
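The silhouette score can be computed directly from its definition, s(i) = (b - a) / max(a, b). Here a is the mean distance from point i to the other members of its own cluster, and b is the lowest mean distance to the members of any other cluster. The sketch below assumes Euclidean distance and at least two clusters. It uses small hypothetical 2-D points in place of CNN feature vectors. For real use, scikit-learn offers `sklearn.metrics.silhouette_score` and `sklearn.metrics.davies_bouldin_score`.

```python
import math


def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points (requires >= 2 clusters)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:                      # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)         # cohesion: mean distance inside own cluster
        b = min(                          # separation: nearest other cluster on average
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(1, 2), (1.5, 1.8), (1, 1), (8, 8), (9, 9.5), (8.5, 9)]
good = [0, 0, 0, 1, 1, 1]   # compact, well-separated clusters
bad = [0, 1, 0, 1, 0, 1]    # clusters that mix distant points
print(round(silhouette_score(points, good), 3), round(silhouette_score(points, bad), 3))
```

The well-separated labelling should score close to 1. The mixed labelling scores much lower, which is how the metric ranks clusterings without ground-truth labels.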