MLT Unit-1 Introduction
Unit-1: Introduction to Machine Learning
Machines "learn" by continuously improving their understanding through data-driven iterations, much as humans learn from experience.
1. Healthcare: It helps doctors diagnose diseases from medical images like X-rays and MRIs. It also predicts patient outcomes and personalizes treatments, which improves healthcare quality.
Examples: Medical image analysis, drug discovery, and patient risk prediction.
2. Finance: In finance, it detects fraudulent transactions in real time and supports algorithmic trading. It also helps assess credit risk, making lending safer and faster. Automates financial processes and improves fraud detection accuracy.
Examples: Credit scoring, algorithmic trading, fraud detection.
3. Retail and E-Commerce: It powers personalized product recommendations, forecasts demand to optimize inventory, and analyzes customer sentiment to improve shopping experiences. Enhances customer experience, inventory management, and demand forecasting.
Example: Personalized recommendations, dynamic pricing, stock management.
4. Transportation and Automotive: Self-driving cars rely on ML to navigate and make decisions. It
optimizes delivery routes and predicts vehicle maintenance needs which reduces downtime. Improves
logistics, route optimization, and autonomous vehicle capabilities.
Examples: Self-driving cars, demand forecasting, fleet management.
6. Social Media and Entertainment: Platforms like Netflix and YouTube use ML to recommend content users are likely to enjoy. It enables image and speech recognition for better user interaction. Makes personalized content recommendations and enhances user experience.
Examples: Movie/music recommendations, content creation, and game personalization.
7. Education: Individualizes learning experiences, automates administrative tasks, and monitors student
progress.
Example: Adaptive learning platforms, grading automation, student analytics.
8. Agriculture: Automates crop management, forecasts yields, and monitors soil and plant health.
Example: Crop monitoring, pest detection, yield prediction.
10. Security and Surveillance: Helps in face recognition, anomaly detection, and automatic monitoring.
Examples: Intrusion detection, CCTV monitoring, and access control.
3. Blockchain
Blockchain, the system behind cryptocurrencies like Bitcoin, is helpful to many businesses. This technology uses a decentralized ledger to record all transactions, ensuring transparency among the parties involved without any third party. Moreover, blockchain transactions are irreversible: once recorded on the ledger, they cannot be erased or altered.
Blockchain is likely to be combined with machine learning and AI because the technologies complement each other. Blockchain contributes features such as a decentralized ledger, transparency, and immutability.
4. Explainable AI (XAI)
As ML models become increasingly sophisticated, the demand for transparency and interpretability increases. Explainable AI aims to make model decisions understandable to humans, addressing ethical concerns and regulatory compliance in sensitive areas such as healthcare, finance, and legal systems.
6. Self-Supervised Learning
This approach seeks to reduce reliance on large labeled datasets by using unlabeled data for representation learning.
Self-supervised learning has been very promising in natural language processing (NLP), computer vision,
and speech recognition, and it is important for organizations with limited labeled data.
8. Federated Learning
Federated learning enables models to be trained on decentralized devices while maintaining data
localization, ensuring privacy and security.
It's especially effective in sectors such as healthcare and finance, where data privacy is essential but collaborative learning is still required.
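The averaging idea behind federated learning can be sketched in Python. This is a minimal illustration of Federated Averaging (FedAvg), one well-known scheme. It assumes each client fits a one-parameter linear model y = w·x on its own data. The client datasets and the helper names `local_update` and `federated_round` are hypothetical. Only model weights leave each client, never the raw (x, y) pairs.

```python
# Minimal FedAvg-style sketch (assumption: each client fits y = w * x locally).

def local_update(w, data, lr=0.01, epochs=50):
    """Run plain gradient descent on one client's private (x, y) pairs."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Each client trains locally; the server only receives the resulting weights."""
    local_ws = [local_update(global_w, data) for data in clients]
    sizes = [len(data) for data in clients]
    # Weighted average of client weights, proportional to local dataset size.
    return sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)

# Hypothetical private datasets held by three clients, all following y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
print(round(w, 3))  # 3.0
```

Weighting by dataset size lets clients with more data have a larger influence on the global model. Real deployments add client sampling, secure aggregation and communication handling, all omitted here.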
Philosophical Foundations
Aristotle introduced the concept of logical reasoning, suggesting that thought processes could follow
structured rules, similar to mechanical systems.
René Descartes later proposed that machines might replicate aspects of human thinking, hinting at the
possibility of intelligent systems.
DeepFace (2014)
In 2014, Facebook introduced DeepFace, a facial recognition project that used deep learning to identify
faces with high accuracy. This technology demonstrated the practical applications of ML in biometric
security and image recognition.
2017: The Transformer architecture is introduced, becoming the foundation for modern large
language models (LLMs) like ChatGPT.
2022-Present: Release of ChatGPT sparks mainstream awareness and rapid innovation in
generative AI.
Additionally, there are two more specific categories, Semi-Supervised Learning and Self-Supervised Learning, which combine elements of both supervised and unsupervised learning.
1. Supervised learning
Supervised learning is a machine learning approach where the model is trained on a dataset containing input-output pairs, known as labeled data. The goal is for the model to learn the relationship between inputs and their corresponding outputs so it can accurately predict the output for new, similar data.
Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as "healthy" or "sick".
In this example, supervised learning uses this labeled data to train a model that can predict the label ("healthy" or "sick") for new patients based on their gender and age. For example, if a new 50-year-old male patient visits the clinic, the model can classify the patient as "healthy" or "sick" based on the patterns it learned during training.
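A small k-nearest-neighbours classifier in Python makes the clinic example concrete. The patient records below are invented for illustration, because the original table is not reproduced here. Encoding gender as 0/1 and age in decades is an assumption, chosen so that both features sit on a similar scale.

```python
from collections import Counter

# Hypothetical labeled records (gender, age, label); not the clinic's real data.
training_data = [
    ("Male", 48, "sick"),
    ("Male", 55, "sick"),
    ("Female", 52, "sick"),
    ("Female", 25, "healthy"),
    ("Male", 30, "healthy"),
    ("Female", 35, "healthy"),
]

def encode(gender, age):
    """Turn (gender, age) into numbers: gender -> 0/1, age scaled to decades."""
    return (1.0 if gender == "Male" else 0.0, age / 10.0)

def predict(gender, age, k=3):
    """Majority vote among the k labeled records closest to the new patient."""
    query = encode(gender, age)

    def distance(record):
        g, a, _ = record
        return sum((qi - xi) ** 2 for qi, xi in zip(query, encode(g, a))) ** 0.5

    nearest = sorted(training_data, key=distance)[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict("Male", 50))    # sick
print(predict("Female", 28))  # healthy
```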
Applications
Supervised learning is used in a wide variety of applications, including:
Image, speech and text processing: For tasks like image classification, speech recognition and
sentiment analysis.
Predictive analytics: To forecast sales, customer churn, stock prices and weather conditions.
Recommendation and personalization: Powering systems that suggest products, movies or content.
Healthcare and finance: Used for medical diagnosis, fraud detection and credit scoring.
Automation and control: In autonomous vehicles, manufacturing quality checks and gaming AI.
2. Unsupervised learning:
Unsupervised learning works with unlabeled data where no correct answers or categories are provided. The model's job is to find hidden patterns, similarities or groups in the data on its own. This is useful in scenarios where labeling data is difficult or impossible. It’s mainly used for clustering, dimensionality reduction and data visualization.
Here, unsupervised learning looks for patterns or groups within the data on its own. For example, it might cluster patients by age or gender, grouping them into categories like "younger patients" or "older patients" without knowing their health status.
3. Reinforcement Learning
Reinforcement Learning (RL) trains an agent to make decisions by interacting with an environment. Instead of being told the correct answers, the agent learns by trial and error, receiving rewards for good actions and penalties for bad ones. Over time, it develops a strategy that maximizes rewards and achieves its goals. This approach suits problems involving sequential decision-making, such as robotics, gaming and autonomous systems.
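Tabular Q-learning, a standard RL algorithm used here as one example, illustrates the trial-and-error loop described above. The five-state corridor environment, its +1 reward at the goal, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical choices for this sketch.

```python
import random

# Hypothetical environment: a corridor of states 0..4; reaching state 4 gives +1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = ["left", "right"]

def step(state, action):
    """Deterministic dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit (random tie-break).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + discounted best future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # expected: ['right', 'right', 'right', 'right']
```

No one tells the agent that "right" is correct. It discovers this because only moving right eventually earns a reward, and the discount factor `gamma` makes shorter paths to the goal worth more.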
Besides these three main types, modern machine learning also includes two other important approaches: Self-
Supervised Learning and Semi-Supervised Learning.
4. Semi-Supervised Learning
Semi-supervised learning is a hybrid machine learning approach which uses both supervised and
unsupervised learning. It uses a small amount of labelled data combined with a large amount of unlabelled
data to train models. The goal is to learn a function that accurately predicts outputs based on inputs, similar
to supervised learning, but with much less labelled data.
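Self-training (pseudo-labeling) is one common way to exploit unlabeled data. The model trains on the labeled points, labels the unlabeled points it is confident about, and then retrains. The sketch below assumes one-dimensional data, a nearest-centroid classifier and a hypothetical confidence `margin`. It illustrates the idea rather than any particular library's method.

```python
# Hypothetical 1-D data: two labeled points and several unlabeled ones.
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [1.5, 2.0, 2.5, 8.0, 8.5, 7.5]

def centroids(data):
    """Mean feature value for each class."""
    sums, counts = {}, {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(x, cents):
    """Nearest-centroid classifier."""
    return min(cents, key=lambda y: abs(x - cents[y]))

def confidence_gap(x, cents):
    """Distance to the second-nearest centroid minus distance to the nearest one."""
    d = sorted(abs(x - c) for c in cents.values())
    return d[1] - d[0]

def self_train(labeled, unlabeled, rounds=3, margin=2.0):
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        # Pseudo-label only points that are clearly closer to one centroid.
        confident = [x for x in pool if confidence_gap(x, cents) >= margin]
        if not confident:
            break
        labeled += [(x, predict(x, cents)) for x in confident]
        pool = [x for x in pool if x not in confident]
    return centroids(labeled), pool

cents, remaining = self_train(labeled, unlabeled)
print(cents)      # {'A': 1.75, 'B': 8.25}
print(remaining)  # []
```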
Applications
Image Classification: Combine small labeled and large unlabeled image datasets to improve
accuracy.
Natural Language Processing (NLP): Enhance language models by using a mix of labeled and vast
unlabeled text data.
Speech Recognition: Boost accuracy by leveraging limited transcribed audio and more unlabeled
speech data.
Recommendation Systems: Improve recommendations using sparse labeled data and abundant
unlabeled user behavior.
Healthcare & Medical Imaging: Improve medical image analysis with a mix of labeled and
unlabeled images.
5. Self-Supervised Learning
Self-Supervised Learning (SSL) is a type of machine learning where a model is trained using data that does
not have any labels or answers provided. Instead of needing people to label the data, the model finds patterns
and creates its own labels from the data automatically.
This allows the model to learn useful information by teaching itself from the data. SSL is especially useful
when there is a lot of data but only a small part of it is labelled or labelling the data would take a lot of time
and effort.
Real-life examples include GPT-3/4 learning to predict the next word in text, AI systems colorizing black-and-white photos, and autonomous vehicles learning to navigate by predicting future video frames from raw sensor data.
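Next-word prediction shows the core trick of self-supervised learning: the labels are created from the data itself. In this toy Python sketch, the "label" for every word is simply the word that follows it. A bigram count table stands in for the large neural networks used by models such as GPT, and the corpus is made up.

```python
from collections import Counter, defaultdict

def make_next_word_pairs(text):
    """Pretext task: each word's label is the word that follows it (no human labeling)."""
    words = text.lower().split()
    return list(zip(words[:-1], words[1:]))

def train_bigram(pairs):
    """Count how often each word follows each context word."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts

def predict_next(model, word):
    following = model.get(word.lower())
    return following.most_common(1)[0][0] if following else None

corpus = "the cat sat on the mat the cat ate the fish"
pairs = make_next_word_pairs(corpus)
model = train_bigram(pairs)
print(pairs[:3])                  # [('the', 'cat'), ('cat', 'sat'), ('sat', 'on')]
print(predict_next(model, "the")) # cat
```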
Applications
Natural Language Processing
Computer Vision and Speech Recognition
Video understanding
Pre-training for large AI models
Real-World Applications
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom M. Mitchell
Some examples that clearly define a well-posed learning problem are:
1. To better filter emails as spam or not
Task - Classifying emails as spam or not
Performance Measure - The fraction of emails accurately classified as spam or not spam
Experience - Observing you label emails as spam or not spam
Key steps include defining the learning task (T), performance measure (P), and training experience (E), then
selecting the target function, its representation, and an appropriate approximation algorithm, followed by
deployment and ongoing monitoring.
In order to choose the right training experience for your algorithm, consider these three attributes:
a) Type of Feedback: Check whether the training experience provides direct or indirect feedback to the algorithm on the choices made by the performance system. With direct feedback, the learner is told immediately how good each individual choice is. With indirect feedback, the learner only sees a sequence of moves and the final outcome of that sequence.
b) Degree: The degree of a training experience refers to the extent to which the learner can control the sequence of training examples. For example, the learner might rely on constant feedback about the moves played, or it might itself propose a sequence of actions and only ask for help when needed.
c) Representativeness: The third crucial attribute is how well the training experience represents the distribution of examples over which final performance will be tested. In general, the more closely the training examples cover that distribution, the better the performance is likely to be.
Decide on the type of training experience based on the problem and the nature of the data:
Supervised Learning: Best for tasks where labeled data is available, such as predicting loan defaults.
Unsupervised Learning: Suitable for uncovering hidden patterns in unlabeled data, such as
clustering customers.
Reinforcement Learning: Ideal for dynamic environments like robotics or gaming, where the system
learns through trial and error.
The nature of a target function depends on the type of machine learning problem that's being solved, and
whether the solution involves regression, binary classification, multiclass classification or more complex
nonlinear mappings.
Choosing the appropriate representation for the target function depends on the problem’s complexity and
data characteristics:
Decision Trees: Effective for hierarchical decision-making tasks.
Neural Networks: Suitable for handling non-linear relationships in large, complex datasets.
Linear Models: Best for interpretable, straightforward problems.
This step requires balancing model complexity with interpretability and computational efficiency.
Common Algorithms
Various machine learning algorithms are used for function approximation, including:
Linear Regression: Uses a linear function to approximate the target function.
Neural Networks (Deep Learning): Can model complex non-linear relationships. They learn
features automatically from data through algorithms like backpropagation and gradient descent.
Support Vector Machines (SVMs): Can be used for both classification and regression.
Decision Trees: Can approximate discrete-valued or continuous target functions by using a set of rules (yes/no questions) to partition the input space.
Basis Function Methods: Transform the input data into a higher-dimensional feature space where the function approximation problem might be easier to solve (e.g., using Radial Basis Functions or tile coding).
Train the chosen model using the prepared data. This involves:
Feeding the prepared data to the algorithm to train the model.
Using optimization algorithms like Gradient Descent and Backpropagation, the model adjusts the weights on its input features to determine their importance, effectively "learning" by strengthening or weakening connections.
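Plain gradient descent on a single linear unit, y ≈ w·x + b, illustrates the weight-adjustment step by minimizing mean squared error. Backpropagation is not needed here because there is only one layer. The data, the learning rate `lr` and the epoch count are assumptions for illustration.

```python
def gradient_descent(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient: weights that increase the error are weakened.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # hypothetical data generated by y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```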
The final design is developed after several iterative steps, including data collection, preprocessing, feature
selection, and model training. It represents the fully matured, operational system capable of handling real-
world scenarios, such as in the classic example of designing a checkers-playing machine.
1. Classification
Classification in machine learning is a supervised learning technique that categorizes input data into
predefined classes or labels based on trained patterns. It predicts categorical outcomes (e.g., "spam" or
"not spam") rather than continuous values, making it essential for tasks like spam filtering, image
recognition, and medical diagnosis.
For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam), as illustrated below.
3. Multi-Label Classification: Each instance can be assigned multiple labels simultaneously. A movie
recommendation system could tag a movie as both action and comedy.
Classification Algorithms
There are various types of classification algorithms. Some of them are:
i) Linear Classifiers: Linear classifier models create a linear decision boundary between classes. They
are simple and computationally efficient. Some of the linear classification models are as follows:
Logistic Regression
Support Vector Machines having kernel = 'linear'
Single-layer Perceptron
Stochastic Gradient Descent (SGD) Classifier
ii) Non-Linear Classifiers: Non-linear models create a non-linear decision boundary between classes.
They can capture more complex relationships between input features and target variable. Some of the
non-linear classification models are as follows:
K-Nearest Neighbours
Kernel SVM
Naive Bayes
Decision Tree Classification
Ensemble learning classifiers:
Random Forests,
AdaBoost,
Bagging Classifier,
Voting Classifier,
Extra Trees Classifier
Multi-layer Artificial Neural Networks
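A minimal single-layer perceptron in pure Python gives one example of the linear classifiers listed above. The AND dataset, learning rate and epoch count are illustrative assumptions. Because AND is linearly separable, the perceptron update rule converges on it.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=10):
    """Single-layer perceptron: learns weights w and bias b for a linear boundary."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):  # target is 0 or 1
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else 0
            error = target - pred
            # The update is a no-op when the prediction is correct (error == 0).
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Logical AND is linearly separable, so a single straight line can separate the classes.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```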
2. Regression
Regression in machine learning is a supervised learning technique used to predict continuous numerical
outcomes by modeling the relationship between dependent (target) and independent (predictor)
variables. It fits a line or curve to data, minimizing errors to forecast trends like prices, sales, or
temperature.
It helps understand how changes in one or more factors influence a measurable outcome and is widely
used in forecasting, risk analysis, decision-making and trend estimation.
Works with real-valued output variables
Helps identify the strength and type of relationships between variables
Supports both simple and complex predictive models.
Used for tasks like price prediction, trend forecasting and risk scoring.
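In the simplest case, with one predictor and a straight line, the line that minimizes squared error has a closed-form solution (ordinary least squares). The house-size and price figures below are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (hundreds of sq. ft) vs price (thousands of dollars).
sizes = [10, 15, 20, 25, 30]
prices = [200, 270, 330, 410, 460]
slope, intercept = fit_simple_linear_regression(sizes, prices)
print(round(slope, 2), round(intercept, 2))  # 13.2 70.0
```

The fitted line can then be used for prediction, e.g. `slope * 22 + intercept` for a 2,200 sq. ft house.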
Types of Regression
Regression can be classified into different types based on the number of predictor variables and the
nature of the relationship between variables:
Applications
Predicting prices: Used to predict the price of a house based on its size, location and other features.
Forecasting trends: Model to forecast the sales of a product based on historical sales data.
Identifying risk factors: Used to identify risk factors for heart patients based on patient medical data.
Making decisions: It could be used to recommend which stock to buy based on market data.
3. Clustering
Clustering is an unsupervised machine learning technique that automatically groups unlabeled data
points into clusters based on shared similarities or patterns, such as distance, density, or statistical
distribution. It is used for pattern recognition, customer segmentation, image processing, and anomaly
detection by organizing data into meaningful, homogeneous groups without pre-existing labels.
For example, if we have customer purchase data, clustering can group customers with similar shopping
habits. These clusters can then be used for targeted marketing, personalized recommendations or customer
segmentation.
Types of Clustering
Let's look at the types of clustering:
i. Hard Clustering: In hard clustering, each data point strictly belongs to exactly one cluster; no overlap is allowed. This approach assigns a clear membership, making it easier to interpret and use for definitive segmentation tasks.
ii. Soft Clustering: Soft clustering assigns each data point a probability or degree of membership to
multiple clusters simultaneously, allowing data points to partially belong to several groups.
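A short Python sketch shows the difference between hard and soft clustering. K-means, used here as a representative hard-clustering algorithm, gives each point exactly one cluster. A softmax over negative squared distances is one simple, assumed way to turn the same centers into membership probabilities. The spend values and starting centers are invented.

```python
import math

def kmeans_1d(points, centers, iterations=10):
    """Hard clustering: every point is assigned to exactly one (nearest) center."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters

def soft_memberships(p, centers, temperature=1.0):
    """Soft clustering (one simple choice): softmax over negative squared distances."""
    scores = [math.exp(-((p - c) ** 2) / temperature) for c in centers]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical monthly spend of six customers (in $100s).
spend = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers, clusters = kmeans_1d(spend, centers=[0.0, 5.0])
print(centers)   # [1.5, 8.5]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 8.5, 9.0]]
print(soft_memberships(5.0, centers))  # [0.5, 0.5]: equally far from both centers
```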
Applications of Clustering:
Customer Segmentation: Grouping customers by purchasing behavior for targeted marketing.
Anomaly Detection: Identifying outliers in fraud detection or network security.
Image Segmentation: Dividing images into distinct regions based on pixel similarities.
Document Classification: Clustering documents or news articles by topic.
Relationship: Regression and Classification are similar in that they both use training data with
known labels to make predictions, while Clustering relies on input data only to find patterns.
1. Data Issues:
o Poor Quality/Noisy Data: Poor-quality data, which may be incorrect, incomplete, missing, noisy, or
inconsistent, can lead to inaccurate predictions and flawed outcomes. Data preprocessing is a crucial
step to ensure that data is clean and ready for analysis.
o Data Bias: Bias in machine learning refers to systematic errors where an algorithm produces skewed,
unfair, or inaccurate results, often due to flawed or prejudiced data, incorrect assumptions, or human
biases during development, leading to poor performance or discrimination, especially against
marginalized groups.
o Insufficient Data: Machine learning models require large amounts of high-quality, labelled data to
learn effectively. However, in many domains, obtaining such data is difficult due to factors like privacy
concerns, costs of data collection, and data sparsity. When the training dataset is too small, models can
struggle to capture meaningful patterns, resulting in poor performance on unseen data. This problem
becomes particularly pronounced in fields like healthcare, where collecting large, diverse datasets is
challenging.
o Imbalanced Datasets: Occurs when one class of data is significantly more prevalent than another,
causing the model to perform poorly on the minority class.
2. Model Performance:
o Overfitting: Overfitting occurs when a machine learning model becomes too complex and fits the
noise in the training data rather than the underlying patterns. This results in poor generalization to new
data. Overfitting is caused by models with too many or irrelevant parameters, or when there is
insufficient regularization.
o Underfitting: The model is too simple, or lacks the capacity, to capture the underlying patterns in the data, so it performs poorly on both the training data and new data.
o Model Drift and Continuous Monitoring: Over time, changes in the data distribution can lead to
model performance degradation, a phenomenon known as model drift. Once a machine learning model
is deployed, continuous monitoring is essential to ensure that it remains accurate and relevant. Models
require periodic updates and retraining to ensure they continue to deliver accurate predictions as new
data becomes available.
4. Operational Challenges:
o Lack of Skilled Resources: Machine Learning isn’t just about algorithms; it requires a deep understanding of the domain to interpret results correctly. Without domain insight, even accurate models can lead to poor business decisions. The demand for skilled machine learning professionals far exceeds the available supply, creating a skills gap that slows the adoption of machine learning technologies.
o Privacy and Legal Issues: Machine Learning models often rely on sensitive user data, creating risks
around data leaks, misuse or non-compliance with laws. Privacy issues focus on individual rights
regarding how personal data is collected, used, and shared (e.g., in training data). Legal issues centre
on compliance with regulations (like GDPR, CCPA) governing data handling, algorithmic fairness, and
intellectual property.
o Rapid Technological Evolution and Skill Gaps: ML technology evolves rapidly, making it difficult for professionals to stay up to date. New tools, frameworks and models appear faster than teams can adapt, creating significant skill gaps.
Machine Learning (ML): Techniques that allow machines to learn from data to improve predictions
without being explicitly programmed (e.g., linear regression, decision trees).
Deep Learning (DL): A specialized subset of ML that uses multi-layered neural networks (ANN, CNN, RNN), loosely inspired by the human brain, to process complex data like images and text.
Data Science (DS): Uses data to gain insights, combining ML, statistics, and domain expertise. It is
often broader than just building models, focusing on data analysis, visualization, and interpretation.
Comparison Table
Core Interactions:
Data Science provides the data and insights necessary to build intelligent models.
AI applications are realized through ML and DL algorithms.
Deep Learning is used for the most complex, unstructured data tasks within the broader context
of Data Science and AI.
By Data Structure
This classification focuses on how data is organized and stored.
Structured Datasets: Data is organized in a fixed, tabular format with clearly defined rows and
columns, similar to a spreadsheet or relational database table. This format is easy to query and
analyze.
o Examples: Financial records, customer databases, inventory systems.
Unstructured Datasets: This data lacks a predefined format or schema and is stored in its native
form. It often requires more sophisticated tools for processing and analysis.
o Examples: Text documents (emails, reports), images, audio recordings, and videos.
Semi-structured Datasets: This type falls between the two extremes, incorporating some markers
or syntax for organization but not following a rigid schema.
o Examples: JSON, XML, and HTML files used in web applications and APIs.
By Purpose in ML Workflow
In the practical application of machine learning, a single dataset is typically split into three subsets for
different stages of model development.
Training Data: The largest portion of the data used to train the machine learning model, where the
model learns the relationships and patterns between features and target outcomes.
Validation Data: A subset used during the training phase to tune model hyperparameters and prevent
overfitting (when a model performs well on training data but poorly on new data).
Testing Data: A separate, unseen subset of data used to evaluate the final model's performance and
accuracy on new, real-world examples.
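Shuffling once and slicing is a minimal way to produce the three subsets. The 70/15/15 proportions, the seed and the function name `train_val_test_split` are assumptions for this sketch, not a specific library's API. Stratified splitting for imbalanced data is not handled.

```python
import random

def train_val_test_split(data, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then cut into training, validation and test subsets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # a fixed seed makes the split reproducible
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder is held out for final evaluation
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before slicing matters when the data is ordered (for example by date or class). Otherwise the test set might contain only one kind of example.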
The choice of evaluation metrics depends on the specific problem domain, the type of data, and the desired outcome. Evaluation metrics vary with the type of problem: classification, regression, or clustering.
Classification Metrics
Classification metrics evaluate models that predict discrete outcomes, such as determining whether an email
is spam or not. These metrics help assess how well the model distinguishes between different classes.
In classification problems, we use two types of algorithms, depending on the kind of output they create:
1. Class output: Algorithms like SVM and KNN create a class output. For instance, in a binary classification problem, the outputs will be either 0 or 1. Some techniques can convert these class outputs into probabilities, but these converted probabilities are not well accepted by the statistics community.
2. Probability output: Algorithms like Logistic Regression, Random Forest, Gradient Boosting,
Adaboost, etc., give probability outputs. Converting probability outputs to class output is just a matter
of creating a threshold probability.
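As described above, converting probability outputs into class outputs amounts to comparing each probability with a threshold. The probabilities below are hypothetical. Treating a value equal to the threshold as positive is a convention assumed here.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Convert predicted probabilities of the positive class into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]

probs = [0.92, 0.40, 0.55, 0.10]  # hypothetical probability outputs
print(to_class_labels(probs))                 # [1, 0, 1, 0]
print(to_class_labels(probs, threshold=0.6))  # [1, 0, 0, 0]
```

Raising the threshold makes the model more conservative about predicting the positive class, which trades recall for precision.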
1. Confusion Matrix
A confusion matrix is a table that summarizes the performance of a classification model. It’s particularly useful for visualizing the predicted vs. actual (true) outcomes. A confusion matrix is an N × N matrix, where N is the number of classes or categories to be predicted. Here N = 2, so we get a 2 × 2 matrix.
Suppose we are working on a binary classification problem in which every sample belongs to either Yes or No. We build a classifier that predicts the class of a new input sample, then test the model on 165 samples and get the following result.
Handles Class Imbalance: If your dataset is imbalanced, a confusion matrix helps uncover whether
the model is biased towards predicting the majority class, even if the accuracy might seem deceptively
high.
Calculating Multiple Metrics: It is the basis for measuring precision, recall, specificity and accuracy and, when computed at several thresholds, for building ROC curves and their AUC.
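Counting the four cells of a 2 × 2 confusion matrix takes only a few lines. The eight actual/predicted labels below are invented for illustration and are not the 165-sample result described above. "Yes" is assumed to be the positive class.

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Return (TP, FP, FN, TN) counts for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted Yes, actually Yes
        elif p == positive:
            fp += 1          # predicted Yes, actually No
        elif a == positive:
            fn += 1          # predicted No, actually Yes
        else:
            tn += 1          # predicted No, actually No
    return tp, fp, fn, tn

actual    = ["Yes", "Yes", "No", "No",  "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No",  "No", "Yes", "Yes", "No", "No", "Yes"]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print("            Pred Yes  Pred No")
print(f"Actual Yes  {tp:8d}  {fn:7d}")  # TP = 3, FN = 1
print(f"Actual No   {fp:8d}  {tn:7d}")  # FP = 1, TN = 3
```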
2. Accuracy
Accuracy is one of the simplest metrics. It measures the percentage of correctly predicted labels out of the
total predictions.
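Formula:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Or
$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$$
where TP, TN, FP and FN are the true positive, true negative, false positive and false negative counts from the confusion matrix.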
Limitations
Sensitive to class imbalance: Accuracy can be misleading when working with imbalanced datasets. For
example, in a dataset with 90% class A and 10% class B, a model predicting only class A will still achieve
90% accuracy but it will fail to identify any class B instances.
Strengths
Easy to understand: It has a straightforward interpretation.
Good for balanced classes: When your dataset has roughly equal proportions of each class, accuracy
is a suitable overall performance indicator.
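The class-imbalance pitfall can be reproduced directly with the 90% / 10% example from the text. A model that always predicts class A scores 90% accuracy while finding no class B instance at all.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

actual = ["A"] * 90 + ["B"] * 10   # 90% class A, 10% class B
always_a = ["A"] * 100             # a model that only ever predicts class A
print(accuracy(actual, always_a))  # 0.9
b_found = sum(1 for a, p in zip(actual, always_a) if a == "B" and p == "B")
print(b_found)                     # 0: no class B instance is identified
```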
3. Precision
Precision indicates the proportion of correctly predicted positive results out of all predicted positive results.
It’s crucial when minimizing false positives is more important than capturing all positives, such as in spam
detection or financial fraud analysis.
Formula:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Example: If a spam filter correctly identifies 80 spam emails but incorrectly flags 20 non-spam emails as
spam, the precision is 80 / (80 + 20) = 80%.
Focus: Precision tells you how reliable the model is when it says something is positive. A high-precision
model is less likely to produce false alarms.
4. Recall (Sensitivity)
Recall or Sensitivity measures how many of the actual positive cases were correctly identified by the model.
It is important when missing a positive case (false negative) is more costly than false positives.
Formula:
$$\text{Recall} = \frac{TP}{TP + FN}$$
In scenarios where catching all positive cases is important (like disease detection), recall is a key metric.
Example: If a model detects 90 out of 100 actual spam emails, its recall is 90 / (90+10) = 90%.
Focus: Recall reflects how good your model is at not missing out on the class you are truly interested in. A
high recall model minimizes false negatives.
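Precision and recall follow directly from the confusion-matrix counts. This sketch reproduces the two spam examples above. Returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn) if tp + fn else 0.0

print(precision(tp=80, fp=20))  # 0.8
print(recall(tp=90, fn=10))     # 0.9
```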
5. F1 Score
The F1 Score is the harmonic mean of precision and recall. It is useful when we need a balance between
precision and recall as it combines both into a single number. A high F1 score means the model performs
well on both metrics. Its range is [0,1].
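High precision with low recall means the model's positive predictions are reliable, but it misses a large number of positive instances. The higher the F1 score, the better the balance between precision and recall. It can be expressed mathematically as:
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Example: For a model with a precision of 80% and a recall of 70%, the F1 score is (2 x 0.8 x 0.7) / (0.8 + 0.7) ≈ 0.747 (74.7%).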
Intuition:
Balances Precision and Recall: The F1-score provides a single metric that considers both. A high
F1-score implies a model that is good at both classifying positive cases correctly (precision) and
finding most of the positive cases (recall).
Penalizes Imbalance: Because it’s a harmonic mean, the F1-Score becomes low if either precision
or recall is low.
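Computing the F1 score from precision and recall confirms the worked example above. It also shows the harmonic mean's penalty for imbalance: with precision 0.9 and recall 0.1 the arithmetic mean would be 0.5, but F1 is only 0.18.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.8, 0.7), 4))  # 0.7467
print(round(f1_score(0.9, 0.1), 2))  # 0.18
```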
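6. Log Loss
Log Loss (also called cross-entropy loss) measures how far the predicted probabilities are from the actual class labels, penalizing confident wrong predictions heavily.
Formula:
$$\text{Log Loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log(p_{ij})$$
Where:
N = Number of samples
M = Number of classes
𝑦𝑖𝑗 = Actual class (0 or 1) for sample 𝑖 and class 𝑗
𝑝𝑖𝑗 = Predicted probability for sample 𝑖 and class 𝑗
The goal is to minimize Log Loss, as a lower Log Loss indicates more accurate probability predictions.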
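A direct Python transcription of the Log Loss formula follows. It assumes one-hot actual labels and per-class predicted probabilities. Clipping probabilities into [eps, 1 - eps] to avoid log(0) is a common safeguard, added here as an assumption. The example rows are hypothetical.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Multi-class log loss: y_true rows are one-hot, y_prob rows are predicted probabilities."""
    total = 0.0
    for actual_row, prob_row in zip(y_true, y_prob):
        for y_ij, p_ij in zip(actual_row, prob_row):
            p_ij = min(max(p_ij, eps), 1 - eps)  # clip to avoid log(0)
            total += y_ij * math.log(p_ij)
    return -total / len(y_true)

y_true = [[1, 0, 0], [0, 1, 0]]
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
unsure = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]
print(round(log_loss(y_true, confident), 3))  # 0.164
print(round(log_loss(y_true, unsure), 3))     # 0.916
```

Both models would pick the correct class for each sample, but the one that assigns higher probability to the correct class gets the lower Log Loss.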
AUC-ROC measures the model’s ability to distinguish between positive and negative classes across various
threshold levels. It is useful for binary classification tasks. The AUC value represents the probability that the
model will rank a randomly chosen positive example higher than a randomly chosen negative example. AUC
ranges from 0 to 1 with higher values showing better model performance.
Key Components:
Interpretation:
AUC = 1: Perfect classification.
AUC = 0.5: No discrimination (random guessing).
AUC between 0.5 and 1: Has some ability to discriminate between classes.
AUC < 0.5: Model performs worse than random guessing (suggesting that its predictions are inverted).
ROC Curve
It is a graphical representation of the True Positive Rate (TPR) vs the False Positive Rate (FPR) at different
classification thresholds. The curve helps us visualize the trade-offs between sensitivity (TPR) and specificity
(1 - FPR) across various thresholds. Area Under Curve (AUC) quantifies the overall ability of the model to
distinguish between positive and negative classes.
Axes:
X-axis: False Positive Rate (FPR) = 1 − Specificity (Measures how often a negative instance is wrongly classified as positive.)
Y-axis: True Positive Rate (TPR) = Recall (Measures how many of the positive cases the model
catches.)
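The ROC curve and AUC can be computed without a plotting library. This sketch uses invented labels and scores. It builds (FPR, TPR) points by sweeping the threshold over each distinct score and integrates them with the trapezoidal rule. It then checks the result against the ranking interpretation of AUC: the fraction of positive/negative pairs in which the positive is scored higher, with ties counting half (an assumed convention).

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by using each distinct score as the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_ranking(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(round(auc_trapezoid(points), 4), round(auc_ranking(labels, scores), 4))  # 0.8889 0.8889
```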
Example:
Imagine two models that detect disease:
Model A has a higher accuracy but a lower AUC than Model B.
The ROC curve reveals that Model B is better at ranking the patients likely to have the disease even
if it might misclassify a few more individuals overall.
Regression Metrics
Regression metrics are used to evaluate machine learning models that predict continuous outcomes, such as
housing prices, stock values, or sales forecasts. These metrics measure the difference between the predicted
and actual values to determine how well a model performs in regression tasks. Below is a detailed explanation
of the most commonly used regression metrics.
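Formula:
$$\text{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right|$$
Where:
𝑦𝑗 = Actual value
𝑦̂𝑗 = Predicted value
n = Number of observations
Example: If the actual values are [3, 5, 7] and the predicted values are [2, 5, 8], the MAE is:
MAE = (|3 - 2| + |5 - 5| + |7 - 8|) / 3 = 2/3 ≈ 0.67
Limitation:
It gives a clear view of the model’s prediction accuracy, but it doesn't show whether the errors are due to over- or under-prediction. It is simple to calculate and interpret, which makes it a good starting point for model evaluation.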
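The MAE example can be checked in a few lines of Python. Note that the under-prediction (3 vs 2) and the over-prediction (7 vs 8) both contribute +1, which is exactly why MAE hides the direction of the error.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)

actual = [3, 5, 7]
predicted = [2, 5, 8]
print(round(mean_absolute_error(actual, predicted), 3))  # 0.667
```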
Formula:
Where:
𝑦𝑗 = Actual value
𝑦̂𝑗 = Predicted value
Where:
𝑦𝑗 = Actual value
𝑦̂𝑗 = Predicted value
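Formula:
$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\log(1 + y_j) - \log(1 + \hat{y}_j)\right)^2}$$
or
$$\text{RMSLE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(\log\frac{1 + y_j}{1 + \hat{y}_j}\right)^2}$$
Where:
𝑦𝑗 = Actual value
𝑦̂𝑗 = Predicted value
n = Number of observations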
Use Case: RMSLE is often used in scenarios like sales forecasting, where large variations in the target
variable exist.
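This sketch shows why RMSLE suits targets with large variations. The same absolute error of 10 units is penalized heavily on a small value and barely at all on a large one. It uses `math.log1p(x)`, which computes log(1 + x), and assumes all values are non-negative. The numbers are invented.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error; assumes all values are >= 0."""
    squared = [(math.log1p(y) - math.log1p(y_hat)) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Same absolute error (10 units) on a small and on a large target value:
print(round(rmsle([10], [20]), 3))      # 0.647
print(round(rmsle([1000], [1010]), 3))  # 0.01
```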
5. R² (R-squared)
The R² score represents the proportion of the variance in the dependent variable that is predictable from the independent variables. An R² value close to 1 shows a model that explains most of the variance, while a value close to 0 shows that the model does not explain much of the variability in the data. R² is used to assess the goodness-of-fit of regression models.
Formula:
$$R^2 = 1 - \frac{\sum_{j=1}^{n}(y_j - \hat{y}_j)^2}{\sum_{j=1}^{n}(y_j - \bar{y})^2}$$
Where:
𝑦𝑗 = Actual value
𝑦̂𝑗 = Predicted value
𝑦̄ = Mean of the actual values
n = Number of observations
Example: An R² score of 0.85 means the model explains 85% of the variance in the target variable.
Use Case: It’s widely used in linear regression and serves as a quick measure of model performance.
6. Adjusted R² Score
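The adjusted R² score modifies the R² score to account for the number of predictors in the model. Unlike R², it does not automatically increase when predictors are added, which helps guard against overfitting with unnecessary predictors.
Formula:
$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(N - 1)}{N - k - 1}$$
Where:
N: Number of observations.
k: Number of predictors.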
Use Case: Essential for evaluating models with multiple features or predictors.
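R² and adjusted R² can be computed directly from their formulas. The actual and predicted values below are hypothetical. Holding R² fixed while raising the number of predictors k shows how the adjustment penalizes extra predictors.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_observations, n_predictors):
    """Adjusted R² for N observations and k predictors."""
    return 1 - (1 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)

actual = [3, 5, 7, 9, 11]
predicted = [2.5, 5.5, 7.0, 8.5, 11.5]
r2 = r_squared(actual, predicted)
print(round(r2, 4))                            # 0.975
print(round(adjusted_r_squared(r2, 5, 1), 4))  # 0.9667
print(round(adjusted_r_squared(r2, 5, 3), 4))  # 0.9
```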
Clustering Metrics
In unsupervised learning tasks such as clustering, the goal is to group similar data points together. Evaluating
clustering performance is often more challenging than supervised learning since there is no explicit ground
truth. However, clustering metrics provide a way to measure how well the model is grouping similar data
points.
1. Silhouette Score
Silhouette Score evaluates how well a data point fits within its assigned cluster considering how close it is
to points in its own cluster (cohesion) and how far it is from points in other clusters (separation). A higher
silhouette score (close to +1) shows well-clustered data while a score near -1 suggests that the data point is
in the wrong cluster.
Formula:
$$s = \frac{b - a}{\max(a, b)}$$
Where:
a = Average distance between a sample and all other points in the same cluster
b = Average distance between a sample and all points in the nearest cluster
The overall Silhouette Score is the mean of s over all samples.
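Here is a direct, unoptimized implementation of the silhouette formula. It averages s over all points using Euclidean distance (`math.dist`, Python 3.8+). The toy points are invented, and giving singleton clusters s = 0 is an assumed convention.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette s = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:  # convention: a point alone in its cluster gets s = 0
            scores.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster (cohesion).
        a = sum(math.dist(p, q) for q in own) / len(own)
        # b: mean distance to the points of the nearest other cluster (separation).
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(1, 1), (1, 2), (8, 8), (8, 9)]
good = silhouette_score(pts, [0, 0, 1, 1])  # nearby points grouped together
bad = silhouette_score(pts, [0, 1, 0, 1])   # distant points forced together
print(round(good, 2), round(bad, 2))  # 0.9 -0.45
```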
2. Davies-Bouldin Index
Davies-Bouldin Index measures the average similarity between each cluster and its most similar cluster. A
lower Davies-Bouldin index shows better clustering as it suggests the clusters are well-separated and
compact. The goal is to minimize the Davies-Bouldin index to achieve optimal clustering.
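Formula:
$$DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \neq i}\left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right)$$
Where:
k = Number of clusters
𝜎𝑖 = Average distance of points in cluster i from the cluster centroid
𝑑(𝑐𝑖, 𝑐𝑗) = Distance between centroids of clusters i and j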
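The Davies-Bouldin formula can likewise be transcribed directly, assuming Euclidean distance and coordinate-wise means as centroids. The two toy datasets are invented. Their clusters share the same centroids, but spreading the points out raises σ and therefore the index.

```python
import math

def davies_bouldin(points, labels):
    """Average over clusters of the worst-case (sigma_i + sigma_j) / d(c_i, c_j) ratio."""
    clusters = sorted(set(labels))
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    # Centroid of each cluster: coordinate-wise mean of its points.
    centroids = {c: tuple(sum(axis) / len(m) for axis in zip(*m)) for c, m in members.items()}
    # sigma_i: average distance of a cluster's points from its centroid.
    sigma = {c: sum(math.dist(p, centroids[c]) for p in m) / len(m) for c, m in members.items()}
    total = 0.0
    for i in clusters:
        total += max(
            (sigma[i] + sigma[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

compact = davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1])
spread = davies_bouldin([(0, 0), (0, 6), (10, 0), (10, 6)], [0, 0, 1, 1])
print(compact, spread)  # 0.2 0.6
```

The compact clusters score lower, matching the rule above that a lower Davies-Bouldin index indicates better clustering.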
By mastering the appropriate evaluation metrics, we can fine-tune machine learning models so that they meet the needs of diverse applications and deliver optimal performance.