
Topic 2 - Overview of Machine Learning

The document provides an overview of machine learning, covering core concepts such as classification, prediction, and the differences between supervised and unsupervised learning. It outlines the steps involved in a machine learning project, including data preprocessing and evaluation metrics like accuracy, precision, recall, and F1 score. Additionally, it discusses challenges like overfitting and underfitting, along with strategies to mitigate these issues.

Uploaded by

kasne26o2


Overview of Machine Learning

Outline

1. Core Ideas in Machine Learning

2. The steps in a Machine Learning project
3. Preliminary steps
4. Predictive Power and Overfitting
1. Core Ideas in Machine Learning

Core Ideas in Machine Learning

- Classification & Prediction
- Association Rules and Recommendation Systems
- Data Reduction and Dimension Reduction
- Data Exploration and Visualization
- Supervised and Unsupervised Learning

Classification & Prediction

- Classification:
  - The most basic form of predictive analysis. Examples:
    • An applicant for a loan can repay on time, repay late, or declare bankruptcy.
    • A patient can be diagnosed as having cancer or as normal.
  - A common task in machine learning is to examine data where the classification is unknown or will occur in the future, with the goal of predicting what that classification is or will be.
- Prediction: Predicting the value of a numerical attribute (e.g., the amount of a purchase) rather than a class (e.g., purchase or not).

Classification: A Two-Step Process

Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae
Model usage: classifying future or unknown objects
- Estimate the accuracy of the model
  • The known label of each test sample is compared with the model's classification
  • The accuracy rate is the percentage of test set samples that the model classifies correctly
  • The test set must be independent of the training set; otherwise the accuracy estimate is over-optimistic and overfitting goes undetected
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Process (1): Model Construction

Training Data -> Classification Algorithms -> Classifier (Model)

Training data:

NAME   RANK            YEARS  TENURED
Mike   Assistant Prof  3      no
Mary   Assistant Prof  7      yes
Bill   Professor       2      yes
Jim    Associate Prof  7      yes
Dave   Assistant Prof  6      no
Anne   Associate Prof  3      no

Classifier (Model):

IF rank = 'professor' OR years > 6
THEN tenured = 'yes'

Process (2): Using the model for Prediction

Testing Data -> Classifier -> estimated accuracy
Unseen Data -> Classifier -> predicted class

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4). Tenured?
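The two-step process can be traced by hand with the slide's own rule and testing data. The sketch below is only an illustration: the function name `predict_tenured` and the tuple layout are assumptions made for this example, while the rule and the rows come from the slides.

```python
# Rule from the "Model Construction" slide:
# IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
def predict_tenured(rank: str, years: int) -> str:
    return "yes" if rank == "Professor" or years > 6 else "no"

# Testing data from the slide: (name, rank, years, known label)
test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

# Step 2a: compare the known label of each test sample with the model's output.
correct = sum(
    predict_tenured(rank, years) == label
    for _name, rank, years, label in test_data
)
accuracy = correct / len(test_data)

# Step 2b: classify the unseen tuple (Jeff, Professor, 4).
jeff = predict_tenured("Professor", 4)

print(f"test accuracy = {accuracy:.2f}, Jeff tenured? {jeff}")
```

The rule misclassifies Merlisa (7 years, but not tenured), so it gets three of the four test samples right. Whether that accuracy is acceptable depends on the application.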

Association Rules and Recommendation Systems

• Association rules are designed to find general association patterns between items in large databases.
• Online recommendation systems, such as those used on Amazon and Netflix, use collaborative filtering, a method that draws on an individual user's preferences and tastes, as revealed by their historical purchases, ratings, browsing, or any other measurable behavior indicative of preference, together with other users' histories.

Data Reduction and Dimension Reduction

• Data Reduction: the process of consolidating a large number of records (or cases) into a smaller set. For
example, rather than dealing with thousands of product types, an analyst might wish to group them into
a smaller number of groups and build separate models for each group
• Dimension Reduction: Reducing the number of attributes. Dimension reduction is a common initial step
before deploying supervised learning methods, intended to improve predictive power, manageability,
and interpretability.

Supervised and Unsupervised Learning

Supervised learning involves training a model on labeled data, where each input has a corresponding output.
- Data Requirement: Requires labeled datasets with input-output pairs (e.g., features and target labels).
- Objective: To predict or classify the output based on new input data.
- Common Algorithms: Linear Regression; Logistic Regression; Support Vector Machines (SVM); Neural Networks; Decision Trees and Random Forests
- Applications: Email spam detection; Fraud detection; Predicting house prices; Image classification

Unsupervised learning works with unlabeled data, aiming to find hidden patterns or structures.
- Data Requirement: Does not require labeled data; only input features are provided.
- Objective: To explore the data and identify patterns like clusters, associations, or anomalies.
- Common Algorithms: K-Means Clustering; Hierarchical Clustering; Principal Component Analysis (PCA); Autoencoders
- Applications: Customer segmentation; Market basket analysis; Anomaly detection; Dimensionality reduction
2. The Steps in a Machine Learning Project

The Steps in Machine Learning project

3. Preliminary steps

Preprocessing and Cleaning the Data

• Handling Categorical Attributes

• Feature Selection
• Outliers
• Missing Values
• Normalizing (Standardizing) and Rescaling Data
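Several of these preprocessing steps can be illustrated with the standard library alone. The following is a minimal sketch under stated assumptions: the helper names (`one_hot`, `impute_mean`, `standardize`, `rescale`) and the two small columns are hypothetical, and mean imputation is only one of several possible ways to handle missing values.

```python
from statistics import mean, pstdev

def one_hot(values):
    """Handle a categorical attribute: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def impute_mean(values):
    """Handle missing values: replace None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Standardize (z-score): subtract the mean, divide by the population std. dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def rescale(values):
    """Rescale to the [0, 1] range (min-max)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical columns, invented for this example only.
ranks = ["Assistant", "Professor", "Associate", "Assistant"]
years = [3, None, 7, 2]

encoded = one_hot(ranks)     # columns ordered: Assistant, Associate, Professor
filled = impute_mean(years)  # the missing entry becomes the mean of 3, 7, 2
scaled = rescale(filled)
zscores = standardize(filled)

print(encoded)
print(filled, scaled)
```

Standardized values are also a common basis for flagging outliers, for example by inspecting entries that lie several standard deviations from the mean.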
4. Predictive Power and Overfitting

Predictive Power

Predictive Power - Accuracy

Predictive Power refers to the ability of a statistical model, machine learning model, or algorithm
to accurately forecast or predict outcomes based on input data.

Accuracy: The ratio of correctly predicted observations to the total observations. It measures the
overall correctness of the model

\[
\text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Observations}}
\]

Limitations: Not reliable for imbalanced datasets, as it may overestimate performance when one
class dominates.
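The limitation on imbalanced data can be seen with a small calculation. The counts below are hypothetical: 100 observations of which only 5 are positive, and a classifier that always predicts the negative class.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / total observations."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion counts for an "always negative" classifier:
# all 95 negatives are correct, all 5 positives are missed.
tp, tn, fp, fn = 0, 95, 0, 5

acc = accuracy(tp, tn, fp, fn)
print(f"accuracy = {acc:.2f}, positives detected = {tp}")
```

The accuracy looks high even though the classifier never detects a single positive case, which is why accuracy alone is misleading when one class dominates.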

Predictive Power - Precision

Precision (Positive Predictive Value): The ratio of correctly predicted positive observations to the total predicted positives. It focuses on the quality of positive predictions.

\[
\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
\]

High precision means fewer false positives. Example: In fraud detection, high precision means that most flagged transactions are actually fraudulent.

Predictive Power - Recall

Recall (Sensitivity or True Positive Rate): The ratio of correctly predicted positive observations to all actual positives. It focuses on capturing all positive instances.

\[
\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
\]

High recall means fewer false negatives. Example: In disease diagnosis, high recall means that most patients with the disease are detected.

Predictive Power – F1 Score

F1-Score: The harmonic mean of Precision and Recall, balancing the two metrics. It is best used when you need a balance between precision and recall, especially in imbalanced datasets.

\[
\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]

Range: 0 to 1, where 1 indicates perfect precision and recall.

When to use which metrics

• Accuracy: When classes are balanced.

• Precision: When the cost of false positives is high (e.g., spam filtering).
• Recall: When the cost of false negatives is high (e.g., cancer diagnosis).
• F1-Score: When you need a trade-off between precision and recall.

Example:
Given a confusion matrix
Calculate Accuracy, Precision, Recall and F1-Score.

Overfitting
• Overfitting occurs when a machine learning model learns the training data too well, including its noise and irrelevant details.
As a result, the model performs exceptionally well on the training data but poorly on unseen or test data. This happens
because the model becomes overly complex and fails to generalize to new data.
• Key Characteristics of Overfitting:
• High Training Accuracy, Low Test Accuracy: The model performs perfectly or nearly perfectly on the training set
but fails to predict accurately on the test set.
• Complex Models: Models with too many parameters or excessive complexity are more prone to overfitting.
• Causes of Overfitting:
• Insufficient Training Data: When the dataset is too small, the model may struggle to learn general patterns.
• High Model Complexity: Overly complex models (e.g., deep neural networks with too many layers) can overfit the
training data.
• Noise in Data: If the training data contains irrelevant features or random noise, the model might learn these instead
of the true underlying patterns.
• Insufficient Regularization: Without constraints (like regularization), the model might over-optimize for the
training data.
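The gap between training and test accuracy can be reproduced on a toy problem. Everything in this sketch is an assumption made for illustration: the one-dimensional data, the two deliberately flipped training labels standing in for noise, a nearest-neighbor "memorizer" as the overly flexible model, and a fixed (not learned) threshold as the simple model.

```python
# Hypothetical data. True pattern: label is 1 when x >= 5.
# The training labels at x = 3 and x = 6 are flipped on purpose to act as noise.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 1), (6, 0), (7, 1), (8, 1)]
test = [(1.2, 0), (2.8, 0), (6.2, 1), (7.9, 1)]

def memorizer(x):
    """Overly flexible model: copy the label of the nearest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def threshold_rule(x):
    """Simple model: a single cut-off at x = 5 (assumed, not fitted)."""
    return 1 if x >= 5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

memo_train, memo_test = accuracy(memorizer, train), accuracy(memorizer, test)
rule_train, rule_test = accuracy(threshold_rule, train), accuracy(threshold_rule, test)

print(f"memorizer: train {memo_train:.2f}, test {memo_test:.2f}")
print(f"threshold: train {rule_train:.2f}, test {rule_test:.2f}")
```

The memorizer reproduces the noisy training labels perfectly and then repeats those mistakes on nearby test points, while the simpler rule ignores the noise and generalizes better.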

How to prevent Overfitting

• Use More Data: Increasing the size of the training dataset helps the model generalize better.
• Simplify the Model: Reduce the complexity of the model by using fewer parameters or layers.
• Regularization Techniques:
• L1 and L2 Regularization: Penalize large weights to discourage over-complex models.
• Dropout: Randomly drop neurons during training to prevent co-adaptation.
• Cross-Validation: Use techniques like k-fold cross-validation to evaluate the model on multiple subsets of data.
• Early Stopping: Stop training the model when performance on a validation set stops improving.
• Data Augmentation: Increase dataset diversity by applying transformations like rotations, flips, or noise.
• Pruning Features: Remove irrelevant or redundant features from the dataset.
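As an illustration of the cross-validation item above, the following sketch shows how k-fold splitting partitions sample indices so that every sample is used for validation exactly once. The function name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled, since it splits indices in order.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for fold_number, (training, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train on {training}, validate on {validation}")
```

A model would be trained and scored once per fold, and the k validation scores averaged to give a more stable estimate than a single train/test split.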

Underfitting
• Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in
the data. As a result, the model performs poorly on both the training and test datasets. It fails to learn
the relationships in the data, leading to inaccurate predictions or classifications.
• Key Characteristics of Underfitting:
• Low Training and Test Accuracy: The model does not perform well even on the training data.
• Oversimplified Model: The model is too basic, lacking the complexity needed to represent the
data effectively.
• Poor Learning: The model misses important trends or patterns in the data.

How to prevent Underfitting

• Increase Model Complexity:
• Use more advanced models (e.g., switch from linear regression to polynomial regression).
• Add more layers or neurons in neural networks.
• Train for Longer: Ensure the model is adequately trained to converge to an optimal solution.
• Add Relevant Features: Include additional informative features in the dataset.
• Reduce Regularization: Adjust regularization parameters to avoid over-constraining the model.
• Hyperparameter Tuning: Optimize hyperparameters like learning rate, depth, or number of trees in
decision tree-based models.
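The effect of increasing model complexity can be sketched with two very small models. This example is a simpler analogue of the linear-to-polynomial switch mentioned above: it compares a constant predictor (too simple) with a straight-line fit, on hypothetical data generated from y = 2x + 1. The helper names are assumptions made for this example.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def fit_constant(x, y):
    """Too-simple model: always predict the training mean."""
    mean_y = sum(y) / len(y)
    return lambda value: mean_y

def fit_line(x, y):
    """Ordinary least-squares line, closed form for one feature."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return lambda value: slope * value + intercept

# Hypothetical data following y = 2x + 1.
x_train, y_train = [0, 1, 2, 3], [1, 3, 5, 7]
x_test, y_test = [4, 5], [9, 11]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line)]:
    model = fit(x_train, y_train)
    results[name] = (
        mse(y_train, [model(v) for v in x_train]),
        mse(y_test, [model(v) for v in x_test]),
    )
    print(name, results[name])
```

The constant model has a large error on both the training and the test data, which is the signature of underfitting, while the slightly more complex line captures the trend on both.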

Evaluating Classification Methods

- Accuracy
  • Classifier accuracy: predicting the class label
  • Predictor accuracy: guessing the value of predicted attributes
- Speed
  • Time to construct the model (training time)
  • Time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability
  • Understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

Evaluating Regression Methods

Calculate MSE, MAE, RMSE, R2
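The slide names these four metrics without defining them, so the sketch below uses their standard definitions: MSE is the mean of the squared errors, MAE the mean of the absolute errors, RMSE the square root of MSE, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample values are hypothetical.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R2 for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical true and predicted values, invented for this example.
y_true = [3, 5, 7, 9]
y_pred = [2, 5, 8, 9]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

MSE and RMSE penalize large errors more heavily than MAE, and RMSE and MAE are expressed in the same units as the target, which often makes them easier to interpret.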

Thank You
Pham Thi Viet Huong
0944817152
huongptv@[Link]
