Overview of Machine Learning
Outline
1. Core Ideas in Machine Learning
2. The steps in a Machine Learning project
3. Preliminary steps
4. Predictive Power and Overfitting
1. Core Ideas in Machine Learning
Core Ideas in Machine Learning
- Classification & Prediction
- Association Rules and Recommendation Systems
- Data Reduction and Dimension Reduction
- Data Exploration and Visualization
- Supervised and Unsupervised Learning
Classification & Prediction
- Classification:
- The most basic form of predictive analysis: each record is assigned to one of a fixed set of classes
• An applicant for a loan can repay on time, repay late, or declare bankruptcy.
• A patient can be diagnosed as having cancer or as normal.
- A common task in machine learning is to examine data where the classification is unknown or will occur in the future, with the goal of predicting what that classification is or will be
- Prediction: predicting the value of a numerical attribute (e.g., amount of purchase) rather than a class (e.g., purchase or not)
Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes
• Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
• The set of tuples used for model construction is the training set
• The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: classifying future or unknown objects
• Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification
• The accuracy rate is the percentage of test set samples that are correctly classified by the model
• The test set must be independent of the training set; otherwise the accuracy estimate is overly optimistic and overfitting goes undetected
• If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
Process (1): Model Construction
Training Data → Classification Algorithms → Classifier (Model)

Training Data:
NAME    RANK             YEARS   TENURED
Mike    Assistant Prof   3       no
Mary    Assistant Prof   7       yes
Bill    Professor        2       yes
Jim     Associate Prof   7       yes
Dave    Assistant Prof   6       no
Anne    Associate Prof   3       no

Classifier (Model):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Process (2): Using the model for Prediction
Testing Data → Classifier; Unseen Data → Classifier → Tenured?

Testing Data:
NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
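The two steps can be traced in a few lines of Python. This is a minimal sketch that hard-codes the rule and the records from the slides; the function names `classify` and `accuracy` are illustrative choices, not part of the original material.

```python
# Records copied from the slides: (name, rank, years, tenured).
TRAIN = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TEST = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """The model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(records):
    """Fraction of records whose known label matches the model's output."""
    correct = sum(classify(rank, years) == label
                  for _, rank, years, label in records)
    return correct / len(records)


train_accuracy = accuracy(TRAIN)   # the rule fits every training record
test_accuracy = accuracy(TEST)     # Merlisa (7 years, not tenured) is misclassified
jeff = classify("Professor", 4)    # prediction for the unseen record

print(train_accuracy, test_accuracy, jeff)  # 1.0 0.75 yes
```

The gap between training accuracy (1.0) and test accuracy (0.75) is why accuracy is estimated on an independent test set rather than on the training data.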
Association Rules and Recommendation Systems
• Association rule mining is designed to find general association patterns between items in large databases (e.g., which items tend to be purchased together)
• Online recommendation systems, such as those used on Amazon and Netflix, use collaborative filtering, a method that draws on an individual user's preferences and tastes, as revealed by their historic purchases, ratings, browsing, or any other measurable behavior indicative of preference, together with other users' histories
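One simple form of collaborative filtering is user-based: find the user whose past ratings are most similar to the target user's, then suggest items that neighbor rated and the target has not seen. The sketch below assumes this variant with cosine similarity over co-rated items; the ratings, user names, and function names are hypothetical, and production systems are considerably more elaborate.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating on a 1-5 scale}); not from the slides.
RATINGS = {
    "alice": {"book_a": 5, "book_b": 4, "book_c": 1},
    "bob": {"book_a": 5, "book_b": 5, "book_d": 4},
    "carol": {"book_a": 1, "book_c": 5, "book_e": 4},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(target, ratings):
    """Return (nearest user, that user's items unseen by target, best first)."""
    others = [name for name in ratings if name != target]
    nearest = max(others,
                  key=lambda name: cosine_similarity(ratings[target], ratings[name]))
    unseen = {item: r for item, r in ratings[nearest].items()
              if item not in ratings[target]}
    return nearest, sorted(unseen, key=unseen.get, reverse=True)


print(recommend("alice", RATINGS))  # ('bob', ['book_d'])
```

Restricting the similarity to co-rated items keeps the sketch short, but it is unreliable when two users share only one or two items; that limitation is a property of this simplification, not of collaborative filtering in general.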
Data Reduction and Dimension Reduction
• Data Reduction: the process of consolidating a large number of records (or cases) into a smaller set. For
example, rather than dealing with thousands of product types, an analyst might wish to group them into
a smaller number of groups and build separate models for each group
• Dimension Reduction: Reducing the number of attributes. Dimension reduction is a common initial step
before deploying supervised learning methods, intended to improve predictive power, manageability,
and interpretability.
Supervised and Unsupervised Learning
Supervised learning involves training a model on labeled data, where each input has a corresponding output.
Data Requirement: Requires labeled datasets with input-output pairs (e.g., features and target labels).
Objective: To predict or classify the output based on new input data.
Common Algorithms: Linear Regression; Logistic Regression; Support Vector Machines (SVM); Neural Networks; Decision Trees and Random Forests
Applications: Email spam detection; Fraud detection; Predicting house prices; Image classification

Unsupervised learning works with unlabeled data, aiming to find hidden patterns or structures.
Data Requirement: Does not require labeled data; only input features are provided.
Objective: To explore the data and identify patterns like clusters, associations, or anomalies.
Common Algorithms: K-Means Clustering; Hierarchical Clustering; Principal Component Analysis (PCA); Autoencoders
Applications: Customer segmentation; Market basket analysis; Anomaly detection; Dimensionality reduction
2. The steps in a Machine Learning project
The Steps in a Machine Learning Project
3. Preliminary steps
Preprocessing and Cleaning the Data
• Handling Categorical Attributes
• Feature Selection
• Outliers
• Missing Values
• Normalizing (Standardizing) and Rescaling Data
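Three of these steps can be illustrated with the standard library alone. The sketch below assumes mean imputation for missing values, z-score standardization, min-max rescaling to [0, 1], and one-hot encoding for categorical attributes; these are common choices, not necessarily the ones used in the course, and the sample columns are hypothetical.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rescale(values):
    """Min-max rescaling to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical attribute as one 0/1 column per category."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


ages = impute_mean([20, None, 40, 30])    # hypothetical column with a missing value
print(ages)                               # [20, 30, 40, 30]
print(rescale(ages))                      # [0.0, 0.5, 1.0, 0.5]
print(one_hot(["red", "green", "red"]))   # [[0, 1], [1, 0], [0, 1]]
```

Both `standardize` and `rescale` divide by zero on a constant column, so a real pipeline would need to guard against that case.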
4. Predictive Power and Overfitting
Predictive Power
Predictive Power - Accuracy
Predictive Power refers to the ability of a statistical model, machine learning model, or algorithm
to accurately forecast or predict outcomes based on input data.
Accuracy: The ratio of correctly predicted observations to the total observations. It measures the
overall correctness of the model
$$ \text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Observations}} $$
Limitations: Not reliable for imbalanced datasets, as it may overestimate performance when one
class dominates.
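A small calculation makes the limitation concrete. The counts below are hypothetical: 1,000 transactions of which 10 are fraudulent, scored by a "model" that always predicts "not fraud".

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) divided by the total number of observations."""
    return (tp + tn) / (tp + tn + fp + fn)


# Always predicting the majority class: 990 true negatives, 10 false negatives.
always_negative = accuracy(tp=0, tn=990, fp=0, fn=10)
print(always_negative)  # 0.99
```

The model scores 99% accuracy while detecting none of the fraudulent transactions, which is why accuracy alone can mislead when one class dominates.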
Predictive Power - Precision
Precision (Positive Predictive Value): The ratio of correctly predicted positive observations to the
total predicted positives. It focuses on the quality of positive predictions
$$ \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} $$
High precision means fewer false positives. Example: In fraud detection, high precision means that most flagged transactions are actually fraudulent.
Predictive Power - Recall
Recall (Sensitivity or True Positive Rate): The ratio of correctly predicted positive observations to
all actual positives. It focuses on capturing all positive instances
$$ \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} $$
High recall means fewer false negatives. Example: In disease diagnosis, high recall means that most patients with the disease are detected.
Predictive Power – F1 Score
F1-Score: The harmonic mean of Precision and Recall, balancing the two metrics. It is best used
when you need a balance between precision and recall, especially in imbalanced datasets.
$$ \text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$
Range: 0 to 1, where 1 indicates perfect precision and recall.
When to use which metric
• Accuracy: When classes are balanced.
• Precision: When the cost of false positives is high (e.g., spam filtering).
• Recall: When the cost of false negatives is high (e.g., cancer diagnosis).
• F1-Score: When you need a trade-off between precision and recall.
Example: Given a confusion matrix, calculate Accuracy, Precision, Recall and F1-Score.
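The confusion matrix for this exercise is not reproduced in the text, so the sketch below uses hypothetical counts (TP = 40, FP = 10, FN = 20, TN = 30) purely to show how the four formulas fit together. The function name `classification_metrics` is an illustrative choice.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, not the matrix from the original slide.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.700
# precision: 0.800
# recall: 0.667
# f1: 0.727
```

The sketch does not handle the edge cases where a denominator is zero (for example, a model that never predicts the positive class has undefined precision).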
Overfitting
• Overfitting occurs when a machine learning model learns the training data too well, including its noise and irrelevant details.
As a result, the model performs exceptionally well on the training data but poorly on unseen or test data. This happens
because the model becomes overly complex and fails to generalize to new data.
• Key Characteristics of Overfitting:
• High Training Accuracy, Low Test Accuracy: The model performs perfectly or nearly perfectly on the training set
but fails to predict accurately on the test set.
• Complex Models: Models with too many parameters or excessive complexity are more prone to overfitting.
• Causes of Overfitting:
• Insufficient Training Data: When the dataset is too small, the model may memorize individual examples instead of learning general patterns.
• High Model Complexity: Overly complex models (e.g., deep neural networks with too many layers) can overfit the
training data.
• Noise in Data: If the training data contains irrelevant features or random noise, the model might learn these instead
of the true underlying patterns.
• Insufficient Regularization: Without constraints on the model's parameters (e.g., a penalty on large weights), the model might over-optimize for the training data.
How to prevent Overfitting
• Use More Data: Increasing the size of the training dataset helps the model generalize better.
• Simplify the Model: Reduce the complexity of the model by using fewer parameters or layers.
• Regularization Techniques:
• L1 and L2 Regularization: Penalize large weights to discourage over-complex models.
• Dropout: Randomly drop neurons during training to prevent co-adaptation.
• Cross-Validation: Use techniques like k-fold cross-validation to evaluate the model on multiple held-out subsets of the data, so that overfitting is detected before the model is deployed.
• Early Stopping: Stop training the model when performance on a validation set stops improving.
• Data Augmentation: Increase dataset diversity by applying transformations like rotations, flips, or noise.
• Pruning Features: Remove irrelevant or redundant features from the dataset.
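To make the cross-validation item concrete, the sketch below shows how k-fold splitting partitions sample indices so that every sample is used for validation exactly once. The generator `k_fold_indices` is a hypothetical helper written for illustration; it does not shuffle, which a practical implementation usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

A model would be trained on each `train` split and scored on the matching `validation` split; averaging the k scores gives a performance estimate that depends less on one particular split.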
Underfitting
• Underfitting occurs when a machine learning model is too simple to capture the underlying patterns in
the data. As a result, the model performs poorly on both the training and test datasets. It fails to learn
the relationships in the data, leading to inaccurate predictions or classifications.
• Key Characteristics of Underfitting:
• Low Training and Test Accuracy: The model does not perform well even on the training data.
• Oversimplified Model: The model is too basic, lacking the complexity needed to represent the
data effectively.
• Poor Learning: The model misses important trends or patterns in the data.
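A small numeric illustration of these characteristics, under assumed conditions: the data follow a noise-free quadratic (y = x²), and the model is a straight line fitted by ordinary least squares. The data points and helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, predict):
    """Mean squared error of predict(x) against the known y values."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noise-free data from a quadratic relationship.
train_x = [-3, -2, -1, 0, 1, 2, 3]
test_x = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
train_y = [x ** 2 for x in train_x]
test_y = [x ** 2 for x in test_x]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


line_train_mse = mse(train_x, train_y, line)
line_test_mse = mse(test_x, test_y, line)
print(slope, intercept)               # the best line is flat: y = 4
print(line_train_mse, line_test_mse)  # large error on both sets
```

Because the data contain no noise, all of the error comes from the model being too simple: the line cannot represent the curve, so the error is large on the training set as well as on the test set.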
How to prevent Underfitting
• Increase Model Complexity:
• Use more advanced models (e.g., switch from linear regression to polynomial regression).
• Add more layers or neurons in neural networks.
• Train for Longer: Ensure the model is adequately trained to converge to an optimal solution.
• Add Relevant Features: Include additional informative features in the dataset.
• Reduce Regularization: Adjust regularization parameters to avoid over-constraining the model.
• Hyperparameter Tuning: Optimize hyperparameters like learning rate, depth, or number of trees in
decision tree-based models.
Evaluating Classification Methods
- Accuracy
• Classifier accuracy: how well the model predicts class labels
• Predictor accuracy: how well the model predicts the value of a numerical attribute
- Speed
• Time to construct the model (training time)
• Time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency on disk-resident databases
- Interpretability: understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Evaluating Regression Methods
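Calculate Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²).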
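The data for this exercise are not reproduced in the text, so the sketch below applies the standard definitions of the four metrics to hypothetical values. The function name `regression_metrics` and the sample numbers are assumptions made for illustration.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """MSE, MAE, RMSE and R^2 for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {"MSE": mse, "MAE": mae, "RMSE": sqrt(mse),
            "R2": 1 - ss_res / ss_tot}


# Hypothetical actual and predicted values.
metrics = regression_metrics(y_true=[3, 5, 7, 9], y_pred=[2, 5, 8, 9])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# MSE: 0.500
# MAE: 0.500
# RMSE: 0.707
# R2: 0.900
```

RMSE is in the same units as the target, which often makes it easier to interpret than MSE; R² is undefined when all actual values are identical, a case this sketch does not handle.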
Thank You
Pham Thi Viet Huong
0944817152
huongptv@[Link]