Machine Learning
Faculty: Dr Ashwini B
Classroom code: a5okuqw7
Disclaimer: The slides have been prepared/adapted partially from multiple publicly available resources (textbooks, presentations, notes, etc.).
Course Management
● Classroom link: [Link]
● Prerequisites:
○ Linear Algebra
○ Probability & Statistics
○ Advanced Calculus (Vector Differentiation)
○ Programming (Python)
● Textbook references:
○ Machine Learning, Tom Mitchell, McGraw Hill, 1997.
○ Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2006.
○ Pattern Classification, Duda, Hart and Stork, 2nd ed., Wiley, 2006.
● Course outline
● All communication will be through Google Classroom. Strictly no WhatsApp calls or
messages.
● Office hours:
○ Room: CSE 226/M
○ Wednesday: 10:30 am - 11:30 am (please book an appointment by email before coming)
● Assessment Criteria
1. Quiz-1: 10%
2. Quiz-2: 10%
3. Lab: 10%
4. Project/case study/assignment: 10%
5. Mid-Semester Examination: 20%
6. End-Semester Examination: 40%
● Tentative Grading Scheme (grade buckets are subject to change as per department policy)
Grade   A+     A        A-       B+       B        B-       C+       C        F
Score   >90%   86-90%   81-85%   71-80%   61-70%   51-60%   41-50%   30-40%   <30%
Rank%   3%     7%       15%      25%      25%      15%      7%       3%       NA
● Sample Grading Scheme for 110 students (grade buckets are subject to change as per department policy)
Grade   A+     A        A-       B+       B        B-       C+       C        F
Score   >90%   86-90%   81-85%   71-80%   61-70%   51-60%   41-50%   30-40%   <30%
Rank    3      8        17       28       27       16       8        3        NA
Student’s score   Student’s rank   Student’s grade
87                9                A
87                15               A-
Code of Conduct
● Zero tolerance will be enforced for any form of academic misconduct.
● Use of mobile phones is strictly prohibited during class hours.
● Academic integrity must be maintained at all times. All submissions will be checked for
plagiarism. Collaboration is good!!! Cheating is bad!!!
○ Discussions among students should be limited to understanding concepts only.
○ Copying code or content from peers, large language models (LLMs), or online resources is
considered plagiarism.
○ Any instance of plagiarism, if detected, will be penalized without further warning or discussion; a
zero will be awarded for the submission.
● Students are expected to arrive on time for all classes and laboratories.
○ Attendance will not be marked if a student arrives more than 5 minutes after the commencement
of the class/lab.
● The assignment submission deadline is 11:59 PM on the due date.
○ A late penalty of 10% per day will be applied for delayed submissions.
○ Submissions later than 3 days after the deadline will not be accepted.
Learning
● How do we learn?
● Compute 125 + 378.
● Sort one million numbers in increasing order.
○ Merge sort
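Both tasks above can be solved with explicitly programmed rules, with no learning from data involved. A minimal sketch in Python follows; the function name `merge_sort` and the sample list are chosen here for illustration only.

```python
def merge_sort(values):
    """Return a new list with the items of `values` in increasing order."""
    if len(values) <= 1:
        return list(values)
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])
    # Merge the two sorted halves by repeatedly taking the smaller head.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


total = 125 + 378                                # fixed arithmetic rule
sorted_sample = merge_sort([5, 2, 9, 1, 5, 6])   # fixed algorithm, small made-up input
```

Every step is specified in advance by the programmer; the same code handles one million numbers without ever seeing an example.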
Machine Learning
● Determining whether an email is spam or not.
○ Generating rules becomes tedious
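To see why hand-written rules become tedious, consider the sketch below. The keyword list and the example messages are hypothetical; the point is only that each new variation of spam needs yet another rule.

```python
SPAM_KEYWORDS = {"free", "win", "prize"}  # hypothetical hand-picked rule list


def is_spam_by_rules(text):
    """Flag a message as spam if any word matches a hand-written keyword."""
    words = text.lower().split()
    return any(word.strip(".,!?") in SPAM_KEYWORDS for word in words)


caught = is_spam_by_rules("Win a FREE phone now!")                        # rule fires
missed = is_spam_by_rules("W1n a fr3e phone now!")                        # obfuscated spelling slips through
false_alarm = is_spam_by_rules("Feel free to ask questions after class.")  # harmless mail gets flagged
```

A learned classifier instead infers such patterns from labelled emails, which is the motivation for the machine learning approach.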
Machine Learning Definition
● Arthur Samuel (1959):
○ Field of study that gives computers the ability to learn without being explicitly
programmed.
● Tom Mitchell (1997):
○ A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P if its performance at tasks in T, as measured by
P, improves with experience E.
Inductive Learning
Machine Learning Definition
● The task could be
○ Classification or Pattern Recognition
○ Regression or Prediction
○ Clustering
○ Synthesis or Sampling
○ Ranking
○ Recommendation Systems
○ Anomaly Detection
○ Data Mining etc.
● Performance: a quantitative measure of how well the task is carried out
○ Accuracy, error measure
● Experience
○ Supervised learning
○ Unsupervised learning
○ Reinforcement learning
Machine Learning Paradigms
● Supervised learning
○ Learns association between input and output
○ You need data with input and output to learn
the mapping
○ Categorical output: Classification (response is
qualitative)
○ Continuous output: Regression (response is
quantitative)
● Unsupervised learning
○ Discovers patterns in the data, no desired
output
○ No need for output, only input
○ Clustering: cohesive grouping
○ Association: frequent co-occurrence
● Reinforcement learning
○ No labelled examples
○ Feedback comes from a reward function
○ Learning control (which action to take)
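The difference between the first two paradigms can be sketched on one tiny dataset. The study-hours data, the nearest-centroid classifier and the two-cluster routine below are illustrative assumptions, not methods prescribed by the course.

```python
from statistics import mean

hours = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]                      # input (made-up study hours)
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]   # output, used only when supervised

# Supervised: inputs AND outputs are available, so we learn one centroid per class.
centroids = {
    c: mean(h for h, y in zip(hours, labels) if y == c) for c in set(labels)
}


def classify(x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


# Unsupervised: only inputs are available, so we look for cohesive groups.
def two_means(xs, iters=10):
    """Split 1-D data into two clusters (assumes both clusters stay non-empty)."""
    a, b = min(xs), max(xs)
    for _ in range(iters):
        group_a = [x for x in xs if abs(x - a) <= abs(x - b)]
        group_b = [x for x in xs if abs(x - a) > abs(x - b)]
        a, b = mean(group_a), mean(group_b)
    return group_a, group_b


clusters = two_means(hours)
```

The classifier returns a named class because it was shown labels; the clustering returns two unnamed groups because it was not.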
Performance Measures
ML Task                  Measure
Classification           Error
Regression               Error
Clustering               Scatter/purity
Association              Support/confidence
Reinforcement learning   Cost/reward
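Some of the measures in the table can be computed in a few lines. The sketch below assumes misclassification rate for classification error and mean squared error for regression error (other error measures exist), and uses made-up data throughout.

```python
def error_rate(y_true, y_pred):
    """Classification error: fraction of predictions that are wrong."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Regression error: average squared difference (one common choice)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def support(transactions, items):
    """Association: fraction of transactions containing all the given items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Association: support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


cls_error = error_rate(["spam", "ham", "spam", "ham"],
                       ["spam", "spam", "spam", "ham"])
reg_error = mean_squared_error([3.0, 5.0], [2.0, 7.0])

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
sup = support(baskets, {"bread", "milk"})
conf = confidence(baskets, {"bread"}, {"milk"})
```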
Basic ML pipeline
Data
● Data refers to the set of observations or measurements used to train a machine learning model.
● Gathering data is the most important step.
● Your classifier can only be as good as the dataset it is built from.
● Example
○ Spam Email Detection
■ Email text, Subject line, Sender address, Frequency of certain words (e.g., “free”, “win”), Label: spam / not spam
○ Student Performance Prediction
■ Attendance percentage, Assignment scores, Midterm exam marks, Study hours, Label: pass / fail
○ House Price Prediction
■ Area (in square feet), Number of bedrooms, Location, Age of the house, Past sale prices (target value)
Data Preprocessing
● Data preprocessing is the task of cleaning and transforming raw data to make it suitable for analysis and
modeling.
● Preprocessing steps include
○ data cleaning,
○ data normalization, and
○ data transformation
● The goal of data preprocessing is to improve both the accuracy and efficiency of downstream analysis and
modeling.
● Data preprocessing techniques can be grouped into three main categories:
○ data cleaning,
○ data transformation, and
○ structural operations.
● Data Cleaning: the process of addressing anomalies in the data set using techniques such as:
■ Managing outliers: Identifying and then removing outliers, or replacing them with statistically
estimated values
■ Filling missing data: Identifying missing or invalid data points and replacing them with interpolated
values
■ Smoothing: Filtering out noise using techniques such as moving mean, linear regression, and more
specialized filtering methods
● Data Transformation: the process of modifying a data set into a preferred format by using operations such as:
■ Normalization and rescaling: Standardizing data sets with different scales into a uniform scale
● Structural Operations: used for combining, reorganizing, and categorizing data sets
○ Data integration: Combining the data into a unified dataset
○ Joining: Combining two tables by rows using a common key variable
○ Stacking and unstacking: Reshaping multidimensional arrays to consolidate or redistribute data within the
table, making analysis easier
○ Grouping and binning: Reorganizing the data set to extract valuable insights
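A few of the operations above can be sketched with plain Python lists and dictionaries. All function names and sample values here are hypothetical, and the choices made (linear interpolation, trailing moving mean, min-max rescaling, inner join) are only one option each among several.

```python
def fill_missing(values):
    """Data cleaning: replace None by linear interpolation between known neighbours.

    Assumes at least one value is known; gaps at the ends copy the nearest value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for i, v in enumerate(filled):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                filled[i] = filled[right]
            elif right is None:
                filled[i] = filled[left]
            else:
                w = (i - left) / (right - left)
                filled[i] = filled[left] + w * (filled[right] - filled[left])
    return filled


def moving_mean(values, window=3):
    """Smoothing: average each value with up to window-1 preceding values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def min_max_scale(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def join_on_key(left_rows, right_rows, key):
    """Joining: combine rows of two tables that share the same key value."""
    right_by_key = {row[key]: row for row in right_rows}
    return [{**row, **right_by_key[row[key]]}
            for row in left_rows if row[key] in right_by_key]


raw = [1.0, None, 3.0, 4.0, None]
filled = fill_missing(raw)
smoothed = moving_mean(filled, window=2)
scaled = min_max_scale(filled)

attendance = [{"student": "s1", "attendance": 90}, {"student": "s2", "attendance": 60}]
marks = [{"student": "s1", "midterm": 42}]
joined = join_on_key(attendance, marks, "student")
```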
Feature Engineering
● Iterative process of turning raw data into features to be used by machine learning models
● It involves
○ Feature extraction: turns raw data into information suitable for machine learning algorithms
■ Features can be thought of as the attributes of a data object.
■ A "feature" refers to an individual measurable property or characteristic of a data point: a specific attribute of the
data that helps describe the phenomenon being observed.
■ For example,
● A dataset about housing might have features such as “number of bedrooms” and “year of construction.”
● In a dataset of animals, you would expect some numerical features (age, height, weight) and categorical
features (color, species, breed).
■ The feature extractor transforms the input data into a numerical representation on which dimensionality
reduction methods for feature extraction can be computed.
■ This step can be manual, leveraging domain knowledge for specific data types like images, signals, and text, or
automated through algorithms or deep learning networks.
○ Feature selection: a dimensionality reduction technique that selects a subset of features (predictor variables)
providing the best predictive power for modeling.
■ Concerned with choosing the features to use for the model
○ Feature transformation: changes existing features into new features (predictor variables) while dropping less descriptive
ones.
■ After extraction, it is sometimes necessary to standardize the data using feature normalization, especially when
using certain algorithms that are sensitive to the magnitude and scale of variables.
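Feature normalization and feature selection can be sketched as follows. The housing numbers and the `house_id` column are made up, z-score standardization is one of several normalization schemes, and ranking features by absolute Pearson correlation with the target is only a simple stand-in for "predictive power".

```python
from statistics import mean, pstdev


def standardize(column):
    """Feature normalization: shift to zero mean and scale to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / norm if norm else 0.0


def select_features(features, target, k):
    """Feature selection: keep the k features most correlated with the target."""
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:k]


features = {
    "area": [800, 1000, 1200, 1500],   # square feet
    "bedrooms": [2, 2, 3, 3],
    "house_id": [4, 1, 3, 2],          # hypothetical column with little predictive value
}
price = [40, 50, 60, 75]               # made-up target values

area_z = standardize(features["area"])
selected = select_features(features, price, k=2)
```

After standardization, `area` and `bedrooms` live on comparable scales, which matters for distance-based models such as k-NN.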
Model Selection and Training
● The model development process involves:
○ Model selection: the process of choosing the type of model that is most likely to deliver top
performance in the intended use case.
○ Hyperparameter tuning: Model hyperparameters are external variables that control the model’s behavior during
training; tuning searches for the values that give the best performance.
○ Model training: Model training is the process of optimizing a model’s performance with training datasets that are
similar to the input data the model processes once deployed.
○ Model evaluation: After the model is deemed to be trained, its performance is evaluated before deployment.
● Divide the dataset into training, validation and testing sets.
● Choose the best algorithm for the type of problem (classification, regression, clustering, etc.).
● Train the model using the training dataset.
● Validation: estimates the model’s prediction error: how good is it at making the correct predictions? During training, the
machine learning algorithm often outputs multiple models with various hyperparameter configurations.
○ Validation identifies the model with the optimal hyperparameter configuration.
● Testing: simulates real-world values to evaluate the best-performing model’s generalization error: how well does the model
adapt to new unseen data?
○ Test data is independent of training data and benchmarks the model’s performance after training is complete.
○ Tests reveal whether the model will perform as intended after it is deployed.
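The roles of the three sets can be sketched with a small k-NN classifier on synthetic one-dimensional data. The 70/15/15 fractions, the candidate values of k, and the data itself are assumptions made for this example.

```python
import random
from collections import Counter


def split_dataset(rows, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the rows and split them into training, validation and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def knn_predict(train, x, k):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of points in `data` that k-NN (fitted on `train`) gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


# Synthetic data: (study hours, label), drawn from two overlapping groups.
rng = random.Random(1)
rows = ([(rng.gauss(2.0, 1.0), "fail") for _ in range(50)]
        + [(rng.gauss(6.0, 1.0), "pass") for _ in range(50)])

train, val, test = split_dataset(rows)

# Validation: compare hyperparameter configurations (here, the value of k).
val_errors = {k: error_rate(train, val, k) for k in (1, 3, 5, 7)}
best_k = min(val_errors, key=val_errors.get)

# Testing: estimate the generalization error of the chosen model once, at the end.
test_error = error_rate(train, test, best_k)
```

The test set is touched only after `best_k` is fixed, so `test_error` is not biased by the tuning step.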
ML Pipeline: Example
Problem: Given accelerometer readings from a mobile phone, predict the activity being performed by a person.
Activities (Classes): Walking, Sitting, Standing, Running
Step 1: Data Gathering
● Data Source: Accelerometer sensor in a smartphone
● Measures acceleration along three axes: x (left–right), y (forward–backward), z (up–down)
Time   AccX   AccY   AccZ   Activity
t1     0.12   0.98   9.60   Walking
t2     0.15   1.05   9.55   Walking
t3     0.01   0.02   9.80   Standing
t4     1.20   2.10   10.5   Running
Step 2: Data Preprocessing
● Remove sensor noise using smoothing (e.g., moving average)
● Handle missing or corrupted readings
● Normalize accelerometer values
● Segment continuous data into fixed-size time windows (e.g., 2 seconds)
After Windowing: Each data point becomes a window instead of a single reading.
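Windowing can be sketched as slicing the stream of readings into fixed-length chunks. The sampling rate below is a hypothetical value kept tiny so the example is easy to follow; a real sensor samples far more often, and the optional overlap via `step` is an assumption not stated above.

```python
def segment_windows(samples, window_size, step=None):
    """Split a sequence of readings into fixed-size windows.

    step == window_size gives non-overlapping windows; a smaller step gives overlap.
    Trailing readings that do not fill a whole window are dropped.
    """
    step = step or window_size
    return [samples[start:start + window_size]
            for start in range(0, len(samples) - window_size + 1, step)]


SAMPLING_RATE_HZ = 2                            # hypothetical, for illustration only
WINDOW_SECONDS = 2
window_size = WINDOW_SECONDS * SAMPLING_RATE_HZ  # 4 readings per window

# Ten made-up (AccX, AccY, AccZ) readings.
samples = [(0.1 * i, 0.0, 9.8) for i in range(10)]

windows = segment_windows(samples, window_size)
overlapping = segment_windows(samples, window_size, step=2)
```

Each element of `windows` is now one data point for the later steps.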
Step 3: Feature Engineering
● Features extracted from each window:
○ Mean acceleration (x, y, z)
○ Standard deviation (x, y, z)
○ Signal magnitude area (SMA)
○ Energy of the signal
MeanX   MeanY   MeanZ   StdX   StdY   StdZ   SMA    Activity
0.14    1.01    9.58    0.02   0.04   0.05   11.2   Walking
0.02    0.03    9.81    0.01   0.01   0.02   9.9    Standing
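The window features can be computed as in the sketch below. The formulas for SMA (mean of |x| + |y| + |z|) and energy (mean of x² + y² + z²) are common conventions assumed here, since definitions vary; the two-reading window reuses the walking rows from Step 1 and is far shorter than a real window, so the values are not expected to reproduce the table.

```python
from statistics import mean, pstdev


def window_features(window):
    """Turn one window of (x, y, z) readings into a dictionary of features."""
    xs, ys, zs = zip(*window)
    n = len(window)
    return {
        "MeanX": mean(xs), "MeanY": mean(ys), "MeanZ": mean(zs),
        "StdX": pstdev(xs), "StdY": pstdev(ys), "StdZ": pstdev(zs),
        # Assumed definition: average of the summed absolute accelerations.
        "SMA": sum(abs(x) + abs(y) + abs(z) for x, y, z in window) / n,
        # Assumed definition: average squared magnitude of the signal.
        "Energy": sum(x * x + y * y + z * z for x, y, z in window) / n,
    }


walking_window = [(0.12, 0.98, 9.60), (0.15, 1.05, 9.55)]
feats = window_features(walking_window)
```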
Step 4: Model Training
● Model Chosen
○ Random Forest / SVM / k-NN (commonly used for human activity recognition, HAR)
● Training
○ Split data:
■ 70% Training
■ 15% Validation
■ 15% Testing
● The model learns patterns linking motion features to activities.
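As one possible instance of this step, the sketch below uses k-NN, which simply stores the training windows and predicts by majority vote among the nearest ones. Only the first Walking row and the first Standing row come from the Step 3 table; the remaining rows and the two query windows are made up for illustration.

```python
import math
from collections import Counter

# Each row: ((MeanX, MeanY, MeanZ, StdX, StdY, StdZ, SMA), activity)
train_rows = [
    ((0.14, 1.01, 9.58, 0.02, 0.04, 0.05, 11.2), "Walking"),   # from the table
    ((0.13, 0.99, 9.60, 0.03, 0.05, 0.04, 11.0), "Walking"),   # hypothetical
    ((0.16, 1.03, 9.57, 0.02, 0.04, 0.06, 11.3), "Walking"),   # hypothetical
    ((0.02, 0.03, 9.81, 0.01, 0.01, 0.02, 9.9), "Standing"),   # from the table
    ((0.01, 0.02, 9.80, 0.01, 0.01, 0.01, 9.8), "Standing"),   # hypothetical
    ((0.03, 0.04, 9.79, 0.01, 0.02, 0.02, 9.9), "Standing"),   # hypothetical
]


def knn_predict(rows, features, k=3):
    """Predict the majority activity among the k nearest training windows."""
    by_distance = sorted(rows, key=lambda row: math.dist(row[0], features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


pred_moving = knn_predict(train_rows, (0.15, 1.00, 9.59, 0.02, 0.04, 0.05, 11.1))
pred_still = knn_predict(train_rows, (0.02, 0.02, 9.80, 0.01, 0.01, 0.02, 9.85))
```

In practice the features would be normalized first, since SMA has a much larger scale than the standard deviations and would otherwise dominate the distance.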
Step 5: Validation
● Purpose:
○ Tune hyperparameters (e.g., number of trees, kernel type)
○ Detect overfitting
● Validation Metrics
○ Accuracy
○ Confusion matrix
○ Precision and recall per activity
● Example:
○ Validation Accuracy: 88%
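The validation metrics listed above can be computed from true and predicted labels as sketched here. The eight label pairs are made up, so the resulting accuracy is unrelated to the 88% quoted in the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of windows whose predicted activity matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred):
    """Count how often each (true activity, predicted activity) pair occurs."""
    return Counter(zip(y_true, y_pred))


def precision_recall(cm, label):
    """Per-activity precision and recall read off the confusion matrix."""
    tp = cm[(label, label)]
    predicted = sum(c for (t, p), c in cm.items() if p == label)
    actual = sum(c for (t, p), c in cm.items() if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


y_true = ["Walking", "Walking", "Walking", "Standing", "Standing",
          "Running", "Running", "Running"]
y_pred = ["Walking", "Walking", "Running", "Standing", "Standing",
          "Running", "Running", "Walking"]

val_accuracy = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
walking_pr = precision_recall(cm, "Walking")
standing_pr = precision_recall(cm, "Standing")
```

Here Walking and Running are confused with each other while Standing is always correct, a detail that accuracy alone would hide.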
Step 6: Testing
● Final Evaluation: Test the trained model on unseen accelerometer windows.
Challenges
● How to choose a model?
● How good is a model?
● How much data is required?
● Is the data of sufficient quality?
● How confident can I be in the results?
● Have I described/interpreted the data correctly?