Introduction To ML

The document outlines a Machine Learning course taught by Dr. Ashwini B, detailing prerequisites, assessment criteria, and a grading scheme. It covers fundamental concepts of machine learning, including definitions, paradigms, data preprocessing, feature engineering, model selection, and training processes. Additionally, it emphasizes academic integrity, communication protocols, and the importance of data quality and quantity in machine learning projects.

Machine Learning

Faculty: Dr Ashwini B
Classroom code: a5okuqw7

Disclaimer: The slides have been prepared/adapted partially from multiple publicly available resources (textbooks, presentations, notes, etc.).
Course Management
● Classroom link: [Link]
● Prerequisites:
○ Linear Algebra
○ Probability & Statistics
○ Advanced Calculus (Vector Differentiation)
○ Programming (Python)
● Textbook references:
○ Machine Learning, Tom Mitchell, McGraw Hill, 1997.
○ Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer, 2006
○ Pattern Classification. Duda, Hart and Stork. 2nd ed., Wiley, 2006
● Course outline
● All communication will be through Google Classroom. Strictly no WhatsApp calls or messages.
● Office hour:
○ Room: CSE 226/M
○ Wednesday: 10:30 am - 11:30 am (please fix an appointment by email before coming)
Course Management
● Assessment Criteria
1. Quiz-1: 10%
2. Quiz-2: 10%
3. Lab: 10%
4. Project/case study/assignment: 10%
5. Mid-Semester Examination: 20%
6. End-Semester Examination: 40%
● Tentative Grading Scheme (grade buckets are subject to change as per department policy)

Grade   A+     A        A-       B+       B        B-       C+       C        F
Score   >90%   86-90%   81-85%   71-80%   61-70%   51-60%   41-50%   30-40%   <30%
Rank%   3%     7%       15%      25%      25%      15%      7%       3%       NA

Course Management
● Sample Grading Scheme for 110 students (grade buckets are subject to change as per department policy)

Grade                      A+     A        A-       B+       B        B-       C+       C        F
Score                      >90%   86-90%   81-85%   71-80%   61-70%   51-60%   41-50%   30-40%   <30%
Rank (no. of students)     3      8        17       28       27       16       8        3        NA

Student's score   Student's rank   Student's grade
87                9                A
87                15               A-
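The two sample rows show that the same score (87) can map to different grades depending on rank. A minimal sketch of that lookup follows, assuming the grade is decided by rank alone using the per-grade student counts from the sample table. The helper name `grade_for_rank` is hypothetical, and the score cut-offs and the F bucket are deliberately ignored here.

```python
from bisect import bisect_left
from itertools import accumulate

# Grade labels and per-grade student counts copied from the sample table (110 students).
GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C"]
COUNTS = [3, 8, 17, 28, 27, 16, 8, 3]

# Last rank that still falls in each bucket: [3, 11, 28, 56, 83, 99, 107, 110].
LAST_RANK = list(accumulate(COUNTS))


def grade_for_rank(rank: int) -> str:
    """Return the grade bucket for a 1-based class rank (hypothetical helper)."""
    if not 1 <= rank <= LAST_RANK[-1]:
        raise ValueError(f"rank must be between 1 and {LAST_RANK[-1]}")
    # bisect_left finds the first bucket whose last rank is >= the given rank.
    return GRADES[bisect_left(LAST_RANK, rank)]


print(grade_for_rank(9))   # A
print(grade_for_rank(15))  # A-
```

With these counts, ranks 4 to 11 fall in the A bucket and ranks 12 to 28 in the A- bucket, which matches the two rows in the table.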
Code of Conduct
● Zero tolerance will be enforced for any form of academic misconduct.
● Use of mobile phones is strictly prohibited during class hours.
● Academic integrity must be maintained at all times. All submissions will be checked for
plagiarism. Collaboration is good!!! Cheating is bad!!!
○ Discussions among students should be limited to understanding concepts only.
○ Copying code or content from peers, large language models (LLMs), or online resources is
considered plagiarism.
○ Any instance of plagiarism, if detected, will be penalized without further warning or discussion; a zero will be awarded for the submission.
● Students are expected to arrive on time for all classes and laboratories.
○ Attendance will not be marked if a student arrives more than 5 minutes after the commencement
of the class/lab.
● The assignment submission deadline is 11:59 PM on the due date.
○ A late penalty of 10% per day will be applied for delayed submissions.
○ Submissions later than 3 days after the deadline will not be accepted.
Learning
● How do we learn?
Learning
● Compute 125 + 378.
● Sorting one million numbers in increasing order.
○ Merge sort
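Sorting is a task where an explicit procedure can be written down once and then works for any input. As an illustration, here is a minimal merge sort sketch. It is a textbook-style version written for readability, not an optimized implementation.

```python
def merge_sort(values):
    """Return a new list with the items of `values` in increasing order."""
    if len(values) <= 1:
        return list(values)

    # Split in half and sort each half recursively.
    mid = len(values) // 2
    left = merge_sort(values[:mid])
    right = merge_sort(values[mid:])

    # Merge the two sorted halves by repeatedly taking the smaller head.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```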
Machine Learning
● Determining whether an email is spam or not.
○ Generating rules becomes tedious
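To see why hand-written rules become tedious, consider a toy keyword filter. This is only a sketch: the keyword list, the threshold of two hits, and the function name `looks_like_spam` are all assumptions made for illustration.

```python
# Hypothetical keyword list; a real rule set would need constant maintenance.
SPAM_WORDS = {"free", "win", "prize"}


def looks_like_spam(subject: str) -> bool:
    """Flag a subject line that contains at least two spam keywords."""
    words = [w.strip(".,!?") for w in subject.lower().split()]
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return hits >= 2  # arbitrary threshold chosen for this sketch


print(looks_like_spam("Win a FREE prize now!"))       # True
print(looks_like_spam("Free parking at the office"))  # False
print(looks_like_spam("W1n a fr33 pr1ze"))            # False
```

The third subject line slips through because the spelling was altered. Every such variation would need another rule, which is the motivation for learning the rule from labeled examples instead.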
Machine Learning Definition
● Arthur Samuel (1959):
○ Field of study that gives computers the ability to learn without being explicitly
programmed.
● Tom Mitchell (1997):
○ A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P if its performance at tasks in T, as measured by
P, improves with experience E.

Inductive Learning
Machine Learning Definition
● The task could be
○ Classification or Pattern Recognition
○ Regression or Prediction
○ Clustering
○ Synthesis or Sampling
○ Ranking
○ Recommendation Systems
○ Anomaly Detection
○ Data Mining etc.
● Performance: A quantitative measure to evaluate performance
○ Accuracy, error measure
● Experience
○ Supervised learning
○ Unsupervised learning
○ Reinforcement learning
Machine Learning Paradigms
● Supervised learning
○ Learns association between input and output
○ You need data with input and output to learn the mapping
○ Categorical output: Classification (response is qualitative)
○ Continuous output: Regression (response is quantitative)
● Unsupervised learning
○ Discovers patterns in the data, no desired output
○ No need for output, only input
○ Clustering: cohesive grouping
○ Association: frequent co-occurrence
● Reinforcement learning
○ No examples
○ Reward functions
○ Learning control
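The difference between the first two paradigms can be sketched on a tiny one-dimensional dataset. The example below is an illustration under assumptions: the data (study hours with pass/fail labels) is made up, the supervised learner is a nearest-centroid classifier, and the unsupervised learner is a two-cluster k-means. Neither algorithm is prescribed by the slides.

```python
from statistics import mean

# Made-up toy data: study hours (input) and outcome (output label).
xs = [1, 2, 3, 8, 9, 10]
ys = ["fail", "fail", "fail", "pass", "pass", "pass"]


def fit_centroids(inputs, labels):
    """Supervised: average the inputs that share each label."""
    groups = {}
    for x, y in zip(inputs, labels):
        groups.setdefault(y, []).append(x)
    return {label: mean(vals) for label, vals in groups.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(inputs, iterations=10):
    """Unsupervised: split 1-D points into two clusters, no labels used.

    Assumes the inputs contain at least two distinct values.
    """
    c0, c1 = min(inputs), max(inputs)
    for _ in range(iterations):
        g0 = [x for x in inputs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in inputs if abs(x - c0) > abs(x - c1)]
        c0, c1 = mean(g0), mean(g1)
    return sorted(g0), sorted(g1)


print(predict(fit_centroids(xs, ys), 7))  # pass
print(two_means(xs))                      # ([1, 2, 3], [8, 9, 10])
```

The supervised function needs both `xs` and `ys`; the clustering function sees only `xs` and still recovers the same two groups, but it cannot name them.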
Performance Measures

ML Task                  Measure
Classification           Error
Regression               Error
Clustering               Scatter/purity
Association              Support/confidence
Reinforcement learning   Cost/reward
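A few of these measures can be written out in a handful of lines. The table only says "Error", so the specific choices below (misclassification rate for classification, mean squared error for regression) are assumptions; they are common options, not the only ones. The function names and sample data are made up for illustration.

```python
def error_rate(y_true, y_pred):
    """Classification error: fraction of predictions that are wrong."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Regression error: average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def support_and_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n_both = sum(1 for t in transactions if antecedent in t and consequent in t)
    n_ante = sum(1 for t in transactions if antecedent in t)
    return n_both / len(transactions), n_both / n_ante


print(error_rate(["spam", "ham", "spam", "ham"],
                 ["spam", "spam", "spam", "ham"]))   # 0.25
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))    # 2.5

baskets = [{"bread", "milk"}, {"bread"}, {"milk"}, {"bread", "milk", "eggs"}]
print(support_and_confidence(baskets, "bread", "milk"))
```

For the rule bread -> milk, two of the four baskets contain both items (support 0.5) and two of the three baskets with bread also contain milk (confidence 2/3).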

Basic ML pipeline
Data

● Data refers to the set of observations or measurements used to train a machine learning model.
● Gathering data is the most important step.
● Your classifier can only be as good as the dataset it is built from.
● Example
○ Spam Email Detection
■ Email text, Subject line, Sender address, Frequency of certain words (e.g., “free”, “win”), Label: spam / not spam
○ Student Performance Prediction
■ Attendance percentage, Assignment scores, Midterm exam marks, Study hours, Label: pass / fail
○ House Price Prediction
■ Area (in square feet), Number of bedrooms, Location, Age of the house, Past sale prices (target value)
Data Preprocessing

● Data preprocessing is the task of cleaning and transforming raw data to make it suitable for analysis and
modeling.
● Preprocessing steps include
○ data cleaning,
○ data normalization, and
○ data transformation
● The goal of data preprocessing is to improve both the accuracy and efficiency of downstream analysis and
modeling.
● Data preprocessing techniques can be grouped into three main categories:
○ data cleaning,
○ data transformation, and
○ structural operations.
Data Preprocessing
● Data Cleaning: the process of addressing anomalies in the data set using techniques such as:
○ Managing outliers: identifying and then removing outliers, or replacing them with statistically estimated values
○ Filling missing data: identifying missing or invalid data points and replacing them with interpolated values
○ Smoothing: filtering out noise using techniques such as moving mean, linear regression, and more specialized filtering methods
● Data Transformation: the process of modifying a data set into a preferred format by using operations such as:
○ Normalization and rescaling: standardizing data sets with different scales into a uniform scale
● Structural Operations: used for combining, reorganizing, and categorizing data sets:
○ Data integration: combining the data into a unified dataset
○ Joining: combining two tables or timetables by rows using a common key variable
○ Stacking and unstacking: reshaping multidimensional arrays to consolidate or redistribute data within the table, making it easier to analyze
○ Grouping and binning: reorganizing the data set to extract valuable insights
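Three of the operations listed above can be sketched with the standard library alone. The specific methods are assumptions chosen for brevity: linear interpolation for missing values, a median-based rule for outliers (with an arbitrary threshold of 3), and min-max rescaling to the range 0 to 1. All function names are hypothetical.

```python
from statistics import median


def fill_missing(values):
    """Replace None entries by linear interpolation between known neighbours.

    Gaps at either end are filled with the nearest known value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            lo, hi = before[-1], after[0]
            frac = (i - lo) / (hi - lo)
            filled[i] = values[lo] + frac * (values[hi] - values[lo])
        elif before:
            filled[i] = values[before[-1]]
        elif after:
            filled[i] = values[after[0]]
    return filled


def replace_outliers(values, threshold=3.0):
    """Replace points far from the median (in MAD units) by the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        return list(values)
    return [med if abs(v - med) / mad > threshold else v for v in values]


def min_max_rescale(values):
    """Rescale values linearly so the smallest is 0.0 and the largest is 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(fill_missing([1.0, None, 3.0]))                 # [1.0, 2.0, 3.0]
print(replace_outliers([9.6, 9.5, 9.8, 9.7, 50.0]))   # [9.6, 9.5, 9.8, 9.7, 9.7]
print(min_max_rescale([10, 20, 30]))                  # [0.0, 0.5, 1.0]
```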
Feature Engineering
● Iterative process of turning raw data into features to be used by machine learning algorithms
● It involves
○ Feature extraction: turns raw data into information suitable for machine learning algorithms.
■ Features can be thought of as the attributes of a data object.
■ A "feature" refers to an individual measurable property or characteristic of a data point: a specific attribute of the
data that helps describe the phenomenon being observed.
■ For example,
● A dataset about housing might have features such as “number of bedrooms” and “year of construction.”
● In a dataset of animals, you would expect some numerical features (age, height, weight) and categorical
features (color, species, breed).
■ The feature extractor takes the input data and transforms it into a numerical representation; dimensionality
reduction methods can then be applied to that representation.
■ This step can be manual, leveraging domain knowledge for specific data types like images, signals, and text, or
automated through algorithms or deep learning networks.
○ Feature selection: a dimensionality reduction technique that selects a subset of features (predictor variables)
providing the best predictive power for modeling.
■ It is concerned with choosing which features to use for the model.
○ Feature transformation: changes existing features into new features (predictor variables) while dropping less descriptive
ones.
■ After extraction, it is sometimes necessary to standardize the data using feature normalization, especially when
using algorithms that are sensitive to the magnitude and scale of variables.
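Two common feature transformations can be sketched as follows: z-score standardization for a numerical feature and one-hot encoding for a categorical feature. Both are assumptions about what "feature normalization" and a "numerical representation" might look like in practice; the function names and sample values are made up.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector, one position per distinct level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


weights_kg = [2.0, 4.0, 6.0]        # numerical feature
species = ["dog", "cat", "dog"]     # categorical feature

print(standardize(weights_kg))
print(one_hot(species))             # [[0, 1], [1, 0], [0, 1]]
```

After standardization the feature has mean 0 and standard deviation 1, so features measured in different units contribute on a comparable scale to distance-based algorithms.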
Model Selection and Training
● The model development process involves:
○ Model selection : Model selection is the process of choosing the type of model that is most likely to deliver top
performance in the intended use case.
○ Hyperparameter tuning: Model hyperparameters are external variables that control the model’s behavior during
training.
○ Model training: Model training is the process of optimizing a model’s performance with training datasets that are
similar to the input data the model processes once deployed.
○ Model evaluation: After the model is deemed to be trained its performance is evaluated before deployment.
● Divide the dataset into training, validation and testing sets
● Choose the algorithm best suited to the problem type (classification, regression, clustering, etc.).
● Train the model using the training dataset.
● Validation: estimates the model’s prediction error: how good is it at making the correct predictions? During training, the
machine learning algorithm often outputs multiple models with various hyperparameter configurations.
○ Validation identifies the model with the optimal hyperparameter configuration.
● Testing: simulates real-world values to evaluate the best-performing model’s generalization error: how well does the model
adapt to new unseen data?
○ Test data is independent of training data and benchmarks the model’s performance after training is complete.
○ Tests reveal whether the model will perform as intended after it is deployed.
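The roles of the three sets can be sketched with a small k-nearest-neighbour classifier on one-dimensional data. Everything here is an assumption made for illustration: the data is made up, k-NN is just one possible model, and the candidate values of k are arbitrary. The validation set picks the hyperparameter; the test set is touched only once at the end.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_error(train, data, k):
    """Fraction of points in `data` that k-NN (fit on `train`) gets wrong."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


def select_k(train, validation, candidates):
    """Pick the candidate k with the lowest validation error."""
    return min(candidates, key=lambda k: knn_error(train, validation, k))


# Made-up data; the point at 3.0 is deliberately mislabeled noise.
train = [(1.0, "sit"), (2.0, "sit"), (3.0, "walk"), (4.0, "sit"),
         (6.0, "walk"), (7.0, "walk"), (8.0, "walk")]
validation = [(2.9, "sit"), (3.4, "sit"), (6.5, "walk")]
test_set = [(1.5, "sit"), (7.5, "walk")]

best_k = select_k(train, validation, [1, 3, 5])
print(best_k)                                # 3
print(knn_error(train, test_set, best_k))    # 0.0
```

With k = 1 the noisy training point misleads two of the three validation examples, so validation rejects that configuration in favour of k = 3.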
ML Pipeline: Example
Problem: Given accelerometer readings from a mobile phone, predict the activity being performed by a person.

Activities (Classes): Walking, Sitting, Standing, Running

Step 1: Data Gathering

● Data Source: Accelerometer sensor in a smartphone

● Measures acceleration along three axes: x (left–right), y (forward–backward), z (up–down)

Time   AccX   AccY   AccZ    Activity
t1     0.12   0.98   9.60    Walking
t2     0.15   1.05   9.55    Walking
t3     0.01   0.02   9.80    Standing
t4     1.20   2.10   10.5    Running

ML Pipeline: Example
Step 2: Data Preprocessing

● Remove sensor noise using smoothing (e.g., moving average)

● Handle missing or corrupted readings
● Normalize accelerometer values
● Segment continuous data into fixed-size time windows (e.g., 2 seconds)

After Windowing: Each data point becomes a window instead of a single reading.
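Smoothing and windowing for a single axis can be sketched as below. The choices are assumptions: a trailing moving average, non-overlapping windows by default, and window sizes given in samples rather than seconds (a 2-second window corresponds to 2 × the sampling rate, which the slides do not state).

```python
def moving_average(signal, width=3):
    """Smooth a signal with a trailing moving average of `width` samples.

    The first few outputs average over fewer samples, so no data is dropped.
    """
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - width + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def segment(signal, window_size, step=None):
    """Cut a signal into fixed-size windows; an incomplete tail is dropped."""
    step = step or window_size  # default: windows do not overlap
    return [signal[i:i + window_size]
            for i in range(0, len(signal) - window_size + 1, step)]


print(moving_average([1.0, 2.0, 3.0, 4.0], width=2))  # [1.0, 1.5, 2.5, 3.5]
print(segment(list(range(10)), window_size=4))        # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Passing a `step` smaller than `window_size` produces overlapping windows, which is a common variation when more training examples are needed.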
ML Pipeline: Example
Step 3: Feature Engineering

● Features Extracted from Each Window

● Mean acceleration (x, y, z)
● Standard deviation (x, y, z)
● Signal magnitude area (SMA)
● Energy of the signal

MeanX   MeanY   MeanZ   StdX   StdY   StdZ   SMA    Activity
0.14    1.01    9.58    0.02   0.04   0.05   11.2   Walking
0.02    0.03    9.81    0.01   0.01   0.02   9.9    Standing
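The listed features can be computed per window roughly as follows. The slides do not give formulas, so the definitions are assumptions: SMA is taken as the average of |x| + |y| + |z| over the window, energy as the average of x² + y² + z², and the standard deviation is the population form. The function name `window_features` is hypothetical.

```python
from statistics import mean, pstdev


def window_features(window):
    """Compute per-window features from a list of (x, y, z) samples."""
    xs, ys, zs = zip(*window)
    n = len(window)
    return {
        "mean": (mean(xs), mean(ys), mean(zs)),
        "std": (pstdev(xs), pstdev(ys), pstdev(zs)),
        # Signal magnitude area: average summed absolute acceleration (assumed definition).
        "sma": sum(abs(x) + abs(y) + abs(z) for x, y, z in window) / n,
        # Energy: average squared magnitude (assumed definition).
        "energy": sum(x * x + y * y + z * z for x, y, z in window) / n,
    }


# A two-sample window built from the t1 and t2 readings in Step 1.
walking_window = [(0.12, 0.98, 9.60), (0.15, 1.05, 9.55)]
print(window_features(walking_window))
```

Each window becomes one row of the feature table, with the activity label attached as the target.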

ML Pipeline: Example
Step 4: Model Training

● Model Chosen
○ Random Forest / SVM / k-NN (commonly used for HAR)
● Training
○ Split data:
■ 70% Training
■ 15% Validation
■ 15% Testing
● The model learns patterns linking motion features to activities.
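The 70/15/15 split can be sketched as a shuffle followed by slicing. This is a minimal version under assumptions: a purely random split with a fixed seed for repeatability. In practice, windows from the same recording or person are often kept in the same subset to avoid leakage, which this sketch does not handle.

```python
import random


def split_dataset(rows, train=0.70, validation=0.15, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    Whatever is left after the train and validation shares goes to test.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


windows = list(range(100))  # stand-in for 100 feature rows
train_set, val_set, test_set = split_dataset(windows)
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```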
ML Pipeline: Example
Step 5: Validation

● Purpose:
○ Tune hyperparameters (e.g., number of trees, kernel type)
○ Detect overfitting
● Validation Metrics
○ Accuracy
○ Confusion matrix
○ Precision and recall per activity
● Example:
○ Validation Accuracy: 88%
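The validation metrics listed above can be computed from the true and predicted labels as sketched below. The labels are made up for illustration and do not reproduce the 88% figure; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one activity class."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(label, label)]
    predicted = sum(n for (_, p), n in cm.items() if p == label)
    actual = sum(n for (t, _), n in cm.items() if t == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Made-up validation labels for six windows.
y_true = ["Walking", "Walking", "Running", "Sitting", "Standing", "Sitting"]
y_pred = ["Walking", "Running", "Running", "Sitting", "Sitting", "Sitting"]

print(accuracy(y_true, y_pred))
print(precision_recall(y_true, y_pred, "Walking"))  # (1.0, 0.5)
```

Here every window predicted as Walking really was Walking (precision 1.0), but only one of the two Walking windows was found (recall 0.5), a distinction that overall accuracy alone hides.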
ML Pipeline: Example
Step 6: Testing

● Final Evaluation: Test the trained model on unseen accelerometer windows.

Challenges

● How to choose a model?

● How good is a model?
● How much data is required?
● Is the data of sufficient quality?
● How confident can I be in the results?
● Have I described/interpreted the data correctly?
