Machine Learning (316316) - Course Orientation
Welcome & Course Overview
Hello everyone! Today I want to take you through an exciting journey into Machine
Learning. This is a sixth-semester course designed to equip you with practical skills and
fundamental knowledge that industries are demanding right now. Over the next semester,
you're going to learn algorithms, techniques, and hands-on implementations that
professionals use in healthcare, finance, e-commerce, and countless other domains.
Let me give you a bird's-eye view first. This course is structured around five major units—
from understanding what machine learning actually is, to preparing data, selecting
features, building supervised models, and finally working with unsupervised learning. But
it's not just theory. You'll be writing code, implementing algorithms, working with real
datasets, and solving practical problems.
Why Machine Learning Matters
Before we dive into what we'll study, let's talk about why this matters. Traditional
programming is about writing explicit instructions—"if this, then that." Machine Learning is
fundamentally different. Instead of telling the computer exactly what to do, we show it
examples and let it learn patterns. This is revolutionary because there are problems we
can't solve with traditional programming, but machine learning can.
For instance, how would you write a program to recognize faces? Or predict stock prices?
Or detect credit card fraud? These are problems where machine learning excels. The course
is designed around this real-world applicability, so everything we learn has practical value
in industry.
Course Structure & Learning Approach
Now let me explain how we'll organize this course. We have 5 units covering different
aspects of machine learning:
Unit I: Introduction to Machine Learning
We start with fundamentals. What is machine learning exactly? How does it differ from
traditional programming? We'll compare the two approaches so you understand the
paradigm shift.
Then we explore three types of machine learning:
Supervised Learning: You have labeled examples (input-output pairs), and you
learn to predict outputs for new inputs. Think of it like learning with a teacher who
provides answers. We'll look at classification (predicting categories) and regression
(predicting continuous values).
Unsupervised Learning: You have data but no labels. Your job is to find hidden
patterns, group similar items, or reduce complexity. Imagine exploring a new city
without a guide—you discover patterns on your own.
Reinforcement Learning: An agent learns by taking actions, receiving rewards or
penalties, and improving its policy over time. This is how AI systems learn to play
games such as chess and Go.
We'll also cover real-world applications of ML in healthcare (disease prediction), finance
(fraud detection), e-commerce (recommendation systems), and challenges you'll face when
implementing these systems.
And importantly, we'll introduce Python and essential libraries: NumPy (numerical
computing), Pandas (data manipulation), Matplotlib (visualization), and Scikit-learn
(machine learning algorithms). These are your tools for the semester.
Unit II: Data Preprocessing
Here's a principle you'll hear repeatedly: garbage in, garbage out. The quality of your
model depends heavily on the quality of your data. In real-world scenarios, raw data is
messy. This unit teaches you how to clean it.
Data cleaning covers:
Identifying noisy data (outliers, errors)
Removing duplicates and inconsistencies
Standardizing formats
Handling outliers intelligently
Handling missing values is crucial. Your dataset might have empty cells. We explore
multiple strategies:
Simply removing records with missing values
Filling them with mean, median, or mode
Using predictive imputation (predicting missing values based on other features)
Using algorithms that can handle missing values naturally
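To make the first two strategies concrete, here is a minimal sketch using Pandas and Scikit-learn. The tiny table and its column names are invented purely for illustration; they are not one of the course datasets.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "income": [30000, 45000, np.nan, 52000],
    "city": ["Pune", "Mumbai", None, "Pune"],
})

# Strategy 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Strategy 2: fill numeric columns with the median, categorical ones with the mode.
filled = df.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled["income"] = filled["income"].fillna(filled["income"].median())
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])

# The same idea with Scikit-learn's SimpleImputer (mean imputation here).
imputer = SimpleImputer(strategy="mean")
numeric_imputed = imputer.fit_transform(df[["age", "income"]])

print(dropped)
print(filled)
print(numeric_imputed)
```

Notice that dropping rows keeps only half of this small table, which is why imputation is often preferred when data is scarce.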
Dataset splitting is essential. You can't train and test on the same data—that's cheating!
You'll learn proper train-test splits, cross-validation techniques (like K-Fold), and how to
ensure your test set truly represents unseen data.
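A train-test split followed by K-Fold cross-validation might look like the sketch below. The Iris dataset, the 80/20 ratio, and the decision tree are my choices for the example, not requirements of the course.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5-fold cross-validation on the training portion only; the test set stays untouched.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    DecisionTreeClassifier(random_state=42), X_train, y_train, cv=cv
)

print("train rows:", len(X_train), "test rows:", len(X_test))
print("fold accuracies:", scores.round(3), "mean:", scores.mean().round(3))
```

The test set is used only once, at the very end, so that it keeps playing the role of unseen data.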
Unit III: Feature Engineering
This unit is about asking: "What information actually matters?" Not all data attributes are
equally important. Feature engineering helps you identify, create, and select the most
valuable features.
Feature scaling matters because many algorithms behave differently when features have
different ranges. If one feature goes from 0-100 and another from 0-1,000,000, a
distance-based algorithm such as KNN will be dominated by the larger values. We use
normalization (rescaling each feature to the 0-1 range) or standardization (rescaling
each feature to zero mean and unit variance).
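As a small illustration of both kinds of scaling, the sketch below uses Scikit-learn's MinMaxScaler and StandardScaler on three made-up rows with two features on very different ranges.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different ranges (values invented for illustration).
X = np.array([[10.0, 200_000.0],
              [50.0, 800_000.0],
              [90.0, 500_000.0]])

X_norm = MinMaxScaler().fit_transform(X)    # each column mapped onto [0, 1]
X_std = StandardScaler().fit_transform(X)   # each column: mean 0, std 1

print(X_norm)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

In a real project you would fit the scaler on the training set only and then apply it to the test set, so that no information from the test data leaks into training.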
Feature selection identifies which features actually contribute to predictions. We explore
three approaches:
Filter methods: Statistical tests like correlation or chi-square
Wrapper methods: Forward or backward selection
Embedded methods: Algorithms like decision trees or Lasso that inherently select
important features
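One way to try all three approaches side by side is sketched below. The choices are mine and only illustrative: the Wine dataset that ships with Scikit-learn, an ANOVA F-test as the filter (standing in for the correlation or chi-square tests mentioned above), forward selection around KNN as the wrapper, and a decision tree's importances as the embedded method.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    SequentialFeatureSelector,
    f_classif,
)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

data = load_wine()
X = StandardScaler().fit_transform(data.data)
y = data.target
names = list(data.feature_names)


def chosen(selector):
    """Return the names of the features a fitted selector kept."""
    return [n for n, keep in zip(names, selector.get_support()) if keep]


# Filter: score each feature on its own with a statistical test, keep the top 3.
filter_sel = SelectKBest(score_func=f_classif, k=3).fit(X, y)

# Wrapper: forward selection, adding one feature at a time based on CV accuracy.
wrapper_sel = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=3,
    direction="forward",
    cv=5,
).fit(X, y)

# Embedded: keep features whose tree importance is at least the mean importance.
embedded_sel = SelectFromModel(DecisionTreeClassifier(random_state=0)).fit(X, y)

print("filter:  ", chosen(filter_sel))
print("wrapper: ", chosen(wrapper_sel))
print("embedded:", chosen(embedded_sel))
```

The three lists will usually overlap but not match exactly, because each method measures "importance" differently.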
Feature extraction creates new features from existing ones:
PCA (Principal Component Analysis) reduces dimensions by finding directions of
maximum variance
LDA (Linear Discriminant Analysis) finds features that best separate classes
These techniques help when you have too many features (curse of dimensionality)
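The difference between PCA and LDA shows up directly in how they are called: PCA looks only at the features, while LDA also needs the class labels. A minimal sketch, assuming the Iris dataset as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# PCA is unsupervised: it never sees y.
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# LDA is supervised: it uses y to find directions that separate the classes.
# With 3 classes it can produce at most 2 components.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_scaled, y)

print("original shape:", X.shape)
print("after PCA:", X_pca.shape, "after LDA:", X_lda.shape)
```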
Unit IV: Supervised Learning
Here's where algorithms come in. You'll implement three classification algorithms:
Decision Trees: Imagine a flowchart where each node asks a question ("Is age > 30?") and
branches lead to predictions. They're intuitive, interpretable, and powerful.
KNN (K-Nearest Neighbors): The idea is simple—to predict for a new point, look at its K
nearest neighbors in your training data and take a vote. Surprisingly effective!
SVM (Support Vector Machines): These find the optimal boundary between classes.
Mathematically sophisticated but incredibly powerful for complex classification problems.
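To give a feel for how similar the code is across the three classifiers, here is a hedged sketch that trains each on the same split. The Iris dataset and the hyperparameters (tree depth 3, K=5, RBF kernel with C=1.0) are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# KNN and SVM depend on distances, so they are wrapped with a scaler.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "KNN (K=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM (RBF kernel)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {accuracies[name]:.3f}")
```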
For regression (predicting continuous values), along with one closely related classifier:
Linear Regression: Fits a straight line (or hyperplane) through data
Logistic Regression: Despite the name, it's for classification using a sigmoid function
Ridge Regression: Adds regularization to prevent overfitting
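The three models can be compared on a small synthetic problem. In this sketch the data is generated from y = 3x + 2 plus noise, and the Ridge penalty alpha=10.0 is an arbitrary value chosen to make the shrinkage visible; both are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic data: y = 3x + 2 plus Gaussian noise.
x = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

linear = LinearRegression().fit(x, y)
ridge = Ridge(alpha=10.0).fit(x, y)  # alpha sets the strength of the L2 penalty

# Logistic regression predicts a class: here, whether y is above its median.
labels = (y > np.median(y)).astype(int)
logistic = LogisticRegression().fit(x, labels)

print("linear slope:", linear.coef_[0], "intercept:", linear.intercept_)
print("ridge slope: ", ridge.coef_[0])
print("P(class 1 | x=5):", logistic.predict_proba([[5.0]])[0, 1])
```

The Ridge slope comes out slightly smaller than the plain linear slope, which is the regularization pulling the coefficient toward zero.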
Model evaluation is critical. You need metrics:
Confusion Matrix: Shows true positives, false positives, true negatives, false
negatives
Accuracy: Overall correctness
Precision: Of predicted positives, how many are actually positive?
Recall: Of actual positives, how many did we find?
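These metrics are easiest to understand by computing them once by hand from the confusion matrix and checking against Scikit-learn. The ten labels below are hypothetical, made up for the example.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels for a binary problem (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# For binary labels 0/1, ravel() returns tn, fp, fn, tp in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# For the labels above: TP=4, TN=3, FP=1, FN=2.

accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
precision = tp / (tp + fp)                   # of predicted positives, how many are right
recall = tp / (tp + fn)                      # of actual positives, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")

# The library functions should agree with the hand calculation.
print(accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))
```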
Unit V: Unsupervised Learning
When you have no labels, what do you do? Unsupervised learning finds structure.
Clustering groups similar items:
K-Means: Partition data into K clusters where each point belongs to the cluster with
the nearest centroid. Iterative and fast.
Hierarchical Clustering: Build a tree of clusters showing relationships at different
levels. More interpretable but slower.
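Both clustering methods can be tried on the same data with a few lines. This sketch assumes synthetic blobs generated by Scikit-learn and K=3, chosen so that the correct grouping is obvious; real data is rarely this clean.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic 2-D data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# K-Means: iteratively moves 3 centroids and reassigns points to the nearest one.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Hierarchical (agglomerative): repeatedly merges the closest clusters,
# then the tree is cut at 3 clusters.
hier = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)

# Adjusted Rand index: 1.0 means both methods produced the same grouping.
agreement = adjusted_rand_score(kmeans.labels_, hier.labels_)

print("centroids:\n", kmeans.cluster_centers_)
print("agreement between the two methods:", round(agreement, 3))
```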
Dimensionality Reduction:
When you have hundreds of features but only need the most important ones
PCA finds principal components (directions of variation)
You choose how many components to keep based on explained variance
This reduces computational cost and often improves model performance
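Choosing the number of components from explained variance can be sketched as follows. The digits dataset (64 pixel features) and the 95% threshold are assumptions for the example; the right threshold depends on your problem.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image

# Fit PCA with all components and accumulate the explained variance ratios.
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 95%.
n_keep = int(np.argmax(cumulative >= 0.95)) + 1

# Shortcut: a float n_components asks PCA to reach that variance fraction itself.
pca_95 = PCA(n_components=0.95).fit(X)

print("components needed for 95% variance:", n_keep, "of", X.shape[1])
print("components chosen by PCA(n_components=0.95):", pca_95.n_components_)
```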
Practical Work Throughout
Now let's talk about the hands-on portion. You're not just learning theory—you're
implementing everything.
Mandatory Practicals
There are 15 practical exercises aligned with the units. Some are mandatory (marked with
*):
1. Installation (2 hours): Set up Python, Jupyter Notebook or Google Colab, install
libraries. This is your foundation.
2. Data Preprocessing (2 hours): Write Python programs to handle missing values,
normalize/standardize data, encode categorical variables, split datasets. You'll work
with real messy data.
3. Reading Datasets (2 hours): Load data from multiple formats—CSV, JSON, XML—and
understand structure.
4. Classification (4 hours): Implement Decision Trees, KNN, and evaluate using
accuracy, precision, recall, F1-score. You'll see how different algorithms perform on
the same problem.
5. Regression (4 hours): Build Linear, Logistic, and Ridge Regression models. Evaluate
performance using appropriate metrics.
6. Clustering (2 hours): Implement K-Means, visualize results using Matplotlib and
Seaborn.
7. Feature Importance (2 hours): Identify which features contribute most to model
accuracy.
8. KNN Variations (2 hours): Experiment with different K values, understand the
trade-off between bias and variance.
9. SVM Implementation (2 hours): Train SVM models and understand hyperparameter
tuning.
10. Logistic Regression (2 hours): Build classification models for binary outcomes.
11. PCA (2 hours): Apply dimensionality reduction while retaining important
information.
12-14. Real Dataset Projects (6 hours total): Apply the complete ML pipeline on real
datasets like the Boston Housing dataset.
15. Customer Segmentation (4 hours): Use K-Means to segment customers based on
purchasing behavior—a real-world business problem.
Minimum 80% Completion
I expect you to complete at least 80% of these practicals. Some are mandatory, but you'll
choose others based on your interests and learning pace. The faculty will ensure a mix that
develops all required skills.
Assessment & Evaluation
Let me be clear about how you'll be evaluated. This is important information:
Theory Assessment (100 marks)
Formative Assessment (FA): Two class tests during the semester, each out of 30 marks;
the average of the two counts = 30 marks
Summative Assessment (SA): End-semester exam of 70 marks, 3-hour duration
Total Theory: 100 marks, passing minimum is 40 marks
Practical Assessment (75 marks)
Formative Assessment (FA): Continuous assessment of practicals = 25 marks
(minimum 10 to pass)
Summative Assessment (SA): End-semester lab exam = 25 marks (minimum 10 to
pass)
Lab Assignments (SLA): Self-learning assessment based on projects/assignments =
25 marks (minimum 10 to pass)
Total Practical: 75 marks
Combined Assessment
Total course marks = 100 (theory) + 75 (practical) = 175 marks
Important: If you don't secure minimum marks in any practical component, you'll be
detained in this semester. So take practicals seriously—they're not optional.
Self-Learning & Micro Projects
Beyond the scheduled practicals, you have self-learning assignments. These include:
Micro Projects (choose one or get assigned):
1. Waiter's Tip Prediction: Predict tip amounts based on restaurant visit features
2. Stock Inventory Prediction: Forecast stock levels for different stores
3. Stock Price Prediction: Build models on historical market data
4. Employee Attrition: Analyze HR data to predict who might leave
5. Human Scream Detection: Interesting application for crime control
These micro projects should be completed as group activities, and they count toward your
self-learning assessment.
Additionally, we encourage you to register for free online courses from Coursera, Swayam,
Google, Infosys Springboard, or Great Learning Academy to deepen your understanding.
Stanford's CS229 and Google's Machine Learning Crash Course are excellent resources.
Tools & Resources You'll Need
Here's what you should set up:
Essential:
Python (ideally through Anaconda for easy setup)
Jupyter Notebook OR Google Colab (free, cloud-based)
Libraries: NumPy, Pandas, Matplotlib, Scikit-learn
Recommended:
VS Code or PyCharm for code development
Git and GitHub for version control and collaboration
Kaggle account for datasets and computing environment
Computer Requirements: Minimum 8GB RAM. Most modern laptops are sufficient.
Learning Resources
Here are references we'll use:
Books:
"Python Machine Learning by Examples" by Yuxi Liu
"Machine Learning" by Saikat Dutt and team (also available in Hindi/Marathi)
"Machine Learning" by Tom M. Mitchell (classic)
"Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" by Aurélien
Géron
Online Resources:
Machine Learning Mastery ([Link])
GeeksforGeeks Machine Learning section
W3Schools Python ML basics
Google's Machine Learning Crash Course
NPTEL Machine Learning videos (in Hindi)
Course Expectations & Timeline
Total Learning Hours: 45 contact hours (classroom) + notional learning hours for lab,
tutorials, and self-study
Semester Structure: 15 weeks
Pace:
Weeks 1-2 : Unit I (Introduction)
Weeks 3-4 : Unit II (Data Preprocessing) + practical setup
Weeks 5-6 : Unit III (Feature Engineering)
Weeks 7-9 : Unit IV (Supervised Learning) - most time intensive
Weeks 10-12 : Unit V (Unsupervised Learning)
Weeks 13-14 : Practical revisions, micro projects, concept reinforcement
Week 15 : Exam preparation
Key Principles to Remember
As we start, keep these principles in mind:
1. Understand the intuition first, then worry about mathematics. ML is intuitive—
focus on understanding what algorithms do before memorizing formulas.
2. Code along. Don't just watch demonstrations. Write code yourself, make mistakes,
debug, learn.
3. Experiment: Change parameters, use different algorithms on the same data, visualize
results. This is how you develop intuition.
4. Use real data: Don't just work with toy datasets. Kaggle has thousands of real-world
datasets.
5. Focus on the workflow: Data → Preprocessing → Feature Engineering → Model
Building → Evaluation → Iteration. This workflow matters more than memorizing
specific algorithms.
6. Practice practicals seriously: Practicals account for 75 marks. They're not an
afterthought; they're central to the course.
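The workflow in principle 5 maps naturally onto a Scikit-learn Pipeline. The sketch below is one possible arrangement, assuming the breast cancer dataset bundled with Scikit-learn and arbitrary choices for each stage (median imputation, standardization, 10 PCA components, logistic regression).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

workflow = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # preprocessing
    ("scale", StandardScaler()),                    # preprocessing
    ("reduce", PCA(n_components=10)),               # feature engineering
    ("model", LogisticRegression(max_iter=1000)),   # model building
])

# Every step is fitted on the training data only.
workflow.fit(X_train, y_train)

# Evaluation on held-out data.
test_accuracy = workflow.score(X_test, y_test)
print("test accuracy:", round(test_accuracy, 3))
print(classification_report(y_test, workflow.predict(X_test)))
```

Iteration then means changing one step (a different scaler, more components, another model) and re-running the same evaluation.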
Questions Before We Begin?
I want to make sure we're aligned on expectations. This course requires effort—you'll code,
debug, learn from failures, and build confidence gradually. But the skills you develop are
exactly what industry demands.
The support is there—I'm available for consultations, practical guidance, and project
mentoring. My door is open. We also have the online resources and Kaggle community if
you get stuck.
Questions about course structure, assessment, expectations, or anything else?
Next Steps
Here's what happens next:
1. Today: Course orientation (this session)
2. Next class: We start Unit I with Python fundamentals
3. Before next week: Set up your environment (Anaconda, Jupyter, libraries)
4. By end of week 1: First practical—installation and basic Python
Come prepared to code. Bring your laptops. We'll be hands-on from day one.
I'm excited about this semester. Machine Learning is transforming industries, and I want
you to be equipped with skills that matter. Let's make this a practical, engaging, and
skill-building journey together.
Welcome to Machine Learning! Let's dive in.