Machine Learning notes
Machine Learning (ML) is a branch of Artificial Intelligence (AI) that enables systems to learn from data and improve their performance over time without being explicitly programmed.
Instead of writing code with specific instructions, in ML we feed data to algorithms, which then discover patterns and make decisions or predictions.
Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed.
ML is a subset of AI that focuses on building systems that can learn and improve from experience.
ML algorithms use data to train models to recognize patterns and make predictions or decisions.
Difference Between a Program and a Machine Learning Model
Here is a clear comparison between a traditional program and a machine learning model:
| Aspect | Traditional Program | Machine Learning Model |
|---|---|---|
| Definition | A set of rules written by a programmer | A system that learns patterns from data |
| Logic Source | Human-defined logic and rules | Automatically learned from data |
| Input | Data + Rules | Data (training data) |
| Output | Result based on fixed logic | Prediction or decision based on learned patterns |
| Learning Ability | No learning; behavior is fixed | Learns and improves with more data |
| Example | A calculator app coded to add/subtract | A spam filter trained on emails |
| Flexibility | Rigid; must reprogram to change behavior | Flexible; retrain to adapt to new data |
| Error Handling | Errors must be handled by the developer | Can tolerate noise and uncertainty in data |
Example
Traditional Program Example:
```python
# Traditional program to check if a number is even
def is_even(number):
    if number % 2 == 0:
        return True
    else:
        return False

# Example usage
print(is_even(4))  # Output: True
print(is_even(7))  # Output: False
```
Machine Learning Model:
Given a dataset of numbers labeled "even" or "odd", the model learns the pattern and then predicts whether a new number is even or odd, without being explicitly programmed how.
In a program, the logic is coded by a human.
In machine learning, the logic is learned by the machine from data.
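```python
# Machine Learning model to classify even or odd numbers
from sklearn.linear_model import LogisticRegression
import numpy as np

def to_bits(n, width=8):
    # Represent n (0-255) by its binary digits, least significant bit first.
    # A linear model cannot separate even from odd using the raw number alone,
    # but it can learn the pattern from these digit features.
    return [(n >> i) & 1 for i in range(width)]

# Training data: numbers 0-99 and their labels (0 = even, 1 = odd).
# The labels are computed here only to build the toy dataset.
numbers = range(100)
X = np.array([to_bits(n) for n in numbers])
y = np.array([n % 2 for n in numbers])

# Train logistic regression model
model = LogisticRegression()
model.fit(X, y)

# Predict whether a number is even or odd
def predict_even_or_odd(n):
    prediction = model.predict(np.array([to_bits(n)]))[0]
    return "Even" if prediction == 0 else "Odd"

# Example usage
print(predict_even_or_odd(4))    # Output: Even
print(predict_even_or_odd(7))    # Output: Odd
print(predict_even_or_odd(123))  # Output: Odd (123 is not in the training data)
```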
Types of Machine Learning
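Machine Learning is broadly categorized into three main types (supervised, unsupervised, and reinforcement learning), plus semi-supervised learning, which combines elements of the first two: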
1. Supervised Learning
Data: Labeled (each input has a correct output)
Goal: Learn a function that maps inputs to outputs
Examples:
o Email spam detection (spam/not spam)
o Predicting house prices
Algorithms:
o Linear Regression
o Logistic Regression
o Decision Trees
o Support Vector Machines (SVM)
o K-Nearest Neighbors (KNN)
2. Unsupervised Learning
Data: Unlabeled (no output provided)
Goal: Discover hidden patterns or groupings
Examples:
o Customer segmentation
o Market basket analysis
Algorithms:
o K-Means Clustering
o Hierarchical Clustering
o Principal Component Analysis (PCA)
o Association Rules
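Here is a minimal sketch of customer segmentation with K-Means. It assumes scikit-learn is installed. The six customers and their two features (annual spend, visits per month) are hypothetical. No labels are given, so the algorithm has to find the groups on its own.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: [annual spend, visits per month] for six customers
X = np.array([
    [200, 2], [220, 3], [210, 2],       # looks like a low-spend group
    [950, 12], [1000, 14], [980, 13],   # looks like a high-spend group
])

# Ask for two clusters; n_init runs K-Means several times and keeps the best run
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(labels)                    # cluster index assigned to each customer
print(kmeans.cluster_centers_)   # the "average customer" of each cluster
```

The cluster numbers themselves (0 or 1) are arbitrary; only the grouping matters. When features are on very different scales, as they are here, it is common to standardize them before clustering.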
3. Semi-Supervised Learning
Data: Mix of labeled and unlabeled data
Goal: Use a small amount of labeled data to guide learning on larger unlabeled data
Use Case: Medical imaging (few labeled scans, many unlabeled)
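Self-training is one common semi-supervised technique. The model is fitted on the few labeled points, labels the unlabeled points it is confident about, and is then refitted. The sketch below uses scikit-learn's `SelfTrainingClassifier` with a logistic regression base model. The 1-D data is hypothetical, and `-1` is how scikit-learn marks an unlabeled example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical 1-D measurements; only two of the eight points carry a label
X = np.array([[1.0], [1.2], [0.8], [1.1], [5.0], [5.2], [4.8], [5.1]])
y = np.array([0, -1, -1, -1, 1, -1, -1, -1])   # -1 = unlabeled

# Self-training: repeatedly pseudo-label confident unlabeled points and refit
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)

print(model.transduction_)             # labels used in the final fit (given + pseudo-labels)
print(model.predict([[0.9], [5.3]]))   # predictions for new points
```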
4. Reinforcement Learning
Goal: Train an agent to make sequences of decisions by interacting with an
environment
Based on: Reward and punishment
Examples:
o Game playing (Chess, Go)
o Robotics
o Self-driving cars
Algorithms:
o Q-Learning
o Deep Q Networks (DQN)
o Policy Gradient Methods
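To make the reward-driven loop concrete, here is a tabular Q-learning sketch in plain Python. The environment is a hypothetical five-cell corridor where the only reward comes from reaching the last cell. The learning rate, discount factor, exploration rate, and episode count are illustrative values, not recommendations.

```python
import random

# Hypothetical environment: cells 0..4 in a row; the agent starts in cell 0,
# and reaching cell 4 (the goal) gives reward +1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # index 0 = move left, index 1 = move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]: estimated future reward
random.seed(0)

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy choice: explore at random sometimes (or on ties), else exploit
        if random.random() < epsilon or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] > Q[state][1] else 1
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)   # walls at both ends
        reward = 1.0 if next_state == GOAL else 0.0
        # Q-learning update: move Q toward reward + discounted value of the best next action
        best_next = 0.0 if next_state == GOAL else max(Q[next_state])
        Q[state][a] += alpha * (reward + gamma * best_next - Q[state][a])
        state = next_state

# Greedy policy learned for each non-goal cell
policy = ["right" if q[1] > q[0] else "left" for q in Q[:GOAL]]
print(policy)
```

No one tells the agent that "right" is correct. It discovers this purely from the delayed reward, which the discount factor propagates back through the Q-values.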
Types of Learning
1. Supervised Learning
The model learns from labeled data (input + correct output).
| Type | Description | Example |
|---|---|---|
| Classification | Predict a category/class | Spam vs. Not Spam |
| Regression | Predict a continuous value | Predict house prices |
| Sequence Labeling | Label each item in a sequence | POS tagging, Named Entity Recognition |
| Ranking | Predict relative order of items | Search engine results ranking |
2. Unsupervised Learning
The model finds patterns in unlabeled data.
| Type | Description | Example |
|---|---|---|
| Clustering | Group similar items | Customer segmentation |
| Dimensionality Reduction | Reduce number of features | PCA for visualization |
| Anomaly Detection | Detect rare/unusual data | Fraud detection |
| Association Rule Learning | Discover rules between items | Market basket analysis |
| Generative Models | Learn to generate new data | GANs, Variational Autoencoders |
3. Semi-Supervised Learning
The model is trained on a small amount of labeled data + a large amount of unlabeled
data.
🔹 Key Applications:
Speech recognition
Text classification
Image recognition with limited labeled data
Supervised learning is a type of machine learning where a model is trained on
a labeled dataset. In this approach, each training example is a pair consisting of
an input and a desired output (label). The model learns to map inputs to outputs,
and its goal is to generalize this mapping to new, unseen data.
Key Characteristics:
Labeled Data: Training data includes input-output pairs.
Goal: Predict the output for new inputs based on learned patterns.
Applications: Spam detection, sentiment analysis, fraud detection, image
classification, etc.
🔸 Types of Supervised Learning
Supervised learning is mainly divided into two types:
1. Classification
Objective: Predict a discrete label or category.
Output: Categorical (e.g., yes/no, spam/ham, disease present/absent).
Examples:
o Email spam detection (spam or not spam)
o Image recognition (cat, dog, car, etc.)
o Sentiment analysis (positive, negative, neutral)
Common algorithms:
Logistic Regression
Decision Trees
Random Forest
Support Vector Machines (SVM)
k-Nearest Neighbors (k-NN)
Neural Networks
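scikit-learn gives these classifiers the same `fit`/`predict` interface, so several of the listed algorithms can be compared side by side. The sketch below assumes scikit-learn and its bundled breast-cancer dataset, a binary task (malignant vs. benign). Features are standardized first because logistic regression and k-NN are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in classifiers.items():
    model = make_pipeline(StandardScaler(), clf)   # scale features, then classify
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```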
2. Regression
Objective: Predict a continuous value.
Output: Numeric (e.g., price, temperature, age).
Examples:
o Predicting house prices
o Forecasting sales
o Estimating medical costs
Common algorithms:
Linear Regression
Decision Tree Regression
Random Forest Regression
Support Vector Regression (SVR)
Gradient Boosting Regressors
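Here is a minimal regression sketch with scikit-learn's `LinearRegression`. It assumes synthetic data generated from a known line (slope 3, intercept 5) plus noise, so we can check that the fitted model recovers that line. The data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
# Synthetic data: y = 3x + 5 plus a little Gaussian noise
x = rng.uniform(0, 10, size=100)
y = 3 * x + 5 + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(x.reshape(-1, 1), y)   # scikit-learn expects a 2-D feature matrix

print(model.coef_[0], model.intercept_)   # should be close to 3 and 5
print(model.predict([[12.0]]))           # a continuous prediction for a new input
```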
📝 Summary Table
| Type | Output Type | Examples | Algorithms |
|---|---|---|---|
| Classification | Categorical | Spam detection, disease diagnosis | Logistic Regression, SVM, k-NN |
| Regression | Continuous | Price prediction, temperature | Linear Regression, SVR, Gradient Boosting |
Types of Classification
Classification in machine learning can be categorized into several types based
on the number of classes and the nature of data. Here's a breakdown of the
main types:
🔹 1. Binary Classification
Definition: Classifies inputs into two distinct categories.
Examples:
o Spam vs. Not Spam
o Disease vs. No Disease
o Pass vs. Fail
Algorithms Used: Logistic Regression, SVM, Decision Trees
🔹 2. Multiclass Classification
Definition: Classifies inputs into more than two classes.
Examples:
o Handwritten digit recognition (0–9)
o Classifying types of animals (cat, dog, horse, etc.)
Algorithms Used: Softmax Regression, Random Forest, k-NN, Neural
Networks
🔹 3. Multilabel Classification
Definition: Each input can be assigned multiple labels at once.
Examples:
o Tagging a news article with multiple topics (e.g., "politics",
"economy", "health")
o Movie genre classification (e.g., a movie being both "comedy" and
"romance")
Algorithms Used: Adapted Logistic Regression, Binary Relevance,
Classifier Chains, Deep Learning
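Binary relevance, one of the approaches listed above, trains one independent binary classifier per label. The scikit-learn sketch below assumes hypothetical movie features (a "laugh" score and a "romance" score) and genre tags. `MultiLabelBinarizer` turns the tag lists into one 0/1 column per genre, and `OneVsRestClassifier` fits one logistic regression per column.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical movies: [laugh_score, romance_score] and their genre tags
X = np.array([[0.9, 0.1], [0.8, 0.9], [0.1, 0.8],
              [0.2, 0.1], [0.85, 0.85], [0.1, 0.95]])
tags = [["comedy"], ["comedy", "romance"], ["romance"],
        [], ["comedy", "romance"], ["romance"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)   # shape (6, 2): one 0/1 column per genre
print(mlb.classes_)

# Binary relevance: one independent logistic regression per label column
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(np.array([[0.9, 0.9]]))
print(mlb.inverse_transform(pred))   # a movie can receive several genres at once
```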
🔹 4. Imbalanced Classification
Definition: One class significantly outweighs the others in quantity.
Challenge: Standard models may be biased toward the majority class.
Examples:
o Fraud detection (fraudulent transactions are rare)
o Medical diagnosis for rare diseases
Solutions:
o Resampling (oversampling/undersampling)
o Using metrics like F1-score, AUC-ROC instead of accuracy
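The sketch below assumes scikit-learn and synthetic data in which about 5% of examples belong to the positive ("fraud") class. An always-"not fraud" baseline reaches high accuracy but an F1-score of zero, which shows why accuracy misleads here. The fix shown is class weighting (`class_weight="balanced"`). It serves the same purpose as oversampling the minority class, but it reweights the loss instead of duplicating rows.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Synthetic data: roughly 95% class 0 ("legitimate"), 5% class 1 ("fraud")
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Baseline that always predicts the majority class
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
base_pred = baseline.predict(X_test)
print("baseline accuracy:", accuracy_score(y_test, base_pred))
print("baseline F1:", f1_score(y_test, base_pred, zero_division=0))

# Same model with and without class weighting
plain = LogisticRegression().fit(X_train, y_train)
balanced = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
for name, m in [("plain", plain), ("balanced", balanced)]:
    pred = m.predict(X_test)
    print(name, "recall:", recall_score(y_test, pred),
          "F1:", f1_score(y_test, pred, zero_division=0))
```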
Regression
Regression analysis is a fundamental statistical technique used to model and
analyze the relationships between a dependent variable and one or more
independent variables. It helps in understanding how the typical value of the
dependent variable changes when any one of the independent variables is
varied, while the others are held fixed.
🔹 Types of Regression
1. Linear Regression
Description: Models the relationship between the dependent and
independent variables as a straight line.
Use Case: Predicting outcomes like sales based on advertising spend.
Variants:
o Simple Linear Regression: Involves one independent variable.
o Multiple Linear Regression: Involves multiple independent
variables.
2. Polynomial Regression
Description: Extends linear regression by considering polynomial
relationships between the dependent and independent variables.
Use Case: Modeling nonlinear relationships, such as the growth rate of a
plant over time.
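Polynomial regression is still linear regression, just fitted on expanded features (x, x^2, ...). The scikit-learn sketch below assumes noise-free hypothetical data from y = 0.5x^2 - x + 2 (think plant height over time). It compares a straight-line fit with a degree-2 fit.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Hypothetical curved data: y = 0.5x^2 - x + 2 (no noise)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 2

straight = LinearRegression().fit(x, y)
curved = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)

# R^2 on the training data: 1.0 means a perfect fit
print("linear R^2:    ", straight.score(x, y))
print("polynomial R^2:", curved.score(x, y))
print("prediction at x=4:", curved.predict([[4.0]])[0])   # true value: 0.5*16 - 4 + 2 = 6
```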
3. Ridge Regression
Description: A type of linear regression that includes a regularization
term to prevent overfitting by penalizing large coefficients.
Use Case: When multicollinearity exists among independent variables.
4. Lasso Regression
Description: Similar to ridge regression but can shrink some coefficients
to zero, effectively performing variable selection.
Use Case: When we want to identify and select a subset of predictors.
5. Elastic Net Regression
Description: Combines penalties of both ridge and lasso regressions.
Use Case: When there are multiple features correlated with each other.
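To see how the penalties differ, the sketch below (assuming scikit-learn) fits ordinary least squares, ridge, lasso, and elastic net to hypothetical data where only the first two of five features matter. The `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Hypothetical ground truth: only features 0 and 1 affect y
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=10.0),                          # L2: shrinks all coefficients
    "Lasso": Lasso(alpha=0.2),                           # L1: can set coefficients to 0
    "ElasticNet": ElasticNet(alpha=0.2, l1_ratio=0.5),   # mix of L1 and L2
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:>10}: {np.round(model.coef_, 2)}")
```

On data like this, ridge keeps every coefficient non-zero but smaller, while lasso drives the three irrelevant ones exactly to zero. That is the "variable selection" described above.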
6. Logistic Regression
Description: Used when the dependent variable is categorical; it models the probability of a certain class or event, so despite its name it is a classification method.
Use Case: Predicting binary outcomes like pass/fail, win/lose.
7. Quantile Regression
Description: Estimates the conditional median or other quantiles of the
response variable.
Use Case: When the conditions of linear regression are not met,
especially with outliers.
8. Bayesian Regression
Description: Incorporates prior distributions into the regression analysis.
Use Case: When prior information about the parameters is available.
9. Support Vector Regression (SVR)
Description: Uses the principles of support vector machines for
regression problems.
Use Case: When the relationship between variables is nonlinear and
complex.
10. Decision Tree Regression
Description: Uses a tree-like model of decisions for regression tasks.
Use Case: When the data has a hierarchical structure or when
interpretability is important.
11. Random Forest Regression
Description: An ensemble of decision trees that improves predictive
accuracy.
Use Case: When dealing with large datasets with higher dimensionality.
12. Gradient Boosting Regression
Description: Builds models sequentially, each correcting the errors of its
predecessor.
Use Case: When high predictive accuracy is required.
13. Poisson Regression
Description: Used for modeling count data and contingency tables.
Use Case: Predicting the number of times an event occurs in a fixed
interval.
14. Nonparametric Regression
Description: Makes no assumptions about the functional form of the
relationship between variables.
Use Case: When the data structure is unknown or complex.
15. Semiparametric Regression
Description: Combines parametric and nonparametric models.
Use Case: When some variables have a known relationship and others do
not.
📝 Summary Table
| Regression Type | Description | Use Case Example |
|---|---|---|
| Linear Regression | Models linear relationship | Predicting sales based on advertising |
| Polynomial Regression | Models nonlinear relationships | Modeling growth rates |
| Ridge Regression | Penalizes large coefficients | Handling multicollinearity |
| Lasso Regression | Performs variable selection | Feature selection in models |
| Elastic Net Regression | Combines ridge and lasso penalties | Complex models with many predictors |
| Logistic Regression | Models binary outcomes | Email spam detection |
| Quantile Regression | Models conditional quantiles | Dealing with outliers |
| Bayesian Regression | Incorporates prior information | When prior knowledge is available |
| Support Vector Regression | Uses support vector machines for regression | Complex, nonlinear relationships |
| Decision Tree Regression | Tree-based modeling | Hierarchical data structures |
| Random Forest Regression | Ensemble of decision trees | High-dimensional data |
| Gradient Boosting Regression | Sequentially corrects errors | High predictive accuracy needs |
| Poisson Regression | Models count data | Predicting event occurrences |
| Nonparametric Regression | No assumptions about data structure | Unknown or complex data structures |
| Semiparametric Regression | Mix of parametric and nonparametric models | Partial knowledge about data structure |
Algorithms used in Supervised Learning
In supervised learning, various algorithms are used depending on whether the
task is classification or regression. Here's a categorized list of the most
common and widely used algorithms:
Classification Algorithms (Predict discrete labels)
| Algorithm | Description | Best Use Cases |
|---|---|---|
| Logistic Regression | Models probability of a binary outcome | Spam detection, medical diagnosis |
| Decision Tree | Tree-like model of decisions | Interpretability, categorical data |
| Random Forest | Ensemble of decision trees | High accuracy, reduces overfitting |
| Support Vector Machine (SVM) | Finds optimal boundary between classes | High-dimensional data, margin-based separation |
| k-Nearest Neighbors (k-NN) | Classifies based on majority of neighbors | Simple datasets, recommendation systems |
| Naive Bayes | Probabilistic classifier based on Bayes' theorem | Text classification, spam filtering |
| Gradient Boosting Machines (e.g., XGBoost, LightGBM) | Builds models sequentially to reduce errors | High performance in competitions |
| Neural Networks (MLP) | Layers of nodes to model complex patterns | Image, speech, and text classification |
Regression Algorithms (Predict continuous values)
| Algorithm | Description | Best Use Cases |
|---|---|---|
| Linear Regression | Models linear relationships | Price prediction, trend analysis |
| Ridge Regression | Adds L2 regularization | Multicollinearity issues |
| Lasso Regression | Adds L1 regularization (feature selection) | Sparse models, high-dimensional data |
| Elastic Net Regression | Combines L1 and L2 penalties | When both Lasso and Ridge are suitable |
| Decision Tree Regression | Tree structure for regression tasks | Interpretable models, nonlinear data |
| Random Forest Regression | Ensemble method, averages multiple trees | General purpose, high accuracy |
| Support Vector Regression (SVR) | Regression with margins, like SVM | Nonlinear regression, small datasets |
| Gradient Boosting Regression | Sequentially improves performance | Predictive analytics |
| Neural Networks (ANNs) | Can approximate complex nonlinear functions given enough data | Complex, nonlinear regression tasks |
Summary
| Task | Common Algorithms |
|---|---|
| Classification | Logistic Regression, SVM, Decision Trees, Random Forest, k-NN, Naive Bayes, Neural Networks |
| Regression | Linear Regression, Lasso/Ridge, Decision Tree Regression, Random Forest, SVR, Neural Networks |
Algorithms Used in Both Classification and Regression
These algorithms have flexible formulations that support both task types:
| Algorithm | Classification Use | Regression Use |
|---|---|---|
| Decision Trees | Predict class labels (e.g., "Yes/No") | Predict numeric values (e.g., price) |
| Random Forest | Ensemble of classification trees | Ensemble of regression trees |
| Support Vector Machines (SVM/SVR) | Class separation via hyperplane | Predict a value using margins |
| k-Nearest Neighbors (k-NN) | Classifies based on neighbors' majority | Predicts value by averaging neighbors |
| Neural Networks (ANNs) | Softmax/sigmoid output for classification | Linear (identity) output for regression |
| Gradient Boosting (e.g., XGBoost, LightGBM) | Sequential trees fitted to a classification loss (e.g., log loss) | Sequential trees fitted to a regression loss (e.g., squared error) |
These algorithms are task-agnostic — they adjust their loss functions and
output layers depending on the problem.
Algorithms Used in Only One Type
🟦 Used Only in Classification:
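| Algorithm | Reason / Limitation |
|---|---|
| Naive Bayes | Models the probability of each class given the features (via Bayes' theorem), so it outputs class labels |
| Softmax Regression | Specific to multiclass classification tasks |
| Perceptron | The classic perceptron is designed for binary classification |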
Used Only in Regression:
| Algorithm | Reason / Limitation |
|---|---|
| Linear Regression | Directly predicts a continuous value |
| Lasso/Ridge/Elastic Net Regression | Variants of linear regression, not suited for classification without major adaptation |
| Poisson Regression | Used for modeling count-based dependent variables |
Why Some Algorithms Are Dual-Purpose
It comes down to how the algorithm is structured. If it can accept a flexible loss function (like MSE for regression or cross-entropy for classification) and adjust its output layer/structure (e.g., a probability for classification vs. a continuous value for regression), then it can handle both.
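As a small illustration, scikit-learn ships separate classifier and regressor versions of the decision tree. The sketch below assumes hypothetical study-hours data and fits both versions on the same input. The classifier (Gini impurity by default) returns a class label. The regressor (squared error by default) returns the mean target value of the leaf the input falls into.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical data: hours studied -> exam outcome
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])            # class label (fail/pass)
scores = np.array([35, 42, 48, 55, 63, 70, 78, 85])     # numeric exam score

# Same tree-building idea, different loss and leaf output
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(hours, passed)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(hours, scores)

print(clf.predict([[6.2]]))   # a class label
print(reg.predict([[6.2]]))   # a number: mean score of the matching leaf
```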
Machine Learning Framework
A machine learning (ML) framework provides tools, libraries, and interfaces
that simplify and standardize the process of building, training, evaluating, and
deploying machine learning models.
Key Roles of an ML Framework:
1. Abstraction and Simplification:
o Provides high-level APIs to define models easily without writing
complex mathematical code.
o Simplifies data preprocessing, model training, and evaluation.
2. Support for Model Building:
o Allows you to build models using predefined layers, loss functions,
and optimizers.
o Often includes pre-trained models and transfer learning tools.
3. Efficient Computation:
o Optimized for performance using CPU, GPU, or even TPU.
o Handles parallel processing and large-scale data training
efficiently.
4. Experimentation and Tuning:
o Includes tools for tracking experiments, hyperparameter tuning,
and model versioning.
5. Deployment and Scalability:
o Helps package and deploy models into production environments
(cloud, mobile, web, etc.).
o Supports model serving and APIs.
6. Interoperability:
o Integrates with other tools like visualization libraries (e.g.,
TensorBoard), data pipelines, and cloud services.
Popular ML Frameworks:
TensorFlow (by Google)
PyTorch (by Meta)
scikit-learn (for traditional ML)
Keras (high-level API often used with TensorFlow)
XGBoost/LightGBM (for gradient boosting)
Core Mathematical Concepts in Machine Learning
1. Linear Algebra
Used in: Representing data, model parameters, and transformations.
Key Topics:
o Vectors, matrices, tensors
o Matrix multiplication and inversion
o Eigenvalues and eigenvectors (e.g., PCA)
Examples:
o Representing images as matrices
o Linear regression weights as a vector
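Here is a small linear-algebra example. The data is stored as a matrix X (one row per example) and the targets as a vector y. The linear regression weights w then solve the normal equation X^T X w = X^T y. The NumPy sketch below assumes noise-free hypothetical data from y = 2 + 3x.

```python
import numpy as np

# Hypothetical data generated from y = 2 + 3x (no noise)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 + 3 * x

# Data as a matrix: a column of ones (for the intercept) next to the feature column
X = np.column_stack([np.ones_like(x), x])

# Weights as a vector: solve (X^T X) w = X^T y rather than inverting X^T X explicitly
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)   # [intercept, slope], approximately [2. 3.]
```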
2. Calculus (Mostly Differential Calculus)
Used in: Optimization, especially during model training.
Key Topics:
o Derivatives and gradients
o Partial derivatives
o Chain rule (especially for backpropagation in neural networks)
Examples:
o Gradient descent for minimizing loss functions
3. Probability and Statistics
Used in: Understanding data distributions, making predictions, evaluating
models.
Key Topics:
o Probability distributions (e.g., Gaussian, Bernoulli)
o Bayes' theorem
o Mean, variance, standard deviation
o Hypothesis testing and confidence intervals
Examples:
o Naive Bayes classifier
o Probabilistic models and uncertainty estimation
4. Optimization
Used in: Finding the best parameters (weights) for a model.
Key Topics:
o Convex functions
o Gradient descent and variants (SGD, Adam)
o Loss functions (e.g., MSE, cross-entropy)
Examples:
o Training a neural network involves optimizing the loss function
5. Discrete Mathematics
Used in: Logic, algorithms, and sometimes graph-based models.
Key Topics:
o Sets, functions, and relations
o Graph theory
o Combinatorics
Examples:
o Decision trees
o Graph neural networks
6. Information Theory
Used in: Understanding data entropy, model uncertainty, and
communication.
Key Topics:
o Entropy
o Information gain
o KL-divergence
Examples:
o Feature selection using information gain in decision trees
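Entropy and information gain are short enough to compute directly. The plain-Python sketch below measures entropy in bits. It assumes a hypothetical tree node with 5 "yes" and 5 "no" labels that some feature splits into a 4/1 group and a 1/4 group.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

# Hypothetical node: 5 "yes" and 5 "no"; a feature splits it into 4/1 and 1/4 groups
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(entropy(parent))                           # 1.0 (maximally uncertain for two classes)
print(information_gain(parent, [left, right]))   # about 0.278 bits
```

A decision tree would compute this gain for each candidate feature and split on the one with the largest value.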
7. Numerical Methods
Used in: Efficient and stable computation, especially for large-scale
problems.
Key Topics:
o Approximation techniques
o Numerical stability
o Iterative algorithms
| Math Area | Role in ML | Example Use Case |
|---|---|---|
| Linear Algebra | Data representation, model computation | Neural networks, image processing |
| Calculus | Optimization via gradient-based methods | Backpropagation in deep learning |
| Probability/Stats | Inference, modeling uncertainty | Naive Bayes, regression assumptions |
| Optimization | Parameter tuning | Gradient descent |
| Discrete Math | Logic, decision-making models | Decision trees, rule-based systems |
| Information Theory | Understanding and quantifying data | Feature selection, loss metrics |
| Numerical Methods | Efficient algorithm implementation | Training large models with big datasets |
What Is a Loss Function?
A loss function is a way for your machine learning model to measure how wrong it is
during training.
Think of it like a "report card" for each prediction:
The bigger the loss, the worse the model's prediction was.
🧪 Why Do We Need a Loss Function?
The model:
1. Makes a prediction (e.g., price = $200, or class = "cat").
2. Compares it to the actual answer (ground truth).
3. Calculates the difference (the "loss").
4. Learns by trying to minimize that loss over time.
This process is like telling the model:
“Hey, you were off by this much. Next time, do better.”
📊 Common Loss Functions
🔹 For Regression Problems (predicting numbers):
1. Mean Squared Error (MSE)
o Formula: $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_{\text{true},i} - y_{\text{pred},i}\right)^2$
o Penalizes large errors more heavily.
o Example: Predicting house prices.
2. Mean Absolute Error (MAE)
o Formula: $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|y_{\text{true},i} - y_{\text{pred},i}\right|$
o More tolerant to outliers than MSE.
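The two formulas translate directly into NumPy. The sketch below uses made-up values, then replaces one prediction with a badly wrong one to show why MSE reacts much more strongly to outliers than MAE.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of squared errors: large errors dominate
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # Mean of absolute errors: every unit of error counts equally
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))   # 0.875 0.75

# Turn the last prediction into an outlier (error of 10 instead of 1)
y_bad = y_pred.copy()
y_bad[-1] = 17.0
print(mse(y_true, y_bad), mae(y_true, y_bad))     # 25.625 3.0
```

One outlier multiplies MSE by about 29 but MAE only by 4.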
🔹 For Classification Problems (predicting categories):
1. Binary Cross-Entropy (for binary classification)
o Used when output is yes/no, true/false.
o It measures how well predicted probabilities match actual labels (0 or 1).
2. Categorical Cross-Entropy (for multi-class classification)
o Used when there are more than two classes (e.g., digit 0–9).
o Measures the “distance” between the predicted probability distribution and the
actual class.
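Here is a NumPy sketch of both cross-entropy losses. It assumes the natural logarithm and averaging over examples, which is a common convention. Probabilities are clipped by a small `eps` so that `log(0)` never occurs. The labels and probabilities are made up.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip probabilities away from 0 and 1 so the logs stay finite
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    # y_onehot: one row per example with a 1 in the column of the true class
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

y = np.array([1, 0, 1])
good = binary_cross_entropy(y, np.array([0.9, 0.1, 0.8]))   # confident and right
bad = binary_cross_entropy(y, np.array([0.1, 0.9, 0.2]))    # confident and wrong
print(good, bad)   # about 0.145 and 2.072

# Two examples, three classes; the true classes are 2 and 0
y_onehot = np.array([[0, 0, 1], [1, 0, 0]])
probs = np.array([[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]])
cce = categorical_cross_entropy(y_onehot, probs)
print(cce)         # about 0.434
```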
🧭 How Loss Helps Learning
The loss is used by gradient descent to update the model’s weights.
👉 Smaller loss = better predictions.
👉 Model keeps adjusting until loss can’t go much lower.
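Here is a minimal sketch of that loop: plain gradient descent fitting a line y = w*x + b by minimizing MSE. The data (generated from y = 2x + 1), the learning rate, and the step count are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical training data generated from y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

w, b = 0.0, 0.0          # initial weights
learning_rate = 0.05
losses = []

for step in range(2000):
    error = (w * x + b) - y              # prediction minus ground truth
    losses.append(np.mean(error ** 2))   # MSE loss at the current weights
    # Gradients of the MSE with respect to w and b
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to lower the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")                   # approaches 2 and 1
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.6f}")
```

The learning rate matters. If it is too large, the updates overshoot and the loss grows; if it is too small, training needs many more steps.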
🎯 Simple Example (Regression)
Suppose:
True value: 10
Predicted value: 8
MSE (one example, so n = 1):
$\text{Loss} = (10 - 8)^2 = 4$
MAE:
$\text{Loss} = |10 - 8| = 2$
Structure of ML programs
| Step | Purpose |
|---|---|
| Import Libraries | Access tools and algorithms |
| Load Data | Get the input dataset |
| Preprocess | Clean and prepare data |
| Split Data | Create training/test sets |
| Train Model | Fit the algorithm to the data |
| Predict | Use the model to infer outputs |
| Evaluate | Measure model accuracy |
| Save | Export the model for reuse |
Example
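```python
# Generic ML workflow; assumes a CSV file with feature columns plus a 'label' column
import numpy as np
import pandas as pd
import joblib
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load data ('data.csv' is a placeholder file name)
data = pd.read_csv('data.csv')

# Preprocess: fill missing values and encode the label as integer codes
data = data.fillna(0)
data['label'] = data['label'].astype('category').cat.codes

# Separate features and target
X = data.drop('label', axis=1)
y = data['label']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Save the trained model for reuse ('model.pkl' is a placeholder file name)
joblib.dump(model, 'model.pkl')
```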
Complete ML Program: Iris Classification using Random Forest
```python
# 1. Import Libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import joblib

# 2. Load the Dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# 3. Preprocess the Data
# (In this dataset, there's no missing data or categorical encoding required)

# 4. Split the Dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 5. Choose and Train a Model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# 6. Make Predictions
y_pred = model.predict(X_test)

# 7. Evaluate the Model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# 8. Save the Model
joblib.dump(model, 'iris_model.pkl')
```