
DS ML Machine Learning I

This document is a refresher guide for Data Science and Machine Learning interviews, covering essential mathematical and technical concepts across various algorithms. It includes topics such as data types, exploratory data analysis, data preprocessing, feature engineering, and different machine learning algorithms. The material is designed for individuals with a foundational knowledge in the field and serves as a revision tool for key concepts tested in interviews.

Uploaded by

Albeniz


Machine Learning: Data Science and ML Refresher

Shubhankar Agrawal

Abstract
This document serves as a quick refresher for Data Science and Machine Learning interviews. It covers mathematical and technical concepts across a range of algorithms. It assumes the reader has foundational knowledge of the field at a tertiary-education level. The material is intended for revising key concepts that are tested in interviews.

Contents
1 Machine Learning
    1.1 Key Concepts
2 Exploratory Data Analysis
    2.1 General Statistics
    2.2 Univariate Analyses
        Target Variable Distribution • Feature Distributions
    2.3 Bi-variate Analyses
    2.4 Multi-variate Analyses
3 Data Pre-Processing
    3.1 Data Cleaning
        Missing Values • Outliers • Erroneous Values • Others
    3.2 Feature Scaling
        Standardization • Normalization • Box Cox Transform
    3.3 Target Scaling
4 Feature Engineering
    4.1 Extraction
        Text • Pixels • Temporal
    4.2 Transformation
        Temporal • Complexity
    4.3 Encoding
    4.4 Sampling
    4.5 Selection
        Iterative • Model Based • Statistical Tests
    4.6 Dimensionality Reduction
5 Algorithms
    5.1 Regression
        OLS • Gradient Descent • Metrics • Regularization • Assumptions
    5.2 Classification
        Logistic Regression • Naive Bayes • Metrics • Assumptions
    5.3 More Methods
        K Nearest Neighbours • Support Vector Machines • Decision Trees
    5.4 Ensemble Methods
        Bagging • Boosting • Stacking
    5.5 Clustering
        K-Means • DB Scan • GMM • Metrics
6 More Techniques
    6.1 Anomaly Detection
    6.2 Reinforcement Learning
        Policy Learning • Q-Learning • Exploration Exploitation
7 Nuances
    7.1 Imbalanced Data
    7.2 Biases
    7.3 More Data Unhelpful

Table 1. Data Types
Data | Type | Description
Nominal | Categorical | No order
Ordinal | Categorical | Ordered
Interval | Numerical | Can be negative
Ratio | Numerical | Has a defined 0

1. Machine Learning
1.1. Key Concepts
Data Splits:
Train: The model learns from it.
Valid: Tune hyper-parameters, prevent over-fitting.
Test: Unseen data, evaluate performance.

Table 2. Bias Variance Trade-Off
Aspect | Bias | Variance
What? | Error | Prediction Variability
Complexity | Too Simple | Too Complex
Fitting | Under | Over
Train Error | High | Low
Test Error | High | High
Formula | $\text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$ | $\text{Var}(\hat{\theta}) = \mathbb{E}[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2]$

Figure 1. Bias Variance Trade-off [2]

Types of Learning:
Supervised: Labelled data
Unsupervised: Unlabelled data
Reinforcement: Learn with feedback from environment

Other terms:
Parameters: Weights the model learns
Hyper-parameters: Settings chosen before training to adjust performance
Cross Validation: Rotate the validation split so that all data is used for both training and validation (K Fold, LOOCV, Temporal)
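The K Fold cross-validation listed under Key Concepts can be sketched in a few lines. This is a minimal illustration, assuming shuffled (non-temporal) data; the function name `k_fold_indices` and its parameters are hypothetical and not part of the original notes.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first (n % k) folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        valid_idx = folds[i]
        # Every fold except the i-th is used for training.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, valid_idx

splits = list(k_fold_indices(10, k=3))
for train_idx, valid_idx in splits:
    print(len(train_idx), "train /", len(valid_idx), "valid")
```

Each sample lands in the validation fold exactly once, which is what lets the averaged validation score use all of the data. For temporal data the shuffle would need to be replaced with ordered, forward-only splits.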


2. Exploratory Data Analysis
2.1. General Statistics
Pandas operations to analyse dataframes:
info: Information on null values, data types, and memory
describe: Descriptive statistics of mean, median and IQR
value_counts: Counts for categorical columns

2.2. Univariate Analyses
2.2.1. Target Variable Distribution
Continuous: Plot distribution, identify outliers
Discrete: Value counts to check for imbalanced data

2.2.2. Feature Distributions
Continuous: Box plots, Histograms
Discrete: Bar charts of value counts

2.3. Bi-variate Analyses

Table 3. Bi-variate Analyses
Comparing | Compared | Plots
Continuous | Continuous | Scatter
Continuous | Discrete | Violin, Bar, Line
Discrete | Discrete | Grouped Bars

Pair plots from Seaborn can plot different types of features against each other in a single graphic.

Figure 2. Pair Plots [4]

2.4. Multi-variate Analyses
• Heat maps
• Violin Plots (Binary categories)
• Scatter Plots (with Sizes)
• Correlation matrices

3. Data Pre-Processing
3.1. Data Cleaning
NOTE: Use median instead of mean when the data is skewed.

3.1.1. Missing Values
Remove if not too many (especially in Target)
Impute: Fill values
Interpolate: Estimate values along a fitted line
Ways to fill values:
• Fill with Mean / Median by groups (or) nearest neighbours
• Analyse temporal patterns to fill by dates
• Fit a regression line to fill missing values

3.1.2. Outliers
Identify by Box Plots, Anomaly Detection
• Remove if not too many.
• Winsorize (clip by IQR)

3.1.3. Erroneous Values
Some values might be errors and can be fixed
• Amounts scaled by 10 / 100
• Negative values in a non-negative space
• Invalid values (such as co-ordinates)

3.1.4. Others
• Drop duplicates
• Fix entities (NYC vs New York City)
• Date formats

3.2. Feature Scaling
3.2.1. Standardization
• Calculate Z Scores
• Useful to bring features to comparable scales (mean 0, standard deviation 1)
• Less sensitive to outliers than MinMax scaling
• Better for SVM, Linear Regression

Can also use IQR to scale more robustly.

3.2.2. Normalization
• MinMax Scaling
• Good for data without outliers
• Better for KNN, Neural Networks

3.2.3. Box Cox Transform
Log (or) Power transform maximizing normality

$$\max_{\lambda}\ \ell(\lambda) = -\frac{n}{2}\log(\sigma^2) + (\lambda - 1)\sum_{i=1}^{n}\log(y_i) \tag{1}$$

Table 4. Feature Scaling
Method | Formula
Standardization | $x_{\text{std}} = \dfrac{x - \mu}{\sigma}$
Normalization | $x_{\text{norm}} = \dfrac{x - \min(x)}{\max(x) - \min(x)}$
Box Cox Transform | $y(\lambda) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(x) & \text{if } \lambda = 0 \end{cases}$

3.3. Target Scaling
Log Scale: Use when the target is extremely skewed; reduces interpretability (of co-efficients)

4. Feature Engineering
4.1. Extraction
4.1.1. Text
• TF-IDF scores
• Sentence Embeddings
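As a rough illustration of TF-IDF scores for text features, the sketch below computes them by hand. It assumes whitespace tokenization, tf = count / document length, and idf = ln(N / document frequency); these are one common choice among several weightings, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumed weighting: tf = count / doc_length, idf = ln(N / document_frequency).
    Smoothed or log-scaled variants are equally common in practice.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog barked", "the cat purred"]
weights = tf_idf(docs)
print(weights[1])
```

A term that appears in every document ("the") scores zero, while a term unique to one document ("dog") scores highest, which is the behaviour that makes TF-IDF useful as a feature.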


4.1.2. Pixels
• Intensity
• Hue
• Brightness

4.1.3. Temporal
• Year, Month, Day
• Part of day, Weekday
• Fourier Transforms (sin, cos) for periodicity

4.2. Transformation
4.2.1. Temporal
Add temporal changes and history

$$\begin{aligned}
\text{Lag at } t &= x_{t-7} \\
\text{Rolling Mean at } t &= (x_{t-2} + x_{t-3} + x_{t-4}) / 3 \\
\text{Difference at } t &= x_t - x_{t-1}
\end{aligned} \tag{2}$$

Differencing also brings stationarity

4.2.2. Complexity
Introduce non-linearity

$$\begin{aligned}
\text{Polynomial of } x_i &= x_i^2 + x_i^3 \\
\text{Interaction of } x_i, x_j &= x_i x_j + x_i x_j^2
\end{aligned} \tag{3}$$

4.3. Encoding
Converting categorical to numerical variables. Some models handle categorical data directly, in which case encoding is not necessary.

Table 5. Encoding
Type | How | Cardinality | Columns
One Hot | 0/1 Dummies | Low | # distinct values
Ordinal | Order + Scale | Any | -
Target | Group Mean | High | -

NOTE: Encoders should be fitted using train data only
• Identify data types
• Generate lags, averages, aggregate features
• Feature Pre-processing

4.4. Sampling
Usually performed when data is imbalanced.
Downsample: Reduce instances of majority class
Upsample: Increase instances of minority class
Upsampling involves creating instances:
• SMOTE - Interpolate points in space
• Variational Auto Encoder - Learn Distribution

4.5. Selection
4.5.1. Iterative
Sequentially add or remove features one by one to optimize performance

Table 6. Iterative Feature Selection
Step | Forward | Backward
Start | 0 features | All features
Step | Feature to add | Feature to remove

4.5.2. Model Based
Use model capabilities to identify important features

Table 7. Model based Feature Selection
Model | Identification
Lasso (L1) | 0 co-efficients - Remove
Random Forest | Feature Importances - Descending

4.5.3. Statistical Tests
Statistical tests can provide comparison of significance by looking at:
• Test Statistic (direction of influence)
• P-value (significance)

Table 8. Statistical Test
Feature | Target | Test
Continuous | Continuous | T-Test / Z-Test
Continuous | Discrete | ANOVA (F-Test)
Discrete | Continuous | ANOVA (F-Test)
Discrete | Discrete | Chi Square

Multicollinearity can also be removed
• Correlation matrix (Identify highly correlated features)
• Variance Inflation Factor (VIF > 5 or 10)

VIF is calculated by regressing each feature on the other features

$$\text{VIF}_i = \frac{1}{1 - R_i^2} \tag{4}$$

4.6. Dimensionality Reduction
Principal Components Analysis:
• Standardize data (Z Score)
• Compute covariance matrix
• Eigenvalue Decomposition

Linear Discriminant Analysis:
• Group by classes
• SSW (Sum of Squares Within classes)
• SSB (Sum of Squares Between classes) [weighted by # samples in each class]
• SSB / SSW
• Eigenvalue Decomposition

Table 9. Dimensionality Reduction
Property | PCA | LDA | t-SNE | UMAP
Labels? | No | Yes | No | Both
Linear | Yes | Yes | No | No
Preserves | Global | Classes | Local | Global
Best For | Linear Patterns | Classify | Visualize | Clustering Embeddings
Issues | Non-linear | Labels | Local | Tuning
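The three PCA steps (standardize, covariance matrix, eigenvalue decomposition) can be sketched without external libraries. Note the substitution: instead of a full eigenvalue decomposition, this sketch uses power iteration, which only recovers the first principal component. The helper names and the small two-feature dataset are illustrative assumptions, not taken from the notes.

```python
import math

def standardize(rows):
    """Z-score each column (sample standard deviation, n - 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of the columns."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly apply cov and renormalize.

    Converges to the top eigenvector unless the start vector is
    orthogonal to it (not the case for the data below).
    """
    d = len(cov)
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the matching eigenvalue.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v

data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
        [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
z = standardize(data)
cov = covariance(z)
eigenvalue, direction = first_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print(direction, round(explained, 3))
```

Because the two standardized features are positively correlated, the first component points along the diagonal and explains well over half of the total variance. A numerical library's symmetric eigen-solver would return all components at once.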


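5. Algorithms
Examples of different types:
Supervised: Most models with labelled data
Unsupervised: Clustering, Variational Auto-Encoders (VAE)
Parametric: Most models with weights to learn
Non-Parametric: K Nearest Neighbours, Decision Trees
Discriminative: Most models that predict the target directly
Generative: Naive Bayes, Latent Dirichlet Allocation, VAEs

5.1. Regression
Predicting a continuous variable.

$$y = X\beta + \varepsilon \tag{5}$$

Interpretation: A change in X by 1 unit increases / decreases y by $\beta$ units.

5.1.1. OLS
Ordinary Least Squares, closed form

$$\begin{aligned}
\hat{\beta} &= \arg\min_{\beta}\ (y - X\beta)'(y - X\beta) \\
\hat{\beta} &= (X'X)^{-1} X'y
\end{aligned} \tag{6}$$

5.1.2. Gradient Descent
Iterative Solution

Table 10. Gradient Descent Terminology
Variable | Symbol | Description
Loss/Cost Function | $J$ | Penalizes predictions
Learning Rate | $\alpha$ | Learning step size

$$\begin{aligned}
\text{MSE: } J(\beta) &= \frac{1}{2n}\sum_{i=1}^{n}(y_i - X_i\beta)^2 \\
\beta^{(t+1)} &= \beta^{(t)} - \alpha\,\nabla J(\beta^{(t)}) \\
\nabla J(\beta) &= -\frac{1}{n} X'(y - X\beta) \\
\beta^{(t+1)} &= \beta^{(t)} + \alpha \cdot \frac{1}{n} X'(y - X\beta^{(t)})
\end{aligned} \tag{7}$$

Standard Error

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n - k - 1} \tag{8}$$

Confidence Interval (Co-efficients)
In the intervals below, $\alpha$ in $t_{\alpha/2}$ is the significance level, not the learning rate.

$$\begin{aligned}
\text{Conf Int } \beta &= \hat{\beta}_j \pm t_{\alpha/2,\,n-k-1} \cdot \text{SE}(\hat{\beta}_j) \\
\text{SE}(\hat{\beta}_j) &= \sqrt{\hat{\sigma}^2 \left[(\mathbf{X}^\top \mathbf{X})^{-1}\right]_{jj}}
\end{aligned} \tag{9}$$

Confidence Interval
Range of mean predicted value

$$\begin{aligned}
\text{Conf Int} &= \hat{y}_* \pm t_{\alpha/2,\,n-k-1} \cdot \text{SE}(\hat{y}_*) \\
\text{SE}(\hat{y}_*) &= \sqrt{\hat{\sigma}^2 \cdot \left(\mathbf{x}_*^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{x}_*\right)}
\end{aligned} \tag{10}$$

Prediction Interval
Range of newly predicted value, includes observation noise.

$$\begin{aligned}
\text{Pred Int} &= \hat{y}_* \pm t_{\alpha/2,\,n-k-1} \cdot \text{SE}(\hat{y}_*) \\
\text{SE}(\hat{y}_*) &= \sqrt{\hat{\sigma}^2 \cdot \left(1 + \mathbf{x}_*^\top (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{x}_*\right)}
\end{aligned} \tag{11}$$

5.1.3. Metrics

$$\begin{aligned}
\text{MAE} &= \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \\
\text{MAPE} &= \frac{1}{n}\sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right| \cdot 100 \\
\text{MSE} &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\text{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \\
R^2_{\text{adj}} &= 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}
\end{aligned} \tag{12}$$

Notes:
• $R^2$ increases with variables; use adjusted $R^2$ ($p$ is the number of features)
• Use RMSE instead of MSE to stay in the same scale

5.1.4. Regularization
Control co-efficients, preventing them from getting too large

Table 11. Regression Regularization
Type | Term | Gradient
Lasso (L1) | $\lambda \sum |\beta_j|$ | $\lambda \cdot \text{sign}(\beta_j)$
Ridge (L2) | $\frac{\lambda}{2} \sum \beta_j^2$ | $\lambda \beta_j$
Elastic Net | $\frac{\lambda}{2} \sum \beta_j^2 + \lambda_1 \sum |\beta_j|$ | $\lambda \beta_j + \lambda_1 \cdot \text{sign}(\beta_j)$

5.1.5. Assumptions
• Data follows linear relationship
• Errors are normally distributed
• Errors are homo-skedastic (Constant variance)
• No multicollinearity of features
• No auto-correlation of errors

Fixes: Log transforms, outlier removals

5.2. Classification
Predicting a discrete variable, either binary or multi-class.

5.2.1. Logistic Regression
Fits a Linear Regression to the log odds

Terminology

$$\begin{aligned}
\text{Odds} &= \frac{p}{1 - p} \\
\text{Log-Odds} &= \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \cdots + \beta_k x_k \\
\text{Odds Ratio (OR)} &= e^{\beta_j}
\end{aligned} \tag{13}$$

Interpretation: A change in X by 1 unit increases / decreases the Log Odds by $\beta$ units.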
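The log-odds interpretation of a logistic regression co-efficient can be checked numerically. The sketch below uses made-up coefficients (`beta_0`, `beta_1` are hypothetical, not fitted to any data) and the logistic function $p = 1/(1 + e^{-z})$ to convert log-odds back into probabilities.

```python
import math

def sigmoid(z):
    """Map log-odds z to a probability p = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
beta_0, beta_1 = -1.0, 0.8

p_before = sigmoid(beta_0 + beta_1 * 2.0)  # x = 2
p_after = sigmoid(beta_0 + beta_1 * 3.0)   # x = 3, one unit higher

log_odds_change = math.log(odds(p_after)) - math.log(odds(p_before))
odds_ratio = odds(p_after) / odds(p_before)

print(round(log_odds_change, 6))  # equals beta_1
print(round(odds_ratio, 6))       # equals e^beta_1
```

The change in probability itself depends on the starting value of x, which is why co-efficients are read on the log-odds (or odds-ratio) scale rather than as direct changes in p.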


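$$\begin{aligned}
z &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \\
p &= \sigma(z) = \frac{1}{1 + e^{-z}} \quad \text{[Sigmoid]}
\end{aligned} \tag{14}$$

Table 12. Logistic Regression Co-efficients
Beta | Odds Ratio | Effect
0 | 1 | No effect (odds unchanged)
< 0 | < 1 | Decreases $p$
> 0 | > 1 | Increases $p$

$$\begin{aligned}
\text{Log Loss} &= -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] \\
\nabla_{\beta_j} \text{Log Loss} &= \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_{ij} \\
\beta_j &= \beta_j - \alpha \cdot \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_{ij}
\end{aligned} \tag{15}$$

Multi-class Classification

$$\begin{aligned}
p_{ik} &= \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} \\
\text{Cross Entropy Loss} &= -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} y_{ik}\log(p_{ik}) \\
\nabla_{z_k} \text{Loss} &= p_{ik} - y_{ik} \\
\beta_j^{(k)} &= \beta_j^{(k)} - \alpha \cdot \frac{1}{n}\sum_{i=1}^{n}(p_{ik} - y_{ik})\,x_{ij}
\end{aligned} \tag{16}$$

Table 13. Multi Class Classification
Aspect | Binary | Multiclass
Loss | Binary Log Loss | Cross Entropy
Activation | Sigmoid | Softmax

5.2.2. Naive Bayes
Key Assumptions:
• Features are independent
• Continuous features follow Gaussian distribution

$$\begin{aligned}
P(C_k \mid X_1, X_2, \ldots, X_n) &= \frac{P(C_k)\prod_{i=1}^{n} P(X_i \mid C_k)}{P(X_1, X_2, \ldots, X_n)} \\
\hat{C} &= \arg\max_{C_k}\ P(C_k)\prod_{i=1}^{n} P(X_i \mid C_k)
\end{aligned} \tag{17}$$

5.2.3. Metrics

Table 14. Confusion Matrix
Real / Pred | True | False
True | True Positive (TP) | False Negative (FN)
False | False Positive (FP) | True Negative (TN)

$$\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall / Sensitivity (TPR)} &= \frac{TP}{TP + FN} \\
\text{Specificity (TNR)} &= \frac{TN}{TN + FP} \\
\text{F1 Score} &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
F_{\beta} &= (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{(\beta^2 \cdot \text{Precision}) + \text{Recall}}
\end{aligned} \tag{18}$$

Figure 3. AUC ROC (and) AUC PR [1]

5.2.4. Assumptions
Similar assumptions to linear regression, applied to the fit of the logits
• Logits follow linear relationship
• Classes are separated by a linear decision boundary
• Categories are mutually exclusive

5.3. More Methods
Methods that can be used for both regression and classification

5.3.1. K Nearest Neighbours
• Non-parametric method
• Does not have any training
• Prediction done by aggregating nearest points
• Cross-validate to get best K value

5.3.2. Support Vector Machines
Fits a hyperplane
Support Vectors: Points on margin

$$\begin{aligned}
f(x) &= w^T x + b \\
\text{Minimize: } & \frac{1}{2}\lVert w \rVert^2
\end{aligned} \tag{19}$$

Classification
Constraints

$$y_i (w^T x_i + b) \geq 1, \quad \forall i \tag{20}$$

Hinge Loss

$$L = \frac{1}{n}\sum_{i=1}^{n} \max(0,\ 1 - y_i \hat{y}_i) \tag{21}$$

Multi-class classification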


One vs One (OvO): Fit $\binom{n}{2} = n(n-1)/2$ models (one per pair of the $n$ classes)
One vs Rest (OvR): Fit $n$ models

Regression
Constraints

$$\begin{aligned}
y_i - (w^T x_i + b) &\leq \epsilon + \xi_i^{+}, \\
(w^T x_i + b) - y_i &\leq \epsilon + \xi_i^{-}
\end{aligned} \tag{22}$$

Figure 4. Support Vectors [5]

Kernels
• Linear
• Polynomial
• Radial Basis Function (RBF): Exponential equation

NOTE: RBF Non-parametric (dimensions ∝ samples)

Table 15. Regularization with C in SVM
C | E.g. | Margin | Slack / Errors
Low | 0.1 | Wide | High Slack
Moderate | 1.0 | Balanced | Moderate Slack
High | 10 | Narrow | Low Slack
Very High | $10^6$ | Very Narrow | Very Low Slack

5.3.3. Decision Trees
Tree-based structure to perform splits.
Can be considered a non-parametric model, as it makes no assumptions about the data distribution.
• Identify best feature to split on
• Recursively continue splits
• Calculate predictions by average of node values

Figure 5. Decision Tree [3]

Split Calculations
• Maximize Information Gain
• Minimize Entropy
• Minimize Gini
• Minimize MSE (Regression)

Discrete: Evaluate each categorical value vs others
Continuous: Evaluate boundaries where predictions change

$$\begin{aligned}
\text{Gini}(t) &= 1 - \sum_{i=1}^{k} p_i^2 \\
\text{Entropy}(t) &= -\sum_{i=1}^{k} p_i \log_2(p_i) \\
\text{Information Gain}(t, A) &= \text{Entropy}(t) - \sum_{v \in \text{Values}(A)} \frac{|t_v|}{|t|}\,\text{Entropy}(t_v) \\
\text{Intrinsic Information}(A) &= -\sum_{v \in \text{Values}(A)} \frac{|t_v|}{|t|} \log_2\left(\frac{|t_v|}{|t|}\right) \\
\text{Gain Ratio}(A) &= \frac{\text{Information Gain}(A)}{\text{Intrinsic Information}(A)}
\end{aligned} \tag{23}$$

C 5.0 Algorithm
• Use Gain Ratio for optimized splits
• Winnowing (Remove features least used)
• Prune with Cost Complexity (# Leaf Nodes, Entropy)

Hyper-parameters
Can be used for Regularization (with Pruning)

Table 16. Hyper-parameters
Hyper-parameter | Use
max_depth | Maximum depth of tree
min_samples_split | Minimum samples to split
min_samples_leaf | Minimum samples at leaf

5.4. Ensemble Methods
5.4.1. Bagging
Bootstrap Aggregating models
• Run parallel
• Minimize variance

Assuming the $n$ models are independent with equal variance:

$$\text{Var}(\hat{y}) = \text{Var}\left(\frac{1}{n}\sum_{i=1}^{n}\hat{y}_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(\hat{y}_i) = \frac{1}{n^2} \cdot n \cdot \text{Var}(\hat{y}_i) \tag{24}$$

Random Forest
• Randomly sample rows (bootstrap) for each tree
• Randomly subset features to test at each node

Out of Bag: Samples left out of a tree's bootstrap sample, used for a validation score

5.4.2. Boosting
Sequentially built models
• Run iteratively
• Minimize bias
• Gradient Descent
• Sum predictions from weighted models
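The boosting idea of iteratively summing weak models can be sketched for squared-error loss, where each new learner is fitted to the residuals of the current ensemble. The choices here are assumptions made for illustration: one-dimensional inputs, single-split decision stumps as weak learners, and the names `fit_stump`, `boost`, and `learning_rate`.

```python
def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def fit_stump(xs, residuals):
    """Best single-threshold split of 1-D inputs under squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)            # start from the mean prediction
    learners, preds = [], [base] * len(ys)
    history = [mse(ys, preds)]
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))
    predict = lambda x: base + learning_rate * sum(h(x) for h in learners)
    return predict, history

xs = list(range(10))
ys = [x * x for x in xs]
predict, history = boost(xs, ys)
print(round(history[0], 2), "->", round(history[-1], 2))
```

Training error never increases from one round to the next, because each stump predicts the mean residual of its leaf and the step is scaled by a learning rate of at most 1. Production libraries add regularization, subsampling, and deeper trees on top of this loop.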


Adaptive Boosting (AdaBoost)
Weight samples higher for misclassification

$$\begin{aligned}
F(x) &= \sum_{m=1}^{M} \alpha_m h_m(x) \\
\alpha_m &= \frac{1}{2}\ln\left(\frac{1 - \epsilon_m}{\epsilon_m}\right) \\
D_m(x) &= D_{m-1}(x) \cdot \exp\left(-\alpha_m\, y\, h_m(x)\right)
\end{aligned} \tag{25}$$

where
$F(x)$ = Final Prediction
$h_m(x)$ = Prediction from the $m$th model
$\alpha_m$ = Weight for the $m$th model, based on its accuracy
$\epsilon_m$ = Weighted error rate of the $m$th model
$D_m(x)$ = Weight of sample $x$ after the $m$th model update
$y$ = True label of sample $x$

Gradient Boosting
Add learners on residuals from previous models

$$\begin{aligned}
F_M(x) &= F_{M-1}(x) + \eta\, h_M(x) \\
r_i &= -\nabla L(F_{M-1}(x_i)) \\
F(x) &= \sum_{m=1}^{M} \eta\, h_m(x)
\end{aligned} \tag{26}$$

where
$F_M(x)$ = Prediction from the $M$th iteration of the model
$F_{M-1}(x)$ = Prediction from the $(M-1)$th iteration of the model
$h_m(x)$ = Model at the $m$th iteration (fit on residuals)
$r_i$ = Residual (negative gradient of the loss function)
$L$ = Loss function, used to compute the residuals

Table 17. Gradient Boosting
Aspect | LightGBM | XGBoost
Growth | Leaf Wise | Depth Wise
Categorical | Direct | Encoding
Memory | Efficient | Not so much

LightGBM:
• Histogram based approach to optimize splits
• Gradient Based One Sided Sampling (GOSS)
• Exclusive Feature Bundling (EFB)

$$\text{Leaf Value } w_j = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \tag{27}$$

Here $g_i$ and $h_i$ are the first and second derivatives of the loss for sample $i$.

5.4.3. Stacking
Combine advantages of many models
• Train several base models
• Train meta model - cross validated predictions of base models

Hard Voting: Majority prediction
Soft Voting: Weighted average (accuracies)

5.5. Clustering
5.5.1. K-Means
Unsupervised approach to group data
• Assumes spherical clusters
• Results depend on initialization
• Follows Expectation (Assign cluster) Maximization (recalculate centroid)

$$J(K) = \sum_{k=1}^{K}\sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 \tag{28}$$

NOTE: KMeans++ can be used to spread out the initial centroids by distance

5.5.2. DB Scan
Density based clustering
• Needs # Points (minPts) and Minimum Distance ($\epsilon$)
• Does not need number of clusters
• Identifies outliers

$$\text{core point: Number of points within distance } \epsilon \geq \text{minPts} \tag{29}$$

5.5.3. GMM
Gaussian Mixture Models: Soft clustering
• Assume several Gaussian distributions
• Model latent parameters

$$P(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{30}$$

5.5.4. Metrics
Silhouette Score: Intra-cluster vs inter-cluster distance.
Range: -1 to 1

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} \tag{31}$$

Where:
• $s(i)$: Silhouette score for point $i$,
• $a(i)$: Mean intra-cluster distance (average distance of $i$ to all other points in the same cluster),
• $b(i)$: Mean nearest-cluster distance (average distance of $i$ to points in the nearest cluster).

Adjusted Rand Index: Similarities by pairwise points
Range: at most 1; about 0 for random assignment, and can be negative

$$ARI = \frac{\text{Index} - \text{Expected Index}}{\text{Max Index} - \text{Expected Index}} \tag{32}$$

Where:
• Index: Number of point pairs assigned to the same or different clusters in both ground truth and predicted clusters
• Expected Index: The expected value of the Index if clusters were randomly assigned
• Max Index: The maximum possible value of the Index

6. More Techniques
6.1. Anomaly Detection
• Isolation Forest (random trees; anomalies isolate at a low average depth)
• DB Scan

6.2. Reinforcement Learning
Environment-based approach: an agent acts and learns from the feedback it receives
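To make learning from environment feedback concrete, here is a small sketch of a tabular agent on a toy corridor. It uses the Q-Learning update and 𝜖-greedy action selection that Sections 6.2.2 and 6.2.3 define; the environment (five states in a row, reward 1 at the right end), the hyper-parameter values, and all names are hypothetical choices for illustration.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4 in a row; reaching state 4 ends the episode
ACTIONS = (-1, +1)         # step left or step right

def step(state, action):
    """Toy environment: move, clip to the corridor, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)             # explore: random action
            else:
                best = max(q[state])             # exploit: greedy, ties broken randomly
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
greedy_policy = [ACTIONS[max((0, 1), key=lambda i: row[i])] for row in q_table[:GOAL]]
print(greedy_policy)
```

The agent is never told that moving right is correct; it only sees rewards, and the learned values end up discounted by `gamma` for each step of distance from the goal.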


6.2.1. Policy Learning
Bellman Equation: Model-based recursive equation for state value updates.

$$\begin{aligned}
V(s) &\leftarrow \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V(s')\right] \\
V^{\pi}(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{\pi}(s')\right]
\end{aligned} \tag{33}$$

NOTE: Requires known probabilities and rewards

6.2.2. Q-Learning
Model-free algorithm to decide best actions from trial and error.

$$\begin{aligned}
Q(s, a) &\leftarrow Q(s, a) + \alpha\left[R(s, a, s') + \gamma \max_{a'} Q(s', a') - Q(s, a)\right] \\
Q^{\pi}(s, a) &= \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a')\right] \\
\pi(s) &= \arg\max_{a} Q(s, a)
\end{aligned} \tag{34}$$

6.2.3. Exploration Exploitation
Balance exploration (to search new paths) and exploitation (capitalize on high rewards)

𝜖-Greedy

$$a = \begin{cases} \text{random action} & \text{with probability } \epsilon, \\ \arg\max_{a} Q(s, a) & \text{with probability } 1 - \epsilon. \end{cases} \tag{35}$$

7. Nuances
7.1. Imbalanced Data
• Data Sampling
• Weighted Loss Functions
• Tree Based Methods (Robust)
• Precision-Recall instead of ROC (False Positives)

7.2. Biases
• Identify difference in distributions for features
• Up-sample data across biased attributes
• Normalize with respect to groups
• Mask / Group together data
• Embed to lower dimension with VAE

7.3. More Data Unhelpful
• Data is white noise
• Model is restrictive (cannot learn more)
• Second round of data collection might bring biases
• More data might increase the imbalance

References
[1] Area Under Curves. [Online]. Available: [Link] [Link]/how-and-why-i-switched-from-the-roc-curve-to-the-precision-recall-curve-to-analyze-my-imbalanced-6171da91c6b8.
[2] Bias Variance Tradeoff. [Online]. Available: https://en.[Link]/wiki/Bias%E2%80%93variance_tradeoff.
[3] Decision Trees. [Online]. Available: [Link] com/machine-learning-decision-tree-classification-algorithm.
[4] Pair Plots. [Online]. Available: [Link] python-seaborn-pairplot-method/.
[5] Support Vectors. [Online]. Available: [Link] net/figure/Overview-of-SVM-algorithm-a-SVM-for-classification-b-SVM-for-regression_fig1_347831458.
