200+ algorithms across 6 domains of artificial intelligence — all implemented from scratch using pure NumPy.
No TensorFlow. No PyTorch. No HuggingFace. No sklearn (except in benchmarks). Just math, logic, and working code.
| # | Repo | Algorithms | Focus |
|---|---|---|---|
| 1 | ML-From-Scratch | 48 | Regression, Classification, Ensembles, Clustering, Dimensionality Reduction, Neural Nets, Optimization, Recommenders, RL, Probabilistic |
| 2 | DL-From-Scratch | 28 | CNNs, RNNs, LSTMs, GRUs, Attention, Transformers, GANs, VAEs, Normalization, Residual Networks |
| 3 | NLP-From-Scratch | 33 | Tokenization, Word Embeddings (Word2Vec, GloVe, FastText, ELMo), Sequence Labeling (HMM, CRF), Text Classification, Topic Modeling (LDA, NMF, LSA), Text Generation, Evaluation Metrics |
| 4 | LLM-From-Scratch | 28 | GPT Architectures, Attention Variants, Positional Encodings, KV Cache, MoE, RLHF, PPO, DPO, Quantization, Fine-Tuning, RAG |
| 5 | GenAI-From-Scratch | 30 | VAEs, GANs, Diffusion Models, Flow Matching, Autoregressive Models, Normalizing Flows, Energy-Based Models, Score Matching |
| 6 | TimeSeries-From-Scratch | 33 | ARIMA, SARIMA, ETS, Holt-Winters, State Space Models, Kalman Filter, Structural Time Series, Decomposition (STL), Feature Extraction, Backtesting |
Total: 200+ algorithms and growing.
Most ML education stops at `model.fit()`. This series exists to answer the question:
"What actually happens when I call `.fit()`?"
Each repo in this series is built on three principles:
Every algorithm is written in raw NumPy. You can step through the code, examine every matrix multiplication, and understand exactly how predictions are made.
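As a minimal sketch of what a from-scratch `.fit()` can look like, the class below solves ordinary least squares in raw NumPy. The class name `LinearRegressionScratch` and its attribute names are hypothetical, chosen for illustration; they are not taken from the repos.

```python
import numpy as np


class LinearRegressionScratch:
    """Ordinary least squares via the normal equations (illustrative only)."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "LinearRegressionScratch":
        # Prepend a column of ones so the intercept is learned as a weight.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        # Solve the normal equations (Xb^T Xb) w = Xb^T y for w.
        w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
        self.intercept_, self.coef_ = w[0], w[1:]
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # A prediction is one matrix-vector product plus the intercept.
        return X @ self.coef_ + self.intercept_


# Synthetic, noiseless data with known weights (assumed for the demo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + X @ np.array([2.0, -1.0])

model = LinearRegressionScratch().fit(X, y)
predictions = model.predict(X)
```

Because the target here is noiseless, the fitted weights should match the generating ones up to floating-point error, which makes each step easy to inspect in a debugger.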
Despite being educational, every repo includes:
- Type hints and NumPy-style docstrings
- `pyproject.toml` for pip-installable packages
- Unit tests with pytest
- Ruff for linting
- Pre-commit hooks
- CI/CD with GitHub Actions
- Runnable examples for every algorithm
- Jupyter notebooks for visualization
Every algorithm folder includes a 12-section README with:
- Intuition and mathematical formulation
- Pseudocode
- Time and space complexity analysis
- Interview Q&A — real questions from real interviews
ML-From-Scratch
├── Linear Regression
├── Logistic Regression
├── Decision Trees
├── Random Forest
├── SVM
├── K-Means
├── PCA
└── Neural Networks (MLP)
DL-From-Scratch
├── CNN
├── RNN / LSTM / GRU
├── Attention & Transformers
├── Autoencoders & VAEs
└── GANs
NLP-From-Scratch
├── Tokenization & Preprocessing
├── Word Embeddings (Word2Vec, GloVe, FastText)
├── Sequence Labeling (HMM, CRF)
├── Text Classification
└── Topic Modeling (LDA, NMF)
LLM-From-Scratch
├── GPT Architecture
├── Attention Variants (Grouped, Flash, Multi-Latent)
├── Training (Pre-training, SFT, RLHF)
├── Inference (KV Cache, Speculative Decoding)
└── Applications (RAG, Agents, Quantization)
GenAI-From-Scratch
├── VAEs & GANs
├── Diffusion Models
├── Flow Matching
├── Normalizing Flows
└── Score-Based Models
TimeSeries-From-Scratch
├── Classical (ARIMA, SARIMA, ETS)
├── State Space Models (Kalman, DLM)
├── Decomposition (STL, Seasonal)
├── Feature Extraction
└── Evaluation & Backtesting
Regression: Linear Regression (Normal Equation, GD), Polynomial, Ridge, Lasso, Elastic Net
Classification: Logistic Regression, KNN, Naive Bayes (Gaussian, Multinomial), Perceptron, SVM (Linear, Kernel), Decision Tree (Classifier, Regressor)
Ensembles: Bagging, Random Forest, AdaBoost, Gradient Boosting, Stacking
Clustering: K-Means, K-Means++, Hierarchical, DBSCAN, Mean Shift, GMM
Dimensionality Reduction: PCA, SVD, LDA, t-SNE
Neural Networks: Perceptron, MLP, CNN, RNN, LSTM, Autoencoder
Optimization: Batch GD, SGD, Mini-Batch GD, Momentum, RMSProp, Adam
Recommender Systems: Collaborative Filtering, Matrix Factorization
Reinforcement Learning: Q-Learning, SARSA
Probabilistic: HMM, Apriori
Foundations: Dense Layer, Activation Functions (ReLU, Sigmoid, Tanh, Softmax, GELU, Swish), Weight Initialization (Xavier, He), Loss Functions, Batch Normalization, Layer Normalization, Dropout
Convolutional: Conv2D, MaxPooling, Flatten
Recurrent: RNN Cell, LSTM Cell, GRU Cell, Bidirectional RNN
Attention: Scaled Dot-Product Attention, Multi-Head Attention, Self-Attention, Cross-Attention
Transformer: Transformer Encoder, Transformer Decoder, Positional Encoding
Advanced: Residual Block, Skip Connection, GAN, VAE
Preprocessing: Tokenizer, Subword Tokenizer (BPE), Text Normalizer, Edit Distance, Spell Checker
Feature Extraction: Bag of Words, TF-IDF, N-Gram Language Model, PMI, Text Vectorizer
Word Embeddings: Word2Vec CBOW, Word2Vec Skip-Gram, GloVe, FastText, ELMo (LSTM)
Sequence Labeling: HMM, Viterbi Decoder, CRF, Maximum Entropy Classifier
Classification: Naive Bayes (Text), Logistic Regression (Text), SVM (Text), Perceptron (Text)
Topic Modeling: LDA, NMF, LSA
Generation: Beam Search, Temperature Sampling, Top-K & Top-P Sampling
Metrics: BLEU, ROUGE, Perplexity, Word Error Rate
Architecture: GPT, Causal Attention, Grouped Query Attention, Multi-Query Attention, Multi-Latent Attention, Flash Attention, ALiBi, RoPE, Relative Positional Encoding
Scaling: Mixture of Experts, Sparse MoE, Model Parallelism, Tensor Parallelism, Pipeline Parallelism
Training: Pre-training, Causal LM Loss, Curriculum Learning, Warmup-Cosine Schedule
Fine-Tuning: SFT, LoRA, QLoRA, Adapter
RLHF: PPO, DPO, Reward Modeling, Rejection Sampling
Inference: KV Cache, Speculative Decoding, GQA, Quantization (GPTQ, AWQ)
Applications: RAG, Tool Use, Agent Loop
VAEs: VAE, Beta-VAE, Conditional VAE, VQ-VAE, VQ-VAE-2
GANs: GAN, DCGAN, Conditional GAN, InfoGAN, Wasserstein GAN, WGAN-GP, LSGAN, CycleGAN, StyleGAN, Progressive GAN, SAGAN, BigGAN
Diffusion: DDPM, DDIM, Classifier-Free Guidance, Latent Diffusion, Stable Diffusion
Advanced: Flow Matching, Autoregressive Models (PixelCNN, PixelRNN), Normalizing Flows (RealNVP, Glow), EBM, Score Matching (SMLD, NCSN)
Foundations: White Noise, Random Walk, Autocorrelation Function, Partial ACF, Stationarity Tests, Differencing, Lag Features, Rolling Statistics
Baselines: Naive Forecast, Seasonal Naive, Drift Method, Mean Forecast
Classical: AR, MA, ARMA, ARIMA, SARIMA, SARIMAX
Exponential Smoothing: Simple Exponential, Holt's Linear, Holt-Winters, Damped Trend, ETS
Advanced Statistical: GARCH, ARCH, VAR, Dynamic Regression
State Space: Kalman Filter, Kalman Smoother, DLM, Structural Time Series
Decomposition: Classical Decompose, STL, Seasonal Decompose, Moving Average Smoothing
Features: Time Series Features (mean, variance, trend, seasonality, entropy), Feature Engineering
Evaluation: Time Series Cross-Validation, Walk-Forward Validation, Metrics (MSE, MAE, MAPE, SMAPE, MASE)
# Clone any repo
git clone https://github.com/rohanmistry231/ML-From-Scratch.git
cd ML-From-Scratch
# Install dependencies (numpy + matplotlib + sklearn for benchmarks)
pip install -r requirements.txt
# Run any algorithm
cd algorithms/01_supervised_regression/linear_regression_normal_equation/
python example.py
Each repo is fully standalone. Clone one or clone all; they share no dependencies.
| Principle | Why |
|---|---|
| No ML libraries in algorithm code | The whole point is learning what's inside the box |
| NumPy only | Vectorized math is how industry implements these algorithms |
| scikit-learn only in `example.py` | For validation and benchmarking against known implementations |
| Every algorithm has a README | Code without understanding teaches nothing |
| Every README has Interview Q&A | Make your study immediately useful for job prep |
| Type hints & docstrings | Professional-grade, readable code |
| Tests & CI | The code actually works, not just looks good |
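The validation principle in the table can be sketched as follows. To stay runnable with NumPy alone, this sketch uses `np.linalg.lstsq` as a stand-in for the scikit-learn reference; the helper `fit_gd`, the learning rate, the iteration count, and the synthetic data are all assumptions for illustration, not code from the repos.

```python
import numpy as np


def fit_gd(X: np.ndarray, y: np.ndarray, lr: float = 0.1, n_iters: int = 2000) -> np.ndarray:
    """Fit least-squares weights with batch gradient descent (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Gradient of the mean squared error with respect to w.
        grad = (2.0 / len(y)) * (X.T @ (X @ w - y))
        w -= lr * grad
    return w


# Synthetic regression problem with an explicit bias column.
rng = np.random.default_rng(42)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 3))])
y = X @ np.array([1.0, 0.5, -2.0, 3.0]) + 0.1 * rng.normal(size=100)

w_scratch = fit_gd(X, y)                        # from-scratch result
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)   # trusted reference solution

# Largest per-weight disagreement between the two solutions.
max_diff = np.max(np.abs(w_scratch - w_ref))
```

A check like `max_diff < 1e-6` can then serve as the pass/fail criterion in an example script or a pytest case.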
All repos are MIT licensed. Use them, learn from them, build on them.
If this series helps you learn or land a job — drop a star on any (or all) of the repos. It helps others find them.
"You don't truly understand an algorithm until you can implement it from scratch."