LEARNING ROADMAP
Data Science Roadmap
From Python Basics to Production-Ready ML Models
Intensive: 40-50 hrs/week Balanced: 15-20 hrs/week Flexible: 8-10 hrs/week
■ 7 Phases ★ Beginner to Advanced
Overview
This roadmap will guide you through the complete data science journey. You'll build a strong
foundation in Python and statistics, understand the mathematics that powers machine learning
(and why it matters), master data analysis and visualization, and learn to build, evaluate, and
deploy ML models. By the end, you'll be equipped to solve real-world problems with data.
Who is this for?
Aspiring data scientists, analysts looking to upskill, software engineers transitioning to ML, students in
STEM fields, or anyone curious about making sense of data and building intelligent systems.
Prerequisites
Basic computer literacy
High school mathematics (comfortable with algebra)
Curiosity about patterns in data
Willingness to think analytically
A computer with Python installed
What you'll learn
Write Python code for data manipulation and analysis
Apply statistical methods to draw insights from data
Understand the math behind ML algorithms (not just use them as black boxes)
Build and evaluate machine learning models
Create compelling data visualizations
Deploy models to production
Communicate findings to stakeholders
Choose Your Pace
Learning Plans
Choose a plan that fits your schedule. All plans cover the same content.
Intensive (Fast Track)
40-50 hrs/week, 240-300 total hrs
Full-time immersive learning. Ideal for career changers who can dedicate themselves completely to this transition.
Ideal for:
Career changers with dedicated time
Recent graduates
Those preparing for DS interviews
Bootcamp-style learners

Balanced (Recommended)
15-20 hrs/week, 270-360 total hrs
Balanced pace allowing deep understanding while maintaining other commitments. The recommended path for most learners.
Ideal for:
Working professionals
Students with coursework
Those who prefer thorough understanding
Self-paced learners

Flexible (Self-paced)
8-10 hrs/week, 288-360 total hrs
Flexible learning for those with limited availability. More time to absorb complex mathematical concepts.
Ideal for:
Full-time employees
Parents and caregivers
Those learning multiple skills
Weekend learners
Schedule Breakdown
Phase Schedule by Plan
Phase Topic Intensive Balanced Flexible
1 Python Programming Foundation Week 1 Weeks 1-3 Weeks 1-5
2 Statistics & Probability Week 2 Weeks 4-6 Weeks 6-11
3 Mathematics for Machine Learning Weeks 3-4 Weeks 7-10 Weeks 12-19
4 Data Analysis & Visualization Week 5 Weeks 11-13 Weeks 20-25
5 Machine Learning Weeks 6-7 Weeks 14-18 Weeks 26-35
6 Deep Learning & Specialization Week 8 Weeks 19-22 Weeks 36-43
7 Production & Career Week 9 Weeks 23-26 Weeks 44-52
Phase 1 of 7
1 Python Programming Foundation
Intensive: Week 1 Balanced: Weeks 1-3 Flexible: Weeks 1-5
Python is the lingua franca of data science. Before diving into statistics and ML, you need to be comfortable
writing Python code, working with data structures, and using the core libraries that power the data science
ecosystem.
Learning Objectives
Write clean, readable Python code
Work with lists, dictionaries, and other data structures
Use functions and classes effectively
Navigate Jupyter notebooks fluently
Install and manage packages with pip/conda
Understand Python's role in the data ecosystem
Topics to Cover
Python Fundamentals
Core language features
Variables & Data Types
Control Flow (if/else, loops)
Functions & Lambda
List Comprehensions
Error Handling

Data Structures
Organizing and storing data
Lists & Tuples
Dictionaries & Sets
Strings & String Methods
Working with Files
JSON Handling

Development Environment
Tools of the trade
Jupyter Notebooks
VS Code Setup
Virtual Environments
pip & conda
Git Basics

Python for Data
Introduction to data libraries
NumPy Basics
Pandas Introduction
Reading CSV/Excel Files
Basic Data Inspection
Simple Plots with Matplotlib
Hands-on Projects
Data File Processor
Build a CLI tool that reads CSV files, performs basic cleaning (removing duplicates, handling missing values), and outputs summary statistics.

Personal Expense Analyzer
Create a notebook that analyzes your bank statement or expense data, categorizes spending, and visualizes trends over time.
File I/O Data Structures Functions Pandas Data Cleaning Matplotlib Jupyter
Error Handling
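As a starting point for the Data File Processor, the cleaning and summary steps might be sketched as follows. This is a minimal sketch using only the standard library; the column names, the sample data, and the policy of dropping incomplete rows are assumptions made for illustration, not requirements of the project.

```python
import csv
import io
import statistics


def clean_rows(rows):
    """Drop exact duplicate rows and rows with any missing (empty) value."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(value is None or value.strip() == "" for value in row.values()):
            continue  # simplest policy (an assumption): skip incomplete rows
        cleaned.append(row)
    return cleaned


def summarize(rows, column):
    """Return count, mean, and median for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical sample data standing in for a real CSV file on disk.
SAMPLE_CSV = """date,category,amount
2024-01-01,food,12.50
2024-01-01,food,12.50
2024-01-02,rent,
2024-01-03,transport,7.50
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
cleaned = clean_rows(list(reader))
summary = summarize(cleaned, "amount")
print(summary)
```

A real CLI version would read the file path from the command line (for example with argparse) and wrap the float conversion in error handling; filling missing values instead of dropping rows is an equally valid policy.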
Resources
DOCUMENTATION Python Official Tutorial
BOOK Automate the Boring Stuff with Python
TUTORIAL Real Python Tutorials
Milestone
⚑ Complete both projects. You should be able to write Python scripts that read, process, and analyze
data files without constantly Googling basic syntax.
Phase 2 of 7
2 Statistics & Probability
Intensive: Week 2 Balanced: Weeks 4-6 Flexible: Weeks 6-11
Statistics is the backbone of data science. It's how we make sense of data, quantify uncertainty, and make
decisions with confidence. This isn't just theory — every concept here directly applies to real data science work.
Skip this phase at your peril; without statistics, you're just guessing.
Learning Objectives
Summarize datasets using descriptive statistics
Understand probability and its role in prediction
Work with different probability distributions
Conduct and interpret hypothesis tests
Quantify relationships with correlation and covariance
Apply Bayes' theorem to update beliefs with data
Topics to Cover
Descriptive Statistics
WHY: Before building any model, you must understand your data. Descriptive stats help you spot outliers, understand distributions, and identify data quality issues.
Mean, Median, Mode
Variance & Standard Deviation
Percentiles & Quartiles
Skewness & Kurtosis
Data Visualization for Stats

Probability Fundamentals
WHY: Data science is about making predictions under uncertainty. Probability gives us the language and tools to quantify 'how likely' something is.
Basic Probability Rules
Conditional Probability
Independence
Bayes' Theorem
Expected Value

Probability Distributions
WHY: Real-world phenomena follow patterns (distributions). Heights are approximately normal, customer arrivals are often Poisson, and click counts are binomial. Knowing distributions lets you model reality.
Normal Distribution
Binomial & Bernoulli
Poisson Distribution
Uniform Distribution
Central Limit Theorem

Inferential Statistics
WHY: You rarely have all the data. Inference lets you draw conclusions about populations from samples, which is essential for A/B testing, surveys, and any real-world analysis.
Sampling & Sampling Distributions
Confidence Intervals
Hypothesis Testing
p-values & Significance
t-tests & Chi-square Tests
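To make the Bayes' Theorem topic concrete, here is a small worked example of updating a belief with evidence. The scenario and all three input probabilities are hypothetical, chosen only to show how a rare event stays fairly unlikely even after a positive signal.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H and evidence E."""
    # Total probability of observing the evidence, P(E)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of users churn; an alert fires for 90% of
# churners and for 5% of non-churners.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(churn | alert) = {p:.3f}")
```

Under these assumed rates, most alerts come from the much larger group of non-churners, which is why the posterior stays well below the 90% detection rate.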
Hands-on Projects
A/B Test Analyzer
Given data from a website A/B test (conversion rates for control vs treatment), determine if the difference is statistically significant and calculate the confidence interval.

Sales Data Statistical Report
Analyze a retail dataset: compute descriptive statistics by category, identify correlations between variables, test hypotheses about seasonal effects.
Descriptive Statistics Correlation
Hypothesis Testing Confidence Intervals
Distribution Fitting Statistical Visualization
Statistical Significance Python [Link]
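One common way to approach the A/B Test Analyzer is a two-proportion z-test. The sketch below is one possible approach, not the only valid one: it assumes large samples (normal approximation), a two-sided test, and a 95% confidence level, and the conversion counts are made up for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ab_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test plus a confidence interval for the difference."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a

    # Pooled standard error: used for the test, which assumes no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - normal_cdf(abs(z)))  # two-sided

    # Unpooled standard error: used for the interval around the difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return {"diff": diff, "z": z, "p_value": p_value, "ci": ci}


# Hypothetical counts: control converts 200/2000, treatment 250/2000.
result = ab_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(result)
```

The default z_crit of 1.96 corresponds to a 95% interval; if the interval excludes zero and the p-value is below the chosen significance level, the difference is conventionally called statistically significant.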
Resources
BOOK Think Stats (Free Book)
COURSE Khan Academy Statistics
INTERACTIVE Seeing Theory (Visual Stats)
Milestone
⚑ Complete the A/B Test Analyzer project. You should be able to look at any dataset and quickly compute relevant statistics, identify distributions, and draw statistically grounded conclusions.
Phase 3 of 7
3 Mathematics for Machine Learning
Intensive: Weeks 3-4 Balanced: Weeks 7-10 Flexible: Weeks 12-19
This is where many learners struggle or skip entirely — and it's why they later hit a ceiling. Understanding the
math doesn't mean deriving proofs; it means knowing WHY algorithms work, WHEN they'll fail, and HOW to
debug them. This knowledge separates practitioners who can only call sklearn functions from those who can
actually solve novel problems.
Learning Objectives
Understand vectors, matrices, and their operations
Grasp how linear transformations relate to ML
Know why gradient descent works (calculus)
Connect probability theory to model uncertainty
Recognize the math when reading ML papers or documentation
Topics to Cover
Linear Algebra
WHY: Machine learning IS linear algebra. Your data is a matrix. Features are vectors. Neural networks are matrix multiplications. PCA uses eigenvalues. Recommendation systems use matrix factorization. Without this, ML is a black box.
Vectors & Vector Operations
Matrices & Matrix Multiplication
Linear Transformations
Eigenvalues & Eigenvectors
Matrix Decomposition (SVD, PCA)

Calculus for Optimization
WHY: How does a model 'learn'? It minimizes a loss function using gradient descent, which is calculus. Understanding derivatives tells you why learning rates matter, why models get stuck, and how to fix them.
Derivatives & Gradients
Partial Derivatives
Chain Rule (Backpropagation!)
Gradient Descent Intuition
Convexity & Local Minima
Probability for ML
WHY: ML models don't just predict — they estimate
probabilities. Understanding MLE helps you grasp
logistic regression. Bayesian thinking enables
uncertainty quantification. This is how you know when
your model is confident vs guessing.
Maximum Likelihood Estimation
Bayesian Inference Basics
Prior, Likelihood, Posterior
Probabilistic Graphical Models (intro)
Information Theory Basics
Hands-on Projects
Implement Linear Regression from Scratch
Build linear regression using only NumPy: implement the normal equation AND gradient descent. Visualize how gradient descent converges. Compare with sklearn's implementation.

PCA Visualization Tool
Implement PCA from scratch using eigendecomposition. Apply it to a high-dimensional dataset (like faces or MNIST) and visualize the principal components.
Eigenvalues/Eigenvectors
Matrix Operations Gradient Descent
Dimensionality Reduction Matrix Decomposition
Loss Functions NumPy
Visualization
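For the linear regression project, the two fitting methods might be sketched like this. It is a minimal sketch on synthetic data: the true coefficients, noise level, learning rate, and step count are all assumptions chosen so the example converges, and plotting and the sklearn comparison are left out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): y = 3x + 2 plus small Gaussian noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)

# Prepend a column of ones so the intercept is learned as a weight.
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: solve (X^T X) w = X^T y for w.
w_normal = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)


def gradient_descent(Xb, y, lr=0.1, steps=2000):
    """Minimize mean squared error by stepping against its gradient."""
    w = np.zeros(Xb.shape[1])
    n = len(y)
    losses = []
    for _ in range(steps):
        residual = Xb @ w - y
        losses.append(float(residual @ residual) / n)  # MSE before the update
        grad = (2.0 / n) * (Xb.T @ residual)           # gradient of the MSE
        w -= lr * grad                                 # move against the gradient
    return w, losses


w_gd, losses = gradient_descent(Xb, y)
print("normal equation: ", w_normal)
print("gradient descent:", w_gd)
```

Plotting `losses` against the step index shows the convergence curve the project asks for; with these settings both methods should agree closely on the intercept and slope.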
Resources
VIDEO 3Blue1Brown: Essence of Linear Algebra
VIDEO 3Blue1Brown: Essence of Calculus
BOOK Mathematics for Machine Learning (Book)
COURSE Khan Academy Linear Algebra
Milestone
⚑ Implement linear regression from scratch using gradient descent. You should be able to explain WHY stepping against the gradient reduces the loss, WHAT eigenvalues represent, and HOW probability connects to model training.
Phase 4 of 7
4 Data Analysis & Visualization
Intensive: Week 5 Balanced: Weeks 11-13 Flexible: Weeks 20-25
Now we combine your Python skills, statistical knowledge, and mathematical foundation into practical data
analysis. This is the bread and butter of data science — exploring datasets, cleaning messy data, engineering
features, and creating visualizations that tell stories.
Learning Objectives
Master Pandas for data manipulation
Handle missing data, outliers, and data quality issues
Perform exploratory data analysis (EDA) systematically
Create publication-quality visualizations
Engineer features that improve model performance
Work with different data formats (CSV, JSON, SQL, APIs)
Topics to Cover
Pandas Mastery
The workhorse of data analysis
DataFrames & Series
Indexing & Selection
Groupby & Aggregation
Merging & Joining
Pivot Tables
Time Series in Pandas

Data Cleaning
80% of data science is cleaning data
Handling Missing Values
Outlier Detection & Treatment
Data Type Conversions
String Cleaning & Regex
Deduplication
Data Validation

Exploratory Data Analysis
Understanding data before modeling
Univariate Analysis
Bivariate Analysis
Correlation Analysis
Distribution Analysis
Automated EDA Tools
Documenting Findings

Data Visualization
Communicating insights visually
Matplotlib Deep Dive
Seaborn for Statistical Plots
Plotly for Interactive Viz
Choosing the Right Chart
Dashboard Basics
Visualization Best Practices
Hands-on Projects
End-to-End EDA: Titanic or Housing Dataset
Perform comprehensive EDA on a classic dataset. Document every finding, handle all data quality issues, create visualizations for each insight, and prepare the data for modeling.

Interactive Dashboard
Build an interactive dashboard using Plotly Dash or Streamlit that allows users to explore a dataset, filter by dimensions, and view dynamic visualizations.
Plotly/Streamlit Interactive Viz User Interface
Pandas Data Cleaning Statistical Analysis
Data Filtering
Visualization Documentation
Resources
DOCUMENTATION Pandas Documentation
BOOK Python Data Science Handbook
TUTORIAL Seaborn Tutorial
Milestone
⚑ Complete a full EDA project with a written report. You should be able to take any messy dataset and,
within hours, have it cleaned, analyzed, and visualized with key insights documented.
Phase 5 of 7
5 Machine Learning
Intensive: Weeks 6-7 Balanced: Weeks 14-18 Flexible: Weeks 26-35
This is what you've been building toward. With your foundation in Python, statistics, and math, you'll now
understand ML algorithms deeply — not just how to call them, but why they work, when to use them, and how
to evaluate them properly.
Learning Objectives
Understand the ML workflow end-to-end
Master supervised learning (regression & classification)
Apply unsupervised learning (clustering & dimensionality reduction)
Evaluate models properly (avoiding common pitfalls)
Tune hyperparameters effectively
Handle imbalanced data and other real-world challenges
Topics to Cover
ML Fundamentals
Core concepts that apply everywhere
Supervised vs Unsupervised
Train/Validation/Test Split
Bias-Variance Tradeoff
Overfitting & Underfitting
Cross-Validation
Feature Scaling

Regression Algorithms
Predicting continuous values
Linear Regression (revisited)
Polynomial Regression
Regularization (Ridge, Lasso)
Decision Tree Regression
Random Forest Regression
Gradient Boosting (XGBoost)

Classification Algorithms
Predicting categories
Logistic Regression
Decision Trees
Random Forests
Support Vector Machines
K-Nearest Neighbors
Naive Bayes
Gradient Boosting Classifiers

Unsupervised Learning
Finding patterns without labels
K-Means Clustering
Hierarchical Clustering
DBSCAN
PCA (applied)
t-SNE & UMAP
Anomaly Detection

Model Evaluation & Selection
Knowing if your model is actually good
Regression Metrics (MSE, MAE, R²)
Classification Metrics (Accuracy, Precision, Recall, F1)
ROC-AUC & PR Curves
Confusion Matrix Analysis
Hyperparameter Tuning
Model Selection Strategies
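The classification metrics listed above all derive from the four confusion matrix counts. The sketch below computes them by hand on a small, made-up set of labels to show why accuracy alone can mislead on imbalanced data; in practice scikit-learn provides equivalents, and the zero-division fallback to 0.0 is a choice made here for simplicity.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced labels: only 2 positives out of 10 examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy looks healthy because the negatives dominate, while precision and recall reveal that the model finds only half of the positives and is right only half of the time it predicts one.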
Hands-on Projects
House Price Prediction
Build a complete ML pipeline for predicting house prices. Compare multiple algorithms, perform feature engineering, tune hyperparameters, and create a final report with model interpretation.

Customer Churn Prediction
Predict which customers will churn using classification. Handle class imbalance, interpret feature importance, and provide business recommendations based on your model.
Regression Feature Engineering Classification Imbalanced Data
Model Comparison Hyperparameter Tuning Model Interpretation Business Insights
Customer Segmentation
Use clustering to segment customers based on behavior.
Determine optimal number of clusters, profile each
segment, and visualize the results.
Clustering Unsupervised Learning
Segment Profiling Visualization
Resources
DOCUMENTATION Scikit-learn Documentation
BOOK Hands-On Machine Learning (Aurélien Géron)
VIDEO StatQuest ML Playlist
Milestone
⚑ Complete all three projects with documented notebooks. You should be able to take a business problem, frame it as an ML task, build and evaluate models, and communicate results to non-technical stakeholders.
Phase 6 of 7
6 Deep Learning & Specialization
Intensive: Week 8 Balanced: Weeks 19-22 Flexible: Weeks 36-43
Deep learning has revolutionized AI. While not every problem needs neural networks, understanding them
opens doors to computer vision, NLP, and cutting-edge applications. Choose a specialization based on your
interests.
Learning Objectives
Understand neural network fundamentals
Build models with TensorFlow or PyTorch
Apply transfer learning for quick results
Choose and explore a specialization (NLP, CV, or Time Series)
Know when deep learning is appropriate vs overkill
Topics to Cover
Neural Network Fundamentals
How deep learning actually works
Perceptrons & Activation Functions
Forward & Backpropagation
Loss Functions & Optimizers
Regularization (Dropout, BatchNorm)
TensorFlow/Keras Basics
PyTorch Basics

Computer Vision Track
If you're interested in image/video
Convolutional Neural Networks
Image Classification
Object Detection
Transfer Learning (ResNet, VGG)
Data Augmentation
OpenCV Basics

NLP Track
If you're interested in text/language
Text Preprocessing
Word Embeddings
RNNs & LSTMs
Transformers & Attention
BERT & GPT (using, not training)
Hugging Face Ecosystem

Time Series Track
If you're interested in forecasting
Time Series Decomposition
ARIMA Models
Prophet
LSTMs for Time Series
Feature Engineering for Time
Evaluation Metrics for Forecasting
Hands-on Projects
Image Classifier with Transfer Learning
Build a custom image classifier using a pretrained model (ResNet/VGG). Fine-tune on your own dataset and deploy as a simple web app.

Sentiment Analysis Pipeline
Build an NLP pipeline that classifies sentiment in product reviews. Compare traditional ML with transformer-based approaches.
CNNs Transfer Learning Data Augmentation Text Processing Transformers
Model Deployment Model Comparison NLP
Resources
COURSE [Link] Practical Deep Learning
COURSE Deep Learning Specialization (Coursera)
COURSE Hugging Face Course
Milestone
⚑ Complete one specialization track project. You should understand when deep learning adds value
and be able to implement solutions using modern frameworks and pretrained models.
Phase 7 of 7
7 Production & Career
Intensive: Week 9 Balanced: Weeks 23-26 Flexible: Weeks 44-52
The final step: taking your models from notebooks to production and yourself from learner to professional. This
phase covers deployment, MLOps basics, and building the portfolio that gets you hired.
Learning Objectives
Deploy ML models as APIs
Understand MLOps fundamentals
Build a compelling portfolio
Prepare for data science interviews
Contribute to the DS community
Topics to Cover
Model Deployment
Getting models into production
Flask/FastAPI for ML APIs
Docker Basics
Cloud Deployment (AWS/GCP/Azure)
Streamlit for ML Apps
Model Serialization
API Best Practices

MLOps Fundamentals
Maintaining ML systems
ML Pipelines
Experiment Tracking (MLflow)
Model Versioning
Monitoring & Drift Detection
CI/CD for ML
Data Versioning (DVC)

Portfolio & Career
Getting hired
GitHub Portfolio Best Practices
Project Documentation
Technical Blogging
LinkedIn Optimization
Interview Preparation
Networking in DS Community
Hands-on Projects
End-to-End ML Application
Take one of your previous projects and deploy it as a full web application. Include data pipeline, model training, API, and frontend. Document everything.

Portfolio Website
Create a portfolio showcasing your best 3-5 projects. Each project should have a clear problem statement, methodology, results, and lessons learned.
Full Stack ML Deployment Documentation Communication Documentation
DevOps Basics Personal Branding Web Development
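Model serialization, one of the deployment topics in this phase, comes down to saving a trained model's state so a separate serving process can reload it and produce the same predictions. The sketch below illustrates the round trip with a tiny linear model stored as JSON; the parameter values, the "version" field, and the file name are hypothetical, and real scikit-learn models are more commonly saved with pickle or joblib.

```python
import json
import os
import tempfile


def predict(params, features):
    """Linear model: intercept plus the dot product of weights and features."""
    return params["intercept"] + sum(
        w * x for w, x in zip(params["weights"], features)
    )


# Hypothetical trained parameters; in practice these come from a fitted model.
params = {"intercept": 2.0, "weights": [3.0, -1.0], "version": "0.1.0"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    # Save: what a training job would do at the end of a run.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)
    # Load: what an API process would do at startup.
    with open(path, encoding="utf-8") as f:
        restored = json.load(f)

before = predict(params, [1.0, 2.0])
after = predict(restored, [1.0, 2.0])
print(before, after)
```

Checking that predictions match before and after the round trip is a cheap test worth keeping in any deployment pipeline, whatever serialization format is used.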
Resources
COURSE Made With ML (MLOps)
COURSE Full Stack Deep Learning
BOOK ML System Design Interview Book
Milestone
⚑ Deploy at least one ML application publicly and have a portfolio with 3-5 documented projects. You
should be able to discuss your projects in depth and demonstrate end-to-end ML capabilities.
Summary
Your Path Forward
Career Opportunities
✓ Data Scientist
✓ Machine Learning Engineer
✓ Data Analyst
✓ Business Intelligence Analyst
✓ Research Scientist
✓ MLOps Engineer
Tips for Success
Don't skip the math — it's what separates good data scientists from great ones
Build projects with real, messy data — Kaggle competitions are good but not enough
Learn to communicate findings to non-technical audiences
Statistics before machine learning, always
Understand algorithms deeply before using AutoML
Join communities: Kaggle, Reddit r/datascience, Twitter DS community
Read papers, but focus on understanding intuition over mathematical rigor
Contribute to open source or write about what you learn
Practice explaining your projects — interviews are about communication
Stay curious and keep learning — the field evolves rapidly
Next Steps
1. Set up your Python environment (Anaconda recommended)
2. Choose your learning pace (Intensive, Balanced, or Flexible)
3. Start with Phase 1 — don't skip ahead
4. Join a community for accountability and help
5. Begin building your GitHub profile from day one
6. Commit to consistency over intensity
Ready to Start Your Journey?
Everything in this roadmap is free to learn on your own. But if
you'd like structured guidance, hands-on projects, and mentor
support — we'd love to have you at [Link]