Python for Data Science
Web M. Tech Course DA5301W
Module 2: Python Environments and PEP Standards
Dr. P.S Jayadev
Contents
➢Versions in Python
➢Modules, Packages and Libraries in Python
➢Dependencies and Requirements
➢Python Virtual Environments
➢Tools for Managing Virtual Environments
➢Python Scripts Vs Python Notebooks
➢PEP Standards and Linting
Python for DS 2
Motivation for this Module
Reproducibility: Same code + same data = same results
Experimentation: ML requires quick switching between libraries and models
Collaboration: Ensures that every team member's development, testing, and
deployment environments are consistent
Portability: From Jupyter prototyping to production APIs, Docker, and
cloud deployment
This module covers one of the foundational aspects of professional AI/DS
practice
Versions in Python
The Python Software Foundation (PSF) is the organization behind Python;
development and releases are carried out by the core developers.
Development work includes:
◦ Adding new features (syntax, libraries, performance improvements)
◦ Fixing bugs and security issues
◦ Deprecating outdated features
◦ Releasing updates regularly based on community proposals
Python versions follow a major.minor.micro scheme (similar to semantic
versioning, although minor releases may remove deprecated features):
◦ Major version → Python 3
◦ Minor version → Python 3.11, 3.12
◦ Micro/patch version → Python 3.12.2 (bug fixes, security updates)
Latest stable version at the time of writing: Python 3.13.3 (released in April 2025)
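As a minimal sketch of the major/minor/micro scheme above, the running interpreter's version can be read from the standard library. The minimum version (3, 9) used below is an arbitrary assumption for illustration only.

```python
import sys

# sys.version_info exposes the major, minor, and micro parts separately.
major, minor, micro = sys.version_info[:3]
version_text = f"{major}.{minor}.{micro}"
print(f"Running Python {version_text}")

# Hypothetical minimum version for a project (an assumption for this sketch).
MINIMUM = (3, 9)

# Tuples compare element by element: major first, then minor.
meets_minimum = (major, minor) >= MINIMUM
print(f"Meets minimum {MINIMUM[0]}.{MINIMUM[1]}: {meets_minimum}")
```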
Modules, Packages and Libraries
Modules, Packages and Libraries in Python
Python Module:
◦ A single Python file (.py) containing code, such as functions, classes, or
variables
◦ Modules help you organize and reuse code by allowing import of the
file in other programs
Python Package:
◦ A folder containing one or more modules and a special __init__.py file
◦ Packages help group related modules together to keep large projects
organized
Modules, Packages and Libraries in Python
Python Sub-Package:
◦ A folder within a package that contains its own __init__.py and
modules or further sub-packages
◦ Sub-packages create a hierarchical organization of modules within
packages
Python Sub-Module:
◦ A module placed inside a sub-package
◦ Sub-modules are Python files (.py) nested deeper inside the package
hierarchy
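The definitions above can be tried out with a small sketch that builds a throwaway package in a temporary folder and imports from it. All names here (dstools_demo, stats, loaders, readers) are hypothetical and chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout created below (all names are made up for this sketch):
#   dstools_demo/            <- package
#       __init__.py
#       stats.py             <- module
#       loaders/             <- sub-package
#           __init__.py
#           readers.py       <- sub-module
root = Path(tempfile.mkdtemp())
pkg = root / "dstools_demo"
sub = pkg / "loaders"
sub.mkdir(parents=True)

(pkg / "__init__.py").write_text("")
(pkg / "stats.py").write_text(
    "def mean(values):\n    return sum(values) / len(values)\n"
)
(sub / "__init__.py").write_text("")
(sub / "readers.py").write_text(
    "def read_numbers(text):\n"
    "    return [float(x) for x in text.split(',')]\n"
)

# Make the temporary folder importable, then import by dotted name.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
stats = importlib.import_module("dstools_demo.stats")
readers = importlib.import_module("dstools_demo.loaders.readers")

numbers = readers.read_numbers("1,2,3")
average = stats.mean(numbers)
print(average)
```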
Modules, Packages and Libraries in Python
Python Library:
◦ A collection of modules and/or packages
offering various functionalities as a whole
◦ Libraries include popular packages like NumPy,
pandas, which provide wide-ranging tools
Example of NumPy Library in Python
Package: numpy - the top-level folder containing many modules and
sub-packages.
Sub-Package: [Link] - folder inside numpy organizing linear
algebra-related code
Sub-Module: [Link] - Python file module in
the [Link] sub-package.
Module (not a sub-module): [Link] - Module directly inside
the numpy package for Fourier transforms
Reference on how to create a package in python: [Link]
[Link]/[Link]
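One way to tell a package from a plain module at runtime is that an imported package carries a __path__ attribute, while a single-file module does not. The sketch below uses the standard-library xml package (rather than NumPy) so that it runs without any installation; the same check could be applied to NumPy names if NumPy is installed. The helper name kind is hypothetical.

```python
import importlib


def kind(name: str) -> str:
    """Return 'package' if the imported object has __path__, else 'module'."""
    mod = importlib.import_module(name)
    return "package" if hasattr(mod, "__path__") else "module"


# xml is a package, xml.etree a sub-package, ElementTree a sub-module.
for name in ("xml", "xml.etree", "xml.etree.ElementTree"):
    print(f"{name}: {kind(name)}")
```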
Example of NumPy Library in Python
Q: What does the NumPy library contain apart from the numpy
package?
◦ Compiled binary extensions (C/C++/Fortran) for fast numerical
computation.
◦ Additional resources, metadata, and scripts supporting building,
testing, and package management
◦ Dependencies and supporting files that ensure full functionality and
performance
Versions in Python Libraries
Python libraries also have versions, just like Python itself
◦ E.g.: NumPy 1.23 vs NumPy 1.26
Further, library compatibility is tied to the Python version
◦ Example: TensorFlow 2.11 works only up to Python 3.10
Therefore, when choosing a Python version, you must also check the library
versions it supports
A mismatch can result in errors, installation failures, and inconsistent results
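A minimal sketch of such a compatibility check is shown below. The upper bound comes from the TensorFlow 2.11 example above; the function name python_is_supported is hypothetical, and a real project would normally rely on the installer to enforce these limits.

```python
import sys


def python_is_supported(python_version, max_supported):
    """True if the (major, minor) version does not exceed max_supported."""
    return tuple(python_version[:2]) <= tuple(max_supported)


# Upper bound taken from the TensorFlow 2.11 example in the text.
TF_2_11_MAX_PYTHON = (3, 10)

print(python_is_supported((3, 10), TF_2_11_MAX_PYTHON))  # at the limit
print(python_is_supported((3, 12), TF_2_11_MAX_PYTHON))  # too new
print(python_is_supported(sys.version_info, TF_2_11_MAX_PYTHON))
```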
Dependencies and Requirements
What are Dependencies?
Dependency: Any Python package or library your project relies on
Example:
◦ NumPy: for array operations
◦ pandas: for data wrangling
◦ scikit-learn: for ML models
AI/DS projects may have dozens of dependencies
Dependencies also have their own dependencies, which leads to a dependency
tree
Example: scikit-learn depends on the numpy, scipy, and joblib libraries
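The idea of a dependency tree can be sketched with a small graph walk. The graph below is a hypothetical, simplified one (the project name and the exact edges are assumptions for illustration); only the scikit-learn entry comes from the example above.

```python
# Hypothetical, simplified graph of direct dependencies.
DIRECT_DEPENDENCIES = {
    "my-project": ["scikit-learn", "pandas"],
    "scikit-learn": ["numpy", "scipy", "joblib"],
    "scipy": ["numpy"],
    "pandas": ["numpy"],
    "numpy": [],
    "joblib": [],
}


def all_dependencies(package, graph):
    """Collect direct and indirect dependencies with a depth-first walk."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return sorted(seen)


# The project lists two dependencies but ends up needing five packages.
print(all_dependencies("my-project", DIRECT_DEPENDENCIES))
```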
What are Requirements?
A list of the specific dependencies (libraries and their versions) needed for your
project
Generally managed as a [Link] file
E.g.:
A requirements file is critical for reproducible results
Without requirements, one teammate may run on numpy 1.25 and another on
numpy 1.26, which may lead to inconsistent outputs
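To illustrate what pinned requirements look like, the sketch below lists the packages installed in the current environment as name==version lines, using only the standard library. It is similar in spirit to what pip freeze prints, but it is a simplified assumption-laden sketch (the function name pinned_requirements is hypothetical), not a replacement for that command.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata is incomplete
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Show the first few pinned lines of the current environment.
for line in pinned_requirements()[:5]:
    print(line)
```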
Problem of Dependency Conflicts
Different projects often need different versions of the same library
◦ Each project may rely on a feature, bug fix, or behavior specific to a certain
library version
◦ Newer versions may remove deprecated functions, change APIs, or optimize
performance
Example:
◦ Project A might need torch==2.0
◦ Project B might need torch==1.12
Issue: Two versions of the same library cannot coexist in one environment;
installing one replaces the other, which can break the project that needed it
Solution: Use isolated environments, one per project
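A small sketch of how such a conflict could be detected from two sets of pinned requirements follows. The torch pins come from the example above; the numpy pins and both function names are hypothetical additions for illustration.

```python
def parse_pins(lines):
    """Map package name -> pinned version for lines like 'torch==2.0'."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip().lower()] = version.strip()
    return pins


def find_conflicts(pins_a, pins_b):
    """Return packages that the two projects pin to different versions."""
    return {
        name: (pins_a[name], pins_b[name])
        for name in pins_a.keys() & pins_b.keys()
        if pins_a[name] != pins_b[name]
    }


project_a = parse_pins(["torch==2.0", "numpy==1.26"])
project_b = parse_pins(["torch==1.12", "numpy==1.26"])
conflicts = find_conflicts(project_a, project_b)
print(conflicts)
```

Any package that appears in the result cannot satisfy both projects from a single shared installation, which is the case isolated environments are meant to handle.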
Python Virtual Environments
Python Virtual Environments
Virtual Environment: An isolated space on a system with its own Python interpreter and
its own installed dependencies
Benefits:
◦ Each project can have its virtual environment with its own library versions
◦ No clashes between projects
◦ Easy to replicate setup on another machine or server
Tools to manage Python environments:
◦ venv: Simple built-in Python tool for creating Python environments
◦ poetry: Python dependency and project manager that simplifies packaging, publishing,
and environment handling for projects
◦ conda: Powerful environment and package manager that handles both Python and
non-Python dependencies (e.g., CUDA)
◦ uv: Fast, modern alternative with streamlined environment management
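As a minimal sketch of the built-in option, the standard-library venv module can create an environment programmatically. The folder name demo-env is hypothetical, and the environment is created in a temporary directory so that nothing is left behind.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a temporary folder so the sketch leaves no trace.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo-env"
    # with_pip=False keeps this quick; real projects usually want pip inside.
    venv.create(env_dir, with_pip=False)
    # pyvenv.cfg marks the folder as a virtual environment.
    config_exists = (env_dir / "pyvenv.cfg").exists()
    top_level = sorted(p.name for p in env_dir.iterdir())

print(config_exists)
print(top_level)
```

From a terminal, the usual equivalent is python -m venv followed by the folder name, after which the environment is activated with the script found in its bin (or Scripts on Windows) folder.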
Python Virtual Environments
Comparison of tools by feature: venv, conda, uv, poetry
Scope:
◦ venv: Python-only environments
◦ conda: Python + non-Python dependencies (CUDA, etc.)
◦ uv: Python-only environments
◦ poetry: Python-only dependencies + package management
Speed:
◦ venv: Moderate
◦ conda: Slower (complex solver)
◦ uv: Very fast (Rust-based)
◦ poetry: Moderate
Ease of Use:
◦ venv: Simple, good for beginners
◦ conda: Rich features but heavier
◦ uv: Easy with modern CLI
◦ poetry: Structured workflow, but has a learning curve
Reproducibility:
◦ venv: Basic. Ensures only that the dependency list is captured ([Link]). May not
guarantee reproducibility across OS or for libraries with C/C++ dependencies
◦ conda: Strong. Captures versions and environment details more fully ([Link])
◦ uv: Basic ([Link])
◦ poetry: Strong. Provides deterministic builds by pinning exact versions,
ensuring reproducibility across systems ([Link])
AI/DS Use Case:
◦ venv: For small Python-only projects
◦ conda: ML/AI with heavy non-Python dependencies (CUDA)
◦ uv: Fast setup and Python-only DS workflows
◦ poetry: Ideal for Python data science projects, package management, and
publishing to PyPI
Lifecycle of Using Environments
Create → Set up a new environment for your project
Activate → Switch into that environment to work
Install → Add the dependencies/libraries you need
Use → Run scripts, notebooks, or ML models inside the environment
Share → Export requirements ([Link]) so others can replicate
Switch/Deactivate → Move between environments depending on project needs
Reuse/Move → Rebuild the same environment on another machine or cloud
Jupyter Notebooks Vs Python Scripts
Python Notebooks Vs Python Scripts
Comparison by aspect: Python Notebooks vs Python Scripts
Format:
◦ Notebooks: .ipynb (interactive, cell-based); Jupyter notebooks or Google Colab
◦ Scripts: .py (linear code file)
Best For:
◦ Notebooks: Exploration, EDA, teaching, quick demos
◦ Scripts: Production code, automation, deployment
Interactivity:
◦ Notebooks: Highly interactive (cell execution, visual output)
◦ Scripts: Minimal interactivity (runs top to bottom)
Documentation:
◦ Notebooks: Code + Markdown + plots in one place
◦ Scripts: External docs required
Version Control:
◦ Notebooks: Harder to manage (not Git-friendly)
◦ Scripts: Easy (plain text, Git-friendly)
Debugging/Testing:
◦ Notebooks: Suitable for manual debugging but limited tools
◦ Scripts: Manual testing is cumbersome but comprehensive tools are available
AI/DS Use Case:
◦ Notebooks: Prototypes, experimentation, etc.
◦ Scripts: ML pipelines, APIs, CI/CD, scaling, etc.
From Notebooks to Scripts
Step 1: Explore in Jupyter Notebooks
• Load datasets
• Try code snippets step by step
• Visualize data & results
Step 2: Move stable code into functions & scripts
• Collect repeated logic into reusable functions
• Put functions into .py files (scripts)
Step 3: Import Scripts into Notebooks
• Import functions from .py files into notebooks for reuse
• Bridges experimentation with structured code
• Keeps the notebook light and clean
Step 4: Continue Experimentation
• Continue on notebooks by importing stable code from scripts
Step 5: Repeat
• Repeat above steps until exploration is done
• Finally, all code can be moved to scripts if needed
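As a sketch of what "moving stable code into a script" might look like, the block below shows the contents of a hypothetical file cleaning.py with one reusable function. The file name, function name, and sample column names are all assumptions for illustration.

```python
"""cleaning.py: hypothetical helper module extracted from a notebook."""


def normalize_column_names(names):
    """Trim, lower-case, and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in names]


def main():
    # Made-up column names, as they might appear in a raw dataset.
    raw = [" Customer ID", "Order Date ", "Total Amount"]
    print(normalize_column_names(raw))


# Runs only when executed as a script, not when imported into a notebook.
if __name__ == "__main__":
    main()
```

Assuming the file is saved next to the notebook, a notebook cell could then reuse the function with from cleaning import normalize_column_names instead of redefining it.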
PEP Standards and Linting
What is PEP?
PEP = Python Enhancement Proposal
A design document that describes:
◦ New features or changes in Python
◦ Guidelines and best practices
◦ Python development processes
Reviewed by the Python community and accepted or rejected by the Python Steering Council
Ensures Python evolves in a structured and transparent way
Ref: [Link]
Types of PEP Documents
Standards Track PEPs
◦ Propose new features or changes to Python’s syntax, libraries, or interpreter
◦ Example: PEP 572 – Proposal to introduce assignment expressions (:= operator)
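Informational PEPs
◦ Provide design guidance or general information to the community
◦ Example: PEP 20 – The Zen of Python
Process PEPs
◦ Define or change processes and conventions for Python development itself
◦ Example: PEP 1 – How PEPs are structured and submitted
◦ Example: PEP 8 – Style Guide for Python Code ([Link]); although it is a
style guideline, it is officially classified as a Process PEP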
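To make the Standards Track example concrete, here is a small sketch of the assignment expression (:=) introduced by PEP 572, compared with the older two-step style. The log line and variable names are made up for illustration.

```python
import re

# Made-up training log line for this sketch.
log_line = "epoch=12 loss=0.25"

# Without PEP 572: assign first, then test.
match = re.search(r"loss=([\d.]+)", log_line)
if match:
    loss_old_style = float(match.group(1))

# With PEP 572: assign and test in one expression (Python 3.8+).
if (m := re.search(r"loss=([\d.]+)", log_line)):
    loss = float(m.group(1))

print(loss)
```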
PEP Workflow
Idea
• A community member (developer, contributor, researcher) proposes an idea
Draft PEP
• Author writes a draft in the PEP format (motivation, specification, rationale)
Discussion
• Shared on Python mailing lists, forums, or GitHub for community feedback
PEP Review
• PEP Editor is assigned to check formatting, clarity, and completeness
Steering Council Decision
• Reviews and approves or rejects the PEP
• Consists of elected core developers
Implementation (Standards Track PEP)
• If accepted, contributors implement the feature
• It ships in a future Python release
PEP 8 Style Guide for DS
To ensure consistency & readability: critical when sharing notebooks,
scripts, and ML pipelines
Key points:
◦ Naming conventions: snake_case for variables/functions, CamelCase (CapWords) for classes
◦ Indentation: 4 spaces per level
◦ Line length: ≤ 79 characters (keeps notebooks & Git diffs clean)
◦ Imports: on separate lines, grouped in order (standard library → third-party → local)
◦ Whitespace: a single space around operators for clarity (a = b + c)
◦ Comments & docstrings: explain data transformations, models, assumptions
Overall, following PEP 8 improves collaboration, reduces errors, and eases debugging of ML
pipelines
IDEs such as VS Code help in following PEP 8 standards
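The key points above can be seen together in a short sketch. The class, function, and data below are hypothetical and exist only to show naming, indentation, spacing, and docstring conventions in one place.

```python
import math
from collections import Counter


class LabelSummary:
    """Summarize class labels (CamelCase class name)."""

    def __init__(self, labels):
        self.label_counts = Counter(labels)  # snake_case attribute

    def majority_share(self):
        """Return the fraction of samples in the most common class."""
        total = sum(self.label_counts.values())
        top_count = self.label_counts.most_common(1)[0][1]
        return top_count / total


def entropy_bits(label_counts):
    """Return the Shannon entropy (in bits) of a label distribution."""
    total = sum(label_counts.values())
    # Long expressions are wrapped to stay within 79 characters.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in label_counts.values()
    )


summary = LabelSummary(["cat", "dog", "cat", "cat"])
print(summary.majority_share())
print(entropy_bits(summary.label_counts))
```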
Linting
Linting: Automated static analysis of code (without running it) to flag likely
errors and style problems
Finds:
◦ Syntax errors
◦ Formatting issues
◦ Unused variables/imports
◦ Style violations based on PEP 8 standards
Helps you write cleaner, more consistent, and less error-prone code
In AI & DS projects, linting helps to:
◦ Keep code readable & shareable across the team
◦ Catch bugs early (typos, missing imports, unused code)
◦ Smooth the transition from exploration → production pipelines
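To give a feel for what a linter does internally, the sketch below implements one simplified check (unused imports) using the standard-library ast module. It is an illustrative toy under stated assumptions, not how any particular linter is implemented; the function name find_unused_imports and the sample source are hypothetical.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that are never referenced in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os".
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


# Sample code to check: "os" is imported but never used.
sample = "import os\nimport sys\nprint(sys.argv)\n"
print(find_unused_imports(sample))
```

The code is only parsed, never executed, which is what makes this a static check.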
Ruff for Data Science Projects
What is Ruff?
◦ A modern, very fast linter and formatter for Python
◦ Written in Rust → much faster than older Python-based tools
◦ Combines the roles of several tools (e.g., Flake8, isort, Black) in one
Features of Ruff relevant to Data Science
◦ Unused import/variable detection (can auto-fix)
◦ Automatic import sorting: standard → third-party → project-specific imports
◦ PEP 8 style enforcement (can auto-fix)
◦ Docstring and comment checks
◦ Complexity checks that nudge you to refactor long or complex code into
reusable functions
◦ Easy to integrate in VS Code, Jupyter notebooks, or CI/CD pipelines
PEP 20 : Zen of Python for DS
19 Guiding Principles for Writing Pythonic Code written by Tim Peters in 1999
Shapes the philosophy of Python development: simplicity, readability, and clarity
“Beautiful is better than ugly” : Write clean notebooks, clear plots, and well-
structured ML code
“Simple is better than complex” : Don’t over-engineer pipelines; keep preprocessing &
models understandable
“Readability counts” : Document feature engineering steps; teammates must
understand your code
“Errors should never pass silently” : Handle missing data, NaNs, data inconsistencies
and model errors explicitly
“There should be one-- and preferably only one --obvious way to do it” : Follow
consistent or standard ML workflows
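As one possible reading of "Errors should never pass silently" in a data setting, the sketch below computes a mean that refuses to skip missing values quietly. The function name mean_strict and the sample values are hypothetical.

```python
import math


def mean_strict(values):
    """Mean that raises on missing data instead of silently skipping it."""
    values = list(values)
    if not values:
        raise ValueError("no values supplied")
    # Treat both None and NaN as missing.
    missing = [
        i for i, v in enumerate(values) if v is None or math.isnan(v)
    ]
    if missing:
        raise ValueError(f"missing values at positions {missing}")
    return sum(values) / len(values)


print(mean_strict([1.0, 2.0, 3.0]))

try:
    mean_strict([1.0, float("nan"), None])
except ValueError as error:
    print(f"Rejected: {error}")
```

Whether to raise, impute, or drop is a project decision; the point of the sketch is only that the choice is made explicitly.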
Summary
Versions of Python and versions of Python libraries are closely connected, and compatibility
between them must be ensured for any Python project
Virtual environments are the key to reproducibility, portability, and collaborative coding in data
science projects
Different tools can be used to manage virtual environments depending on the needs of the
project
A balanced approach between Python notebooks and Python scripts can be taken to explore,
experiment, and manage code properly in data science & AI projects
PEP documents ensure that Python enhancements and standards are managed in a structured
manner
PEP 8 is the widely adopted standard for style and conventions in Python coding
Linting is an automated process to identify potential issues and deviations from coding
standards
Ruff is a popular modern tool to detect and auto-correct issues identified during linting