Python Environments and PEP Standards

This document covers Python environments and PEP standards, emphasizing the importance of version compatibility between Python and its libraries for data science projects. It discusses the use of virtual environments for reproducibility and collaboration, the differences between Python scripts and notebooks, and the significance of PEP documents and linting in maintaining coding standards. Additionally, it highlights Ruff as a modern tool for linting and formatting Python code.

Uploaded by

pjenith51

Python for Data Science

Web M. Tech Course DA5301W


Module 2: Python Environments and PEP Standards

Dr. P.S Jayadev


Contents
➢Versions in Python
➢Modules, Packages and Libraries in Python
➢Dependencies and Requirements
➢Python Virtual Environments
➢Tools for Managing Virtual Environments
➢Python Scripts Vs Python Notebooks
➢PEP Standards and Linting

Python for DS 2
Motivation for this Module
 Reproducibility: Same code + same data = same results
 Experimentation: ML requires quick library/model switching
 Collaboration: Ensuring that all team, testing, and deployment environments are consistent
 Portability: From Jupyter prototyping to production APIs, Docker, and cloud deployment
 This module covers one of the foundational aspects of professional AI/DS practice

Versions in Python
 Python's development and releases are led by its core developers and an elected Steering Council; the Python Software Foundation (PSF) holds the rights to Python and supports its community.
 Development work includes:
◦ Adding new features (syntax, libraries, performance improvements)
◦ Fixing bugs and security issues
◦ Deprecating outdated features
◦ Releasing updates regularly based on community proposals
 Python versions follow a major.minor.micro scheme (similar to, but not strictly, semantic versioning):
◦ Major version → Python 3
◦ Minor version → Python 3.11, 3.12 (new features; may deprecate or remove old ones)
◦ Micro/patch version → Python 3.12.2 (bug fixes, security updates)
 Python 3.13.3, released in April 2025, was the latest stable version at the time of writing
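
The major.minor.micro scheme can be inspected at runtime. The following is a minimal sketch using the standard library's `sys.version_info`; the 3.10 minimum is an arbitrary assumption chosen only for illustration.

```python
import sys

# sys.version_info exposes the interpreter version as named fields.
major, minor, micro = sys.version_info[:3]
print(f"Running Python {major}.{minor}.{micro}")

# Hypothetical minimum version for a project; adjust to your own needs.
MINIMUM = (3, 10)
is_supported = (major, minor) >= MINIMUM
print(f"Meets minimum {MINIMUM[0]}.{MINIMUM[1]}: {is_supported}")
```

Comparing tuples rather than strings avoids mistakes such as "3.9" sorting after "3.10".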
Modules, Packages and Libraries

Modules, Packages and Libraries in Python
 Python Module:
◦ A single Python file (.py) containing code, such as functions, classes, or
variables
◦ Modules help you organize and reuse code by allowing import of the
file in other programs
 Python Package:
◦ A folder containing multiple modules and a special __init__.py file
◦ Packages help group related modules together to keep large projects
organized

Modules, Packages and Libraries in Python
 Python Sub-Package:
◦ A folder within a package that contains its own __init__.py and
modules or further sub-packages
◦ Sub-packages create a hierarchical organization of modules within packages
 Python Sub-Module:
◦ A module placed inside a sub-package
◦ Sub-modules are Python files (.py) nested deeper inside the package
hierarchy
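
The difference between a module and a package can be checked at runtime: an imported package carries a `__path__` attribute, while a plain module does not. The sketch below uses the standard library's `email` package as a stand-in so that it runs without third-party installs; the helper name `kind` is hypothetical.

```python
import importlib


def kind(dotted_name):
    """Return 'package' if the imported object has __path__, else 'module'."""
    imported = importlib.import_module(dotted_name)
    return "package" if hasattr(imported, "__path__") else "module"


print(kind("email"))            # package (top-level folder with __init__.py)
print(kind("email.mime"))       # package (a sub-package inside email)
print(kind("email.mime.text"))  # module (a .py file inside the sub-package)
```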

Modules, Packages and Libraries in Python
 Python Library:
◦ A collection of modules and/or packages
offering various functionalities as a whole
◦ Libraries include popular packages like NumPy,
pandas, which provide wide-ranging tools

Example of NumPy Library in Python
 Package: numpy - the top-level folder containing many modules and
sub-packages.
 Sub-Package: [Link] - folder inside numpy organizing linear
algebra-related code
 Sub-Module: [Link] - Python file module in
the [Link] sub-package.
 Module (not a sub-module): [Link] - Module directly inside
the numpy package for Fourier transforms
 Reference on how to create a package in python: [Link]
[Link]/[Link]

Example of NumPy Library in Python
 Q: What does the NumPy library contain apart from the numpy package?
◦ Compiled binary extensions (C/C++/Fortran) for fast numerical
computation.
◦ Additional resources, metadata, and scripts supporting building,
testing, and package management
◦ Dependencies and supporting files that ensure full functionality and
performance

Versions in Python Libraries
 Python libraries also have versions, just like Python itself
◦ E.g., NumPy 1.23 vs NumPy 1.26
 Library compatibility is also tied to the Python version
◦ Example: TensorFlow 2.11 supports Python only up to 3.10
 Therefore, when choosing a Python version, you must also check the versions of the libraries you need
 A mismatch results in errors, installation failures, or inconsistent results
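
Before debugging a compatibility problem, it helps to see which library versions are actually installed. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` and the package list are illustrative assumptions.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("numpy", "pandas", "scikit-learn"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```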

Dependencies and Requirements

What are Dependencies?
 Dependency: Any Python package or library your project relies on
 Example:
◦ NumPy: for array operations
◦ pandas: for data wrangling
◦ scikit-learn: for ML models
 AI/DS projects may have dozens of dependencies
 Dependencies have their own dependencies, which leads to a dependency tree
 Example: scikit-learn depends on the numpy, scipy, and joblib libraries
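
One level of the dependency tree can be read from a package's installed metadata. The sketch below assumes the package of interest is installed in the current environment and returns an empty list otherwise; `direct_dependencies` is a hypothetical helper name.

```python
from importlib import metadata


def direct_dependencies(package_name):
    """Return the declared requirement strings of an installed package."""
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.requires() returns None when a package declares no requirements.
    return requirements or []


for requirement in direct_dependencies("scikit-learn"):
    print(requirement)
```

Calling the helper again on each listed dependency would walk further down the tree.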

What are Requirements?
 List of specific dependencies (libraries and their versions) needed for your
project
 Generally managed as [Link] file
 Eg:

 A requirements file is critical for reproducible results
 Without requirements: one teammate may run on numpy 1.25 and another on numpy 1.26, which may lead to inconsistent outputs
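
To make the idea of pinned versions concrete, here is a small sketch that parses exact pins of the form `name==version`. The pins shown are illustrative assumptions, not the course's actual requirements, and real requirement files allow richer syntax than this parser handles.

```python
def parse_pinned_requirements(text):
    """Parse lines of the form 'name==version' into a dict."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" not in line:
            raise ValueError(f"not an exact pin: {line!r}")
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Hypothetical pins, for illustration only.
example = """
numpy==1.26.4
pandas==2.2.2  # data wrangling
"""
print(parse_pinned_requirements(example))
# {'numpy': '1.26.4', 'pandas': '2.2.2'}
```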

Problem of Dependency Conflicts
 Different projects often need different versions of the same library
◦ Each project may rely on a feature, bug fix, or behavior specific to a certain library version
◦ Newer versions may remove deprecated functions, change APIs, or optimize performance
 Example:
◦ Project A might need torch==2.0
◦ Project B might need torch==1.12
 Issue: A single environment holds only one version of a library, so installing the version one project needs can break the other
 Solution: Give each project its own isolated environment

Python Virtual Environments

Python Virtual Environments
 Virtual Environment: An isolated space on a system with its own Python version and installed dependencies
 Benefits:
◦ Each project can have its own virtual environment with its own library versions
◦ No clashes between projects
◦ Easy to replicate the setup on another machine or server
 Tools to manage Python environments:
◦ venv: Simple built-in Python tool for creating Python environments
◦ poetry: Python dependency and project manager that simplifies packaging, publishing, and environment handling for projects
◦ conda: Powerful environment and package manager that handles both Python and non-Python dependencies (e.g., CUDA)
◦ uv: Fast, modern alternative offering improved speed and streamlined environment management
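
As a rough illustration of what the built-in venv tool does, the standard library's `venv` module can create an environment programmatically. The sketch below builds a throwaway environment in a temporary directory; the folder name `demo-env` is hypothetical, and `with_pip=False` is chosen only to keep the example fast and offline.

```python
import sys
import tempfile
import venv
from pathlib import Path

# True when the current interpreter is running inside a virtual environment.
in_virtual_env = sys.prefix != sys.base_prefix
print(f"Inside a virtual environment: {in_virtual_env}")

# Create a throwaway environment and confirm its marker file exists.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_dir = Path(tmp_dir) / "demo-env"
    venv.create(env_dir, with_pip=False)
    config_exists = (env_dir / "pyvenv.cfg").exists()
    print(f"Environment created: {config_exists}")
```

In day-to-day work the same result usually comes from running `python -m venv` in a terminal and then activating the environment.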
Python Virtual Environments
Comparison of venv, conda, uv, and poetry by feature:
 Scope
◦ venv: Python-only environments
◦ conda: Python + non-Python dependencies (CUDA, etc.)
◦ uv: Python-only environments
◦ poetry: Python-only dependencies + package management
 Speed
◦ venv: Moderate
◦ conda: Slower (complex solver)
◦ uv: Very fast (Rust-based)
◦ poetry: Moderate
 Ease of Use
◦ venv: Simple, good for beginners
◦ conda: Rich features but heavier
◦ uv: Easy with modern CLI
◦ poetry: Structured workflow, but has a learning curve
 Reproducibility
◦ venv: Basic. Ensures only the dependency list is captured ([Link]). May not guarantee reproducibility across OS or for libraries with C/C++ dependencies
◦ conda: Strong. Captures versions and environment details more fully ([Link])
◦ uv: Basic ([Link])
◦ poetry: Strong. Provides deterministic builds by pinning exact versions, ensuring reproducibility across systems ([Link])
 AI/DS Use Case
◦ venv: Small Python-only projects
◦ conda: ML/AI with heavy non-Python dependencies (CUDA)
◦ uv: Fast setup and Python-only DS workflows
◦ poetry: Ideal for Python data science projects, package management, and publishing to PyPI
Lifecycle of Using Environments

 Create → Set up a new environment for your project
 Activate → Switch into that environment to work
 Install → Add the dependencies/libraries you need
 Use → Run scripts, notebooks, or ML models inside the environment
 Share → Export requirements ([Link]) so others can replicate
 Switch/Deactivate → Move between environments depending on project needs
 Reuse/Move → Rebuild the same environment on another machine or cloud
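
The "Share" stage amounts to recording which package versions are installed. The sketch below approximates that listing with the standard library's `importlib.metadata`; the helper name `freeze` is hypothetical, and in practice the export command of your environment tool remains the authoritative way to produce the file.

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' lines for the current environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in freeze():
    print(line)
```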

Jupyter Notebooks Vs Python Scripts

Python Notebooks Vs Python Scripts
Comparison of Python notebooks and Python scripts by aspect:
 Format
◦ Notebooks: .ipynb (interactive, cell-based); Jupyter notebooks or Google Colab
◦ Scripts: .py (linear code file)
 Best For
◦ Notebooks: Exploration, EDA, teaching, quick demos
◦ Scripts: Production code, automation, deployment
 Interactivity
◦ Notebooks: Highly interactive (cell execution, visual output)
◦ Scripts: Minimal interactivity (runs top to bottom)
 Documentation
◦ Notebooks: Code + Markdown + plots in one place
◦ Scripts: External docs required
 Version Control
◦ Notebooks: Harder to manage (not Git-friendly)
◦ Scripts: Easy (plain text, Git-friendly)
 Debugging/Testing
◦ Notebooks: Suitable for manual debugging but limited tools
◦ Scripts: Manual testing is cumbersome but comprehensive tools are available
 AI/DS Use Case
◦ Notebooks: Prototypes, experimentation, etc.
◦ Scripts: ML pipelines, APIs, CI/CD, scaling, etc.

From Notebooks to Scripts

 Step 1: Explore in Jupyter Notebooks
◦ Load datasets
◦ Try code snippets step by step
◦ Visualize data & results
 Step 2: Move stable code into functions & scripts
◦ Collect repeated logic into reusable functions
◦ Put functions into .py files (scripts)
 Step 3: Import Scripts into Notebooks
◦ Import functions from .py files into notebooks for reuse
◦ Bridges experimentation with structured code
◦ Keeps the notebook light and clean
 Step 4: Continue Experimentation
◦ Continue on notebooks by importing stable code from scripts
 Step 5: Repeat
◦ Repeat above steps until exploration is done
◦ Finally, all code can be moved to scripts if needed
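
A sketch of the notebook-to-script move: a piece of logic that was repeated across notebook cells becomes one function that could live in a .py file. The file name `preprocessing.py` and the function `normalize_column_names` are hypothetical; both parts are shown in one block so the example runs on its own.

```python
# --- Would live in a hypothetical helper file, e.g. preprocessing.py ---
def normalize_column_names(names):
    """Strip whitespace, lower-case, and replace spaces with underscores."""
    return [name.strip().lower().replace(" ", "_") for name in names]


# --- Would live in the notebook, after:
# from preprocessing import normalize_column_names
columns = normalize_column_names([" Age", "Annual Income", "City "])
print(columns)  # ['age', 'annual_income', 'city']
```

Once the function sits in a script it can be tested and version-controlled as plain text, while the notebook keeps only the call.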

PEP Standards and Linting

What is PEP?
 PEP = Python Enhancement Proposal
 A design document that describes:
◦ New features or changes in Python
◦ Guidelines and best practices
◦ Python development processes
 Reviewed by the Python community and accepted or rejected by the Python Steering Council
 Ensures Python evolves in a structured and transparent way
 Ref: [Link]

Types of PEP Documents
 Standards Track PEPs
◦ Propose new features or changes to Python’s syntax, libraries, or interpreter
◦ Example: PEP 572 – Proposal to introduce assignment expressions (:= operator)
 Informational PEPs
◦ Provide guidelines, conventions, or general information
◦ Example: PEP 20 – The Zen of Python
 Process PEPs
◦ Define or change processes for Python development itself
◦ Examples: PEP 1 – How PEPs are structured and submitted; PEP 8 – Style Guide for Python Code ([Link]), which is formally typed as a Process PEP
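
As an illustration of what a Standards Track PEP delivers, the assignment expression from PEP 572 (available from Python 3.8) binds a name inside an expression. The data values below are made up for the example.

```python
values = [3, 8, 1, 12, 5]

# Bind the length to `count` and test it in a single expression.
if (count := len(values)) > 3:
    print(f"List has {count} values")

# Reuse each computed square inside the comprehension without recomputing it.
large_squares = [square for v in values if (square := v * v) > 50]
print(large_squares)  # [64, 144]
```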

PEP Workflow
 Idea
◦ A community member (developer, contributor, researcher) proposes an idea
 Draft PEP
◦ Author writes a draft in the PEP format (motivation, specification, rationale)
 Discussion
◦ Shared on Python mailing lists, forums, or GitHub for community feedback
 PEP Review
◦ PEP Editor is assigned to check formatting, clarity, and completeness
 Steering Council Decision
◦ Reviews and approves or rejects the PEP
◦ Consists of elected core developers
 Implementation (Standard PEP)
◦ If accepted, contributors implement the feature
◦ It ships in a future Python release

PEP 8 Style Guide for DS
 Ensures consistency & readability: critical when sharing notebooks, scripts, and ML pipelines
 Key points:
◦ Naming conventions: snake_case for variables/functions, CapWords (CamelCase) for classes
◦ Indentation: 4 spaces
◦ Line length: ≤ 79 chars (keeps notebooks & Git diffs clean)
◦ Imports: one module per line, grouped logically (standard → third-party → local)
◦ Whitespace: use around operators for clarity (a = b + c)
◦ Comments & Docstrings: Explain data transformations, models, assumptions
 Overall, following PEP 8 improves collaboration, reduces errors, and eases debugging of ML pipelines
 IDEs like VSCode aid in following PEP 8 standards
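
The key points above can be seen together in a short example. This is a toy sketch written to follow PEP 8; the class `FeatureScaler` and the sample data are hypothetical and use only the standard library.

```python
"""Toy example laid out according to the PEP 8 points listed above."""

# Standard-library imports come first; third-party and local imports
# would follow in their own groups, separated by a blank line.
import statistics


class FeatureScaler:
    """Standardize values to zero mean and unit variance."""

    def __init__(self, values):
        self.mean = statistics.fmean(values)
        self.std_dev = statistics.pstdev(values)

    def transform(self, values):
        """Return the z-score of each value."""
        return [(value - self.mean) / self.std_dev for value in values]


raw_scores = [2.0, 4.0, 6.0]
scaler = FeatureScaler(raw_scores)
scaled_scores = scaler.transform(raw_scores)
print(scaled_scores)
```

Note the CapWords class name, snake_case variables and methods, 4-space indentation, spaces around operators, and docstrings that state what each piece does.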

Linting
 Linting: Automated static analysis of source code (without running it) to flag errors and style problems
 Finds:
◦ Syntax errors
◦ Formatting issues
◦ Unused variables/imports
◦ Style violations based on PEP8 standards
 Helps you write clean, error-free, and consistent code
 In AI & DS projects, linting helps to:
◦ Keep code readable & sharable across the team
◦ Catch bugs early (typos, missing imports, unused code)
◦ Smooth the transition from exploration → production pipelines
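
To show what one lint check involves, here is a deliberately simplified unused-import detector built on the standard library's `ast` module. It is a sketch of the idea only, not how any particular linter implements the check, and it ignores cases such as `__all__` re-exports.

```python
import ast


def find_unused_imports(source):
    """Return imported names that never appear as a name in the code."""
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)


sample = "import os\nimport sys\nprint(sys.argv)\n"
print(find_unused_imports(sample))  # ['os']
```

The code is analyzed as text and never executed, which is what makes linting safe to run on every save or commit.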

Ruff for Data Science Projects
 What is Ruff?
◦ A modern, ultra-fast linter and formatter for Python
◦ Written in Rust → blazing fast compared to older tools
◦ Combines multiple tools in one
 Features of Ruff relevant to Data Science
◦ Unused import/variable detection (can auto-fix)
◦ Automatic import sorting: standard → third-party → project-specific imports
◦ PEP 8 style enforcement (can auto-fix)
◦ Docstring and comment checks
◦ Complexity checks that nudge you to refactor long or deeply nested code into reusable functions
◦ Easy to integrate in VSCode, Jupyter notebook, or CI/CD pipelines
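
Ruff is normally run from the terminal or an editor plugin. For pipelines driven from Python, the sketch below wraps the `ruff check --fix` and `ruff format` commands with `subprocess`, assuming Ruff is installed and on the PATH; the wrapper name `run_ruff` and the folder name in the usage comment are hypothetical.

```python
import shutil
import subprocess


def run_ruff(path):
    """Lint with auto-fix, then format; return both exit codes or None."""
    if shutil.which("ruff") is None:
        return None  # Ruff is not installed in this environment
    check = subprocess.run(["ruff", "check", "--fix", str(path)], check=False)
    fmt = subprocess.run(["ruff", "format", str(path)], check=False)
    return check.returncode, fmt.returncode


# Example call (the "src" folder name is hypothetical):
# run_ruff("src")
```

Both commands rewrite files in place, so it is safest to run them on code that is already under version control.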

PEP 20 : Zen of Python for DS
 19 Guiding Principles for Writing Pythonic Code written by Tim Peters in 1999
 Shapes the philosophy of Python development: simplicity, readability, and clarity
 “Beautiful is better than ugly”: Write clean notebooks, clear plots, and well-structured ML code
 “Simple is better than complex”: Don’t over-engineer pipelines; keep preprocessing & models understandable
 “Readability counts”: Document feature engineering steps; teammates must understand your code
 “Errors should never pass silently”: Handle missing data, NaNs, data inconsistencies, and model errors explicitly
 “There should be one-- and preferably only one --obvious way to do it”: Follow consistent or standard ML workflows
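
"Errors should never pass silently" can be sketched for missing data as follows. The function name `mean_ignoring_nans` and the sample values are illustrative assumptions: NaNs are dropped deliberately, the number dropped is reported, and an all-NaN input raises an error instead of returning a misleading result.

```python
import math


def mean_ignoring_nans(values):
    """Return (mean of valid values, number of NaNs dropped)."""
    clean = [value for value in values if not math.isnan(value)]
    if not clean:
        raise ValueError("no valid values to average")
    dropped = len(values) - len(clean)
    return sum(clean) / len(clean), dropped


print(mean_ignoring_nans([1.0, float("nan"), 3.0]))  # (2.0, 1)
```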

Summary
 Versions of Python and versions of its libraries are closely connected, and compatibility between them must be ensured for any Python project
 Virtual environments are the key to reproducibility, portability, and collaborative coding in data science projects
 Different tools can be used to manage virtual environments depending on the needs of the project
 A balanced approach between Python notebooks and Python scripts can be taken to explore, experiment, and manage code properly in data science & AI projects
 PEP documents ensure that Python enhancements and standards are managed in a structured manner
 PEP 8 is the widely adopted standard for style and conventions in Python code
 Linting is an automated process to identify potential issues and deviations from coding standards
 Ruff is a popular modern tool to detect and auto-correct issues identified during linting
