Report on Python Data Science Libraries
Prepared by: Maryam Zahra Naqvi
Date: 17th December, 2025
This report summarizes key concepts and practical examples of three fundamental Python
libraries used in data science and scientific computing:
1. NumPy – Fast numerical computing with arrays
2. Pandas – Data manipulation and analysis (especially tabular data)
3. Scikit-Learn – Machine learning toolkit
These libraries form the backbone of most data science projects in Python.
NUMPY LIBRARY
What is NumPy?
NumPy (Numerical Python) provides the ndarray object – a fast, multidimensional container for
homogeneous numerical data.
NumPy arrays are much faster than Python lists for numerical work because of fixed-type storage and vectorized operations.
Installation
pip install numpy
Check version:
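A minimal sketch of the version check, assuming the conventional `np` alias:

```python
import numpy as np

# __version__ holds the installed NumPy version as a string
version = np.__version__
print(version)
```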
ONE-DIMENSIONAL ARRAYS
Declaration of Array
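A 1D array can be declared from a Python list with `np.array`. The values below are illustrative and not taken from the original report:

```python
import numpy as np

# Build a 1D array from a Python list; the dtype is inferred from the values
arr = np.array([10, 20, 30, 40, 50])

print(arr)        # [10 20 30 40 50]
print(arr.ndim)   # 1
print(arr.shape)  # (5,)
```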
Using arange()
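`np.arange` works like Python's `range` but returns an array. A short sketch with illustrative values:

```python
import numpy as np

# arange(start, stop, step): the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]
```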
Using linspace()
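`np.linspace` takes the number of points rather than a step size. A short sketch with illustrative values:

```python
import numpy as np

# linspace(start, stop, num): num evenly spaced values, stop included
points = np.linspace(0.0, 1.0, 5)
print(points)  # [0.   0.25 0.5  0.75 1.  ]
```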
Making an all-zeros array
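A minimal sketch using `np.zeros`; the length of 5 is an arbitrary choice:

```python
import numpy as np

# Five zeros; the default dtype is float64
zeros = np.zeros(5)
print(zeros)  # [0. 0. 0. 0. 0.]
```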
Making an all-ones array
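A minimal sketch using `np.ones`; the length and the integer dtype are arbitrary choices for illustration:

```python
import numpy as np

# Five ones, stored as integers instead of the default float64
ones = np.ones(5, dtype=int)
print(ones)  # [1 1 1 1 1]
```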
Indexing Arrays
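Indexing works as it does for Python lists. The sketch below uses illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

first = arr[0]   # 10, indices start at 0
last = arr[-1]   # 50, negative indices count from the end

# Arrays are mutable, so an element can be reassigned in place
arr[1] = 25      # arr is now [10 25 30 40 50]
```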
Slicing of 1D array
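Slices follow the `start:stop:step` pattern. A short sketch with illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

middle = arr[1:4]        # [20 30 40], the stop index is excluded
every_other = arr[::2]   # [10 30 50], every second element
reversed_arr = arr[::-1] # [50 40 30 20 10], a negative step reverses
```

Unlike list slices, a basic NumPy slice is a view, so writing to `middle` would also change `arr`.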
Applying mathematical operations
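Arithmetic on arrays is applied element by element without an explicit loop. A sketch with illustrative values:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b      # [11 22 33 44], element-wise addition
scaled = a * 2     # [2 4 6 8], the scalar is applied to every element
squares = a ** 2   # [ 1  4  9 16]
mean = a.mean()    # 2.5
```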
TWO-DIMENSIONAL ARRAYS
Declaration of array:
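A 2D array can be declared from a list of equal-length lists. The values are illustrative:

```python
import numpy as np

# Each inner list becomes one row
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.ndim)   # 2
```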
Making an all-zeros array
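For 2D arrays the shape is passed as a tuple. A minimal sketch with an arbitrary 2x3 shape:

```python
import numpy as np

# Shape is given as (rows, columns)
zeros_2d = np.zeros((2, 3))
print(zeros_2d.shape)  # (2, 3)
```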
Making an all-ones array
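A minimal sketch using `np.ones` with an arbitrary 3x2 shape:

```python
import numpy as np

# Three rows and two columns, every entry equal to 1.0
ones_2d = np.ones((3, 2))
print(ones_2d.shape)  # (3, 2)
```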
Making an array filled with the same number
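`np.full` is one way to do this; the shape and the fill value 7 are arbitrary choices for illustration:

```python
import numpy as np

# full(shape, fill_value): every entry is set to fill_value
sevens = np.full((2, 3), 7)
print(sevens.shape)  # (2, 3)
```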
Indexing an Array
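A 2D array is indexed with a row and a column. A sketch with illustrative values:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

element = matrix[0, 1]     # 2: row 0, column 1
second_row = matrix[1]     # [4 5 6]: a single index selects a whole row
corner = matrix[-1, -1]    # 6: last row, last column
```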
Slicing an Array
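Rows and columns can be sliced independently. The 3x4 grid below is an illustrative example:

```python
import numpy as np

# 3x4 grid holding the numbers 0..11, row by row
grid = np.arange(12).reshape(3, 4)

block = grid[:2, 1:3]      # first two rows, columns 1 and 2
first_column = grid[:, 0]  # every row, column 0
last_row = grid[-1, :]     # last row, every column
```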
Transpose of an array
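The transpose swaps rows and columns. A minimal sketch using the `.T` attribute:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# .T turns the 2x3 array into a 3x2 array
transposed = matrix.T
print(transposed.shape)  # (3, 2)
```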
Mathematical operations
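With 2D arrays, `*` multiplies element by element, while `@` performs matrix multiplication. A sketch with illustrative values:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

added = a + b               # element-wise sum
elementwise = a * b         # element-wise product
matrix_product = a @ b      # matrix multiplication
column_sums = a.sum(axis=0) # sum down each column
row_sums = a.sum(axis=1)    # sum across each row
```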
Image Manipulation
Loading an image
Resizing an image
Converting to grayscale
Rotating by 90°
Flipping horizontally
Flipping vertically
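The report does not say which image library was used, so the sketch below makes two assumptions: a small synthetic RGB array stands in for an image loaded from disk, and every step uses plain NumPy. The "resize" here is naive downsampling by slicing, not the interpolated resizing an imaging library would provide.

```python
import numpy as np

# Synthetic 4x6 RGB image (height, width, channels) standing in for a loaded file
height, width = 4, 6
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:, : width // 2] = [255, 0, 0]   # left half red
image[:, width // 2 :] = [0, 0, 255]   # right half blue

# Resizing (assumption: naive downsampling that keeps every second pixel)
small = image[::2, ::2]

# Grayscale: weighted sum of the R, G, B channels (BT.601 luma weights)
weights = np.array([0.299, 0.587, 0.114])
gray = (image @ weights).astype(np.uint8)

# Rotation by 90 degrees counter-clockwise in the (height, width) plane
rotated = np.rot90(image)

# Flips: mirror left-right and top-bottom
flipped_h = np.fliplr(image)
flipped_v = np.flipud(image)

print(small.shape, gray.shape, rotated.shape)  # (2, 3, 3) (4, 6) (6, 4, 3)
```

In practice the array would come from an image file; once loaded as a `(height, width, 3)` array, the same operations apply.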
PANDAS LIBRARY
Pandas is a Python library used for data manipulation, analysis, and cleaning.
It is built on top of NumPy, so it is fast for numerical operations.
Key data structures in Pandas:
Series – 1-dimensional labeled array (like a labeled list)
DataFrame – 2-dimensional labeled table (like an Excel spreadsheet or SQL table)
These structures let you index, select, and manipulate data using labels, not just integer
positions.
Installing Pandas
pip install pandas
import pandas as pd
Pandas Series
A Series is like a column in a table.
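A minimal sketch of a Series; the names and marks are made-up illustrative data:

```python
import pandas as pd

# A Series pairs each value with an index label
marks = pd.Series([85, 92, 78], index=["Ali", "Sara", "Omar"], name="marks")

print(marks["Sara"])   # 92, lookup by label
print(marks.iloc[0])   # 85, lookup by integer position
print(marks.mean())    # 85.0
```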
Pandas DataFrame
A DataFrame is like a spreadsheet or SQL table.
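A DataFrame can be built from a dictionary that maps column names to lists. The column names and rows below are made-up illustrative data:

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
df = pd.DataFrame({
    "name": ["Ali", "Sara", "Omar"],
    "age": [23, 31, 27],
    "city": ["Lahore", "Karachi", "Lahore"],
})

print(df.shape)            # (3, 3): 3 rows, 3 columns
print(df["age"].max())     # 31
print(df.loc[1, "name"])   # Sara: row label 1, column "name"
```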
Pandas Data Filtering
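Rows are usually filtered with a boolean condition placed inside square brackets. A sketch on made-up illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ali", "Sara", "Omar"],
    "age": [23, 31, 27],
    "city": ["Lahore", "Karachi", "Lahore"],
})

# Single condition: keep rows where age is above 25
over_25 = df[df["age"] > 25]

# Combined conditions: use & (and) or | (or), with each condition in parentheses
lahore_over_25 = df[(df["city"] == "Lahore") & (df["age"] > 25)]

print(over_25["name"].tolist())         # ['Sara', 'Omar']
print(lahore_over_25["name"].tolist())  # ['Omar']
```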
SCIKIT-LEARN LIBRARY
Scikit-Learn is an open-source Python library for machine learning (ML).
It provides tools for data preprocessing, model building, evaluation, and prediction.
It is built on top of NumPy, SciPy, and matplotlib, which makes it efficient and easy to use.
It is well suited to classification, regression, clustering, dimensionality reduction, and more.
Why use it?
Easy-to-use ML algorithms
Preprocessing tools
Model evaluation metrics
Works with the rest of the Python data science ecosystem
Installation
pip install scikit-learn
Basic Workflow:
Load data
Train model
Test model
Evaluate
Improve model
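The steps above can be sketched as follows. This is a minimal example under stated assumptions: the data is synthetic (a noisy line, y = 3x + 2) rather than a real dataset, and Linear Regression with an 80/20 train/test split is just one reasonable choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# 1. Load data (assumption: synthetic y = 3x + 2 plus small noise)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows so the model is tested on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 2. Train model
model = LinearRegression()
model.fit(X_train, y_train)

# 3. Test model on the held-out rows
y_pred = model.predict(X_test)

# 4. Evaluate with the R^2 score (1.0 is a perfect fit)
score = r2_score(y_test, y_pred)
print(f"R^2 on test data: {score:.3f}")
```

The last step, improving the model, would typically mean trying other features, algorithms, or settings and repeating the evaluation.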
Common algorithms
Linear Regression
Decision Trees
Random Forest
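As a hedged illustration, the sketch below trains a Decision Tree and a Random Forest on scikit-learn's bundled Iris dataset. The dataset, split size, and parameters are assumptions chosen for the example. Linear Regression follows the same fit/predict pattern through `sklearn.linear_model.LinearRegression`, but it predicts numbers rather than classes.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 flowers, 4 measurements each, 3 species to predict
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Both classifiers share the same fit/predict interface
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```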