Libraries in Python

Python libraries are collections of modules that provide pre-written code for common tasks, enhancing coding efficiency across various domains. They are categorized into built-in libraries that come with Python installations and external libraries that can be installed via pip. Notable libraries include NumPy for numerical computing, Pandas for data manipulation, and Matplotlib for data visualization.

In Python, a library is a group of modules that contain functions, classes and methods to perform common
tasks like data manipulation, math operations, web scraping and more. Python libraries make coding faster,
cleaner and more efficient by providing ready-to-use solutions for different domains such as data science,
web development, machine learning and automation.

Working of Python Library


When you import a library in Python, you gain access to pre-written code stored in separate modules. In
simple terms, instead of writing the logic for a task yourself, you import a library that already has it.

Most modules are ordinary Python source files (.py). Some are compiled extension modules, stored as .pyd
files on Windows and as .so files on Linux/macOS. When your code runs an import statement, Python locates
the module, loads it and makes its functions available to use.
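
As a small illustration of what an import does, the sketch below imports the standard json module and inspects where it was loaded from. The exact path printed depends on your installation.

```python
import json
import sys

print(json.__file__)           # location of the json package's source file on disk
print("json" in sys.modules)   # True: loaded modules are cached in sys.modules
print(json.dumps([1, 2]))      # [1, 2]
```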

Types of Python Library


Python libraries are divided into two main types:

1. Built-in Python Standard Library

It is a collection of modules that comes bundled with every Python installation, so nothing needs to be installed
separately. Many of these modules are written in pure Python, while performance-critical ones (such as math) are implemented in C.
Examples of built-in modules:
● math: Mathematical operations
● os: Interact with the operating system
● datetime: Date and time operations
● random: Generate random numbers
● json: Encode and decode JSON data
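
A minimal sketch that touches each of the modules listed above. The variable names are illustrative only, and values such as the working directory and today's date will differ on your machine.

```python
import datetime
import json
import math
import os
import random

root = math.sqrt(16)                   # 4.0
cwd = os.getcwd()                      # current working directory as a string
today = datetime.date.today()          # today's date
random.seed(0)                         # fixed seed so repeated runs give the same roll
roll = random.randint(1, 6)            # integer from 1 to 6, inclusive
payload = json.dumps({"root": root})   # '{"root": 4.0}'
restored = json.loads(payload)         # back to a Python dict

print(root, cwd, today, roll, payload, restored)
```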

2. External Python Libraries

External (third-party) libraries are not included with Python by default. You can install them easily using the pip package manager.
Popular external Python libraries:

1. NumPy: Short for Numerical Python, NumPy is the core library for numerical and scientific computing in Python. It provides powerful
tools for creating and manipulating arrays, matrices and multidimensional data.
2. Pandas: It is a data analysis and manipulation library built on top of NumPy. It introduces data structures like DataFrame and
Series that make it easy to handle structured data efficiently.
3. Matplotlib: It is a data visualization library that helps you create a wide variety of static, animated and interactive plots. It
supports charts such as bar graphs, line charts, histograms and scatter plots.
4. SciPy: Short for Scientific Python, SciPy extends the functionality of NumPy by providing tools for advanced mathematical, scientific
and engineering computations. It includes modules for optimization, integration, signal processing and linear algebra.
5. TensorFlow: It is an open-source machine learning and deep learning framework developed by Google. It allows developers to
build and train neural networks for tasks such as image recognition, natural language processing and predictive modeling.
6. Scikit-learn: It is a popular machine learning library built on top of NumPy and SciPy. It offers simple and efficient tools
for classification, regression, clustering and dimensionality reduction.

7. Scrapy: It is a web scraping and data extraction library designed to efficiently crawl websites and gather structured
information. It allows developers to create spiders that automatically navigate web pages and collect data.

8. PyTorch: Developed by Facebook’s AI Research Lab, PyTorch is a deep learning framework that provides dynamic computation
graphs and easy debugging. It supports GPU acceleration, making it suitable for high-performance training of neural
networks.

9. PyGame: It is a cross-platform library used for developing 2D games and multimedia applications. It provides modules
for handling graphics, sound and user input with ease.

10. PyBrain: Short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library, PyBrain is a
beginner-friendly machine learning library. It offers pre-built algorithms and flexible tools for training neural networks and
reinforcement learning models.

Using Libraries in Python Programs


To use any library, it first needs to be imported into your Python program using the import statement. Once imported, you can
directly call the functions or methods defined inside that library. You can import libraries in three main ways:
1. Import the entire library: import library_name (for example, import math)
2. Import a specific function or class: from library_name import function_name (for example, from math import sqrt)
3. Import a library with an alias: import library_name as alias (for example, import pandas as pd)

Example 1: This program imports the entire math library and uses one of its functions.
Input:
import math
A = 16
print(math.sqrt(A))
Output:
4.0

● Here, the complete math library is imported and we use math.sqrt() to calculate the square root of 16.
● Since the full library is imported, we must prefix the function with the library name (math.).
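
For comparison, here is a short sketch of the other two import styles, using the same square-root task. The alias m is an arbitrary choice for illustration.

```python
from math import sqrt   # import one name; call it without the "math." prefix
import math as m        # import the whole module under a shorter alias

A = 16
print(sqrt(A))          # 4.0
print(m.sqrt(A))        # 4.0
```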
Libraries in Python: Pandas
Pandas is an open-source Python library used for data manipulation, analysis and cleaning. It provides fast and flexible tools to
work with tabular data, similar to spreadsheets or SQL tables.

Pandas is used in data science, machine learning, finance, analytics and automation because it integrates smoothly with other
libraries such as:

● NumPy: numerical operations


● Matplotlib and Seaborn: data visualization
● SciPy: statistical analysis
● Scikit-learn: machine learning workflows

With Pandas, you can load, clean, transform, analyze and visualize data, all in just a few lines of code.
Installation
● Before using Pandas, make sure it is installed:

pip install pandas

● After Pandas has been installed, import the library into your program:

import pandas as pd

Note: pd is just an alias for Pandas. It’s not required but using it makes the code shorter when calling
methods or properties.

● The source code for Pandas is located at this github repository [Link]
Why Use Pandas?
● Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.
● Pandas can clean messy data sets, and make them readable and relevant.
● Relevant data is very important in data science.

What Can Pandas Do?


Pandas can answer questions about the data, such as:

● Is there a correlation between two or more columns?
● What is the average value?
● What is the maximum value?
● What is the minimum value?
Read CSV Files
A simple way to store large data sets is to use CSV (comma-separated values) files.

CSV files contain plain text in a well-known format that can be read by almost any tool, including Pandas.

In our examples we will be using a CSV file called '[Link]'.


● Load the CSV into a DataFrame:

import pandas as pd

df = pd.read_csv('[Link]')

print(df.to_string())

(Use to_string() to print the entire DataFrame; by default, a large DataFrame is printed in truncated form.)
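
If you do not have a CSV file at hand, the following sketch shows the same loading step on a small in-memory sample. The column names and values are hypothetical, and io.StringIO is used only so that read_csv can treat a string like a file.

```python
import io

import pandas as pd

# Hypothetical CSV content, made up for this example.
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,117,282.4
60,103,340.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())   # prints every row and column
print(df.shape)         # (3, 3)
```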


1. Core Data Structures
Pandas relies on two primary ways to hold data:

● Series: A one-dimensional labeled array. Think of it as a single column in a spreadsheet.


● DataFrame: A two-dimensional, size-mutable, potentially heterogeneous tabular data structure. This is your entire spreadsheet (rows and
columns).
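
A minimal sketch of both structures, using made-up names and values:

```python
import pandas as pd

# A Series: one labeled column of values.
ages = pd.Series([31, 24, 45], index=["Asha", "Ben", "Chen"], name="Age")

# A DataFrame: several columns sharing one index; columns may have different dtypes.
people = pd.DataFrame(
    {"Age": [31, 24, 45], "City": ["Pune", "Leeds", "Taipei"]},
    index=["Asha", "Ben", "Chen"],
)

print(ages["Ben"])           # 24
print(people.shape)          # (3, 2)
print(type(people["Age"]))   # each column of a DataFrame is itself a Series
```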

2. Getting Started: The Basics


Before you do anything, you need to bring the library into your environment:

Python:

import pandas as pd
# Loading a dataset (the most common first step)
df = pd.read_csv('your_data.csv')

● df.head(n): See the first n rows (default is 5).
● df.info(): Shows data types, non-null counts, and memory usage.
● df.describe(): Provides statistical summaries (mean, std, min, max) for numerical columns.
● df.shape: Returns the number of rows and columns as a tuple.
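
The four inspection tools can be tried on a small DataFrame. The sketch below builds one in memory (the data is invented) instead of reading a file:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dana"],
    "Age": [22, 35, 41, 29],
    "Salary": [30000, 52000, 61000, 45000],
})

first_two = df.head(2)    # first 2 rows
df.info()                 # prints dtypes, non-null counts and memory usage
summary = df.describe()   # statistics for the numeric columns only
print(summary.loc["mean", "Age"])   # 31.75
print(df.shape)                     # (4, 3)
```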
3. Selecting and Filtering
Pandas gives you surgical precision when picking out data.

● Selecting Columns: df['ColumnName'] or df[['Col1', 'Col2']]

Filtering Rows:
Python:

# Find all rows where 'Age' is greater than 25
over_25 = df[df['Age'] > 25]

● loc vs iloc:
○ .loc: Label-based indexing (uses column/row names).
○ .iloc: Integer-based indexing (uses row/column numbers).
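
The sketch below puts column selection, row filtering, .loc and .iloc side by side on a small invented DataFrame. The row labels "a", "b", "c" are chosen so that labels and integer positions are easy to tell apart.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Asha", "Ben", "Chen"], "Age": [31, 24, 45]},
    index=["a", "b", "c"],
)

names = df["Name"]              # one column -> Series
subset = df[["Name", "Age"]]    # list of columns -> DataFrame
over_25 = df[df["Age"] > 25]    # boolean mask keeps rows "a" and "c"

by_label = df.loc["b", "Age"]   # row label "b", column label "Age" -> 24
by_position = df.iloc[1, 1]     # second row, second column -> 24
print(by_label, by_position)
```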
4. Data Cleaning (The "Dirty" Work)
Real-world data is usually a mess. Pandas helps you scrub it clean.
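
As one possible illustration of cleaning, the sketch below removes a duplicate row and then handles a missing value in two alternative ways. The data is invented, and filling with the column mean is only one of several reasonable choices.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "Name": ["Asha", "Ben", "Ben", "Chen"],
    "Age": [31, 24, 24, np.nan],
})

deduped = raw.drop_duplicates()            # drops the repeated "Ben" row
mean_age = deduped["Age"].mean()           # mean of 31 and 24; NaN is skipped
filled = deduped.fillna({"Age": mean_age}) # option 1: replace missing ages with the mean
dropped = deduped.dropna()                 # option 2: discard rows with missing values

print(filled)
print(dropped)
```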
5. Data Manipulation & Aggregation
Once the data is clean, you’ll want to extract insights.

● Creating New Columns: df['Total'] = df['Price'] * df['Quantity']

Grouping (The Power Move):


Python:
# Calculate average salary by department
df.groupby('Department')['Salary'].mean()

● Sorting: df.sort_values(by='Date', ascending=False)
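
A runnable sketch of the three operations above on a small invented payroll table. The Bonus and Total columns are assumptions added for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT"],
    "Salary": [40000, 50000, 60000, 80000],
    "Bonus": [1000, 2000, 3000, 4000],
})

df["Total"] = df["Salary"] + df["Bonus"]               # new column from existing ones
avg_salary = df.groupby("Department")["Salary"].mean() # IT 70000.0, Sales 45000.0
ranked = df.sort_values(by="Total", ascending=False)   # highest total first

print(avg_salary)
print(ranked)
```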

6. Merging and Joining


Rarely is all your data in one file. You can combine DataFrames just like SQL Joins:

● pd.merge(df1, df2, on='ID'): Similar to an SQL JOIN (an inner join by default).
● pd.concat([df1, df2]): Stacks DataFrames on top of each other (the default) or side by side (with axis=1).
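
The following sketch shows both ways of combining on two tiny invented tables that share an ID column:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Name": ["Asha", "Ben", "Chen"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Score": [88, 92, 75]})

joined = pd.merge(df1, df2, on="ID")                 # inner join keeps IDs 2 and 3
stacked = pd.concat([df1, df1], ignore_index=True)   # rows stacked vertically: 6 rows
side_by_side = pd.concat([df1, df2], axis=1)         # columns placed next to each other

print(joined)
print(stacked.shape, side_by_side.shape)             # (6, 2) (3, 4)
```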
Libraries in Python: Pandas: Benefits
● Handling of missing data (NaN): pandas simplifies working with datasets containing missing data, represented as NaN,
whether the data is numeric or non-numeric.
● GroupBy functionality: pandas provides efficient GroupBy operations, enabling users to perform split-apply-combine
workflows for data aggregation and transformation.
● DataFrame size mutability: Columns can be added or removed from DataFrames or higher-dimensional data structures.
● Automated and explicit data alignment: pandas ensures data alignment by automatically aligning objects like Series
and DataFrames to their labels, simplifying computations.
● Thorough documentation: The simplified API and fully documented features lower the learning curve for pandas. The
short, simple tutorials and code samples enable new users to quickly start coding.
● I/O tools: pandas supports importing and exporting data in various formats, such as CSV, Excel, SQL, and HDF5.
● Visualization-ready datasets: pandas offers straightforward plotting (through Matplotlib) that can be called directly on the DataFrame
object.
● Flexible reshaping and pivoting: pandas simplifies reshaping and pivoting to single function calls on datasets to further
prepare them for analysis or visualization.
● Hierarchical axis labeling: pandas supports hierarchical indexing, allowing users to manage multi-level data structures
within a single DataFrame.
● Time-series functionality: pandas includes multiple time-series analysis functions, offering tools for date-range
generation, frequency conversion, moving window calculations, and lag analysis.
Libraries in Python: Pandas: Real-World Applications
1. SQL Integration and Data Analysis
pandas integrates with SQL databases, enabling users to read from and write to SQL tables directly within the pandas Python API. By importing
data directly into a DataFrame, users can leverage pandas for data analysis while maintaining SQL for querying and managing datasets.
2. Visualization and Insights
pandas’ ability to clean, filter, and transform tabular data ensures that datasets are ready for advanced charting and plotting libraries, like
Matplotlib and Seaborn. For instance, pandas can handle missing data and reformat time-stamped time-series data to create meaningful trends
and insights.
3. Time-Series Analysis
pandas has numerous time-series functions for tasks like analyzing stock prices, weather patterns, and IoT sensor readings. Its functionality
includes date-range generation, frequency conversion, and advanced reshaping operations for temporal datasets.
4. Complex Data Manipulation
Tasks such as merging, joining, or concatenating multiple DataFrames are straightforward with pandas. The concat function, combined with
merge and join, enables combining disparate data sources (the older DataFrame.append method was removed in pandas 2.0). The library also provides GroupBy functionality to aggregate and
transform data, supporting advanced split-apply-combine techniques.
5. Tabular Data Transformation
pandas simplifies the transformation of tabular data with features like reshaping, pivoting, and hierarchical indexing. For example, users can
reshape their datasets to analyze sales performance across regions or pivot tables for a clearer view of customer behavior.
6. Handling Missing Data
Managing missing data is one of pandas' core strengths. Users can fill, interpolate, or drop NaN values directly within a DataFrame to create
clean and complete datasets for analysis or integration into machine learning pipelines.
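
To make the time-series features more concrete, here is a minimal sketch with an invented five-day price series. It shows date-range generation, a moving-window average and a one-day lag.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=5, freq="D")   # five consecutive days
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 15.0], index=dates)

rolling_mean = prices.rolling(window=3).mean()   # first two values are NaN
previous_day = prices.shift(1)                   # lag the series by one day
daily_change = prices - previous_day             # first value is NaN

print(rolling_mean)
print(daily_change)
```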
Libraries in Python: NUMPY
NumPy (Numerical Python) is a fundamental library for numerical computing in Python. It provides efficient multi-dimensional
array objects and various mathematical functions for handling large datasets, making it a critical tool for professionals in fields
that require heavy computation.

Key Features of NumPy


NumPy has several features that make it better suited to numerical work than plain Python lists.

● N-Dimensional Arrays: NumPy's core feature is ndarray, an N-dimensional array object that supports homogeneous
data types.
● Arrays with High Performance: Arrays are stored in contiguous memory locations, enabling faster computations
than Python lists.
● Broadcasting: This allows element-wise computations between arrays of different shapes. It simplifies operations on
arrays of various shapes by automatically aligning their dimensions without copying the smaller array.
● Vectorization: Eliminates the need for explicit Python loops by applying operations directly on entire arrays.
● Linear algebra: NumPy contains routines for linear algebra operations, such as matrix multiplication, decompositions,
and determinants.
Installing NumPy in Python
To begin using NumPy, you need to install it first. This can be done using the following pip command:
pip install numpy

Once installed, import the library with the alias np:


import numpy as np

1. The Core: The ndarray


The heart of NumPy is the N-dimensional array (or ndarray). Unlike standard Python lists, NumPy arrays are:

● Homogeneous: Every element must be the same data type (usually floats or ints).
● Contiguous in Memory: This allows for incredibly fast mathematical operations.

Array Dimensions
● 1D (Vector): [1, 2, 3]
● 2D (Matrix): [[1, 2], [3, 4]] (Rows and Columns)
● 3D (Tensor): A collection of matrices (used in Deep Learning and Image processing).
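
A short sketch that builds one array of each dimensionality and inspects its ndim and shape attributes. The 3D shape (2, 3, 4) is an arbitrary choice.

```python
import numpy as np

vector = np.array([1, 2, 3])           # 1D
matrix = np.array([[1, 2], [3, 4]])    # 2D: 2 rows, 2 columns
tensor = np.zeros((2, 3, 4))           # 3D: 2 matrices, each 3 x 4

print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (2, 2)
print(tensor.ndim, tensor.shape)   # 3 (2, 3, 4)
```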
2. Basic Setup & Creation
First, import it (conventionally as np):

Python:

import numpy as np

# Creating an array from a list
arr = np.array([1, 2, 3])

# Generating placeholders
zeros = np.zeros((3, 4))          # 3x4 matrix of 0s
ones = np.ones((2, 2))            # 2x2 matrix of 1s
range_arr = np.arange(0, 10, 2)   # [0, 2, 4, 6, 8]
linspace = np.linspace(0, 1, 5)   # 5 evenly spaced numbers from 0 to 1
3. The Power of Vectorization
In pure Python, if you want to multiply every number in a list by 2, you need a for loop or a list comprehension. In NumPy, you use vectorization: the operation is applied to the whole array at once.

Python:
data = np.array([1, 2, 3])

# No loops required!
result = data * 2 # [2, 4, 6]

4. Broadcasting
Broadcasting allows you to perform operations between arrays of different shapes, provided the shapes are compatible.

The Rule: NumPy compares the shapes dimension by dimension, starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal, or when one of
them is 1.
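
The rule can be checked with a small sketch. The arrays below are invented; the last case is deliberately incompatible to show the error NumPy raises.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,): matches the trailing dimension
col = np.array([[100], [200]])              # shape (2, 1): the 1 stretches across columns

plus_row = matrix + row   # [[11, 22, 33], [14, 25, 36]]
plus_col = matrix + col   # [[101, 102, 103], [204, 205, 206]]

try:
    matrix + np.array([1, 2])   # shape (2,) vs trailing dimension 3: not compatible
    compatible = True
except ValueError:
    compatible = False

print(plus_row, plus_col, compatible, sep="\n")
```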
5. Essential Math & Stats
NumPy provides a massive library of mathematical functions.

Example: Matrix Multiplication


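For two matrices A and B, where the number of columns of A equals the number of rows of B, the matrix product (dot product) is written as:

C = A · B
In NumPy: C = np.dot(A, B) or simply A @ B. (The * operator multiplies element by element instead.)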
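
A worked example with two invented 2x2 matrices, contrasting the matrix product with element-wise multiplication:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B              # matrix product: [[19, 22], [43, 50]]
C_dot = np.dot(A, B)   # same result for 2D arrays
elementwise = A * B    # element by element: [[5, 12], [21, 32]]

print(C)
print(elementwise)
```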
Libraries in Python: NUMPY: Benefits
1. Speed (Vectorization)

Python is an interpreted language, which makes element-by-element loops slow for math. NumPy's core routines are written in compiled languages (mainly C, with Fortran-based libraries
used for some linear algebra). When you add two NumPy arrays, the loop over the elements runs inside compiled code, often using the CPU's vector (SIMD) instructions, rather than being
interpreted one element at a time.

2. Memory Efficiency

A Python list is actually an array of pointers to objects. Each object has overhead (type info, reference counts). A NumPy array is a
contiguous block of raw memory.

● Python List: Roughly 28 bytes per small integer object on 64-bit CPython, plus an 8-byte pointer in the list.
● NumPy Array: A fixed size per element set by the dtype, for example 4 bytes for int32 or 8 bytes for int64.
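
You can measure the difference yourself. In the sketch below the int64 dtype is set explicitly, since the default integer size varies by platform; the size reported by sys.getsizeof depends on the Python build.

```python
import sys

import numpy as np

values = list(range(1000, 2000))         # 1000 Python int objects
arr = np.array(values, dtype=np.int64)   # the same numbers as 8-byte integers

per_int_object = sys.getsizeof(values[0])   # size of one Python int object in bytes
print(per_int_object)   # typically 28 on 64-bit CPython
print(arr.itemsize)     # 8 bytes per element
print(arr.nbytes)       # 8000 bytes for the whole data buffer
```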

3. The Ecosystem Foundation

Almost every data science library you’ve heard of is built on top of NumPy:

● Pandas uses NumPy for its underlying data storage.


● Scikit-learn uses NumPy for machine learning inputs.
● Matplotlib uses NumPy to plot coordinates.
Libraries in Python: NUMPY: Applications
NumPy isn't just for "math homework." It is the invisible engine behind many technologies you use daily.

1. Image Processing

Every digital image is just a grid of pixels. In NumPy, an image is a 3D Array (Height x Width x RGB Color Channels).

● Example: To make an image brighter, you add 50 to every value in the array (clipping the result to the valid 0-255 range for 8-bit images). To flip it, you reverse the array along one axis.
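
A minimal sketch of both operations on a tiny invented "image" of one row and two pixels. The widening to int16 before adding is an assumption made here to avoid 8-bit wrap-around; real image libraries offer their own helpers.

```python
import numpy as np

# Height 1, width 2, 3 color channels, stored as 8-bit unsigned integers.
image = np.array([[[0, 100, 250], [30, 60, 90]]], dtype=np.uint8)

# Widen first so 250 + 50 does not wrap around, then clip back into 0..255.
brighter = np.clip(image.astype(np.int16) + 50, 0, 255).astype(np.uint8)

# Reverse the width axis to mirror the image left to right.
flipped = image[:, ::-1]

print(brighter.tolist())   # [[[50, 150, 255], [80, 110, 140]]]
print(flipped.tolist())    # [[[30, 60, 90], [0, 100, 250]]]
```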

2. Machine Learning & Deep Learning

Neural networks are essentially massive chains of linear algebra.

● Weights and Biases: These are stored as multi-dimensional arrays (tensors), either NumPy arrays or the NumPy-like tensor types of deep learning frameworks.
● Backpropagation: The math used to "train" AI involves matrix calculus, which is handled via NumPy-like logic.

3. Financial Modeling

Banks and hedge funds use NumPy to run Monte Carlo simulations. Because they need to simulate thousands of potential market scenarios per second, the speed of vectorized
NumPy arrays is non-negotiable for risk management.
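
As a toy illustration of a vectorized Monte Carlo simulation, the sketch below simulates one year of daily returns for many scenarios at once. Every number in it (starting value, mean return, volatility, scenario count) is an assumption chosen for the example, and normally distributed returns are a simplification, not a realistic market model.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded so the run is repeatable

n_scenarios = 10_000
n_days = 252            # assumed number of trading days in a year
start_value = 100.0

# One row per scenario, one column per day; no Python loop needed.
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=(n_scenarios, n_days))
final_values = start_value * np.prod(1.0 + daily_returns, axis=1)

# Loss that is exceeded in only 5% of the simulated scenarios.
value_at_risk_95 = start_value - np.percentile(final_values, 5)

print(final_values.mean())
print(value_at_risk_95)
```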

4. Scientific Simulations

From weather forecasting to fluid dynamics (how air flows over an airplane wing), scientists use NumPy to solve complex differential equations.
Libraries in Python: SCIPY
SciPy (Scientific Python) is a foundational library for scientific and technical computing in Python. It builds
on NumPy and provides a collection of algorithms and high-level commands for manipulating and analyzing
data.
What is SciPy Used for?
With SciPy, you can perform a wide range of scientific tasks, including:
● Solving integrals and differential equations
● Optimizing mathematical functions
● Performing statistical analysis
● Processing signal and image data
● Interpolating and fitting functions to data
● Managing sparse matrices and linear systems
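
Two of these tasks can be sketched in a few lines. The functions being integrated and minimized are arbitrary examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Minimize (x - 3)^2 + 1; the exact minimum is at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

print(area)       # close to 2.0
print(result.x)   # close to 3.0
```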

Which Language is SciPy Written in?

SciPy is written largely in Python, with its performance-critical routines implemented in compiled languages such as C, C++, Fortran and Cython.


Pros of using SciPy:
(1) High-level commands and classes for manipulating data.
(2) It turns an interactive Python session into a robust environment for data processing and prototyping.
(3) Because it is built on Python, it can be combined with Python tools for parallel programming and with web and database routines.
Con of using SciPy:
• SciPy does not provide any plotting functions because its focus is on numerical objects and algorithms; plots are usually drawn with Matplotlib.
What is Data Visualization
Data visualization uses charts, graphs and maps to present information clearly and simply. It turns complex
data into visuals that are easy to understand. With large amounts of data in every industry, visualization
helps spot patterns and trends quickly, leading to faster and smarter decisions.
Common Types of Data Visualization
There are various types of visualizations where each has a unique purpose in data representation. Here are
the most common types:

1. Charts and Graphs: They are used to visualize data, with charts comparing data points across
categories or showing trends over time and graphs analyzing relationships between variables to
identify correlations, trends and outliers. Examples: Bar Charts, Line Charts, Pie Charts, Scatter
Plots, Histograms, Box Plots.
2. Maps: They are used to display geographical data which provides spatial context to trends and
patterns. Examples: Geographic Maps, Heat Maps
3. Dashboards: They combine multiple visualizations into a single interface which provides real-time
insights and interactive features for users to explore data.
Importance of Data Visualization
Data visualization is essential for understanding and communicating information effectively. Here are some
key reasons why it's important:
1. Simplifies Complex Data: It turns large and complicated data into visual formats like charts and
graphs, making the information easier to understand.
2. Reveals Patterns and Trends: It helps identify trends, relationships and patterns that are not easily
seen in raw data or tables.
3. Saves Time: Visuals allow quicker interpretation of data, helping users spot key information at a
glance instead of manually scanning through numbers.
4. Improves Communication: It makes it easier to explain data insights to others, especially those who
may not be familiar with the technical details.
5. Tells a Clear Story: Data visuals guide the audience through the information step-by-step, making
it easier to reach conclusions and make informed decisions.
Real-World Use Cases for Data Visualization
Data visualization is used across various industries to improve decision-making and drive results. Here are a
few examples:

1. Business Analytics: Used to monitor company performance, track KPIs and make data-driven
decisions by visualizing trends, sales and customer metrics.
2. Healthcare: Helps in analyzing patient records, tracking disease outbreaks and managing hospital
operations through easy-to-read charts and dashboards.
3. Sports: Used to visualize player statistics, team performance and match outcomes, helping coaches
and analysts improve strategies and training plans.
4. Retail and E-commerce: Enables tracking of sales, customer preferences and inventory levels,
helping businesses adjust stock and marketing efforts effectively.
Challenges in Data Visualization
1. Data Quality: Accuracy of visualizations depends on the quality of the data. If the data is inaccurate
or incomplete, the insights from the visualization will be misleading.
2. Over-Simplification: Simplifying data too much can lead to important details being lost like using a
pie chart that oversimplifies complex relationships between categories.
3. Choosing the Right Visualization: Using the wrong type of visualization can distort the message.
For example, a pie chart might not work well with many categories which leads to confusion.
4. Overload of Information: Too much information in a visualization can overwhelm viewers. It's
important to focus on key data points and avoid clutter.
Best Practices for Effective Data Visualization
To ensure our visualizations are impactful and easy to understand we follow these best practices:

1. Audience-Centric Design: Tailor visualizations to our audience’s knowledge. A technical audience
may need detailed graphs while a general audience benefits from simpler charts.
2. Design Clarity and Consistency: Choose the right chart for our data and keep the design clean with
consistent colors, fonts and labels and also avoid clutter to ensure clarity.
3. Provide Context: Always provide context by including labels, titles and data source
acknowledgments. This helps viewers to understand the significance of the data and builds trust in
the results.
4. Interactive and Accessible Design: Make visualizations interactive with features like tooltips and
filters and ensure accessibility for all users regardless of device or visual needs.
What is Data Visualization
Data visualization is the process of presenting data in the form of graphs or
charts. It makes large and complex data sets much easier to understand. It
helps decision-makers make decisions efficiently and
identify new trends and patterns.
• It is also used in high-level data analysis for Machine Learning and Exploratory
Data Analysis (EDA). Data visualization can be done with various tools such as
Tableau, Power BI and Python.
Data Visualization using Matplotlib in Python
Matplotlib is a widely used Python library for creating static, animated and interactive data visualizations. It is
built on top of NumPy and can easily handle large datasets when creating various types of plots such as
line charts, bar charts, scatter plots, etc.
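
A minimal line-chart sketch. The "Agg" backend is selected here only so the script also runs on machines without a display, and the output file name sine_wave.png is a hypothetical choice.

```python
import matplotlib

matplotlib.use("Agg")   # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)   # 100 points from 0 to 2*pi

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")   # line chart
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("Sine wave")
ax.legend()

fig.savefig("sine_wave.png")   # writes the chart to an image file
```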