Python Chapter 7

This document provides an overview of NumPy, a fundamental library for scientific computing in Python, highlighting its efficiency, vectorization, and broadcasting capabilities. It details various methods for creating NumPy arrays, understanding data types, and accessing elements, along with attributes and operations for statistical calculations. Additionally, it covers array indexing, copying, and advanced iteration techniques, making it a comprehensive guide for utilizing NumPy in data science workflows.


Compiled by: Rakesh Shrestha (sthrakeshrestha)

Unit 7: Common Python Libraries

NumPy
- pip install numpy

Introduction to NumPy

• NumPy (Numerical Python) is a fundamental library for scientific computing in
Python.
• It provides powerful features for working with multidimensional arrays and
matrices.
• NumPy offers significant advantages over standard Python lists for numerical
computations, including:
• Efficiency: NumPy stores arrays in contiguous blocks of memory, enabling
optimized operations compared to Python lists. This translates to faster
calculations and improved performance, especially when dealing with large
datasets.
• Vectorization: NumPy excels at vectorized operations, a technique where operations
are applied element-wise across entire arrays at once. The element loop runs in
compiled code rather than in the Python interpreter, which is typically much faster
than looping through elements individually.
• Broadcasting: NumPy's broadcasting mechanism is a powerful feature that allows
you to perform operations on arrays of different shapes under certain conditions. It
automatically expands smaller arrays to match the dimensions of larger arrays for
element-wise calculations. This simplifies your code and enhances its flexibility.
• Rich Ecosystem: NumPy serves as the foundation for many popular data science
libraries. These include pandas (data analysis and manipulation), SciPy (advanced
scientific computing algorithms), and Matplotlib (visualization). This rich ecosystem
allows you to seamlessly integrate NumPy within your data science workflows for
comprehensive analysis tasks.
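A minimal sketch of vectorization and broadcasting, using made-up values (the array names and the 1.13 tax factor are illustrative assumptions, not taken from the slides):

```python
import numpy as np

# Vectorization: one expression replaces an explicit Python loop.
prices = np.array([10.0, 20.0, 30.0])
with_tax = prices * 1.13             # applied to every element at once

# Broadcasting: a (3, 1) column and a (4,) row combine into a (3, 4) grid.
col = np.array([[0], [10], [20]])    # shape (3, 1)
row = np.array([1, 2, 3, 4])         # shape (4,)
grid = col + row

print(with_tax)
print(grid.shape)   # (3, 4)
```

The smaller operands are stretched virtually along the missing dimensions, so no explicit tiling is needed.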

Creating NumPy Arrays

• NumPy offers several ways to create arrays, providing flexibility for various use
cases.
• From Python Lists: You can convert a Python list directly into a NumPy array using
the np.array() function. This is a convenient approach when the data is already
in a list format.
• From Scratch: You can create an array directly by passing the elements in
square brackets [] to np.array(). This method is useful for defining small arrays or arrays with
specific values.
• With Specific Data Types: NumPy allows us to define the data type of the
elements in the array using the dtype argument within the np.array() function.
This ensures efficient memory usage and optimized operations for the specific
data types we're working with.
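The three approaches above could look like the following sketch. The variable names and values are illustrative assumptions:

```python
import numpy as np

values = [1, 2, 3, 4]                     # an ordinary Python list
a = np.array(values)                      # from a list, dtype inferred
b = np.array([[1.5, 2.5], [3.5, 4.5]])    # from scratch, nested lists give a 2D array
c = np.array(values, dtype=np.float32)    # with an explicit data type

print(a.dtype, b.shape, c.dtype)
```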
• Using NumPy's Built-in Functions:


• NumPy provides various functions for generating arrays with specific
properties:
• np.zeros(shape, dtype=float, order='C'): Creates an array filled with zeros.
• shape: A tuple representing the dimensions of the desired array (e.g., (3, 4) for a 3x4
matrix).
• dtype (optional): The data type of the elements (defaults to float).
• order (optional): Controls the memory layout ('C' for row-major, the default, or 'F' for column-major).
• np.ones(shape, dtype=float): Creates an array filled with ones.
• np.full(shape, fill_value, dtype=None): Creates an array filled with a specific
value.
• shape: The desired shape of the array.
• fill_value: The value to fill the array with.
• dtype (optional): The data type of the elements.
• np.empty(shape, dtype=float): Creates an array with uninitialized elements (whatever
values happen to be in memory).

• np.arange(start, stop, step=1, dtype=None): Generates an array with evenly spaced values
within a given range (exclusive of stop).
• start: Starting value (inclusive).
• stop: Ending value (exclusive).
• step (optional): Interval between elements (defaults to 1).
• dtype (optional): The data type of the elements.
• np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None): Creates an
array with a specified number of evenly spaced values between a start and stop value.
• start: Starting value (inclusive).
• stop: Ending value (inclusive by default).
• num: Number of elements to generate.
• endpoint (optional): Include the stop value in the array (default is True).
• retstep (optional): If True, returns the step size between elements as a separate scalar
(default is False).
• dtype (optional): The data type of the elements.
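A short sketch exercising the built-in constructors listed above. Shapes and values are arbitrary choices for illustration:

```python
import numpy as np

z = np.zeros((2, 3))                  # 2x3 array of 0.0
o = np.ones((2, 3), dtype=int)        # 2x3 array of 1
f = np.full((2, 2), 7)                # 2x2 array filled with 7
e = np.empty((2, 2))                  # contents are arbitrary until assigned
r = np.arange(0, 10, 2)               # [0 2 4 6 8], stop is excluded
pts, step = np.linspace(0.0, 1.0, 5, retstep=True)  # 5 points, stop included

print(r)
print(pts, step)
```

Note that np.arange excludes the stop value while np.linspace includes it by default.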
• Reading Arrays from Files:


• np.loadtxt(fname, dtype=float, delimiter=None, converters=None, skiprows=0,
usecols=None): Loads an array from a text file.
• fname: The filename of the text file.
• delimiter (optional): The character separating values in the file (default is any whitespace; pass ',' for CSV files).
• skiprows (optional): Number of header rows to skip (default is 0).
• usecols (optional): A sequence of column indices to include (default is all).
• dtype (optional): The data type of the elements in the loaded array (default is float).
• converters (optional): A dictionary mapping column indices to conversion functions.
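As a hedged illustration, the sketch below reads a small comma-separated table. The CSV content is hypothetical, and an in-memory io.StringIO stands in for a real file so the example is self-contained:

```python
import io
import numpy as np

# Hypothetical CSV content; a real call would pass a filename instead.
text = io.StringIO("id,height,weight\n1,170.5,65.0\n2,182.0,80.5\n")

# Skip the header row and keep only the height and weight columns.
data = np.loadtxt(text, delimiter=",", skiprows=1, usecols=(1, 2))
print(data.shape)   # (2, 2)
```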

NumPy array dimensions

• NumPy arrays are multidimensional structures, allowing us to represent data in
more than one dimension.
• Dimensions refer to the number of axes or "directions" an array has.
• A 1D array (vector) has one dimension (often visualized as a row or column).
• A 2D array (matrix) has two dimensions (rows and columns).
• Arrays can have even higher dimensions for representing complex data
structures (3D for volumes, etc.).
• Accessing Elements by Index: Each element in a NumPy array is uniquely
identified by a combination of indices, corresponding to the array's dimensions.
• For a 1D array, a single index specifies the element's position.
• In 2D arrays, we use two indices to access elements, representing the row and
column positions.
• Indexing extends to higher dimensions as well.
• Shape Attribute: The shape attribute of a NumPy array is a tuple that reveals the
number of elements along each dimension.
• For example, a 2D array with 3 rows and 4 columns would have a shape of (3, 4).

Understanding NumPy Data Types

• Selecting appropriate data types for NumPy arrays is crucial for efficient
memory usage, optimized calculations, and accurate results.
• Common NumPy data types:
• Numeric Data Types:
• Boolean Data Type:
• String Data Type:
• Other Data Types:
• User-Defined Data Types (UDTs):
• Numeric Data Types:


• int (integer): Represents whole numbers (positive, negative, or zero).
• There are various subtypes depending on the number of bits used to store the data:
• int8: Stores integers from -128 to 127 (8 bits)
• int16: Stores integers from -32768 to 32767 (16 bits)
• int32: Stores integers from -2147483648 to 2147483647 (32 bits). The default integer
type is platform-dependent: int64 on most 64-bit systems, while int32 was the default
on Windows before NumPy 2.0.
• int64: Stores integers from -9223372036854775808 to 9223372036854775807 (64 bits)
• float (floating-point): Stores numbers with decimal precision.
• There are two common subtypes:
• float32: Stores single-precision floating-point numbers (approximately 7 decimal
digits of precision)
• float64: Stores double-precision floating-point numbers (approximately 15 decimal
digits of precision). This is the default choice for floating-point numbers in NumPy.
• Use float when you need decimal values or calculations involving real
numbers.
• complex: Represents complex numbers with a real and imaginary part.
• It can be further subtyped based on the underlying numeric representation (e.g.,
complex64, complex128).
• Use complex for calculations involving imaginary components.
• Boolean Data Type:


• bool: Represents logical values (True or False).
• Use bool for storing logical conditions or performing Boolean operations.

• String Data Type:


• str: Stores sequences of characters (text data).
• Utilize str for textual data or when we need to represent non-numeric information within
arrays.

• Other Data Types:


• object: This is a generic data type that can hold any Python object.
• However, it's generally less efficient than using more specific data types and can lead to
performance issues in some NumPy operations.
• User-Defined Data Types (UDTs):


• NumPy allows defining custom data types using structured arrays or record arrays.
• These enable us to combine different data types within a single array element,
creating more complex data structures.
• UDTs offer flexibility for representing complex data but require careful design and
memory management.
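A structured array is one way to build such a user-defined type. In this sketch the field names, widths, and records are illustrative assumptions:

```python
import numpy as np

# Each element holds a 10-character string, a 32-bit integer, and a 64-bit float.
person = np.dtype([("name", "U10"), ("age", np.int32), ("height", np.float64)])
people = np.array([("Asha", 31, 1.62), ("Bikram", 45, 1.75)], dtype=person)

print(people["name"])                        # one field across all records
print(people[people["age"] > 40]["name"])    # filter records by a field
```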
• Choosing the Right Data Type:


• Selecting an appropriate data type is crucial for:
• Memory Efficiency: Smaller data types use less memory, which is important
for handling large datasets.
• Performance: NumPy operations are optimized for specific data types.
Using the right type can significantly improve performance.
• Accuracy: Choose a data type with sufficient precision to represent your
data accurately.
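The memory and accuracy trade-offs can be checked directly. This is a small sketch with an arbitrary array length and test value:

```python
import numpy as np

n = 1_000_000
as_int64 = np.zeros(n, dtype=np.int64)
as_int8 = np.zeros(n, dtype=np.int8)
print(as_int64.nbytes, as_int8.nbytes)   # 8000000 1000000

# float32 keeps roughly 7 significant digits, float64 roughly 15.
x32 = np.float32(0.1234567891234)
x64 = np.float64(0.1234567891234)
print(x32, x64)
```

The same one million values take eight times less memory as int8, provided every value fits in the range -128 to 127.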

Array Attributes in NumPy

• NumPy arrays come equipped with various attributes that provide valuable insights into their
structure, data types, and memory usage.
• Key Attributes:
• ndim: Represents the number of dimensions (axes) in the array.
• A 1D array has ndim = 1.
• A 2D array (matrix) has ndim = 2.
• Higher-dimensional arrays have correspondingly higher ndim values.
• shape: This is a tuple that specifies the size of the array along each dimension.
• For a 2D array with 3 rows and 4 columns, shape would be (3, 4).
• size: Represents the total number of elements in the array.
• It's the product of elements in the shape tuple.
• dtype: This attribute reveals the data type of the elements in the array (e.g., float64, int32, str).
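A quick way to inspect these attributes, sketched on an arbitrary 3x4 array (itemsize and nbytes are two further attributes, included here for the memory side):

```python
import numpy as np

m = np.arange(12, dtype=np.float64).reshape(3, 4)

print(m.ndim)      # 2
print(m.shape)     # (3, 4)
print(m.size)      # 12, the product of the shape entries
print(m.dtype)     # float64
print(m.itemsize)  # 8 bytes per element
print(m.nbytes)    # 96 bytes in total
```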

Understanding Array Indexing in NumPy

• Effectively accessing and manipulating elements within NumPy arrays centers
on understanding indexing.
• Indexing Basics:
• Each element in a NumPy array has a unique position identified by its indices.
• Indices start from 0 (zero-based indexing).
• A 1D array (vector) has a single index for each element.
• A 2D array (matrix) requires two indices to specify the row and column of an element.
• Higher-dimensional arrays follow the same principle, with an additional index for
each dimension.
• Accessing Elements:
• You can access individual elements using square brackets [] and their
corresponding indices.
• For a 1D array: arr[index] (e.g., arr[2] accesses the third element).
• For a 2D array: arr[row_index, column_index] (e.g., arr[1, 2] accesses the element
in the second row and third column, since indices start at 0).
• This lets you retrieve specific values from an array.
• Slicing for Subarrays:


• Slicing extracts a subarray from the original array without modifying the
original data.
• Slices use the syntax arr[start:stop:step].
• start (optional): The index where the slice begins (defaults to 0).
• stop (exclusive): The index up to which the slice extends (but not including).
• step (optional): The interval between elements in the sliced subarray (defaults to 1).

• Slicing can be used to select specific rows, columns, or even portions of the array
along any dimension.
• Out-of-bounds integer indexing raises an IndexError; out-of-range slice bounds are clipped to the array instead.
• Slicing creates a view of the original data, not a copy.
• Changes to the sliced subarray will reflect in the original array.
• For copying a subarray, use .copy() after slicing (e.g., sliced_arr =
original_arr[start:stop].copy()).
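The indexing and slicing rules can be tried on two small arrays. This is a sketch with arbitrary values:

```python
import numpy as np

arr = np.arange(10)                # [0 1 2 3 4 5 6 7 8 9]
m = np.arange(12).reshape(3, 4)    # rows [0..3], [4..7], [8..11]

print(arr[2])         # 2, the third element
print(arr[2:8:2])     # [2 4 6]
print(m[1, 2])        # 6, second row and third column
print(m[:, 1])        # [1 5 9], every row of the second column
print(m[0:2, 1:3])    # rows 0-1 and columns 1-2

try:
    arr[10]           # index 10 does not exist in a 10-element array
except IndexError as exc:
    print("IndexError:", exc)

print(arr[8:100])     # [8 9], slice bounds are clipped rather than raising
```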

Array Copy vs. View

• Copy:
• A copy is a new array with its own data in memory.
• Modifications to the copy do not affect the original array.
• Created with .copy()
• View:
• A view is a new array object that refers to the original array's data.
• Changes made to a view are reflected in the original array.
• Created with .view(); basic slicing also returns a view.
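The difference is easy to observe. In this sketch the names original, view, and dup are illustrative, and np.shares_memory is used only to confirm which arrays share data:

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])

view = original[1:4]           # basic slicing returns a view
view[0] = 99                   # also changes original[1]

dup = original[1:4].copy()     # independent data
dup[1] = -1                    # original is untouched

print(original)                          # [ 1 99  3  4  5]
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, dup))   # False
```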

Iterating over NumPy

• Iterating over NumPy arrays is essential for many operations.


• While nested loops might be familiar, they can be slow for large arrays.
• Basic Iteration:
• Using for loops.
• Nested loops for multi-dimensional arrays.

• Vectorized Operations: Efficient and faster than loops.


• Advanced Iteration:
• Using np.nditer:
• Provides an efficient and flexible multi-dimensional iterator.
• Allows for advanced control, such as broadcasting, iteration order, and multi-
dimensional indexing.
• Suitable for more complex iteration needs.
• Using ndarray.flat:
• Returns a 1D iterator over the array.
• Simple and easy to use.
• Does not offer advanced control over iteration.
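The iteration styles above can be compared side by side. This is a minimal sketch on a small 2x3 array with made-up values:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# Basic iteration: nested loops over rows, then elements.
total_loop = 0
for row in m:
    for value in row:
        total_loop += value

# Vectorized equivalent of the loop above.
total_vec = m.sum()

# np.nditer visits every element; the multi_index flag exposes its position.
positions = []
it = np.nditer(m, flags=["multi_index"])
for value in it:
    positions.append((it.multi_index, int(value)))

# ndarray.flat is a simple 1D iterator over the same elements.
flat_values = [int(v) for v in m.flat]

print(total_loop, total_vec)
print(positions[0], positions[-1])
print(flat_values)
```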

Sorting and Searching Arrays

• NumPy provides efficient tools for sorting and searching arrays.


• Sorting:
• np.sort() for a sorted copy
• When you need a sorted version of an array without modifying the original, use np.sort().
• np.argsort() for indices of sorted elements
• This is useful for rearranging elements based on their sorted order.
• Searching:
• np.where() for finding elements based on condition
• It returns the indices where the condition is True.
• np.searchsorted() for finding insertion indices
• If you want to find where to insert new elements to maintain sorted order, use
np.searchsorted().
• This is helpful for tasks like merging sorted arrays or finding intervals.
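A short sketch of the sorting and searching functions, using an arbitrary scores array:

```python
import numpy as np

scores = np.array([40, 10, 30, 20])

sorted_scores = np.sort(scores)             # [10 20 30 40], scores is unchanged
order = np.argsort(scores)                  # [1 3 2 0], indices that would sort scores
high = np.where(scores > 25)[0]             # [0 2], positions where the condition holds
slot = np.searchsorted(sorted_scores, 25)   # 2, where 25 would be inserted

print(sorted_scores, order, high, slot)
```

np.where returns a tuple with one index array per dimension, which is why the sketch takes element [0] for this 1D case.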

Statistical Calculations

• NumPy offers a rich set of functions for performing statistical calculations on
arrays.
• Central tendency:
• np.mean() calculates the average of the array elements.
• np.median() finds the middle value when the data is sorted.

• Dispersion:
• np.std() computes the standard deviation, measuring how spread out the data is.
• np.var() calculates the variance, which is the square of the standard deviation.
• Range:
• np.min() and np.max() find the smallest and largest values in the array.

• Percentiles:
• np.percentile() calculates specific percentiles of the data, indicating the value below
which a given percentage of observations fall.
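The functions above, applied to a small made-up sample. Note that np.std and np.var compute the population versions by default (ddof=0):

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.mean(data))                  # 5.0
print(np.median(data))                # 4.5
print(np.std(data))                   # 2.0
print(np.var(data))                   # 4.0
print(np.min(data), np.max(data))     # 2 9
print(np.percentile(data, [25, 75]))  # [4.  5.5]
```

Passing ddof=1 to np.std or np.var gives the sample estimates instead.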

Pandas
- pip install pandas
• Pandas is an open-source data analysis and manipulation tool, built on top of NumPy.
• The name "Pandas" is derived from "Panel Data," a term for multidimensional
structured data sets, and "Python Data Analysis."
• Created by Wes McKinney in 2008.
• It provides data structures and functions needed to work with structured data
seamlessly.
• Pandas is essential for data science and machine learning tasks due to its powerful
data manipulation capabilities.
• Pandas is well-suited for working with tabular data, such as spreadsheets or SQL
tables.
• It is built on top of the NumPy library which means that a lot of the structures of
NumPy are used or replicated in Pandas.
• Data set cleaning, merging, and joining.
• Easy handling of missing data (represented as NaN) in floating point as well as
non-floating point data.
• Columns can be inserted and deleted from DataFrame and higher-dimensional
objects.
• Powerful group by functionality for performing split-apply-combine operations
on data sets.
• Data Visualization.

Data Structures in Pandas

• Data Structures in Pandas:


• Series: A one-dimensional labeled array, similar to a column in a spreadsheet.
• DataFrame: A two-dimensional table of data with rows and columns, similar to a
spreadsheet or SQL table
• Hashable Objects:
• A hashable object can be used as a dictionary key because its hash value remains
constant during its lifetime.

• Requirements for Hashability:


• It must have a __hash__() method that computes its hash value.
• It must have an __eq__() method to compare equality with other objects.
• Immutable objects (whose value cannot change after creation) are usually hashable.
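Hashability can be checked with the built-in hash() function. In this sketch the coordinates and the city label are illustrative values:

```python
point = (27.7, 85.3)              # a tuple is immutable and hashable
lookup = {point: "Kathmandu"}     # so it can serve as a dictionary key

try:
    hash([27.7, 85.3])            # a list is mutable, so hashing it fails
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

print(lookup[(27.7, 85.3)])
print(list_is_hashable)           # False
```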
• Pandas Series
• A one-dimensional labeled array capable of holding data of any type (integer, string,
float, Python objects, etc.).
• Axis labels are collectively called indexes.
• Similar to a column in an Excel sheet.
• Labels need not be unique but must be of a hashable type.
• Features
• Supports both integer and label-based indexing.
• Provides various methods for operations involving the index.

• Creating a Series
• Loaded from existing storage (SQL database, CSV file, Excel file).
• Can be created from lists, dictionaries, scalar values, etc.
• Pandas DataFrame
• A two-dimensional data structure with labeled axes (rows and columns).
• Size-mutable, potentially heterogeneous tabular data structure.
• Components
• Data: The actual data stored in the DataFrame.
• Rows: Represent the index (labels) for the rows.
• Columns: Represent the labels for the columns.

• Creating DataFrame
• Loaded from existing storage (SQL database, CSV file, Excel file).
• Can be created from lists, dictionaries, a list of dictionaries, etc.
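A minimal sketch of creating both structures from in-memory Python objects. The names and values are made up for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])    # from a list, with labels
s2 = pd.Series({"x": 1.5, "y": 2.5})                   # from a dictionary

df = pd.DataFrame({"Name": ["Asha", "Bikram", "Chandra"],
                   "Age": [31, 45, 28]})               # from a dictionary of lists
df2 = pd.DataFrame([{"Name": "Dipa", "Age": 39}])      # from a list of dictionaries

print(s["b"])      # 20
print(df.shape)    # (3, 2)
```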

Dealing with Rows and Columns

• Column Selection
• Select columns by their name.
• Example: df['column_name']

• Row Selection
• Using df.loc[] to retrieve rows by label.
• Using df.iloc[] to retrieve rows by integer location.

.loc[] and .iloc[]

• .loc[]
• .loc[] is a label-based data selection method.
• It is used to select rows and columns based on their labels.
• When using .loc[], the start and end labels in slicing are both inclusive.

• .iloc[]
• .iloc[] is an integer-location based indexing method.
• It is used to select rows and columns based on their integer positions.
• When using .iloc[], the end index in slicing is exclusive.
.loc[] | .iloc[]
• Label-based | • Integer-location based
• Uses row and column labels | • Uses row and column integer positions
• Inclusive of both start and end labels | • Exclusive of end position
• Raises KeyError for invalid labels | • Raises IndexError for invalid positions
• When labels are known and meaningful | • When positions are known or for general slicing
• df.loc[0, 'Name'], df.loc[:, 'Age'] | • df.iloc[0, 1], df.iloc[:, 1]

Indexing and Selecting Data

• Indexing means selecting particular rows and columns from a DataFrame.


• Can select all rows and some columns, some rows and all columns, or some of each.
• Indexing with []
• Used to refer to columns.
• Example: df['column_name']
• Indexing with .loc[]
• Selects data by the label of the rows and columns.
• Example: df.loc[row_label, column_label]
• Indexing with .iloc[]
• Selects data by integer location.
• Example: df.iloc[row_index, column_index]
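The three indexing styles can be compared on one small DataFrame. The row labels r1 to r4 and the data are illustrative assumptions:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Asha", "Bikram", "Chandra", "Dipa"], "Age": [31, 45, 28, 39]},
    index=["r1", "r2", "r3", "r4"],
)

ages = df["Age"]                    # [] selects a column by name
by_label = df.loc["r2", "Name"]     # row label and column label
by_position = df.iloc[1, 0]         # same cell, by integer position

label_slice = df.loc["r1":"r3"]     # 3 rows, the end label is included
position_slice = df.iloc[0:2]       # 2 rows, the end position is excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```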

Slicing and Subsetting


Slicing
• Slicing involves selecting a range of rows and columns.
• .loc[] slices using labels, while .iloc[] slices using integer positions.
Subsetting
• Refers to the process of selecting a subset of data from a larger dataset or
collection.
• Subsetting DataFrames can be done with conditions using boolean indexing.

Types of Subsetting
• Index-Based Subsetting
• Using indices or positions to select elements.
• This is common in lists and arrays.
• Example: my_list[2:5]

• Label-Based Subsetting
• Using labels to select elements, rows, or columns.
• This is common in Pandas DataFrames.
• Example: df.loc['row_label', 'column_label']

• Conditional Subsetting
• Selecting elements based on a condition or criteria.
• This is commonly used in DataFrames with boolean indexing.
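Conditional subsetting with boolean indexing might look like this sketch. The column names and rows are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Asha", "Bikram", "Chandra", "Dipa"],
    "Age": [31, 45, 28, 39],
    "City": ["Kathmandu", "Pokhara", "Pokhara", "Kathmandu"],
})

over_30 = df[df["Age"] > 30]                                  # one condition
both = df[(df["Age"] > 30) & (df["City"] == "Pokhara")]       # two combined with &

print(over_30["Name"].tolist())
print(both["Name"].tolist())
```

Each condition needs its own parentheses because & binds more tightly than the comparison operators.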

Pandas Merging, Joining, and Concatenating
• Methods for joining, merging, and concatenating DataFrames include pd.merge(),
DataFrame.join(), and pd.concat().
• Concatenating DataFrames
• Concatenating DataFrame using .concat()
• Concatenate DataFrames and return a new DataFrame.
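A minimal sketch of the three operations on small made-up tables. The frame names and the id key column are illustrative assumptions:

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Asha", "Bikram"]})
more = pd.DataFrame({"id": [3], "name": ["Chandra"]})
ages = pd.DataFrame({"id": [1, 2, 3], "age": [31, 45, 28]})

stacked = pd.concat([left, more], ignore_index=True)           # append rows
merged = pd.merge(stacked, ages, on="id")                      # match rows on a column
joined = stacked.set_index("id").join(ages.set_index("id"))    # match rows on the index

print(stacked.shape)
print(merged)
```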

Matplotlib
- pip install matplotlib

Matplotlib: Introduction
• Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy.
• Powerful Python library for creating visualizations
• Cross-platform compatibility
• It provides an object-oriented API for embedding plots into applications. This allows
embedding plots into applications using general-purpose GUI toolkits like Tkinter,
wxPython
• Commonly used for data visualization in scientific and analytic contexts.
• Basic structure of a Matplotlib program
• Importing necessary libraries (matplotlib.pyplot, numpy)
• Creating plots
• Displaying plots
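• Importing necessary libraries (matplotlib.pyplot, numpy)
import matplotlib.pyplot as plt
import numpy as np
• Creating plots
x = np.linspace(0, 10, 100)  # 100 evenly spaced x values from 0 to 10
y = np.sin(x)  # assumption: the function name was lost in extraction; np.sin is used for illustration
plt.plot(x, y)
• Displaying plots
plt.show()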

Marker
• Symbols used to represent data points
• Customizing data point appearance
• Different types of markers
• 'o': circle
• 's': square
• '^': triangle (pointing up)
• 'd': thin diamond ('D' for a regular diamond)
• '*': star
• '+': plus
• 'x': cross
• How to customize markers
• Color, size, edge color, marker style

Lines
• Line plots are perfect for showing how data changes over time or to represent
relationships between variables.
• The matplotlib.lines module contains the Line2D class, which can draw lines with a
variety of line styles, markers, and colors.
• Visualizing trends and relationships
• Creating a basic line plot
• Customizing line styles, colors, and markers
• Solid Line: '-' or 'solid'
• Dashed Line: '--' or 'dashed'
• Dotted Line: ':' or 'dotted'
• Dash-Dot Line: '-.' or 'dashdot'
• Custom
• Loosely Dotted: (0, (1, 10))
• Densely Dashed: (0, (5, 1))
• Dash-Dot-Dotted: (0, (3, 5, 1, 5, 1, 5))
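Markers and line styles can be combined in a single plot call. The sketch below assumes a headless Agg backend so it runs without a display, and the data and style choices are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# Named line style, circle markers with customized size and edge color.
(line1,) = ax.plot(x, np.sin(x), linestyle="--", marker="o", color="tab:blue",
                   markersize=6, markeredgecolor="black", label="dashed, circles")

# Custom dash tuple (offset, (on, off)) for a densely dashed line, triangle markers.
(line2,) = ax.plot(x, np.cos(x), linestyle=(0, (5, 1)), marker="^", color="C1",
                   label="densely dashed, triangles")
ax.legend()
# In an interactive session, plt.show() would display the figure.
```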

Color
• Color is a powerful tool for making your visualizations stand out.
• Enhancing visual appeal with color
• Color palettes and schemes
• Applying color to plots
• Matplotlib offers various ways to control colors in the plots.
• Use colormaps like 'viridis', 'plasma', 'inferno', and 'magma' for scientific
visualizations.
• Consider color blindness when choosing colors.
• In almost all places in matplotlib where a color can be specified by the user it can
be provided as:
• an RGB or RGBA tuple of float values in [0, 1] (e.g., (0.1, 0.2, 0.5) or (0.1, 0.2, 0.5,
0.3))
• a hex RGB or RGBA string (e.g., '#0F0F0F' or '#0F0F0F0F')
• a string representation of a float value in [0, 1] inclusive for gray level (e.g., '0.5')
• one of {'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}
• an X11/CSS4 color name
• a name from the xkcd color survey prefixed with 'xkcd:' (e.g., 'xkcd:sky blue')
• one of {'C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9'}
• one of {'tab:blue', 'tab:orange', 'tab:green', 'tab:red', 'tab:purple', 'tab:brown',
'tab:pink', 'tab:gray', 'tab:olive', 'tab:cyan'} which are the Tableau Colors from the
'T10' categorical palette (which is the default color cycle).
• All string specifications of color, other than the 'CN' cycle names, are case-insensitive.
• Colormap:
• A colormap is a range of colors used to map scalar data to colors in
visualizations.
• Colormaps enhance the visualization of data by representing numerical values
as colors
• Commonly used in heatmaps, surface plots, and other types of plots to show
variations and patterns in data.
• Classes of Colormap
• Matplotlib provides various colormaps classified into different categories.
• Main Classes:
• Sequential: For data that progresses from low to high values.
• Diverging: For data that diverges from a central value.
• Qualitative: For categorical data without any specific order.
• Cyclic: For data that repeats periodically.
• Choosing the Right Colormap
• Factors to Consider:
• Nature of Data: Sequential, diverging, qualitative, or cyclic.
• Data Distribution: Whether the data is continuous or categorical.
• Perceptual Uniformity: Colormaps like viridis are designed to be perceptually
uniform.
• Color Vision Deficiency: Choose colormaps that are accessible to colorblind users,
like cividis.
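Applying a colormap to scalar data could be sketched as follows, assuming a headless Agg backend and an arbitrary 10x10 grid of increasing values:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

data = np.arange(100).reshape(10, 10)      # values rise from 0 to 99

fig, ax = plt.subplots()
image = ax.imshow(data, cmap="viridis")    # sequential, perceptually uniform colormap
fig.colorbar(image, ax=ax)                 # legend mapping colors back to values
```

Swapping the cmap argument for 'cividis' gives a variant designed with color vision deficiency in mind.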

Label
• Labels provide descriptions for axes and titles for plots.
• Enhances readability and understanding of plots
• Basic Labeling Functions
• xlabel(): Sets the label for the x-axis
• ylabel(): Sets the label for the y-axis
• title(): Sets the title for the plot

Grid Lines

• Grid lines help in visualizing the scale of the plot.


• You can add grid lines to a plot using the plt.grid() function.
• We can customize the appearance of the grid lines by specifying parameters like
color, linestyle, and linewidth.

Subplots
• Subplots allow multiple plots in a single figure.
• Use plt.subplot(rows, cols, index) to create subplots; index counts panels from 1, row by row.
• Useful for comparing different datasets or visualizations side by side
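Labels, grid lines, and subplots often appear together. This sketch assumes a headless Agg backend, and the plotted functions are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
plt.figure()

plt.subplot(1, 2, 1)                 # 1 row, 2 columns, first panel
plt.plot(x, np.sin(x))
plt.title("sin(x)")
plt.xlabel("x")
plt.ylabel("value")
plt.grid(color="gray", linestyle=":", linewidth=0.5)

plt.subplot(1, 2, 2)                 # second panel
plt.plot(x, np.cos(x))
plt.title("cos(x)")
plt.grid(True)

fig = plt.gcf()                      # handle to the current figure
```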

Scatter plots
• A scatter plot is a type of plot or mathematical diagram using Cartesian
coordinates to display values for typically two variables for a set of data.
• If the points are coded (color/shape/size), one additional variable can be
displayed.
• Matplotlib, a popular plotting library in Python, provides a convenient way to
create scatter plots.
• Scatter plots display individual data points.
• Use plt.scatter(x, y) to create scatter plots.
• Customize with parameters like color, size, marker, and alpha.
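A scatter plot that encodes two extra variables through size and color, sketched with random data and a headless Agg backend (both assumptions for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # seeded so the figure is reproducible
x = rng.random(50)
y = rng.random(50)
sizes = 200 * rng.random(50)         # marker area carries a third variable
colors = x + y                       # color carries a fourth variable

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, c=colors, cmap="plasma", marker="o", alpha=0.6)
fig.colorbar(points, ax=ax)
```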

Bar Graph
• A bar graph (or bar chart) in Matplotlib is a visual representation of data where
categories are displayed along one axis and their corresponding values are
represented by rectangular bars.
• The length or height of each bar is proportional to the value it represents.
• plt.bar(categories, values)
Customizing the Bar Graph
• We can customize the appearance of the bar graph in various ways:
• Color: Change the color of the bars.
• Width: Adjust the width of the bars.
• Orientation: Create horizontal bar graphs.
• Annotations: Add text annotations to the bars.
• Horizontal Bar Graph
• To create a horizontal bar graph, use the barh() function.
• Adding Error Bars
• Error bars can be added to show the variability of data
• This can help indicate the precision of the measurements or the uncertainty in
the data.
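The bar graph customizations could be sketched like this. The categories, values, and error sizes are made up, and a headless Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
values = [5, 8, 3]
errors = [0.5, 1.0, 0.3]             # hypothetical uncertainty per bar

fig, (ax1, ax2) = plt.subplots(1, 2)

# Vertical bars with color, width, and error bars.
bars = ax1.bar(categories, values, yerr=errors, color="tab:green", width=0.6, capsize=4)

# Annotation: write each value just above its bar.
for bar, value in zip(bars, values):
    ax1.text(bar.get_x() + bar.get_width() / 2, value, str(value),
             ha="center", va="bottom")

# Horizontal orientation of the same data.
hbars = ax2.barh(categories, values, color="tab:orange")
```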

Histogram
• A histogram represents the distribution of numerical data by grouping data
points into bins (intervals) and displaying the frequency of data points in each
bin.
• It’s particularly useful for understanding the underlying distribution of data.
• To create a histogram in Matplotlib, the plt.hist() function is used.
• Key Parameters: plt.hist(data, bins=30, color='skyblue', edgecolor='black')
• x: The input data, typically a list or array of numbers.
• bins: The number of bins (intervals) or the bin edges.
• range: The lower and upper range of the bins.
• density: If True, the histogram is normalized to form a probability density.
• cumulative: If True, each bin will contain the cumulative frequency up to and
including the current bin.
• color: The color of the bars.
• edgecolor: The color of the bin edges.
• alpha: The transparency level of the bars.
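A histogram of randomly generated data, sketched with a headless Agg backend (the sample itself is an assumption for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=0, scale=1, size=1000)    # 1000 draws from a standard normal

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data, bins=30, color="skyblue",
                                 edgecolor="black", alpha=0.8)

print(len(counts), len(edges))    # 30 bins are delimited by 31 edges
```

Passing density=True instead would rescale the bar heights so the total area is 1.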

Pie chart
• A pie chart is a circular statistical graphic divided into slices to illustrate
numerical proportions.
• In Matplotlib, the pie() function is used to create pie charts.
• Each slice of the pie chart represents a category's contribution to the whole,
making it easy to compare parts of the dataset.
• Components of a Pie Chart
• Slices: Each slice represents a category's contribution.
• Labels: Text labels to identify each slice.
• Colors: Different colors for each slice.
• Explode: A feature to offset a slice from the pie for emphasis.
• Autopct: A feature to display the percentage value of each slice.
• Startangle: The starting angle for the pie chart.
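The components above map onto keyword arguments of pie(). In this sketch the labels and sizes are made up, and a headless Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["Rent", "Food", "Travel", "Other"]
sizes = [40, 30, 20, 10]

fig, ax = plt.subplots()
ax.pie(
    sizes,
    labels=labels,
    colors=["tab:blue", "tab:orange", "tab:green", "tab:red"],
    explode=(0.1, 0, 0, 0),     # offset the first slice for emphasis
    autopct="%1.1f%%",          # print each slice's percentage
    startangle=90,              # start drawing from the top
)
ax.set_aspect("equal")          # keep the pie circular
```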

Box plot
• A box plot in Matplotlib is a graphical representation used to summarize the
distribution of a dataset.
• It displays the dataset’s minimum, first quartile (Q1), median, third quartile (Q3), and
maximum values.
• To create a box plot in Matplotlib, the boxplot() function is used.
• Box: The main part of the plot, which extends from Q1 to Q3.
• Median Line: A line inside the box that represents the median (middle value) of the
dataset.
• Whiskers: Lines extending from the box to the smallest and largest values within 1.5
times the interquartile range (IQR) from Q1 and Q3, respectively.
• Outliers: Data points outside the whiskers, often marked as individual points.
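The quartiles and the 1.5 x IQR rule can be checked numerically alongside the plot. This sketch uses a made-up sample with one obvious outlier and assumes a headless Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

fig, ax = plt.subplots()
box = ax.boxplot(data)               # dictionary of the drawn artists

# Reproduce the outlier rule by hand.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(q1, median, q3)   # 3.25 5.5 7.75
print(outliers)         # [30]
```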
