
Python Unit4

NumPy is a library for efficient numerical computation in Python, providing support for multi-dimensional arrays and a wide range of mathematical functions. It is faster and more memory-efficient than Python lists, making it essential for scientific computing, data analysis, and machine learning. The library includes various data types, array operations, and integration capabilities with other libraries like Pandas and Matplotlib.

Uploaded by

ca235213232

NumPy

NumPy (Numerical Python) is a library for efficient numerical
computation in Python.
• It provides array objects and mathematical operations on them, and
is a fundamental package for scientific computing and data analysis.

Why Use NumPy?


Speed: NumPy arrays are much faster than Python lists for numerical
computations.
Efficiency: NumPy arrays use less memory than Python lists for large
datasets.
Functionality: NumPy provides a wide range of high-performance
mathematical functions.
Features
Multi-dimensional arrays: NumPy's core data structure is the ndarray,
which allows for efficient storage and manipulation of large datasets.
Vectorized operations: Perform operations on entire arrays at once,
making it much faster than working with Python lists.
Matrix operations: Supports matrix multiplication, transposes, and
other linear algebra functions.
Random number generation: Generate random numbers and arrays
with various distributions.
Data type support: Supports various data types, including integers,
floats, complex numbers, and more.
Integration with other libraries: Works seamlessly with other popular
libraries like Pandas, Matplotlib, and SciPy.
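The vectorized-operations feature listed above can be illustrated with a minimal sketch, assuming NumPy is installed. The variable names are illustrative only and are not part of the original notes.

```python
import numpy as np

# Plain Python: square every element with an explicit loop
values = [1, 2, 3, 4, 5]
squares_list = [v * v for v in values]

# NumPy: the same operation is applied to the whole array at once
arr = np.array(values)
squares_arr = arr * arr

print(squares_list)   # [1, 4, 9, 16, 25]
print(squares_arr)    # [ 1  4  9 16 25]
```

Both versions give the same numbers; the NumPy version avoids the Python-level loop, which is where the speed advantage on large arrays comes from.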
Applications
Data analysis: Cleaning, filtering, and transforming data.
Scientific simulations: Numerical simulations, data modeling, and visualization.
Machine learning: Data preprocessing, feature extraction, and model training.
Data visualization: Creating plots, charts, and graphs.
Data Science: Data wrangling, feature engineering, and data visualization.
Image Processing: Image filtering, transformation, and feature extraction.
Financial Analysis: Data analysis, risk management, and portfolio
optimization (maximizing returns while minimizing risk).
Cryptography: Numerical computations for encryption and decryption algorithms.
Engineering: Numerical simulations, data analysis, and visualization in fields like
mechanical, electrical, and civil engineering.
Research: Data analysis, simulations, and visualization in various research fields,
including social sciences and economics.
NumPy Standard Data Types
Data types specify the type of the elements stored in a NumPy array.
NumPy has the following standard data types:
Integer Types
np.int8 (8-bit signed integer)
np.int16 (16-bit signed integer)
np.int32 (32-bit signed integer)
np.int64 (64-bit signed integer)
Unsigned Integer Types
np.uint8 (8-bit unsigned integer)
np.uint16 (16-bit unsigned integer)
np.uint32 (32-bit unsigned integer)
np.uint64 (64-bit unsigned integer)
Floating Point Types
np.float16 (16-bit floating point number)
np.float32 (32-bit floating point number)
np.float64 (64-bit floating point number)
Complex Number Types
np.complex64 (64-bit complex number)
np.complex128 (128-bit complex number)
Boolean Type - np.bool_ (Boolean type storing True and False values)
Object Type - np.object_ (Object type storing Python objects)
String Type
np.bytes_ (String type storing fixed-length byte strings)
np.str_ (String type storing fixed-length Unicode strings)
Datetime and Timedelta Types
np.datetime64 (Datetime type storing dates and times)
np.timedelta64 (Timedelta type storing time intervals)
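As a rough illustration of how these data types are used, the sketch below requests a type with the dtype argument and inspects the result. It assumes NumPy is installed; the variable names are hypothetical.

```python
import numpy as np

# Request a specific element type with the dtype argument
small = np.array([1, 2, 3], dtype=np.int8)
real = np.array([1, 2, 3], dtype=np.float64)

print(small.dtype, small.itemsize)   # int8 1
print(real.dtype, real.itemsize)     # float64 8

# astype() returns a converted copy; the original array is unchanged
as_float32 = small.astype(np.float32)
print(as_float32.dtype)              # float32

# np.iinfo reports the representable range of an integer type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127
```

Choosing a smaller type such as int8 saves memory, but values outside its range cannot be stored, so the type should match the data.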
NumPy Array
• A NumPy array is a multi-dimensional collection of elements of the
same data type stored in a single object.
• NumPy is used to work with arrays. The array object in NumPy is
called ndarray.
• To create an ndarray, we can pass a list, tuple or any array-like
object into the array() method, and it will be converted into an
ndarray.
Use a tuple to create a NumPy array:
import numpy as np
arr = np.array((1, 2, 3, 4, 5))
print(arr)
Types of Array
One Dimensional Array and Multi-Dimensional Array
One Dimensional Array:
A one-dimensional array is a type of linear array.

# import the NumPy module
import numpy as np
# create a Python list (named values so the built-in list is not shadowed)
values = [1, 2, 3, 4]
# create a NumPy array from the list
sample_array = np.array(values)
print("List in Python:", values)
print("NumPy array in Python:", sample_array)
Multi-Dimensional Array:
Data in multidimensional arrays are stored in tabular form.

# import the NumPy module
import numpy as np
# create three lists, one per row
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
list_3 = [9, 10, 11, 12]
# create a 2-D NumPy array from the list of rows
sample_array = np.array([list_1, list_2, list_3])
print("NumPy multi-dimensional array in Python\n", sample_array)
NumPy Array Attributes
• Attributes are properties of NumPy arrays that provide information
about the array's shape, size, data type, dimension, and so on.
• For example, to get the dimension of an array, we can use the ndim
attribute. There are numerous attributes available in NumPy.
Attribute   Description
ndim        returns the number of dimensions of the array
size        returns the number of elements in the array
dtype       returns the data type of the elements in the array
shape       returns the size of the array in each dimension
itemsize    returns the size (in bytes) of each element in the array
data        returns the buffer containing the actual elements of the array in memory
Examples
The ndim attribute returns the number of dimensions in the
numpy array.
For example,
import numpy as np
# create a 2-D array
array1 = np.array([[2, 4, 6], [1, 3, 5]])
# check the dimension of array1
print(array1.ndim) # Output: 2
# return total number of elements in array1
print(array1.size) # Output: 6
# return a tuple that gives size of array in each dimension
print(array1.shape) # Output: (2, 3)
# check the data type of array1 (the default integer type is platform-dependent)
print(array1.dtype) # Output: int64
# create a 1-D array of 32-bit integers
array2 = np.array([6, 7, 8, 10, 13], dtype=np.int32)
# size in bytes of each element of array2
print(array2.itemsize) # Output: 4
# print the memory buffers holding array1's and array2's data
print("\nData of array1 is: ", array1.data)
print("Data of array2 is: ", array2.data)
# Output (the addresses differ from run to run):
Data of array1 is: <memory at 0x7f746fea4a00>
Data of array2 is: <memory at 0x7f746ff6a5a0>
Computation on NumPy Arrays
Arithmetic Operations
NumPy supports standard arithmetic operations like addition,
subtraction, multiplication, division, and modulo on arrays.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # Output: [5 7 9]
print(a * b) # Output: [ 4 10 18]
print(a / b) # Output: [0.25 0.4  0.5 ]
Indexing and Slicing
You can access an array element by referring to its index number. The
indexes in NumPy arrays start with 0. NumPy allows efficient access
and manipulation of array elements.
a = np.array([1, 2, 3, 4, 5])
print(a[2]) # Output: 3
print(a[1:4]) # Output: [2 3 4]
print(a[2:]) # Output: [3 4 5] (elements from index 2 to the end of the array)
print(a[2] + a[3]) # Output: 7
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print('1st element on 2nd row: ', arr[1, 0]) # prints the value 6
print('5th element on 2nd row: ', arr[1, 4]) # prints the value 10
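Slicing also works along each axis of a 2-D array. The following is a small sketch, assuming NumPy is installed, that reuses the same 2-D array as above; the names second_row, third_col, and block are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

second_row = arr[1]       # the whole second row
third_col = arr[:, 2]     # the third column taken from every row
block = arr[0:2, 1:4]     # rows 0-1 and columns 1-3

print(second_row)   # [ 6  7  8  9 10]
print(third_col)    # [3 8]
print(block)
```

The colon on its own selects everything along that axis, so arr[:, 2] reads as "all rows, column 2".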
Universal Functions (ufuncs)
Universal functions (ufuncs) in NumPy are functions that operate
element-wise on NumPy arrays. They provide a vectorized way to
perform mathematical operations on arrays.
Common types of ufuncs
Arithmetic ufuncs - np.add, np.subtract, np.multiply, np.divide,
np.floor_divide, np.mod, np.power
x = np.array([2, 3, 4])
y = np.array([3, 2, 1])
result = np.add(x, y)
print(result) # Output: [5 5 5]
result1 = np.power(x, y)
print(result1) # Output: [8 9 4]
Comparison ufuncs - np.equal, np.not_equal, np.greater, np.less,
np.less_equal, np.greater_equal
x = np.array([1, 2, 3])
y = np.array([2, 2, 4])
result = np.not_equal(x, y)
print(result) # Output: [ True False  True]
Logical ufuncs - np.logical_and, np.logical_or, np.logical_xor,
np.logical_not
x = np.array([True, False, True])
y = np.array([False, True, False])
result = np.logical_xor(x, y)
print(result) # Output: [ True  True  True]
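Trigonometric ufuncs - np.sin, np.cos, np.tan, np.arcsin, np.arccos,
np.arctan.
angles = np.array([0, np.pi/4, np.pi/2])
result = np.tan(angles)
print(result) # Output: approximately [0, 1, 1.63e+16]
# The last value is very large rather than inf because np.pi/2 is only a
# floating-point approximation of π/2.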
Exponential and logarithmic ufuncs - np.exp, np.log, np.log10, np.log2,
np.sqrt, [Link]
x = np.array([1, 10, 100])
result = np.log10(x)
print(result) # Output: [0. 1. 2.]
x = np.array([4, 9, 16])
result1 = np.sqrt(x)
print(result1) # Output: [2. 3. 4.]
Aggregation Functions
Python NumPy module has many aggregate functions or statistical
functions to work with arrays.
np.sum(): Calculates the sum of array elements.
np.mean(): Computes the arithmetic mean (average) of array elements.
np.median(): Calculates the median of array elements.
np.min() and np.max(): Determine the minimum and maximum values in
an array.
np.std(): Computes the standard deviation of array elements.
np.var(): Calculates the variance of array elements.
Example
import numpy as np
data = np.array([1, 2, 3, 4, 5])
# Compute sum, mean, median, min, max, std, and var
print("Sum:", np.sum(data))
print("Mean:", np.mean(data))
print("Median:", np.median(data))
print("Minimum:", np.min(data))
print("Maximum:", np.max(data))
# np.std and np.var divide by n by default (ddof=0);
# pass ddof=1 for the sample versions (about 1.581139 and 2.5)
print("Standard Deviation:", np.std(data))
print("Variance:", np.var(data))
Output:
Sum: 15
Mean: 3.0
Median: 3.0
Minimum: 1
Maximum: 5
Standard Deviation: 1.4142135623730951
Variance: 2.0
Find common values between two arrays
import numpy as np
array1 = np.array([0, 10, 20, 40, 60])
print("Array1: ", array1)
array2 = [10, 30, 40]
print("Array2: ", array2)
print("Common values between two arrays:")
print(np.intersect1d(array1, array2))
print("\nThe set difference between two arrays:")
print(np.setdiff1d(array1, array2))
Output:
Array1: [ 0 10 20 40 60]
Array2: [10, 30, 40]
Common values between two arrays:
[10 40]
The set difference between two arrays:
[ 0 20 60]
Sorting an array is simple with np.sort()
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
np.sort(arr)
Output: array([1, 2, 3, 4, 5, 6, 7, 8])
Concatenate arrays with np.concatenate()
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
np.concatenate((a, b))
Output: array([1, 2, 3, 4, 5, 6, 7, 8])
Using arr.reshape() will give a new shape to an array without changing the
data.
a = np.arange(6)
b = a.reshape(3, 2)
print(a)
print(b)
Output:
[0 1 2 3 4 5]
[[0 1]
 [2 3]
 [4 5]]
Creating a random array in NumPy
We create random arrays using various functions from the np.random
module.
arr1 = np.random.randint(0, 10, (3, 3)) # integers from 0 (inclusive) to 10 (exclusive)
print("Random integers:")
print(arr1)
arr2 = np.random.rand(3, 3) # floats in the interval [0, 1)
print("\nRandom floats:")
print(arr2)
Output (the values differ on every run):
Random integers:
[[8 9 5]
 [1 4 7]
 [3 2 6]]

Random floats:
[[0.4236548 0.6458941 0.4375872]
 [0.891773  0.9636622 0.3834415]
 [0.7917251 0.5288949 0.5680445]]
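Because random output changes on every run, it is often useful to fix a seed. The sketch below uses NumPy's Generator interface (np.random.default_rng), which is a different entry point from the functions shown above; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)
ints = rng.integers(0, 10, size=(3, 3))   # integers in [0, 10)
floats = rng.random((3, 3))               # floats in [0, 1)

# A second generator with the same seed repeats the same draws
rng_again = np.random.default_rng(seed=42)
ints_again = rng_again.integers(0, 10, size=(3, 3))

print(ints.shape, floats.shape)           # (3, 3) (3, 3)
print(np.array_equal(ints, ints_again))   # True
```

Seeding is mainly helpful for tests and for examples whose output needs to be repeatable.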
Multidimensional aggregates
Aggregation functions take an additional argument specifying the axis along which the aggregate is
computed. For example, we can find the minimum value within each column by specifying axis=0, and
the maximum value within each row by specifying axis=1.
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Summation
print("Sum of all elements:", np.sum(arr)) # Output: 45
print("Sum of each column:", np.sum(arr, axis=0)) # Output: [12 15 18]
print("Sum of each row:", np.sum(arr, axis=1)) # Output: [ 6 15 24]
# Mean
print("Mean of all elements:", np.mean(arr)) # Output: 5.0
print("Mean of each column:", np.mean(arr, axis=0)) # Output: [4. 5. 6.]
print("Mean of each row:", np.mean(arr, axis=1)) # Output: [2. 5. 8.]
# Min and Max
print("Minimum value:", np.min(arr)) # Output: 1
print("Maximum value:", np.max(arr)) # Output: 9
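The text above mentions per-column minimums and per-row maximums, which can be sketched as follows. This assumes NumPy is installed and reuses the same 3x3 array; col_min and row_max are illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, giving one result per column
col_min = np.min(arr, axis=0)
# axis=1 collapses the columns, giving one result per row
row_max = np.max(arr, axis=1)

print("Minimum of each column:", col_min)   # Minimum of each column: [1 2 3]
print("Maximum of each row:", row_max)      # Maximum of each row: [3 6 9]
```

A handy way to remember the rule: the axis you name is the one that disappears from the result.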
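Sample variance:
σ² = (Σxᵢ² − nμ²) / (n − 1)
Where:
σ² = variance
xᵢ = individual data points
n = number of data points
μ = mean (μ = (Σxᵢ) / n)
Note: np.var() and np.std() divide by n by default; pass ddof=1 to divide by n − 1 as in this formula.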
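The variance formula above can be checked numerically. The sketch below, assuming NumPy is installed, evaluates the formula by hand for the data [1, 2, 3, 4, 5] and compares it with np.var; the variable names are illustrative.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
n = data.size
mu = data.mean()

# Formula from the notes: (sum of squares - n * mean^2) / (n - 1)
sample_var = (np.sum(data ** 2) - n * mu ** 2) / (n - 1)

print(sample_var)             # 2.5
print(np.var(data, ddof=1))   # 2.5  (divides by n - 1)
print(np.var(data))           # 2.0  (default: divides by n)
```

Here the sum of squares is 55 and n·μ² is 45, so the numerator is 10 and dividing by n − 1 = 4 gives 2.5.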
Pandas
• Pandas is a powerful, open-source Python library for working with
data sets. It is used for data manipulation and analysis.
• It has functions for analyzing, cleaning, exploring, and manipulating
data.
• It is built on top of the NumPy library which means that a lot of the
structures of NumPy are used or replicated in Pandas.
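The relationship between Pandas and NumPy can be seen in a short sketch, assuming both libraries are installed. The names s, values, and roots are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 9, 16])

# The data held by a Series can be obtained as a NumPy array
values = s.to_numpy()
print(type(values).__name__)   # ndarray

# NumPy ufuncs accept a Series and return a Series with the same index
roots = np.sqrt(s)
print(type(roots).__name__)    # Series
print(roots.tolist())          # [2.0, 3.0, 4.0]
```

This is why NumPy functions can usually be applied directly to Pandas objects without converting them first.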
Why Use Pandas?
• Pandas allows us to analyze big data and make conclusions based on
statistical theories.
• Pandas can clean messy data sets, and make them readable and
relevant.
• Relevant data is very important in data science.
• Data Science is a branch of computer science where we study how to
store, use and analyze data for deriving information from it.
Installing and Using Pandas
• Pandas depends on NumPy; pip installs NumPy automatically if it is
missing.
• Pandas can be installed on Windows, macOS, or Linux using pip.
• pip is a package management system used to install and manage
software packages/libraries written in Python. Type the following
command in a command prompt.
pip install pandas
Once Pandas is installed, you can import it and check the version:
import pandas
pandas.__version__
'0.18.1'
Just as we generally import NumPy under the alias np, we will import
Pandas under the alias pd:
import pandas as pd
Introducing Pandas Objects
• Pandas objects can be thought of as enhanced versions of NumPy structured
arrays in which the rows and columns are identified with labels rather than
simple integer indices.
• Pandas objects provide a powerful data structure for efficient data
manipulation and analysis.
• The three fundamental Pandas data structures: the Series, DataFrame, and
Index.
Data Manipulation with Pandas
The Series - The Pandas Series is a one-dimensional labeled array
capable of holding any data type. We can create a Series from a list,
array, or dictionary. We can perform basic math operations on a
Series, compare it to another Series, sort it in ascending or
descending order, and perform aggregation. A Series is a single column
of data. Each data point in the Series has a corresponding index. A
Series can be modified after creation.
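The Series operations listed above can be sketched briefly, assuming Pandas is installed. The data and the names s, other, doubled, greater, sorted_desc, and total are made up for illustration.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['a', 'b', 'c'])
other = pd.Series([1, 1, 5], index=['a', 'b', 'c'])

doubled = s * 2                                # basic math, element by element
greater = s > other                            # comparison with another Series
sorted_desc = s.sort_values(ascending=False)   # sort in descending order
total = s.sum()                                # aggregation

s['b'] = 10                                    # modify a value after creation

print(doubled.tolist())       # [6, 2, 4]
print(greater.tolist())       # [True, False, False]
print(list(sorted_desc.index))  # ['a', 'c', 'b']
print(total)                  # 6
print(s.tolist())             # [3, 10, 2]
```

Note that total is computed before the assignment to s['b'], so it reflects the original values.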
Example1:
import pandas as pd
list1 = pd.Series([1, 2, 3, 4, 5]) # Create a Series from a list
dict1 = pd.Series({'a': 1, 'b': 2, 'c': 3}) # Create a Series from a dictionary
print(list1)
print(dict1)
Example2:
import pandas as pd
import numpy as np
np_array = np.array([1, 2, 3, 5, 6, 7, 9, 1, 10])
data = pd.Series(np_array)
print("data :\n", data)
print("data present at index 1 is :", data[1])
print("First 5 data elements : ")
print(data[:5])
Output:
data :
 0     1
1     2
2     3
3     5
4     6
5     7
6     9
7     1
8    10
dtype: int64
data present at index 1 is : 2
First 5 data elements :
0    1
1    2
2    3
3    5
4    6
dtype: int64
Example3: a Series with custom index labels
import pandas as pd
import numpy as np
np_array = np.array([1, 2, 3, 5, 6, 7, 9, 1, 10])
data = pd.Series(np_array, index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
print("data :\n", data)
print("data present at label 'b' is :", data['b'])
Output:
data :
 a     1
b     2
c     3
d     5
e     6
f     7
g     9
h     1
i    10
dtype: int64
data present at label 'b' is : 2
DataFrame
• A DataFrame in Pandas is a two-dimensional, size-mutable, and
potentially heterogeneous tabular data structure with labeled axes (rows
and columns). It's similar to an Excel spreadsheet or a table in a relational
database.
Creating dataframe:
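We can create a DataFrame from various data sources, such as:
import pandas as pd
import numpy as np
Dictionary: pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
List: pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
NumPy array: pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]), columns=['A', 'B', 'C'])
Each row in the list and array examples has three values, so three column names are required.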
Example1:
import pandas as pd
info = {'ID': [101, 102, 103],
        'Department': ['[Link]', '[Link]', '[Link]']}
df = pd.DataFrame(info)
print(df)
Output
    ID Department
0  101     [Link]
1  102     [Link]
2  103     [Link]
Example2: Slice Rows - to select multiple rows using ':' operator.
import pandas as pd
info = {'one': pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e']),
        'two': pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'f'])}
df = pd.DataFrame(info)
print(df[2:5])
Output
   one  two
c  3.0    3
d  4.0    4
e  5.0    5
Add Row to Pandas Dataframe
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# add a new row (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
new_row = pd.DataFrame({'A': [5], 'B': [6]})
df = pd.concat([df, new_row], ignore_index=True)
print(df)
Output:
   A  B
0  1  3
1  2  4
2  5  6
Add Column to Pandas Dataframe
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# add a new column
df = df.assign(C=[5, 6])
print(df)
Output:
   A  B  C
0  1  3  5
1  2  4  6
Index object
• A Pandas Index object is an immutable (values cannot be modified after
creation) array of values that can be used as a label for data in a Series
or DataFrame. It's similar to an index in a book, which helps you locate
specific information.
Example:
import pandas as pd
# Create an index object
idx = pd.Index(['apple', 'banana', 'cherry'])
print(idx)
Output:
Index(['apple', 'banana', 'cherry'], dtype='object')
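The immutability of an Index, described above, can be demonstrated with a small sketch, assuming Pandas is installed. The replacement label 'avocado' and the flag name modified are hypothetical.

```python
import pandas as pd

idx = pd.Index(['apple', 'banana', 'cherry'])

# Assigning to an element of an Index raises TypeError
try:
    idx[0] = 'avocado'
    modified = True
except TypeError as err:
    modified = False
    print("Index is immutable:", err)

# Reading from an Index works like reading from an array
print(idx[1])            # banana
print('cherry' in idx)   # True
print(len(idx))          # 3
```

Immutability makes it safe to share one Index between several Series or DataFrames without one of them changing the labels of the others.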
Index object using Series
import pandas as pd
# Create an index object
idx = pd.Index(['apple', 'banana', 'cherry'])
# Create a Series object with the index object
series = pd.Series([10, 20, 30], index=idx)
print(series)
Output:
apple 10
banana 20
cherry 30
dtype: int64
Index object using DataFrame
import pandas as pd
# Create an index object
idx = pd.Index(['apple', 'banana', 'cherry'])
# Create a DataFrame object with the index object
data = {'Quantity': [10, 20, 30], 'Price': [5.0, 6.0, 7.0]}
df = pd.DataFrame(data, index=idx)
print(df)
Output:
Quantity Price
apple 10 5.0
banana 20 6.0
cherry 30 7.0
Matplotlib
• Data visualization is a crucial aspect of data analysis, enabling data
scientists and analysts to present complex data in a more understandable
and insightful manner. One of the most popular libraries for data
visualization in Python is Matplotlib.
• Matplotlib is a comprehensive library for creating static, animated, and
interactive visualizations in Python. Matplotlib makes easy things easy
and hard things possible.
• Create publication quality plots.
• Make interactive figures that can zoom, pan, update.
• Customize visual style and layout.
• Export to many file formats.
Importing Matplotlib
To use Matplotlib, you need to import it.
import matplotlib.pyplot as plt
Setting Styles
Matplotlib offers various built-in styles to change the look of your plots.
You can set a style using plt.style.use().
Example:
# List available styles
print(plt.style.available)
# Set a style (e.g., 'ggplot')
plt.style.use('ggplot')

The 'ggplot' style emulates the look of the ggplot2 plotting package for R.


Saving Figures to File
Once you have created your plot, you can save it to a file using plt.savefig(). The
file format (e.g., PNG, PDF, SVG) is inferred from the filename extension.
Example:
x = [1, 2, 3, 4, 5] # Sample data
y = [2, 3, 5, 7, 11]
plt.plot(x, y, marker='o') # Create a plot with circle as a marker
plt.title('Sample Plot') # Add titles and labels
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.savefig('sample_plot.png', dpi=300) # Save the figure; dpi sets the resolution
plt.show() # Show the plot window (optional); call it after savefig
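Saving to several formats can be sketched as below. This is only an illustration: it assumes Matplotlib is installed, selects the non-interactive "Agg" backend so no window is needed, and writes into a temporary directory; the names out_dir and saved are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # non-interactive backend; no plot window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

fig, ax = plt.subplots()
ax.plot(x, y, marker='o')
ax.set_title('Sample Plot')

# The file format is chosen from the filename extension
out_dir = tempfile.mkdtemp()
saved = []
for ext in ('png', 'pdf', 'svg'):
    path = os.path.join(out_dir, f'sample_plot.{ext}')
    fig.savefig(path, dpi=300)
    saved.append(path)
plt.close(fig)

print([os.path.basename(p) for p in saved])
# ['sample_plot.png', 'sample_plot.pdf', 'sample_plot.svg']
```

The sketch uses the figure and axes objects returned by plt.subplots() rather than the plt.plot() style shown above; both approaches produce the same kind of plot.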
Simple Line Plots
A line plot is great for visualizing trends over a continuous variable.
Example:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Create a line plot: circle markers, solid line, blue color
plt.plot(x, y, marker='o', linestyle='-', color='b')
# Add titles and labels
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show the plot
plt.show()
Simple Scatter Plots
A scatter plot is useful for visualizing the relationship between two variables.
Example:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Create a scatter plot: red 'x' markers
plt.scatter(x, y, color='r', marker='x')
# Add titles and labels
plt.title('Simple Scatter Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
# Show the plot
plt.show()
