Introduction to NumPy in Python
1. NumPy in Python
What is NumPy?
NumPy is the fundamental package for scientific computing with Python. Its most important features
include an N-dimensional array object, broadcasting functions, and linear algebra, Fourier
transform, and random number routines.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined in NumPy, which allows it to
integrate quickly and seamlessly with a wide variety of databases.
A NumPy array is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the rank.
NumPy’s array class is called ndarray. It is also known by the alias array.
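As a small illustration of the terms above (axes, rank, ndarray), the following sketch builds a 2-D array and inspects it. The array values are arbitrary and chosen only for the example.

```python
import numpy as np

# A 2-D array: 2 rows (axis 0) and 3 columns (axis 1); values are arbitrary
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr))   # the array class, numpy.ndarray
print(arr.ndim)    # number of axes (the rank)
print(arr.shape)   # length of each axis
print(arr.dtype)   # the single element type shared by all entries
print(arr[1, 2])   # indexing with a tuple of integers (row 1, column 2)
```

Here arr.ndim is 2, arr.shape is (2, 3), and arr[1, 2] is 6.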
[Link] 1/29
2/23/2021 Pendahuluan [Link] - Colaboratory
1. You can create an array from a regular Python list or tuple using the array function. The type
of the resulting array is deduced from the type of the elements in the sequences.
2. Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize the
necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones, np.full,
np.empty, etc.
3. To create sequences of numbers, NumPy provides a function analogous to range that returns
arrays instead of lists.
4. arange: returns evenly spaced values within a given interval. The step size is specified.
5. linspace: returns evenly spaced values within a given interval. The number of elements (num)
is specified.
6. Reshaping array: We can use the reshape method to reshape an array. Consider an array with
shape (a1, a2, a3, …, aN). We can reshape and convert it into another array with shape (b1, b2,
b3, …, bM). The only required condition is: a1 x a2 x a3 … x aN = b1 x b2 x b3 … x bM (i.e. the
total number of elements in the array remains unchanged).
7. Flatten array: We can use the flatten method to get a copy of the array collapsed into one
dimension. It accepts an order argument. The default value is ‘C’ (for row-major order). Use ‘F’ for
column-major order.
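The creation routes listed above can be sketched as follows. The shapes, fill value, and interval bounds are illustrative choices, not values taken from the original notebook.

```python
import numpy as np

# 1. From a Python list; dtype is deduced unless given explicitly
a = np.array([[1, 2, 4], [5, 8, 7]], dtype=float)

# 2. Placeholder content when only the size is known
zeros = np.zeros((3, 4))      # 3x4 array of 0.0
full = np.full((2, 2), 6)     # 2x2 array filled with 6

# 4. arange: the step size is specified (the stop value is excluded)
seq = np.arange(0, 30, 5)

# 5. linspace: the number of elements is specified (both ends included)
lin = np.linspace(0, 5, 10)

print(a.dtype, zeros.shape, full.tolist())
print(seq)
print(lin)
```

arange is convenient when the spacing matters, linspace when the number of points matters.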
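import numpy as np
# Reshape a 3x4 array into a 2x2x3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()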
A random array:
[[0.70864301 0.1445599 ]
[0.62385575 0.05495546]]
Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]
[[4 2 1]
[2 0 1]]]
Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]
a = np.array([1, 2, 5, 3])
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
Binary operators: These operations apply to arrays elementwise and create a new array. You can
use all basic arithmetic operators like +, -, /, *, etc. With the +=, -=, and *= operators, the
existing array is modified in place.
import numpy as np
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])
# add arrays
print ("Array sum:\n", a + b)
# multiply arrays elementwise
print ("Array multiplication:\n", a * b)
# matrix multiplication
print ("Matrix multiplication:\n", a.dot(b))
Array sum:
[[5 5]
[5 5]]
Array multiplication:
[[4 6]
[6 4]]
Matrix multiplication:
[[ 8 5]
[20 13]]
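The difference between an operator that creates a new array and one that modifies the existing array can be sketched like this. It reuses the arrays a and b from the example. The name same is introduced here only to show that the original object is changed.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

c = a + b      # binary operator: builds a new array, a is left unchanged
same = a       # a second name for the very same array object
a += b         # in-place operator: the existing array is modified

print(c)
print(same)    # shows the updated values, because same and a are one object
```

After a += b, both a and same hold [[5, 5], [5, 5]], while b is untouched.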
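a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])
# sorted array (axis = None flattens the array before sorting)
print ("Array elements in sorted order:\n",
       np.sort(a, axis = None))
# Creating a structured array (values and dtypes are not shown in this extract)
arr = np.array(values, dtype = dtypes)
# Sort the records by their 'name' field
print ("\nArray sorted by names:\n",
       np.sort(arr, order = 'name'))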
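The definitions of values and dtypes do not survive in the code above. A minimal sketch of what a structured array and a sort by field might look like is given below. The field names other than name, and all of the records, are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical field layout: only the 'name' field appears in the text above
dtypes = [('name', 'U10'), ('grad_year', int), ('cgpa', float)]

# Hypothetical records, for illustration only
values = [('Carol', 2009, 8.5), ('Alice', 2008, 8.7),
          ('Bob', 2008, 7.9), ('Dave', 2009, 9.0)]

arr = np.array(values, dtype=dtypes)

# Sort the records alphabetically by the 'name' field
by_name = np.sort(arr, order='name')

# Sort by 'grad_year' first, then by 'cgpa' within the same year
by_year_cgpa = np.sort(arr, order=['grad_year', 'cgpa'])

print(by_name)
print(by_year_cgpa)
```

The order argument only applies to arrays with named fields. For plain numeric arrays, the axis argument decides how sorting is done.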
Pandas: An open-source, BSD-licensed library for Python. Pandas provides fast, easy-to-use data
structures and data analysis tools for manipulating numeric data and time series. It is built on
the NumPy library and is written in Python, Cython, and C. With pandas, we can import data from
various file formats and sources such as JSON, SQL, and Microsoft Excel.
# Printing dataframe
df
NumPy: The fundamental Python library for scientific computing. It provides high-performance
multidimensional arrays and tools to work with them. A NumPy array is a grid of values (all of the
same type) indexed by a tuple of non-negative integers. NumPy arrays are fast, easy to
understand, and let users perform calculations across entire arrays.
[[23 46 85]
[43 56 99]
[11 34 55]]
numpy.ndarray.resize()
With the help of the ndarray.resize() method, we can change the size of an array in place. The
array can have any shape; to resize it we just pass the new shape, e.g. (2, 2) or (2, 3). If the new
shape holds more elements than the original array, NumPy fills the extra positions with zeros.
Example #1:
In this example we can see that with the help of .resize() method, we have changed the shape of an
array from 1×6 to 2×3.
print(gfg)
[[1 2 3]
[4 5 6]]
Example #2:
In this example, we resize the array to a shape that needs more elements than the array contains.
NumPy handles this situation by appending zeros for the values that do not exist in the array.
# importing the python module numpy
import numpy as np
print(ga)
[[1 2 3 4]
[5 6 0 0]
[0 0 0 0]]
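Most of the code for the two examples above is missing from this extract. A sketch that is consistent with the printed outputs is shown below. The variable names gfg and ga come from the print statements. The starting values [1, 2, 3, 4, 5, 6] are read off the outputs. The argument refcheck=False is an addition made here so that the snippet also runs in environments that keep extra references to the array.

```python
import numpy as np

# Example 1: six elements rearranged in place into a 2x3 array
gfg = np.array([1, 2, 3, 4, 5, 6])
gfg.resize((2, 3), refcheck=False)
print(gfg)

# Example 2: the 3x4 target needs twelve elements, so six zeros are appended
ga = np.array([1, 2, 3, 4, 5, 6])
ga.resize((3, 4), refcheck=False)
print(ga)
```

Note that this is the in-place array method. The separate function np.resize returns a new array and fills extra positions with repeated copies of the data rather than zeros.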
Both the reshape() and resize() methods are used to change the shape of a NumPy array. The
difference between them is that reshape() does not change the original array and only returns
the reshaped array, whereas resize() returns nothing and directly changes the original array.
# importing the module
import numpy as np
# creating an array
A = np.array([1, 2, 3, 4, 5, 6])
print("Original array:")
display(A)
# using reshape()
print("Changed array")
display(A.reshape(2, 3))
print("Original array:")
display(A)
Original array:
array([1, 2, 3, 4, 5, 6])
Changed array
array([[1, 2, 3],
       [4, 5, 6]])
Original array:
array([1, 2, 3, 4, 5, 6])
Example 2: Using resize()
# importing the module
import numpy as np
# creating an array
Aa = np.array([1, 2, 3, 4, 5, 6])
print("Original array:")
display(Aa)
# using resize()
print("Changed array")
# this will display None, because resize() returns nothing
display(Aa.resize(2, 3))
print("Original array:")
display(Aa)
Original array:
array([1, 2, 3, 4, 5, 6])
Changed array
None
Original array:
array([[1, 2, 3],
[4, 5, 6]])
print(B)
print(new)
[[1 2]
[4 5]
[7 8]]
[[1 2 4]
[5 7 8]]
numpy.ndarray.transpose()
With the ndarray.transpose() method, we can transpose an array in a single line. It swaps the
rows and columns of a 2-D array; it has no effect on 1-D arrays.
# before transpose
print(Aa, end ='\n\n')
# after transpose
print(Aa.transpose())
[[1 2 3]
[4 5 6]
[7 8 9]]
[[1 4 7]
[2 5 8]
[3 6 9]]
# before transpose
print(Ab, end ='\n\n')
# after transpose
print(Ab.transpose(1, 0))
[[1 2]
[4 5]
[7 8]]
[[1 4 7]
[2 5 8]]
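The remark that transposing has no effect on 1-D arrays can be checked with a short sketch. The array Ab repeats the values printed above. The names v and col are introduced here for illustration.

```python
import numpy as np

# 2-D case: the axes are swapped, so shape (3, 2) becomes (2, 3)
Ab = np.array([[1, 2], [4, 5], [7, 8]])
print(Ab.transpose(1, 0).shape)

# 1-D case: there is only one axis, so nothing changes
v = np.array([1, 2, 3])
print(v.transpose().shape)

# To get a column out of a 1-D array, reshape it explicitly
col = v.reshape(-1, 1)
print(col.shape)
```

For a 2-D array, Ab.transpose(1, 0), Ab.transpose(), and Ab.T all give the same result.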
import numpy as np
my_array = np.array([[11,22,33],[44,55,66]])
print(my_array)
print(type(my_array))
[[11 22 33]
[44 55 66]]
<class 'numpy.ndarray'>
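import numpy as np
import pandas as pd
my_array = np.array([[11,22,33],[44,55,66]])
# Convert the array to a DataFrame (default integer labels are used here)
df = pd.DataFrame(my_array)
print(df)
print(type(df))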
So here is the complete code to convert the array to a DataFrame with an index:
import numpy as np
import pandas as pd
my_array = np.array([[11,22,33],[44,55,66]])
print(df)
print(type(df))
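The line that builds the DataFrame is missing from the code above. A minimal sketch of the conversion with an index could look like the following. The column labels and index labels are hypothetical, since the original ones are not recoverable from this extract.

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# Hypothetical labels, for illustration only
df = pd.DataFrame(my_array,
                  columns=['Column_A', 'Column_B', 'Column_C'],
                  index=['Item_1', 'Item_2'])

print(df)
print(type(df))
```

Each row of the array becomes a row of the DataFrame, so the index list needs one label per row and the columns list one label per column.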
import numpy as np
print(my_array)
print(type(my_array))
print(my_array.dtype)
import numpy as np
import pandas as pd
print(df)
print(type(df))
Let’s check the data types of all the columns in the new DataFrame by adding df.dtypes to the code:
import numpy as np
import pandas as pd
print(df)
print(type(df))
print(df.dtypes)
For example, suppose that you’d like to convert the last 3 columns in the DataFrame to integers.
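import numpy as np
import pandas as pd
my_array = np.array([['Jon',25,1995,2016],['Maria',47,1973,2000],['Bill',38,1982,2005]])
# Build the DataFrame; the first label 'Name' is an assumption, the other three are used below
df = pd.DataFrame(my_array, columns = ['Name','Age','Birth Year','Graduation Year'])
# The mixed array stores every value as a string, so convert the last 3 columns to integers
df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)
print(df)
print(type(df))
print(df.dtypes)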
pd.concat([df1, df2])
import pandas as pd
print (df1)
import pandas as pd
print (df2)
import pandas as pd
You may then choose to assign the index values in an incremental manner once you concatenated
the two DataFrames.
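As a sketch of that step, pd.concat accepts ignore_index=True, which discards the original row labels and numbers the combined rows from 0. The contents of df1 and df2 below are hypothetical, because the original frames are not shown in this extract.

```python
import pandas as pd

# Hypothetical data, for illustration only
df1 = pd.DataFrame({'product': ['A', 'B'], 'price': [10, 20]})
df2 = pd.DataFrame({'product': ['C', 'D'], 'price': [30, 40]})

# Plain concatenation keeps each frame's own index: 0, 1, 0, 1
stacked = pd.concat([df1, df2])
print(stacked)

# ignore_index=True assigns a fresh incremental index: 0, 1, 2, 3
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered)
```

Without ignore_index, duplicate labels can make label-based selection return more than one row.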
import pandas as pd
DataFrame.loc
max_speed shield
cobra 1 2
viper 4 5
sidewinder 7 8
df.loc['viper']
max_speed 4
shield 5
Name: viper, dtype: int64
df.loc[['viper', 'sidewinder']]
max_speed shield
viper 4 5
sidewinder 7 8
df.loc['cobra', 'shield']
df.loc['cobra':'viper', 'max_speed']
cobra 1
viper 4
Name: max_speed, dtype: int64
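The label-based selections above can be reproduced with the sketch below. The DataFrame is rebuilt from the printed table, since the construction line itself is missing from this extract.

```python
import pandas as pd

# Rebuilt from the printed table: three rows, two columns
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

row = df.loc['viper']                             # one label: a Series
rows = df.loc[['viper', 'sidewinder']]            # list of labels: a DataFrame
cell = df.loc['cobra', 'shield']                  # row and column label: one value
span = df.loc['cobra':'viper', 'max_speed']       # label slice, both ends included

print(row, rows, cell, span, sep='\n\n')
```

Unlike ordinary Python slices, a label slice with loc includes the end label, which is why 'viper' appears in the last result.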
import pandas as pd
print (df)
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
How to get the same results as in case 1 by using lambda, where the conditions are:
If the number is equal to or lower than 4, then assign the value of ‘True’. Otherwise, if the
number is greater than 4, then assign the value of ‘False’.
import pandas as pd
print (df)
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
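The code that produced this table is not preserved. One way to apply the two conditions with a lambda is sketched below. The column names are taken from the printed output, and the values 'True' and 'False' are assigned as strings, which is an assumption based on the quotes in the text.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# apply() calls the lambda once per value in the column
df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(
    lambda x: 'True' if x <= 4 else 'False')

print(df)
```

The lambda is evaluated once per row, so for large frames a vectorized comparison such as df['set_of_numbers'] <= 4 is usually faster when real booleans are acceptable.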
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Match
import pandas as pd
print (df)
set_of_numbers
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 0
11 0
set_of_numbers
0 1
1 2
2 3
3 4
4 555
5 6
6 7
7 8
8 9
9 10
10 999
11 999
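import pandas as pd
import numpy as np
# Two missing values (NaN) at the end, as in the output below
df = pd.DataFrame({'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,np.nan,np.nan]})
print (df)
# Replace the NaN values with 0
df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print (df)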
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 NaN
11 NaN
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 0.0
11 0.0
stats_numeric = df['Price'].describe()
print (stats_numeric)
count 5.000000
mean 27600.000000
std 4878.524367
min 22000.000000
25% 25000.000000
50% 27000.000000
75% 29000.000000
max 35000.000000
Name: Price, dtype: float64
You’ll notice that the output contains 6 decimal places. You can append astype(int) to the
describe() call to get integer values instead.
count 5
mean 27600
std 4878
min 22000
25% 25000
50% 27000
75% 29000
max 35000
Name: Price, dtype: int64
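A sketch of the astype(int) variant is shown below. The DataFrame itself is not preserved in this extract, so the five Price values are one set that is consistent with the printed summary (count, mean, standard deviation, and quartiles). Their original order and the other columns are unknown.

```python
import pandas as pd

# Price values inferred from the summary statistics; row order is an assumption
df = pd.DataFrame({'Price': [22000, 25000, 27000, 35000, 29000]})

# describe() returns floats; astype(int) truncates them to integers
stats_numeric = df['Price'].describe().astype(int)
print(stats_numeric)
```

Because astype(int) truncates rather than rounds, the standard deviation of about 4878.52 is shown as 4878.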
stats_categorical = df['Brand'].describe()
print (stats_categorical)
count 5
unique 4
top Toyota Corolla
freq 2
Name: Brand, dtype: object
stats = df.describe(include='all')
print (stats)
1. Count
2. Mean
3. Standard deviation
4. Minimum
5. 0.25 Quantile
6. 0.50 Quantile (Median)
7. 0.75 Quantile
8. Maximum
count1 = df['Price'].count()
print('count: ' + str(count1))
mean1 = df['Price'].mean()
print('mean: ' + str(mean1))
std1 = df['Price'].std()
print('std: ' + str(std1))
min1 = df['Price'].min()
print('min: ' + str(min1))
quantile1 = df['Price'].quantile(q=0.25)
print('25%: ' + str(quantile1))
quantile2 = df['Price'].quantile(q=0.50)
print('50%: ' + str(quantile2))
quantile3 = df['Price'].quantile(q=0.75)
print('75%: ' + str(quantile3))
max1 = df['Price'].max()
print('max: ' + str(max1))
To plot a DataFrame using pandas, follow the complete steps for each of these chart types:
1. Scatter diagram
2. Line chart
3. Bar chart
4. Pie chart
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Unemployment_Rate','Stock_Index_Price'])
df.plot(x ='Unemployment_Rate', y='Stock_Index_Price', kind = 'scatter')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Year','Unemployment_Rate'])
df.plot(x ='Year', y='Unemployment_Rate', kind = 'line')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Country','GDP_Per_Capita'])
df.plot(x ='Country', y='GDP_Per_Capita', kind = 'bar')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt