FDS LAB Programs

The document provides a comprehensive guide on using NumPy and pandas for data manipulation in Python. It covers creating and manipulating NumPy arrays, including operations like reshaping, slicing, stacking, and sorting, as well as performing similar operations with pandas DataFrames, such as concatenation, filtering, and handling NaN values. Additionally, it includes examples for reading text files into pandas DataFrames.

1. Creating a NumPy Array

• Basic ndarray

• Array of zeros

• Array of ones

• Random numbers in ndarray

• An array of your choice

• Identity matrix in NumPy

• Evenly spaced ndarray

• Basic ndarray

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

#output

[1 2 3 4 5]

• Array of zeros

zeros_arr = np.zeros((3, 3)) # 3x3 matrix of zeros

print(zeros_arr)

• Array of ones

ones_arr = np.ones((2, 4)) # 2x4 matrix of ones

print(ones_arr)

#output

[[1. 1. 1. 1.]

[1. 1. 1. 1.]]

• Random numbers in ndarray

random_arr = np.random.rand(3, 3) # 3x3 matrix with random values in [0, 1)

print(random_arr)

#output (values differ on each run)

[[0.5488135 0.71518937 0.60276338]

[0.54488318 0.4236548 0.64589411]

[0.43758721 0.891773 0.96366276]]


• An array of your choice

custom_arr = np.array([[10, 20, 30], [40, 50, 60]]) # Custom 2x3 array

print(custom_arr)

#output

[[10 20 30]

[40 50 60]]

• Identity matrix in NumPy

identity_matrix = np.eye(4) # 4x4 identity matrix

print(identity_matrix)

#output

[[1. 0. 0. 0.]

[0. 1. 0. 0.]

[0. 0. 1. 0.]

[0. 0. 0. 1.]]

• Evenly spaced ndarray

evenly_spaced = np.linspace(0, 10, 5) # 5 evenly spaced values from 0 to 10 (both ends included)

print(evenly_spaced)

#output

[ 0. 2.5 5. 7.5 10. ]


2. The Shape and Reshaping of NumPy Array

• Dimensions of NumPy array

• Shape of NumPy array

• Size of NumPy array

• Reshaping a NumPy array

• Flattening a NumPy array

• Transpose of a NumPy array

• Dimensions of NumPy array

The number of dimensions (axes) in an array is given by the ndim attribute:

import numpy as np

# Creating a 2-D array

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

print(array_2d.ndim)

# Output: 2

• Shape of NumPy array

The shape of an array, representing the size of each dimension, is accessible via the shape attribute:

print(array_2d.shape)

# Output: (2, 3)

• Size of NumPy array

The total number of elements in the array is obtained using the size attribute:

print(array_2d.size)

# Output: 6

• Reshaping a NumPy array

To change the shape of an array without altering its data, use the reshape() method. The new shape
must be compatible with the original size:

array_1d = np.array([1, 2, 3, 4, 5, 6])

array_2d = array_1d.reshape((2, 3))

print(array_2d)

# Output:

# [[1 2 3]
# [4 5 6]]

Alternatively, you can use the np.reshape() function:

array_2d = np.reshape(array_1d, (2, 3))

print(array_2d)

# Output:

# [[1 2 3]

# [4 5 6]]
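
The compatibility rule for reshaping can be illustrated with a short sketch. It assumes the same six-element array as above; the names inferred and reshape_failed are illustrative and do not come from the original programs. Passing -1 lets NumPy work out one dimension from the array size, while a shape whose product does not equal the size raises a ValueError.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array size (6 / 2 = 3)
inferred = array_1d.reshape((2, -1))
print(inferred.shape)  # (2, 3)

# A shape whose product differs from the size (4 * 2 = 8, not 6) is rejected
try:
    array_1d.reshape((4, 2))
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```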

• Flattening a NumPy array

To convert a multi-dimensional array into a 1-D array, use the flatten() method or ravel() function:

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

array_1d = array_2d.flatten()

print(array_1d)

# Output: [1 2 3 4 5 6]
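
The example above only uses flatten(). As a hedged sketch of how ravel() differs: flatten() always returns a copy, whereas ravel() returns a view when the array's memory layout allows it (as it does for this contiguous array). The names flat_copy and flat_view are illustrative.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = array_2d.flatten()  # always a new, independent array
flat_view = array_2d.ravel()    # a view here, because array_2d is contiguous

# Changing the original shows which result shares memory with it
array_2d[0, 0] = 99
print(flat_copy[0])  # 1
print(flat_view[0])  # 99
print(np.shares_memory(array_2d, flat_view))  # True
```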

• Transpose of a NumPy array

To transpose an array (swap rows and columns), use the T attribute:

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

array_transposed = array_2d.T

print(array_transposed)

# Output:

# [[1 4]

# [2 5]

# [3 6]]
3. Expanding and Squeezing a NumPy Array

• Expanding a NumPy array

• Squeezing a NumPy array

• Sorting in NumPy Array

• Expanding a NumPy array

Method : np.expand_dims()

Syntax:
np.expand_dims(a, axis)
a: The input array.
axis: The position in the result shape where the new axis will be added

Example:
import numpy as np
# Original 1D array
arr = np.array([1, 2, 3, 4])
print("Original array:", arr)
# Expanding the array to 2D (axis=0)
expanded_arr = np.expand_dims(arr, axis=0)
print("Expanded array (axis=0):", expanded_arr)
# Expanding the array to 2D (axis=1)
expanded_arr_2 = np.expand_dims(arr, axis=1)
print("Expanded array (axis=1):", expanded_arr_2)

Output:

Original array: [1 2 3 4]
Expanded array (axis=0): [[1 2 3 4]]
Expanded array (axis=1): [[1]
 [2]
 [3]
 [4]]

• Squeezing a NumPy array

Syntax:
np.squeeze(a, axis=None)
• a: The input array.
• axis: Optional. If specified, only that axis is removed, and it must have size 1 (otherwise a ValueError is raised). If None (default), all singleton dimensions (size 1) are removed.

Example:
Basic Squeezing
import numpy as np
# Creating a 3D array with shape (1, 4, 1)
arr = np.array([[[1], [2], [3], [4]]])
print("Original array shape:", arr.shape)
print("Original array:\n", arr)
# Squeezing the array (removes the singleton dimensions)
squeezed_arr = np.squeeze(arr)
print("Squeezed array shape:", squeezed_arr.shape)
print("Squeezed array:", squeezed_arr)

Output
Original array shape: (1, 4, 1)
Original array:
[[[1]
[2]
[3]
[4]]]
Squeezed array shape: (4,)
Squeezed array: [1 2 3 4]
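
The axis parameter described in the syntax is not exercised by the basic example. A minimal sketch, reusing the same (1, 4, 1) array, shows squeezing one axis at a time; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[[1], [2], [3], [4]]])  # shape (1, 4, 1)

# Remove only the first axis; the trailing singleton axis stays
first_removed = np.squeeze(arr, axis=0)
print(first_removed.shape)  # (4, 1)

# Remove only the last axis
last_removed = np.squeeze(arr, axis=2)
print(last_removed.shape)  # (1, 4)

# Axis 1 has size 4, so it cannot be squeezed
try:
    np.squeeze(arr, axis=1)
    squeeze_failed = False
except ValueError:
    squeeze_failed = True
print(squeeze_failed)  # True
```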

• Sorting in NumPy Arrays

Methods for Sorting in NumPy:

np.sort(): Returns a sorted copy of the array.

ndarray.sort(): Sorts the array in-place, modifying the original array.

1. np.sort()

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2])

sorted_arr = np.sort(arr)

print("Original array:", arr)

print("Sorted array:", sorted_arr)

Output:

Original array: [3 1 4 1 5 9 2]

Sorted array: [1 1 2 3 4 5 9]

2. ndarray.sort() (In-place sorting)

arr = np.array([3, 1, 4, 1, 5, 9, 2])

arr.sort()

print("Array after in-place sorting:", arr)

Output:

Array after in-place sorting: [1 1 2 3 4 5 9]


4. Indexing and Slicing of NumPy Array

• Slicing 1-D NumPy arrays

• Slicing 2-D NumPy arrays

• Slicing 3-D NumPy arrays

• Negative slicing of NumPy arrays

• Slicing 1-D NumPy arrays

import numpy as np

# Create a 1-D array


arr = np.array([10, 20, 30, 40, 50, 60, 70])

# Basic Slicing: Elements from index 1 to 3 (excluding 4)


sliced_arr1 = arr[1:4]
print("Basic Slicing:", sliced_arr1)

# Slicing with Step: Elements from index 0 to 6 with a step of 2


sliced_arr2 = arr[0:7:2]
print("Slicing with Step:", sliced_arr2)

# Slicing from the End: Last three elements


sliced_arr3 = arr[-3:]
print("Slicing from the End:", sliced_arr3)

• Slicing 2-D NumPy arrays

import numpy as np

# Create a 2-D array


arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic Slicing: Slice elements from rows 0 to 1 and columns 1 to 2


sliced_arr1 = arr[0:2, 1:3]
print("Basic Slicing:", sliced_arr1)

# Slicing with Step: Slice every other element from rows and columns
sliced_arr2 = arr[::2, ::2]
print("Slicing with Step:", sliced_arr2)

# Slicing Rows or Columns: Slice the second row


sliced_arr3 = arr[1, :]
print("Slicing Rows:", sliced_arr3)
• Slicing 3-D NumPy arrays

import numpy as np

# Create a 3-D array


arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]], [[13, 14, 15], [16, 17, 18]]])

# Basic Slicing: Slice indices 0 to 1 in the first dimension, index 0 in the second
# dimension, and all elements in the third dimension
sliced_arr1 = arr[0:2, 0:1, :]
print("Basic Slicing:", sliced_arr1)

# Slicing with Step: Slice elements with a step of 2 in the first dimension, and all elements in
# the other dimensions
sliced_arr2 = arr[::2, :, :]
print("Slicing with Step:", sliced_arr2)

# Slicing Specific Rows or Columns: Slice the second row from each 2-D array
sliced_arr3 = arr[:, 1, :]
print("Slicing Specific Rows:", sliced_arr3)

# Slicing Specific Rows or Columns: Slice the second column from each 2-D array
sliced_arr4 = arr[:, :, 1]
print("Slicing Specific Columns:", sliced_arr4)

• Negative slicing of NumPy arrays

import numpy as np

# Negative Slicing in 1-D Arrays


arr = np.array([10, 20, 30, 40, 50])
print(arr[-3:]) # Output: [30 40 50]

# Negative Slicing in 2-D Arrays


arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr_2d[-2:, -2:]) # Output: [[5 6] [8 9]]

5. Stacking and Concatenating NumPy Arrays
• Stacking ndarrays
• Concatenating ndarrays
• Broadcasting in NumPy Arrays

• Stacking ndarrays

import numpy as np

# Using np.stack()
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Stack arrays along rows (axis=0)
result_rows = np.stack((array1, array2), axis=0)
print("Stacked array along rows (axis=0):\n", result_rows)

# Output:
# Stacked array along rows (axis=0):
# [[1 2 3]
# [4 5 6]]

• Concatenating ndarrays

# Using np.concatenate()
import numpy as np

# Create two 1D arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Concatenate arrays
result = np.concatenate((array1, array2))
print("Concatenated array:\n", result)

# Output:
# Concatenated array:
# [1 2 3 4 5 6]

• Broadcasting in NumPy Arrays

1D Array and 2D Array

import numpy as np

# Create a 1D array
array1 = np.array([1, 2, 3])
# Create a 2D array
array2 = np.array([[4, 5, 6], [7, 8, 9]])

# Perform addition
result = array1 + array2
print(result)
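
The broadcasting program lists no output. The following sketch repeats the same addition and annotates the shapes involved: the 1D array of shape (3,) is treated as shape (1, 3) and stretched across both rows of the 2D array. The call to np.broadcast_shapes is an addition for illustration and needs NumPy 1.20 or later.

```python
import numpy as np

array1 = np.array([1, 2, 3])               # shape (3,)
array2 = np.array([[4, 5, 6], [7, 8, 9]])  # shape (2, 3)

# array1 is treated as shape (1, 3) and repeated along the first axis
result = array1 + array2
print(result.shape)  # (2, 3)
print(result)
# [[ 5  7  9]
#  [ 8 10 12]]

# np.broadcast_shapes reports the resulting shape without doing the arithmetic
print(np.broadcast_shapes(array1.shape, array2.shape))  # (2, 3)
```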

6. Perform following operations using pandas


• Creating dataframe
• concat()
• Setting conditions
• Adding a new column

• Creating dataframe

import pandas as pd

# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)

• concat()

import pandas as pd

# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)

# 2. Using concat() to merge two DataFrames


data2 = {'Name': ['David', 'Eve'],
'Age': [28, 22],
'City': ['Houston', 'Miami']}
df2 = pd.DataFrame(data2)
merged_df = pd.concat([df, df2], ignore_index=True)
print("\nConcatenated DataFrame:")
print(merged_df)

• Setting conditions

import pandas as pd

# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

# 2. Keeping only the rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print("\nFiltered DataFrame (Age > 25):")
print(filtered_df)

• Adding a new column

import pandas as pd

# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

# 2. Adding a new column (one value per existing row)
df['Salary'] = [50000, 60000, 70000]
print("\nDataFrame with New Column (Salary):")
print(df)

7. Perform following operations using pandas


• Filling NaN with string
• Sorting based on column values
• groupby()

• Filling NaN with string

import pandas as pd
data_with_nan = {'Name': ['Alice', 'Bob', None],
'Age': [25, None, 35],
'City': ['New York', 'Los Angeles', None]}
df_nan = pd.DataFrame(data_with_nan)
df_filled = df_nan.fillna('Unknown')
print("\nDataFrame with NaN filled:")
print(df_filled)

• Sorting based on column values

import pandas as pd
# 1. Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
df_sorted = df.sort_values(by='Age', ascending=True)
print("\nSorted DataFrame by Age:")
print(df_sorted)

• groupby()

import pandas as pd
data_group = {'Category': ['A', 'B', 'A', 'B', 'A'],
'Values': [10, 20, 30, 40, 50]}
df_group = pd.DataFrame(data_group)
grouped_df = df_group.groupby('Category').sum()
print("\nGrouped DataFrame by Category:")
print(grouped_df)

8. Read the following file formats using pandas


• Text files
1. [Link]:
Name Age City
Alice 25 New York
Bob 30 Los Angeles

2. Program:
import pandas as pd

# Read a text file (e.g., tab-separated)


df_text = pd.read_csv('[Link]', sep='\t') # Use sep=' ' for space-separated files

print("Text File Data:")


print(df_text)
• CSV files
1. [Link]:
Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
2. Program:
# Read a CSV file
df_csv = pd.read_csv('[Link]')

print("\nCSV File Data:")


print(df_csv)
• Excel files
1. [Link]:

2. Program:
# Read an Excel file
df_excel = pd.read_excel('[Link]', sheet_name='Sheet1') # Replace 'Sheet1' with your sheet name

print("\nExcel File Data:")


print(df_excel)
• JSON files
1. [Link]:
[
{"Name": "Alice", "Age": 25, "City": "New York"},
{"Name": "Bob", "Age": 30, "City": "Los Angeles"}
]
2. Program:
# Read a JSON file
df_json = pd.read_json('[Link]')

print("\nJSON File Data:")


print(df_json)
9. Read the following file formats
[Link] (Pickle File)
A serialized DataFrame or Python object.
• Pickle files
import pandas as pd
# Read a Pickle file
df_pickle = pd.read_pickle('[Link]')
print("Pickle File Data:")
print(df_pickle)
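
The program above assumes a pickle file already exists. As a self-contained sketch, the snippet below first writes a small DataFrame with to_pickle() and then reads it back. The file name sample_data.pkl and the temporary directory are assumptions made for illustration, not part of the original exercise.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write the DataFrame to a temporary pickle file (hypothetical file name)
pickle_path = os.path.join(tempfile.mkdtemp(), 'sample_data.pkl')
df.to_pickle(pickle_path)

# Read it back; the data, dtypes, and index survive the round trip
df_pickle = pd.read_pickle(pickle_path)
print(df_pickle)
print(df_pickle.equals(df))  # True
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.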

[Link] (Image File)


Any image file (e.g., JPEG, PNG).

pip install Pillow

• Image files using PIL


from PIL import Image
# Read an image file
image = Image.open('[Link]')
# Display image properties
print("\nImage File Properties:")
print(f"Format: {image.format}, Size: {image.size}, Mode: {image.mode}")
# Show the image (optional)
image.show()

• Multiple files using Glob

[Link]:
Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
[Link]:
Name,Age,City
Charlie,35,Chicago
David,40,Houston

import pandas as pd
import glob
# Read multiple CSV files matching a pattern
file_paths = glob.glob('data_*.csv') # Matches files like data_1.csv, data_2.csv, etc.
df_list = [pd.read_csv(file) for file in file_paths]
df_combined = pd.concat(df_list, ignore_index=True)
print("\nCombined Data from Multiple Files:")
print(df_combined)

• Importing data from database

import pandas as pd
import sqlite3
# Connect to the SQLite database
conn = sqlite3.connect('[Link]')
# Query the database
query = "SELECT * FROM Dept"
df_db = pd.read_sql(query, conn)
# Close the connection
conn.close()
print("\nData from Database:")
print(df_db)
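
The program above needs an existing database file containing a Dept table. A minimal runnable sketch, assuming an in-memory SQLite database and a hypothetical two-column layout for Dept (DeptId, DeptName), could look like this:

```python
import sqlite3

import pandas as pd

# In-memory database, so no file is needed; the columns and rows are hypothetical
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Dept (DeptId INTEGER PRIMARY KEY, DeptName TEXT)")
conn.executemany(
    "INSERT INTO Dept (DeptId, DeptName) VALUES (?, ?)",
    [(1, 'CSE'), (2, 'ECE'), (3, 'MECH')],
)
conn.commit()

# Load the query result into a DataFrame, then release the connection
df_db = pd.read_sql("SELECT * FROM Dept", conn)
conn.close()
print(df_db)
```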

10. Demonstrate web scraping using python


pip install requests beautifulsoup4
Program:
#Reading a Web Page with Beautiful Soup
from bs4 import BeautifulSoup
import requests

# Fetch the webpage


url = '[Link]'
response = requests.get(url)

# Parse with BeautifulSoup
bs = BeautifulSoup(response.text, 'html.parser')

# Print the page title
print("Page Title:", bs.title.string)

# Extract and print all links
print("\nAll Links:")
for link in bs.find_all('a'):
    print(link.get('href')) # Print only the href attribute

# Extract and print all paragraphs
print("\nAll Paragraphs:")
for paragraph in bs.find_all('p'):
    print(paragraph.text) # Extract text from <p> tags

11. Perform following preprocessing techniques on loan prediction dataset


pip install pandas scikit-learn
Sample loan prediction dataset:
import pandas as pd

# Sample loan prediction dataset


data = {
'LoanAmount': [5000, 6000, 7000, 8000, 9000],
'CreditScore': [650, 700, 750, 800, 850],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Married': ['Yes', 'No', 'Yes', 'No', 'Yes'],
'LoanStatus': ['Approved', 'Rejected', 'Approved', 'Rejected', 'Approved']
}

df = pd.DataFrame(data)
print("Original Dataset:")
print(df)

• Feature Scaling
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample loan prediction dataset


data = {
'LoanAmount': [5000, 6000, 7000, 8000, 9000],
'CreditScore': [650, 700, 750, 800, 850],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Married': ['Yes', 'No', 'Yes', 'No', 'Yes'],
'LoanStatus': ['Approved', 'Rejected', 'Approved', 'Rejected', 'Approved']
}

# Define the DataFrame before using it


df = pd.DataFrame(data)

# Feature Scaling
scaler = MinMaxScaler()
df['LoanAmount_Scaled'] = scaler.fit_transform(df[['LoanAmount']])

# Output
print("\nDataset after Feature Scaling:")
print(df[['LoanAmount', 'LoanAmount_Scaled']])
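
To make clear what the scaler computes, here is a hand-written sketch of min-max scaling on the same LoanAmount values, using only pandas. With its default feature range of 0 to 1, MinMaxScaler applies the formula (x - min) / (max - min); the names loan_amount and scaled are illustrative.

```python
import pandas as pd

loan_amount = pd.Series([5000, 6000, 7000, 8000, 9000], name='LoanAmount')

# Min-max scaling: (x - min) / (max - min), mapping the column onto [0, 1]
scaled = (loan_amount - loan_amount.min()) / (loan_amount.max() - loan_amount.min())
print(scaled.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```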
• Feature Standardization
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Sample loan prediction dataset


data = {
'LoanAmount': [5000, 6000, 7000, 8000, 9000],
'CreditScore': [650, 700, 750, 800, 850],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Married': ['Yes', 'No', 'Yes', 'No', 'Yes'],
'LoanStatus': ['Approved', 'Rejected', 'Approved', 'Rejected', 'Approved']
}

# Define the DataFrame before using it


df = pd.DataFrame(data)

# Feature Standardization
standard_scaler = StandardScaler()
df['CreditScore_Standardized'] = standard_scaler.fit_transform(df[['CreditScore']])

# Output the result


print("\nDataset after Feature Standardization:")
print(df[['CreditScore', 'CreditScore_Standardized']])
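
Standardization can likewise be sketched by hand as z = (x - mean) / std. StandardScaler divides by the population standard deviation (ddof=0), whereas pandas' std() defaults to ddof=1, so ddof is set explicitly below. The variable names are illustrative.

```python
import pandas as pd

credit_score = pd.Series([650, 700, 750, 800, 850], name='CreditScore')

# Population standard deviation (ddof=0) matches what StandardScaler uses
mean = credit_score.mean()
std = credit_score.std(ddof=0)
standardized = (credit_score - mean) / std
print(standardized.round(4).tolist())  # [-1.4142, -0.7071, 0.0, 0.7071, 1.4142]
```

After standardization the column has mean 0 and (population) standard deviation 1.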

• Label Encoding
from sklearn.preprocessing import LabelEncoder
import pandas as pd

# Sample loan prediction dataset


data = {
'LoanAmount': [5000, 6000, 7000, 8000, 9000],
'CreditScore': [650, 700, 750, 800, 850],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Married': ['Yes', 'No', 'Yes', 'No', 'Yes'],
'LoanStatus': ['Approved', 'Rejected', 'Approved', 'Rejected', 'Approved']
}

# Define the DataFrame before using it


df = pd.DataFrame(data)
# Label Encoding for LoanStatus
label_encoder = LabelEncoder()
df['LoanStatus_Encoded'] = label_encoder.fit_transform(df['LoanStatus'])
print("\nDataset after Label Encoding:")
print(df[['LoanStatus', 'LoanStatus_Encoded']])

• One Hot Encoding


import pandas as pd

# Sample loan prediction dataset


data = {
'LoanAmount': [5000, 6000, 7000, 8000, 9000],
'CreditScore': [650, 700, 750, 800, 850],
'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
'Married': ['Yes', 'No', 'Yes', 'No', 'Yes'],
'LoanStatus': ['Approved', 'Rejected', 'Approved', 'Rejected', 'Approved']
}

df = pd.DataFrame(data)
# One Hot Encoding for Gender and Married
df_one_hot = pd.get_dummies(df, columns=['Gender', 'Married'], drop_first=True)
print("\nDataset after One Hot Encoding:")
print(df_one_hot)

12. Perform following visualizations using matplotlib


pip install matplotlib
Sample data:
import matplotlib.pyplot as plt
import numpy as np

# Sample data for different plots


categories = ['A', 'B', 'C', 'D']
values = [23, 45, 56, 78]
sizes = [15, 30, 45, 10] # For pie chart
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Bar Graph
plt.bar(categories, values, color='skyblue')
plt.title('Bar Graph')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

# Pie Chart
plt.pie(sizes, labels=categories, autopct='%1.1f%%', startangle=90,
        colors=['gold', 'lightcoral', 'lightgreen', 'lightskyblue'])
plt.title('Pie Chart')
plt.show()

# Box Plot
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data, vert=True, patch_artist=True, labels=['A', 'B', 'C'])
plt.title('Box Plot')
plt.show()

# Histogram
data = np.random.normal(170, 10, 250)
plt.hist(data, bins=30, color='lightgreen', edgecolor='black')
plt.title('Histogram')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.show()

# Line Chart and Subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
# Subplot 1: Line Chart
ax1.plot(x, y, color='blue', label='sin(x)')
ax1.set_title('Line Chart')
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.legend()
# Subplot 2: Another Line Chart
ax2.plot(x, np.cos(x), color='red', label='cos(x)')
ax2.set_title('Another Line Chart')
ax2.set_xlabel('X')
ax2.set_ylabel('Y')
ax2.legend()
plt.tight_layout()
plt.show()

# Scatter Plot
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50)
sizes = 1000 * np.random.rand(50)
plt.scatter(x, y, c=colors, s=sizes, alpha=0.6, cmap='viridis')
plt.colorbar()
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
