Data Science Laboratory Manual CS3361
LABORATORY MANUAL
Subject Code : CS3361
Subject Name : DATA SCIENCE LABORATORY
Regulation : R2021
INSTITUTE VISION
To become a centre of excellence in preparing engineers with excellent technical, scientific
research and entrepreneurial abilities to contribute to society.
INSTITUTE MISSION
DEPARTMENT VISION
To become a centre of excellence in technical education and scientific research in the field of
Computer Science and Engineering for the wellbeing of the society.
DEPARTMENT MISSION
1. Producing graduates with a strong theoretical and practical foundation in computer technology to meet industry expectations.
2. Offering a holistic learning ambience for faculty and students to investigate, apply and transfer knowledge.
3. Inculcating interpersonal traits among the students leading to employability and entrepreneurship.
4. Establishing effective linkage with the industries for mutual benefit.
5. Strengthening research activities to solve the problems related to industry and society.
SYLLABUS
COURSE OBJECTIVES :
● To understand the python libraries for data science
● To understand the basic Statistical and Probability measures for data science.
● To learn descriptive analytics on the benchmark data sets.
● To apply correlation and regression analytics on standard data sets.
● To present and interpret data using visualization packages in Python.
EXPERIMENTS
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation,
Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
TOTAL: 60 Periods
CONTENT BEYOND SYLLABI: Hadoop, Apache Spark
COURSE OUTCOMES:
On completion of the course, students will be able to:
CO1: Make use of the python libraries for data science
CO2: Make use of the basic Statistical and Probability measures for data science.
CO3: Perform descriptive analytics on the benchmark data sets.
CO4: Perform correlation and regression analytics on standard data sets
CO5: Present and interpret data using visualization packages in Python.
● INTEL based desktop PC with min. 8GB RAM and 500 GB HDD, 17” or higher TFT Monitor,
Keyboard and mouse
● Windows 10 or higher operating system / Linux Ubuntu 20 or higher
● Python 3.9 or above, NumPy, SciPy, Matplotlib, Pandas, Seaborn, PyCharm
List of Experiments
Sl. No.   List of Experiments
1.   Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and Pandas packages.
5.   d. Also compare the results of the above analysis for the two data sets.
6.   Apply and explore various plotting functions on UCI data sets.
     a. Normal curves
     b. Density and contour plots
     c. Correlation and scatter plots
     d. Histograms
Ex. No. 1 Download, install and explore the features of NumPy, SciPy, Jupyter,
Statsmodels and Pandas packages.
AIM
To download and install Python and its data science packages using pip.
PROCEDURE
Install Python Data Science Packages
Python is a high-level, general-purpose programming language with data science and machine learning
packages. As a first step, install Python for Windows, macOS, or Linux.
Python Packages
The power of Python is in the packages that are available either through the pip or conda package managers.
This page is an overview of some of the best packages for machine learning and data science and how to
install them.
We will explore the Python packages that are commonly used for data science and machine learning. You
may need to install the packages from the terminal, Anaconda prompt, command prompt, or from the Jupyter
Notebook. If you have multiple versions of Python or have specific dependencies then use an environment
manager such as pyenv. For most users, a single installation is typically sufficient. The Python package
manager pip has all of the packages (such as gekko) that we need for this course. If there is an administrative
access error, install to the local profile with the --user flag.
pip install gekko --user
Gekko
Gekko provides an interface to gradient-based solvers for machine learning and
optimization of mixed-integer, differential algebraic equations, and time series models.
Gekko provides exact first and second derivatives through automatic differentiation and
discretization with simultaneous or sequential methods.
pip install gekko
Keras
Keras provides an interface for artificial neural networks. Keras acts as an interface for the
TensorFlow library. Other backend packages were supported until version 2.4; from that release
through the rest of the 2.x series, TensorFlow was the only backend. Keras 3 again supports
multiple backends (TensorFlow, JAX, and PyTorch). TensorFlow is installed separately with
pip install tensorflow.
pip install keras
Matplotlib
The package matplotlib generates plots in Python.
pip install matplotlib
Numpy
Numpy is a numerical computing package for mathematics, science, and engineering.
Many data science packages use Numpy as a dependency.
pip install numpy
OpenCV
OpenCV (Open Source Computer Vision Library) is a package for real-time computer vision,
originally developed with support from Intel Research.
pip install opencv-python
Pandas
Pandas visualizes and manipulates data tables. There are many functions that allow
efficient manipulation for the preliminary steps of data analysis problems.
pip install pandas
Plotly
Plotly renders interactive plots with HTML and JavaScript. Plotly Express is included with
Plotly.
pip install plotly
PyTorch
Scikit-Learn
Scikit-Learn (or sklearn) includes a wide variety of classification, regression and clustering
algorithms including neural network, support vector machine, random forest, gradient
boosting, k-means clustering, and other supervised or unsupervised learning methods.
pip install scikit-learn
SciPy
SciPy is a general-purpose package for mathematics, science, and engineering and extends
the base capabilities of NumPy.
pip install scipy
Seaborn
Seaborn is built on matplotlib and produces detailed plots in a few lines of code.
pip install seaborn
Statsmodels
Statsmodels is a package for exploring data, estimating statistical models, and performing
statistical tests. It includes descriptive statistics, statistical tests, plotting functions, and
result statistics.
pip install statsmodels
TensorFlow
TensorFlow is an open source machine learning platform with particular focus on
training and inference of deep neural networks. Development is led by the Google Brain
team.
pip install tensorflow
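After installation, it can help to confirm which packages are actually available in the current environment. The following is a minimal sketch using only the standard library; the helper name check_packages and the list of package names are illustrative assumptions, not part of the manual.

```python
from importlib import metadata


def check_packages(names):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as they are used with "pip install"
packages = ["numpy", "scipy", "pandas", "statsmodels", "jupyter"]
for name, version in check_packages(packages).items():
    print(f"{name}: {version if version else 'not installed'}")
```

Any name reported as "not installed" can then be installed with pip install followed by that name.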
Augmented Questions :
1. How would you approach optimizing a Python program for performance? Discuss
techniques for profiling, identifying bottlenecks, and improving efficiency. Provide
examples of how you might apply these techniques to a data processing task.
2. In the context of data analysis, what are some best practices for ensuring data quality and
integrity? Explain how you would handle missing data, outliers, and data inconsistencies
in a dataset before performing any analysis.
Viva Questions:
1. How do you install NumPy, SciPy, Jupyter, Statsmodels, and Pandas in a Python environment?
2. Can you explain the primary functionalities of NumPy and how it is useful in scientific computing?
3. What are some common functions and features provided by SciPy, and how does it extend
NumPy’s capabilities?
4. What is Jupyter Notebook, and how does it facilitate interactive computing and data analysis?
5. Describe how Pandas and Statsmodels are used for data analysis and statistical modeling in Python.
RESULT:
Thus the download, installation and exploration of the features of the NumPy, SciPy, Jupyter,
Statsmodels and Pandas packages were successfully completed.
Ex. No. 2 Working with NumPy arrays
AIM
Dimensions in Arrays
0-D Arrays
0-D arrays, or Scalars, are the elements in an array. Each value in an array is a 0-D array.
Example
Create a 0-D array with value 42
import numpy as np
arr = np.array(42)
print(arr)
1-D Arrays
An array that has 0-D arrays as its elements is called uni-dimensional or 1-D array.
These are the most common and basic arrays.
Example
Create a 1-D array containing the values 1,2,3,4,5:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2-D Arrays
An array that has 1-D arrays as its elements is called a 2-D array. These are often used to
represent matrices or 2nd order tensors.
Example
Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
3-D arrays
An array that has 2-D arrays (matrices) as its elements is called a 3-D array.
These are often used to represent a 3rd order tensor.
Example
Create a 3-D array with two 2-D arrays, both containing two arrays with the values
1,2,3 and 4,5,6:
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
Check Number of Dimensions
NumPy arrays provide the ndim attribute, which returns an integer that tells us how many
dimensions the array has.
Example
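A minimal sketch of the ndim check, reusing the array values from the examples above; the variable names a, b, c and d are chosen here for illustration.

```python
import numpy as np

a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# ndim counts the nesting depth of the array
print(a.ndim)  # 0
print(b.ndim)  # 1
print(c.ndim)  # 2
print(d.ndim)  # 3
```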
OUTPUT:
Augmented Questions :
1. Write a Python program using NumPy to create a 3D array of shape (4, 3, 2) with
random integers between 0 and 10. Perform the following tasks:
2. Write a Python program using NumPy to perform the following tasks with two 1D arrays:
● Create two 1D arrays of length 10 with random integers between 1 and 20.
● Compute and print their dot product.
● Compute the element-wise product of the two arrays.
● Normalize the element-wise product by dividing it by the maximum value of the product
Viva Questions:
2. What is the difference between a NumPy array and a Python list, and why would you use a
NumPy array for numerical computations?
3. How can you create a NumPy array from a Python list or tuple? Provide an example.
4. Describe the various methods to access and manipulate elements in a NumPy array. How can
you perform slicing and indexing on a 2D array?
5. How do you perform basic arithmetic operations (such as addition, subtraction, multiplication,
and division) on NumPy arrays? What are some benefits of using NumPy's vectorized operations
over traditional loops?
6. Explain how broadcasting works in NumPy and give an example of how it can be used to
perform operations on arrays of different shapes.
RESULT
Thus working with NumPy arrays was executed successfully.
Ex. No. 3 Working with Pandas data frames
AIM:
The aim is to illustrate the basic operations of creating, indexing, and loading data into a Pandas
DataFrame. This includes creating a simple DataFrame, locating specific rows using index labels,
adding named indexes, and loading data from an external CSV file into a DataFrame.
ALGORITHM:
Example
Return row 0:
#refer to the row index:
print(df.loc[0])
Example
Return row 0 and 1:
#use a list of indexes:
print(df.loc[[0, 1]])
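The two loc examples above assume that a DataFrame named df already exists. A minimal sketch of how such a DataFrame could be created is given here; the sample values are an assumption, chosen to match the data used in the named-index example of this exercise.

```python
import pandas as pd

# Assumed sample data (same values as the named-index example)
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)  # rows get the default integer index 0, 1, 2

row0 = df.loc[0]         # a single label returns a Series
rows01 = df.loc[[0, 1]]  # a list of labels returns a DataFrame
print(row0)
print(rows01)
```

Note the double brackets in df.loc[[0, 1]]: the outer pair is the indexing operator and the inner pair is the list of row labels.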
Named Indexes
With the index argument, you can name your own indexes.
Example
Add a list of names to give each row a name:
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
Locate Named Indexes
Use the named index in the loc attribute to return the specified row(s).
Example
Return "day2":
#refer to the named index:
print(df.loc["day2"])
import pandas as pd
file_path = r'C:\Users\SRM\Downloads\[Link]'
df = pd.read_csv(file_path)
print(df)
OUTPUT:
AUGMENTED QUESTIONS:
1. Write a Python program using Pandas to analyze a dataset of customer transactions. The
dataset includes columns for 'CustomerID', 'TransactionDate', 'Amount', and 'Category'.
Perform the following tasks:
2. Create a Python program using Pandas to work with a dataset of employee records. The
dataset contains columns 'EmployeeID', 'Name', 'Department', 'JoiningDate', and 'Salary'.
Perform the following operations:
VIVA QUESTIONS:
1. How can you create a Pandas DataFrame from a dictionary, and what are the key methods to
inspect the structure and content of the DataFrame? Provide an example.
2. Describe how you would handle missing data in a Pandas DataFrame. What methods are available
for detecting, removing, or imputing missing values?
3. How can you perform data filtering and selection in a Pandas DataFrame? Explain how to select
rows based on certain conditions and how to access specific columns.
4. What are some common operations for data aggregation and grouping in Pandas? How would you
use the groupby method to calculate summary statistics for different groups within a DataFrame?
5. Explain how to merge and join DataFrames in Pandas. What are the different types of joins
available, and how do you handle conflicts and overlapping column names during the merge process?
RESULT
Thus the working with pandas data frames was executed successfully.
Ex. No. 4 Reading data from text files, Excel and the web and exploring various commands
for doing descriptive analytics on the Iris data set.
AIM
Perform descriptive analytics on the Iris dataset using Pandas and Seaborn, including data
reading, exploration, manipulation, summary statistics, visualization, and handling missing
values.
ALGORITHM
PROGRAM:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the Iris dataset
file_path = r'C:\Users\SRM\Downloads\[Link]'
df = pd.read_csv(file_path)
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(df.head())
# Check for missing values
print("\nMissing values in the dataset:")
print(df.isnull().sum())
# Handling missing values (if any)
df = df.dropna() # Drop rows with missing values
# Summary statistics
print("\nSummary statistics:")
print(df.describe())
# Data exploration and visualization
sns.set(style="whitegrid")
# Pairplot to see the pairwise relationships
print("\nGenerating pairplot...")
sns.pairplot(df, hue='species')
plt.suptitle("Pairplot of the Iris Dataset", y=1.02)
plt.show()
# Boxplot to see the distribution of each feature
print("\nGenerating boxplot...")
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, width=0.5, palette="colorblind")
plt.title("Boxplot of Iris Features")
plt.show()
# Correlation heatmap (numeric columns only, since 'species' is text)
print("\nGenerating correlation heatmap...")
plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
# Distribution of each species
print("\nGenerating countplot for species distribution...")
sns.countplot(x='species', data=df, palette="Set2")
plt.title("Species Distribution")
plt.show()
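The program above reads a CSV file from disk, while the exercise title also mentions Excel files and the web. The following is a minimal sketch of the same descriptive commands on a small in-memory text source; the three data rows are made up for illustration, and the file name and URL in the comments are placeholders rather than real locations.

```python
import io
import pandas as pd

# Hypothetical in-memory text standing in for a small CSV/text file
text = io.StringIO(
    "sepal_length,sepal_width,species\n"
    "5.1,3.5,setosa\n"
    "7.0,3.2,versicolor\n"
    "6.3,,virginica\n"
)
df = pd.read_csv(text)               # also accepts a file path or a URL string
print(df.describe())                 # summary statistics for numeric columns
print(df.isnull().sum())             # missing values per column
print(df["species"].value_counts())  # frequency of each category

# Other sources follow the same pattern (placeholders, not real files):
# df = pd.read_excel("your_file.xlsx")          # needs an engine such as openpyxl
# df = pd.read_csv("https://.../your_data.csv")  # read directly from the web
```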
OUTPUT:
AUGMENTED QUESTIONS :
1. Write a Python program to read data from a CSV file named [Link] and perform the
following tasks:
2. Write a Python program to read data from an Excel file named [Link] and perform
the following tasks:
VIVA QUESTIONS:
RESULT
Thus reading data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set were executed successfully.
Ex. No. 5.1 Standard Deviation, Skewness and Kurtosis of Pima Indians Diabetes Dataset.
AIM:
Perform basic data exploration and descriptive statistics on the diabetes dataset using Pandas and
the Statistics module. This includes examining data structure, calculating mean, mode, median,
variance, standard deviation, value counts, skewness, and kurtosis.
ALGORITHM
1. Read the diabetes dataset from a CSV file using Pandas.
2. Display the first few rows, shape, and data type of the dataset.
3. Calculate descriptive statistics using the statistics module: mean, mode, median, variance, and
standard deviation.
4. Calculate value counts for the "Outcome" column.
5. Calculate skewness and kurtosis for the entire dataset.
PROGRAM
import pandas as pd
import statistics
# Load the dataset
file_path = r'C:\Users\SRM\Downloads\[Link]' # Adjust the path to your file location
pima = pd.read_csv(file_path)
# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(pima.head())
# Print the shape of the dataset
print("\nShape of the dataset:")
print(pima.shape)
# Print the type of the dataset
print("\nType of the dataset:")
print(type(pima))
# Print the index of the dataset
print("\nRow indices:")
pima_row_idx = pima.index
print(pima_row_idx)
# Print the columns of the dataset
print("\nColumn names:")
pima_col_idx = pima.columns
print(pima_col_idx)
# Print the data types of the columns
print("\nData types of each column:")
print(pima.dtypes)
# Calculate statistical measures
mean = statistics.mean(pima["Insulin"])
mode = statistics.mode(pima["Insulin"])
median = statistics.median(pima["Insulin"])
variance = statistics.variance(pima["Outcome"])
standard_deviation = statistics.stdev(pima["Outcome"])
fre_count = pima["Outcome"].value_counts()
skew = pima.skew(axis=0, skipna=True)
kurt = pima.kurt(skipna=True)
# Print the calculated statistical measures
print("\nMean of Insulin:", mean)
print("Mode of Insulin:", mode)
print("Median of Insulin:", median)
print("Variance of Outcome:", variance)
print("Standard Deviation of Outcome:", standard_deviation)
print("\nFrequency count of Outcome:")
print(fre_count)
print("\nSkewness of each column:")
print(skew)
print("\nKurtosis of each column:")
print(kurt)
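The program above mixes the statistics module with Pandas methods, so it is worth knowing which divisor each one uses. The following minimal sketch, on a small made-up list rather than the diabetes data, suggests that statistics.variance and the Pandas var() method both compute the sample variance (dividing by n - 1), while statistics.pvariance and var(ddof=0) compute the population variance.

```python
import statistics
import pandas as pd

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, not from the dataset
s = pd.Series(values)

sample_var = statistics.variance(values)       # divides by n - 1
population_var = statistics.pvariance(values)  # divides by n

print(sample_var, s.var())            # pandas var() also divides by n - 1
print(population_var, s.var(ddof=0))  # ddof=0 switches to dividing by n
print(statistics.stdev(values), s.std())
print(s.skew(), s.kurt())             # bias-corrected skewness and excess kurtosis
```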
OUTPUT:
RESULT:
Thus the Standard Deviation, Skewness and Kurtosis of Pima Indians Diabetes Dataset was executed
successfully.
Ex. No. 5.2 Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis of UCI Diabetes Dataset.
AIM:
ALGORITHM:
1. Read the diabetic dataset from a CSV file using Pandas.
2. Display the first few rows, shape, and data type of the dataset.
3. Retrieve and print row and column indices.
4. Calculate descriptive statistics using the Statistics module.
PROGRAM
import pandas as pd
import statistics
# Load the dataset
file_path = r'C:\Users\SRM\Downloads\[Link]' # Adjust the path to your file location
pima = pd.read_csv(file_path)
# List of columns to analyze
columns = pima.columns
# Univariate analysis
for column in columns:
    print(f"\nAnalysis for column: {column}")
    # Frequency
    frequency = pima[column].value_counts()
    print("Frequency:\n", frequency)
    # Mean
    mean = pima[column].mean()
    print("Mean:", mean)
    # Median
    median = pima[column].median()
    print("Median:", median)
    # Mode
    mode = pima[column].mode()[0] if not pima[column].mode().empty else "No mode"
    print("Mode:", mode)
    # Variance
    variance = pima[column].var()
    print("Variance:", variance)
    # Standard Deviation
    std_dev = pima[column].std()
    print("Standard Deviation:", std_dev)
    # Skewness
    skewness = pima[column].skew()
    print("Skewness:", skewness)
    # Kurtosis
    kurtosis = pima[column].kurt()
    print("Kurtosis:", kurtosis)
OUTPUT:
RESULT:
Thus the Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis of UCI Diabetes Dataset was executed successfully.
Ex. No. 5.3 Bivariate Analysis: Program for linear regression
AIM:
Explore the relationship between Glucose and Blood Pressure in the diabetes dataset using a scatter plot
and create a linear regression model to predict Age based on BMI.
ALGORITHM:
1. Import necessary libraries: NumPy, Pandas, Seaborn, Statistics, Matplotlib, scikit-learn,
and statsmodels.
2. Read the diabetes dataset from a CSV file.
3. Select relevant columns for analysis (Pregnancies, Glucose, BloodPressure,
SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age).
4. Create a scatter plot to visualize the relationship between Glucose and Blood Pressure.
5. Extract features (X) and target variable (Y) for linear regression (e.g., Age vs BMI).
6. Use scikit-learn's LinearRegression to fit a linear model.
7. Display the scatter plot.
8. Fit a linear regression model to predict Age based on BMI using statsmodels.
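The fitting steps above can be sketched as follows. This is a minimal sketch on synthetic data, not the manual's program: the arrays bmi and age are generated stand-ins for the BMI and Age columns, and the coefficients used to generate them are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the BMI (feature) and Age (target) columns;
# the real exercise would take these from the diabetes DataFrame.
rng = np.random.default_rng(0)
bmi = rng.uniform(18, 45, size=200)
age = 20 + 0.5 * bmi + rng.normal(0, 3, size=200)

X = bmi.reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
Y = age

model = LinearRegression()
model.fit(X, Y)
print("Slope:", model.coef_[0])
print("Intercept:", model.intercept_)
print("R-squared:", model.score(X, Y))
```

With a real DataFrame, X would typically be selected with double brackets (for example one column kept as a DataFrame) so that it stays two-dimensional.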
PROGRAM
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
[Link](X, Y)
OUTPUT:
RESULT:
Thus the Bivariate Analysis-Program for linear regression was executed successfully.
Ex. No. 5.4 Bivariate Analysis: Logistic regression
AIM:
Perform logistic regression analysis on the diabetes dataset to predict the likelihood of
diabetes based on various independent variables. Evaluate the model's performance using
classification report and confusion matrix.
ALGORITHM
1. Import necessary libraries: NumPy, Pandas, Seaborn, Matplotlib, statsmodels, and scikit-learn.
print("\nClassification Report:")
print(classification_report(Y_test, Y_pred))
print("\nConfusion Matrix:")
conf_matrix = confusion_matrix(Y_test, Y_pred)
print(conf_matrix)
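The evaluation lines above assume that Y_test and Y_pred already exist. A minimal sketch of how they could be produced is given here on synthetic data; the single predictor (a stand-in for Glucose), the generated outcome, the split ratio and the max_iter value are all assumptions for illustration, not the manual's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for one predictor (e.g. Glucose) and the binary Outcome
rng = np.random.default_rng(42)
glucose = rng.normal(120, 30, size=400)
prob = 1 / (1 + np.exp(-(glucose - 125) / 10))
outcome = (rng.random(400) < prob).astype(int)

X = glucose.reshape(-1, 1)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, outcome, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

print(classification_report(Y_test, Y_pred))
conf_matrix = confusion_matrix(Y_test, Y_pred)  # rows: true class, columns: predicted
print(conf_matrix)
```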
OUTPUT:
RESULT:
Thus the Bivariate Analysis Logistic regression was executed successfully.
Ex. No. 5.5 MULTIPLE REGRESSION ANALYSIS
AIM:
Analyze and visualize the diabetes dataset using pandas, seaborn, matplotlib, and statsmodels.
The code explores the dataset, calculates and visualizes the correlation matrix, generates a
quantile-quantile (QQ) plot for the 'Age' variable, and produces scatter matrices for different
subsets of the data.
ALGORITHM:
PROGRAM:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pylab
plt.title('Correlation Heatmap')
plt.show()
OUTPUT:
AUGMENTED QUESTIONS:
1. Write a Python program to perform the following tasks using the UCI Diabetes dataset:
● Calculate and print the mean, median, and standard deviation of the 'Glucose' column.
● Fit a linear regression model to predict 'Outcome' based on 'Glucose' and print the model's R-
squared value.
2. Write a Python program to perform the following tasks using the Pima Indians Diabetes dataset:
● Calculate and print the frequency of unique values in the 'Outcome' column.
● Fit a logistic regression model to predict 'Outcome' based on 'Glucose' and print the model's
accuracy.
VIVA QUESTIONS :
1. How do you calculate the mean, median, and standard deviation for a column in the Diabetes
dataset?
2. What is the purpose of performing linear regression, and how would you apply it to the Diabetes
dataset?
3. How can you fit a logistic regression model to the Diabetes dataset and evaluate its performance?
4. What is multiple regression analysis, and how would you use it to analyze the Diabetes dataset?
5. How would you compare the results of your analysis between the UCI Diabetes dataset and the
Pima Indians Diabetes dataset?
RESULT:
The code produces visualizations and analyses, including a correlation matrix heatmap, a QQ plot for
'Age,' and scatter matrices, offering insights into variable relationships and distributions within the
diabetes dataset.
Ex. No. 6 Apply and explore various plotting functions on UCI data sets.
a. Normal curves
AIM:
The aim of the provided Python code is to generate and visualize the probability
density function (PDF) of a normal distribution. The range for the x-axis values is set from
-20 to 20 with a step size of 0.01.
ALGORITHM:
1. Import necessary libraries: numpy for numerical operations, matplotlib.pyplot
for plotting, scipy.stats for the normal distribution, and statistics (imported,
although the mean and standard deviation are calculated with NumPy functions).
2. Create an array x_axis with values ranging from -20 to 20 with a step size of 0.01.
3. Calculate the mean and standard deviation:
● The code applies np.mean and np.std to the x_axis array, rather than
statistics.mean and statistics.stdev.
4. Plot the normal distribution PDF:
● The code uses plt.plot to plot the normal distribution PDF computed by
norm.pdf from the scipy.stats module. The mean and standard deviation
are used as parameters for the normal distribution.
5. Display the plot:
● The code uses plt.show() to display the generated plot.
PROGRAM
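import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics
# x-axis values from -20 to 20 in steps of 0.01
x_axis = np.arange(-20, 20, 0.01)
# Mean and standard deviation of the x-axis values
mean = np.mean(x_axis)
sd = np.std(x_axis)
# Plot the normal distribution PDF
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))
plt.title('Normal Distribution')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.grid(True)
plt.show()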
RESULT:
The code generates a plot displaying the probability density function of
a normal distribution with mean and standard deviation calculated from the range of values
specified on the x-axis (-20 to 20 with steps of 0.01). The resulting plot visually represents
the distribution of the random variable within that range.
b. Density and contour plots
AIM:
The aim of this program is to create a 2D contour plot with filled contours and an
overlaid image of a mathematical function. The function `f(x, y)` is defined, and the contours of this
function are plotted on a grid using Matplotlib. Additionally, an image of the function is displayed using
`plt.imshow`, and a color bar is added for reference.
ALGORITHM:
1. Import necessary libraries:
- `matplotlib.pyplot` for plotting.
- `numpy` for numerical operations.
3. Define the function `f(x, y)` which represents a mathematical expression involving
sine, cosine, and exponentiation.
4. Generate evenly spaced values for `x` and `y` using `np.linspace`.
5. Create a mesh grid (`X`, `Y`) using `np.meshgrid` based on the generated `x` and
`y` values.
6. Evaluate the function `f(X, Y)` for each point on the grid and store the result in the variable
`Z`.
7. Plot black contours of the function using `plt.contour` with a single line.
9. Add labeled contour lines using `plt.clabel` to provide information about the
contour levels.
10. Display an image of the function using `plt.imshow` with transparency (alpha=0.5) and
a specified colormap ('RdGy').
11. Add a color bar to the plot for reference using `plt.colorbar()`.
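The steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the expression inside f, the 0 to 5 range, the grid sizes and the number of contour levels are all chosen here for illustration, since the manual does not show them.

```python
import numpy as np
import matplotlib.pyplot as plt


def f(x, y):
    # Assumed example function combining sine, cosine and a power
    return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)


x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 40)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
contours = plt.contour(X, Y, Z, 3, colors="black")  # a few black contour lines
plt.clabel(contours, inline=True, fontsize=8)       # label the contour levels
plt.imshow(Z, extent=[0, 5, 0, 5], origin="lower",
           cmap="RdGy", alpha=0.5)                  # semi-transparent image
plt.colorbar()
# plt.show()  # uncomment to display the figure
```

The origin="lower" argument makes the image use the same orientation as the contour plot, so both layers line up.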
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
RESULT:
The result of the code execution is a 2D contour plot with filled contours and an overlaid
image of the function defined by `f(x, y)`. The contours provide a visual representation of the
function's behavior, and the color bar helps to interpret the values associated with the
colormap. The overall plot combines different elements to present a comprehensive view of
the mathematical function in the specified range.
c. Correlation and scatter plots
AIM:
The aim of the provided Python code is to analyze and visualize the relationship
between the 'BloodPressure' and 'BMI' columns in a diabetes dataset using the Pandas,
Seaborn, and SciPy libraries. It includes loading the dataset, displaying its headers, creating
scatter plots, fitting a regression line, and calculating the correlation coefficient and
correlation matrix.
ALGORITHM:
PROGRAM:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
cormat = [Link]()
print("\nCorrelation MATRIX:")
print(round(cormat, 2))
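A minimal sketch of the correlation and scatter-plot steps named in the aim is given here on synthetic data. The generated BloodPressure and BMI values are stand-ins, and the use of scipy.stats.pearsonr and np.polyfit is one possible choice, not necessarily what the manual's program uses.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic stand-in for the BloodPressure and BMI columns
rng = np.random.default_rng(7)
bp = rng.normal(72, 12, 250)
bmi = 20 + 0.15 * bp + rng.normal(0, 5, 250)
df = pd.DataFrame({"BloodPressure": bp, "BMI": bmi})

r, p_value = stats.pearsonr(df["BloodPressure"], df["BMI"])
print("Pearson r:", round(r, 3), "p-value:", p_value)
print(df.corr().round(2))  # correlation matrix

# Least-squares line through the points (degree-1 polynomial fit)
slope, intercept = np.polyfit(df["BloodPressure"], df["BMI"], 1)
fig, ax = plt.subplots()
ax.scatter(df["BloodPressure"], df["BMI"], s=10)
xs = np.sort(df["BloodPressure"].to_numpy())
ax.plot(xs, slope * xs + intercept, color="red")  # fitted regression line
ax.set_xlabel("BloodPressure")
ax.set_ylabel("BMI")
# plt.show()  # uncomment to display the figure
```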
Output:
RESULT:
The code generates visualizations and statistical measures for understanding the association
between 'BloodPressure' and 'BMI' in the diabetes dataset. This includes scatter plots,
regression lines, correlation coefficient, and a correlation matrix heatmap.
d. Histograms
AIM:
The aim of the provided Python code is to create a histogram of a given dataset (`x`)
and display its distribution.
ALGORITHM:
PROGRAM:
# Display the plot
plt.show()
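A minimal sketch of such a histogram with 10 bins is shown below. The list x is generated here from random numbers as a hypothetical stand-in, and the labels and title are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data standing in for the list x
rng = np.random.default_rng(3)
x = rng.normal(50, 10, 500).tolist()

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(x, bins=10, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of x")
print(counts)  # number of values in each of the 10 bins
# plt.show()  # uncomment to display the figure
```

The hist call returns the bin counts and the bin edges, which is convenient when the frequencies are needed as numbers as well as a picture.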
RESULT:
The result of the code execution is a histogram that visualizes the distribution of the
data in the list `x`. The histogram is divided into 10 bins, providing insights into the
frequency of values within each interval. The visualization allows for a quick understanding
of the data's central tendency and spread.
e. Three dimensional plotting
AIM:
The aim of the provided Python code is to create a three-dimensional (3D) plot using Matplotlib.
The code includes plotting a three-dimensional line and scattered points in 3D space.
ALGORITHM:
1. Import necessary libraries: `mpl_toolkits.mplot3d`, `numpy`, and `matplotlib.pyplot`.
2. Create a figure and 3D axes using `plt.figure()` and `plt.axes(projection='3d')`.
3. Generate data for a three-dimensional line: `zline`, `xline`, and `yline`.
4. Plot the three-dimensional line using `ax.plot3D`.
5. Generate data for three-dimensional scattered points: `zdata`, `xdata`, and `ydata`.
6. Plot the scattered points using `ax.scatter3D`. The color of the points (`c`) is
determined by the `zdata` values, and a colormap ('Greens') is applied for visual
representation.
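The steps above can be sketched as follows. The variable names follow the algorithm, while the numeric ranges, the number of points and the noise level are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d  # needed on older Matplotlib to enable 3D axes

fig = plt.figure()
ax = plt.axes(projection="3d")

# Data for a three-dimensional line (ranges are assumed for illustration)
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
ax.plot3D(xline, yline, zline, "gray")

# Data for scattered points around the line
rng = np.random.default_rng(0)
zdata = 15 * rng.random(100)
xdata = np.sin(zdata) + 0.1 * rng.standard_normal(100)
ydata = np.cos(zdata) + 0.1 * rng.standard_normal(100)
ax.scatter3D(xdata, ydata, zdata, c=zdata, cmap="Greens")
# plt.show()  # uncomment to display the figure
```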
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
# Show plot
plt.show()
AUGMENTED QUESTIONS :
1. Write a Python program to perform the following tasks using the UCI dataset:
● Normal Curves: Plot a normal distribution curve over a histogram for a numerical
column, such as 'Glucose'. Calculate and display the mean and standard deviation used for
plotting the normal curve.
● Density and Contour Plots: Create a density plot for the 'Glucose' and 'BMI'
columns, and overlay a contour plot to visualize the density regions.
2. Write a Python program to create the following visualizations using the UCI dataset:
VIVA QUESTIONS:
1. How would you plot a normal distribution curve for a numerical column in the UCI dataset?
Which Python libraries and functions can you use for this?
2. Explain how you can create a density plot and a contour plot for two numerical columns in
the UCI dataset. What insights can these plots provide?
3. Describe the process for generating a correlation plot and scatter plot between two features in
the UCI dataset. How do these plots help in understanding the relationship between features?
4. What steps would you take to create a histogram of a numerical feature in the UCI dataset?
What information can be derived from a histogram?
5. How can you create a three-dimensional plot using three numerical features from the UCI
dataset? Which Python functions are used for 3D plotting and what do these plots represent?
RESULT:
The code generates a 3D plot with a three-dimensional line and scattered points. The
line is defined by the functions `np.sin` and `np.cos`, and the scattered points have random
coordinates influenced by sine and cosine functions. The color of the scattered points varies
based on the `zdata` values, creating a visually appealing representation of the data in 3D
space.
Ex. No. 7 VISUALIZING GEOGRAPHIC DATA WITH BASEMAP
AIM:
The goal of the provided code is to visualize geographic data using the Basemap toolkit in Matplotlib.
Specifically, it creates maps with different projections, includes topographic features, and marks the location of
Seattle.
ALGORITHM:
1. Set up a Matplotlib figure and create a Basemap with Lambert conformal conic
projection, specifying parameters like width, height, central latitude, and longitude.
2. Overlay topographic features using the `etopo` method and add a point on the map
corresponding to Seattle.
3. Define a function `draw_map` to draw shaded-relief images and latitude/longitude
lines with specified styles.
4. Generate three different maps with varying projections: cylindrical projection
covering the entire world, repeated cylindrical projection, and Lambert conformal conic
projection focused on a specific region.
5. Display the maps using Matplotlib.
PROGRAM:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
# Add gridlines
ax.gridlines(draw_labels=True)
# Set title
plt.title('World Map with Cartopy')
OUTPUT:
AUGMENTED QUESTIONS :
1. Write a Python program using Basemap to create an interactive map that displays the locations of major
cities around the world. Include functionality to zoom in and out, and add labels to each city. How would
you integrate Basemap with other libraries to enhance interactivity?
2. Develop a Python script to visualize global climate data using Basemap. Create a map that displays
temperature anomalies with a color gradient. Integrate Basemap with data from a CSV file containing
latitude, longitude, and temperature anomaly values. How would you handle large datasets and ensure
efficient rendering of the map?
VIVA QUESTIONS:
1. What is the Basemap toolkit, and how is it used for visualizing geographic data in Python?
2. How would you plot a simple map of a specific region or country using Basemap? What are the
basic steps involved in creating such a map?
3. Explain how to overlay markers or data points on a Basemap. What functions or methods are used
to add these elements to the map?
4. How can you display geographic data such as country borders, rivers, or cities on a map using
Basemap? What are some common map features you can add?
5. Describe how to customize the appearance of a map created with Basemap, such as changing the
map's projection, adding gridlines, or adjusting the map's color scheme.
RESULT:
The code produces geographic visualizations, including a Lambert conformal conic
projection with topographic features, a cylindrical projection covering the entire world, and a
repeated cylindrical projection. The resulting maps showcase the versatility of the Basemap
toolkit for visualizing geographic data in Matplotlib.