Data visualization translates complex data sets into visual formats that are easier for the human
brain to understand. Visual representations such as charts, graphs, and maps make data easier to
understand, analyze, and communicate. Common examples include pie charts, bar charts, scatter
plots, heatmaps, line charts, histograms, area charts, box plots, bubble charts, treemaps, maps,
and network diagrams.
Common Data Visualization Techniques:
● Pie Chart:
A circular chart that shows parts of a whole, with each slice representing a proportion.
● Bar Chart:
Uses rectangular bars to compare data across different categories, where the length of the bars
represents the value.
● Scatter Plot:
Displays data points on a Cartesian coordinate system, showing the relationship between two
variables.
● Heatmap:
Uses color to represent data values, often showing patterns or trends in large datasets.
● Line Chart:
Connects data points with lines to show trends and changes over time.
● Histogram:
A type of bar chart that shows the distribution of a single variable, with bars representing
frequency.
● Area Chart:
Similar to a line chart, but the area under the line is filled, emphasizing the magnitude of the data.
● Box Plot:
Provides a visual summary of the distribution of numerical data, showing the median, quartiles, and
outliers.
● Bubble Chart:
Uses bubbles to represent data, where the size of the bubble represents the value of the data, and
the color can represent another variable.
● Treemap:
Displays hierarchical data using nested rectangles, with the size of each rectangle representing the
value of the data.
● Map:
Visualizes data on a geographical map, showing locations and distributions.
● Network Diagram:
Shows relationships between different entities or nodes, connected by lines.
Key Features for Data Visualization with Pandas:
Pandas offers several features that make it a good choice for data visualization:
● Variety of Plot Types: Pandas supports various plot types, including line plots, bar plots,
histograms, box plots, and scatter plots.
● Customization: Users can customize plots by adding titles, labels, and styling, which improves
the readability of the visualizations.
● Handling of Missing Data: Pandas plots data that contains missing values (NaN) without raising
errors. Depending on the plot type, missing values are left as gaps, dropped, or filled with zeros.
● Integration with Matplotlib: Pandas integrates with Matplotlib, which allows users to create a
wide range of static, animated, and interactive plots.
The Pandas library is primarily used for data manipulation and analysis, but it also provides data
visualization capabilities through Python's Matplotlib library.
Pandas provides a basic method called .plot() for generating a wide variety of visualizations,
along with several specialized plotting methods. These visualization tools are built on top of
Matplotlib, so they offer the same flexibility and customization options.
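As a minimal sketch of the customization options described above, the snippet below draws a line plot from a small DataFrame and then sets a title, axis labels, and a grid on the Matplotlib Axes object that .plot() returns. The column names and values are made up for illustration, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook or desktop session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, used only for illustration
sales = pd.DataFrame(
    {"online": [12, 15, 11, 18], "in_store": [20, 17, 19, 16]},
    index=["Jan", "Feb", "Mar", "Apr"],
)

# DataFrame.plot() returns a Matplotlib Axes, so Matplotlib methods can be applied to it
ax = sales.plot(kind="line", title="Monthly sales", style=["-o", "--s"])
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")
ax.grid(True)

print(ax.get_title(), "|", ax.get_xlabel(), "|", ax.get_ylabel())
```

In an interactive session, calling plt.show() after these lines would display the figure.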
Types of Plots Available in Pandas
Pandas supports various plot types through the kind parameter or specialized plotting methods.
The following table gives an overview of the different plotting methods:
Plot Type        kind Value            Specialized Method      Use Case
Line             'line'                .line()                 Visualizing trends over time or a sequence.
Bar              'bar'                 .bar()                  Comparing quantities across categories.
Horizontal Bar   'barh'                .barh()                 Same as bar charts, but horizontal.
Histogram        'hist'                .hist()                 Visualizing the distribution of numeric data.
Box Plot         'box'                 .box()                  Summarizing data distribution and outliers.
Area             'area'                .area()                 Highlighting trends with cumulative data.
Scatter          'scatter'             .scatter()              Relationship between two variables, for DataFrame only.
Hexbin           'hexbin'              .hexbin()               Visualizing data density in two dimensions, for DataFrame only.
Density          'kde' or 'density'    .kde() or .density()    Smoothing data distributions (Kernel Density Estimation).
Pie              'pie'                 .pie()                  Proportional data in a circular graph.
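The two routes in the table can be compared directly. The sketch below, which uses a small random DataFrame invented for the purpose, draws the same bar plot once through the kind parameter and once through the specialized method on the .plot accessor, then counts the bars each call produced. The Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

ax_kind = df.plot(kind="bar")   # plot type selected with the kind parameter
ax_method = df.plot.bar()       # same plot type through the specialized method

# Each call draws one rectangle per cell of the DataFrame (5 rows x 3 columns)
print(len(ax_kind.patches), len(ax_method.patches))
```

Both calls accept the same keyword arguments, so the choice between them is mostly a matter of style.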
Plotting Bar Plot with plot() method
Let us now see what a Bar Plot is by creating one.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(10,4), columns=['a','b','c','d'])
# Plotting the bar plot
df.plot(kind='bar')
plt.show()
To produce a stacked bar plot, pass stacked=True –
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(10,4), columns=['a','b','c','d'])
# Plotting the stacked bar plot
df.plot(kind='bar', stacked=True)
plt.show()
To get horizontal bar plots, use the barh option –
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(10,4), columns=['a','b','c','d'])
# Plotting the horizontal bar plot (stacked)
df.plot(kind='barh', stacked=True)
plt.show()
Histograms
Histograms can be plotted by passing kind='hist' to the plot() method. The bins argument sets the
number of bins.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame with three columns centered at 1, 0, and -1
df = pd.DataFrame({'a':np.random.randn(1000)+1,'b':np.random.randn(1000),
                   'c':np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
df.plot(kind='hist', bins=20)
plt.show()
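To make the role of the bins argument concrete, the following sketch computes the bin counts for a single column with numpy.histogram, which is the same kind of binning a histogram plot displays as bar heights. The seed and the column name are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
values = pd.Series(rng.standard_normal(1000), name="b")

# Split the range of the data into 20 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=20)

print(len(counts))    # one count per bin
print(len(edges))     # one more edge than there are bins
print(counts.sum())   # every observation falls into exactly one bin
```

A larger bins value gives narrower bars and more detail, at the cost of noisier counts per bin.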
Box Plots
A box plot can be drawn by passing kind='box' to the plot() method of either a Series or a
DataFrame to visualize the distribution of values within each column.
For instance, here is a box plot representing five trials of 10 observations of a uniform random
variable on [0,1).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot(kind='box')
plt.show()
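The quantities a box plot draws can also be computed directly. The sketch below is an illustration, assuming Matplotlib's default rule that points more than 1.5 times the interquartile range away from the box are shown as outliers. It calculates the quartiles, the median, and the number of outliers for each column of a random DataFrame like the one above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((10, 5)), columns=list("ABCDE"))

q1 = df.quantile(0.25)      # lower edge of each box
median = df.median()        # line inside each box
q3 = df.quantile(0.75)      # upper edge of each box
iqr = q3 - q1

# Points beyond these fences are the ones drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = (df.lt(lower_fence) | df.gt(upper_fence)).sum()

summary = pd.DataFrame({"q1": q1, "median": median, "q3": q3, "outliers": outliers})
print(summary)
```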
Area Plot
An area plot can be created using the plot(kind='area') option.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot(kind='area')
plt.show()
Scatter Plot
A scatter plot can be created using the plot(kind='scatter') option.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.plot(kind='scatter', x='a', y='b')
plt.show()
Pie Chart
A pie chart can be created using the plot(kind='pie') option.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Creating a random DataFrame
df = pd.DataFrame(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], columns=['x'])
df.plot(kind='pie', subplots=True)
plt.show()
Data Visualization with Pandas – FAQs
How does Pandas handle missing data in visualizations?
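Pandas does not raise errors when the plotted data contains missing values (NaN). How they are
drawn depends on the plot type: line plots leave gaps, bar, area, and pie plots treat them as
zeros, and histograms, box plots, and scatter plots drop them. You can also fill or interpolate
missing values before plotting to control the result explicitly.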
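As a small sketch of filling or interpolating before plotting, the snippet below takes a Series with two missing readings (made-up values) and prepares two alternative versions of it with fillna() and interpolate().

```python
import numpy as np
import pandas as pd

readings = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], name="reading")

filled_zero = readings.fillna(0)        # treat missing readings as zero
interpolated = readings.interpolate()   # linear interpolation between neighbouring values

print(filled_zero.tolist())
print(interpolated.tolist())
# Either result can then be plotted, for example with interpolated.plot(kind="line")
```

Which option is appropriate depends on what a missing value means in the data: a gap in measurement is often better interpolated, while a true absence may be better shown as zero.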
How do you visualize a pandas DataFrame?
To visualize a Pandas DataFrame, use the DataFrame.plot() method. This leverages Matplotlib to
create various plot types like line, bar, histogram, and scatter plots. Customize plots with
titles, labels, and styles for clarity.
Why is Pandas a popular choice for data visualization in Python?
Pandas is popular for data visualization in Python because it provides a simple interface for creating
plots directly from DataFrames. It also integrates well with other libraries like Matplotlib and
Seaborn, making it versatile for both basic and complex visualizations.
Can I create interactive plots with Pandas?
While Pandas itself is primarily used for static plots, it can be combined with libraries like Plotly or
Bokeh to create interactive visualizations. These libraries offer additional features for interactivity
and dynamic data exploration.
How to Plot a Time Series in Matplotlib?
Time series data is data in which each observation is associated with a point in time. Each point
on the graph represents a measurement of both time and quantity. A time-series chart is also known
as a fever chart when the data points are connected in chronological order by straight lines that
form a succession of peaks and troughs. The x-axis of the chart represents time intervals, and the
y-axis shows the values of the parameter being monitored.
We will use the syntax shown below to draw a time series graph:
Syntax:
plt.plot(dataframe.X, dataframe.Y)
where
● X is the column of the given dataframe that holds datetime.datetime values
● Y is the column that holds the values corresponding to each date
We can also rotate the x-axis tick labels by using the xticks() function:
Syntax:
plt.xticks(rotation=angle, ha=alignment)
where
● rotation is the angle, in degrees, by which to rotate the tick labels
● ha is the horizontal alignment of the labels: 'left', 'center', or 'right'
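When dates arrive as text, they need to be converted before they can serve as the X variable. The sketch below assumes a small made-up dataframe with columns named X and Y, converts X with pd.to_datetime, and applies the xticks() rotation described above. The Agg backend is used only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with the dates stored as strings
dataframe = pd.DataFrame({
    "X": ["2021-11-01", "2021-11-02", "2021-11-03"],
    "Y": [5, 6, 8],
})
dataframe["X"] = pd.to_datetime(dataframe["X"])  # strings become datetime values

fig, ax = plt.subplots()
ax.plot(dataframe.X, dataframe.Y)

# xticks() returns the tick locations and the label objects it just updated
locs, labels = plt.xticks(rotation=30, ha="right")

print(dataframe["X"].dtype)
print(labels[0].get_rotation(), labels[0].get_horizontalalignment())
```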
Approach
● We need two axes for our graph, the X-axis and the Y-axis. We start with a dataframe to plot.
● We can either build our own dataframe or use a publicly available dataset. The X-axis should
hold a datetime variable. The Y-axis holds the variable that we want to analyze with respect to
time.
● The plt.plot() function is used to plot the graph in matplotlib.
● To make the graph meaningful, we can add labels and a title with plt.xlabel(), plt.ylabel(),
and plt.title().
Example 1:
Let us say we have a dataframe of the days of the week and the number of classes on each day of the
upcoming week. We take 7 days, from 1-11-2021 to 7-11-2021.
# import modules
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# create dataframe
dataframe = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i+1)
                                                     for i in range(7)]),
                          'classes': [5, 6, 8, 2, 3, 7, 4]})
# Plotting the time series of given dataframe
plt.plot(dataframe.date_of_week, dataframe.classes)
# Giving title to the chart using plt.title
plt.title('Classes by Date')
# rotating the x-axis tick labels by 30 degrees
# and aligning them to the right
plt.xticks(rotation=30, ha='right')
# Providing x and y label to the chart
plt.xlabel('Date')
plt.ylabel('Classes')
# Displaying the chart
plt.show()
We can also draw a time series as a scatter plot with matplotlib.
Example 2: Scatter time series plot of the given dataframe
# import modules
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# Let us say we have a dataframe of the days of
# the week and number of classes in each day of the upcoming week.
# Taking 7 days from 1-11-2021 to 7-11-2021
dataframe = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i+1)
                                                     for i in range(7)]),
                          'classes': [5, 6, 8, 2, 3, 7, 4]})
# To draw scatter time series plot of the given dataframe
# (plt.scatter is used here because plt.plot_date is deprecated in recent Matplotlib releases)
plt.scatter(dataframe.date_of_week, dataframe.classes)
# rotating the x-axis tick labels by 30 degrees and aligning them to the right
plt.xticks(rotation=30, ha='right')
# Giving title to the chart using plt.title
plt.title('Classes by Date')
# Providing x and y label to the chart
plt.xlabel('Date')
plt.ylabel('Classes')
# Displaying the chart
plt.show()
Similarly, we can plot the time series of two dataframes and compare them. Let us say we have two
colleges, 'XYZ' and 'ABC'. We can compare them by drawing both time series on the same matplotlib
graph.
Example 3:
# Initialising required libraries
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# ABC college classes by date - from 01-11-2021 to 07-11-2021
abc = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i+1)
                                               for i in range(7)]),
                    'classes': [5, 6, 8, 2, 3, 7, 4]})
# XYZ college classes by date - from 01-11-2021 to 07-11-2021
xyz = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i+1)
                                               for i in range(7)]),
                    'classes': [2, 3, 7, 3, 4, 1, 2]})
# plotting the time series of ABC college dataframe
plt.plot(abc.date_of_week, abc.classes, label='ABC')
# plotting the time series of XYZ college dataframe
plt.plot(xyz.date_of_week, xyz.classes, color='green', label='XYZ')
# Adding a legend so the two colleges can be told apart
plt.legend()
# Giving title to the graph
plt.title('Classes by Date')
# rotating the x-axis tick labels by 30 degrees
# and aligning them to the right
plt.xticks(rotation=30, ha='right')
# Giving x and y label to the graph
plt.xlabel('Date')
plt.ylabel('Classes')
# Displaying the graph
plt.show()
Similarly, we can plot a time series from a dataset stored in a CSV file that has Date and Close
columns.
Example 4:
# Initialising required libraries
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# Loading the dataset (replace this path with the location of your own CSV file);
# parse_dates converts the Date column from text to datetime values
data = pd.read_csv("C:/Users/aparn/Desktop/[Link]", parse_dates=['Date'])
# X axis is price_date
price_date = data['Date']
# Y axis is price closing
price_close = data['Close']
# Plotting the timeseries graph of given dataset
plt.plot(price_date, price_close)
# Giving title to the graph
plt.title('Prices by Date')
# rotating the x-axis tick labels by 30 degrees
# and aligning them to the right
plt.xticks(rotation=30, ha='right')
# Giving x and y label to the graph
plt.xlabel('Price Date')
plt.ylabel('Price Close')
# Displaying the graph
plt.show()
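Because the dataset file itself is not included here, the following self-contained sketch reads a few made-up rows from an in-memory string instead of a file. The Date and Close column names follow the example above, while the prices are invented for illustration. It also shows Figure.autofmt_xdate as an alternative way to rotate date labels, and uses the Agg backend so that it runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for a real price file
csv_text = """Date,Close
2021-11-01,101.5
2021-11-02,103.2
2021-11-03,102.8
2021-11-04,104.1
"""

# parse_dates turns the Date column into datetime values while reading
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

fig, ax = plt.subplots()
ax.plot(data["Date"], data["Close"])
ax.set_title("Prices by Date")
ax.set_xlabel("Price Date")
ax.set_ylabel("Price Close")
fig.autofmt_xdate(rotation=30, ha="right")  # rotate and right-align the date labels

print(data.dtypes)
```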
MATPLOTLIB SUBPLOT
What are subplots in Matplotlib?
pyplot.subplots creates a figure and a grid of subplots with a single call, while providing
reasonable control over how the individual plots are created. For more advanced use cases
you can use GridSpec for a more general subplot layout or Figure.add_subplot for adding
subplots at arbitrary locations within the figure.
What is a figure in Matplotlib?
The figure is the basic foundation for plotting data in matplotlib. When we plot data on a graph,
that graph is drawn on a figure. By default, a figure is generated automatically during plotting.
We can create multiple figures, change a figure's properties, and add subplots to a figure.
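A minimal sketch of the single-call approach is shown below: plt.subplots returns the figure together with an array of Axes objects, one per cell of the grid, and each Axes is then drawn on separately. The data values and figure size are arbitrary, and the Agg backend is used only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# One figure holding a grid of 1 row and 2 columns
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
axes[0].plot(x, np.array([3, 8, 1, 10]))
axes[1].plot(x, np.array([10, 20, 30, 40]))

print(type(fig).__name__, axes.shape)
```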
Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure:
Draw 2 plots:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.show()
The subplot() Function
The subplot() function takes three arguments that describe the layout of the figure.
The layout is organized in rows and columns, which are represented by
the first and second arguments.
The third argument represents the index of the current plot.
plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.
plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be displayed on
top of each other instead of side-by-side), we can write the syntax like this:
Example
Draw 2 plots on top of each other:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(2, 1, 1)
plt.plot(x,y)
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(2, 1, 2)
plt.plot(x,y)
plt.show()
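The same three arguments extend to larger grids. As an illustrative sketch, the loop below fills a grid of 2 rows and 2 columns and records which row and column each index lands in, showing that the index counts from 1 and runs left to right, then top to bottom. The plotted values are arbitrary, and the Agg backend is used only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

nrows, ncols = 2, 2
x = np.array([0, 1, 2, 3])

fig = plt.figure()
positions = {}
for row in range(nrows):
    for col in range(ncols):
        index = row * ncols + col + 1   # subplot indices start at 1, not 0
        ax = plt.subplot(nrows, ncols, index)
        ax.plot(x, x * index)
        ax.set_title(f"index {index}")
        positions[index] = (row, col)

print(positions)
```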
TITLE
You can add a title to each plot with the title() function:
Example
2 plots, with titles:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.show()
SUPER TITLE
You can add a title to the entire figure with the suptitle() function:
Example
Add a title for the entire figure:
import matplotlib.pyplot as plt
import numpy as np
#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")
#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])
plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")
plt.suptitle("MY SHOP")
plt.show()
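The same figure can also be built with the object-based interface, where each title is set on the object it belongs to. The sketch below is one possible arrangement, not the only one: set_title() labels each subplot, suptitle() labels the whole figure, and tight_layout() adjusts the spacing so the titles do not collide. The Agg backend is used only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
fig, (ax_sales, ax_income) = plt.subplots(1, 2)

ax_sales.plot(x, np.array([3, 8, 1, 10]))
ax_sales.set_title("SALES")       # title of the first subplot

ax_income.plot(x, np.array([10, 20, 30, 40]))
ax_income.set_title("INCOME")     # title of the second subplot

figure_title = fig.suptitle("MY SHOP")   # title of the entire figure
fig.tight_layout()                       # re-space subplots so titles and labels fit

print(figure_title.get_text(), ax_sales.get_title(), ax_income.get_title())
```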