
Essential Data Visualization Techniques

Data visualization simplifies complex data sets into visual formats like charts and graphs, enhancing understanding and communication. Common techniques include pie charts, bar charts, scatter plots, and more, with Pandas providing various plotting methods and customization options for effective data representation. The document also covers how to create different types of plots using Pandas and Matplotlib, including time series and subplots.

Uploaded by 44workinglife
Copyright © All Rights Reserved

Data visualization translates complex data sets into visual formats that are easier for the human brain to understand. This can include a variety of visual tools.

Data visualization techniques use visual representations such as charts and graphs to make data easier
to understand, analyze, and communicate. Common techniques include pie charts, bar charts, scatter
plots, heatmaps, line charts, histograms, area charts, box plots, bubble charts, treemaps, maps, and
network diagrams.

Common Data Visualization Techniques:

● Pie Chart:

A circular chart that shows parts of a whole, with each slice representing a proportion.

● Bar Chart:

Uses rectangular bars to compare data across different categories, where the length of the bars
represents the value.

● Scatter Plot:

Displays data points on a Cartesian coordinate system, showing the relationship between two
variables.

● Heatmap:

Uses color to represent data values, often showing patterns or trends in large datasets.

● Line Chart:

Connects data points with lines to show trends and changes over time.

● Histogram:

A bar-style chart that shows the distribution of a single numeric variable: the range of the data is
divided into intervals (bins), and each bar's height is the number of values that fall into that bin.

● Area Chart:

Similar to a line chart, but the area under the line is filled, emphasizing the magnitude of the data.

● Box Plot:

Provides a visual summary of the distribution of numerical data, showing the median, quartiles, and
outliers.

● Bubble Chart:

A variation of the scatter plot in which each data point is drawn as a bubble: the x and y positions
encode two variables, the size of the bubble encodes a third, and the color can represent a fourth.

● Treemap:

Displays hierarchical data using nested rectangles, with the size of each rectangle representing the
value of the data.

● Map:

Visualizes data on a geographical map, showing locations and distributions.

● Network Diagram:

Shows relationships between different entities or nodes, connected by lines.

Key Features for Data Visualization with Pandas:

Pandas offers several features that make it a great choice for data visualization:

● Variety of Plot Types: Pandas supports various plot types including line plots, bar plots,
histograms, box plots, and scatter plots.

● Customization: Users can customize plots by adding titles, labels, and styling, enhancing the
readability of the visualizations.

● Handling of Missing Data: Depending on the plot type, Pandas drops missing values, leaves gaps
for them (as in line plots), or fills them with 0 (as in bar plots), so datasets with missing
values can be plotted without errors.

● Integration with Matplotlib: Pandas plots are built on Matplotlib, so users can refine them with
Matplotlib's wide range of tools for static, animated, and interactive plots.

The Pandas library is primarily used for data manipulation and analysis, but it also provides data
visualization capabilities built on Python's Matplotlib library.

In Python, the Pandas library provides a general-purpose .plot() method for generating a wide variety
of visualizations, along with specialized plotting methods for individual plot types. These
visualization tools are built on top of Matplotlib, offering flexibility and customization options.
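As a small sketch of the two calling styles (the DataFrame below is made-up illustration data), df.plot(kind='line') and df.plot.line() draw the same chart. Both return a Matplotlib Axes object, which is what makes the customization described above possible: titles and axis labels can be set on it with ordinary Matplotlib methods.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up illustration data: a = 0..4 and b = a squared
df = pd.DataFrame({"a": np.arange(5), "b": np.arange(5) ** 2})

# Generic form: the plot type is chosen with the kind parameter
ax1 = df.plot(kind="line", title="Generic .plot(kind='line')")

# Specialized form: the same chart through the .plot accessor
ax2 = df.plot.line(title="Accessor .plot.line()")

# Both calls return a Matplotlib Axes, so Matplotlib methods can refine the result
ax2.set_xlabel("Row index")
ax2.set_ylabel("Value")

plt.show()
```

Each call without an ax argument creates its own figure, so this produces two separate charts.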

Types of Plots Available in Pandas

Pandas supports various plot types through the kind parameter of .plot() or through specialized
methods on the .plot accessor (for example, df.plot.bar()). The following table gives an overview of
the different plotting methods:

Plot Type       kind Value           Specialized Method                Use Case
Line            'line'               .plot.line()                      Visualizing trends over time or a sequence.
Bar             'bar'                .plot.bar()                       Comparing quantities across categories.
Horizontal Bar  'barh'               .plot.barh()                      Same as bar charts, but horizontal.
Histogram       'hist'               .plot.hist()                      Visualizing the distribution of numeric data.
Box Plot        'box'                .plot.box()                       Summarizing data distribution and outliers.
Area            'area'               .plot.area()                      Highlighting trends with cumulative data.
Scatter         'scatter'            .plot.scatter()                   Relationship between two variables (DataFrame only).
Hexbin          'hexbin'             .plot.hexbin()                    Visualizing data density in two dimensions (DataFrame only).
Density         'kde' or 'density'   .plot.kde() or .plot.density()    Smoothing data distributions (Kernel Density Estimation).
Pie             'pie'                .plot.pie()                       Proportional data in a circular graph.
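Hexbin appears in the table but is not demonstrated later, so here is a minimal sketch with seeded random, illustrative data. Each hexagonal cell is colored by how many points fall inside it, which keeps dense regions readable where a scatter plot would overplot. The gridsize argument (passed through to Matplotlib's hexbin) sets the number of hexagons in the x-direction; 15 is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded random data: 1,000 points where b is correlated with a
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
df = pd.DataFrame({"a": a, "b": a + rng.normal(scale=0.5, size=1000)})

# Hexbin is a DataFrame-only plot: x and y name the columns to bin
ax = df.plot.hexbin(x="a", y="b", gridsize=15)

plt.show()
```

Pandas adds a colorbar by default, so the shade of each cell can be read as a point count.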

Plotting a Bar Plot with the plot() Method

Let us now see what a bar plot looks like by creating one.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of random values (10 rows, 4 columns)
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

# Plotting the bar plot
df.plot(kind='bar')
plt.show()

Output:
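Bar charts usually compare named categories rather than row numbers. A minimal sketch with a hypothetical Series of fruit sales: the index supplies the category labels on the x-axis, each bar's height equals the corresponding value, and rot=0 keeps the labels horizontal.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales figures; the index supplies the category labels
sales = pd.Series([12, 30, 18], index=["apples", "bananas", "cherries"], name="units sold")

fig, ax = plt.subplots()
sales.plot(kind="bar", ax=ax, rot=0, title="Units sold per fruit")
ax.set_ylabel("Units")

plt.show()
```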

To produce a stacked bar plot, pass stacked=True:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of random values
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

# Plotting the stacked bar plot
df.plot(kind='bar', stacked=True)
plt.show()

Output:

To get horizontal bar plots, use the barh option (here stacked=True also stacks the bars):

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of random values
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

# Plotting a stacked horizontal bar plot
df.plot(kind='barh', stacked=True)
plt.show()

Output:

Histograms

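Histograms can be plotted by passing kind='hist' to the plot() method. The bins argument sets the
number of bins.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of 1,000 normally distributed values per column,
# shifted so the three distributions are centered at 1, 0, and -1
df = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
                   'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

# Plotting all three columns as histograms with 20 bins
df.plot(kind='hist', bins=20)
plt.show()
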
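Because the three histograms above are drawn on the same axes, later columns can hide earlier ones. One common remedy, sketched below with seeded random data of the same shape, is alpha=0.5 for semi-transparent bars. The right-hand panel plots a single column to make the meaning of the bars explicit: each bar's height is the number of values in that bin, so the heights add up to the number of observations (1,000).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Seeded random data: three normal distributions centered at 1, 0 and -1
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "a": rng.normal(loc=1.0, size=1000),
    "b": rng.normal(loc=0.0, size=1000),
    "c": rng.normal(loc=-1.0, size=1000),
})

fig, (ax_all, ax_single) = plt.subplots(1, 2, figsize=(10, 4))

# Left: alpha=0.5 makes overlapping bars semi-transparent
df.plot(kind="hist", bins=20, alpha=0.5, ax=ax_all, title="All columns")

# Right: one column; each bar's height is the count of values in that bin
df["a"].plot(kind="hist", bins=20, ax=ax_single, title="Column 'a' only")

plt.show()
```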
Box Plots

A box plot can be drawn by passing kind='box', for both Series and DataFrame objects, to visualize the
distribution of values within each column.

For instance, here is a box plot representing five trials of 10 observations of a uniform random
variable on [0,1).

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of uniform random values (10 observations x 5 trials)
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])

df.plot(kind='box')
plt.show()

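To see what a box plot actually summarizes, the statistics can be computed directly. A sketch with a small hypothetical column containing one obvious outlier: the box spans the first quartile (Q1) to the third quartile (Q3) with a line at the median, and with Matplotlib's default settings the whiskers extend at most 1.5 × IQR (IQR = Q3 - Q1) beyond the box; points outside that range are drawn individually.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column of ten values with one obvious outlier (30)
df = pd.DataFrame({"A": [1, 2, 2, 3, 3, 4, 4, 5, 6, 30]})

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = df["A"].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Default whiskers reach at most 1.5 * IQR beyond the box;
# values outside these fences are drawn as individual outlier points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = df["A"][(df["A"] < lower_fence) | (df["A"] > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())

df.plot(kind="box")
plt.show()
```

With these values Q1 = 2.25, the median is 3.5, Q3 = 4.75 and IQR = 2.5, so the fences are -1.5 and 8.5 and 30 is the only point drawn as an outlier.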
Area Plot

An area plot can be created with plot(kind='area'). By default, the areas of the columns are stacked
on top of each other.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of random values
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

df.plot(kind='area')
plt.show()

Its output is as follows:
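Stacking (the default for area plots) shows how columns add up to a total, but Pandas only allows it when each column is entirely non-negative or entirely non-positive; otherwise it raises a ValueError. A sketch with hypothetical profit and cost columns: the code first lists the columns that mix signs, then draws the areas overlapping with stacked=False, using alpha so hidden regions stay visible.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical figures: profit has both gains and losses
df = pd.DataFrame({"profit": [3, -1, 2, -2, 4], "cost": [1, 2, 1, 3, 2]})

# Columns that mix positive and negative values cannot be stacked
mixed_sign = [col for col in df.columns if (df[col] > 0).any() and (df[col] < 0).any()]
print("Columns with mixed signs:", mixed_sign)

# stacked=False overlays the areas; alpha keeps overlapping regions visible
ax = df.plot(kind="area", stacked=False, alpha=0.4)

plt.show()
```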


Scatter Plot

A scatter plot can be created with plot(kind='scatter'); the x and y parameters name the columns to
plot against each other.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a DataFrame of random values (50 rows)
df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])

df.plot(kind='scatter', x='a', y='b')
plt.show()

Its output is as follows:
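The bubble chart described at the start of this document can be built from the same scatter call. A minimal sketch with seeded random data: s sets each marker's area (here column 'c' scaled by an arbitrary factor of 300 for visibility), c='d' colors each point by column 'd', and colormap picks the color scale. Pandas adds a colorbar automatically when c names a column.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((50, 4)), columns=["a", "b", "c", "d"])

# Position: a vs b; bubble area: c (scaled); color: d
ax = df.plot(
    kind="scatter",
    x="a",
    y="b",
    s=(df["c"] * 300).to_numpy(),
    c="d",
    colormap="viridis",
    alpha=0.6,
)

plt.show()
```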


Pie Chart

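A pie chart can be created with plot(kind='pie'). For a DataFrame, either name the column to plot with
y or pass subplots=True to draw one pie per column.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Creating a single-column DataFrame of random values
df = pd.DataFrame(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], columns=['x'])

df.plot(kind='pie', subplots=True)
plt.show()

Its output is as follows: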
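Each slice's angle is proportional to its share of the total: a value v occupies v / total × 360 degrees. A sketch with a hypothetical monthly budget: the shares are computed explicitly, and autopct (a Matplotlib pie option that Pandas passes through) prints each slice's percentage on the chart.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly budget (any currency)
budget = pd.Series([500, 300, 150, 50], index=["rent", "food", "transport", "other"], name="budget")

# Each slice's share of the total, in percent
shares = budget * 100 / budget.sum()
print(shares)

fig, ax = plt.subplots()
budget.plot(kind="pie", ax=ax, autopct="%1.1f%%")
ax.set_ylabel("")  # clear the y-axis label so only the slice labels remain
plt.show()
```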

Data Visualization with Pandas – FAQs

How does Pandas handle missing data in visualizations?

Depending on the plot type, Pandas drops missing values, leaves gaps where they occur (line plots), or
fills them with 0 (bar, area, and pie plots). You can also handle them yourself before plotting, for
example with fillna() or interpolate(), so that the visualization represents the dataset as intended.
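A minimal sketch of these options, using a hypothetical daily series with one missing value: plotted as-is, the line has a gap at the missing point, while interpolate() fills it linearly from its neighbours and fillna(0) replaces it with a constant.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily readings with one missing value
s = pd.Series([1.0, np.nan, 3.0, 4.0], index=pd.date_range("2021-11-01", periods=4, freq="D"))

# Option 1: fill the gap by linear interpolation between neighbours
interpolated = s.interpolate()

# Option 2: replace missing values with a constant
filled = s.fillna(0)

fig, ax = plt.subplots()
s.plot(ax=ax, marker="o", label="raw (gap at NaN)")
interpolated.plot(ax=ax, style="--", label="interpolated")
ax.legend()
plt.show()
```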

How do you visualize a pandas DataFrame?


To visualize a Pandas DataFrame, use the DataFrame.plot() method. This leverages Matplotlib to create
various plot types like line, bar, histogram, and scatter plots. Customize plots with titles, labels, and
styles for clarity.

Why is Pandas a popular choice for data visualization in Python?

Pandas is popular for data visualization in Python because it provides a simple interface for creating
plots directly from DataFrames. It also integrates well with other libraries like Matplotlib and
Seaborn, making it versatile for both basic and complex visualizations.

Can I create interactive plots with Pandas?

While Pandas itself is primarily used for static plots, it can be combined with libraries like Plotly or
Bokeh to create interactive visualizations. These libraries offer additional features for interactivity
and dynamic data exploration.
How to Plot a Time Series in Matplotlib?

Time series data is data recorded at successive points in time. Each point on the graph represents a
measurement of both time and quantity. A time-series chart is also known as a fever chart when the
data points are connected in chronological order by straight lines that form a succession of peaks and
troughs. The x-axis of the chart represents time intervals, and the y-axis shows the values of the
parameter being monitored.

We will use the syntax below to draw a time series graph:

Syntax:

plt.plot(dataframe.X, dataframe.Y)

where

● X is a column of datetime.datetime values in the given dataframe.

● Y is the column of values corresponding to each date.

We can also rotate the x-axis tick labels by using the xticks() function.

Syntax:

plt.xticks(rotation=..., ha=...)

where

● rotation is the angle, in degrees, by which to rotate the labels.

● ha sets the horizontal alignment of the labels: 'left', 'center', or 'right'.

Approach

● We need two axes for our graph, i.e. the X and Y axes. We will start with a dataframe to plot.

● We can either create our own dataframe or use a publicly available dataset. The X-axis should
hold a datetime variable, and the Y-axis should hold the variable we want to analyze over time.

● The plt.plot() method is used to plot the graph in Matplotlib.

● To add labels and a title that make the graph meaningful, we can use methods such as
plt.title(), plt.xlabel(), and plt.ylabel().

Example 1:

Suppose we have a dataframe of the days of the week and the number of classes on each day of the
upcoming week, covering the 7 days from 1 November 2021 to 7 November 2021.


# import modules
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np

# create dataframe: one row per day from 1 to 7 November 2021
dataframe = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i + 1)
                                                    for i in range(7)]),
                          'classes': [5, 6, 8, 2, 3, 7, 4]})

# Plotting the time series of given dataframe
plt.plot(dataframe.date_of_week, dataframe.classes)

# Giving title to the chart using plt.title
plt.title('Classes by Date')

# rotating the x-axis tick labels 30 degrees, aligned to the right
plt.xticks(rotation=30, ha='right')

# Providing x and y label to the chart
plt.xlabel('Date')
plt.ylabel('Classes')

plt.show()

Output:
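The same chart can also be drawn with Pandas' own plot() method. A sketch that reuses the data from Example 1: making date_of_week the index lets Pandas put the dates on the x-axis, while the title, rot, xlabel and ylabel arguments replace the separate plt calls.

```python
import datetime

import matplotlib.pyplot as plt
import pandas as pd

# Same data as Example 1: classes per day, 1 to 7 November 2021
dataframe = pd.DataFrame({
    "date_of_week": [datetime.datetime(2021, 11, i + 1) for i in range(7)],
    "classes": [5, 6, 8, 2, 3, 7, 4],
})

# Using the dates as the index puts them on the x-axis
fig, ax = plt.subplots()
dataframe.set_index("date_of_week")["classes"].plot(
    ax=ax, title="Classes by Date", rot=30, xlabel="Date", ylabel="Classes"
)

plt.show()
```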
We can also draw a time series as a scatter plot with Matplotlib.

Example 2: Scatter time series plot of the given dataframe


# import modules
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np

# Suppose we have a dataframe of the days of the week and the
# number of classes on each day of the upcoming week,
# covering the 7 days from 1 to 7 November 2021
dataframe = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i + 1)
                                                    for i in range(7)]),
                          'classes': [5, 6, 8, 2, 3, 7, 4]})

# To draw a scatter time series plot of the given dataframe
# (plt.scatter is used instead of plt.plot_date, which recent
# Matplotlib releases deprecate)
plt.scatter(dataframe.date_of_week, dataframe.classes)

# rotating the x-axis tick labels 30 degrees, aligned to the right
plt.xticks(rotation=30, ha='right')

# Giving title to the chart using plt.title
plt.title('Classes by Date')

# Providing x and y label to the chart
plt.xlabel('Date')
plt.ylabel('Classes')

plt.show()

Output:
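When the dates are evenly spaced, the tick positions and labels can be set explicitly with Matplotlib's matplotlib.dates module. A sketch with the data from Example 2: DayLocator places one tick per day, and DateFormatter('%d %b') labels each tick with the day and abbreviated month; that format string is an illustrative choice.

```python
import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Same data as Example 2
dataframe = pd.DataFrame({
    "date_of_week": [datetime.datetime(2021, 11, i + 1) for i in range(7)],
    "classes": [5, 6, 8, 2, 3, 7, 4],
})

fig, ax = plt.subplots()
ax.scatter(dataframe.date_of_week, dataframe.classes)

# One tick per day, labelled with day and abbreviated month
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))

ax.set_title("Classes by Date")
ax.set_xlabel("Date")
ax.set_ylabel("Classes")
plt.show()
```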
Similarly, we can plot the time series of two dataframes and compare them. Suppose we have two
colleges, 'XYZ' and 'ABC', and we want to compare them with a Matplotlib time-series graph.

Example 3:


# Initialising required libraries
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np

# ABC college classes by date, from 01-11-2021 to 07-11-2021
abc = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i + 1)
                                              for i in range(7)]),
                    'classes': [5, 6, 8, 2, 3, 7, 4]})

# XYZ college classes by date, from 01-11-2021 to 07-11-2021
xyz = pd.DataFrame({'date_of_week': np.array([datetime.datetime(2021, 11, i + 1)
                                              for i in range(7)]),
                    'classes': [2, 3, 7, 3, 4, 1, 2]})

# plotting the time series of both colleges; label identifies each line in the legend
plt.plot(abc.date_of_week, abc.classes, label='ABC')
plt.plot(xyz.date_of_week, xyz.classes, color='green', label='XYZ')

# Giving title to the graph
plt.title('Classes by Date')

# rotating the x-axis tick labels 30 degrees, aligned to the right
plt.xticks(rotation=30, ha='right')

# Giving x and y label to the graph
plt.xlabel('Date')
plt.ylabel('Classes')

plt.legend()
plt.show()

Output:
Similarly, we can plot a time series from a dataset stored in a CSV file. Here is the link to the dataset

Example 4:


# Initialising required libraries
import pandas as pd
import matplotlib.pyplot as plt

# Loading the dataset
data = pd.read_csv("C:/Users/aparn/Desktop/[Link]")

# X axis is the price date, converted to datetime so Matplotlib spaces the dates correctly
price_date = pd.to_datetime(data['Date'])

# Y axis is the closing price
price_close = data['Close']

# Plotting the time series graph of the given dataset
plt.plot(price_date, price_close)

# Giving title to the graph
plt.title('Prices by Date')

# rotating the x-axis tick labels 30 degrees, aligned to the right
plt.xticks(rotation=30, ha='right')

# Giving x and y label to the graph
plt.xlabel('Price Date')
plt.ylabel('Price Close')

plt.show()

Output:
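If the dataset is not at hand, the same steps can be tried on a small inline CSV. A sketch with hypothetical prices (only the Date and Close column names come from Example 4): parse_dates converts the Date column to datetimes while reading, and index_col makes it the index, so Pandas can plot the Close column directly with dates on the x-axis.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the same shape as the Example 4 dataset
csv_text = """Date,Close
2021-11-01,101.5
2021-11-02,103.0
2021-11-03,102.2
2021-11-04,104.8
2021-11-05,106.1
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

fig, ax = plt.subplots()
data["Close"].plot(ax=ax, title="Prices by Date", rot=30)
ax.set_xlabel("Price Date")
ax.set_ylabel("Price Close")
plt.show()
```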
MATPLOTLIB SUBPLOT
What are subplots in Matplotlib?

matplotlib.pyplot.subplots() creates a figure and a grid of subplots with a single call, while providing
reasonable control over how the individual plots are created. For more advanced use cases
you can use GridSpec for a more general subplot layout or Figure.add_subplot for adding
subplots at arbitrary locations within the figure.
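A minimal sketch of the single-call approach, reusing the small arrays from the examples in this section: plt.subplots(1, 2) returns the figure together with a NumPy array of two Axes objects, and each panel is then drawn through its own Axes methods (plot, set_title) rather than through the current-plot state that plt.subplot() relies on.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# One call creates the figure and a 1 x 2 grid of Axes
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.array([3, 8, 1, 10]))
axes[0].set_title("SALES")

axes[1].plot(x, np.array([10, 20, 30, 40]))
axes[1].set_title("INCOME")

plt.show()
```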

What is a figure in Matplotlib?

The figure is the foundation for plotting data in Matplotlib: when we plot data on a graph, that
graph is drawn on a figure. By default, a figure is created automatically when plotting. We can
create multiple figures, change a figure's properties, and add subplots to a figure.
Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure:
Draw 2 plots:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)

plt.show()
Result:
The subplot() Function

The subplot() function takes three arguments that describe the layout of the figure.

The layout is organized in rows and columns, which are represented by the first and second arguments.

The third argument represents the index of the current plot.

plt.subplot(1, 2, 1)
#the figure has 1 row, 2 columns, and this plot is the first plot.

plt.subplot(1, 2, 2)
#the figure has 1 row, 2 columns, and this plot is the second plot.
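For grids with more than one row, the index starts at 1 in the top-left corner and counts left to right along each row before moving to the next row. A sketch with a 2 × 2 grid that titles each panel with its index; the three-digit shorthand plt.subplot(221) is equivalent to plt.subplot(2, 2, 1) as long as each number is a single digit.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# Create the four panels of a 2 x 2 grid and title each with its index
for index in range(1, 5):
    ax = plt.subplot(2, 2, index)
    ax.set_title(f"index {index}")

plt.show()
```

Index 2 lands in the top-right corner and index 3 in the bottom-left corner.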

So, if we want a figure with 2 rows and 1 column (meaning that the two plots will be displayed on top
of each other instead of side-by-side), we can write the syntax like this:

Example

Draw 2 plots on top of each other:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(2, 1, 1)
plt.plot(x,y)

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(2, 1, 2)
plt.plot(x,y)

plt.show()

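Vertically stacked plots often share their x-values. A sketch of the single-call alternative with the same data: plt.subplots(2, 1, sharex=True) links the x-axes of both panels, so setting the x-limits (or zooming) on one also changes the other, and only the bottom panel keeps its x tick labels.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

# Two rows, one column, with a common x-axis
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, sharex=True)

ax_top.plot(x, np.array([3, 8, 1, 10]))
ax_bottom.plot(x, np.array([10, 20, 30, 40]))

plt.show()
```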
TITLE
You can add a title to each plot with the title() function:

Example

2 plots, with titles:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.show()

Result:
SUPER TITLE
You can add a title to the entire figure with the suptitle() function:

Example

Add a title for the entire figure:

import matplotlib.pyplot as plt
import numpy as np

#plot 1:
x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])

plt.subplot(1, 2, 1)
plt.plot(x,y)
plt.title("SALES")

#plot 2:
x = np.array([0, 1, 2, 3])
y = np.array([10, 20, 30, 40])

plt.subplot(1, 2, 2)
plt.plot(x,y)
plt.title("INCOME")

plt.suptitle("MY SHOP")
plt.show()

Result:
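With titles on each subplot plus a figure title, text can overlap, especially once axis labels are added. A sketch that repeats the example above with a hypothetical 'Quarter' x-axis label and then calls plt.tight_layout(), which recomputes the spacing between subplots so titles and labels do not collide.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])

plt.subplot(1, 2, 1)
plt.plot(x, np.array([3, 8, 1, 10]))
plt.title("SALES")
plt.xlabel("Quarter")  # hypothetical axis label

plt.subplot(1, 2, 2)
plt.plot(x, np.array([10, 20, 30, 40]))
plt.title("INCOME")
plt.xlabel("Quarter")  # hypothetical axis label

plt.suptitle("MY SHOP")

# Keep a handle on the figure, then recompute subplot spacing
fig = plt.gcf()
plt.tight_layout()
plt.show()
```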
