Data Analysis and Visualization
Introduction to Data Visualization
Prepared by: Tran Luong Quoc Dai, Ph.D.
Outline
• Introduction to Visualization
• The Big Picture
• What is Data Visualization
• The Main Goals of Data Visualization
• History of Data Visualization
• Data Abstraction
• Essential Chart Types
The Big Picture
• “Computer-based visualization systems provide visual representations of datasets designed
to help people carry out tasks more effectively.”
• Visualization enhances human capabilities rather than replacing them with computational
decision-making.
• The vast design space involves both the creation and interaction with visual
representations.
• Designing visualizations requires trade-offs, and many possible designs are ineffective for
specific tasks, making validation essential yet challenging.
• Designers must consider resource limitations related to computers, humans, and display
constraints.
• Visualization usage can be analyzed based on why it is needed, what data is shown, and
how it is designed.
Why Have a Human in the Loop?
• Data visualization (Vis) helps people analyze data even when they have not yet defined
specific questions.
• When questions are well-defined, computational methods such as statistics and machine
learning can automate many tasks.
• In complex analysis scenarios, people may not know in advance which questions are most
important.
• When there are dozens, thousands, or even more possibilities to consider, the best
approach is to combine data analysis with human intuition and reasoning.
• Vis leverages the human visual system’s powerful pattern recognition capabilities to
support data analysis.
• Vis is most suitable when the goal is to enhance human abilities rather than entirely
replace them with automation.
Why Have a Computer in the Loop?
• Computation enables the creation of
visual tools for exploring large, dynamic
datasets—something infeasible by hand.
• Manual methods are limited by time and
attention, restricting dataset size and
complexity.
• Automated visualization saves effort and
allows tracking changes over time while
designers can retain the expressive
qualities of hand-drawn diagrams.
Why Use an External Representation?
• External representations enhance human abilities by helping us overcome
internal cognition and memory limitations.
• External representations, like diagrams, enhance human capacity by
offloading cognitive and memory tasks to the perceptual system.
• Diagrams can support quick perceptual inferences and organize information
spatially, speeding up search and recognition.
• Well-designed diagrams group relevant data together, but poorly designed
ones may organize information inefficiently, hindering problem-solving.
Why Depend on Vision?
• Visualization relies on the human visual system as a key communication tool due
to its high efficiency and ability to process information quickly.
• The visual system can simultaneously handle vast amounts of data at the
subconscious level, enabling us to notice patterns like red items among many gray
ones.
• Unlike sound, which is experienced sequentially, vision processes information
simultaneously, allowing us to perceive large amounts of data at once.
• Other sensory modalities, such as taste and smell, lack the technology for
effective data communication, and haptic feedback only covers limited aspects of
sensory input.
What is Data Visualization?
• Data visualization is the graphical
representation of information and data.
• Data visualization tools use elements like charts,
graphs, and maps to make trends, outliers, and
patterns in data easier to see and understand.
• These visual representations convey complex
data relationships and insights in a clear, easy-
to-understand format.
• In the world of Big Data, data visualization
tools and technologies are crucial for analyzing
vast amounts of information and making
informed, data-driven decisions.
The Main Goals of Data Visualization
• The main goals of data visualization are explanation and exploration:
o Explanation: This involves presenting large volumes of data in a visual format, delivering
relevant information to help users gain foundational insights into the subject matter.
o Exploration: This focuses on offering a multidimensional view of the dataset, allowing users to
explore, ask questions, and uncover deeper insights.
• Data visualizations are powerful tools for uncovering trends, patterns, and correlations within data.
• They support decision-making by enabling key stakeholders to make well-informed, data-backed
choices.
• Visualizations simplify the understanding of large datasets and allow integration of information
from multiple sources.
• They effectively use storytelling techniques to communicate insights and data-driven ideas.
• Engaging and easy-to-understand visuals help capture and hold the target audience’s attention.
• Data visualizations aid in tracking key metrics and monitoring critical performance indicators
(KPIs).
History of Data Visualization
• The foundation of data visualization
was established in the 1600s with
simple presentations of statistical
data.
• Today, data visualization has
evolved into a vital tool for a wide
range of professional fields.
Outline
• Introduction to Visualization
• Data Abstraction
• Data Semantics vs. Data types
• Data types vs. Dataset types
• Understand the Data
• Essential Chart Types
Data Abstraction
• The four main types of datasets are tables, networks, fields, and geometry,
with other possible collections, including clusters, sets, and lists.
• These datasets combine five data types: items, attributes, links, positions, and
grids.
• Data can be static, available as a file, or dynamic, processed as a stream.
• Attributes can be categorical or ordered, with further distinctions between
ordinal and quantitative types.
• The order of attributes can be sequential, diverging, or cyclic.
Data Semantics vs. Data Types
• Visualization design is heavily influenced by the type of data available. To make
sense of data, you must understand its meaning and structure.
• Two aspects matter: the semantics (real-world meaning) and the type (structural or
mathematical interpretation). Understanding both helps you interpret the data correctly.
• Semantics clarify what the data represents—such as a person’s name, a postal
code, or a measurement.
• The type describes its mathematical properties and operations:
• At the data level: The data types can be items, links, or attributes.
• At the dataset level: The structure can be a table, network, or field.
• At the attribute level: Specifies what mathematical operations are meaningful
for the data type.
Data Types
o Items – Distinct individual entities represented as rows in a table or nodes in a network.
Example: People in a customer database, cities on a map, or products in an inventory.
o Attributes – Measurable or observable properties of items, represented as columns in a
table or properties of nodes in a network. Example: A person’s age, a product’s price, or
a city’s population.
o Links – Relationships between items, represented as edges in a graph or connections
between nodes. Example: Friendships in a social network, trade routes between
countries, or hyperlinks between web pages.
o Positions – Spatial locations in 2D or 3D space, represented as coordinates in a map or a
point in a coordinate system. Example: GPS coordinates for a city, medical scan
positions in 3D imaging, or object placements in a game environment.
o Grids – Structured arrangements for continuous data, represented as arrays or mesh-
based layouts defining geometric and topological relationships. Example: Elevation maps,
temperature distribution grids, or image pixels in a digital photo.
Dataset Types
• A dataset is any collection of information used for analysis. The four basic dataset types
are:
o Tables – Organized in rows (items) and columns (attributes).
Example: A spreadsheet of customer data with names, ages, and purchase history.
o Networks – Represent relationships between items as nodes and links.
Example: A social network showing connections between users.
o Fields – Continuous data distributed across space, often structured in grids.
Example: Temperature variations across a geographic region.
o Geometry – Spatial shapes and structures, such as points, lines, and polygons.
Example: A city map with roads, buildings, and boundaries.
• Other ways to organize data include clusters (grouped items with similar
properties), sets (collections with shared characteristics), and lists (ordered sequences of
items).
• In real-world scenarios, datasets often combine multiple types for complex analysis.
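As a rough illustration of the dataset types above, the sketch below represents a table, a network, a field, and a geometry with plain Python structures. All names and values are hypothetical and chosen only for illustration; real projects would typically use dedicated libraries.

```python
# Table: each row is an item, each key is an attribute.
customers = [
    {"name": "An", "age": 34, "purchases": 5},
    {"name": "Binh", "age": 27, "purchases": 2},
]

# Network: items are nodes, relationships are links (edges).
nodes = ["An", "Binh", "Chi"]
links = [("An", "Binh"), ("Binh", "Chi")]

# Field: attribute values sampled on a grid of spatial positions.
# temperature[row][col] is the value stored in one grid cell.
temperature = [
    [21.5, 22.0, 22.4],
    [20.9, 21.3, 21.8],
]

# Geometry: a shape described only by positions (here, a triangle).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]


def neighbors(node, links):
    """Return the nodes directly linked to `node` (links are undirected)."""
    result = []
    for a, b in links:
        if a == node:
            result.append(b)
        elif b == node:
            result.append(a)
    return result


average_age = sum(row["age"] for row in customers) / len(customers)
print(average_age)               # 30.5
print(neighbors("Binh", links))  # ['An', 'Chi']
```

The table supports attribute-wise operations such as averaging a column, while the network supports link-based questions such as finding neighbors.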
Dataset Types
• The figure shows the internal structure of
the four fundamental dataset types.
• Tables consist of cells indexed by items
and attributes, whether in a simple flat
format or a more complex
multidimensional structure.
• Networks represent items as nodes
connected by links, with trees being a
special case.
• Continuous fields use grids based on
spatial positions, where each cell contains
attributes.
• Spatial geometry consists solely of
position information.
Dataset Types - Tables
• Multidimensional tables
• Indexing based on multiple keys
• Example keys: genes, patients
Dataset Types - Networks
• network/graph
• nodes (vertices) connected by links
(edges)
• A tree is a special case: it has no cycles, and trees often
have a root and are directed
Dataset Types - Fields
Spatial
Dataset Types - Geometry
Spatial
Understand the Data
• To create an effective visualization, it’s essential to identify key goals and
present the data in a way that highlights them.
• Auto-generated charts may not effectively emphasize the
most important aspects of the data.
• Explore the data first to determine which insights are most relevant for your
audience.
• Keep it focused—avoid overcrowding the visualization with too many variables
or excessive information. Additional data can be included in an appendix or table.
• Use annotations to provide context and enhance understanding.
Data Visualization Characteristics
Our eyes perceive
changes and highlights by
recognizing the following
visual characteristics.
Clear and Clean
• Simplify the chart without losing clarity.
• Minimize complexity by reducing numbers
to their simplest form whenever possible.
• Ensure clear labels, specified units, and a
legend when needed.
• Eliminate redundancy—avoid duplicating
elements like gridlines and numeric labels or
using colors and data labels for the same
information.
• Avoid 3D graphics and shadows, as they
can distort data and make comparisons
harder.
• Use small multiples to break down
cluttered charts into more digestible sections.
Select the Right Type of Graph
• Common visualization types include:
o Comparison: similarities and differences between values, highlighting any gaps.
o Composition: how individual parts contribute to a whole.
o Relationship: how two or more variables relate to each other.
o Distribution: how values are spread across a range.
Essential Chart Types for Data Visualization
• Charts play a crucial role in data analysis by transforming large datasets into clear, digestible
visuals.
• They help reveal insights to viewers and effectively communicate findings to those who may
not engage with the raw data.
• With many chart types available—each suited to different purposes—one of the biggest
challenges in data visualization is selecting the right chart for the task.
• The best chart type depends on several factors:
• Data characteristics: What types of metrics, features, or variables are you plotting?
• Audience: Is the visualization for personal exploration, or will it be presented to others?
• Desired takeaway: What key conclusion do you want viewers to draw?
• Carefully considering these factors will help ensure your visualization is both clear and
impactful.
Bar Charts
A bar chart is one of the most
commonly used visualizations for
comparing categorical data. It displays
data using rectangular bars, where the
length or height of each bar represents
the value of a particular category.
When to Use a Bar Chart
• Distribution of data: A bar chart is used when you want to show a
distribution of data points or perform a comparison of metric values across
different subgroups of your data.
• Comparing categories: Best for showing differences between distinct groups
(e.g., sales by region, population by country).
• Tracking changes over time: Effective when comparing values across
different periods, though a line chart may be better for continuous trends.
• Ranking data: Easily highlights the largest and smallest values in a dataset.
Types of Bar Charts
• Vertical Bar Chart (Column
Chart): Bars are displayed vertically;
commonly used for time-based
comparisons.
• Horizontal Bar Chart: Useful
when category labels are long or when
comparing many categories.
• Stacked Bar Chart: Shows part-to-
whole relationships by stacking
different data series within each bar.
• Grouped Bar Chart: Displays
multiple categories side by side for
direct comparison.
Line Charts
• A line chart is a type of
data visualization that
displays information as a
series of points (called data
markers) connected by
straight lines. It’s particularly
effective for showing changes
over time or trends in
continuous data.
When to Use a Line Chart
• Use a line chart to emphasize changes in one variable (vertical axis) over continuous
values of a second variable (horizontal axis), typically to show patterns or trends over
time.
• Tracking changes over time: Ideal for showing how a variable evolves
(e.g., stock prices, temperature fluctuations, website traffic).
• Displaying trends: Useful for visualizing overall trends, such as upward or downward
movement, in datasets with time-based intervals.
• Comparing multiple series: Line charts display various datasets on the same graph,
allowing comparisons across several variables over time.
• Line charts can also show frequency distributions, called frequency polygons, as a
substitute for histograms, especially when comparing multiple distributions.
Types of Line Charts
• Single Line Chart: Displays
one dataset, showing how a
single variable changes over
time.
• Multiple Line Chart: Plots
two or more datasets on the
same chart, allowing for
comparisons between variables.
Area Charts
• An area chart combines elements
of both a line chart and a bar
chart to display how the numeric
values of one or more groups
change over time or another
variable. It differs from a line
chart by adding shading between
the lines and a baseline, similar to
a bar chart.
When to Use Area Charts
• An area chart is perfect for showing how numeric values change over time (or another
continuous variable) while revealing multiple groups’ cumulative impact. It’s useful in the
following scenarios:
• Tracking Changes Over Time: Ideal for showing how one or more values evolve
over a period, highlighting both trends and overall changes.
• Comparing Multiple Groups: When comparing several groups over time, the
overlapping area chart helps visualize how each group’s value changes, with shading
emphasizing which group dominates at different points.
• Understanding Contributions to a Whole: Use a stacked area chart to track the
total value and how individual components contribute to it, with segments stacked on top
to show each group’s part in the total.
• Visualizing Trends and Composition: Area charts are effective when you want to
showcase both trends over time and the proportional contributions of different categories
in one visualization.
Types of Area Charts
• Area Chart: A variation of the line chart
where the area beneath the line is filled,
useful for emphasizing the volume or
magnitude of change.
• Overlapping Area Chart: Overlays several
area series on the same axes to compare
groups’ values over time.
• Stacked Area Chart: A variation of the
area chart where multiple data series are
plotted on top of one another, with the
areas “stacked” to show the cumulative
total.
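To make the idea of stacking concrete, here is a minimal sketch of how the upper boundary of each band in a stacked area chart could be computed. The group names and values are hypothetical; a plotting library would normally do this step for you.

```python
years = [2019, 2020, 2021]
series = {            # hypothetical values per group and year
    "A": [10, 12, 15],
    "B": [5, 7, 6],
    "C": [3, 4, 8],
}


def stack(series):
    """Return, for each group, the upper boundary of its stacked band."""
    tops = {}
    running = [0] * len(next(iter(series.values())))
    for name, values in series.items():
        # Each band sits on top of the running total of the bands below it.
        running = [r + v for r, v in zip(running, values)]
        tops[name] = running
    return tops


tops = stack(series)
print(tops["A"])  # [10, 12, 15]
print(tops["B"])  # [15, 19, 21]
print(tops["C"])  # [18, 23, 29]
```

The top of the last band equals the cumulative total for each year, which is what the outline of a stacked area chart shows.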
Scatter Charts
• A scatter plot (also called a scatter chart or
scatter graph) uses dots to represent
the values of two numeric variables.
The position of each dot on the
horizontal and vertical axes
corresponds to the values of a single
data point. Scatter plots are
commonly used to observe
relationships between variables.
When to Use a Scatter Chart
• Display relationships between two numeric variables, revealing patterns
across the entire dataset.
• Identify correlations: Predict the vertical value based on the horizontal
one, with the horizontal axis representing the independent variable and the
vertical axis the dependent variable.
• Describe relationships as positive or negative, strong or weak, linear or
nonlinear.
• Uncover data patterns: Identify clusters, gaps, and outliers, which can
be useful for segmenting data (e.g., in user persona development).
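The strength and direction of a linear relationship seen in a scatter plot are often summarized with the Pearson correlation coefficient. The sketch below computes it by hand on hypothetical data (study hours versus test scores); the choice of this particular coefficient is an assumption, since the slides do not name one.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]         # hypothetical horizontal-axis values
scores = [52, 55, 61, 64, 70]   # hypothetical vertical-axis values

r = pearson_r(hours, scores)
print(round(r, 2))  # 0.99
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.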
Pie Charts
• A pie chart displays how a total
amount is divided among
categories, represented as circle
slices. Each slice corresponds to a
category, and the size of the slice—
both in area and arc length—shows
the proportion of the whole that
each category represents.
When to Use a Pie Chart
• Compare contributions to a whole: Use a pie chart to show how a total amount is
divided into distinct parts, focusing on each group's contribution to the whole rather than
comparing groups directly.
• Suitable for categorical data: A pie chart is ideal when you have a
“whole” that is divided into categories or segments.
• Two common use cases:
• Total count: The whole represents a total count, such as votes in an election or
transactions by user type (e.g., guest, new user, existing user).
• The sum of a data variable: The whole represents a total sum, like total revenue
from transactions divided by user type, age bracket, or location to reveal insights
about business performance.
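The "total count" case can be sketched as follows: each category's share of the whole determines the angle of its slice. The transaction counts per user type are hypothetical.

```python
counts = {"guest": 50, "new user": 30, "existing user": 20}  # hypothetical

total = sum(counts.values())
slices = {
    name: {"share": value / total, "angle": 360 * value / total}
    for name, value in counts.items()
}

for name, s in slices.items():
    print(f"{name}: {s['share']:.0%}, {s['angle']:.0f} degrees")
# guest: 50%, 180 degrees
# new user: 30%, 108 degrees
# existing user: 20%, 72 degrees
```

The angles always sum to 360 degrees, which is why a pie chart only makes sense when the categories together form a meaningful whole.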
Types of Pie Charts
• Standard Pie Chart: Displays categories as
slices, with each slice showing its proportion of
the whole.
• Donut Chart: A variation of the pie chart
where the center is cut out, creating a
“donut” shape. It’s visually cleaner and can
accommodate labels in the center.
• Pie of Pie Chart: A combination
chart where the smaller slices of a standard pie
chart are broken out into a secondary pie chart
for further analysis.
• Bar of Pie Chart: This chart is similar to the
Pie of Pie chart but uses a bar chart for the
secondary data instead of another pie chart.
Treemap Charts
• A treemap is a type of chart used to visualize hierarchical data using nested
rectangles.
• It represents relationships between different categories using rectangles, where
the size of each rectangle corresponds to a specific quantitative value.
• Each branch of the hierarchy is represented by a large rectangle, which is then
divided into smaller rectangles representing subcategories.
• The size of each rectangle corresponds to a quantitative value, making treemaps
useful for visualizing categorical and hierarchical relationships.
• Unlike traditional tree diagrams, treemaps do not require connecting lines or
branches, as the hierarchy is represented through the nested arrangement of
rectangles.
An Example of Treemap Charts
• From Excel
When to Use Treemap Charts?
• Comparing Proportions in a Hierarchy: This is ideal for visualizing how
categories and subcategories contribute to a whole.
• Compact Display of Large Datasets: Efficiently presents a lot of data in a
limited space.
• Identifying Patterns and Trends: Highlights dominant categories,
relationships, and anomalies through size and color.
• Alternative to Pie Charts: More practical when dealing with many categories
that would clutter a pie chart.
• Market Share & Competitive Analysis: Visualizes dominance and
distribution within industries or segments.
Heatmap Charts
• A heatmap chart is a data visualization technique that uses color to represent
values in a matrix or dataset.
• It helps identify patterns, correlations, and variations within large data sets by
using color gradients to indicate intensity or magnitude.
• Darker or more saturated colors typically represent higher values, while lighter or
less saturated colors indicate lower values.
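One simple way to turn values into color intensity is min-max normalization followed by a lookup into an ordered palette. The sketch below uses text characters in place of colors; the names, sales figures, and the five-step "palette" are all hypothetical, and the code assumes the values are not all identical.

```python
sales = {                        # hypothetical monthly sales per person
    "Alice": [120, 150, 200],
    "David": [60, 80, 70],
}
SHADES = " .:*#"                 # lightest to darkest


def normalize(table):
    """Rescale all values to the range [0, 1] (assumes max > min)."""
    values = [v for row in table.values() for v in row]
    lo, hi = min(values), max(values)
    return {k: [(v - lo) / (hi - lo) for v in row] for k, row in table.items()}


def shade(x):
    """Map a value in [0, 1] to one of the shade characters."""
    return SHADES[min(int(x * len(SHADES)), len(SHADES) - 1)]


scaled = normalize(sales)
for name, row in scaled.items():
    print(f"{name:6s}", "".join(shade(x) for x in row))
```

Higher values map to darker characters, mirroring the convention that darker or more saturated colors represent higher values.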
An Example of Heatmap Charts
• You have sales data for each salesperson over 12
months.
• Looking at raw numbers in a table can be
overwhelming, but a heatmap allows you to
visualize patterns instantly.
• A heatmap would instantly
highlight:
• Who has the
highest/lowest sales?
• Seasonal sales trends (e.g.,
December has the highest
sales, but mid-year shows
a drop).
• Areas that need
improvement (e.g., David’s
low performance).
When to Use a Heatmap?
• A heatmap is ideal when analyzing and comparing complex datasets across
multiple variables. It is useful in the following scenarios:
• Spotting Patterns and Trends: Quickly identify your data’s correlations,
anomalies, or recurring patterns.
• Comparing Data Across Categories: Using a structured matrix, analyze
differences across various categories, such as regional sales over time.
• Emphasizing Highs and Lows: Easily highlight the highest and lowest values,
such as peak customer engagement areas or top-performing sales regions.
• Simplifying Large Datasets: Transform complex, large-scale data into a
visually intuitive format for faster and more precise insights.
Types of Heatmap Charts
• Geographical Heatmap – Visualizes
data on maps, such as population
density or weather patterns.
• Website Heatmap – Shows user
interactions like clicks, scrolls, and
mouse movements.
• Clustered Heatmap – A grid
heatmap with hierarchical sorting and a
dendrogram to show relationships.
• Correlogram – A heatmap that
visualizes correlations between the same
categories on both axes. Used in
genetics to show relationships between
ethnic groups.
Pareto Charts
• A Pareto Chart combines a bar chart (showing individual category
values) and a line chart (showing cumulative percentage).
• The bars show individual values in
descending order, while the line shows
the cumulative percentage of the total.
• It follows the 80/20 rule—where a
small number of causes often
contribute to most of the effects.
Example for Pareto Chart
• A company analyzes customer
complaints to improve service.
They categorize complaints
as Delivery Delays, Product
Quality, Customer Service, Pricing
Issues, and Others. The data
shows:
Complaint Type      Frequency
Delivery Delays     40
Product Quality     30
Customer Service    15
Pricing Issues      10
Others              5
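The two ingredients of a Pareto chart, bars sorted in descending order and a cumulative percentage line, can be computed from the complaint data above as in this minimal sketch. The variable names are illustrative only.

```python
complaints = {
    "Delivery Delays": 40,
    "Product Quality": 30,
    "Customer Service": 15,
    "Pricing Issues": 10,
    "Others": 5,
}

# Bars: categories sorted by frequency, largest first.
ordered = sorted(complaints.items(), key=lambda kv: kv[1], reverse=True)
total = sum(complaints.values())

# Line: running total expressed as a percentage of the grand total.
cumulative = []
running = 0
for name, count in ordered:
    running += count
    cumulative.append((name, count, 100 * running / total))

for name, count, pct in cumulative:
    print(f"{name:18s}{count:4d}{pct:7.1f}%")
```

The cumulative column reads 40%, 70%, 85%, 95%, 100%: the first two categories already account for 70% of all complaints, which is the kind of concentration a Pareto chart is meant to expose.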
When to Use a Pareto Chart?
• When analyzing data about the frequency of problems or causes in a
process.
• When there are numerous problems or causes and you want to prioritize
the most significant ones.
• When examining broad causes by breaking them down into their
specific components.
• When communicating your findings to others.
Types of Pareto Charts
• Basic Pareto Chart – Standard
version with bars and a
cumulative percentage line.
• Weighted Pareto Chart –
Weighs factors like cost or impact
instead of frequency.
• Comparative Pareto Chart –
Compares multiple time periods
or categories side by side.
Geo Charts
• A Geo Chart (or Geographical Map Chart) is a data visualization that
represents geographical data using colors, markers, or bubbles on a map.
• It shows spatial information, such as the distribution of values across
different regions, countries, or states.
• The chart helps analyze regional trends, distributions, and
comparisons based on location.
An Example for Geo Charts
• Construct a geo map showing the population of the following countries:
Country          Population (million)
United States    331
Canada           37.7
Brazil           212
India            1380
China            1440
When to Use Geo Charts?
• Comparing geographical distributions of data (e.g., sales, population,
or disease spread).
• Identifying regional patterns and trends.
• Visualizing demographic and economic statistics.
• Showing election results, crime rates, or climate data across
locations.
Waterfall Charts
• A Waterfall chart is like a visual story that shows how various factors
contribute to a final result.
• The chart shows how an initial value changes through the cumulative
effect of sequential positive or negative values.
• The chart is often used for financial analysis, budgeting, and
understanding cumulative effects.
• The chart gets its name from its cascading appearance, resembling a
waterfall.
An Example for Waterfall Charts
• A company wants to analyze how its net profit is derived from revenue after
different expenses. A waterfall chart shows the breakdown from revenue to
net profit.
Category              Value (USD)
Revenue               100,000
Cost of Goods Sold    -40,000
Operating Expenses    -20,000
Taxes                 -10,000
Net Profit            30,000
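A waterfall chart draws each step as a floating bar that starts where the previous one ended. The sketch below computes those start and end levels for the example, treating taxes as a deduction (a negative value), which is the reading under which the steps sum to the stated net profit of 30,000.

```python
steps = [
    ("Revenue", 100_000),
    ("Cost of Goods Sold", -40_000),
    ("Operating Expenses", -20_000),
    ("Taxes", -10_000),   # assumption: taxes reduce the running total
]

bars = []       # (label, start level, end level) for each floating bar
running = 0
for label, change in steps:
    bars.append((label, running, running + change))
    running += change

net_profit = running

for label, start, end in bars:
    print(f"{label:20s}{start:>9,} -> {end:>9,}")
print(f"{'Net Profit':20s}{net_profit:>9,}")
```

The final running total, 30,000, is drawn as the closing bar of the chart.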
When to Use Waterfall Charts?
• Visualizing both positive and negative changes to a value.
• Tracking changes from a starting value to a final total.
• Showing the overall change when the individual contributions matter.
• Understanding cumulative effects (e.g., budget allocation, project
cost analysis).
• Comparing gains and losses in a process (e.g., sales performance).
Funnel Charts
• A Funnel Chart is a type of visualization that represents data that
progressively decreases as it moves through different stages in a process.
• Funnel charts are often used to track how data is filtered or how
progress narrows step by step.
• They are shaped like a funnel: wide at the top and narrower at the
bottom.
• The width of each section represents the proportion of data retained at
each step.
An Example for Funnel Charts
Create a funnel chart showing
a company’s sales conversion
process from leads to closed
deals.
Stage                  Count     % of visitors
Website visitors       10,000    100%
Downloads              5,000     50%
Potential customers    2,500     25%
Requested price        1,200     12%
Invoice sent           600       6%
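The percentages in the table are each stage's share of the first stage. It can also be useful to look at the conversion from one stage to the next, which shows where the largest drop-off occurs. A minimal sketch using the counts above:

```python
stages = [
    ("Website visitors", 10_000),
    ("Downloads", 5_000),
    ("Potential customers", 2_500),
    ("Requested price", 1_200),
    ("Invoice sent", 600),
]

top = stages[0][1]
rows = []            # (stage, count, % of first stage, % of previous stage)
previous = top
for name, count in stages:
    rows.append((name, count, 100 * count / top, 100 * count / previous))
    previous = count

for name, count, of_top, of_prev in rows:
    print(f"{name:20s}{count:>7,}{of_top:7.0f}%{of_prev:7.0f}%")
```

The share of the first stage sets the width of each funnel section, while the stage-to-stage figure (here roughly half at every step) shows how many are retained at each transition.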
When to Use Funnel Charts
• Funnel charts are most commonly used in business and sales contexts
to track how users or customers drop out at each process stage.
• This type of chart is excellent for highlighting the progression or
attrition of data.
• They help visualize how an initial group progressively narrows down,
revealing where significant drop-offs occur.
• We can also use them to compare data across various stages.
Types of Funnel Charts
• Stacked Funnel - Visualize data flow across different stages with
stacking based on a sub-category in each stage.
Bubble Charts
• A bubble chart extends the scatter chart to explore relationships between three
numeric variables.
• Each bubble in a bubble chart corresponds to a single data point.
• The variables’ values for each point are indicated by:
• X-axis: Represents one numerical variable.
• Y-axis: Represents another numerical variable.
• Bubble Size: Represents a third variable (e.g., magnitude, population,
revenue).
• Bubble charts help compare relationships and distributions with an additional
data dimension compared to standard scatter plots.
An Example for Bubble Charts
• Using the table of GDP per capita, life
expectancy, and population below, let’s
create a bubble chart that illustrates
the relationship between GDP per capita
and life expectancy, with the size of the
bubbles representing the population.
Country    GDP per capita ($)    Life Expectancy (years)    Population (millions)
China      12,000                76                         1,400
USA        65,000                79                         331
India      2,500                 70                         1,380
Germany    50,000                81                         83
Brazil     8,500                 75                         213
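When mapping the third variable to bubble size, a common recommendation (an assumption here, not something the slides state) is to scale the bubble's area rather than its radius, so that a country with twice the population gets twice the area. The sketch below derives a radius for each country from the table; `MAX_RADIUS` is a hypothetical display constant.

```python
from math import sqrt

countries = [  # (country, GDP per capita $, life expectancy, population in millions)
    ("China", 12_000, 76, 1_400),
    ("USA", 65_000, 79, 331),
    ("India", 2_500, 70, 1_380),
    ("Germany", 50_000, 81, 83),
    ("Brazil", 8_500, 75, 213),
]
MAX_RADIUS = 40.0  # hypothetical radius (in pixels) of the largest bubble

max_pop = max(row[3] for row in countries)
bubbles = [
    {
        "country": name,
        "x": gdp,       # horizontal axis: GDP per capita
        "y": life,      # vertical axis: life expectancy
        # Area is proportional to population, so radius uses a square root.
        "radius": MAX_RADIUS * sqrt(pop / max_pop),
    }
    for name, gdp, life, pop in countries
]

for b in bubbles:
    print(f"{b['country']:8s} radius = {b['radius']:.1f}")
```

Scaling the radius directly would exaggerate large values, because area grows with the square of the radius.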
When to Use Bubble Charts
• While a traditional scatter plot helps visualize the correlation
between two variables, a bubble chart allows you to analyze
a three-way relationship in a single visualization.
• A bubble chart primarily identifies patterns or correlations between
variables.
• When you have a large dataset and want to represent it visually,
bubble charts can display many data points in a single chart, making
it easier to spot trends and outliers.
Histogram Charts
• A histogram chart visually represents the distribution of quantitative data by grouping
data into bins (intervals).
• The height of each bar indicates the frequency (count) of data points that fall within
that range.
• Unlike bar charts, which compare different categories, histograms show the distribution
of numerical data.
• To create a histogram:
• The first step is to divide the range of values into bins (or buckets)—a series of
consecutive, non-overlapping intervals.
• Each bin represents a specific range, and the histogram counts how many data points
fall within each interval.
• The bins are typically adjacent and often of equal size, although they can vary
depending on the data and analysis requirements.
An Example for Histogram Charts
• You run an online
store and want to analyze
the distribution of order
values.
• Instead of listing individual
orders [57.45, 47.92, 59.71,
33.33, 29.15, 78.12, 20.11,
45.12, …], a histogram
summarizes how frequently
orders fall within different
price ranges.
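The binning steps described earlier can be sketched on the eight order values that are listed in the example (the full dataset is not shown). The bin boundaries, three bins of width 20 starting at 20, are an assumption chosen for illustration.

```python
orders = [57.45, 47.92, 59.71, 33.33, 29.15, 78.12, 20.11, 45.12]

# Assumed bins: [20, 40), [40, 60), [60, 80)
BIN_START, BIN_WIDTH, N_BINS = 20, 20, 3

counts = [0] * N_BINS
for value in orders:
    index = int((value - BIN_START) // BIN_WIDTH)
    if 0 <= index < N_BINS:          # ignore values outside the bin range
        counts[index] += 1

for i, count in enumerate(counts):
    lo = BIN_START + i * BIN_WIDTH
    print(f"[{lo}, {lo + BIN_WIDTH}): {'#' * count}")
# [20, 40): ###
# [40, 60): ####
# [60, 80): #
```

Changing `BIN_WIDTH` changes the shape of the histogram, so it is worth trying a few values before drawing conclusions.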
When to Use Histogram Charts
• Histograms are most effective for visualizing continuous numerical data.
• When analyzing a dataset’s distribution, a histogram helps identify patterns and
provides insights into what values are most common.
• Histograms show the shape of the distribution, which a single summary value such as
the mean cannot convey.
• They are ideal for visualizing the spread and variation of the data, helping you
identify outliers or unusual data points that fall far outside the normal range.
• They are also useful for large datasets (more than 100 observations).
Types of Histogram Charts
• Cumulative Histogram - represents the running total of observations across
bins, counting all data points up to a given bin. The cumulative histogram $M_i$ of
a histogram $m_j$ is defined as:
$$M_i = \sum_{j=1}^{i} m_j$$
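Since a cumulative histogram is a running total of the bin counts, it can be computed in one step with the standard library. The bin counts below are hypothetical.

```python
from itertools import accumulate

counts = [3, 4, 1, 2]                  # hypothetical counts for four bins
cumulative = list(accumulate(counts))  # running total up to each bin

print(cumulative)  # [3, 7, 8, 10]
```

The last cumulative value always equals the total number of observations.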
• Overlapping Histogram - is used
to compare multiple datasets by
displaying their distributions on the
same plot. It is particularly useful
for analyzing changes before and
after events.
Candlestick Charts
• A candlestick chart (also called Japanese candlestick chart or K-line) is a
financial chart style used to represent the price movement of an asset over time.
• It consists of candlesticks, each representing four important pieces of information
for a specific period:
• Open price (starting price).
• Close price (ending price).
• High price (highest price during the period).
• Low price (lowest price during the period).
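• Each candlestick has a body, spanning the open and close prices, and wicks
(shadows), extending to the high and low prices. The body color shows the direction:
• Green (or White) Candle: The price closed higher than it opened (bullish
movement).
• Red (or Black) Candle: The price closed lower than it opened (bearish
movement).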
An Example for Candlestick Charts
• Imagine you are a trader analyzing
the price trends of Digital Coin. A
candlestick chart lets you see
whether the price is going up or
down and how volatile it is.
Date Open High Low Close
2025-01-01 150 155 148 152
2025-01-02 152 157 149 150
2025-01-03 148 151 145 149
2025-01-04 155 160 152 158
2025-01-05 160 165 157 162
2025-01-06 158 162 155 160
2025-01-07 165 170 160 168
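As a minimal sketch, the parts of each candlestick can be derived from the four prices in the table: the direction from open versus close, the body from their difference, and the wicks from the distance to the high and low. The function and field names are illustrative only.

```python
rows = [  # (date, open, high, low, close)
    ("2025-01-01", 150, 155, 148, 152),
    ("2025-01-02", 152, 157, 149, 150),
    ("2025-01-03", 148, 151, 145, 149),
    ("2025-01-04", 155, 160, 152, 158),
    ("2025-01-05", 160, 165, 157, 162),
    ("2025-01-06", 158, 162, 155, 160),
    ("2025-01-07", 165, 170, 160, 168),
]


def describe(open_, high, low, close):
    """Return the direction, body size, and wick lengths of one candle."""
    if close > open_:
        direction = "bullish"
    elif close < open_:
        direction = "bearish"
    else:
        direction = "flat"
    return {
        "direction": direction,
        "body": abs(close - open_),
        "upper_wick": high - max(open_, close),
        "lower_wick": min(open_, close) - low,
    }


candles = {date: describe(o, h, l, c) for date, o, h, l, c in rows}
for date, candle in candles.items():
    print(date, candle)
```

In this sample only 2025-01-02 closes below its open, so it would be drawn as a red (or black) candle; the other six days are bullish.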
When to Use Candlestick Charts
• Candlestick charts are essential tools in financial analysis. They provide
a detailed view of price movements over a specific period.
• They are widely used in stock, forex, cryptocurrency, commodity, and options
trading to help traders make informed decisions based on price action.
• Unlike simple line charts, candlestick charts display four key price points—
open, high, low, and close—allowing traders to assess an asset's price behavior
quickly.
• Each candlestick represents a specific time frame (e.g., 1 minute, 1 hour, 1
day, or 1 week) and helps traders understand market sentiment at a glance.
Scatter Maps
• A scatter map (or dot map) is a type of geospatial visualization that
displays data points on a geographical map using latitude and longitude
coordinates.
• It is commonly used to represent spatial distributions, trends, or data
densities across different locations.
• Each point on the scatter map represents a specific location with
additional attributes such as size, color, or intensity to convey more
information.
An Example of Scatter Maps
• A company wants to analyze the distribution of its retail stores
across Vietnam and visualize their sales performance. A scatter map can display:
• The location of each store (latitude & longitude).
• The size of the point indicating the store’s monthly revenue.
• The color of the points representing sales performance (e.g., high, medium,
low).
City Latitude Longitude Revenue
Hanoi 21.0285 105.8542 500000
Ho Chi Minh 10.7769 106.7009 750000
Da Nang 16.0544 108.2022 400000
Hai Phong 20.8449 106.6881 350000
Can Tho 10.0452 105.7469 300000
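The encoding described in this example (position from coordinates, point size from revenue, color from performance) can be sketched as a small data-preparation step. The sketch below is illustrative only: the revenue thresholds for high, medium, and low, the maximum marker size, the color names, and the function names are all assumptions, since the slides do not define them.

```python
# Store data copied from the table above: (city, latitude, longitude, revenue).
stores = [
    ("Hanoi", 21.0285, 105.8542, 500000),
    ("Ho Chi Minh", 10.7769, 106.7009, 750000),
    ("Da Nang", 16.0544, 108.2022, 400000),
    ("Hai Phong", 20.8449, 106.6881, 350000),
    ("Can Tho", 10.0452, 105.7469, 300000),
]


def marker_size(revenue, max_revenue, max_size=300.0):
    """Scale marker area linearly with revenue (max_size is an assumed value)."""
    return max_size * revenue / max_revenue


def performance(revenue, medium_from=400000, high_from=500000):
    """Classify sales performance; both thresholds are hypothetical."""
    if revenue >= high_from:
        return "high"
    if revenue >= medium_from:
        return "medium"
    return "low"


# Assumed color scheme for the three performance classes.
COLORS = {"high": "green", "medium": "orange", "low": "red"}

max_revenue = max(revenue for _, _, _, revenue in stores)

points = []
for city, lat, lon, revenue in stores:
    level = performance(revenue)
    points.append({
        "city": city,
        "x": lon,  # longitude goes on the horizontal axis
        "y": lat,  # latitude goes on the vertical axis
        "size": marker_size(revenue, max_revenue),
        "color": COLORS[level],
        "level": level,
    })

for p in points:
    print(p["city"], p["x"], p["y"], round(p["size"]), p["color"])
```

The resulting list of points can then be handed to whichever mapping or plotting tool is in use. Scaling area rather than radius keeps large revenues from looking disproportionately big.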
Exercises
1. Discussion:
• Which colors are used for
male and female in this
report?
• Are they consistent?
• Is there anything you would
change about the colors of
these visuals?
Exercises
2. Draw a bar chart, stacked bar chart, and grouped bar chart for the data
table below:
3. Draw a line chart, area chart, and stacked area chart for the data table
below:
Year Overall Sales Printer Sales
2017 60 32
2018 87 47
2019 73 40
2020 69 37
2021 63 39
Exercises
4. Create a Pie Chart to display sales data for a small business:
Category Sales Amount ($)
Electronics 4,500
Furniture 1,500
Clothing 3,000
Accessories 1,200
5. Look deeper at Electronics and create a Pie of Pie Chart to show how the
subcategories within Electronics contribute to its overall total:
• Smartphones: $2,000
• Laptops: $1,500
• Headphones: $1,000
Exercises
6. Perform the steps of hierarchical clustering needed to build a Clustered Heatmap
chart for the data table below:
Name Hours Studied Score Street Address
An 10 15 11
Binh 20 20 24
Cuong 12 16 18
Dai 28 25 9
Exercises
7. An electronics store recorded customer complaints by category over the past month.
The data is as follows:
Issue Type Number of Complaints
Defective Product 50
Late Delivery 30
Poor Customer Service 20
Wrong Order 15
Unreasonable Pricing 10
Complicated Return Policy 5
a. Draw a Pareto chart to visualize the number of complaints for each issue.
b. Calculate the cumulative percentage to identify the most critical issues.
c. Mark the 80% cutoff point on the chart to determine the primary causes that need
attention.
Exercises
8. An e-commerce website tracks user behavior through the purchase process. The data is
as follows:
Stage Number of Users
Website Visit 10,000
Product View 7,500
Added to Cart 4,000
Checkout Started 2,500
Purchase Completed 1,800
a. Create a Funnel chart to display the number of users at each stage.
b. Analyze which stage has the highest drop-off rate and provide observations.
c. If the company wants to improve the conversion rate, which step should they focus
on?
Exercises
9. A company tracks order processing times (in minutes) and collects the following data:
22, 25, 30, 32, 28, 35, 40, 45, 38, 50,
55, 60, 62, 65, 48, 52, 58, 63, 67, 70,
18, 20, 22, 27, 29, 34, 36, 42, 44, 47,
53, 56, 61, 64, 66, 68, 72, 75, 78, 80.
a. Create a Histogram chart with a reasonable number of bins.
b. Identify the most common processing time range.
c. Calculate the median and standard deviation of the dataset.
References
[1] Tamara Munzner. Visualization Analysis and Design. A K Peters Visualization Series, CRC Press,
2014.
[2] Mastering data charts: A comprehensive guide to visualization. Atlassian.
[Link]
[3] T. Hammond. 2023. 20 types of charts and graphs for data visualization.
[Link]