Data Visualization
Visualization
• “Data is the new oil”
• Like oil, data in its raw, unrefined form is pretty worthless
• To unlock its value, data needs to be refined, analyzed and
understood
• More and more organizations are seeing potential in their data
connections, but how do you allow non-experts to analyze data at
scale and extract potentially complex insights? One answer is
through interactive graph visualization.
Visualization
• Visualization is the quickest way to understand data
• 90% of the information transmitted to the brain is visual
• Humans are often said to process images 60,000 times faster than text, though this widely repeated figure has no traceable source
• 70% of our sensory receptors are in our eyes
• 65% of people are visual learners
• In short, visual data is easier to remember than words
• Studies have also shown that while only 10-20% of written or spoken information is
remembered, 65% of information is remembered when it’s presented visually
• That’s why it’s important to present your most important data visually
• It ensures your information is processed faster, more easily understood and
remembered.
Why is data visualization important?
Data visualization has 5 additional benefits:
1. Amplifies your message: Your message is amplified in a few different
ways
• First of all, by taking the time to create data visualizations, you show
your audience that you’ve done your homework
• That alone gives a sense of credibility to your content
• Without visualizations, you run the risk of your audience not
understanding what you are trying to present
• Your data might even be received as meaningless, and your entire
message lost
2. Gives meaning to your data
• Visualizations communicate valuable insights by creating visual
representations of your data
• For example, an Excel spreadsheet showing that Microsoft’s sales
revenue almost doubled between 2011 and 2018 isn’t nearly as
effective as graphing that data in a simple column chart with some formatting
• Notice how much easier it is to see that change in revenue in
the picture on the right
• This also shows the importance of highlighting your point visually in
your data visualizations
3. Saves time
• Instead of spending the time trying to figure out what the facts and
figures mean, your audience members can ENGAGE with the meaning
• A visual representation allows you to analyze huge amounts of info in
the blink of an eye
• As we know, the human eye can recognize and process visual
information much faster than text
4. Makes for better decision making
• Assuming your data visualizations contain correct data and are done
properly, you’ll not only be able to make decisions faster, but they will
be based on data that you fully comprehend
5. Is more shareable and digestible
• One of the best things about data visualizations is that they are
accessible and easy to share across departments, with
colleagues, your boss, or a large audience
• They can be inserted in your PowerPoint presentation, printed for
seminar handouts, or even posted and shared on social media.
Visualization Workflow
• Visualization workflow is the process of converting raw data into
meaningful visuals
• Helps in better understanding, analysis, and decision-making
• Widely used in business, research, and analytics
Workflow Overview
• Data Collection
• Data Preparation
• Data Exploration
• Define Objective
• Choose Visualization
• Build Visualization
• Interpretation
• Communication
• Feedback & Iteration
• Deployment
Data Collection
• Gather data from multiple sources:
• Databases
• APIs
• Surveys
• Files (CSV, Excel)
• Ensure data is relevant and accurate
Data Preparation
• Clean the data:
• Remove duplicates
• Handle missing values
• Transform data:
• Normalize values
• Convert formats
• Select required attributes
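The preparation steps above can be sketched in a few lines of standard-library Python. This is only an illustration: the records, the field names (`city`, `sales`, `comment`) and the choice of mean imputation and min-max normalization are assumptions made for the example, not something the workflow prescribes.

```python
def drop_duplicates(records):
    """Remove exact duplicate records while keeping the original order."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing(records, field):
    """Replace None in `field` with the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    mean = sum(known) / len(known)
    return [{**r, field: mean if r[field] is None else r[field]} for r in records]


def min_max_normalize(records, field):
    """Rescale `field` linearly so that its values fall between 0 and 1."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def select(records, fields):
    """Keep only the required attributes."""
    return [{f: r[f] for f in fields} for r in records]


# Hypothetical raw data with one duplicate row and one missing value.
raw = [
    {"city": "A", "sales": 10, "comment": "ok"},
    {"city": "A", "sales": 10, "comment": "ok"},
    {"city": "B", "sales": None, "comment": ""},
    {"city": "C", "sales": 30, "comment": "late"},
]

clean = drop_duplicates(raw)
clean = fill_missing(clean, "sales")
clean = min_max_normalize(clean, "sales")
clean = select(clean, ["city", "sales"])
print(clean)
```

Each step returns new records instead of changing the input, so the raw data stays available for comparison.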
Data Exploration (EDA)
• Understand the dataset
• Use:
• Summary statistics
• Correlation analysis
• Identify:
• Patterns
• Trends
• Outliers
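As a rough sketch of this exploration step, the snippet below computes summary statistics, a Pearson correlation, and outliers using the common 1.5 × IQR rule. The two series (`ad_spend`, `revenue`) are made-up values, and the IQR rule is only one of several possible outlier conventions.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def iqr_outliers(values):
    """Values lying more than 1.5 * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]


# Hypothetical data: the last revenue value is deliberately extreme.
ad_spend = [1, 2, 3, 4, 5, 6]
revenue = [2, 4, 6, 8, 10, 60]

summary = {
    "mean": statistics.fmean(revenue),
    "median": statistics.median(revenue),
    "stdev": statistics.stdev(revenue),
}
print(summary)
print("correlation:", pearson(ad_spend, revenue))
print("outliers:", iqr_outliers(revenue))
```

The gap between the mean and the median in `summary` already hints at the outlier before any chart is drawn.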
Define Objective
• Identify the goal of visualization
• Questions to consider:
• What problem are we solving?
• Who is the audience?
• Example:
• Business → dashboards
• Research → detailed analysis
Choose Visualization Type
• Based on data:
• Bar Chart → Comparison
• Line Graph → Trends
• Pie Chart → Composition
• Histogram → Distribution
• Scatter Plot → Relationships
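The rules of thumb above can be written as a simple lookup. The function name `suggest_chart` and the goal keywords are hypothetical. A real choice would also weigh the audience and the data types involved.

```python
# Mapping from analytic goal to the chart type listed above.
CHART_FOR_GOAL = {
    "comparison": "bar chart",
    "trend": "line graph",
    "composition": "pie chart",
    "distribution": "histogram",
    "relationship": "scatter plot",
}


def suggest_chart(goal):
    """Return the rule-of-thumb chart type for an analytic goal."""
    try:
        return CHART_FOR_GOAL[goal.strip().lower()]
    except KeyError:
        raise ValueError(f"no rule of thumb for goal {goal!r}") from None


print(suggest_chart("Trend"))
```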
Design & Build Visualization
• Use tools like:
• Tableau
• Power BI
• Python libraries
• Follow design principles:
• Clear labels and titles
• Proper color usage
• Avoid clutter
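To make the design principles concrete, here is a minimal sketch that draws a titled, labelled bar chart as plain text. In practice a plotting library such as Matplotlib would normally be used. Plain text is chosen here only to keep the example free of dependencies. The function name, the regions and the sales numbers are all hypothetical.

```python
def text_bar_chart(title, data, unit="", width=30):
    """Render a labelled horizontal bar chart as plain text.

    Assumes `data` maps labels to positive numbers.
    """
    peak = max(data.values())
    label_w = max(len(label) for label in data)
    lines = [title, "-" * len(title)]  # clear title
    for label, value in data.items():
        bar = "#" * round(width * value / peak)  # one symbol, no clutter
        lines.append(f"{label:<{label_w}} | {bar} {value}{unit}")
    return "\n".join(lines)


chart = text_bar_chart(
    "Units sold per region",
    {"North": 120, "South": 80, "East": 40},
    unit=" units",
)
print(chart)
```

Every bar carries its label and exact value, so the chart can be read without a separate legend.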
Interpretation
• Analyze the visuals
• Identify:
• Key insights
• Patterns
• Anomalies
• Convert data into meaningful information
Communication
• Present results through:
• Dashboards
• Reports
• Presentations
• Make it easy for the audience to understand
Feedback & Iteration
• Collect feedback from users
• Improve visualization clarity
• Refine based on requirements
Deployment
• Share dashboards via:
• Web platforms
• Business intelligence tools
• Enable real-time updates if needed
Data Abstraction
❖ This figure shows the abstract types of what can be visualized.
❖ The four basic dataset types are tables, networks, fields, and geometry; other
possible collections of items include clusters, sets, and lists.
❖ These datasets are made up of different combinations of the five data types:
items, attributes, links, positions, and grids.
❖ For any of these dataset types, the full dataset could be available immediately in
the form of a static file, or it might be dynamic data processed gradually in the
form of a stream.
❖ The type of an attribute can be categorical or ordered, with a further split into
ordinal and quantitative.
❖ The ordering direction of attributes can be sequential, diverging, or cyclic.
Why Do Data Semantics and Types Matter?
❑ What kind of data are you given?
❑ What information can you figure out from the data, versus the meanings that you
must be told explicitly?
❑ What high-level concepts will allow you to split datasets apart into general and useful
pieces?
Suppose that you see the following data:
14, 2.6, 30, 30, 15, 100001
❖ What does this sequence of six numbers mean?
Similarly, suppose that you see the following data:
Basil, 7, S, Pear
❖ These numbers and words could have many possible meanings.
• To understand the data, two crosscutting pieces of information are required. These
are:
• Semantics of data
• Types of data.
• The semantics of the data is its real-world meaning.
• For instance, does a word represent a human first name, or is it the shortened
version of a company name where the full name can be looked up in an external list,
or is it a city, or is it a fruit?
• The type of the data is its structural or mathematical interpretation.
• Two levels:
• At the data level, what kind of thing is it: an item, a link, an attribute?
• At the attribute level: what kinds of mathematical operations are meaningful for
it?
• For example: if a number represents a count of boxes of detergent, then its type is a
quantity, and adding two such numbers together makes sense.
• If the number represents a postal code, then its type is a code rather than a quantity—
it is simply the name for a category that happens to be a number rather than a textual
name.
• Adding two of these numbers together does not make sense.
• Metadata:
• Additional information about the original dataset, often textual, is called
metadata
ID  Name    Age  Shirt Size  Favorite Fruit
1   Amy     8    S           Apple
2   Basil   7    S           Pear
3   Clara   9    M           Durian
4   Desm    13   L           Elderberry
5   Ernest  12   L           Peach
6   Fanny   10   S           Lychee
7   Geore   9    M           Orange
8   Hect    8    L           Loquat
9   Ida     10   M           Pear
10  Amy     12   M           Orange
Data types
❖ An attribute is some specific property that can be measured, observed, or logged.
❖ Synonyms for attribute are variable and data dimension, or just dimension for short.
❖ Example: attributes could be salary, price, number of sales, protein expression
levels, temperature, or other weather data.
❖ An item is an individual entity that is discrete, such as a row in a simple table or a
node in a network.
❖ For example, items may be people, stocks, coffee shops, genes, or cities.
❖ A link is a relationship between items, typically within a network.
❖ A grid specifies the strategy for sampling continuous data in terms of both
geometric and topological relationships between its cells.
❖ A position is spatial data, providing a location in two-dimensional (2D) or three-
dimensional (3D) space.
❖ For example, a position might be a latitude–longitude pair describing a
location on the Earth’s surface or three numbers specifying a location within
the region of space measured by a medical scanner.
Dataset types
❖ A dataset is any collection of information that is the target of analysis. The
four basic dataset types are:
❖ tables, networks, fields, and geometry.
Dataset types
Tables: made up of rows and columns, as in a spreadsheet.
❖ Flat table: each row represents an item of data, and each column is an attribute
of the dataset.
❖ Each cell in the table is fully specified by the combination of a row and a
column—an item and an attribute—and contains a value for that pair.
❖ A multidimensional table has a more complex structure for indexing into a cell,
with multiple keys.
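The difference between the two table kinds can be sketched with dictionaries. The flat table reuses three rows of the example table shown earlier. The multidimensional table, keyed by a hypothetical (region, year) pair with made-up sales counts, is an assumption added purely for illustration.

```python
# Flat table: one key (the item's ID) picks the row, the attribute picks the column.
flat = {
    1: {"Name": "Amy", "Age": 8, "Shirt Size": "S", "Favorite Fruit": "Apple"},
    2: {"Name": "Basil", "Age": 7, "Shirt Size": "S", "Favorite Fruit": "Pear"},
    3: {"Name": "Clara", "Age": 9, "Shirt Size": "M", "Favorite Fruit": "Durian"},
}


def cell(table, item_key, attribute):
    """A cell is fully specified by an item (row) and an attribute (column)."""
    return table[item_key][attribute]


# Multidimensional table: several keys together index a single cell.
# Hypothetical sales counts keyed by (region, year).
multi = {
    ("North", 2023): 120,
    ("North", 2024): 135,
    ("South", 2023): 80,
    ("South", 2024): 95,
}

print(cell(flat, 2, "Favorite Fruit"))
print(multi[("South", 2024)])
```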
Networks: well suited for specifying that there is some kind of relationship
between two or more items.
❖ An item in a network is known as a node.
❖ A link is a relation between two items
❖ For example, in an articulated social network the nodes are people, and
links mean friendship.
❖ In a gene interaction network, the nodes are genes, and links between
them mean that these genes have been observed to interact with each other.
❖ Networks with hierarchical structure are called trees.
❖ Note: trees do not have cycles: each child node has only one parent node pointing
to it.
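A network can be stored as a set of nodes plus a list of links. The sketch below checks the tree property described in the note: every child has exactly one parent and there are no cycles. The function name `is_tree`, the (parent, child) link format and the requirement of a single root are assumptions made for this example.

```python
def is_tree(nodes, links):
    """Return True if the (parent, child) links form one rooted tree over nodes."""
    nodes = set(nodes)
    parent = {}
    for p, c in links:
        if p not in nodes or c not in nodes:
            return False  # link refers to an unknown node
        if c in parent:
            return False  # a second parent points to the same child
        parent[c] = p
    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:
        return False  # a tree has exactly one root
    # Following parent links from any node must end at the root without looping.
    for n in nodes:
        seen = set()
        while n in parent:
            if n in seen:
                return False  # cycle detected
            seen.add(n)
            n = parent[n]
    return True


print(is_tree({"a", "b", "c"}, [("a", "b"), ("a", "c")]))  # hierarchy
print(is_tree({"a", "b", "c"}, [("a", "c"), ("b", "c")]))  # c has two parents
print(is_tree({"r", "a", "b"}, [("a", "b"), ("b", "a")]))  # cycle
```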
Fields: the field dataset type also contains attribute values associated with cells.
❖ Each cell in a field contains measurements or calculations from a continuous
domain.
❖ Continuous data requires careful treatment that takes into account the
mathematical questions of sampling and interpolation:
❖ Sampling: how frequently to take the measurements, and
❖ Interpolation: how to show values in between the sampled points in a
way that does not mislead.
❖ Continuous data is often found in the form of a spatial field, where the cell
structure of the field is based on sampling at spatial positions.
❖ Most datasets that contain inherently spatial data occur in the context of tasks
that require understanding aspects of its spatial structure, especially shape
❖ Grids – Uniform Grid, Unstructured grid
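Sampling and interpolation can be illustrated on a one-dimensional uniform grid. In this sketch the continuous phenomenon is stood in for by the function x², and linear interpolation is used between samples. Both choices are assumptions for the example. Real fields are usually 2D or 3D and may call for other interpolation schemes.

```python
def sample(f, start, stop, step):
    """Sample the continuous function f on a uniform 1D grid."""
    n = int(round((stop - start) / step)) + 1
    xs = [start + i * step for i in range(n)]
    return xs, [f(x) for x in xs]


def lerp(xs, ys, x):
    """Linearly interpolate a value at x between the two nearest samples."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])


def field(x):
    """Stand-in for a continuous phenomenon (hypothetical)."""
    return x * x


coarse_xs, coarse_ys = sample(field, 0, 4, 2)  # samples at 0, 2, 4
fine_xs, fine_ys = sample(field, 0, 4, 1)      # samples at 0, 1, 2, 3, 4

true_value = field(2.5)
coarse_estimate = lerp(coarse_xs, coarse_ys, 2.5)
fine_estimate = lerp(fine_xs, fine_ys, 2.5)
print(true_value, coarse_estimate, fine_estimate)
```

The denser grid gives an estimate closer to the true value, which is why the sampling frequency affects how far an interpolated display can mislead.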
❖ The geometry dataset type specifies information about the shape of items with
explicit spatial positions.
❖ The items could be points, or one-dimensional lines or curves, or 2D surfaces
or regions, or 3D volumes.
❖ Geometry datasets are intrinsically spatial, and like spatial fields they typically
occur in the context of tasks that require shape understanding
Other Combinations
❖ Set
❖ Lists
❖ Cluster
❖ Path
Data availability
Datasets are available in one of two ways: static or dynamic.
❖ Static: the entire dataset is available all at once, as a static file.
❖ Dynamic: the dataset arrives gradually as a stream; one kind of dynamic
change is to add new items or delete previous items.
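The practical difference shows up in how a summary is computed. With a static dataset everything can be read at once. With a stream, the result has to be updated as each item arrives. The sketch below uses hypothetical sensor readings and only covers the case where items are added.

```python
def static_mean(values):
    """The whole dataset is in memory, so it can be summarized in one pass."""
    return sum(values) / len(values)


def running_means(stream):
    """Yield an updated mean each time a new item arrives from the stream."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


readings = [4, 8, 6]  # hypothetical sensor readings

print(static_mean(readings))
updates = list(running_means(iter(readings)))
print(updates)
```

A visualization fed by a stream would redraw on each update, while one built from a static file can be laid out once.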
Attribute types
❖ Categorical data, such as favorite fruit or names, doesn’t have an
implicit ordering, but it often has hierarchical structure.
❖ Examples of categorical attributes are fruits (apples, oranges, etc.), movie genres,
file types, and city names.
❖ All ordered data does have an implicit ordering, as opposed to unordered
categorical data.
❖ Ordered data can be further subdivided into ordinal and quantitative data.
❖ With ordinal data, such as shirt size, we cannot do full-fledged arithmetic, but
there is a well-defined ordering. For example, large minus medium is not a
meaningful concept, but we know that medium falls between small and large.
❖ A subset of ordered data is quantitative data, namely, a measurement of
magnitude that supports arithmetic comparison.
❖ For example, the quantity of 68 inches minus 42 inches is a meaningful
concept, and the answer of 26 inches can be calculated.
❖ Other examples of quantitative data are height, weight, temperature, stock
price, number of calling functions in a program, and number of drinks sold at a
coffee shop in a day.
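One way to make these distinctions operational is to record which operations each attribute type supports. The sketch below is an illustration only: the names `AttrType`, `ALLOWED`, `between` and `difference` are hypothetical, and the operation sets are a simplified reading of the bullets above.

```python
from enum import Enum


class AttrType(Enum):
    CATEGORICAL = "categorical"
    ORDINAL = "ordinal"
    QUANTITATIVE = "quantitative"


# Operations that are meaningful for each attribute type.
ALLOWED = {
    AttrType.CATEGORICAL: {"equal"},
    AttrType.ORDINAL: {"equal", "compare"},
    AttrType.QUANTITATIVE: {"equal", "compare", "subtract"},
}

SHIRT_ORDER = ["S", "M", "L"]  # ordinal: a defined order but no arithmetic


def between(order, low, mid, high):
    """Ordinal comparison: does mid fall between low and high?"""
    return order.index(low) < order.index(mid) < order.index(high)


def difference(attr_type, a, b):
    """Subtraction is only meaningful for quantitative attributes."""
    if "subtract" not in ALLOWED[attr_type]:
        raise TypeError(f"subtraction is not meaningful for {attr_type.value} data")
    return a - b


print(between(SHIRT_ORDER, "S", "M", "L"))          # medium is between small and large
print(difference(AttrType.QUANTITATIVE, 68, 42))    # 68 inches minus 42 inches
```

Calling `difference(AttrType.ORDINAL, ...)` raises a `TypeError`, mirroring the point that large minus medium is not a meaningful concept.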
Task Abstraction
Analysis: Four Levels for Validation
Data Representation: Chart Types
• When visualizing data, choosing the right chart type is crucial to
effectively communicate patterns, trends, and relationships.
• Donut chart
• Sunburst and treemap charts
• Sankey diagram
• Scatter map
• Choropleth map
• Flow Map
• A flow map is a type of thematic map used in cartography to
visualize how objects, for example people or goods, move
between different geographical locations.
• Flow maps can display both quantitative and
qualitative types of data.
• Lines in a flow map can have different widths to show the flow
volume, and markers to show the direction.
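The width encoding can be sketched as a linear mapping from flow volume to stroke width. The function name `line_width`, the width range of 1 to 10 and the flows between places A, B and C are hypothetical. The (source, destination) order of each tuple carries the direction that a marker would show.

```python
def line_width(volume, max_volume, min_width=1.0, max_width=10.0):
    """Map a flow volume linearly onto a stroke width between min_width and max_width."""
    if max_volume <= 0:
        raise ValueError("max_volume must be positive")
    share = max(0.0, min(1.0, volume / max_volume))
    return min_width + share * (max_width - min_width)


# Hypothetical flows as (source, destination, volume).
flows = [("A", "B", 500), ("B", "C", 250), ("C", "A", 0)]

peak = max(volume for _, _, volume in flows)
styled = [(src, dst, line_width(volume, peak)) for src, dst, volume in flows]
print(styled)
```

Keeping a non-zero minimum width means that even very small flows stay visible on the map.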