•
•
Unit II: Data Preparation, Summarisation and Visualisation Using spreadsheet (6 hours) Data Preparation and Cleaning, Sort and
filter, Conditional formatting, Text to Column, Removing Duplicates, Data Validation, identifying outliers in the data, covariance and
correlation matrix, Moving Averages, Finding the missing value from data; Summarisation; Visualisation: scatter plots, line charts,
histogram, etc., Pivot Tables, pivot charts and interactive dashboards.
Unit II: Data Preparation, Summarisation and Visualisation Using Spreadsheet
(Detailed 10-Mark Answer Notes | [Link] Business Analytics)
1. Data Preparation and Cleaning
Meaning
Data preparation and cleaning is the process of converting raw data into a clean, accurate, and usable format before analysis.
In real business situations, data often contains:
• Missing values
• Duplicate entries
• Incorrect spellings
• Formatting issues
• Outliers
Therefore, cleaning is necessary to improve accuracy of analysis.
Importance
• Improves data quality
• Removes errors and inconsistencies
• Increases reliability of results
• Helps in better decision-making
Example
A company sales sheet may contain:
Employee Sales
Amit 50,000
Amit 50,000
Rahul Blank
Problems:
• Duplicate records
• Missing values
Cleaning removes these issues.
2. Sort and Filter
(A) Sorting
Meaning
Sorting means arranging data in a particular order.
Types
1. Ascending Order
• A to Z
• Smallest to Largest
2. Descending Order
• Z to A
• Largest to Smallest
Example
Sales Data:
Name Sales
Amit 20,000
Raj 50,000
Neha 10,000
After Sorting (Largest to Smallest):
Raj → 50,000
Amit → 20,000
Neha → 10,000
Steps in Excel
1. Select data
2. Go to Data Tab
3. Click Sort
4. Choose column and order
Uses
• Rank employees
• Arrange sales records
• Organize customer data
(B) Filtering
Meaning
Filtering helps display only selected data while hiding unwanted data.
Example
A company wants only Delhi customers.
Filter city = Delhi.
Steps
1. Select data
2. Go to Data → Filter
3. Choose condition
Benefits
• Quick analysis
• Saves time
• Easy data segmentation
3. Conditional Formatting
Meaning
Conditional formatting automatically changes cell color, font, or style based on conditions.
Purpose
Used to identify:
• High sales
• Low performance
• Duplicates
• Negative values
Example
If sales > ₹50,000 → Green color.
Sales
20,000
75,000
15,000
Steps in Excel
1. Select cells
2. Home → Conditional Formatting
3. Choose rule
Business Application
Banks use it to identify overdue accounts.
4. Text to Column
Meaning
Text to Column splits combined text into separate columns.
Example
Before:
Full Name
Amit Raj
After:
First Name Last Name
Amit Raj
Types
1. Delimited
• Based on comma, space, tab
2. Fixed Width
• Fixed character length
Steps
1. Select column
2. Data → Text to Columns
3. Choose delimiter
Uses
• Customer databases
• Employee records
5. Removing Duplicates
Meaning
Duplicate data means repeated records.
Removing duplicates helps maintain accuracy.
Example
Customer
Amit
Rahul
Amit
After removing duplicates:
Amit
Rahul
Steps
1. Select data
2. Data → Remove Duplicates
3. Select columns
Advantages
• Avoids repeated counting
• Improves accuracy
6. Data Validation
Meaning
Data validation restricts wrong data entry.
Example
Age must be between 18–60.
Wrong entry like 100 will not be accepted.
Types
• Whole number
• Decimal
• Date
• List/Dropdown
Example
Gender dropdown:
• Male
• Female
• Other
Steps
1. Select cell
2. Data → Data Validation
3. Set rules
Importance
Prevents human error.
7. Identifying Outliers in Data
Meaning
Outliers are unusual or extreme values different from normal observations.
Example
Sales:
10,000
12,000
15,000
14,000
2,00,000 ← Outlier
Causes
• Entry mistake
• Fraud
• Exceptional event
Methods to Identify
1. Box Plot
2. Z-score
3. Interquartile Range (IQR)
Importance
• Improves analysis accuracy
• Detects fraud or abnormal behavior
Business Example
Unusually high transaction in a bank may indicate fraud.
8. Covariance and Correlation Matrix
(A) Covariance
Meaning
Covariance measures how two variables move together.
Formula
∑(𝑋 − 𝑋ˉ)(𝑌 − 𝑌ˉ)
𝐶𝑜𝑣(𝑋, 𝑌) =
𝑛−1
Interpretation
• Positive → move together
• Negative → move opposite
Example
Advertising ↑ and Sales ↑
Positive covariance.
(B) Correlation
Meaning
Correlation measures strength and direction of relationship between variables.
Formula
𝐶𝑜𝑣(𝑋, 𝑌)
𝑟=
𝜎𝑥 𝜎𝑦
Range
• +1 = Perfect positive relation
• 0 = No relation
• −1 = Perfect negative relation
Example
Study hours and marks = positive correlation.
Correlation Matrix
A table showing correlation among multiple variables.
Example:
Variables Sales Profit Cost
Sales 1 0.8 -0.2
Profit 0.8 1 -0.5
Excel Function
=CORREL(array1,array2)
9. Moving Averages
Meaning
Moving average smoothens fluctuations and identifies trends over time.
Used in:
• Sales forecasting
• Stock prices
• Demand prediction
Formula
Sum of observations
𝑀𝑜𝑣𝑖𝑛𝑔 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 =
Number of periods
Example
Sales:
100, 120, 140
3-period moving average:
100 + 120 + 140
= = 120
3
Advantages
• Removes short-term fluctuations
• Helps trend analysis
10. Finding Missing Values from Data
Meaning
Missing values occur when information is unavailable.
Example
Name Sales
Amit 10,000
Raj Blank
Methods to Handle Missing Data
1. Delete missing rows
2. Replace with average
3. Replace with median
4. Use prediction methods
Excel Functions
• ISBLANK()
• Filter blanks
Importance
Missing values can distort analysis.
11. Summarisation
Meaning
Summarisation means converting large data into meaningful information.
Methods
• Average
• Sum
• Count
• Percentage
• Pivot Table summary
Example
Total sales = ₹5,00,000
Average sales = ₹50,000
Importance
Makes data easy to understand.
12. Visualisation
Meaning
Data visualization means presenting data in graphical form.
It helps:
• Easy understanding
• Pattern recognition
• Better decision-making
(A) Scatter Plot
Meaning
Shows relationship between two variables.
Example
Advertising vs Sales
Use
Identify correlation.
(B) Line Chart
Meaning
Shows trends over time.
Example
Monthly sales trend.
Use
Forecast growth.
(C) Histogram
Meaning
Shows frequency distribution of data.
Example
Distribution of marks.
Use
Understand spread of data.
13. Pivot Tables
Meaning
Pivot Table summarizes large datasets quickly.
Functions
• Sum
• Count
• Average
• Filter
Example
Total sales city-wise.
City Sales
Delhi 50,000
Mumbai 40,000
Steps
1. Select data
2. Insert → Pivot Table
3. Drag fields into rows and values
Advantages
• Quick reporting
• Dynamic analysis
14. Pivot Charts
Meaning
Charts created from pivot tables.
Purpose
Visual representation of summarized data.
Types
• Bar chart
• Pie chart
• Column chart
Example
City-wise sales graph.
15. Interactive Dashboards
Meaning
An interactive dashboard is a visual screen displaying important business metrics.
It combines:
• Charts
• Pivot tables
• Filters
• KPIs
Features
• Real-time insights
• Interactive filters (Slicers)
Example
Sales Dashboard:
• Total Revenue
• Profit %
• Monthly Trend
• Top Products
Business Importance
Managers use dashboards for decision-making.
Important Spreadsheet/Excel Functions for Exam
Function Purpose
SORT() Arrange data
FILTER() Extract data
CORREL() Correlation
AVERAGE() Mean
COUNT() Count values
ISBLANK() Missing values
IF() Conditional logic
10-Mark Conclusion
Data preparation, summarisation, and visualization are important parts of business analytics. Spreadsheet tools like Excel help
clean data, remove errors, analyze relationships, summarize information, and create charts and dashboards for better business
decision-making.