0% found this document useful (0 votes)
5 views13 pages

Unit II

Unit II focuses on data preparation, summarization, and visualization using spreadsheets, covering essential techniques such as data cleaning, sorting, filtering, and identifying outliers. It emphasizes the importance of accurate data analysis through methods like covariance, correlation, and moving averages, alongside visual tools like pivot tables and interactive dashboards. The unit highlights the role of spreadsheet functions in enhancing decision-making processes in business analytics.

Uploaded by

02 Amit Raj
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views13 pages

Unit II

Unit II focuses on data preparation, summarization, and visualization using spreadsheets, covering essential techniques such as data cleaning, sorting, filtering, and identifying outliers. It emphasizes the importance of accurate data analysis through methods like covariance, correlation, and moving averages, alongside visual tools like pivot tables and interactive dashboards. The unit highlights the role of spreadsheet functions in enhancing decision-making processes in business analytics.

Uploaded by

02 Amit Raj
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd


Unit II: Data Preparation, Summarisation and Visualisation Using spreadsheet (6 hours) Data Preparation and Cleaning, Sort and
filter, Conditional formatting, Text to Column, Removing Duplicates, Data Validation, identifying outliers in the data, covariance and
correlation matrix, Moving Averages, Finding the missing value from data; Summarisation; Visualisation: scatter plots, line charts,
histogram, etc., Pivot Tables, pivot charts and interactive dashboards.
Unit II: Data Preparation, Summarisation and Visualisation Using Spreadsheet

(Detailed 10-Mark Answer Notes | [Link] Business Analytics)

1. Data Preparation and Cleaning

Meaning

Data preparation and cleaning is the process of converting raw data into a clean, accurate, and usable format before analysis.

In real business situations, data often contains:

• Missing values

• Duplicate entries

• Incorrect spellings

• Formatting issues

• Outliers

Therefore, cleaning is necessary to improve accuracy of analysis.

Importance

• Improves data quality

• Removes errors and inconsistencies

• Increases reliability of results

• Helps in better decision-making

Example

A company sales sheet may contain:

Employee Sales

Amit 50,000

Amit 50,000

Rahul Blank

Problems:

• Duplicate records

• Missing values

Cleaning removes these issues.


2. Sort and Filter

(A) Sorting

Meaning

Sorting means arranging data in a particular order.

Types

1. Ascending Order

• A to Z

• Smallest to Largest

2. Descending Order

• Z to A

• Largest to Smallest

Example

Sales Data:

Name Sales

Amit 20,000

Raj 50,000

Neha 10,000

After Sorting (Largest to Smallest):

Raj → 50,000
Amit → 20,000
Neha → 10,000

Steps in Excel

1. Select data

2. Go to Data Tab

3. Click Sort

4. Choose column and order

Uses

• Rank employees

• Arrange sales records

• Organize customer data

(B) Filtering
Meaning

Filtering helps display only selected data while hiding unwanted data.

Example

A company wants only Delhi customers.

Filter city = Delhi.

Steps

1. Select data

2. Go to Data → Filter

3. Choose condition

Benefits

• Quick analysis

• Saves time

• Easy data segmentation

3. Conditional Formatting

Meaning

Conditional formatting automatically changes cell color, font, or style based on conditions.

Purpose

Used to identify:

• High sales

• Low performance

• Duplicates

• Negative values

Example

If sales > ₹50,000 → Green color.

Sales

20,000

75,000

15,000

Steps in Excel

1. Select cells

2. Home → Conditional Formatting


3. Choose rule

Business Application

Banks use it to identify overdue accounts.

4. Text to Column

Meaning

Text to Column splits combined text into separate columns.

Example

Before:

Full Name

Amit Raj

After:

First Name Last Name

Amit Raj

Types

1. Delimited

• Based on comma, space, tab

2. Fixed Width

• Fixed character length

Steps

1. Select column

2. Data → Text to Columns

3. Choose delimiter

Uses

• Customer databases

• Employee records

5. Removing Duplicates

Meaning

Duplicate data means repeated records.

Removing duplicates helps maintain accuracy.

Example
Customer

Amit

Rahul

Amit

After removing duplicates:

Amit
Rahul

Steps

1. Select data

2. Data → Remove Duplicates

3. Select columns

Advantages

• Avoids repeated counting

• Improves accuracy

6. Data Validation

Meaning

Data validation restricts wrong data entry.

Example

Age must be between 18–60.

Wrong entry like 100 will not be accepted.

Types

• Whole number

• Decimal

• Date

• List/Dropdown

Example

Gender dropdown:

• Male

• Female

• Other

Steps
1. Select cell

2. Data → Data Validation

3. Set rules

Importance

Prevents human error.

7. Identifying Outliers in Data

Meaning

Outliers are unusual or extreme values different from normal observations.

Example

Sales:

10,000
12,000
15,000
14,000
2,00,000 ← Outlier

Causes

• Entry mistake

• Fraud

• Exceptional event

Methods to Identify

1. Box Plot

2. Z-score

3. Interquartile Range (IQR)

Importance

• Improves analysis accuracy

• Detects fraud or abnormal behavior

Business Example

Unusually high transaction in a bank may indicate fraud.

8. Covariance and Correlation Matrix

(A) Covariance

Meaning

Covariance measures how two variables move together.

Formula

∑(𝑋 − 𝑋ˉ)(𝑌 − 𝑌ˉ)


𝐶𝑜𝑣(𝑋, 𝑌) =
𝑛−1
Interpretation

• Positive → move together

• Negative → move opposite

Example

Advertising ↑ and Sales ↑

Positive covariance.

(B) Correlation

Meaning

Correlation measures strength and direction of relationship between variables.

Formula

𝐶𝑜𝑣(𝑋, 𝑌)
𝑟=
𝜎𝑥 𝜎𝑦

Range

• +1 = Perfect positive relation

• 0 = No relation

• −1 = Perfect negative relation

Example

Study hours and marks = positive correlation.

Correlation Matrix

A table showing correlation among multiple variables.

Example:

Variables Sales Profit Cost

Sales 1 0.8 -0.2

Profit 0.8 1 -0.5

Excel Function

=CORREL(array1,array2)

9. Moving Averages

Meaning

Moving average smoothens fluctuations and identifies trends over time.

Used in:
• Sales forecasting

• Stock prices

• Demand prediction

Formula

Sum of observations
𝑀𝑜𝑣𝑖𝑛𝑔 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 =
Number of periods

Example

Sales:

100, 120, 140

3-period moving average:

100 + 120 + 140


= = 120
3

Advantages

• Removes short-term fluctuations

• Helps trend analysis

10. Finding Missing Values from Data

Meaning

Missing values occur when information is unavailable.

Example

Name Sales

Amit 10,000

Raj Blank

Methods to Handle Missing Data

1. Delete missing rows

2. Replace with average

3. Replace with median

4. Use prediction methods

Excel Functions

• ISBLANK()

• Filter blanks

Importance
Missing values can distort analysis.

11. Summarisation

Meaning

Summarisation means converting large data into meaningful information.

Methods

• Average

• Sum

• Count

• Percentage

• Pivot Table summary

Example

Total sales = ₹5,00,000

Average sales = ₹50,000

Importance

Makes data easy to understand.

12. Visualisation

Meaning

Data visualization means presenting data in graphical form.

It helps:

• Easy understanding

• Pattern recognition

• Better decision-making

(A) Scatter Plot

Meaning

Shows relationship between two variables.

Example

Advertising vs Sales

Use

Identify correlation.

(B) Line Chart

Meaning
Shows trends over time.

Example

Monthly sales trend.

Use

Forecast growth.

(C) Histogram

Meaning

Shows frequency distribution of data.

Example

Distribution of marks.

Use

Understand spread of data.

13. Pivot Tables

Meaning

Pivot Table summarizes large datasets quickly.

Functions

• Sum

• Count

• Average

• Filter

Example

Total sales city-wise.

City Sales

Delhi 50,000

Mumbai 40,000

Steps

1. Select data

2. Insert → Pivot Table

3. Drag fields into rows and values

Advantages

• Quick reporting
• Dynamic analysis

14. Pivot Charts

Meaning

Charts created from pivot tables.

Purpose

Visual representation of summarized data.

Types

• Bar chart

• Pie chart

• Column chart

Example

City-wise sales graph.

15. Interactive Dashboards

Meaning

An interactive dashboard is a visual screen displaying important business metrics.

It combines:

• Charts

• Pivot tables

• Filters

• KPIs

Features

• Real-time insights

• Interactive filters (Slicers)

Example

Sales Dashboard:

• Total Revenue

• Profit %

• Monthly Trend

• Top Products

Business Importance

Managers use dashboards for decision-making.


Important Spreadsheet/Excel Functions for Exam

Function Purpose

SORT() Arrange data

FILTER() Extract data

CORREL() Correlation

AVERAGE() Mean

COUNT() Count values

ISBLANK() Missing values

IF() Conditional logic

10-Mark Conclusion

Data preparation, summarisation, and visualization are important parts of business analytics. Spreadsheet tools like Excel help
clean data, remove errors, analyze relationships, summarize information, and create charts and dashboards for better business
decision-making.

You might also like