0% found this document useful (0 votes)
51 views19 pages

Unit 4 Data Analysis Using Spreadsheet

This document provides an overview of data analysis using spreadsheets, highlighting their importance in organizing, cleaning, and analyzing data. It outlines key steps in data preparation and cleaning, as well as techniques for data visualization and descriptive statistics. Additionally, it discusses the significance of hypothesis testing in making data-driven decisions.

Uploaded by

shohelansarirvi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views19 pages

Unit 4 Data Analysis Using Spreadsheet

This document provides an overview of data analysis using spreadsheets, highlighting their importance in organizing, cleaning, and analyzing data. It outlines key steps in data preparation and cleaning, as well as techniques for data visualization and descriptive statistics. Additionally, it discusses the significance of hypothesis testing in making data-driven decisions.

Uploaded by

shohelansarirvi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

UNIT 4 DATA ANALYSIS USING SPREDSHEET

DATA ANALYSIS USING SPREADSHEET

In modern business research, spreadsheets such as Microsoft Excel and Google Sheets are essential
tools. They help researchers store data, clean it, analyse it, and present it visually. These tools are
powerful yet simple, making them suitable for both beginners and professional analysts.

Spreadsheets act as a bridge between raw data and meaningful insights. They help transform large
and messy data into organized, understandable, and decision-friendly information.

Meaning of Spreadsheet in Data Analysis

A spreadsheet is a grid-like software that organizes data into rows and columns.
It allows a researcher to:

• Enter and store data

• Apply formulas and functions

• Perform calculations

• Summarize data

• Identify patterns

• Create charts and graphs

• Present results clearly

✔ Simple Meaning

A spreadsheet is a tool that helps in analysing data quickly and accurately using formulas, charts,
and tables.

Why Spreadsheet is Important in Data Analysis?

1. Easy to Use
Even non-technical users can perform complex analysis.

2. Organized Data Structure


Rows = observations
Columns = variables

3. Fast Calculations
Formulas can calculate totals, averages, percentages, correlations, etc.

4. Data Cleaning
Helps remove errors, duplicates, missing values, and inconsistencies.

5. Visualization Tools
Charts and graphs make interpretation simple.

6. Cost-Effective
Tools like Google Sheets are completely free.
7. Widely Used
Almost all industries use spreadsheets for data analysis.

Key Steps in Data Analysis Using Spreadsheet.

1. Data Entry and Preparation

Before analysis, data must be entered cleanly into a spreadsheet.


Each row represents one respondent, and each column represents one variable.

Data Preparation Includes:

• Naming variables clearly

• Formatting cells (numbers, dates, text)

• Removing blank rows

• Handling missing data

• Standardizing responses

A well-organized worksheet makes analysis smooth.

2. Data Cleaning

Data often contains errors, duplicates, or inconsistencies.


Spreadsheet tools help fix them.

Techniques:

• Remove Duplicates

• Find and Replace

• TRIM (), CLEAN (), SUBSTITUTE () functions

• Checking outliers

• Converting text to numbers

• Handling missing data (delete, replace, or code)

Clean data = Accurate results.

3. Sorting Data

Sorting helps arrange data in a meaningful order.

Examples:

• Sort employees by salary

• Sort students by marks

• Sort customers by age


Sorting can be ascending or descending.

4. Filtering Data

Filtering shows only the rows that match certain conditions.

Example:

• Customers from Delhi

• Sales > ₹20,000

• Students who scored more than 80%

Filter is very useful for focusing on specific data segments.

5. Using Formulas and Functions for Analysis

Spreadsheets offer hundreds of powerful functions for calculations.

A. Basic Functions

Used for simple calculations:

• SUM () → Total

• AVERAGE () → Mean

• COUNT () → Number of entries

• MAX ()/MIN () → Highest/Lowest value

B. Conditional Functions

• IF () → Logical test

• SUMIF () → Sum based on condition

• COUNTIF () → Count based on condition

• AVERAGEIF () → Average based on condition

Example:
=SUMIF (B2:B20, ">50", C2:C20)
(Adds sales for quantities greater than 50)

C. Statistical Functions

• STDEV () → Standard deviation

• VAR () → Variance

• CORREL () → Correlation between two variables

• LINEST () → Regression

These help measure variability, relationship, and prediction.


6. Pivot Tables — The Heart of Analysis

Pivot tables summarize large data instantly.

Uses:

• Total sales by region

• Average marks by class

• Count of customers by gender

• Revenue by product category

Pivot tables convert raw data into meaningful tables automatically.


They are one of the most powerful tools in Excel/Sheets.

7. Data Visualization (Charts & Graphs)

Charts help interpret data visually.

Common Charts:

• Bar Chart → Comparisons

• Pie Chart → Proportions

• Line Chart → Trends

• Histogram → Distribution

• Scatter Plot → Relationships

Charts help non-technical readers understand findings easily.

8. Data Interpretation

After performing calculations, summarizing data, and visualizing results, the final step is
interpreting what the data means.

Example Interpretation:

• If correlation between study time and marks = 0.82 → Strong positive relationship

• If sales increased in line chart → upward trend

• Pivot table shows highest sales from South Region → managerial insight

Interpretation turns numbers into knowledge.

Advantages of Using Spreadsheet for Data Analysis

1. Easy to use and learn

2. Fast calculations and automation

3. Ideal for small and medium datasets


4. Excellent visualization tools

5. Helpful for report writing

6. Accessible and low-cost

7. Supports formulas, tables, and charts

Limitations of Spreadsheet

1. Not suitable for very large datasets

2. Risk of manual errors

3. Limited advanced statistical analysis

4. Slow performance when data is heavy

5. Less secure than advanced statistical software

DATA PREPARATION AND CLEANING


Data analysis begins only after the data has been properly prepared and cleaned.
Raw data collected from surveys, questionnaires, interviews, or spreadsheets is usually messy,
incomplete, or inaccurate.
If the data is not cleaned, the analysis will be misleading and incorrect.

Therefore, Data Preparation and Cleaning is one of the most crucial stages in the entire research
process.

Meaning of Data Preparation

Data preparation refers to the process of organizing, structuring, transforming, and preparing raw
data so that it becomes suitable for analysis.

✔ Simple Meaning:

Data preparation means getting the data ready for analysis by arranging it, correcting it, and
ensuring it is usable.

It acts as a bridge between data collection and data analysis.

Meaning of Data Cleaning

Data cleaning is the process of removing errors, fixing inconsistencies, filling missing values,
correcting incorrect entries, and removing duplicates to ensure the data is accurate and valid.

✔ Simple Meaning:

Data cleaning means correcting wrong data, removing unwanted data, and improving data quality.

Clean data = Correct results


Dirty data = Misleading results
Importance of Data Preparation and Cleaning

1. Ensures accuracy
Errors in raw data can lead to wrong conclusions.

2. Improves reliability
Clean data increases trust in results.

3. Reduces noise
Removes irrelevant or duplicate information.

4. Enables correct analysis


Many statistical tools require structured data.

5. Prevents costly mistakes


Incorrect insights can cause bad business decisions.

6. Saves time during analysis


Clean data makes analysis faster and smoother.

Steps in Data Preparation and Cleaning

Data preparation consists of several systematic steps:

1. Data Editing

Data editing means checking the collected data for completeness, consistency, and accuracy.

Tasks in Data Editing:

• Ensuring there are no skipped questions


• Checking that responses are logical
• Verifying that entries are within acceptable range
• Identifying contradictions

Example:
A 12-year-old respondent cannot have 15 years of work experience.

Editing is the first check of data correctness.

2. Data Coding

Data coding involves converting responses into numerical values so that they can be analysed
easily.

Example:

Gender:
Male = 1
Female = 2

Satisfaction (Likert Scale):


Strongly Agree = 5
Agree = 4
Neutral = 3
Disagree = 2
Strongly Disagree = 1

Coding helps in using formulas, statistical tools, and pivot tables.

3. Data Classification

Classification means arranging data into meaningful categories, groups, or classes.

Types:

• Chronological (time-based)
• Geographical
• Categorical (gender, education, income group)
• Frequency-based (how many people selected option X)

Classification helps simplify large data into understandable segments.

4. Data Tabulation

Tabulation means organizing data into tables for easy understanding and comparison.

Types:
• Simple tables
• Cross-tabulation (two variables)
Example: Gender vs. Satisfaction level.

Tabulation is essential for analysis, pivot tables, and report writing.

5. Handling Missing Data

Raw data often has missing values.


They must be handled carefully, using techniques like:

Techniques to handle missing data:

• Delete the row (if few missing values)


• Replace with mean/median
• Use “Not Applicable”
• Use predictive estimation

Example: If income is missing for one respondent, replace with average income of the group.

6. Handling Outliers

Outliers are extreme values that distort analysis.


Example: A salary of ₹1 crore in a dataset of ₹10,000–₹80,000.

Strategies:
• Verify if outlier is a mistake
• Remove it if incorrect
• Keep it if logically valid
7. Removing Duplicates

Duplicate responses occur when the respondent submits twice.


Spreadsheets offer:

Excel → Data → Remove Duplicates


Google Sheets → Data Cleanup → Remove duplicates

8. Standardizing Data

Standardization ensures uniformity.

Examples:

• “Delhi” vs “delhi” vs “DELHI” → make them identical


• Dates in same format
• Categories spelled consistently

9. Data Transformation

Transforming data into a usable form.

Examples:
• Converting percentage into decimals
• Merging columns
• Splitting text into different columns
• Creating new variables (age groups, income categories)

10. Final Validation of Data

Before analysis, final checks include:

• Are all required variables present?


• Are formulas working correctly?
• Are there any impossible values?
• Are categories consistent?

This step ensures data is analysis-ready.

Techniques & Tools for Data Cleaning (Spreadsheet)

✔ Functions used:

• TRIM () → Removes extra spaces


• CLEAN () → Removes unwanted characters
• SUBSTITUTE () → Replaces text
• LOWER (), UPPER () → Standardizes case
• IFERROR () → Removes formula errors
• FILTER () → Filter data
• Remove Duplicates → Excel / Sheets

These functions make cleaning fast and accurate.


Advantages of Proper Data Preparation and Cleaning

1. High-quality, reliable data

2. More accurate statistical analysis

3. Better decision-making

4. Reduced errors and bias

5. Improved clarity in reporting

6. Saves time and cost in later stages

Consequences of Poor Data Cleaning

• Incorrect conclusions
• Misleading insights
• Higher errors in analysis
• Loss of credibility
• Wrong business decisions

Clean data is the backbone of good research.

GRAPHICAL PRESENTATION OF DATA

Graphical presentation of data means showing numerical information in the form of charts, graphs,
and diagrams so that it becomes easy to understand, compare, and interpret.

Graphs convert complex tables into a visual form, making patterns, trends, and relationships clear
at a glance.

Importance of Graphical Presentation

• Makes data easy to read and understand


• Shows trends, patterns, and comparisons quickly
• Helps in decision-making
• Highlights key insights without reading long tables
• Useful in reports, presentations, and analysis

Types of Graphical Presentation

Below are the most commonly used graphs in research and spreadsheet-based analysis:

1. Bar Chart (or Bar Graph)

Represents data using rectangular bars of equal width but varying height.

Used for:
• Comparing categories
• Showing differences between groups

Example: Sales of four products.


2. Column Chart

Similar to bar chart but vertical.


Used widely in Excel for business data.

Used for:
• Time-wise comparisons
• Growth trends
• Category analysis

3. Pie Chart

A circle divided into sectors to show proportion or percentage of each category.

Used for:
• Market share
• Budget allocation
• Composition analysis

4. Line Graph

Shows data points connected by lines.

Used for:
• Trends over time
• Stock prices, sales growth, temperature changes

Line graphs clearly show upward or downward movement.

5. Histogram

Represents frequency distribution of continuous data.

Used for:
• Showing distribution of marks
• Age distribution
• Income distribution

Histograms help identify skewness and spread of data.

6. Scatter Plot

Shows relationship between two variables using dots.

Used for:
• Correlation analysis
• Example: study hours vs. marks

If points form a pattern → relationship exists.


7. Area Chart

Similar to line chart but filled with colour under the line.

Used for:
• Showing contribution over time
• Cumulative data

8. Radar/Spider Chart

Shows multi-dimensional data on different axes.

Used for:
• Comparing product performance on several attribute

Advantages of Graphical Presentation

• Easy to understand
• Attractive and visually appealing
• Highlights trends and comparisons
• Saves time
• Useful for presentations and reports

Limitations

• Not suitable for very large datasets


• May mislead if not scaled properly
• Cannot show highly detailed information like tables

FREQUENCY DISTRIBUTION

Meaning

Frequency Distribution is a systematic arrangement of data that shows how many times
(frequency) each value or group of values occurs in a dataset.

Simple Meaning:

It tells how often each value appears.


It converts raw, unorganized data into a structured form.

Types of Frequency Distribution

1. Ungrouped Frequency Distribution

Used when data values are small and not repeated many times.
Example: Marks of 10 students with exact values.

2. Grouped Frequency Distribution

Used when data is large and spread out.


Values are grouped into class intervals like 0–10, 10–20, etc.
Components of Grouped Frequency Table:

• Class intervals
• Frequency
• Class boundaries
• Midpoint (Class mark)
• Cumulative frequency

Importance of Frequency Distribution

• Converts raw data into orderly form


• Helps identify patterns
• Useful for graphs (histogram, frequency polygon)
• Makes comparison easy
• Foundation for further analysis (mean, median, SD)

DESCRIPTIVE STATISTICS

Meaning

Descriptive statistics are statistical tools used to summarize, describe, and present the main
features of a dataset.

✔ Simple Meaning:

They describe what the data looks like its average, spread, and shape.

It does NOT make predictions; it only summarizes.

Components of Descriptive Statistics

Descriptive statistics mainly include:

1. Measures of Central Tendency

These show the center of the data.

(a) Mean (Average)

[
\text{Mean} = \frac{\sum X}{N}
]
Most commonly used.

(b) Median

Middle value when data is arranged in order.


Used when data has extreme values (outliers).

(c) Mode

Value that occurs most frequently.


Useful for categorical data
2. Measures of Dispersion (Variability)

These show how spread out the data is.

(a) Range

[
\text{Range} = \text{Maximum} - \text{Minimum}
]

(b) Variance

Shows how much values deviate from the mean.

(c) Standard Deviation (SD)

Square root of variance.


Higher SD → more variation
Lower SD → data closely groupe

3. Measures of Shape

Describe how data is distributed.

(a) Skewness

Shows whether data is tilted left or right.

(b) Kurtosis

Shows whether the curve is flat or peaked.

4. Frequency Tables and Graphs

Descriptive Statistics also include visual summaries like:


• Histogram
• Bar chart
• Pie chart
• Frequency polygon

Why Descriptive Statistics Are Important?

• Summarize large data simply


• Help understand distribution and patterns
• Show average and variation
• Basis for further inferential analysis
• Useful in decision-making and reporting

STEPS IN HYPOTHESIS TESTING


Hypothesis testing is a systematic statistical procedure used to decide whether the sample data
provides enough evidence to support or reject a claim about a population.
It helps researchers move from assumptions to scientific conclusions.

Hypothesis testing ensures that decisions are based on data, not opinion.
Step 1: Formulation of Hypotheses (H₀ and H₁)

This is the foundation of hypothesis testing.

Null Hypothesis (H₀)

It states that there is no effect, no difference, or no relationship.


It represents the existing belief or status quo.

Examples:
• H₀: Average customer satisfaction is equal in both stores.
• H₀: Advertising does not increase sales.
• H₀: There is no relationship between income and spending.

Alternative Hypothesis (H₁ or Ha)

It states that there is an effect, a difference, or a relationship.

Examples:
• H₁: Satisfaction differs between stores.
• H₁: Advertising increases sales.

H₀ is tested directly; if evidence is strong, we reject it in Favor of H₁.

Why important?
It gives direction and purpose to the entire analysis.

Step 2: Select the Level of Significance (α)

Significance level tells how much risk of error the researcher is willing to take.

Commonly used values:


• 5% (0.05)
• 1% (0.01)

Meaning of α = 0.05

There is a 5% chance of rejecting H₀ even if it is true (Type-I error).

Lower α → more strict testing.

Why important?
It determines the rejection region and controls probability of error.

Step 3: Choose the Appropriate Statistical Test

The researcher selects the test based on:

• The objective of the study


• Type of data (nominal, interval, ratio)
• Sample size
• Number of groups compared
• Known/unknown population variance
Common tests include:

• t-test → Compare means


• Z-test → Large samples
• Chi-square test → Association between variables
• ANOVA → Compare more than two means
• Correlation/Regression → Relationship between variables

Why important?
Wrong test = wrong decision.

Step 4: Setting Decision Rule / Finding Critical Value

Using significance level (α), we find the critical value from statistical tables (t-table, Z-table, chi-
square table).

The critical value determines the rejection region.

Decision rule structure:

• If test statistic > critical value → Reject H₀

• If test statistic ≤ critical value → Do not reject H₀

Why important?
It provides an objective rule for decision-making.

Step 5: Calculate the Test Statistic

Using sample data, the researcher computes the test statistic such as:

For Mean

Z = (x̄ – μ) / (σ / √n)
t = (x̄ – μ) / (s / √n)

For Variance

Chi-square = Σ (O – E) ² / E

Statistical tools like Excel, SPSS, or calculators often compute this value.

Why important?
This gives the numerical evidence needed to evaluate hypotheses.

Step 6: Make Decision (Reject or Do Not Reject H₀)

After calculating the test statistic, we compare it with the critical value.

✔ If calculated value > critical value

→ Strong evidence against H₀


→ Reject H₀

✔ If calculated value ≤ critical value


→ Not enough evidence to reject H₀
→ Do not reject H₀

Important Note:
We never “accept H₀,” we only “fail to reject it.”

Why important?
Decision must be based on statistical evidence, not assumptions.

Step 7: Interpretation of Results

This step explains the meaning of the statistical decision in simple, practical terms.

Example:

If H₀ is rejected:
“There is significant evidence that advertising increases sales.”

If H₀ is not rejected:
“There is insufficient evidence to say advertising increases sales.”

Interpretation bridges the gap between numbers and real-world insight.

Step 8: Reporting the Results

A complete hypothesis test ends with a summary that includes:

• Hypotheses
• Test used
• Level of significance
• Sample data
• Test statistic
• Decision
• Interpretation

This final step ensures clarity and transparency.

Why These Steps Are Necessary?

• Ensures objectivity
• Reduces researcher bias
• Makes results scientifically valid
• Ensures conclusions are trustworthy
• Helps in managerial decision-making

HYPOTHESIS TESTING – PARAMETRIC & NON-PARAMETRIC TEST


Hypothesis testing uses statistical tests to decide whether sample evidence supports or rejects a
claim about a population.
These tests are divided into parametric and non-parametric tests.
1. PARAMETRIC TESTS

Parametric tests use numerical data and assume that the population follows a normal distribution.

Used when:

• Data is quantitative
• Sample size is large or population is normal
• Population parameters (mean, SD) are known/estimated
• Measurement scale: Interval or Ratio

The two important parametric tests are:

A. Z–TEST

Meaning

Z-test is used to test hypothesis when the sample size is large (n > 30) or population variance is
known.
It examines whether the sample mean differs significantly from population mean or whether two
sample means differ.

Uses of Z-Test

1. Test population mean

2. Test difference between two means

3. Test proportion or difference in proportions

Example:

Testing whether average salary of employees is different from the industry average.

Steps (Short)

1. State H₀ and H₁

2. Choose α (0.05)

3. Compute Z value

4. Compare with critical Z (±1.96 at 5%)

5. Decision: Reject or not reject H₀

B. t–TEST

Meaning

t-test is used when sample size is small (n < 30) and population variance is unknown .
It checks whether sample mean differs significantly from population mean or whether two samples
differ.
Types of t- Test

1. One sample t-test → compare sample mean with population mean

2. Independent t-test → compare mean of two groups

3. Paired t-test → same group measured before-after

Uses

• Testing performance before-after training


• Comparing male vs female satisfaction
• Small sample mean testing

Steps

Same as Z-test, but t-values depend on degree of freedom (df = n – 1).

2. NON-PARAMETRIC TESTS

Non-parametric tests are used when data does not follow normal distribution, or when data is
categorical, ordinal, or sample size is very small.

Used when:

• Data is qualitative (Yes/No, Gender, Category)


• No assumption of normality
• Variables are nominal or ordinal
• Frequency counts are used

The most important test is Chi-Square Test.

A. CHI–SQUARE TEST (χ² Test)

Meaning

Chi-square test checks whether there is an association/relationship between two categorical


variables, or whether observed frequency differs from expected frequency.

Types of Chi-Square Tests

1. Test of Independence
→ To check relationship between variables
Example: Gender & Product Preference

2. Goodness of Fit Test


→ To check if observed data fits expected distribution

Uses

• Test whether education level affects purchase choice


• Test if voting preference depends on gender
• Test survey response patt
Characteristics of Chi-Square

• Works on frequencies, not means


• non-parametric, distribution-free
• Requires large sample
• Values are always positive

Steps in Chi-Square Test

1. Set H₀ (no relationship) and H₁ (relationship exists)

2. Prepare contingency table

3. Calculate expected frequencies

4. Apply χ² formula

5. Compare with critical χ² value (df = (r−1) (c−1))

6. Decision: Reject or accept H₀

Key Differences (Parametric vs Non-Parametric Tests)

Basis Parametric (Z, t-test) Non-Parametric (Chi-square)

Data Type Numerical Categorical

Distribution Normal distribution required No assumption

Example Tests Z-test, t-test Chi-square

Measures Used Mean, SD Frequencies

Accuracy More accurate Less accurate but flexible

Sample Size Often large/small depending Often requires large

You might also like