UNIT 4 DATA ANALYSIS USING SPREDSHEET
DATA ANALYSIS USING SPREADSHEET
In modern business research, spreadsheets such as Microsoft Excel and Google Sheets are essential
tools. They help researchers store data, clean it, analyse it, and present it visually. These tools are
powerful yet simple, making them suitable for both beginners and professional analysts.
Spreadsheets act as a bridge between raw data and meaningful insights. They help transform large
and messy data into organized, understandable, and decision-friendly information.
Meaning of Spreadsheet in Data Analysis
A spreadsheet is a grid-like software that organizes data into rows and columns.
It allows a researcher to:
• Enter and store data
• Apply formulas and functions
• Perform calculations
• Summarize data
• Identify patterns
• Create charts and graphs
• Present results clearly
✔ Simple Meaning
A spreadsheet is a tool that helps in analysing data quickly and accurately using formulas, charts,
and tables.
Why Spreadsheet is Important in Data Analysis?
1. Easy to Use
Even non-technical users can perform complex analysis.
2. Organized Data Structure
Rows = observations
Columns = variables
3. Fast Calculations
Formulas can calculate totals, averages, percentages, correlations, etc.
4. Data Cleaning
Helps remove errors, duplicates, missing values, and inconsistencies.
5. Visualization Tools
Charts and graphs make interpretation simple.
6. Cost-Effective
Tools like Google Sheets are completely free.
7. Widely Used
Almost all industries use spreadsheets for data analysis.
Key Steps in Data Analysis Using Spreadsheet.
1. Data Entry and Preparation
Before analysis, data must be entered cleanly into a spreadsheet.
Each row represents one respondent, and each column represents one variable.
Data Preparation Includes:
• Naming variables clearly
• Formatting cells (numbers, dates, text)
• Removing blank rows
• Handling missing data
• Standardizing responses
A well-organized worksheet makes analysis smooth.
2. Data Cleaning
Data often contains errors, duplicates, or inconsistencies.
Spreadsheet tools help fix them.
Techniques:
• Remove Duplicates
• Find and Replace
• TRIM (), CLEAN (), SUBSTITUTE () functions
• Checking outliers
• Converting text to numbers
• Handling missing data (delete, replace, or code)
Clean data = Accurate results.
3. Sorting Data
Sorting helps arrange data in a meaningful order.
Examples:
• Sort employees by salary
• Sort students by marks
• Sort customers by age
Sorting can be ascending or descending.
4. Filtering Data
Filtering shows only the rows that match certain conditions.
Example:
• Customers from Delhi
• Sales > ₹20,000
• Students who scored more than 80%
Filter is very useful for focusing on specific data segments.
5. Using Formulas and Functions for Analysis
Spreadsheets offer hundreds of powerful functions for calculations.
A. Basic Functions
Used for simple calculations:
• SUM () → Total
• AVERAGE () → Mean
• COUNT () → Number of entries
• MAX ()/MIN () → Highest/Lowest value
B. Conditional Functions
• IF () → Logical test
• SUMIF () → Sum based on condition
• COUNTIF () → Count based on condition
• AVERAGEIF () → Average based on condition
Example:
=SUMIF (B2:B20, ">50", C2:C20)
(Adds sales for quantities greater than 50)
C. Statistical Functions
• STDEV () → Standard deviation
• VAR () → Variance
• CORREL () → Correlation between two variables
• LINEST () → Regression
These help measure variability, relationship, and prediction.
6. Pivot Tables — The Heart of Analysis
Pivot tables summarize large data instantly.
Uses:
• Total sales by region
• Average marks by class
• Count of customers by gender
• Revenue by product category
Pivot tables convert raw data into meaningful tables automatically.
They are one of the most powerful tools in Excel/Sheets.
7. Data Visualization (Charts & Graphs)
Charts help interpret data visually.
Common Charts:
• Bar Chart → Comparisons
• Pie Chart → Proportions
• Line Chart → Trends
• Histogram → Distribution
• Scatter Plot → Relationships
Charts help non-technical readers understand findings easily.
8. Data Interpretation
After performing calculations, summarizing data, and visualizing results, the final step is
interpreting what the data means.
Example Interpretation:
• If correlation between study time and marks = 0.82 → Strong positive relationship
• If sales increased in line chart → upward trend
• Pivot table shows highest sales from South Region → managerial insight
Interpretation turns numbers into knowledge.
Advantages of Using Spreadsheet for Data Analysis
1. Easy to use and learn
2. Fast calculations and automation
3. Ideal for small and medium datasets
4. Excellent visualization tools
5. Helpful for report writing
6. Accessible and low-cost
7. Supports formulas, tables, and charts
Limitations of Spreadsheet
1. Not suitable for very large datasets
2. Risk of manual errors
3. Limited advanced statistical analysis
4. Slow performance when data is heavy
5. Less secure than advanced statistical software
DATA PREPARATION AND CLEANING
Data analysis begins only after the data has been properly prepared and cleaned.
Raw data collected from surveys, questionnaires, interviews, or spreadsheets is usually messy,
incomplete, or inaccurate.
If the data is not cleaned, the analysis will be misleading and incorrect.
Therefore, Data Preparation and Cleaning is one of the most crucial stages in the entire research
process.
Meaning of Data Preparation
Data preparation refers to the process of organizing, structuring, transforming, and preparing raw
data so that it becomes suitable for analysis.
✔ Simple Meaning:
Data preparation means getting the data ready for analysis by arranging it, correcting it, and
ensuring it is usable.
It acts as a bridge between data collection and data analysis.
Meaning of Data Cleaning
Data cleaning is the process of removing errors, fixing inconsistencies, filling missing values,
correcting incorrect entries, and removing duplicates to ensure the data is accurate and valid.
✔ Simple Meaning:
Data cleaning means correcting wrong data, removing unwanted data, and improving data quality.
Clean data = Correct results
Dirty data = Misleading results
Importance of Data Preparation and Cleaning
1. Ensures accuracy
Errors in raw data can lead to wrong conclusions.
2. Improves reliability
Clean data increases trust in results.
3. Reduces noise
Removes irrelevant or duplicate information.
4. Enables correct analysis
Many statistical tools require structured data.
5. Prevents costly mistakes
Incorrect insights can cause bad business decisions.
6. Saves time during analysis
Clean data makes analysis faster and smoother.
Steps in Data Preparation and Cleaning
Data preparation consists of several systematic steps:
1. Data Editing
Data editing means checking the collected data for completeness, consistency, and accuracy.
Tasks in Data Editing:
• Ensuring there are no skipped questions
• Checking that responses are logical
• Verifying that entries are within acceptable range
• Identifying contradictions
Example:
A 12-year-old respondent cannot have 15 years of work experience.
Editing is the first check of data correctness.
2. Data Coding
Data coding involves converting responses into numerical values so that they can be analysed
easily.
Example:
Gender:
Male = 1
Female = 2
Satisfaction (Likert Scale):
Strongly Agree = 5
Agree = 4
Neutral = 3
Disagree = 2
Strongly Disagree = 1
Coding helps in using formulas, statistical tools, and pivot tables.
3. Data Classification
Classification means arranging data into meaningful categories, groups, or classes.
Types:
• Chronological (time-based)
• Geographical
• Categorical (gender, education, income group)
• Frequency-based (how many people selected option X)
Classification helps simplify large data into understandable segments.
4. Data Tabulation
Tabulation means organizing data into tables for easy understanding and comparison.
Types:
• Simple tables
• Cross-tabulation (two variables)
Example: Gender vs. Satisfaction level.
Tabulation is essential for analysis, pivot tables, and report writing.
5. Handling Missing Data
Raw data often has missing values.
They must be handled carefully, using techniques like:
Techniques to handle missing data:
• Delete the row (if few missing values)
• Replace with mean/median
• Use “Not Applicable”
• Use predictive estimation
Example: If income is missing for one respondent, replace with average income of the group.
6. Handling Outliers
Outliers are extreme values that distort analysis.
Example: A salary of ₹1 crore in a dataset of ₹10,000–₹80,000.
Strategies:
• Verify if outlier is a mistake
• Remove it if incorrect
• Keep it if logically valid
7. Removing Duplicates
Duplicate responses occur when the respondent submits twice.
Spreadsheets offer:
Excel → Data → Remove Duplicates
Google Sheets → Data Cleanup → Remove duplicates
8. Standardizing Data
Standardization ensures uniformity.
Examples:
• “Delhi” vs “delhi” vs “DELHI” → make them identical
• Dates in same format
• Categories spelled consistently
9. Data Transformation
Transforming data into a usable form.
Examples:
• Converting percentage into decimals
• Merging columns
• Splitting text into different columns
• Creating new variables (age groups, income categories)
10. Final Validation of Data
Before analysis, final checks include:
• Are all required variables present?
• Are formulas working correctly?
• Are there any impossible values?
• Are categories consistent?
This step ensures data is analysis-ready.
Techniques & Tools for Data Cleaning (Spreadsheet)
✔ Functions used:
• TRIM () → Removes extra spaces
• CLEAN () → Removes unwanted characters
• SUBSTITUTE () → Replaces text
• LOWER (), UPPER () → Standardizes case
• IFERROR () → Removes formula errors
• FILTER () → Filter data
• Remove Duplicates → Excel / Sheets
These functions make cleaning fast and accurate.
Advantages of Proper Data Preparation and Cleaning
1. High-quality, reliable data
2. More accurate statistical analysis
3. Better decision-making
4. Reduced errors and bias
5. Improved clarity in reporting
6. Saves time and cost in later stages
Consequences of Poor Data Cleaning
• Incorrect conclusions
• Misleading insights
• Higher errors in analysis
• Loss of credibility
• Wrong business decisions
Clean data is the backbone of good research.
GRAPHICAL PRESENTATION OF DATA
Graphical presentation of data means showing numerical information in the form of charts, graphs,
and diagrams so that it becomes easy to understand, compare, and interpret.
Graphs convert complex tables into a visual form, making patterns, trends, and relationships clear
at a glance.
Importance of Graphical Presentation
• Makes data easy to read and understand
• Shows trends, patterns, and comparisons quickly
• Helps in decision-making
• Highlights key insights without reading long tables
• Useful in reports, presentations, and analysis
Types of Graphical Presentation
Below are the most commonly used graphs in research and spreadsheet-based analysis:
1. Bar Chart (or Bar Graph)
Represents data using rectangular bars of equal width but varying height.
Used for:
• Comparing categories
• Showing differences between groups
Example: Sales of four products.
2. Column Chart
Similar to bar chart but vertical.
Used widely in Excel for business data.
Used for:
• Time-wise comparisons
• Growth trends
• Category analysis
3. Pie Chart
A circle divided into sectors to show proportion or percentage of each category.
Used for:
• Market share
• Budget allocation
• Composition analysis
4. Line Graph
Shows data points connected by lines.
Used for:
• Trends over time
• Stock prices, sales growth, temperature changes
Line graphs clearly show upward or downward movement.
5. Histogram
Represents frequency distribution of continuous data.
Used for:
• Showing distribution of marks
• Age distribution
• Income distribution
Histograms help identify skewness and spread of data.
6. Scatter Plot
Shows relationship between two variables using dots.
Used for:
• Correlation analysis
• Example: study hours vs. marks
If points form a pattern → relationship exists.
7. Area Chart
Similar to line chart but filled with colour under the line.
Used for:
• Showing contribution over time
• Cumulative data
8. Radar/Spider Chart
Shows multi-dimensional data on different axes.
Used for:
• Comparing product performance on several attribute
Advantages of Graphical Presentation
• Easy to understand
• Attractive and visually appealing
• Highlights trends and comparisons
• Saves time
• Useful for presentations and reports
Limitations
• Not suitable for very large datasets
• May mislead if not scaled properly
• Cannot show highly detailed information like tables
FREQUENCY DISTRIBUTION
Meaning
Frequency Distribution is a systematic arrangement of data that shows how many times
(frequency) each value or group of values occurs in a dataset.
Simple Meaning:
It tells how often each value appears.
It converts raw, unorganized data into a structured form.
Types of Frequency Distribution
1. Ungrouped Frequency Distribution
Used when data values are small and not repeated many times.
Example: Marks of 10 students with exact values.
2. Grouped Frequency Distribution
Used when data is large and spread out.
Values are grouped into class intervals like 0–10, 10–20, etc.
Components of Grouped Frequency Table:
• Class intervals
• Frequency
• Class boundaries
• Midpoint (Class mark)
• Cumulative frequency
Importance of Frequency Distribution
• Converts raw data into orderly form
• Helps identify patterns
• Useful for graphs (histogram, frequency polygon)
• Makes comparison easy
• Foundation for further analysis (mean, median, SD)
DESCRIPTIVE STATISTICS
Meaning
Descriptive statistics are statistical tools used to summarize, describe, and present the main
features of a dataset.
✔ Simple Meaning:
They describe what the data looks like its average, spread, and shape.
It does NOT make predictions; it only summarizes.
Components of Descriptive Statistics
Descriptive statistics mainly include:
1. Measures of Central Tendency
These show the center of the data.
(a) Mean (Average)
[
\text{Mean} = \frac{\sum X}{N}
]
Most commonly used.
(b) Median
Middle value when data is arranged in order.
Used when data has extreme values (outliers).
(c) Mode
Value that occurs most frequently.
Useful for categorical data
2. Measures of Dispersion (Variability)
These show how spread out the data is.
(a) Range
[
\text{Range} = \text{Maximum} - \text{Minimum}
]
(b) Variance
Shows how much values deviate from the mean.
(c) Standard Deviation (SD)
Square root of variance.
Higher SD → more variation
Lower SD → data closely groupe
3. Measures of Shape
Describe how data is distributed.
(a) Skewness
Shows whether data is tilted left or right.
(b) Kurtosis
Shows whether the curve is flat or peaked.
4. Frequency Tables and Graphs
Descriptive Statistics also include visual summaries like:
• Histogram
• Bar chart
• Pie chart
• Frequency polygon
Why Descriptive Statistics Are Important?
• Summarize large data simply
• Help understand distribution and patterns
• Show average and variation
• Basis for further inferential analysis
• Useful in decision-making and reporting
STEPS IN HYPOTHESIS TESTING
Hypothesis testing is a systematic statistical procedure used to decide whether the sample data
provides enough evidence to support or reject a claim about a population.
It helps researchers move from assumptions to scientific conclusions.
Hypothesis testing ensures that decisions are based on data, not opinion.
Step 1: Formulation of Hypotheses (H₀ and H₁)
This is the foundation of hypothesis testing.
Null Hypothesis (H₀)
It states that there is no effect, no difference, or no relationship.
It represents the existing belief or status quo.
Examples:
• H₀: Average customer satisfaction is equal in both stores.
• H₀: Advertising does not increase sales.
• H₀: There is no relationship between income and spending.
Alternative Hypothesis (H₁ or Ha)
It states that there is an effect, a difference, or a relationship.
Examples:
• H₁: Satisfaction differs between stores.
• H₁: Advertising increases sales.
H₀ is tested directly; if evidence is strong, we reject it in Favor of H₁.
Why important?
It gives direction and purpose to the entire analysis.
Step 2: Select the Level of Significance (α)
Significance level tells how much risk of error the researcher is willing to take.
Commonly used values:
• 5% (0.05)
• 1% (0.01)
Meaning of α = 0.05
There is a 5% chance of rejecting H₀ even if it is true (Type-I error).
Lower α → more strict testing.
Why important?
It determines the rejection region and controls probability of error.
Step 3: Choose the Appropriate Statistical Test
The researcher selects the test based on:
• The objective of the study
• Type of data (nominal, interval, ratio)
• Sample size
• Number of groups compared
• Known/unknown population variance
Common tests include:
• t-test → Compare means
• Z-test → Large samples
• Chi-square test → Association between variables
• ANOVA → Compare more than two means
• Correlation/Regression → Relationship between variables
Why important?
Wrong test = wrong decision.
Step 4: Setting Decision Rule / Finding Critical Value
Using significance level (α), we find the critical value from statistical tables (t-table, Z-table, chi-
square table).
The critical value determines the rejection region.
Decision rule structure:
• If test statistic > critical value → Reject H₀
• If test statistic ≤ critical value → Do not reject H₀
Why important?
It provides an objective rule for decision-making.
Step 5: Calculate the Test Statistic
Using sample data, the researcher computes the test statistic such as:
For Mean
Z = (x̄ – μ) / (σ / √n)
t = (x̄ – μ) / (s / √n)
For Variance
Chi-square = Σ (O – E) ² / E
Statistical tools like Excel, SPSS, or calculators often compute this value.
Why important?
This gives the numerical evidence needed to evaluate hypotheses.
Step 6: Make Decision (Reject or Do Not Reject H₀)
After calculating the test statistic, we compare it with the critical value.
✔ If calculated value > critical value
→ Strong evidence against H₀
→ Reject H₀
✔ If calculated value ≤ critical value
→ Not enough evidence to reject H₀
→ Do not reject H₀
Important Note:
We never “accept H₀,” we only “fail to reject it.”
Why important?
Decision must be based on statistical evidence, not assumptions.
Step 7: Interpretation of Results
This step explains the meaning of the statistical decision in simple, practical terms.
Example:
If H₀ is rejected:
“There is significant evidence that advertising increases sales.”
If H₀ is not rejected:
“There is insufficient evidence to say advertising increases sales.”
Interpretation bridges the gap between numbers and real-world insight.
Step 8: Reporting the Results
A complete hypothesis test ends with a summary that includes:
• Hypotheses
• Test used
• Level of significance
• Sample data
• Test statistic
• Decision
• Interpretation
This final step ensures clarity and transparency.
Why These Steps Are Necessary?
• Ensures objectivity
• Reduces researcher bias
• Makes results scientifically valid
• Ensures conclusions are trustworthy
• Helps in managerial decision-making
HYPOTHESIS TESTING – PARAMETRIC & NON-PARAMETRIC TEST
Hypothesis testing uses statistical tests to decide whether sample evidence supports or rejects a
claim about a population.
These tests are divided into parametric and non-parametric tests.
1. PARAMETRIC TESTS
Parametric tests use numerical data and assume that the population follows a normal distribution.
Used when:
• Data is quantitative
• Sample size is large or population is normal
• Population parameters (mean, SD) are known/estimated
• Measurement scale: Interval or Ratio
The two important parametric tests are:
A. Z–TEST
Meaning
Z-test is used to test hypothesis when the sample size is large (n > 30) or population variance is
known.
It examines whether the sample mean differs significantly from population mean or whether two
sample means differ.
Uses of Z-Test
1. Test population mean
2. Test difference between two means
3. Test proportion or difference in proportions
Example:
Testing whether average salary of employees is different from the industry average.
Steps (Short)
1. State H₀ and H₁
2. Choose α (0.05)
3. Compute Z value
4. Compare with critical Z (±1.96 at 5%)
5. Decision: Reject or not reject H₀
B. t–TEST
Meaning
t-test is used when sample size is small (n < 30) and population variance is unknown .
It checks whether sample mean differs significantly from population mean or whether two samples
differ.
Types of t- Test
1. One sample t-test → compare sample mean with population mean
2. Independent t-test → compare mean of two groups
3. Paired t-test → same group measured before-after
Uses
• Testing performance before-after training
• Comparing male vs female satisfaction
• Small sample mean testing
Steps
Same as Z-test, but t-values depend on degree of freedom (df = n – 1).
2. NON-PARAMETRIC TESTS
Non-parametric tests are used when data does not follow normal distribution, or when data is
categorical, ordinal, or sample size is very small.
Used when:
• Data is qualitative (Yes/No, Gender, Category)
• No assumption of normality
• Variables are nominal or ordinal
• Frequency counts are used
The most important test is Chi-Square Test.
A. CHI–SQUARE TEST (χ² Test)
Meaning
Chi-square test checks whether there is an association/relationship between two categorical
variables, or whether observed frequency differs from expected frequency.
Types of Chi-Square Tests
1. Test of Independence
→ To check relationship between variables
Example: Gender & Product Preference
2. Goodness of Fit Test
→ To check if observed data fits expected distribution
Uses
• Test whether education level affects purchase choice
• Test if voting preference depends on gender
• Test survey response patt
Characteristics of Chi-Square
• Works on frequencies, not means
• non-parametric, distribution-free
• Requires large sample
• Values are always positive
Steps in Chi-Square Test
1. Set H₀ (no relationship) and H₁ (relationship exists)
2. Prepare contingency table
3. Calculate expected frequencies
4. Apply χ² formula
5. Compare with critical χ² value (df = (r−1) (c−1))
6. Decision: Reject or accept H₀
Key Differences (Parametric vs Non-Parametric Tests)
Basis Parametric (Z, t-test) Non-Parametric (Chi-square)
Data Type Numerical Categorical
Distribution Normal distribution required No assumption
Example Tests Z-test, t-test Chi-square
Measures Used Mean, SD Frequencies
Accuracy More accurate Less accurate but flexible
Sample Size Often large/small depending Often requires large