Understanding Inferential Statistics Basics

Uploaded by

Yogesh V


Module 5

Introduction to Inferential Statistics

What is Inferential Statistics?

Inferential statistics is a branch of statistics that helps us draw conclusions or generalizations
about a population by analyzing data collected from a sample. Since it is often impractical,
costly, or time-consuming to study an entire population, inferential statistics allows decision-makers
to rely on sample data to draw valid and reliable inferences while quantifying the uncertainty involved.
In simple terms, inferential statistics answers the question:
“What can we say about the whole group based on a small part of it?”

Key Purposes of Inferential Statistics

1. Test Hypotheses
Inferential statistics is used to test assumptions or claims about a population.
• A hypothesis is a statement that can be tested using data.
• Statistical tests help determine whether observed differences or effects are likely to be real
or could plausibly have occurred by chance.
Example:
Testing whether a new pricing strategy leads to higher average sales compared to the old
strategy.

2. Compare Groups
Inferential statistics allows comparison between two or more groups to identify significant
differences.
• Common comparisons include departments, regions, time periods, or customer
segments.
• Tests such as t-tests and ANOVA are used for group comparisons.
Example:
Comparing average productivity levels of employees across different departments.

3. Understand Relationships
It helps analyze relationships between variables to see how one variable is related to another.
• This includes identifying the direction and strength of relationships.
• Techniques such as correlation and regression are commonly used.
Example:
Examining whether increased advertising expenditure is associated with higher sales.

4. Support Managerial Decisions Using Data

Inferential statistics plays a crucial role in evidence-based decision-making.
• Reduces reliance on intuition or assumptions.
• Provides statistical justification for strategic and operational decisions.
• Helps assess risks and predict outcomes.
Example:
Deciding whether to expand into a new market based on customer survey data.
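Correlation and regression, mentioned under purpose 3, can both be run with SciPy. The sketch below is a minimal illustration, assuming SciPy is installed; the advertising and sales figures are hypothetical values invented for the example. A significant correlation shows association, not that advertising causes the change in sales.

```python
from scipy.stats import linregress, pearsonr

# Hypothetical monthly data: advertising spend and sales (both in ₹ lakh)
advertising = [2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 8.0, 9.5]
sales = [20.1, 24.3, 25.0, 29.8, 30.2, 35.1, 35.9, 40.2]

# Pearson correlation: direction and strength of the linear relationship
r, p_corr = pearsonr(advertising, sales)

# Simple linear regression: expected change in sales per unit of advertising
fit = linregress(advertising, sales)

print(f"r = {r:.3f}, p-value = {p_corr:.4f}")
print(f"sales ≈ {fit.intercept:.2f} + {fit.slope:.2f} × advertising")
```

A positive r close to 1 indicates a strong positive relationship; the p-value tests whether the population correlation could plausibly be zero.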

Examples
1. Does a new marketing strategy increase sales?
Sample sales data before and after implementing the strategy is analyzed to determine
whether the increase is statistically significant.
2. Is employee productivity different across departments?
Productivity data from selected employees in each department is compared to identify
meaningful differences.
3. Do customer satisfaction scores vary by region?
Survey responses from different regions are analyzed to understand regional
variations and improve service quality.

Why Use Python for Inferential Statistics?

Inferential statistics involves drawing conclusions about a population based on sample data.
This process requires multiple calculations, assumption checks, and interpretation of results.
Python is a powerful tool that simplifies these tasks and makes statistical analysis more
accurate, efficient, and accessible.

Advantages of Python
1. Free and Open Source
Python is completely free to download and use. There are no licensing or subscription costs,
unlike many commercial statistical software packages.
• Students can install Python on personal laptops without cost.
• Colleges and institutions do not need to purchase expensive licenses.
• A large global community continuously improves Python and its libraries.
This makes Python an economical and sustainable choice for academic and professional use.

2. Handles Large Datasets Easily

Python is designed to work efficiently with large volumes of data.
• It can analyze thousands or even millions of observations quickly.
• Libraries such as Pandas and NumPy allow fast data manipulation and analysis.
• Tasks like filtering data, grouping variables, and computing statistics can be done in a
few lines of code.
This capability is especially useful in real-world scenarios such as market research, finance,
operations, and business analytics.

3. Reduces Calculation Errors

Manual calculations and spreadsheet-based analysis are prone to human errors, especially
when datasets are large or tests are complex.
• Python performs calculations automatically and consistently.
• Built-in statistical functions reduce the risk of formula mistakes.
• The same code can be reused, ensuring repeatability and accuracy.
As a result, Python improves the reliability and credibility of statistical results.

4. Widely Used in Industry and Academia

Python is one of the most widely adopted tools for data analysis and research.
• Used by companies in finance, marketing, healthcare, technology, and consulting.
• Commonly used in universities for teaching statistics, data science, and research
methods.
• Supports advanced statistical tests, data visualization, and reporting.
Parametric Tests
Parametric tests are a category of inferential statistical tests used to make conclusions about a
population based on sample data. These tests rely on certain assumptions about the
underlying data and are widely used in business, management, and social science research.

What Are Parametric Tests?

Parametric tests are statistical techniques that:
1. Assume Data Follows a Normal Distribution
• Parametric tests generally assume that the data is normally distributed (bell-shaped
curve).
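• When it holds, the test statistic follows its theoretical distribution (t or F), so the
p-values are accurate.
• Normality is especially important for small sample sizes; with larger samples, the central
limit theorem makes these tests fairly robust to moderate departures from normality.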
Example:
Employee performance scores that cluster around an average value with fewer extreme
scores.

2. Use Means and Standard Deviations

• These tests focus on population parameters, mainly:
o Mean (average)
o Standard deviation (variability)
• Comparisons between groups are typically based on differences in means.
Example:
Comparing average sales revenue across different teams.

3. Are Powerful When Assumptions Are Met

• “Power” is the probability that a test correctly detects a true effect.
• When assumptions such as normality and equal variances are satisfied, parametric
tests, compared with non-parametric alternatives:
o Make fuller use of the information in the data
o Need smaller samples to detect an effect of a given size
o Are more sensitive to differences between groups
This makes parametric tests highly effective when data conditions are appropriate.
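Because parametric tests work with group means and standard deviations, a natural first step is to compute them for each group. A minimal pandas sketch, assuming pandas is installed; the team labels and revenue figures are hypothetical. Note that pandas' `std` uses the sample (n − 1) denominator by default.

```python
import pandas as pd

# Hypothetical monthly sales revenue (₹ thousand) for three sales teams
df = pd.DataFrame({
    "team": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "revenue": [120, 130, 125, 135, 110, 115, 105, 120, 140, 150, 145, 155],
})

# Mean and sample standard deviation (ddof=1) of revenue for each team
summary = df.groupby("team")["revenue"].agg(["mean", "std"])
print(summary.round(2))
```

Here the three teams have different means but identical spreads, which is the situation parametric group comparisons assume.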
Assumption Testing
Before applying parametric tests such as t-tests or ANOVA, it is important to verify whether
the data satisfies certain basic conditions called assumptions. These assumptions ensure that
the statistical results are valid and reliable. If assumptions are violated, the conclusions drawn
from the analysis may be misleading.

Key Assumptions
1. Normality
Meaning:
The data should follow a roughly bell-shaped (normal) distribution.
Why it matters:
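Parametric tests summarize the data with the mean and standard deviation and calculate p-values
from distributions (t or F) that assume normality, so their results are most trustworthy when the
data is approximately normal.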
Example:
Exam scores of students where most scores are around the average, with fewer very high or
very low scores.

2. Independence
Meaning:
Each observation in the dataset should be independent of the others.
Why it matters:
One observation should not influence another, as dependence can distort test results.
Example:
Sales figures from different employees, where each employee’s performance is measured
separately.

3. Homogeneity of Variance
Meaning:
The variability (spread) of data should be approximately equal across groups.
Why it matters:
Standard t-tests and ANOVA pool the groups' variances into a single estimate, which is
appropriate only when the groups have similar variability.
Example:
Sales figures across regions where each region shows similar variation in sales amounts.


(a) Normality Test – Shapiro-Wilk Test

Concept
The Shapiro-Wilk test is used to check whether a dataset follows a normal distribution. Its null
hypothesis is that the data come from a normally distributed population. It is
especially suitable for small to medium sample sizes.
• It compares the sample distribution with a normal distribution.
• The output includes a p-value, which guides the decision.

Decision Rule
• p-value > 0.05 → Fail to reject normality; the data can be treated as normally distributed
• p-value ≤ 0.05 → Reject normality; the data deviate significantly from a normal distribution

Interpretation Example
• If exam score data produces a p-value of 0.12, normality is assumed.
• If sales data produces a p-value of 0.02, normality is not assumed, and a non-parametric
test may be considered.
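from scipy.stats import shapiro
data = [72, 85, 78, 90, 66, 81, 75, 88, 79, 83]  # hypothetical exam scores
stat, p_value = shapiro(data)  # W statistic and p-value
print("Normal" if p_value > 0.05 else "Not normal")  # decision rule above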
Equality of Variance – Levene’s Test
Levene’s Test is used to check whether the variances (spread of data) of two or more
groups are equal. This assumption is known as homogeneity of variance and is a key
requirement for many parametric tests such as the independent t-test and ANOVA.
In simple terms, Levene’s Test answers the question:
“Do different groups show similar variability in their data?”

Why Equality of Variance Matters

Parametric tests compare group means assuming that each group has a similar level of
variability.
• If variances are equal, standard parametric tests can be safely applied.
• If variances are unequal, test results may be inaccurate or misleading.
Example:
Comparing sales performance of two regions where one region shows very stable sales and
the other shows highly fluctuating sales.

Using Levene’s Test in Python

Python provides Levene’s Test through the SciPy library.
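from scipy.stats import levene
group1 = [52, 55, 49, 58, 53, 51, 56, 54]  # hypothetical values, group 1
group2 = [47, 60, 42, 65, 50, 38, 62, 45]  # hypothetical values, group 2
stat, p_value = levene(group1, group2)  # test statistic and p-value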
• group1, group2: Data values from two different groups
• The function returns:
o Test statistic
o p-value
Decision Rule
• p-value > 0.05 → Variances are considered equal
(Homogeneity of variance assumption is satisfied)
• p-value ≤ 0.05 → Variances are not equal
(Assumption is violated)
Interpretation Example
• p-value = 0.18
→ Variances are equal
→ Independent t-test or ANOVA can be used safely
• p-value = 0.03
→ Variances are unequal
→ Consider alternatives such as Welch’s t-test or non-parametric tests

Example
Suppose a company wants to compare employee productivity between Department A and
Department B.
• Before performing an independent t-test on mean productivity:
o Levene’s Test is conducted
o If variances are equal, proceed with the standard t-test
o If not, use Welch’s t-test, which does not assume equal variances
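The two-step workflow in this example (Levene’s Test first, then the matching t-test) can be sketched with SciPy, assuming SciPy is installed. The department scores are hypothetical. SciPy's `levene` centres on the median by default (the Brown-Forsythe variant); pass `center="mean"` for the classic version. The `equal_var` argument of `ttest_ind` switches between Student's t-test (`True`) and Welch's t-test (`False`).

```python
from scipy.stats import levene, ttest_ind

# Hypothetical daily productivity scores (units completed) for two departments
dept_a = [52, 55, 49, 58, 53, 51, 56, 54]
dept_b = [47, 60, 42, 65, 50, 38, 62, 45]

# Step 1: Levene's test for equality of variances
lev_stat, lev_p = levene(dept_a, dept_b)
equal_var = lev_p > 0.05  # decision rule from the text

# Step 2: Student's t-test if variances look equal, otherwise Welch's t-test
t_stat, t_p = ttest_ind(dept_a, dept_b, equal_var=equal_var)

test_name = "Student's t-test" if equal_var else "Welch's t-test"
print(f"Levene p-value = {lev_p:.3f} -> {test_name}")
print(f"t = {t_stat:.3f}, p-value = {t_p:.3f}")
```

With these values Department B is far more variable, so Levene's Test rejects equal variances and the sketch falls back to Welch's t-test.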
One-Sample t-Test
The one-sample t-test is a parametric statistical test used to compare the mean of a sample
with a known or assumed population mean.
It helps determine whether the observed sample average is significantly different from a
benchmark value, standard, or target.
In simple terms, the test answers the question:
“Is the sample average meaningfully different from what we expected?”

When Is a One-Sample t-Test Used?

• When data is numerical (e.g., sales, marks, productivity)
• When population standard deviation is unknown
• When the sample size is small or moderate
• When data is approximately normally distributed

Example
Scenario:
A retail store expects its average daily sales to be ₹50,000 based on past performance or
management targets.
• Sales data is collected for 30 days
• The sample mean is calculated from this data
Business Question:
Is the actual average daily sales different from the expected ₹50,000?
The one-sample t-test helps management decide whether:
• The store is performing as expected, or
• There is a significant increase or decrease in sales that needs attention

Hypotheses
In hypothesis testing, two competing statements are defined:
Null Hypothesis (H₀)
The population mean equals the expected (benchmark) value; any gap between the sample mean
and the benchmark reflects random sampling variation.
H₀: μ = μ₀ (μ = population mean, μ₀ = expected value, here ₹50,000)

This represents the current belief or standard.

Alternative Hypothesis (H₁)

The population mean differs from the expected value.
H₁: μ ≠ μ₀

Rejecting H₀ suggests that the observed difference is unlikely to be due to random chance alone.

Interpretation (Conceptual)
• If the p-value > 0.05
→ Fail to reject H₀
→ Actual sales are not significantly different from expected sales
• If the p-value ≤ 0.05
→ Reject H₀
→ Actual sales are significantly different from expected sales

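from scipy.stats import ttest_1samp

# data: the observed daily sales figures (e.g., 30 days); 50000 = expected mean in ₹
t_stat, p_value = ttest_1samp(data, 50000)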
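To show what `ttest_1samp` computes, the sketch below calculates the statistic from its formula, t = (x̄ − μ₀) / (s / √n) with n − 1 degrees of freedom, and checks it against SciPy. It assumes NumPy and SciPy are installed; the 30 daily sales figures are simulated and purely hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
# Hypothetical: 30 days of sales, simulated around ₹52,000 with SD ₹4,000
sales = rng.normal(loc=52_000, scale=4_000, size=30)
mu_0 = 50_000  # expected (benchmark) daily sales from the example

# Manual calculation: t = (sample mean - mu_0) / (s / sqrt(n))
n = sales.size
x_bar = sales.mean()
s = sales.std(ddof=1)  # sample standard deviation
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-tailed p-value

# SciPy's one-sample t-test should give the same answer
t_scipy, p_scipy = stats.ttest_1samp(sales, mu_0)

print(f"mean = {x_bar:,.0f}, t = {t_manual:.3f}, p-value = {p_manual:.4f}")
print("Reject H0" if p_scipy <= 0.05 else "Fail to reject H0")
```

The two-tailed p-value doubles the upper-tail area beyond |t|, matching the H₁: μ ≠ μ₀ alternative stated above.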
Independent Samples t-Test
The independent samples t-test (also called the two-sample t-test) is a parametric test used
to compare the means of two independent (unrelated) groups.
The key objective is to determine whether the difference between the two group means is
statistically significant or likely due to random variation.
In simple terms, it answers the question:
“Are the average values of two different groups meaningfully different?”

When Is an Independent t-Test Used?

• When comparing two separate groups
• When data is numerical
• When observations in one group do not influence the other
• When normality and equality of variance assumptions are reasonably met

Example 1: Sales Performance

• Group A: Male employees
• Group B: Female employees
Question:
Is there a significant difference in average sales performance between male and female
employees?

Example 2: Customer Satisfaction

• Group A: Urban customers
• Group B: Rural customers
Question:
Do urban and rural customers differ significantly in their average satisfaction scores?

Hypotheses
As with all inferential tests, two hypotheses are defined:
Null Hypothesis (H₀)
The means of the two populations are equal.
H₀: μ_A = μ_B (μ_A, μ_B = population means of Group A and Group B)
Alternative Hypothesis (H₁)
The means of the two populations differ.
H₁: μ_A ≠ μ_B

This is a two-tailed test, as the difference can be in either direction.

Python Code (Simple Example)

Below is a basic Python example using the SciPy library to perform an independent samples
t-test.
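A minimal sketch using SciPy's `ttest_ind`, assuming SciPy is installed. The urban and rural satisfaction scores are hypothetical. By default `ttest_ind` runs the standard (Student's) test with `equal_var=True`; pass `equal_var=False` for Welch's t-test when variances differ.

```python
from scipy.stats import ttest_ind

# Hypothetical satisfaction scores (1-10 scale) from two independent samples
urban = [7.8, 8.1, 6.9, 7.5, 8.4, 7.2, 7.9, 8.0, 7.6, 8.3]
rural = [6.9, 7.1, 6.5, 7.4, 6.8, 7.0, 6.6, 7.2, 6.4, 7.3]

# Standard (Student's) independent t-test, assuming equal variances
t_stat, p_value = ttest_ind(urban, rural)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
if p_value <= 0.05:
    print("Reject H0: the group means differ significantly")
else:
    print("Fail to reject H0: no significant difference between group means")
```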

Interpretation of Output
• p-value > 0.05
→ Fail to reject H₀
→ No significant difference between group means
• p-value ≤ 0.05
→ Reject H₀
→ Significant difference exists between group means
Paired Samples t-Test
The paired samples t-test (also called the dependent t-test) is a parametric test used when
the same subjects are measured twice under two different conditions or at two different
time points.
Instead of comparing two separate groups, this test compares the difference within the same
group.
In simple terms, it answers the question:
“Has there been a meaningful change after an intervention or over time?”

When Is a Paired t-Test Used?

• When the same individuals, products, or units are measured twice
• When data is numerical and approximately normally distributed
• When observations are naturally paired (before–after, pre–post)

Example 1: Employee Productivity

• Measurement 1: Productivity before training
• Measurement 2: Productivity after training
Question:
Did the training program significantly improve employee productivity?

Example 2: Customer Satisfaction

• Measurement 1: Satisfaction score before service improvement
• Measurement 2: Satisfaction score after service improvement
Question:
Did the service improvement initiative lead to a significant change in customer satisfaction?

Hypotheses
The paired t-test focuses on the difference between paired observations.
Null Hypothesis (H₀)
The mean of the paired differences (second measurement minus first) is zero.
H₀: μ_d = 0 (μ_d = population mean of the paired differences)

This implies that the intervention or the passage of time had no effect on average.

Alternative Hypothesis (H₁)

The mean of the paired differences is not zero.
H₁: μ_d ≠ 0

This is a two-tailed test, as the change could be an increase or a decrease.

Interpretation (Conceptual)
• p-value > 0.05
→ Fail to reject H₀
→ No significant change observed
• p-value ≤ 0.05
→ Reject H₀
→ Significant change occurred after the intervention

Python Code
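A minimal sketch using SciPy's `ttest_rel`, assuming NumPy and SciPy are installed. The before and after productivity scores for the same eight employees are hypothetical. The paired t-test is equivalent to a one-sample t-test of the differences against 0.

```python
import numpy as np
from scipy.stats import ttest_rel

# Hypothetical productivity scores for the same 8 employees
before = np.array([62, 58, 71, 65, 60, 68, 55, 63])
after = np.array([66, 61, 74, 70, 59, 73, 60, 67])

# Paired t-test: tests whether the mean of (after - before) differs from 0
t_stat, p_value = ttest_rel(after, before)

diffs = after - before
print(f"mean difference = {diffs.mean():.2f}")
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```

Passing `after` first makes a positive t mean that scores rose after training.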

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a parametric statistical technique used to
compare the means of more than two groups at the same time.

Instead of conducting multiple t-tests (which inflates the overall Type I error rate), ANOVA
tests all group means together using a single statistical procedure.

In simple terms, ANOVA answers the question:

“Do all groups perform similarly, or is at least one group significantly different?”

Why ANOVA Is Needed

• Comparing more than two groups using multiple t-tests increases the overall Type I error
rate (false positives).
• ANOVA tests all groups in one procedure at a single significance level, which keeps this
error rate under control.
• It analyzes variation:
o Between groups (differences among group means)
o Within groups (variation inside each group)
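The inflation of Type I error can be estimated with P(at least one false positive) ≈ 1 − (1 − α)^m, where m = k(k − 1)/2 is the number of pairwise t-tests for k groups. The pairwise tests share data, so they are not truly independent; treat the numbers below as an approximation that shows the trend, computed with the Python standard library only.

```python
from math import comb

ALPHA = 0.05  # significance level used for each individual t-test

# Approximate chance of at least one false positive when every pair of
# k groups is compared with its own t-test, treating the tests as independent
results = {}
for k in (3, 4, 5):
    m = comb(k, 2)  # number of pairwise t-tests
    results[k] = 1 - (1 - ALPHA) ** m
    print(f"{k} groups -> {m} t-tests -> P(at least one false positive) ≈ {results[k]:.3f}")
```

With three groups the chance is already about 0.143, and with five groups about 0.401, far above the intended 0.05.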
Example 1: Sales Across Regions

• Groups: North, South, and West regions

• Data: Monthly sales figures from each region

Question:
Is the average sales performance the same across all regions, or does at least one
region perform differently?

Example 2: Customer Satisfaction Across Branches

• Groups: Branch A, Branch B, Branch C, Branch D

• Data: Customer satisfaction survey scores

Question:
Do customer satisfaction levels differ significantly among branches?

Hypotheses

ANOVA is based on the following hypotheses:

Null Hypothesis (H₀)

All population group means are equal.

H₀: μ₁ = μ₂ = μ₃ = ⋯ = μₖ (k = number of groups)

This means all groups have the same average value.

Alternative Hypothesis (H₁)

At least one group mean is different from the others.

H₁: At least one μᵢ is different

ANOVA does not identify which group is different; it only indicates that a
difference exists.
Interpretation (Conceptual)

• p-value > 0.05

→ Fail to reject H₀
→ No significant difference among group means
• p-value ≤ 0.05
→ Reject H₀
→ At least one group differs significantly
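A minimal one-way ANOVA sketch for the regions example, assuming NumPy and SciPy are installed; the monthly sales figures are hypothetical. It runs SciPy's `f_oneway` and also rebuilds the F statistic by hand as the ratio of between-group to within-group mean squares, the two sources of variation described above.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical monthly sales (₹ lakh) for three regions
north = np.array([42.0, 45.5, 44.0, 47.5, 43.0, 46.0])
south = np.array([38.5, 40.0, 39.5, 41.0, 37.5, 40.5])
west = np.array([44.5, 48.0, 46.5, 49.0, 45.5, 47.0])
groups = [north, south, west]

# One-way ANOVA with SciPy
f_stat, p_value = f_oneway(north, south, west)

# Same F by hand: between-group vs within-group variation
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = all_values.size - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")
print("Reject H0: at least one region differs" if p_value <= 0.05 else "Fail to reject H0")
```

A large F means the group means are spread out much more than the variation inside each group would explain.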

Post Hoc Testing

Post hoc tests are conducted after an ANOVA test shows a statistically significant
result.

ANOVA answers the question:

“Is there at least one difference among group means?”

However, ANOVA does not identify which specific groups differ from each
other. This is where post hoc testing becomes necessary.

In simple terms, post hoc tests answer the question:

“Exactly which groups are different?”

Why Not Multiple t-Tests?

• Performing many t-tests increases the chance of Type I error (false positives).
• Post hoc tests control this error while making multiple comparisons.
• They provide reliable pairwise comparisons between groups.

Common Post Hoc Test – Tukey’s HSD

What Is Tukey’s HSD?

Tukey’s Honest Significant Difference (HSD) test is one of the most widely used
post hoc tests.

• Compares all possible pairs of group means
• Controls the familywise error rate (the chance of at least one false positive across all
comparisons)
• Works best when group sizes are equal or nearly equal

When Is Tukey’s HSD Used?

• After a significant ANOVA result

• When the assumption of equal variances is met
• When the goal is to compare every group with every other group

Example

Suppose ANOVA shows a significant difference in customer satisfaction across four branches.

Tukey’s HSD can determine:

• Whether Branch A differs from Branch B

• Whether Branch C differs from Branch D
• Which branches perform significantly better or worse

Interpretation (Conceptual)

Tukey’s HSD provides:

• Mean differences between group pairs

• Confidence intervals
• Adjusted p-values

Decision Rule:

• p-value ≤ 0.05 → Significant difference between the pair

• p-value > 0.05 → No significant difference between the pair
Interpretation
• Identifies which specific groups differ
• Useful for managerial decision-making
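A minimal Tukey’s HSD sketch for the four-branch example, assuming SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd` (statsmodels' `pairwise_tukeyhsd` is an alternative). The satisfaction scores are hypothetical. For each pair it prints the mean difference, its 95% confidence interval and the Tukey-adjusted p-value, the three outputs listed above.

```python
from itertools import combinations

from scipy.stats import tukey_hsd

# Hypothetical satisfaction scores (1-10) for four branches, equal group sizes
branches = {
    "A": [7.9, 8.2, 7.5, 8.0, 7.8, 8.4],
    "B": [6.1, 6.5, 5.9, 6.8, 6.3, 6.0],
    "C": [7.7, 8.1, 7.4, 7.9, 8.3, 7.6],
    "D": [6.4, 6.0, 6.7, 6.2, 6.9, 6.3],
}
names = list(branches)

# Tukey's HSD compares every pair while controlling the familywise error rate
result = tukey_hsd(*branches.values())
ci = result.confidence_interval(confidence_level=0.95)

for i, j in combinations(range(len(names)), 2):
    diff = result.statistic[i, j]  # mean of group i minus mean of group j
    p = result.pvalue[i, j]        # Tukey-adjusted p-value
    verdict = "significant" if p <= 0.05 else "not significant"
    print(f"{names[i]} vs {names[j]}: diff = {diff:+.2f}, "
          f"95% CI [{ci.low[i, j]:.2f}, {ci.high[i, j]:.2f}], "
          f"p = {p:.4f} ({verdict})")
```

With these values, A and C form one high-scoring cluster and B and D a lower one: pairs across the clusters differ significantly, pairs within a cluster do not.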
Two-Way ANOVA
Concept
Two-Way Analysis of Variance (Two-Way ANOVA) is a parametric statistical technique
used to analyze the effect of two independent variables (factors) on a single dependent
variable.
Unlike one-way ANOVA, which studies only one factor at a time, two-way ANOVA allows
researchers to:
• Examine the individual impact of each factor
• Understand whether the factors interact with each other
In simple terms, it answers the question:
“How do two different factors, separately and together, influence an outcome?”

Key Components of Two-Way ANOVA

• Independent Variables (Factors): Categorical variables (e.g., training type, region)
• Dependent Variable: Numerical outcome (e.g., performance score, sales)
• Interaction Effect: Whether the effect of one factor depends on the level of the other
factor

Example 1: Training Type and Experience Level

• Factor A: Training Type (Online, Classroom)
• Factor B: Experience Level (Junior, Senior)
• Dependent Variable: Employee performance score
Business Question:
Does training type affect performance?
Does experience level affect performance?
Does the impact of training type differ between junior and senior employees?
Example 2: Region and Promotion Type
• Factor A: Region (North, South)
• Factor B: Promotion Type (Discount, Cashback)
• Dependent Variable: Sales revenue
Business Question:
Do sales vary by region?
Does promotion type influence sales?
Does the effectiveness of promotion type change across regions?

Outputs of Two-Way ANOVA

Two-way ANOVA produces three key results:

1. Main Effect of Factor A

• Examines whether different levels of Factor A have a significant impact on the
dependent variable.
• Averages over the levels of Factor B when testing this effect.
Example:
Does training type significantly affect employee performance, regardless of experience level?

2. Main Effect of Factor B

• Examines whether different levels of Factor B influence the dependent variable.
• Averages over the levels of Factor A when testing this effect.
Example:
Does experience level significantly affect performance, regardless of training type?

3. Interaction Effect
• Examines whether the effect of one factor depends on the level of the other factor.
• This is often the most insightful result in managerial analysis.
Example:
Is classroom training more effective for junior employees but not for senior employees?

Interpretation (Conceptual)
• Significant main effects indicate that a factor independently influences the outcome.
• A significant interaction effect indicates that the effect of one factor depends on the level
of the other, so the main effects should be interpreted with caution and the two factors
examined together.
Plotting Interactions
Interaction plots are graphical tools used to visually represent the interaction effect
between two independent variables on a dependent variable. They are commonly used
alongside Two-Way ANOVA to better understand complex relationships in data.
In simple terms, interaction plots answer the question:
“Does the effect of one factor change depending on the level of another factor?”

Purpose of Interaction Plots

1. Visualize How One Factor Changes the Effect of Another
• Interaction plots show multiple lines on a graph, each representing a level of one
factor.
• The x-axis usually represents one factor, while the lines represent levels of the second
factor.
• The y-axis represents the dependent variable.
Interpretation:
• Parallel lines: Little or no interaction between factors
• Non-parallel or crossing lines: Possible interaction effect (confirm with the interaction
p-value from the two-way ANOVA)

2. Simplify Statistical Interpretation

• Numerical ANOVA tables can be difficult for non-technical audiences.
• Interaction plots convert statistical output into intuitive visual insights.
• Managers can quickly identify patterns and relationships.

Business Example
In a study analyzing the effect of training type and experience level on employee
performance:
• X-axis: Training type
• Lines: Experience level (Junior, Senior)
• Y-axis: Performance score
If the performance improvement from training differs between junior and senior employees,
the lines will not be parallel, indicating an interaction.
3. Support Strategic Decisions
Interaction plots support:
• Targeted strategy formulation
• Resource allocation
• Policy and intervention design
Example:
If a promotion works well in urban regions but not in rural regions, managers can tailor
marketing strategies accordingly.
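An interaction plot for the training example can be drawn with matplotlib from a table of cell means. This is a minimal sketch assuming pandas and matplotlib are installed; the cell means are hypothetical. It follows the layout described above (training type on the x-axis, one line per experience level) and saves the figure to a file named `interaction_plot.png`, a name chosen for illustration. statsmodels also offers a ready-made `interaction_plot` function in `statsmodels.graphics.factorplots`.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; the figure is saved to a file
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cell means: performance score by training type and experience level
df = pd.DataFrame({
    "training": ["Online", "Online", "Classroom", "Classroom"],
    "experience": ["Junior", "Senior", "Junior", "Senior"],
    "score": [62.5, 76.0, 75.0, 77.0],
})

# Rows = x-axis factor (training type), columns = one line per experience level
means = df.pivot_table(index="training", columns="experience", values="score")
means = means.loc[["Online", "Classroom"]]  # fix the x-axis order

fig, ax = plt.subplots()
for level in means.columns:
    ax.plot(means.index, means[level], marker="o", label=level)
ax.set_xlabel("Training type")
ax.set_ylabel("Mean performance score")
ax.legend(title="Experience level")
fig.savefig("interaction_plot.png")

# Parallel lines would give equal gaps; a large change in the gap signals interaction
gap = means["Senior"] - means["Junior"]
print(gap)
```

The gap between senior and junior employees is 13.5 points under online training but only 2.0 under classroom training, so the lines are clearly non-parallel.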


No ratings yet
Statistical Methods, Parametric and Non-Parametric Tests-2
11 pages
Basic Statistics For Agricultural Research
No ratings yet
Basic Statistics For Agricultural Research
19 pages
Nonparametric vs Parametric Tests Guide
No ratings yet
Nonparametric vs Parametric Tests Guide
48 pages
Key Concepts in Statistics Explained
No ratings yet
Key Concepts in Statistics Explained
10 pages
Unit5 Problems
No ratings yet
Unit5 Problems
17 pages
Selecting Statistical Tests Guide
No ratings yet
Selecting Statistical Tests Guide
6 pages
Parametric Statistics Analysis with R
No ratings yet
Parametric Statistics Analysis with R
118 pages
Biostatistics: Key Concepts Explained
100% (1)
Biostatistics: Key Concepts Explained
10 pages
Overview of Statistical Tests
No ratings yet
Overview of Statistical Tests
2 pages
Understanding Basic Statistics Concepts
No ratings yet
Understanding Basic Statistics Concepts
36 pages
Understanding Parametric Statistics
No ratings yet
Understanding Parametric Statistics
232 pages
Key Concepts in Business Research Methodology
No ratings yet
Key Concepts in Business Research Methodology
7 pages
Water Quality Analysis Report
No ratings yet
Water Quality Analysis Report
22 pages
Sample Size Determination Insights
No ratings yet
Sample Size Determination Insights
10 pages
Joint Distribution of Ball Draws
No ratings yet
Joint Distribution of Ball Draws
36 pages
BART R Package for Bayesian Modeling
No ratings yet
BART R Package for Bayesian Modeling
66 pages
ML Regular QP - Answer Keys - Student Version
No ratings yet
ML Regular QP - Answer Keys - Student Version
7 pages
CSBS CB304 Question Bank
No ratings yet
CSBS CB304 Question Bank
13 pages
Understanding Variability in Statistics
No ratings yet
Understanding Variability in Statistics
9 pages
Comprehensive Statistical Resources Guide
No ratings yet
Comprehensive Statistical Resources Guide
3 pages
Reference Values for Cook's Distance
No ratings yet
Reference Values for Cook's Distance
19 pages
Probabilistic Slope Stability Review
No ratings yet
Probabilistic Slope Stability Review
55 pages
Unit Root Testing in Econometrics
No ratings yet
Unit Root Testing in Econometrics
16 pages
MAT361 Probability Distributions Exam
No ratings yet
MAT361 Probability Distributions Exam
8 pages
Random Number Generator: 1 to 5
No ratings yet
Random Number Generator: 1 to 5
1 page
Inferential Statistics Overview and Applications
No ratings yet
Inferential Statistics Overview and Applications
43 pages
Gujarati Econometrics Overview
No ratings yet
Gujarati Econometrics Overview
8 pages
Simulating Random Walks in Python
No ratings yet
Simulating Random Walks in Python
31 pages
SPSS and Key Statistical Concepts Explained
No ratings yet
SPSS and Key Statistical Concepts Explained
2 pages
Unit 6 Quntitative
No ratings yet
Unit 6 Quntitative
48 pages
Fundamentals of Engineering Statistics
No ratings yet
Fundamentals of Engineering Statistics
5 pages
AP Statistics Syllabus Overview
No ratings yet
AP Statistics Syllabus Overview
3 pages
LSS BB Body of Knowledge
No ratings yet
LSS BB Body of Knowledge
5 pages
Strongest Linear Regression Analysis
No ratings yet
Strongest Linear Regression Analysis
5 pages
Key Statistical Formulas Overview
No ratings yet
Key Statistical Formulas Overview
7 pages
Binomial Distribution and Probability Concepts
No ratings yet
Binomial Distribution and Probability Concepts
3 pages
Mean Score Analysis of Student Data
No ratings yet
Mean Score Analysis of Student Data
7 pages
Sampling and Sample Size Computation: Sampling Methods in Research
No ratings yet
Sampling and Sample Size Computation: Sampling Methods in Research
4 pages
Histogram Construction for Family Data
No ratings yet
Histogram Construction for Family Data
30 pages
Statistics Revision Notes
No ratings yet
Statistics Revision Notes
9 pages
Econometrics Exam Questions and Solutions
No ratings yet
Econometrics Exam Questions and Solutions
15 pages
Biostatistics Course Overview
No ratings yet
Biostatistics Course Overview
2 pages

