HYPOTHESIS TESTING
Hypothesis testing is a formal statistical procedure used to determine whether there is enough
evidence in a sample of data to infer that a specific condition or relationship is true for an entire
population. It helps distinguish between actual effects and patterns that may have occurred by
random chance.
Core Components
A hypothesis test involves evaluating two mutually exclusive statements:
Null Hypothesis (H0): The default assumption that there is no effect, no relationship, or no difference. It is assumed to be true unless the data provide sufficient evidence against it.
Alternative Hypothesis (H1 or Ha): The claim you want to find evidence for, suggesting that a real effect or difference exists.
Standard 5-Step Process
While variations exist, most researchers follow these five primary steps:
1. State the Hypotheses: Define both H0 and H1 clearly.
2. Set the Decision Criteria: Choose a significance level (α), typically 0.05 (5%), which is the maximum probability you are willing to accept of rejecting a true null hypothesis (a Type I error).
3. Perform a Statistical Test: Collect data and calculate a test statistic (e.g., Z-score, t-
score) to measure how far the sample results deviate from the null hypothesis.
4. Determine the p-value: Find the probability of observing results at least as extreme as yours if the null hypothesis were true.
5. Make a Decision: Compare the p-value to α:
o If p-value ≤ α: Reject the null hypothesis (statistically significant).
o If p-value > α: Fail to reject the null hypothesis (not enough evidence).
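Steps 4 and 5 can be sketched in Python for the common case of a z statistic. This is a minimal illustration, assuming the test statistic follows a standard normal distribution when H0 is true; the function name z_test_decision and its tail argument are hypothetical, and only the standard library's statistics.NormalDist is used.

```python
from statistics import NormalDist


def z_test_decision(z, alpha=0.05, tail="two"):
    """Return (p_value, reject_h0) for a z statistic (hypothetical helper).

    tail: "left"  -> Ha: mu < mu0
          "right" -> Ha: mu > mu0
          "two"   -> Ha: mu != mu0
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Step 4: p-value = probability of a result at least as extreme as z under H0
    if tail == "left":
        p_value = std_normal.cdf(z)
    elif tail == "right":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    # Step 5: reject H0 when p-value <= alpha, otherwise fail to reject
    return p_value, p_value <= alpha


p, reject = z_test_decision(2.1, alpha=0.05, tail="two")
print(f"p-value = {p:.4f}, reject H0: {reject}")
```

With z = 2.1 in a two-tailed test the p-value is about 0.036, which is below 0.05, so H0 is rejected.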
Common Errors in Testing
There are two main types of mistakes that can occur during the decision-making process:
Type I Error (False Positive): Rejecting a true null hypothesis (saying there is an effect
when there isn't).
Type II Error (False Negative): Failing to reject a false null hypothesis (missing a real
effect).
Example Problem Involving Type I and Type II Errors
In statistical hypothesis testing, errors occur when a decision based on sample data does not reflect the truth in the population.
Core Definitions
Null Hypothesis (H0): The assumption that there is no effect or no difference.
Type I Error (False Positive): Rejecting H0 when it is actually true.
Type II Error (False Negative): Failing to reject H0 when it is actually false.
Example Problem: Medical Screening for a Disease
Imagine a researcher is testing a new screening tool for a specific virus.
H0: The patient does not have the virus.
Ha: The patient does have the virus.
Error Type | Scenario | Reality | Consequence
Type I (False Positive) | The test says the patient has the virus. | The patient is actually healthy. | Unnecessary stress, isolation, or potentially harmful medical treatments.
Type II (False Negative) | The test says the patient is healthy. | The patient actually has the virus. | The patient remains undiagnosed and untreated, potentially spreading the disease or suffering worsening health.
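The same two error types can be tallied when screening results are compared against each patient's true condition. The sketch below uses made-up data purely for illustration (the lists and the helper count_errors are hypothetical); a "positive" test result corresponds to rejecting H0.

```python
# Hypothetical screening data (made up for illustration only).
# actual:    True = the patient really has the virus (H0 is false)
# predicted: True = the test says "positive" (H0 is rejected)
actual = [False, False, True, True, False, True, False, False]
predicted = [True, False, True, False, False, True, False, True]


def count_errors(actual, predicted):
    """Count Type I (false positive) and Type II (false negative) errors."""
    type_1 = sum(1 for a, p in zip(actual, predicted) if p and not a)
    type_2 = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return type_1, type_2


fp, fn = count_errors(actual, predicted)
print(f"Type I (false positives): {fp}, Type II (false negatives): {fn}")
```

In this toy data two healthy patients are flagged (Type I) and one infected patient is missed (Type II).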
Other Real-World Scenarios
Criminal Justice System:
o H0: The defendant is innocent.
o Type I: Convicting an innocent person.
o Type II: Letting a guilty person go free.
Fire Alarms:
o H0: There is no fire.
o Type I: The alarm rings when there is no fire (False Alarm).
o Type II: The alarm fails to ring during an actual fire (Missed Alarm).
Spam Filtering:
o H0: The email is legitimate.
o Type I: Marking a legitimate email as spam.
o Type II: Allowing an actual spam email into the inbox.
Key Trade-offs
For a fixed sample size, there is an inverse relationship between these errors. Reducing the significance level (α) lowers the risk of Type I errors but increases the risk of Type II errors (β). To reduce both errors simultaneously, researchers typically must increase the sample size.
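The trade-off can be made concrete for a right-tailed one-sample z-test with known σ, where β = Φ(z_(1−α) − δ√n) and δ is the true shift in units of σ. This is a sketch under those assumptions; the effect size of 0.3σ and the sample sizes are arbitrary illustrative values, and type_ii_error is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, effect, n):
    """Beta for a right-tailed one-sample z-test with known sigma.

    effect: true shift (mu_true - mu0) measured in units of sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n)                # mean of the z statistic when Ha is true
    return std_normal.cdf(z_crit - shift)   # P(fail to reject H0 | Ha true)


for n in (25, 100):
    for alpha in (0.10, 0.05, 0.01):
        beta = type_ii_error(alpha, effect=0.3, n=n)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}")
```

For n = 25, lowering α from 0.10 to 0.01 raises β from about 0.41 to about 0.80; moving to n = 100 lowers β at every α.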
In statistical hypothesis testing, a one-tailed test is used when the research hypothesis specifies a direction (e.g., higher or lower), while a two-tailed test is used when the hypothesis only checks for a difference (in either direction).
Here are examples of committing Type I and Type II errors in both kinds of test, based on a significance level (α) of 0.05.
1. One-Tailed Test Example (Left-Tailed)
A company believes a new, cheaper shipping method is faster, meaning it reduces the average delivery time below 3 days.
Null Hypothesis (H0): Mean delivery time μ ≥ 3 days (no improvement).
Alternative Hypothesis (Ha): Mean delivery time μ < 3 days (the new method is faster).
Significance Level (α): 0.05
Committing a Type I Error (False Positive):
Scenario: The new shipping method is actually no faster than the old one (μ = 3 days), but a chance run of unusually fast deliveries in the sample makes the test indicate a significantly lower mean.
Action: The company rejects H0 and adopts the new method.
Result: They spend money on a new system that provides no real improvement.
Committing a Type II Error (False Negative):
Scenario: The new method is actually faster, but the sample data doesn't provide enough
evidence to show it.
Action: The company fails to reject H0.
Result: The company misses out on a more efficient shipping process.
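A quick simulation illustrates what α means in this example: if the true mean delivery time is exactly 3 days, a left-tailed test at α = 0.05 should commit a Type I error in roughly 5% of samples. The sketch assumes normally distributed delivery times with a known σ of 0.8 days and samples of 40 deliveries; those values and the function name simulate_type_i_rate are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(mu0=3.0, sigma=0.8, n=40, alpha=0.05, trials=20_000, seed=1):
    """Fraction of samples that wrongly reject H0 when the true mean equals mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # left-tailed: reject when z <= about -1.645
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample comes from a population with mean mu0
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if z <= z_crit:
            rejections += 1  # a Type I error
    return rejections / trials


print(f"Estimated Type I error rate: {simulate_type_i_rate():.3f}")  # should be near alpha = 0.05
```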
2. Two-Tailed Test Example
A manufacturing plant produces screws that must be exactly 50 mm long. They test to see if a machine change causes the mean length to differ from 50 mm.
Null Hypothesis (H0): Mean length μ = 50 mm (machine is working correctly).
Alternative Hypothesis (Ha): Mean length μ ≠ 50 mm (machine is producing screws that are either too long or too short).
Significance Level (α): 0.05 (α/2 = 0.025 in each tail)
Committing a Type I Error (False Positive):
Scenario: The machine is actually working perfectly (μ = 50 mm), but a random sample happened to have a few slightly longer screws.
Action: The company rejects H0, concluding the machine is broken, and shuts down production.
Result: Unnecessary shutdown and maintenance costs.
Committing a Type II Error (False Negative):
Scenario: The machine is broken (μ = 51 mm), but the mean of the random sample happens to fall close enough to 50 mm that the difference is not statistically significant.
Action: The company fails to reject H0.
Result: The company continues producing defective screws.
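How likely this Type II error is depends on how variable the screw lengths are and how many screws are sampled, which the example does not specify. The sketch below assumes, hypothetically, a known σ of 2 mm and a two-tailed z-test, and computes β when the true mean is 51 mm; two_tailed_beta is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_beta(mu0, mu_true, sigma, n, alpha=0.05):
    """P(fail to reject H0: mu = mu0) when the true mean is mu_true (two-tailed z-test)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    delta = (mu_true - mu0) / (sigma / sqrt(n))  # mean of the z statistic under mu_true
    # We fail to reject when -z_crit < Z < z_crit, where Z ~ Normal(delta, 1)
    return std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)


for n in (10, 40):
    beta = two_tailed_beta(mu0=50, mu_true=51, sigma=2.0, n=n)
    print(f"n={n}: beta={beta:.3f}, power={1 - beta:.3f}")
```

Under these assumptions a sample of 10 screws misses the 1 mm shift about 65% of the time, while 40 screws reduce β to roughly 11%.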
Key Takeaways on Errors
Type I Error (α): Rejecting H0 when it is true (False Positive). You conclude there is a difference when there isn't.
Type II Error (β): Failing to reject H0 when it is false (False Negative). You miss an actual effect.
One-tailed tests have more power to detect an effect in the specified direction, but they
cannot detect effects in the opposite direction.
Two-tailed tests are more conservative, protecting against missing a difference in either
direction.
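The power advantage of a one-tailed test comes from its lower critical value: all of α sits in one tail instead of being split between two. A minimal sketch with the standard library, using a hypothetical test statistic of z = 1.8 at α = 0.05:

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed (right-tailed): all of alpha in the upper tail
one_tailed_crit = std_normal.inv_cdf(1 - alpha)      # about 1.645
# Two-tailed: alpha split evenly, alpha/2 in each tail
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960

z = 1.8  # hypothetical test statistic
print(f"one-tailed: reject if z >= {one_tailed_crit:.3f} -> reject: {z >= one_tailed_crit}")
print(f"two-tailed: reject if |z| >= {two_tailed_crit:.3f} -> reject: {abs(z) >= two_tailed_crit}")
```

The same z = 1.8 is significant one-tailed but not two-tailed. The flip side: a right-tailed test would never reject for z = -1.8, however large the effect in that direction.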
Directionality of Tests
One-Tailed Test: Used when you predict a change in only one specific direction (e.g.,
"new medicine is better").
Two-Tailed Test: Used when you want to see if there is any difference in either direction
(higher or lower).
Steps in Hypothesis Testing
Hypothesis testing is a systematic statistical process used to determine whether there is enough evidence in a sample of data to support a particular belief about a population. Researchers and data scientists follow a standard multi-step procedure to ensure the reliability of their conclusions.
Standard Steps in Hypothesis Testing
1. State the Hypotheses: Define the Null Hypothesis (H0), which represents the "status quo" or no effect, and the Alternative Hypothesis (Ha or H1), which is the claim you are testing.
2. Set the Significance Level (α): Choose a threshold for "strong evidence," commonly set at 0.05 (5%). This represents the risk you are willing to take of rejecting a true null hypothesis (Type I error).
3. Select the Appropriate Test: Identify the correct statistical test (e.g., Z-test, t-test, or ANOVA) based on your data type, sample size, and whether the population standard deviation is known.
4. Collect Data and Calculate the Test Statistic: Gather your sample data and use the chosen formula to calculate a test statistic (like a z-score or t-value). This value measures how far your sample results are from what the null hypothesis predicts.
5. Determine the P-Value or Critical Value:
o P-Value Approach: Calculate the probability of observing your result (or one
more extreme) if the null hypothesis were true.
o Rejection Region Approach: Find the critical value that separates the "safe"
zone from the "rejection" zone based on your significance level.
6. Make a Decision:
o If P-value ≤ α, or if your test statistic falls in the rejection region, reject H0.
o If P-value > α, you fail to reject H0.
7. Interpret the Results: State your conclusion in plain language relative to the original
research question. For example, "There is sufficient evidence to suggest that the new
treatment is more effective".
Key Considerations
Effect Size: Modern standards strongly recommend reporting the Effect Size (e.g.,
Cohen's d ) alongside the P-value to show the practical significance of the results, not just
the statistical significance.
Assumptions: Before testing, you must verify that your data meets specific assumptions,
such as normality and independence of observations.
Software Usage: In practice, most calculations are performed using statistical software (like R, Python, or SPSS), which automates the generation of P-values and test statistics.
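Effect size can be computed directly from summary statistics. Below is a minimal sketch of the one-sample version of Cohen's d, (x̄ − μ0)/s, applied to the figures from the lightbulb example that follows; the function name is hypothetical. By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), |d| ≈ 0.33 is a small-to-medium effect, even though the test in that example is statistically significant.

```python
def cohens_d_one_sample(sample_mean, mu0, sample_sd):
    """One-sample Cohen's d: distance from mu0 in standard-deviation units."""
    return (sample_mean - mu0) / sample_sd


d = cohens_d_one_sample(sample_mean=1900, mu0=2000, sample_sd=300)
print(f"Cohen's d = {d:.2f}")  # prints "Cohen's d = -0.33"
```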
Hypothesis Testing Example Problem Using the Steps
To solve a hypothesis testing problem, follow these standardized steps.
Example Problem
A company claims that the average life of its lightbulbs is 2,000 hours. A consumer group
suspects the actual life is shorter. They test a random sample of 36 bulbs and find a sample
mean of 1,900 hours with a standard deviation of 300 hours. Is there enough evidence at a 0.05
significance level to support the consumer group's suspicion?
Steps in Hypothesis Testing
1. State the Hypotheses
o Null Hypothesis (H0): The status quo; μ = 2,000.
o Alternative Hypothesis (Ha): The claim to be tested; μ < 2,000 (left-tailed test).
2. Choose the Significance Level (α)
o The problem specifies α = 0.05. This represents the risk of rejecting a true null hypothesis.
3. Identify the Test Statistic
o Since the sample size is large (n = 36 ≥ 30), we use the Z-test, with the sample standard deviation (300 hours) serving as an estimate of σ.
o Formula: $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
4. Calculate the Test Statistic
o x̄ = 1,900, μ = 2,000, σ = 300, n = 36.
o $z = \dfrac{1{,}900 - 2{,}000}{300/\sqrt{36}} = \dfrac{-100}{300/6} = \dfrac{-100}{50} = -2.00$
5. Determine the P-Value or Critical Value
o Critical Value Method: For a left-tailed test at α = 0.05, the critical z value is -1.645.
o P-Value Method: A z-score of -2.00 corresponds to a P-value of approximately 0.0228 (using a Z-table).
6. Make a Decision
o Comparison: Since the calculated z (-2.00) is less than the critical value (-1.645),
or since the P-value (0.0228) is less than α (0.05), we reject the null hypothesis.
7. State the Conclusion
o There is sufficient statistical evidence at the 0.05 level to conclude that the
average life of the lightbulbs is significantly less than 2,000 hours.
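The arithmetic in steps 4 through 6 can be checked with a few lines of Python. This sketch uses only the standard library and, like the worked solution, treats the sample standard deviation of 300 hours as σ.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the lightbulb problem
x_bar, mu0, sd, n, alpha = 1900, 2000, 300, 36, 0.05

z = (x_bar - mu0) / (sd / sqrt(n))    # (1900 - 2000) / (300 / 6) = -2.0
p_value = NormalDist().cdf(z)         # left-tailed: P(Z <= z)
z_crit = NormalDist().inv_cdf(alpha)  # left-tailed critical value, about -1.645

print(f"z = {z:.2f}, p = {p_value:.4f}, critical z = {z_crit:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Both decision rules agree: z = -2.00 lies below -1.645, and p ≈ 0.0228 is below 0.05.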