Lab 2 Report: Hypothesis Testing
Summary – Data Science with Python
1. Setup and Significance Level
To begin the analysis, the necessary Python libraries (NumPy, Pandas, Matplotlib, and SciPy)
were imported. A standard significance level was established to serve as the threshold for
all subsequent hypothesis tests.
Evaluation:
As shown in the output above, the environment was successfully initialized. The significance
level is set to 0.05, meaning we accept a 5% risk of rejecting the null hypothesis when it is
actually true. This value will be used to compare against the calculated p-values in the
following tests.
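The decision rule described above can be sketched as a small helper. The names `ALPHA` and `decide` are illustrative assumptions and are not taken from the lab notebook.

```python
# Significance level stated in the report.
ALPHA = 0.05


def decide(p_value, alpha=ALPHA):
    """Apply the decision rule: reject H0 only when p < alpha."""
    return "REJECT H0" if p_value < alpha else "FAIL TO REJECT H0"


# Two p-values that appear later in this report.
print(decide(0.000009))  # independent two-sample t-test
print(decide(0.6722))    # Shapiro-Wilk, "Normal Data"
```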
1.1. Independent Two-Sample T-Test
We performed an independent two-sample t-test to determine if there is a statistically
significant difference between the means of two independent groups (Group 1 and Group 2).
● Null Hypothesis (H0): There is no difference between the population means of Group 1
and Group 2.
● Alternative Hypothesis (H1): There is a difference between the population means.
Evaluation:
The statistical analysis yielded a T-statistic of -6.1279 and a P-value of 0.000009.
● Mean Comparison: Group 1 has a mean of 27.50, while Group 2 has a significantly
higher mean of 33.00.
● Conclusion: Since the p-value (p = 0.000009) is far below the significance level
(α = 0.05), we REJECT the Null Hypothesis.
The box plot visualization is consistent with this result, showing a clear separation
between the interquartile ranges of the two groups. Whatever factor distinguishes Group 2
(for example, a teaching method) is associated with significantly higher scores than in Group 1.
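A minimal sketch of how such a test might be run with `scipy.stats.ttest_ind`. The arrays below are hypothetical values chosen only so that the group means match the reported 27.50 and 33.00; they are not the lab data, so the printed statistic will differ from the one reported above.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

# Hypothetical scores (NOT the lab data), chosen so the means are 27.5 and 33.0.
group1 = np.array([25, 27, 28, 26, 29, 30, 27, 28])
group2 = np.array([31, 33, 34, 32, 35, 33, 34, 32])

# Student's t-test (equal variances assumed by default);
# pass equal_var=False to run Welch's t-test instead.
t_stat, p_value = stats.ttest_ind(group1, group2)

print(f"mean 1 = {group1.mean():.2f}, mean 2 = {group2.mean():.2f}")
print(f"t = {t_stat:.4f}, p = {p_value:.6f}")
print("REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0")
```

The sign of the t-statistic follows the argument order: it is negative here because the first group has the lower mean.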
1.2. Paired T-Test
A paired t-test was conducted to evaluate the effect of a treatment by comparing
measurements taken from the same subjects "Before" and "After" the medication.
● Null Hypothesis (H0): The mean difference between the measurements before and after
the treatment is zero.
● Alternative Hypothesis (H1): The mean difference is not zero.
Evaluation:
The test resulted in a large T-statistic of 23.4787 and a P-value that prints as 0.000000 at six decimal places (p < 0.000001).
● Mean Comparison: The mean value dropped from 133.50 (Before) to 126.50 (After),
with a mean difference of 7.00.
● Conclusion: With a p-value < 0.05, we REJECT the Null Hypothesis.
The statistical evidence suggests that the treatment had a significant effect. The line plot
visualization supports this, showing a consistent downward trend for almost all subjects
(represented by the green "After" points being generally lower than the red "Before"
points).
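A paired t-test is equivalent to a one-sample t-test on the per-subject differences, which the sketch below checks. The measurements are hypothetical, chosen only so that the means match the reported 133.50 and 126.50; they are not the lab data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

# Hypothetical measurements for the same eight subjects (NOT the lab data).
before = np.array([130, 135, 132, 138, 131, 136, 133, 133])
after = np.array([123, 129, 124, 131, 125, 128, 126, 126])

# Paired test: each subject acts as their own control.
t_stat, p_value = stats.ttest_rel(before, after)

# Same test expressed as a one-sample t-test of the differences against 0.
diff = before - after
t_diff, p_diff = stats.ttest_1samp(diff, 0)

print(f"mean before = {before.mean():.2f}, mean after = {after.mean():.2f}")
print(f"mean difference = {diff.mean():.2f}")
print(f"t = {t_stat:.4f}, p = {p_value:.6f}")
print("REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0")
```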
1.3. One-Sample T-Test
A one-sample t-test was performed to check if the sample mean significantly differs from a
hypothesized population mean of 75.
● Null Hypothesis (H0): The mean of the population the sample was drawn from is equal to 75.
● Alternative Hypothesis (H1): That population mean is not equal to 75.
Evaluation:
The analysis produced a T-statistic of 5.3044 and a P-value of 0.000251.
● Mean Comparison: The calculated Sample Mean is 79.50, which is higher than the
hypothesized Population Mean of 75.
● Conclusion: Since the p-value (p = 0.000251) is less than the significance level
(α = 0.05), we REJECT the Null Hypothesis.
We conclude that the sample mean is statistically significantly different from the
hypothesized population mean. The histogram visualization illustrates this difference,
with the green line (Sample Mean) shifted noticeably to the right of the red dashed line
(Population Mean).
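The t-statistic of a one-sample test is the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below computes it by hand and compares it with `scipy.stats.ttest_1samp`. The sample is hypothetical, chosen only so that its mean is 79.50; it is not the lab data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05
POP_MEAN = 75  # hypothesized population mean from the report

# Hypothetical sample (NOT the lab data), chosen so the mean is 79.5.
sample = np.array([78, 82, 76, 81, 80, 79, 83, 77, 80, 79])

t_stat, p_value = stats.ttest_1samp(sample, POP_MEAN)

# Manual computation: (sample mean - hypothesized mean) / standard error.
std_err = sample.std(ddof=1) / np.sqrt(len(sample))
manual_t = (sample.mean() - POP_MEAN) / std_err

print(f"sample mean = {sample.mean():.2f}")
print(f"t (scipy) = {t_stat:.4f}, t (manual) = {manual_t:.4f}, p = {p_value:.6f}")
print("REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0")
```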
2. ANOVA (Analysis of Variance)
Following the T-tests, we performed a One-Way ANOVA to compare the means of three
different groups (Region A, Region B, and Region C) to see if at least one region performs
differently from the others.
● Null Hypothesis (H0): All group means are equal.
● Alternative Hypothesis (H1): At least one group mean is different from the others.
Evaluation:
The ANOVA test resulted in a significant F-statistic of 150.02 and a P-value of
approximately 0.00.
● Mean Comparison: Region B showed the highest performance (Mean: 58.30), followed
by Region A (Mean: 48.90), and Region C (Mean: 41.00).
● Conclusion: Since the p-value is well below the significance level (α = 0.05), we
REJECT the Null Hypothesis.
This indicates that mean sales performance differs across regions. ANOVA alone does not
identify which pairs of regions differ; a post-hoc comparison would be needed for that. The
box plot visualization displays the differences, with Region B's distribution positioned
much higher than the other two.
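The F-statistic compares the variation between group means with the variation within groups. The sketch below runs `scipy.stats.f_oneway` and reproduces the same F by hand. The regional figures are hypothetical and are not the lab data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

# Hypothetical sales figures per region (NOT the lab data).
region_a = np.array([47, 49, 50, 48, 51])
region_b = np.array([57, 59, 58, 60, 57])
region_c = np.array([40, 42, 41, 39, 43])
groups = [region_a, region_b, region_c]

f_stat, p_value = stats.f_oneway(*groups)

# Manual F: mean square between groups / mean square within groups.
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
manual_f = (ss_between / df_between) / (ss_within / df_within)

print(f"F (scipy) = {f_stat:.2f}, F (manual) = {manual_f:.2f}, p = {p_value:.6f}")
print("REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0")
```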
3. Chi-Squared Test
We examined the relationship between two categorical variables: Gender (Men, Women) and
Product Preference (Product A, B, C).
● Null Hypothesis (H0): There is no association between gender and product preference
(variables are independent).
● Alternative Hypothesis (H1): There is a significant association.
Evaluation:
The Chi-squared test yielded a statistic of 7.0204 and a P-value of 0.029890.
● Conclusion: The p-value (p = 0.029890) is less than the significance level (α = 0.05);
therefore, we REJECT the Null Hypothesis.
This indicates a statistically significant association between gender and product
preference. Looking at the bar chart, we can observe specific differences, such as Men
having a higher preference for Product A compared to Women, while Women show a
stronger preference for Product B.
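For a 2 × 3 contingency table the test has (2 − 1)(3 − 1) = 2 degrees of freedom, so the reported p-value can be checked directly from the reported statistic. The sketch below does that and then runs `scipy.stats.chi2_contingency` on a hypothetical table of counts, which is an assumption and not the lab data.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

# Check the reported p-value from the reported statistic (7.0204, 2 degrees of freedom).
p_from_report = stats.chi2.sf(7.0204, df=2)
print(f"p from reported statistic = {p_from_report:.6f}")

# Hypothetical counts (NOT the lab data): rows = Men, Women; columns = Product A, B, C.
observed = np.array([
    [30, 20, 10],
    [15, 30, 15],
])

chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"chi2 = {chi2_stat:.4f}, p = {p_value:.6f}, dof = {dof}")
print("Expected counts under independence:")
print(expected)
print("REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0")
```

The expected counts are what each cell would hold if gender and preference were independent; the statistic grows as the observed counts move away from them.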
4. Normality Tests (Shapiro-Wilk)
To validate assumptions for parametric tests, we checked two datasets ("Normal Data" and
"Skewed Data") for normality using the Shapiro-Wilk test.
● Null Hypothesis (H0): The data is normally distributed.
● Alternative Hypothesis (H1): The data is NOT normally distributed.
Evaluation:
● Dataset 1 (Normal Data): The p-value is 0.6722, which is greater than 0.05. We FAIL TO
REJECT H0: the test finds no evidence against normality, although this does not prove that
the data is normal. The Q-Q plot (top right) shows points closely following the red
diagonal line.
● Dataset 2 (Skewed Data): The p-value is 0.000033, which is less than 0.05. We
REJECT H0, indicating the data is not normal. The histogram (bottom left) is clearly
skewed, and the Q-Q plot (bottom right) deviates significantly from the red line.
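A minimal sketch of the same check using `scipy.stats.shapiro`. To keep the result reproducible, the two datasets here are built from evenly spaced quantiles of a normal and an exponential distribution; both are assumptions for illustration and are not the lab's "Normal Data" and "Skewed Data".

```python
import numpy as np
from scipy import stats

ALPHA = 0.05

# Synthetic data (NOT the lab data): evenly spaced quantiles of two distributions.
probs = np.linspace(0.01, 0.99, 50)
normal_data = stats.norm.ppf(probs, loc=50, scale=10)  # symmetric, bell-shaped
skewed_data = stats.expon.ppf(probs, scale=10)         # right-skewed

for name, data in [("Normal Data", normal_data), ("Skewed Data", skewed_data)]:
    w_stat, p_value = stats.shapiro(data)
    decision = "REJECT H0" if p_value < ALPHA else "FAIL TO REJECT H0"
    print(f"{name}: W = {w_stat:.4f}, p = {p_value:.6f} -> {decision}")

# Keep the two p-values for later use.
p_normal = stats.shapiro(normal_data)[1]
p_skewed = stats.shapiro(skewed_data)[1]
```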
5. Summary of All Tests
Finally, all hypothesis tests performed in this lab were aggregated into a summary table for a
comprehensive overview.
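One way such a table might be assembled is sketched below with pandas, using only the statistics and p-values reported in the sections above. The column names and layout are assumptions and may differ from the table produced in the lab.

```python
import pandas as pd

ALPHA = 0.05

# (test, statistic, p-value) as reported in this document.
# 0.0 stands for p-values printed as 0.000000 or "approximately 0.00";
# None marks a statistic that the report does not state.
reported = [
    ("Independent two-sample t-test", -6.1279, 0.000009),
    ("Paired t-test", 23.4787, 0.0),
    ("One-sample t-test", 5.3044, 0.000251),
    ("One-way ANOVA", 150.02, 0.0),
    ("Chi-squared test", 7.0204, 0.029890),
    ("Shapiro-Wilk (Normal Data)", None, 0.6722),
    ("Shapiro-Wilk (Skewed Data)", None, 0.000033),
]

summary = pd.DataFrame(reported, columns=["test", "statistic", "p_value"])

# Apply the same decision rule to every row.
summary["decision"] = summary["p_value"].apply(
    lambda p: "REJECT H0" if p < ALPHA else "FAIL TO REJECT H0"
)

print(summary.to_string(index=False))
```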
Conclusion of Lab 2:
Through this lab, we successfully applied various statistical tests using Python libraries.
We learned to interpret P-values relative to the significance level (α) to
make data-driven decisions. The combination of statistical metrics and visualizations (Box
plots, Histograms, Q-Q plots) provided robust evidence for rejecting or failing to reject null
hypotheses across different scenarios.