Normality Tests and T-Tests Analysis
1. Normality Test: To complete this homework, you will need the dataset Kinesiology_1.csv which you
will find on Moodle.
a) Check to see if the variable HR follows a normal distribution. Formulate the appropriate hypothesis
and report the corresponding p-value. What is your verdict?
H0: The variable HR is normally distributed.
Ha: The variable HR is not normally distributed.
α = 0.05
> library(ggplot2)
> Data <- read.csv("Kinesiology_1.csv")
> Data_HR <- Data$HR
> df <- data.frame(y = Data_HR)
> p <- ggplot(df, aes(sample = y)) +
+ stat_qq() +
+ stat_qq_line() +
+ theme_classic()
> p
> shapiro.test(Data_HR)
data: Data_HR
W = 0.96431, p-value = 0.07662
From the Shapiro-Wilk test, the test statistic is W = 0.96431 and the p-value is 0.07662. Since
0.07662 > α = 0.05, we fail to reject the null hypothesis. There is insufficient evidence to
claim that HR is not normally distributed.
b) Repeat (a) only for 5_min and then only for 15_min.
5_min:
H0: HR is normally distributed for the REST group 5_min.
Ha: HR is not normally distributed for the REST group 5_min.
α = 0.05
> shapiro.test(Data_HR_5)
data: Data_HR_5
W = 0.91906, p-value = 0.1864
Using the Shapiro-Wilk test, the test statistic is W = 0.91906 and the p-value is 0.1864. Since
0.1864 > α = 0.05, we fail to reject the null hypothesis. There is insufficient evidence to claim that
HR is not normally distributed for the REST group 5_min.
15_min:
H0: HR is normally distributed for the REST group 15_min.
Ha: HR is not normally distributed for the REST group 15_min.
α = 0.05
> shapiro.test(Data_HR_15)
Using the Shapiro-Wilk test, the test statistic is W = 0.96845 and the p-value is 0.8345. Since
0.8345 > α = 0.05, we fail to reject the null hypothesis. There is insufficient evidence to claim that
HR is not normally distributed for the REST group 15_min.
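The transcript does not show how Data_HR_5 and Data_HR_15 were created. One possible way to run the test per rest interval is sketched below. This is only an illustration: the column name REST, its levels "5_min" and "15_min", and the simulated HR values are all assumptions standing in for Kinesiology_1.csv, so the printed numbers will not match the results above.

```r
# Hypothetical stand-in for Kinesiology_1.csv: the column name REST and its
# levels "5_min"/"15_min" are assumptions, and the HR values are simulated.
set.seed(1)
Data <- data.frame(
  HR   = rnorm(30, mean = 70, sd = 8),
  REST = rep(c("5_min", "15_min"), each = 15)
)

# Split HR by rest interval and run Shapiro-Wilk on each subset
hr_by_rest <- split(Data$HR, Data$REST)
sw_results <- lapply(hr_by_rest, shapiro.test)

# Collect W and the p-value for each group in one small table
sw_table <- data.frame(
  group   = names(sw_results),
  W       = sapply(sw_results, function(r) unname(r$statistic)),
  p_value = sapply(sw_results, function(r) r$p.value)
)
print(sw_table)
```

With the real file, the same split() and lapply() pattern would apply once the actual grouping column is known.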
2. Confidence interval using the T distribution: It is recommended that the average hours of sleep an
adult should receive daily is 8. As a graduate student, this can be difficult to achieve sometimes. The
following is a set of 10 measurements from my sleep schedule the past 10 days:
3 6 7 7 6 5 7 3 6 8
a) Create a two-tailed 95% confidence interval with the mean and standard error of the above dataset
using a T-distribution.
> a <- c(3,6,7,7,6,5,7,3,6,8)
> describe (a)
SE = s/√n
Since s = 1.69 and n = 10, we have SE = 1.69/√10 = 0.5344 and x̅ = 5.8.
[x̅ - t(9, 0.025)·SE, x̅ + t(9, 0.025)·SE]
[5.8 - 2.262(0.5344), 5.8 + 2.262(0.5344)]
[4.5912, 7.0088]
(In this question, you can use software to compute some descriptive statistics, but you should complete
the problem by hand.)
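As a cross-check on the hand calculation, the same interval can be computed in base R. This is a minimal sketch using only mean(), sd() and qt(); the variable names are illustrative. Because it uses the unrounded standard deviation rather than 1.69, the bounds come out at roughly [4.59, 7.01] and differ from the hand-computed values in the third decimal place.

```r
# Sleep measurements from the problem statement
a <- c(3, 6, 7, 7, 6, 5, 7, 3, 6, 8)

n      <- length(a)
x_bar  <- mean(a)                 # sample mean
s      <- sd(a)                   # sample standard deviation (n - 1 denominator)
se     <- s / sqrt(n)             # standard error of the mean
t_crit <- qt(0.975, df = n - 1)   # upper 2.5% point of the t distribution, 9 df

# Two-tailed 95% confidence interval for the mean
ci <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
print(round(ci, 4))
```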
3. One-Sample T-test: The US CDC reports that the average weight of healthy 12-hour-old infants is
7.5 lb. A sample of 10 newborn babies from a low-income neighborhood yielded the following weights (in
pounds) at 12 hours after birth:
6.0 8.6 7.5 8.2 8.0 8.1 6.4 6.0 7.2 4.8
The researcher wants to know if we can conclude that babies from this neighborhood are underweight with
α = 0.01.
a) Write the null and alternate hypotheses.
H0: Babies from the low-income neighborhood weigh the same on average as the population
infant weight of 7.5 lb.
H0: µ = 7.5
Ha: Babies from the low-income neighborhood weigh less than 7.5 lb on average.
Ha: µ < 7.5
b) The researcher argues that a one-sided test is needed. Can you support her claim logically? Do you
think a one-sided test could be justified here? Explain.
Here, we are testing only whether babies from the low-income neighborhood weigh less than
7.5 lb, so a one-sided test is sufficient. If we were instead testing for any difference in
weight, a two-sided test would be needed to cover both directions.
c) Run a one-sample t-test using the sample data above. What is your p-value from your results?
> babies = c(6,8.6,7.5, 8.2, 8, 8.1, 6.4, 6, 7.2, 4.8)
> t.test(babies, mu = 7.5)
data: babies
t = -1.079, df = 9, p-value = 0.3086
alternative hypothesis: true mean is not equal to 7.5
95 percent confidence interval:
6.199468 7.960532
sample estimates:
mean of x
7.08
This is the p-value for a two-sided t-test. Because the observed t statistic is negative, in the
direction of the alternative µ < 7.5, the one-sided p-value is half the two-sided value:
0.3086 / 2 = 0.1543.
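The one-sided p-value can also be requested directly instead of halving the two-sided one. A minimal sketch, using the alternative argument of base R's t.test(); the names res_less and res_two are illustrative:

```r
# Infant weights from the problem statement (pounds)
babies <- c(6, 8.6, 7.5, 8.2, 8, 8.1, 6.4, 6, 7.2, 4.8)

# One-sided test of H0: mu = 7.5 against Ha: mu < 7.5
res_less <- t.test(babies, mu = 7.5, alternative = "less")

# Two-sided test, for comparison
res_two <- t.test(babies, mu = 7.5)

print(res_less$p.value)       # one-sided p-value
print(res_two$p.value / 2)    # half of the two-sided p-value
```

The two printed values agree here because the sample mean falls below 7.5. If it had fallen above, the one-sided p-value for Ha: µ < 7.5 would be 1 minus half the two-sided value.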
4. One-Sample T-test: To complete this question, use the dataset Kinesiology_1.csv again. We will do a
one-sample t-test on the variable HT, assuming that the test is two-sided with α = 0.05. We are interested
in seeing if the mean height equals 170 cm or not.
a) Write the null and alternate hypotheses.
H0: μHT = 170 (mean height is equal to 170 cm)
HA: μHT ≠ 170 (mean height is not equal to 170 cm)
b) Run a one-sample t-test on the variable HT. What is your p-value from your results?
> Data_HT <- Data$HT
> t.test(Data_HT, mu = 170)
data: Data_HT
t = 6.0426, df = 59, p-value = 1.099e-07
alternative hypothesis: true mean is not equal to 170
95 percent confidence interval:
173.4112 176.7888
sample estimates:
mean of x
175.1
c) What is your conclusion to our hypotheses?
Since the p-value (1.099e-07) is less than 0.05, we reject the null hypothesis. The mean height
of our sample (175.1 cm) is significantly different from 170 cm.
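The reported t statistic and p-value can be checked against the other numbers in the output without access to the raw data. The sketch below is an illustration only: it recovers the standard error from the half-width of the 95% confidence interval and recomputes t and the two-sided p-value. The inputs are rounded values copied from the output, so small rounding differences are expected.

```r
# Summary values copied from the t.test output above
x_bar <- 175.1        # sample mean of HT
mu0   <- 170          # hypothesized mean
df    <- 59           # degrees of freedom (n - 1)
upper <- 176.7888     # upper 95% confidence limit

# Recover the standard error from the confidence interval half-width
se <- (upper - x_bar) / qt(0.975, df)

t_stat  <- (x_bar - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

print(c(t = t_stat, p = p_value))
```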
5. Concept questions: Complete the following concept questions from the book, in Chapter 4:
1, 8, 12, 13
1) True
8) True
12) False, there must be assumptions that are met when the t test is applied.
13) False, the degrees of freedom for the t test do depend on the sample size.
The 95% confidence interval for mean sleep hours, calculated using a T-distribution, is [4.5912, 7.0088], which does not include the recommended average of 8 hours. At the 5% significance level, this is evidence against the null hypothesis that the student's mean sleep equals the recommended 8 hours. Because the entire interval lies below 8, the data suggest that the student sleeps less than recommended on average, which may point to a lifestyle or health concern.
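The link between the interval and the test can be checked directly: a two-sided one-sample t-test at α = 0.05 rejects H0: µ = 8 exactly when the 95% confidence interval excludes 8. A minimal sketch with base R's t.test() on the ten sleep values from question 2; the names excludes_8 and rejects_h0 are illustrative. Note that t.test() uses the unrounded standard deviation, so its interval differs slightly from the hand-computed one.

```r
# Sleep measurements from question 2
a <- c(3, 6, 7, 7, 6, 5, 7, 3, 6, 8)

# Two-sided test of H0: mu = 8; the result also carries the 95% interval
res <- t.test(a, mu = 8)
ci  <- res$conf.int

excludes_8 <- ci[1] > 8 || ci[2] < 8   # does the interval miss 8?
rejects_h0 <- res$p.value < 0.05       # does the test reject at the 5% level?

print(c(excludes_8 = excludes_8, rejects_h0 = rejects_h0))
```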
The Shapiro-Wilk test was used to assess the normality of HR data for the REST group at both the 5_min and 15_min intervals. For the 5_min interval, the test produced a p-value of 0.1864, and for the 15_min interval, a p-value of 0.8345. In both cases, the p-value exceeded the alpha level of 0.05, so there is insufficient evidence to reject the null hypothesis of normality for either interval.
The one-sample T-test on infant weights yielded a one-sided p-value of 0.1543, which is greater than the significance level of 0.01. We therefore fail to reject the null hypothesis: there is insufficient evidence to claim that infants from the low-income neighborhood weigh less than the population average of 7.5 lbs. With only 10 observations, a larger sample would be needed to establish such a claim with statistical confidence.
A one-sample T-test is appropriate in this context because it compares the mean of the observed infant weights against a known reference mean (7.5 lbs), using the sample standard deviation and the sample size to estimate the uncertainty in the sample mean. The one-sided alternative (µ < 7.5) reflects the specific research question of whether these infants are underweight, so the test asks whether the sample mean is lower than the reference value by more than chance alone would explain.
A T-distribution is more appropriate when the sample size is small (typically less than 30) and the population standard deviation is unknown. The T-distribution accounts for the additional uncertainty from estimating the standard deviation, resulting in wider confidence intervals than those produced by the normal distribution. This makes it a better fit for the sleep data example, where only 10 measurements are available and the population standard deviation is not known.
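The widening can be seen by comparing 95% critical values. The sketch below, an illustration using base R's qt() and qnorm(), lists the two-tailed T critical value for a few arbitrarily chosen degrees of freedom next to the normal value of about 1.96; df = 9 corresponds to the sleep example.

```r
# Two-tailed 95% critical value from the standard normal distribution
z_crit <- qnorm(0.975)

# Matching critical values from the t distribution for several sample sizes
dfs    <- c(4, 9, 29, 99)          # df = n - 1; df = 9 is the sleep example
t_crit <- qt(0.975, df = dfs)

comparison <- data.frame(
  df     = dfs,
  t_crit = round(t_crit, 3),
  z_crit = round(z_crit, 3)
)
print(comparison)
```

The T critical value is always larger than the normal one and approaches it as the degrees of freedom grow, which is why the distinction matters most for small samples.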
The Shapiro-Wilk test assesses the normality of a dataset by comparing the order statistics of the sample to the expected order statistics under a normal distribution. It calculates a W statistic, where a value close to 1 indicates a distribution close to normal. For the HR variable, the test resulted in W = 0.96431 with a p-value of 0.07662, which is greater than the significance level of 0.05. There is therefore insufficient evidence to reject the null hypothesis, and the data are consistent with HR being approximately normally distributed.
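To illustrate how W behaves, the sketch below applies base R's shapiro.test() to two idealized samples built from theoretical quantiles: one normal and one right-skewed (exponential). These are constructed values, not the HR data, and n = 60 is an arbitrary choice made for the illustration.

```r
n     <- 60
probs <- ppoints(n)              # evenly spaced probabilities in (0, 1)

normal_like <- qnorm(probs)      # idealized normal sample
skewed      <- qexp(probs)       # idealized right-skewed sample

sw_normal <- shapiro.test(normal_like)
sw_skewed <- shapiro.test(skewed)

# W near 1 with a large p-value for the normal-shaped sample;
# a lower W with a small p-value for the skewed one
print(c(W = unname(sw_normal$statistic), p = sw_normal$p.value))
print(c(W = unname(sw_skewed$statistic), p = sw_skewed$p.value))
```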
The assumption of normality matters for the validity of T-tests, especially with small samples, because the test statistic follows a T-distribution only when the data come from an approximately normal population. For HR overall and for the two REST subgroups, the Shapiro-Wilk test was used to check this assumption and found no significant departure from normality. T-based procedures can therefore be applied with reasonable confidence, although failing to reject normality does not prove that the data are normal.
To construct a 95% confidence interval for average sleep hours, first compute the sample mean (x̅) and standard deviation (s). Given n = 10 sleep measurements, use x̅ = 5.8 and s = 1.69. Calculate the standard error as SE = s/√n = 0.5344. Using the T-distribution with df = n - 1 = 9 and an upper-tail probability of α/2 = 0.025, find the critical T-value (approximately 2.262). Form the interval as [x̅ - T·SE, x̅ + T·SE], resulting in [4.5912, 7.0088]. The procedure assumes that the sample is random and drawn from an approximately normal population.
The one-sample T-test conducted on the variable HT resulted in a p-value of 1.099e-07, which is far below the significance level of 0.05. This indicates a statistically significant difference between the mean height and the hypothesized value of 170 cm. Therefore, we reject the null hypothesis and conclude that the mean height is different from 170 cm.
A one-sided T-test looks for a difference in only one direction, whereas a two-sided test looks for a difference in either direction. For the infant weights, the research question asked specifically whether the infants were underweight relative to the population average. Since the concern was only whether the mean weight is below 7.5 lbs, a one-sided test with Ha: µ < 7.5 was appropriate, and the alternative that the mean exceeds 7.5 lbs was not of interest.