Hassan Gilani
Danyal Ajaz
Present
ANalysis Of VAriance (ANOVA)
Introduction
Analysis of variance (ANOVA) is a method for testing the hypothesis that there is no difference between two or more population means (usually at least three)
Often used for testing the hypothesis that there is no difference between a number of treatments
Why Look at Variance When Interested in Means?
Three groups tightly clustered about their respective means: the variability within each group is relatively small
It is easy to see that there is a difference between the means of the three groups
Three groups with the same means as in the previous figure, but with much larger variability within each group
It is not so easy to see that there is a difference between the means of the three groups
To distinguish between the groups, the variability between (or among) the groups must be greater than the variability of, or within, the groups
If the within-groups variability is large compared with the between-groups variability, any difference between the groups is difficult to detect
To determine whether or not the group means are significantly different, the variability between groups and the variability within groups are compared
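A small sketch of this idea, using made-up numbers that are not from the slides: two hypothetical data sets share the same group means (4, 8 and 13) but differ in their within-group spread. The helper `f_ratio` is an illustrative name, not a library function; it divides the between-groups mean square by the within-groups mean square, which is the comparison the F-test makes.

```python
from statistics import mean


def f_ratio(groups):
    """Between-groups mean square divided by within-groups mean square."""
    all_obs = [y for g in groups for y in g]
    grand = mean(all_obs)
    k, n_total = len(groups), len(all_obs)
    # Spread of the group means about the overall mean
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Spread of each observation about its own group mean
    ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


# Hypothetical data: both sets have group means 4, 8 and 13
tight = [[3, 4, 5], [7, 8, 9], [12, 13, 14]]
spread = [[-2, 4, 10], [2, 8, 14], [7, 13, 19]]

print(round(f_ratio(tight), 2))   # 61.0
print(round(f_ratio(spread), 2))  # 1.69
```

The group means are identical in both cases; only the within-group variability changes, and the ratio drops from 61 to about 1.7.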
Assumptions for ANOVA
One-Way ANOVA
When there is only one qualitative variable, which denotes the groups, and only one quantitative measurement variable, a one-way ANOVA is carried out
For a one-way ANOVA the observations are divided into mutually exclusive categories, giving the one-way classification
ASSUMPTIONS
Each of the populations is Normally distributed with the same variance (homogeneity of variance)
The observations are sampled independently, and the groups under consideration are independent
ANOVA is robust to moderate violations of its assumptions, meaning that the probability values (P-values) computed in an ANOVA remain reasonably accurate even when the assumptions are moderately violated
Notation
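Consider $I$ groups whose means we want to compare
Let $n_i$, $i = 1, 2, \dots, I$, be the sample size of group $i$, and let $N = n_1 + n_2 + \cdots + n_I$ be the total number of observations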
Our null hypothesis is equal means: $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
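$Y_{ij}$: observation $j$ in group $i$
$\bar{Y}_1, \bar{Y}_2, \dots, \bar{Y}_I$: sample means
$\bar{Y}$: mean of all observations, $\bar{Y} = \frac{n_1\bar{Y}_1 + n_2\bar{Y}_2 + \cdots + n_I\bar{Y}_I}{N}$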
Levels of Stress (N = 15)
Sample              1    2    3    4    5
Normal              2    3    7    2    6
Announced Layoff   10    8    7    5   10
During Layoff      10   13   14   13   15
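The table can be held in a plain Python dictionary to compute the quantities defined in the notation slide. The variable names here are illustrative; `statistics.mean` is from the standard library.

```python
from statistics import mean

# Stress scores from the Levels of Stress table (five subjects per condition)
stress = {
    "Normal": [2, 3, 7, 2, 6],
    "Announced Layoff": [10, 8, 7, 5, 10],
    "During Layoff": [10, 13, 14, 13, 15],
}

# Sample mean of each group (Ybar_i)
group_means = {name: mean(scores) for name, scores in stress.items()}

# Pool all observations to get N and the overall mean Ybar
all_scores = [y for scores in stress.values() for y in scores]
N = len(all_scores)
grand_mean = mean(all_scores)

print(group_means)              # Normal: 4, Announced Layoff: 8, During Layoff: 13
print(N, round(grand_mean, 2))  # 15 8.33
```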
Sum of Squares Within Groups
(Figure: the variance of each group, Normal: 2 3 7 2 6, Announced Layoff: 10 8 7 5 10, During Layoff: 10 13 14 13 15, measured about that group's own mean; each group yields its own sum of squares)
Variance Between Groups
(Figure: all 15 scores, 2 3 7 2 6 | 10 8 7 5 10 | 10 13 14 13 15, shown with their variance from the overall mean; this total variance gives the total sum of squares)
Within-Groups Variance
Recall the assumption that the population variances of the three groups are the same
Under this assumption, the variances of the groups all estimate this common value
The within-groups variance, also called the within-groups mean square or error mean square, is denoted $s_W^2$; it estimates the true population variance $\sigma^2$
For groups with equal sample size this is given by the average of the variances of the groups
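$s_W^2 = \frac{1}{I}\sum_{i=1}^{I} s_i^2 = \frac{1}{I(n-1)}\sum_{i=1}^{I}\sum_{j=1}^{n}(Y_{ij} - \bar{Y}_i)^2$
where $n$ is the common group size and $s_i^2$ is the sample variance of group $i$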
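For the equal-size stress data, the two forms of the formula can be checked numerically. This sketch relies on the standard library's `statistics.variance`, which uses the $n - 1$ denominator and so returns the sample variance $s_i^2$.

```python
from statistics import fmean, variance

# Stress data: I = 3 groups, each of size n = 5
groups = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]
I, n = len(groups), len(groups[0])

# First form: plain average of the sample variances s_i^2
group_vars = [variance(g) for g in groups]
s_w2_average = fmean(group_vars)

# Second form: squared deviations about each group mean, divided by I(n - 1)
ss_within = sum((y - fmean(g)) ** 2 for g in groups for y in g)
s_w2_pooled = ss_within / (I * (n - 1))

print(group_vars)                 # [5.5, 4.5, 3.5]
print(s_w2_average, s_w2_pooled)  # 4.5 4.5
```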
For unequal sample sizes, the variances are weighted by their degrees of freedom
$s_W^2 = \frac{\sum_{i=1}^{I}(n_i - 1)s_i^2}{N - I}$
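A minimal sketch of the degrees-of-freedom weighting, using a hypothetical helper named `pooled_variance`. The second call uses made-up groups of unequal size to show the weighting at work; for equal sizes the result reduces to the plain average.

```python
from statistics import variance


def pooled_variance(groups):
    """s_W^2: group sample variances weighted by their degrees of freedom n_i - 1."""
    N = sum(len(g) for g in groups)
    I = len(groups)
    return sum((len(g) - 1) * variance(g) for g in groups) / (N - I)


stress = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]
print(pooled_variance(stress))  # 4.5, same as the plain average for equal sizes

# Hypothetical unequal sizes: s_1^2 = 1 with 2 df, s_2^2 = 2 with 1 df
print(pooled_variance([[1, 2, 3], [4, 6]]))  # (2*1 + 1*2) / 3, about 1.333
```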
Since the population variances are assumed to be equal, the estimate of the population variance, derived from the separate within-group estimates, is valid whether or not the null hypothesis is true
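Sum of Squares Within Groups
Normal (mean = 4): deviations 2 − 4 = −2, 3 − 4 = −1, 7 − 4 = 3, 2 − 4 = −2, 6 − 4 = 2; squared: 4, 1, 9, 4, 4; sum = 22
Announced Layoff (10 8 7 5 10, mean = 8): squared deviations 4, 0, 1, 9, 4; sum = 18
During Layoff (10 13 14 13 15, mean = 13): squared deviations 9, 0, 1, 0, 4; sum = 14
SSW = 22 + 18 + 14 = 54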
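The hand calculation of SSW can be reproduced in a few lines; this is only a check of the arithmetic, with the group names taken from the table.

```python
from statistics import fmean

stress = {
    "Normal": [2, 3, 7, 2, 6],
    "Announced Layoff": [10, 8, 7, 5, 10],
    "During Layoff": [10, 13, 14, 13, 15],
}

# Sum of squared deviations of each score from its own group mean
ss_by_group = {}
for name, scores in stress.items():
    m = fmean(scores)
    ss_by_group[name] = sum((y - m) ** 2 for y in scores)

ssw = sum(ss_by_group.values())
print(ss_by_group)  # {'Normal': 22.0, 'Announced Layoff': 18.0, 'During Layoff': 14.0}
print(ssw)          # 54.0
```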
Between-Groups Variance
If the null hypothesis is true, the groups can be considered random samples from the same population (the population variances are assumed equal, and under the null hypothesis the population means are equal too)
The three sample means are then observations from the same sampling distribution of the mean
The sampling distribution of the mean has variance $\sigma^2/n$
This gives a second method of obtaining an estimate of the population variance
The observed variance of the treatment means is an estimate of $\sigma^2/n$ and is given by
$s_{\bar{Y}}^2 = \frac{1}{I-1}\sum_{i=1}^{I}(\bar{Y}_i - \bar{Y})^2$
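For the stress example ($I = 3$, equal group sizes) the variance of the treatment means can be computed with `statistics.variance`. One assumption to note: `variance` centres the means on their own average, which equals the overall mean $\bar{Y}$ only when the group sizes are equal, as they are here.

```python
from statistics import fmean, variance

groups = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]

# Treatment (group) means Ybar_i
means = [fmean(g) for g in groups]

# s_Ybar^2 = sum (Ybar_i - Ybar)^2 / (I - 1); valid here because all n_i are equal
s_ybar2 = variance(means)

print(means)              # [4.0, 8.0, 13.0]
print(round(s_ybar2, 3))  # 20.333
```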
For equal sample sizes, the between-groups variance is then given by:
$s_B^2 = n\,s_{\bar{Y}}^2 = \frac{n}{I-1}\sum_{i=1}^{I}(\bar{Y}_i - \bar{Y})^2$
For unequal sample sizes, the between-groups variance is given by:
$s_B^2 = \frac{1}{I-1}\sum_{i=1}^{I} n_i(\bar{Y}_i - \bar{Y})^2$
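The unequal-size formula also covers the equal-size case. The sketch below uses a hypothetical helper `between_groups_variance`; for the stress data it gives $n\,s_{\bar{Y}}^2 = 5 \times 20.33 \approx 101.67$.

```python
from statistics import fmean


def between_groups_variance(groups):
    """s_B^2 = sum n_i (Ybar_i - Ybar)^2 / (I - 1), Ybar = mean of all observations."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    return ssb / (len(groups) - 1)


stress = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]
print(round(between_groups_variance(stress), 2))  # 101.67
```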
Between Sum of Squares
All 15 observations: 2 3 7 2 6 10 8 7 5 10 10 13 14 13 15
Overall mean = 125/15 ≈ 8.3; total sum of squares SST = 257.3
SST = SSW + SSB, so SSB = SST − SSW = 257.3 − 54 = 203.3
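The partition of the total sum of squares can be checked directly. This sketch recomputes SST, SSW and SSB from the raw scores and confirms that SST = SSW + SSB up to floating-point rounding.

```python
from statistics import fmean

groups = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]
all_obs = [y for g in groups for y in g]
grand = fmean(all_obs)

# Total: every score about the overall mean
sst = sum((y - grand) ** 2 for y in all_obs)
# Within: every score about its own group mean
ssw = sum((y - fmean(g)) ** 2 for g in groups for y in g)
# Between: each group mean about the overall mean, weighted by group size
ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)

print(round(sst, 1), ssw, round(ssb, 1))  # 257.3 54.0 203.3
print(abs(sst - (ssw + ssb)) < 1e-9)      # True
```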
Testing the Null Hypothesis
If the null hypothesis is true, then the between-groups variance $s_B^2$ and the within-groups variance $s_W^2$ are both estimates of the population variance $\sigma^2$
If the null hypothesis is not true, the population means are not all equal, and $s_B^2$ will tend to be greater than the population variance $\sigma^2$, because it is inflated by the treatment differences
To test the null hypothesis, we compare the ratio of $s_B^2$ to $s_W^2$ with 1 using an F-test; the F statistic is given by:
$F = \frac{s_B^2}{s_W^2}$
with $I - 1$ and $N - I$ degrees of freedom
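Putting the pieces together, the one-way F statistic and its degrees of freedom can be sketched as below. `one_way_f` is a hypothetical name; it returns $F = s_B^2 / s_W^2$ together with $I - 1$ and $N - I$.

```python
from statistics import fmean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_obs = [y for g in groups for y in g]
    I, N = len(groups), len(all_obs)
    grand = fmean(all_obs)
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ssw = sum((y - fmean(g)) ** 2 for g in groups for y in g)
    df_b, df_w = I - 1, N - I
    # F = between-groups mean square / within-groups mean square
    return (ssb / df_b) / (ssw / df_w), df_b, df_w


stress = [[2, 3, 7, 2, 6], [10, 8, 7, 5, 10], [10, 13, 14, 13, 15]]
F, df_b, df_w = one_way_f(stress)
print(round(F, 2), df_b, df_w)  # 22.59 2 12
```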
F Distribution and F-test
The F distribution is the continuous distribution of the ratio of two estimates of variance
The F distribution has two parameters: degrees of freedom numerator (top) and degrees of freedom denominator (bottom)
The F-test is used to test the hypothesis that two variances are equal
The validity of the F-test is based on the requirement that the populations from which the variances were taken are Normal
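To see the F distribution arise as a ratio of two variance estimates, it can be simulated. The sketch below assumes Normal populations with equal means and variance 1 (so the null hypothesis holds), draws three groups of five as in the stress example, and records $F = s_B^2/s_W^2$ many times. Under these assumptions roughly 5% of the simulated values should exceed about 3.885, the 5% critical value of F with 2 and 12 degrees of freedom (tables give 3.89). The seed and trial count are arbitrary choices.

```python
import random


def f_stat(groups):
    """F = s_B^2 / s_W^2 for a list of groups (plain arithmetic)."""
    all_obs = [y for g in groups for y in g]
    I, N = len(groups), len(all_obs)
    grand = sum(all_obs) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ssb / (I - 1)) / (ssw / (N - I))


random.seed(1)     # fixed seed so the run is repeatable
I, n, trials = 3, 5, 20000
CRIT_2_12 = 3.885  # 5% critical value of F(2, 12)

sims = []
for _ in range(trials):
    # Null hypothesis true: every group drawn from Normal(mean 0, sd 1)
    groups = [[random.gauss(0, 1) for _ in range(n)] for _ in range(I)]
    sims.append(f_stat(groups))

share_above = sum(f > CRIT_2_12 for f in sims) / trials
print(share_above)  # should be close to 0.05
```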
(Figure: relative frequency of the F statistic plotted against its score, with the regions where the null hypothesis is accepted and rejected)
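F Value
$s_B^2 = \frac{\text{SSB}}{df_B} = \frac{203.3}{2} = 101.67$, where $df_B$ = number of samples − 1
$s_W^2 = \frac{\text{SSW}}{df_W} = \frac{54}{12} = 4.5$, where $df_W$ = number of observations − number of samples
$F = \frac{s_B^2}{s_W^2} = \frac{101.67}{4.5} = 22.59$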
(Figure: the observed value F = 22.59 marked on the F distribution, in the region where the null hypothesis is rejected)
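When the numerator has 2 degrees of freedom (three groups, as here), the upper tail of the F distribution has a closed form, $P(F > x) = (1 + 2x/\nu_2)^{-\nu_2/2}$ with $\nu_2$ the denominator degrees of freedom, so the p-value can be sketched with the standard library alone. This shortcut is valid only for 2 numerator degrees of freedom, and the helper names are hypothetical.

```python
def f_sf_df1_2(x, df2):
    """P(F > x) for F(2, df2); the closed form holds only for 2 numerator df."""
    return (1 + 2 * x / df2) ** (-df2 / 2)


def f_crit_df1_2(alpha, df2):
    """Upper-alpha critical value of F(2, df2), found by inverting f_sf_df1_2."""
    return df2 / 2 * (alpha ** (-2 / df2) - 1)


df2 = 12  # N - I = 15 - 3
print(round(f_crit_df1_2(0.05, df2), 3))  # 3.885
p = f_sf_df1_2(22.59, df2)
print(f"{p:.1e}")                         # 8.5e-05
```

For other numerator degrees of freedom, a library routine such as SciPy's `scipy.stats.f.sf(x, dfn, dfd)` is the usual choice. Either way, the p-value for F = 22.59 is far below 0.05.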
Summary
Within-Groups Mean Square
This is the average of the variances in each group (weighted by degrees of freedom when the sample sizes differ)
This estimates the population variance regardless of whether or not the null hypothesis is true
This is also called the error mean square
Between-Groups Mean Square
This is calculated from the variance between groups
This estimates the population variance if the null hypothesis is true; otherwise it tends to be larger
Use an F-test to compare these two estimates of variance