CHAPTER FOUR
CHI-SQUARE DISTRIBUTIONS
A chi-square (χ²) distribution is a continuous distribution ordinarily derived as the sampling distribution
of a sum of squares of independent standard normal variables.
Characteristics of the chi-square distribution
1. It is a continuous distribution
2. The χ² distribution has a single parameter: the degrees of freedom, ν
3. The mean of the chi-square distribution is ν
4. The variance of the chi-square distribution is 2ν. Thus, both the mean and the variance depend on the
degrees of freedom.
5. The χ² test statistic is based on a comparison of the observed sample data (results) with the results
expected under the assumption that the null hypothesis is true.
6. It is a positively skewed distribution, and only non-negative values of the variable χ² are possible. It
extends indefinitely in the positive direction. The skewness decreases as ν increases, and as ν increases
without limit the distribution approaches a normal distribution.
7. The area under the curve is 1.0
Given these characteristics, the χ² distribution has the following areas of application:
1. Test for independence between two variables
2. Goodness of fit tests (Binomial, Normal, and Poisson )
3. Testing for the equality of several proportions
4.1.1. TEST FOR THE INDEPENDENCE BETWEEN TWO VARIABLES
A χ² test of independence is used to analyze the frequencies of two variables with multiple categories to
determine whether the two variables are independent. That is, the test uses sample data and the chi-square
distribution to test for the independence of two variables. The sample data are arranged in a two-way
table called a contingency table. Because the χ² test of independence uses a contingency table, the test
is sometimes referred to as CONTINGENCY ANALYSIS (contingency table test). The χ² test is used
to analyze, for example, the following cases:
Whether employee absenteeism is independent of job classification
Whether beer preference is independent of sex (gender)
Whether favorite sport is independent of nationality
Whether type of financial investment is independent of geographic region
The steps and procedures are the same as those used in hypothesis testing generally.
Example:
1. A company planning a TV advertising campaign wants to determine which TV shows its target audience
watches, and thereby to know whether the choice of TV program an individual watches is independent of
the individual's income. The supporting data are shown in the table below. Use a 5% level of significance
to test the null hypothesis of independence.
Income      Type of Show
            Basketball   Movie   News   Total
Low            143         70     37     250
Medium          90         67     43     200
High            17         13     20      50
Total          250        150    100     500
Solution
1. Ho: The choice of TV program an individual watches is independent of the individual's income
Ha: Income and choice of TV program are not independent
2. Decision rule
α = 0.05
ν = (R-1)(C-1), where R is the number of rows and C is the number of columns
= (3-1)(3-1)
= 4
χ²α,ν = χ²0.05,4 = 9.49
Reject Ho if the sample χ² is greater than 9.49
3. Compute the test statistic
In computing the test statistic, our first task is to estimate the expected frequencies, eij = (ri × cj)/n, where
ri = observed frequency total for row i
cj = observed frequency total for column j
n = sample size
e11 = 250 × 250/500 = 125    e21 = 200 × 250/500 = 100    e31 = 50 × 250/500 = 25
e12 = 250 × 150/500 = 75     e22 = 200 × 150/500 = 60     e32 = 50 × 150/500 = 15
e13 = 250 × 100/500 = 50     e23 = 200 × 100/500 = 40     e33 = 50 × 100/500 = 10
A test of the null hypothesis that variables are independent of one another is based on the magnitudes of
the differences between the observed frequencies and the expected frequencies. Large differences
between oij and eij provide evidence that the null hypothesis is false. The test is based on the following
Chi-square test statistic.
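$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - e_{ij})^2}{e_{ij}} \quad \text{or} \quad \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
Where:
Oij (fo) = observed frequency for the contingency table category in row i and column j.
eij (fe) = expected frequency for the contingency table category in row i and column j.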
Note: For an R×C contingency table, the degrees of freedom are calculated as (R-1)(C-1). The degrees of freedom
refer to the number of expected frequencies that can be chosen freely, provided the row and column totals of
the expected frequencies are identical to the row and column totals of the observed frequency table.
For the TV-show data:
$$\chi^2 = \frac{(143-125)^2}{125} + \frac{(70-75)^2}{75} + \frac{(37-50)^2}{50} + \frac{(90-100)^2}{100} + \frac{(67-60)^2}{60} + \frac{(43-40)^2}{40} + \frac{(17-25)^2}{25} + \frac{(13-15)^2}{15} + \frac{(20-10)^2}{10} = 21.174$$
4. Reject the null hypothesis that choice of TV program is independent of income level, because 21.174 is
greater than 9.49.
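The calculation above can be checked with a short program. The following is a minimal sketch in plain Python; the function name chi_square_independence and the variable names are illustrative assumptions, not part of the original notes, and the critical value 9.49 is simply the table value quoted in the solution.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected table) for an R x C table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency under independence: e_ij = r_i * c_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Rows: low, medium, high income. Columns: basketball, movie, news.
tv_table = [[143, 70, 37], [90, 67, 43], [17, 13, 20]]
statistic, df, expected = chi_square_independence(tv_table)

CRITICAL_VALUE = 9.49  # table value quoted in the text for alpha = 0.05, nu = 4
reject_h0 = statistic > CRITICAL_VALUE
print(round(statistic, 3), df, reject_h0)
```

The same function can be applied to any of the contingency tables in this section by passing the observed counts without the total row and total column.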
2. A human resource manager at EAGLE Inc. was interested in knowing whether the voluntary absence
behavior of the firm's employees was independent of marital status. Data from the employee files
on marital status and voluntary absenteeism behavior for a sample of 500 employees are shown below.
                        Marital Status
Absence behavior   Married   Divorced   Widowed   Single   Total
Often absent          36        16         14        34      100
Seldom absent         64        34         20        82      200
Never absent          50        50         16        84      200
Total                150       100         50       200      500
Test the hypothesis that absence behavior is independent of marital status at a significance level of 1%.
Solution
1. Ho: Voluntary absence behavior is independent of marital status
Ha: Voluntary absence behavior and marital status are dependent
2. α = 0.01
ν = (R-1)(C-1)
= (3-1)(4-1) = 6
χ²α,ν = χ²0.01,6 = 16.81
Reject Ho if sample χ² > 16.81
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
       36                   30               36         1.200
       64                   60               16         0.267
       50                   60              100         1.667
       16                   20               16         0.800
       34                   40               36         0.900
       50                   40              100         2.500
       14                   10               16         1.600
       20                   20                0         0.000
       16                   20               16         0.800
       34                   40               36         0.900
       82                   80                4         0.050
       84                   80               16         0.200
χ² = Σ (fo-fe)²/fe = 10.883
4. Do not reject Ho, because 10.883 is less than 16.81.
The sample does not provide sufficient evidence that voluntary absence behavior depends on marital status.
3. The personnel administrator of XYZ Company provided the following data as an example of selection
among 40 male and 40 female applicants for 12 open positions.
              Applicant Status
          Selected   Not selected   Total
Male          7           33          40
Female        5           35          40
Total        12           68          80
a. The χ² test of independence was suggested as a way of determining whether the decision to hire 7 males
and 5 females should be interpreted as showing a selection bias in favor of males. Conduct the test of
independence using α = 0.10. What is your conclusion?
b. Using the same test, would the decision to hire 8 males and 4 females suggest concern about a selection
bias?
c. How many males could be hired for the 12 open positions before the procedure would raise concern about a
selection bias?
Solution
a.
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent).
2. α = 0.1
ν = (R-1)(C-1)
= (2-1)(2-1) = 1
χ²α,ν = χ²0.1,1 = 2.71
Reject Ho if sample χ² > 2.71
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
        7                    6                1         0.1667
       33                   34                1         0.0294
        5                    6                1         0.1667
       35                   34                1         0.0294
χ² = Σ (fo-fe)²/fe = 0.3922
4. Do not reject Ho, because 0.392 is less than 2.71.
There is no evidence of a selection bias in favor of male applicants.
b.
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent).
2. α = 0.1
ν = (R-1)(C-1)
= (2-1)(2-1) = 1
χ²α,ν = χ²0.1,1 = 2.71
Reject Ho if sample χ² > 2.71
3. Sample χ²
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
        8                    6                4         0.6667
       32                   34                4         0.1176
        4                    6                4         0.6667
       36                   34                4         0.1176
χ² = Σ (fo-fe)²/fe = 1.5686
4. Do not reject Ho, because 1.569 is less than 2.71.
There is no evidence of a selection bias in favor of male applicants.
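c. There is no shortcut method to answer this question. Therefore, let us proceed by trial: increase the number
of male applicants who are accepted, and decrease the number of female applicants who are accepted, until Ho
is rejected. Parts a and b show that 7 and 8 males do not lead to rejection, so we next try 9 males and 3 females.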
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent).
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent).
2. α = 0.1
ν = (R-1)(C-1)
= (2-1)(2-1) = 1
χ²α,ν = χ²0.1,1 = 2.71
Reject Ho if sample χ² > 2.71
3. Sample χ² (9 males and 3 females hired)
Observed freq (fo)   Expected freq (fe)   (fo-fe)²   (fo-fe)²/fe
        9                    6                9         1.5000
       31                   34                9         0.2647
        3                    6                9         1.5000
       37                   34                9         0.2647
χ² = Σ (fo-fe)²/fe = 3.5294
4. Reject Ho, because 3.5294 is greater than 2.71.
Therefore, hiring 9 or more males would raise concern about a selection bias. At most 8 male applicants
(with 4 female applicants) can be hired for the 12 open positions without the test indicating a
selection bias in favor of males.
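The trial-and-error search in part c can be automated. The sketch below is one possible way to do it in plain Python; the helper name chi_square_2x2 and the constants are illustrative assumptions, and the critical value 2.71 is the table value used in the solution above.

```python
def chi_square_2x2(males_hired, females_hired, applicants_per_group=40):
    """Chi-square statistic for the 2 x 2 table of selected / not selected by gender."""
    observed = [
        [males_hired, applicants_per_group - males_hired],
        [females_hired, applicants_per_group - females_hired],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


CRITICAL_VALUE = 2.71  # table value quoted in the text for alpha = 0.10, nu = 1
POSITIONS = 12

# Start from an even split (6 males, 6 females) and add males one at a time
# until the statistic first exceeds the critical value.
first_flagged = next(
    m for m in range(POSITIONS // 2, POSITIONS + 1)
    if chi_square_2x2(m, POSITIONS - m) > CRITICAL_VALUE
)
print(first_flagged)
```

The search starts at an even split because the statistic is zero there and grows as the split becomes more unequal.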
The chi-square test for independence is useful in helping to determine whether a relationship exists
between two variables, but it does not enable us to estimate or predict the values of one variable based
on the value of the other. If it is determined that a dependence does exist between two quantitative
variables, then the techniques of regression analysis are useful in helping to find a mathematical
formula that expresses the nature of the relationship.
Small expected frequencies can lead to inordinately large chi-square values with the chi-square test of
independence. Hence the test should not be used when any expected cell frequency is less than 5.
One way to avoid small expected values is to combine columns or rows whenever possible and
whenever doing so makes sense.
4.1.2. GOODNESS-OF-FIT TESTS (BINOMIAL, NORMAL, POISSON)
The chi-square test is widely used for a variety of analyses. One of the more important uses of chi-square
is the goodness-of-fit test. That is, it can be used to decide whether a particular probability
distribution, such as the binomial, Poisson or normal, is the appropriate distribution. This is an
important ability because, as decision makers using statistics, we will need to choose a certain
probability distribution to represent the distribution of the data we happen to be considering.
In tests of hypothesis (Chapter 5), we assumed that the population was normal and tested hypotheses such as
μ = μo, p = po, etc. But what if we want to check the assumption of normality itself? The
multinomial χ² goodness-of-fit test can be applied.
The null hypothesis for a goodness-of-fit test is that the distribution of the population from which a
sample is taken is the one specified. The alternative hypothesis is that the actual distribution is not the
specified distribution. Generally, a researcher specifies only the name of the distribution and uses the
sample data to estimate the particular parameters of the distribution. In this situation one degree of
freedom is lost for each parameter that has to be estimated. However, if the researcher completely
specifies the distribution, including parameter values, then no additional degrees of freedom are lost.
Null hypothesis                                        Parameters to be estimated   Degrees of freedom lost
Ho: Population is normal                               μ, σ                         2
Ho: Population is normal with μ = x                    σ                            1
Ho: Population is normal with σ = y                    μ                            1
Ho: Population is normal with μ = x, σ = y             None                         0
Ho: Population is Poisson                              λ                            1
Ho: Population is Poisson with λ = z                   None                         0
Ho: Population is binomial with p = b                  None                         0
Example (Binomial)
1. Mrs. Tsion, a saleswoman for MOON Paper Company, has five accounts to visit per day. It is suggested
that sales by Mrs. Tsion may be described by the binomial distribution, with the probability of selling
to each account being 0.4. Given the following frequency distribution of Mrs. Tsion's number of sales per
day, can we conclude that the data do in fact follow the binomial distribution? Use the 0.05 significance
level.
No. of sales per day    0    1    2    3    4    5
Frequency              10   41   60   20    6    3
Solution
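1. Ho: The frequency distribution is binomial with n = 5 and p = 0.4
Ha: The frequency distribution is not binomial with n = 5 and p = 0.4
2. α = 0.05
ν = k - 1 - m = 5 - 1 - 0 = 4, where k is the number of categories (after the 4 and 5 categories are combined)
and m is the number of parameters estimated from the sample
χ²α,ν = χ²0.05,4 = 9.49
Reject Ho if sample χ² is greater than 9.49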
3. Sample χ² (total frequency n = 140)
No. of sales   Prob. with        Observed    Expected freq   (fo-fe)²    (fo-fe)²/fe
per day        n = 5, p = 0.4    freq (fo)   (fe = npi)
0              0.0778            10          10.892            0.7957     0.0731
1              0.2592            41          36.288           22.2029     0.6119
2              0.3456            60          48.384          134.9315     2.7888
3              0.2304            20          32.256          150.2095     4.6567
4 & 5          0.0870             9          12.180           10.1124     0.8302
χ² = Σ (fo-fe)²/fe = 8.9607
4. Do not reject Ho, because 8.9607 is less than 9.49. The data are consistent with the binomial distribution
with n = 5 and p = 0.4.
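The binomial goodness-of-fit calculation can be sketched in plain Python as follows. The function names binomial_pmf and chi_square_gof are illustrative assumptions. Note that the program uses unrounded probabilities, so its statistic differs slightly in the third decimal place from the hand calculation, which rounds the probabilities to four places.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def chi_square_gof(observed, probabilities):
    """Chi-square goodness-of-fit statistic for observed counts and hypothesized probabilities."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p) for o, p in zip(observed, probabilities))


observed_sales = [10, 41, 60, 20, 6, 3]  # days with 0, 1, ..., 5 sales
probs = [binomial_pmf(k, 5, 0.4) for k in range(6)]

# Pool the 4 and 5 categories, as in the worked solution.
pooled_observed = observed_sales[:4] + [sum(observed_sales[4:])]
pooled_probs = probs[:4] + [sum(probs[4:])]

statistic = chi_square_gof(pooled_observed, pooled_probs)
df = len(pooled_observed) - 1 - 0  # no parameters estimated from the sample
print(round(statistic, 2), df)
```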
2. A professional baseball player, Philippos, was at bat five times in each of 100 games. Philippos claims
that he has a probability of 0.4 of getting a hit each time he goes to bat. Test his claim at the 0.05 level
by seeing if the following data are distributed binomially.
No. of hits / game 0 1 2 3 4 5
No. of games with that number of hits 12 38 27 17 5 1
Solution
1. Ho: The frequency distribution can be described by the binomial distribution with n = 5, p = 0.4
Ha: The frequency distribution cannot be described by the binomial distribution with n = 5, p = 0.4
2. α = 0.05
ν = k - 1 - m = 5 - 1 - 0 = 4
χ²α,ν = χ²0.05,4 = 9.49
Reject Ho if sample χ² > 9.49
3. Sample χ² (total number of games n = 100)
No. of hits   No. of games with      Prob. with        Expected freq   (fo-fe)²    (fo-fe)²/fe
per game      that no. of hits (fo)  n = 5, p = 0.4    (fe = npi)
0             12                     0.0778             7.78            17.8084     2.2890
1             38                     0.2592            25.92           145.9264     5.6299
2             27                     0.3456            34.56            57.1536     1.6538
3             17                     0.2304            23.04            36.4816     1.5834
4 & 5          6                     0.0870             8.70             7.2900     0.8379
χ² = Σ (fo-fe)²/fe = 11.9940
4. Reject Ho, because 11.9940 is greater than 9.49. The number of hits per game is not binomially distributed
with n = 5 and p = 0.4.
3. The Ethiopian postal service is interested in modeling the “mangled letter” problem. It has been
suggested that any letter sent to a certain area has a 0.15 chance of being mangled. Since the post office
is so big, it can be assumed that two letters' chances of being mangled are independent. A sample of 310
people was selected, and two test letters were mailed to each of them. The number of people receiving
zero, one, or two mangled letters was 260, 40, and 10, respectively. At the 0.10 level of significance, is
it reasonable to conclude that the number of mangled letters received by people follows a binomial
distribution with P = 0.15?
Solution
1. Ho: The number of mangled letters received by people follows a binomial distribution with n = 2,
p = 0.15.
Ha: The number of mangled letters received by people does not follow a binomial distribution with
n = 2, p = 0.15.
2. α = 0.1
ν = k - 1 - m = 3 - 1 - 0 = 2
χ²α,ν = χ²0.1,2 = 4.61
Reject Ho if sample χ² > 4.61
3. Sample χ² (total number of people n = 310)
No. of mangled   Observed    Prob. with         Expected freq   (fo-fe)²     (fo-fe)²/fe
letters          freq (fo)   n = 2, p = 0.15    (fe = npi)
0                260         0.7225             223.9750        1297.8006     5.7944
1                 40         0.2550              79.0500        1524.9025    19.2904
2                 10         0.0225               6.9750           9.1506     1.3119
χ² = Σ (fo-fe)²/fe = 26.3967
4. Reject Ho, because 26.3967 is greater than 4.61. The number of mangled letters received is not binomially
distributed with n = 2 and p = 0.15.
Example (Poisson)
1. It is hypothesized that the number of breakdowns per month of a computer system at a major university
follows a Poisson distribution with μ = 2. The data below show the observed number of breakdowns per
month during a sample of 100 months. Use a 5% level of significance and test the null hypothesis.
Breakdowns        0    1    2    3    4    5 and above
Observed freq.   14   20   34   22    5    5
Solution
1. Ho: The population distribution of breakdowns is Poisson with μ = 2.
Ha: The population distribution of breakdowns is not Poisson with μ = 2.
2. α = 0.05
ν = k - 1 - m = 6 - 1 - 0 = 5
χ²α,ν = χ²0.05,5 = 11.07
Reject Ho if sample χ² > 11.07
3. Sample χ² (total number of months n = 100)
Breakdowns   Observed    Prob. with   Expected freq   (fo-fe)²   (fo-fe)²/fe
             freq (fo)   μ = 2        (fe = npi)
0            14          0.1353       13.53            0.2209     0.0163
1            20          0.2707       27.07           49.9849     1.8465
2            34          0.2707       27.07           48.0249     1.7741
3            22          0.1804       18.04           15.6816     0.8693
4             5          0.0902        9.02           16.1604     1.7916
5 or more     5          0.0527        5.27            0.0729     0.0138
χ² = Σ (fo-fe)²/fe = 6.3117
4. Do not reject Ho, because 6.3117 is less than 11.07. The data are consistent with the number of breakdowns
per month of the computer system following a Poisson distribution with μ = 2.
2. Suppose that a teller supervisor believes that the distribution of random arrivals at a local bank is
Poisson and sets out to test this hypothesis by gathering information. The following data represent a
distribution of frequency of arrivals during one minute intervals at a bank. Use α = 0.05 to test these data
in an effort to determine whether they are Poisson distributed.
No. of arrivals 0 1 2 3 4 5 and above
Observed freq. 7 18 25 17 12 5
Solution
Before we test the hypothesis, we first have to estimate the arrival rate per minute from the sample, and
hence one degree of freedom is lost.
$$\lambda = \frac{\sum (\text{number of arrivals} \times \text{observed frequency})}{\sum \text{observed frequency}} = \frac{0(7) + 1(18) + 2(25) + 3(17) + 4(12) + 5(5)}{84} = \frac{192}{84} \approx 2.3 \text{ customers/min}$$
1. Ho: The arrival of customers at the bank is Poisson distributed with λ = 2.3
Ha: The arrival of customers at the bank is not Poisson distributed with λ = 2.3
2. α = 0.05
ν = k - 1 - m = 6 - 1 - 1 = 4
χ²α,ν = χ²0.05,4 = 9.488
Reject Ho if sample χ² > 9.488
3. Sample χ² (total number of intervals n = 84)
Number of    Observed    Prob. with   Expected freq   (fo-fe)²   (fo-fe)²/fe
arrivals     freq (fo)   λ = 2.3      (fe = npi)
0             7          0.1003        8.4252          2.0312     0.2411
1            18          0.2306       19.3704          1.8778     0.0969
2            25          0.2652       22.2768          7.4158     0.3329
3            17          0.2033       17.0772          0.0060     0.0003
4            12          0.1169        9.8196          4.7541     0.4841
5 or more     5          0.0837        7.0308          4.1241     0.5866
χ² = Σ (fo-fe)²/fe = 1.7419
4. Do not reject Ho, because 1.7419 is less than 9.488. The data are consistent with customer arrivals at the
bank following a Poisson distribution with λ = 2.3.
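The Poisson goodness-of-fit procedure with an estimated rate can be sketched in plain Python as below. The function name poisson_class_probabilities is an illustrative assumption. As in the worked solution, the open-ended "5 and above" class is treated as exactly 5 arrivals when estimating the mean, and the estimate is rounded to one decimal place.

```python
from math import exp, factorial


def poisson_class_probabilities(lam, open_class_start):
    """Probabilities for 0, 1, ..., open_class_start - 1 and for 'open_class_start or more'."""
    probs = [exp(-lam) * lam ** k / factorial(k) for k in range(open_class_start)]
    probs.append(1.0 - sum(probs))  # remaining upper-tail probability
    return probs


arrivals = [0, 1, 2, 3, 4, 5]
observed = [7, 18, 25, 17, 12, 5]
n = sum(observed)

# Estimate the rate from the sample, treating "5 and above" as 5 (as the text does).
lam_hat = sum(k * f for k, f in zip(arrivals, observed)) / n
lam = round(lam_hat, 1)

probs = poisson_class_probabilities(lam, 5)
statistic = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
df = len(observed) - 1 - 1  # one degree of freedom lost for the estimated rate
print(lam, round(statistic, 3), df)
```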
3. The number of automobile accidents occurring per day in a particular city is believed to have a Poisson
distribution. A sample of 80 days during the past year gives the data shown below. Do the data support
the belief that the number of accidents per day has a Poisson distribution? Use α = 0.05.
No. of accidents 0 1 2 3 4
Observed freq. (days) 34 25 11 7 3
Solution
Before we test the hypothesis, we first have to estimate the occurrence rate per day from the sample, and
hence one degree of freedom is lost.
$$\lambda = \frac{\sum (\text{number of accidents} \times \text{observed frequency})}{\sum \text{observed frequency}} = \frac{0(34) + 1(25) + 2(11) + 3(7) + 4(3)}{80} = \frac{80}{80} = 1 \text{ accident/day}$$
1. Ho: The occurrence of accidents per day follows a Poisson distribution with λ = 1.0
Ha: The occurrence of accidents per day does not follow a Poisson distribution with λ = 1.0
2. α = 0.05
ν = k - 1 - m = 4 - 1 - 1 = 2 (the 3 and 4 categories are combined into "3 or more", leaving k = 4)
χ²α,ν = χ²0.05,2 = 5.99
Reject Ho if sample χ² > 5.99
3. Sample χ² (total number of days n = 80)
Number of    Observed    Prob. with   Expected freq   (fo-fe)²   (fo-fe)²/fe
accidents    freq (fo)   λ = 1.0      (fe = npi)
0            34          0.3679       29.4320         20.8666     0.7090
1            25          0.3679       29.4320         19.6426     0.6674
2            11          0.1839       14.7120         13.7789     0.9366
3 or more    10          0.0803        6.4240         12.7878     1.9906
χ² = Σ (fo-fe)²/fe = 4.3036
4. Do not reject Ho, because 4.3036 is less than 5.99. The data are consistent with the number of accidents
per day following a Poisson distribution with λ = 1.0.
Example (Normal)
1. Suppose that Ato Paulos developed an overall attitude scale to determine how his company’s employees
feel toward their company. In theory the scores can vary from 0 to 50. Ato Paulos pretests his
measurement instrument on a randomly selected group of 100 employees. He tallies the scores and
summarizes them into six categories as shown below. Are these pretest scores approximately normally
distributed with μ = 24.9 and σ = 7.194? Use α = 0.05.
Score category 10-15 15-20 20-25 25-30 30-35 35-40
Frequency 11 14 24 28 13 10
Solution
1. Ho: The attitude scores are normally distributed with μ = 24.9 and σ = 7.194
Ha: The attitude scores are not normally distributed with μ = 24.9 and σ = 7.194
2. α = 0.05
ν = k - 1 - m = 6 - 1 - 0 = 5
χ²α,ν = χ²0.05,5 = 11.07
Reject Ho if sample χ² > 11.07
3. Sample χ²
With Z = (X - μ)/σ, the expected probability of each category can be obtained as follows. Each area is the
standard normal table area between the mean and z.
For category 10-15:
z10 = (10 - 24.9)/7.194 = -2.07, area = 0.48077
z15 = (15 - 24.9)/7.194 = -1.38, area = 0.41621
Expected probability = 0.48077 - 0.41621 = 0.06456
For category 15-20:
z15 = (15 - 24.9)/7.194 = -1.38, area = 0.41621
z20 = (20 - 24.9)/7.194 = -0.68, area = 0.25175
Expected probability = 0.41621 - 0.25175 = 0.16446
For category 20-25 (this class contains the mean, so the two areas are added):
z20 = (20 - 24.9)/7.194 = -0.68, area = 0.25175
z25 = (25 - 24.9)/7.194 = 0.01, area = 0.00399
Expected probability = 0.25175 + 0.00399 = 0.25574
For category 25-30:
z25 = (25 - 24.9)/7.194 = 0.01, area = 0.00399
z30 = (30 - 24.9)/7.194 = 0.71, area = 0.26115
Expected probability = 0.26115 - 0.00399 = 0.25716
For category 30-35:
z30 = (30 - 24.9)/7.194 = 0.71, area = 0.26115
z35 = (35 - 24.9)/7.194 = 1.40, area = 0.41924
Expected probability = 0.41924 - 0.26115 = 0.15809
For category 35-40:
z35 = (35 - 24.9)/7.194 = 1.40, area = 0.41924
z40 = (40 - 24.9)/7.194 = 2.10, area = 0.48214
Expected probability = 0.48214 - 0.41924 = 0.06290
The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, getting a score less than 10 or greater than 40 was also possible. Because a probability of 0.5
lies in each half of a normal distribution, we can use the sum of the expected probabilities on
each side of the mean, 24.9, to obtain the probability of the < 10 category: 0.5 – (0.06456 + 0.16446
+ 0.25175) = 0.01923. Similarly, we can obtain the probability of the > 40 category: 0.5 – (0.00399 +
0.25716 + 0.15809 + 0.06290) = 0.01786. Expected frequencies can then be obtained by multiplying
each expected probability by the total frequency (100), as shown below.
Score category   Probability   Expected freq   Expected freq after
                               (fe = npi)      combining categories
< 10             0.01923        1.923
10 – 15          0.06456        6.456           8.379
15 – 20          0.16446       16.446          16.446
20 – 25          0.25574       25.574          25.574
25 – 30          0.25716       25.716          25.716
30 – 35          0.15809       15.809          15.809
35 – 40          0.06290        6.290           8.076
> 40             0.01786        1.786
As the < 10 and > 40 categories have values of less than 5, each must be combined with the adjacent
category. As a result, the < 10 category becomes part of the 10 – 15 category and the > 40 category
becomes part of the 35 – 40 category.
Score category   Probability   Expected freq
                               (fe = npi)
10 – 15          0.08379        8.379
15 – 20          0.16446       16.446
20 – 25          0.25574       25.574
25 – 30          0.25716       25.716
30 – 35          0.15809       15.809
35 – 40          0.08076        8.076
The value of the chi-square can then be computed.
Score      Observed    Probability   Expected freq   (fo-fe)²   (fo-fe)²/fe
category   freq (fo)                 (fe = npi)
10 – 15    11          0.08379        8.379          6.8696     0.8199
15 – 20    14          0.16446       16.446          5.9829     0.3638
20 – 25    24          0.25574       25.574          2.4775     0.0969
25 – 30    28          0.25716       25.716          5.2167     0.2029
30 – 35    13          0.15809       15.809          7.8905     0.4991
35 – 40    10          0.08076        8.076          3.7018     0.4584
χ² = Σ (fo-fe)²/fe = 2.4409
4. Do not reject Ho, because 2.4409 is less than 11.07. The attitude scores are consistent with a normal
distribution with mean 24.9 and standard deviation 7.194.
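The normal goodness-of-fit calculation can also be sketched in plain Python. The function names normal_cdf and pooled_class_probabilities are illustrative assumptions. The program evaluates the normal distribution directly (through the error function) rather than rounding z to two decimal places and reading a table, so its statistic is close to, but not identical with, the hand-calculated 2.4409.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def pooled_class_probabilities(boundaries, mu, sigma):
    """Class probabilities between consecutive boundaries, with each tail
    folded into the adjacent end class (as done in the worked solution)."""
    cdf = [normal_cdf(b, mu, sigma) for b in boundaries]
    probs = [upper - lower for lower, upper in zip(cdf, cdf[1:])]
    probs[0] += cdf[0]           # add P(X < first boundary)
    probs[-1] += 1.0 - cdf[-1]   # add P(X > last boundary)
    return probs


boundaries = [10, 15, 20, 25, 30, 35, 40]
observed = [11, 14, 24, 28, 13, 10]
n = sum(observed)

probs = pooled_class_probabilities(boundaries, mu=24.9, sigma=7.194)
statistic = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
df = len(observed) - 1 - 0  # both parameters are specified in Ho
print(round(statistic, 3), df)
```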
2. The director of a major soccer team believes that the ages of purchasers of game tickets are normally
distributed. If the following data represent the distribution of ages for a sample of observed purchasers
of major soccer game tickets, use the chi-square goodness-of-fit test to determine whether this
distribution is significantly different from the normal distribution. Assume that α = 0.05.
Age of purchaser 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 16 44 61 56 35 19
Solution
1. Ho: The ages of purchasers of soccer game tickets are normally distributed.
Ha: The ages of purchasers of soccer game tickets aren’t normally distributed
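2. α = 0.05
ν = k - 1 - m = 6 - 1 - 2 = 3 (two parameters, μ and σ, are estimated from the sample)
χ²α,ν = χ²0.05,3 = 7.81
Reject Ho if sample χ² > 7.81
3. Sample χ²
The mean and standard deviation are first estimated from the grouped data:
Age category   Observed freq (f)   Midpoint (M)   fM       fM²
10-20          16                  15               240      3,600
20-30          44                  25             1,100     27,500
30-40          61                  35             2,135     74,725
40-50          56                  45             2,520    113,400
50-60          35                  55             1,925    105,875
60-70          19                  65             1,235     80,275
Total          n = 231                            ΣfM = 9,155   ΣfM² = 405,375
$$\bar{X} = \frac{\sum fM}{n} = \frac{9{,}155}{231} = 39.63$$
$$S = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n-1}} = \sqrt{\frac{405{,}375 - \frac{9{,}155^2}{231}}{231-1}} = 13.60$$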
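The grouped-data estimates above can be reproduced with a few lines of plain Python. This is a minimal sketch; the function name grouped_mean_and_sd is an illustrative assumption, and the formula is the same computational formula used in the hand calculation.

```python
from math import sqrt


def grouped_mean_and_sd(midpoints, frequencies):
    """Sample mean and standard deviation estimated from class midpoints and frequencies."""
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
    sum_fm2 = sum(f * m * m for f, m in zip(frequencies, midpoints))
    mean = sum_fm / n
    # Computational formula: S = sqrt((sum fM^2 - (sum fM)^2 / n) / (n - 1))
    sd = sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))
    return mean, sd


midpoints = [15, 25, 35, 45, 55, 65]
frequencies = [16, 44, 61, 56, 35, 19]
mean_age, sd_age = grouped_mean_and_sd(midpoints, frequencies)
print(round(mean_age, 2), round(sd_age, 2))
```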
With Z = (X - X̄)/S, the expected probability of each category can be obtained as follows. Each area is the
standard normal table area between the mean and z.
For category 10-20:
z10 = (10 - 39.63)/13.6 = -2.18, area = 0.48537
z20 = (20 - 39.63)/13.6 = -1.44, area = 0.42507
Expected probability = 0.48537 - 0.42507 = 0.06030
For category 20-30:
z20 = (20 - 39.63)/13.6 = -1.44, area = 0.42507
z30 = (30 - 39.63)/13.6 = -0.71, area = 0.26115
Expected probability = 0.42507 - 0.26115 = 0.16392
For category 30-40 (this class contains the mean, so the two areas are added):
z30 = (30 - 39.63)/13.6 = -0.71, area = 0.26115
z40 = (40 - 39.63)/13.6 = 0.03, area = 0.01197
Expected probability = 0.26115 + 0.01197 = 0.27312
For category 40-50:
z50 = (50 - 39.63)/13.6 = 0.76, area = 0.27637
z40 = (40 - 39.63)/13.6 = 0.03, area = 0.01197
Expected probability = 0.27637 - 0.01197 = 0.26440
For category 50-60:
z60 = (60 - 39.63)/13.6 = 1.50, area = 0.43319
z50 = (50 - 39.63)/13.6 = 0.76, area = 0.27637
Expected probability = 0.43319 - 0.27637 = 0.15682
For category 60-70:
z70 = (70 - 39.63)/13.6 = 2.23, area = 0.48713
z60 = (60 - 39.63)/13.6 = 1.50, area = 0.43319
Expected probability = 0.48713 - 0.43319 = 0.05394
The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, an age less than 10 or greater than 70 is also possible.
For < 10:
Probability between 10 and the mean = 0.06030 + 0.16392 + 0.26115 = 0.48537
Probability < 10 = 0.5 – 0.48537 = 0.01463
For > 70:
Probability between the mean and 70 = 0.01197 + 0.26440 + 0.15682 + 0.05394 = 0.48713
Probability > 70 = 0.5 – 0.48713 = 0.01287
Then, the expected frequencies can be obtained by multiplying each expected probability by the total
frequency (231) as follows:
Age category   Probability   Expected freq   Expected freq after
                             (fe = npi)      combining categories
< 10           0.01463        3.380
10 – 20        0.06030       13.929          17.309
20 – 30        0.16392       37.866          37.866
30 – 40        0.27312       63.091          63.091
40 – 50        0.26440       61.076          61.076
50 – 60        0.15682       36.225          36.225
60 – 70        0.05394       12.460          15.433
> 70           0.01287        2.973
Since the < 10 and > 70 categories have values of less than 5, each must be combined with the adjacent
category. As a result, the < 10 category becomes part of the 10 – 20 category and the > 70 category
becomes part of the 60 – 70 category.
Age category   Probability   Expected freq
                             (fe = npi)
10 – 20        0.07493       17.309
20 – 30        0.16392       37.866
30 – 40        0.27312       63.091
40 – 50        0.26440       61.076
50 – 60        0.15682       36.225
60 – 70        0.06681       15.433
The value of the chi-square can then be computed.
Age        Observed    Probability   Expected freq   (fo-fe)²   (fo-fe)²/fe
category   freq (fo)                 (fe = npi)
10 – 20    16          0.07493       17.309           1.7135    0.0990
20 – 30    44          0.16392       37.866          37.6260    0.9937
30 – 40    61          0.27312       63.091           4.3723    0.0693
40 – 50    56          0.26440       61.076          25.7658    0.4219
50 – 60    35          0.15682       36.225           1.5006    0.0414
60 – 70    19          0.06681       15.433          12.7235    0.8244
χ² = Σ (fo-fe)²/fe = 2.4497
4. Do not reject Ho, because 2.4497 is less than 7.81. The ages of purchasers of soccer game tickets are
consistent with a normal distribution.
3. The instructor for an Introductory Statistics course attempts to construct the final examination so that the
grades are normally distributed with a mean of 65. From the sample of grades appearing in the
accompanying frequency distribution table, can you conclude that the instructor has achieved this objective?
Use α = 0.05.
Grade 30-40 40-50 50-60 60-70 70-80 80-90
Frequency 4 17 29 49 33 18
Solution
1. Ho: The grades of students are normally distributed with a mean of 65.
Ha: The grades of students are not normally distributed with a mean of 65.
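2. α = 0.05
ν = k - 1 - m = 5 - 1 - 1 = 3 (five categories remain after combining; one parameter, σ, is estimated from the sample)
χ²α,ν = χ²0.05,3 = 7.81
Reject Ho if sample χ² > 7.81
3. Sample χ²
The standard deviation is first estimated from the grouped data:
Grade    Observed freq (f)   Midpoint (M)   fM       fM²
30-40     4                  35               140      4,900
40-50    17                  45               765     34,425
50-60    29                  55             1,595     87,725
60-70    49                  65             3,185    207,025
70-80    33                  75             2,475    185,625
80-90    18                  85             1,530    130,050
Total    n = 150                            ΣfM = 9,690   ΣfM² = 649,750
$$S = \sqrt{\frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n-1}} = \sqrt{\frac{649{,}750 - \frac{9{,}690^2}{150}}{150-1}} = 12.63$$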
With Z = (X - μ)/S, where μ = 65, the expected probability of each category can be obtained as follows. Each
area is the standard normal table area between the mean and z.
For category 30-40:
z30 = (30 - 65)/12.63 = -2.77, area = 0.49720
z40 = (40 - 65)/12.63 = -1.98, area = 0.47615
Expected probability = 0.49720 - 0.47615 = 0.02105
For category 40-50:
z40 = (40 - 65)/12.63 = -1.98, area = 0.47615
z50 = (50 - 65)/12.63 = -1.19, area = 0.38298
Expected probability = 0.47615 - 0.38298 = 0.09317
For category 50-60:
z50 = (50 - 65)/12.63 = -1.19, area = 0.38298
z60 = (60 - 65)/12.63 = -0.40, area = 0.15542
Expected probability = 0.38298 - 0.15542 = 0.22756
For category 60-70 (this class contains the mean, so the two areas are added):
z60 = (60 - 65)/12.63 = -0.40, area = 0.15542
z70 = (70 - 65)/12.63 = 0.40, area = 0.15542
Expected probability = 0.15542 + 0.15542 = 0.31084
For category 70-80:
z80 = (80 - 65)/12.63 = 1.19, area = 0.38298
z70 = (70 - 65)/12.63 = 0.40, area = 0.15542
Expected probability = 0.38298 - 0.15542 = 0.22756
For category 80-90:
z90 = (90 - 65)/12.63 = 1.98, area = 0.47615
z80 = (80 - 65)/12.63 = 1.19, area = 0.38298
Expected probability = 0.47615 - 0.38298 = 0.09317
The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, getting a grade less than 30 or greater than 90 is also possible.
For < 30:
Probability between 30 and the mean = 0.02105 + 0.09317 + 0.22756 + 0.15542 = 0.49720
Probability < 30 = 0.5 – 0.49720 = 0.00280
For > 90:
Probability between the mean and 90 = 0.15542 + 0.22756 + 0.09317 = 0.47615
Probability > 90 = 0.5 – 0.47615 = 0.02385
Then, the expected frequencies can be obtained by multiplying each expected probability by the total
frequency (150) as follows:
Grade     Probability   Expected freq   Expected freq after
                        (fe = npi)      combining categories
< 30      0.00280        0.4200
30-40     0.02105        3.1575
40-50     0.09317       13.9755         17.553
50-60     0.22756       34.1340         34.134
60-70     0.31084       46.6260         46.626
70-80     0.22756       34.1340         34.134
80-90     0.09317       13.9755         17.553
> 90      0.02385        3.5775
Since the < 30, 30-40 and > 90 categories have values of less than 5, they must be combined with the
adjacent categories. As a result, the < 30 and 30-40 categories become part of the 40 – 50 category; and
the > 90 category becomes part of 80-90 category.
Grade     Probability   Expected freq
                        (fe = npi)
40-50     0.11702       17.553
50-60     0.22756       34.134
60-70     0.31084       46.626
70-80     0.22756       34.134
80-90     0.11702       17.553
The value of the chi-square can then be computed.
Grade     Observed    Probability   Expected freq   (fo-fe)²   (fo-fe)²/fe
          freq (fo)                 (fe = npi)
40-50     21          0.11702       17.5530         11.8818    0.6769
50-60     29          0.22756       34.1340         26.3580    0.7722
60-70     49          0.31084       46.6260          5.6359    0.1209
70-80     33          0.22756       34.1340          1.2860    0.0377
80-90     18          0.11702       17.5530          0.1998    0.0114
χ² = Σ (fo-fe)²/fe = 1.6190
4. Do not reject Ho, because 1.6190 is less than 7.81. The grades are consistent with a normal distribution
with a mean of 65, so the sample gives no evidence that the instructor has failed to achieve the objective.