Chi-Square Distribution Analysis Guide

The document discusses Chi-square distributions, highlighting their characteristics, applications, and the process for conducting Chi-square tests for independence and goodness-of-fit tests. It explains how to analyze data using contingency tables to determine relationships between variables, as well as the implications of small expected frequencies. Additionally, it outlines the null and alternative hypotheses for various tests and provides examples of practical applications in different scenarios.

Uploaded by

seadkelil45
Copyright
© All Rights Reserved
CHAPTER FOUR

CHI-SQUARE DISTRIBUTIONS
A chi-square (χ²) distribution is a continuous distribution ordinarily derived as the sampling distribution
of a sum of squares of independent standard normal variables.
Characteristics of the chi-square distribution
1. It is a continuous distribution.
2. The χ² distribution has a single parameter: the degrees of freedom, ν.
3. The mean of the chi-square distribution is ν.
4. The variance of the chi-square distribution is 2ν. Thus, both the mean and the variance depend on the
degrees of freedom.
5. The chi-square test statistic is based on a comparison of the observed sample data with the results
expected under the assumption that the null hypothesis is true.
6. It is a positively skewed distribution, and only non-negative values of the variable χ² are possible. It
extends indefinitely in the positive direction. The skewness decreases as ν increases, and as ν increases
without limit the distribution approaches a normal distribution.
7. The area under the curve is 1.0.
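
The mean and variance stated in items 3 and 4 follow from the definition of χ² as a sum of squared standard normal variables. A short derivation is sketched below; the symbols Z_1, ..., Z_ν for the independent standard normal variables are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{align*}
\chi^2_\nu &= \sum_{i=1}^{\nu} Z_i^2, \qquad Z_i \sim N(0,1) \text{ independent} \\
E[Z_i^2] &= \operatorname{Var}(Z_i) + (E[Z_i])^2 = 1
  \;\Longrightarrow\; E[\chi^2_\nu] = \nu \\
\operatorname{Var}(Z_i^2) &= E[Z_i^4] - (E[Z_i^2])^2 = 3 - 1 = 2
  \;\Longrightarrow\; \operatorname{Var}(\chi^2_\nu) = 2\nu
\end{align*}
```

The variance step uses independence of the Z_i, so the variances of the individual squared terms simply add.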

Given these characteristics, the χ² distribution has the following areas of application:
1. Tests for independence between two variables
2. Goodness-of-fit tests (binomial, normal, and Poisson)
3. Tests for the equality of several proportions

4.1.1. TEST FOR THE INDEPENDENCE BETWEEN TWO VARIABLES

A χ² test of independence is used to analyze the frequencies of two variables with multiple categories to
determine whether the two variables are independent. That is, sample data are used to test for the
independence of two variables. The sample data are arranged in a two-way table called a contingency
table. Because the χ² test of independence uses a contingency table, the test is sometimes referred to as
contingency analysis (or the contingency table test). The χ² test is used to analyze, for example, the
following cases:
• Whether employee absenteeism is independent of job classification
• Whether beer preference is independent of gender
• Whether favorite sport is independent of nationality
• Whether type of financial investment is independent of geographic region
The steps and procedures are the same as in hypothesis testing.

Example:
1. A company planning a TV advertising campaign wants to determine which TV shows its target audience
watches, and thereby to learn whether the choice of TV program an individual watches is independent of
the individual's income. The supporting data are shown in the table below. Use a 5% level of significance
to test the null hypothesis.

                     Type of Show
Income      Basketball    Movie    News    Total
Low            143          70      37      250
Medium          90          67      43      200
High            17          13      20       50
Total          250         150     100      500


Solution
1. Ho: The choice of TV program an individual watches is independent of the individual's income.
Ha: Income and choice of TV program are not independent.
2. Decision rule
α = 0.05
ν = (R - 1)(C - 1) [1]
= (3 - 1)(3 - 1)
= 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if the sample χ² is greater than 9.49.


3. Compute the test statistic
In computing the test statistic, the first task is to estimate the expected frequencies, e_ij = (r_i × c_j)/n, where
r_i = observed frequency total for row i
c_j = observed frequency total for column j
n = sample size
e11 = 250 × 250/500 = 125    e21 = 200 × 250/500 = 100    e31 = 50 × 250/500 = 25
e12 = 250 × 150/500 = 75     e22 = 200 × 150/500 = 60     e32 = 50 × 150/500 = 15
e13 = 250 × 100/500 = 50     e23 = 200 × 100/500 = 40     e33 = 50 × 100/500 = 10

A test of the null hypothesis that the variables are independent of one another is based on the magnitudes of
the differences between the observed frequencies and the expected frequencies. Large differences
between o_ij and e_ij provide evidence that the null hypothesis is false. The test is based on the following
chi-square test statistic:

$$\chi^2 = \sum_i \sum_j \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \qquad \text{or} \qquad \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where:
o_ij (f_o) = observed frequency for the contingency table category in row i and column j.
e_ij (f_e) = expected frequency for the contingency table category in row i and column j.

[1] For an R × C contingency table, the degrees of freedom are calculated as (R - 1)(C - 1). The degrees of
freedom refer to the number of expected frequencies that can be chosen freely, provided the row and column
totals of the expected frequencies are identical to the row and column totals of the observed frequency table.

$$\chi^2 = \frac{(143-125)^2}{125} + \frac{(70-75)^2}{75} + \frac{(37-50)^2}{50} + \frac{(90-100)^2}{100} + \frac{(67-60)^2}{60} + \frac{(43-40)^2}{40} + \frac{(17-25)^2}{25} + \frac{(13-15)^2}{15} + \frac{(20-10)^2}{10} = 21.174$$

4. Reject Ho, because 21.174 is greater than 9.49. The choice of TV program is not independent of income
level.
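
The calculation for this example can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the original notes; the function name `chi_square_independence` and the variable names are hypothetical, and the critical value 9.49 is simply copied from the decision rule rather than computed.

```python
def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected table) for an R x C table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency for each cell: (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Income (rows: low, medium, high) by type of show (basketball, movie, news)
tv_table = [[143, 70, 37], [90, 67, 43], [17, 13, 20]]
statistic, dof, expected = chi_square_independence(tv_table)

CRITICAL_VALUE = 9.49  # chi-square table value for alpha = 0.05, 4 d.f. (taken from the text)
reject_h0 = statistic > CRITICAL_VALUE
print(f"chi-square = {statistic:.3f}, d.f. = {dof}, reject Ho: {reject_h0}")
```

The same function applies to any of the contingency tables in this section, since it derives the expected frequencies from the row and column totals alone.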
2. A human resource manager at EAGLE Inc. wanted to know whether the voluntary absence behavior of the
firm's employees was independent of marital status. Data from the employee files on marital status and
voluntary absenteeism for a sample of 500 employees are shown below.

                             Marital Status
Absence behavior    Married   Divorced   Widowed   Single   Total
Often absent           36        16         14        34      100
Seldom absent          64        34         20        82      200
Never absent           50        50         16        84      200
Total                 150       100         50       200      500

Test the hypothesis that absence behavior is independent of marital status at a significance level of 1%.
Solution
1. Ho: Voluntary absence behavior is independent of marital status.
Ha: Voluntary absence behavior and marital status are dependent.
2. α = 0.01
ν = (R - 1)(C - 1)
= (3 - 1)(4 - 1) = 6
χ²(α, ν) = χ²(0.01, 6) = 16.81
Reject Ho if the sample χ² > 16.81
3. Sample χ²

Observed freq (fo)   Expected freq (fe)   (fo - fe)²   (fo - fe)²/fe
       36                   30                36           1.200
       64                   60                16           0.267
       50                   60               100           1.667
       16                   20                16           0.800
       34                   40                36           0.900
       50                   40               100           2.500
       14                   10                16           1.600
       20                   20                 0           0.000
       16                   20                16           0.800
       34                   40                36           0.900
       82                   80                 4           0.050
       84                   80                16           0.200
                                  Σ (fo - fe)²/fe =       10.883

4. Do not reject Ho, because 10.883 is less than 16.81. Voluntary absence and marital status are independent.

3. The personnel administrator of XYZ Company provided the following data as an example of selection
among 40 male and 40 female applicants for 12 open positions.

             Applicant Status
          Selected   Not selected   Total
Male          7           33          40
Female        5           35          40
Total        12           68          80

a. The χ² test of independence was suggested as a way of determining whether the decision to hire 7 males
and 5 females should be interpreted as showing a selection bias in favor of males. Conduct the test of
independence using α = 0.10. What is your conclusion?
b. Using the same test, would the decision to hire 8 males and 4 females suggest concern about a selection
bias?
c. How many males could be hired for the 12 open positions before the procedure would raise concern
about a selection bias?

Solution
a.
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent.)
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent.)
2. α = 0.10
ν = (R - 1)(C - 1)
= (2 - 1)(2 - 1) = 1
χ²(α, ν) = χ²(0.10, 1) = 2.71
Reject Ho if the sample χ² > 2.71
3. Sample χ²

Observed freq (fo)   Expected freq (fe)   (fo - fe)²   (fo - fe)²/fe
        7                    6                 1           0.1667
       33                   34                 1           0.0294
        5                    6                 1           0.1667
       35                   34                 1           0.0294
                                  Σ (fo - fe)²/fe =        0.3922

4. Do not reject Ho, because 0.392 is less than 2.71. There is no evidence of selection bias in favor of male
applicants.
b.
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent.)
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent.)
2. α = 0.10
ν = (R - 1)(C - 1)
= (2 - 1)(2 - 1) = 1
χ²(α, ν) = χ²(0.10, 1) = 2.71
Reject Ho if the sample χ² > 2.71
3. Sample χ²

Observed freq (fo)   Expected freq (fe)   (fo - fe)²   (fo - fe)²/fe
        8                    6                 4           0.6667
       32                   34                 4           0.1176
        4                    6                 4           0.6667
       36                   34                 4           0.1176
                                  Σ (fo - fe)²/fe =        1.5686

4. Do not reject Ho, because 1.569 is less than 2.71. There is no evidence of selection bias in favor of male
applicants.
c. There is no shortcut method to answer this question. Therefore, we proceed by trial, increasing the number
of male applicants who are accepted and decreasing the number of female applicants who are accepted.
With 9 males and 3 females hired:
1. Ho: There is no selection bias in favor of males. (Selection status and gender of the applicant are
independent.)
Ha: There is selection bias in favor of males. (Selection status and gender of the applicant are not
independent.)
2. α = 0.10
ν = (R - 1)(C - 1)
= (2 - 1)(2 - 1) = 1
χ²(α, ν) = χ²(0.10, 1) = 2.71
Reject Ho if the sample χ² > 2.71
3. Sample χ²

Observed freq (fo)   Expected freq (fe)   (fo - fe)²   (fo - fe)²/fe
        9                    6                 9           1.5000
       31                   34                 9           0.2647
        3                    6                 9           1.5000
       37                   34                 9           0.2647
                                  Σ (fo - fe)²/fe =        3.5294

4. Reject Ho, because 3.5294 is greater than 2.71. Hiring 9 males and 3 females would raise concern about a
selection bias in favor of males.

Therefore, since hiring 8 males and 4 females gives χ² = 1.5686, which is below 2.71, at most 8 males can be
hired for the 12 open positions before the test indicates a selection bias in favor of males.
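
The trial-and-error search in part (c) can be automated. The sketch below is an illustration only: the helper `selection_chi_square` and its parameter names are hypothetical, and the critical value 2.71 is copied from the decision rule above rather than computed.

```python
CRITICAL_VALUE = 2.71  # chi-square table value for alpha = 0.10, 1 d.f. (taken from the text)


def selection_chi_square(males_hired, positions=12, males=40, females=40):
    """Chi-square statistic for the 2 x 2 table (gender by selected / not selected)."""
    females_hired = positions - males_hired
    observed = [
        [males_hired, males - males_hired],
        [females_hired, females - females_hired],
    ]
    n = males + females
    row_totals = [males, females]
    col_totals = [positions, n - positions]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Try every split that hires at least as many males as females.
results = {m: selection_chi_square(m) for m in range(6, 13)}
first_rejected = min(m for m, stat in results.items() if stat > CRITICAL_VALUE)

for m, stat in results.items():
    print(f"{m} males hired: chi-square = {stat:.4f}")
print(f"Ho is first rejected at {first_rejected} males hired")
```

Because the statistic grows as the split moves away from 6 and 6, the first count that exceeds the critical value marks the threshold, and one less than that is the largest number of males that does not trigger rejection.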

The chi-square test for independence helps determine whether a relationship exists between two variables,
but it does not enable us to estimate or predict the values of one variable from the value of the other. If a
dependence is found between two quantitative variables, the techniques of regression analysis are useful in
finding a mathematical formula that expresses the nature of the relationship.

Small expected frequencies can lead to inordinately large chi-square values in the chi-square test of
independence. Hence the test should not be applied to contingency tables with expected cell frequencies of
less than 5. One way to avoid small expected values is to combine columns or rows whenever possible and
whenever doing so makes sense.

4.1.2. GOODNESS-OF-FIT TESTS (BINOMIAL, NORMAL, POISSON)

The chi-square test is widely used for a variety of analyses. One of its more important uses is the
goodness-of-fit test. That is, it can be used to decide whether a particular probability distribution, such as
the binomial, Poisson, or normal, is the appropriate distribution. This is an important ability because, as
decision makers using statistics, we need to choose a probability distribution to represent the distribution
of the data we happen to be considering.

In tests of hypothesis (Chapter 5), we assumed that the population was normal and tested hypotheses such as
μ = μ0, p = p0, etc. But what if we want to check the assumption of normality itself? The multinomial χ²
goodness-of-fit test can be applied.

The null hypothesis for a goodness-of-fit test is that the distribution of the population from which a
sample is taken is the one specified. The alternative hypothesis is that the actual distribution is not the
specified distribution. Generally, a researcher specifies only the name of the distribution and uses the
sample data to estimate the particular parameters of the distribution. In this situation one degree of
freedom is lost for each parameter that has to be estimated. However, if the researcher completely
specifies the distribution, including parameter values, then no additional degrees of freedom are lost. With
K categories and m estimated parameters, the degrees of freedom are ν = K - 1 - m.

Null hypothesis                                   Parameters to be estimated   Degrees of freedom lost
Ho: Population is normal                          μ, σ                         2
Ho: Population is normal with μ = x               σ                            1
Ho: Population is normal with σ = y               μ                            1
Ho: Population is normal with μ = x, σ = y        None                         0
Ho: Population is Poisson                         λ                            1
Ho: Population is Poisson with λ = z              None                         0
Ho: Population is binomial with p = b             None                         0

Example (Binomial)
1. Mrs. Tsion, saleswoman for MOON Paper Company, has five accounts to visit per day. It is suggested
that sales by Mrs. Tsion may be described by the binomial distribution, with the probability of selling
to each account being 0.4. Given the following frequency distribution of Mrs. Tsion's number of sales per
day, can we conclude that the data do in fact follow the binomial distribution? Use the 0.05 significance
level.

No. of sales per day    0    1    2    3    4    5
Frequency              10   41   60   20    6    3

Solution
1. Ho: The frequency distribution is binomial with n = 5 and p = 0.4
Ha: The frequency distribution is not binomial with n = 5 and p = 0.4
2. α = 0.05
ν = K - 1 - m = 5 - 1 - 0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if the sample χ² is greater than 9.49
3. Sample χ² (the total number of days observed is n = 140; categories 4 and 5 are combined so that no
expected frequency is below 5)

No. of sales   Prob. with        Observed    Expected freq
per day        n = 5, p = 0.4    freq (fo)   (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
0              0.0778            10          10.892            0.7957       0.0731
1              0.2592            41          36.288           22.2029       0.6119
2              0.3456            60          48.384          134.9315       2.7888
3              0.2304            20          32.256          150.2095       4.6567
4 & 5          0.0870             9          12.180           10.1124       0.8302
                                                   Σ (fo - fe)²/fe =        8.9607

4. Do not reject Ho, because 8.9607 is less than 9.49. The data are well described by the binomial
distribution with n = 5 and p = 0.4.
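
The binomial goodness-of-fit calculation can be reproduced programmatically. This is a minimal sketch under stated assumptions: the helper names `binomial_pmf` and `goodness_of_fit` are hypothetical, and the code uses unrounded binomial probabilities, so its result differs from the tabulated 8.9607 in the third decimal place.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def goodness_of_fit(observed, probabilities):
    """Return (chi-square statistic, expected frequencies)."""
    total = sum(observed)
    expected = [total * p for p in probabilities]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


sales_frequencies = [10, 41, 60, 20, 6, 3]           # days with 0..5 sales
pmf = [binomial_pmf(k, 5, 0.4) for k in range(6)]

# Combine the "4" and "5" categories, as in the worked table,
# because the expected frequency for 5 sales alone is below 5.
observed = sales_frequencies[:4] + [sum(sales_frequencies[4:])]
probabilities = pmf[:4] + [sum(pmf[4:])]

statistic, expected = goodness_of_fit(observed, probabilities)
dof = len(observed) - 1 - 0                          # no parameters estimated
print(f"chi-square = {statistic:.4f}, d.f. = {dof}")
```

Swapping in the observed frequencies of another data set with the same n and p is enough to rerun the test.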

2. A professional baseball player, Philippos, was at bat five times in each of 100 games. Philippos claims
that he has a probability of 0.4 of getting a hit each time he goes to bat. Test his claim at the 0.05 level
by seeing whether the following data are distributed binomially.

No. of hits per game                       0    1    2    3    4    5
No. of games with that number of hits     12   38   27   17    5    1

Solution
1. Ho: The frequency distribution is binomial with n = 5, p = 0.4
Ha: The frequency distribution is not binomial with n = 5, p = 0.4
2. α = 0.05
ν = K - 1 - m = 5 - 1 - 0 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.49
Reject Ho if the sample χ² > 9.49
3. Sample χ²

No. of hits   No. of games with     Prob. with        Expected freq
per game      that no. of hits (fo) n = 5, p = 0.4    (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
0             12                    0.0778             7.78            17.8084       2.2890
1             38                    0.2592            25.92           145.9264       5.6299
2             27                    0.3456            34.56            57.1536       1.6538
3             17                    0.2304            23.04            36.4816       1.5834
4 & 5          6                    0.0870             8.70             7.2900       0.8379
                                                            Σ (fo - fe)²/fe =       11.9940

4. Reject Ho, because 11.994 is greater than 9.49. The number of hits per game is not binomially distributed
with n = 5 and p = 0.4.

3. The Ethiopian postal service is interested in modeling the “mangled letter” problem. It has been
suggested that any letter sent to a certain area has a 0.15 chance of being mangled. Since the post office
is so big, it can be assumed that the chances of any two letters being mangled are independent. A sample
of 310 people was selected, and two test letters were mailed to each of them. The number of people receiving
zero, one, or two mangled letters was 260, 40, and 10, respectively. At the 0.10 level of significance, is
it reasonable to conclude that the number of mangled letters received by people follows a binomial
distribution with p = 0.15?
Solution
1. Ho: The number of mangled letters received by people follows a binomial distribution with n = 2,
p = 0.15.
Ha: The number of mangled letters received by people does not follow a binomial distribution with
n = 2, p = 0.15.
2. α = 0.10
ν = K - 1 - m = 3 - 1 - 0 = 2
χ²(α, ν) = χ²(0.10, 2) = 4.61
Reject Ho if the sample χ² > 4.61
3. Sample χ²

No. of mangled   Observed     Prob. with         Expected freq
letters          freq (fo)    n = 2, p = 0.15    (fe = n·pi)     (fo - fe)²    (fo - fe)²/fe
0                260          0.7225             223.9750        1297.8006       5.7944
1                 40          0.2550              79.0500        1524.9025      19.2904
2                 10          0.0225               6.9750           9.1506       1.3119
                                                       Σ (fo - fe)²/fe =        26.3967

4. Reject Ho, because 26.3967 is greater than 4.61. The number of mangled letters received is not binomially
distributed with n = 2 and p = 0.15.

Example (Poisson)
1. It is hypothesized that the number of breakdowns per month of a computer system at a major university
follows a Poisson distribution with μ = 2. The data below show the observed number of breakdowns per
month during a sample of 100 months. Use a 5% level of significance to test the null hypothesis.

Breakdowns        0    1    2    3    4    5 and above
Observed freq.   14   20   34   22    5    5
Solution
1. Ho: The population distribution of breakdowns is Poisson with μ = 2.
Ha: The population distribution of breakdowns is not Poisson with μ = 2.
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 0 = 5
χ²(α, ν) = χ²(0.05, 5) = 11.07
Reject Ho if the sample χ² > 11.07
3. Sample χ²

Breakdowns   Observed     Prob. with   Expected freq
             freq (fo)    μ = 2        (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
0            14           0.1353       13.53             0.2209       0.0163
1            20           0.2707       27.07            49.9849       1.8465
2            34           0.2707       27.07            48.0249       1.7741
3            22           0.1804       18.04            15.6816       0.8693
4             5           0.0902        9.02            16.1604       1.7916
5 or more     5           0.0527        5.27             0.0729       0.0138
                                             Σ (fo - fe)²/fe =        6.3117

4. Do not reject Ho, because 6.3117 is less than 11.07. The number of breakdowns per month of the computer
system at the university follows a Poisson distribution with μ = 2.

2. Suppose that a teller supervisor believes that the distribution of random arrivals at a local bank is
Poisson and sets out to test this hypothesis by gathering information. The following data represent a
frequency distribution of arrivals during one-minute intervals at a bank. Use α = 0.05 to test these data
in an effort to determine whether they are Poisson distributed.

No. of arrivals    0    1    2    3    4    5 and above
Observed freq.     7   18   25   17   12    5

Solution
Before solving the question, we must first estimate the arrival rate per minute from the sample data, and
hence one degree of freedom is lost. Treating the "5 and above" class as 5 arrivals:

$$\lambda = \frac{\sum (\text{number of arrivals} \times \text{observed frequency})}{\sum \text{observed frequency}} = \frac{0(7) + 1(18) + 2(25) + 3(17) + 4(12) + 5(5)}{84} = \frac{192}{84} \approx 2.3 \text{ customers/min}$$

1. Ho: The arrival of customers at the bank is Poisson distributed with λ = 2.3
Ha: The arrival of customers at the bank is not Poisson distributed with λ = 2.3
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 1 = 4
χ²(α, ν) = χ²(0.05, 4) = 9.488
Reject Ho if the sample χ² > 9.488
3. Sample χ²

Number of    Observed     Prob. with   Expected freq
arrivals     freq (fo)    λ = 2.3      (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
0             7           0.1003        8.4252           2.0312       0.2411
1            18           0.2306       19.3704           1.8778       0.0969
2            25           0.2652       22.2768           7.4158       0.3329
3            17           0.2033       17.0772           0.0060       0.0003
4            12           0.1169        9.8196           4.7541       0.4841
5 or more     5           0.0837        7.0308           4.1241       0.5866
                                             Σ (fo - fe)²/fe =        1.7419

4. Do not reject Ho, because 1.7419 is less than 9.488. The arrival of customers at the bank follows a
Poisson distribution with λ = 2.3.
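
When the Poisson rate is estimated from the sample, the test changes in two places: the rate is computed first, and one extra degree of freedom is subtracted. The sketch below illustrates both steps for the bank data. It is an assumption-laden illustration, not the author's procedure: the helper `poisson_pmf` is hypothetical, the open-ended "5 and above" class is counted as 5 when estimating λ (as in the worked solution), and λ is rounded to one decimal to match the text.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean rate is lam."""
    return exp(-lam) * lam**k / factorial(k)


arrivals = [0, 1, 2, 3, 4, 5]            # last class stands for "5 and above"
observed = [7, 18, 25, 17, 12, 5]
n = sum(observed)

# Estimate lambda from the data, rounded to one decimal as in the worked solution.
lam_hat = round(sum(k * f for k, f in zip(arrivals, observed)) / n, 1)

# Exact probabilities for 0..4; the last class takes the remaining upper tail.
probabilities = [poisson_pmf(k, lam_hat) for k in arrivals[:-1]]
probabilities.append(1 - sum(probabilities))

expected = [n * p for p in probabilities]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

estimated_parameters = 1                 # lambda was estimated from the sample
dof = len(observed) - 1 - estimated_parameters
print(f"lambda = {lam_hat}, chi-square = {statistic:.4f}, d.f. = {dof}")
```

Assigning the whole upper tail to the last class keeps the probabilities summing to 1, which is what the "5 or more" row in the table does.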

3. The number of automobile accidents occurring per day in a particular city is believed to have a Poisson
distribution. A sample of 80 days during the past year gives the data shown below. Do the data support
the belief that the number of accidents per day has a Poisson distribution? Use α = 0.05.

No. of accidents          0    1    2    3    4
Observed freq. (days)    34   25   11    7    3

Solution
Before solving the question, we must first estimate the occurrence rate per day from the sample data, and
hence one degree of freedom is lost.

$$\lambda = \frac{\sum (\text{number of accidents} \times \text{observed frequency})}{\sum \text{observed frequency}} = \frac{0(34) + 1(25) + 2(11) + 3(7) + 4(3)}{80} = \frac{80}{80} = 1 \text{ accident/day}$$

1. Ho: The occurrence of accidents per day follows a Poisson distribution with λ = 1.0
Ha: The occurrence of accidents per day does not follow a Poisson distribution with λ = 1.0
2. α = 0.05
ν = K - 1 - m = 4 - 1 - 1 = 2
χ²(α, ν) = χ²(0.05, 2) = 5.99
Reject Ho if the sample χ² > 5.99
3. Sample χ² (the categories 3 and 4 are combined into "3 or more")

Number of    Observed     Prob. with   Expected freq
accidents    freq (fo)    λ = 1.0      (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
0            34           0.3679       29.4320          20.8666       0.7090
1            25           0.3679       29.4320          19.6426       0.6674
2            11           0.1839       14.7120          13.7789       0.9366
3 or more    10           0.0803        6.4240          12.7878       1.9906
                                             Σ (fo - fe)²/fe =        4.3036

4. Do not reject Ho, because 4.3036 is less than 5.99. The occurrence of accidents per day follows a Poisson
distribution with λ = 1.0.

Example (Normal)
1. Suppose that Ato Paulos developed an overall attitude scale to determine how his company's employees
feel toward their company. In theory the scores can vary from 0 to 50. Ato Paulos pretests his
measurement instrument on a randomly selected group of 100 employees. He tallies the scores and
summarizes them into six categories as shown below. Are these pretest scores approximately normally
distributed with μ = 24.9 and σ = 7.194? Use α = 0.05.

Score category   10-15   15-20   20-25   25-30   30-35   35-40
Frequency          11      14      24      28      13      10

Solution
1. Ho: The attitude scores are normally distributed with μ = 24.9 and σ = 7.194
Ha: The attitude scores are not normally distributed with μ = 24.9 and σ = 7.194
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 0 = 5
χ²(α, ν) = χ²(0.05, 5) = 11.07
Reject Ho if the sample χ² > 11.07
3. Sample χ²
With $Z = \frac{X - \mu}{\sigma}$, the expected probability of each category can be obtained as follows. Each
area is the normal-table area between the mean and z. When both limits lie on the same side of the mean the
two areas are subtracted; when they lie on opposite sides they are added.

Category   z at lower limit               Area      z at upper limit               Area      Expected probability
10-15      (10 - 24.9)/7.194 = -2.07      0.48077   (15 - 24.9)/7.194 = -1.38      0.41621   0.48077 - 0.41621 = 0.06456
15-20      (15 - 24.9)/7.194 = -1.38      0.41621   (20 - 24.9)/7.194 = -0.68      0.25175   0.41621 - 0.25175 = 0.16446
20-25      (20 - 24.9)/7.194 = -0.68      0.25175   (25 - 24.9)/7.194 =  0.01      0.00399   0.25175 + 0.00399 = 0.25574
25-30      (25 - 24.9)/7.194 =  0.01      0.00399   (30 - 24.9)/7.194 =  0.71      0.26115   0.26115 - 0.00399 = 0.25716
30-35      (30 - 24.9)/7.194 =  0.71      0.26115   (35 - 24.9)/7.194 =  1.40      0.41924   0.41924 - 0.26115 = 0.15809
35-40      (35 - 24.9)/7.194 =  1.40      0.41924   (40 - 24.9)/7.194 =  2.10      0.48214   0.48214 - 0.41924 = 0.06290

The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, getting a score less than 10 or greater than 40 was also possible. Because half of the
probability lies on each side of the mean of a normal distribution, we can use the sum of the expected
probabilities on each side of the mean, 24.9, to obtain the probability of the < 10 category:
0.5 - (0.06456 + 0.16446 + 0.25175) = 0.01923. Similarly, the probability of the > 40 category is
0.5 - (0.00399 + 0.25716 + 0.15809 + 0.06290) = 0.01786. Expected frequencies can then be obtained by
multiplying each expected probability by the total frequency (100), as shown below.

Score category   Probability   Expected freq (fe = n·pi)   Combined expected freq
< 10             0.01923         1.923
10 - 15          0.06456         6.456                        8.379
15 - 20          0.16446        16.446                       16.446
20 - 25          0.25574        25.574                       25.574
25 - 30          0.25716        25.716                       25.716
30 - 35          0.15809        15.809                       15.809
35 - 40          0.06290         6.290                        8.076
> 40             0.01786         1.786

As the < 10 and > 40 categories have expected frequencies of less than 5, each must be combined with the
adjacent category. As a result, the < 10 category becomes part of the 10 - 15 category and the > 40 category
becomes part of the 35 - 40 category.

Score category   Probability   Expected freq (fe = n·pi)
10 - 15          0.08379         8.379
15 - 20          0.16446        16.446
20 - 25          0.25574        25.574
25 - 30          0.25716        25.716
30 - 35          0.15809        15.809
35 - 40          0.08076         8.076

The value of the chi-square can then be computed.

Score       Observed                   Expected freq
category    freq (fo)    Probability   (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
10 - 15     11           0.08379        8.379            6.8696       0.8199
15 - 20     14           0.16446       16.446            5.9829       0.3638
20 - 25     24           0.25574       25.574            2.4775       0.0969
25 - 30     28           0.25716       25.716            5.2167       0.2029
30 - 35     13           0.15809       15.809            7.8905       0.4991
35 - 40     10           0.08076        8.076            3.7018       0.4584
                                             Σ (fo - fe)²/fe =        2.4409

4. Do not reject Ho, because 2.4409 is less than 11.07. The attitude scores are normally distributed with
mean 24.9 and standard deviation 7.194.
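
The normal goodness-of-fit steps above (convert class limits to probabilities, fold the tails into the end classes, then compute χ²) can be sketched in code. This is an illustration under stated assumptions: the helper `normal_cdf` is hypothetical, and it evaluates the normal distribution exactly through the error function instead of rounding z to two decimals and reading a table, so the statistic comes out close to, but not identical to, the tabulated 2.4409.

```python
from math import erf, inf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


mu, sigma = 24.9, 7.194                     # fully specified by Ho, so m = 0
observed = [11, 14, 24, 28, 13, 10]         # classes 10-15, 15-20, ..., 35-40
n = sum(observed)

# Open-ended first and last classes absorb the "< 10" and "> 40" tails,
# which is equivalent to combining them with the adjacent categories.
edges = [-inf, 15, 20, 25, 30, 35, inf]
cdf_values = [normal_cdf(x, mu, sigma) for x in edges]
probabilities = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]

expected = [n * p for p in probabilities]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1 - 0

CRITICAL_VALUE = 11.07                      # table value for alpha = 0.05, 5 d.f. (from the text)
print(f"chi-square = {statistic:.4f}, d.f. = {dof}, reject Ho: {statistic > CRITICAL_VALUE}")
```

If the mean or standard deviation were estimated from the sample instead of specified, the only change to the test would be subtracting the number of estimated parameters when computing `dof`.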

2. The director of a major soccer team believes that the ages of purchasers of game tickets are normally
distributed. If the following data represent the distribution of ages for a sample of observed purchasers
of major soccer game tickets, use the chi-square goodness-of-fit test to determine whether this
distribution is significantly different from the normal distribution. Assume that α = 0.05.

Age of purchaser   10-20   20-30   30-40   40-50   50-60   60-70
Frequency            16      44      61      56      35      19

Solution
1. Ho: The ages of purchasers of soccer game tickets are normally distributed.
Ha: The ages of purchasers of soccer game tickets are not normally distributed.
2. α = 0.05
ν = K - 1 - m = 6 - 1 - 2 = 3 (both the mean and the standard deviation are estimated from the sample)
χ²(α, ν) = χ²(0.05, 3) = 7.81
Reject Ho if the sample χ² > 7.81
3. Sample χ²

Age category   Observed freq (f)   Midpoint (m)   fm       fm²
10-20           16                 15               240       3,600
20-30           44                 25             1,100      27,500
30-40           61                 35             2,135      74,725
40-50           56                 45             2,520     113,400
50-60           35                 55             1,925     105,875
60-70           19                 65             1,235      80,275
Total          231                          Σfm = 9,155   Σfm² = 405,375

$$\bar{X} = \frac{\sum fm}{n} = \frac{9{,}155}{231} = 39.63$$

$$S = \sqrt{\frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}} = \sqrt{\frac{405{,}375 - \frac{9{,}155^2}{231}}{231 - 1}} = 13.60$$

With $Z = \frac{X - \bar{X}}{S}$, the expected probability of each category can be obtained as follows. Each
area is the normal-table area between the mean and z. When both limits lie on the same side of the mean the
two areas are subtracted; when they lie on opposite sides they are added.

Category   z at lower limit              Area      z at upper limit              Area      Expected probability
10-20      (10 - 39.63)/13.6 = -2.18     0.48537   (20 - 39.63)/13.6 = -1.44     0.42507   0.48537 - 0.42507 = 0.06030
20-30      (20 - 39.63)/13.6 = -1.44     0.42507   (30 - 39.63)/13.6 = -0.71     0.26115   0.42507 - 0.26115 = 0.16392
30-40      (30 - 39.63)/13.6 = -0.71     0.26115   (40 - 39.63)/13.6 =  0.03     0.01197   0.26115 + 0.01197 = 0.27312
40-50      (40 - 39.63)/13.6 =  0.03     0.01197   (50 - 39.63)/13.6 =  0.76     0.27637   0.27637 - 0.01197 = 0.26440
50-60      (50 - 39.63)/13.6 =  0.76     0.27637   (60 - 39.63)/13.6 =  1.50     0.43319   0.43319 - 0.27637 = 0.15682
60-70      (60 - 39.63)/13.6 =  1.50     0.43319   (70 - 39.63)/13.6 =  2.23     0.48713   0.48713 - 0.43319 = 0.05394

The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, an age of less than 10 or greater than 70 is also possible.
For < 10:
Probability between 10 and the mean = 0.06030 + 0.16392 + 0.26115 = 0.48537
Probability < 10 = 0.5 - 0.48537 = 0.01463
For > 70:
Probability between the mean and 70 = 0.01197 + 0.26440 + 0.15682 + 0.05394 = 0.48713
Probability > 70 = 0.5 - 0.48713 = 0.01287

Then, the expected frequencies can be obtained by multiplying each expected probability by the total
frequency (231) as follows:

Age category   Probability   Expected freq (fe = n·pi)   Combined expected freq
< 10           0.01463         3.380
10 - 20        0.06030        13.929                       17.309
20 - 30        0.16392        37.866                       37.866
30 - 40        0.27312        63.091                       63.091
40 - 50        0.26440        61.076                       61.076
50 - 60        0.15682        36.225                       36.225
60 - 70        0.05394        12.460                       15.433
> 70           0.01287         2.973

Since the < 10 and > 70 categories have expected frequencies of less than 5, each must be combined with the
adjacent category. As a result, the < 10 category becomes part of the 10 - 20 category and the > 70 category
becomes part of the 60 - 70 category.

Age category   Probability   Expected freq (fe = n·pi)
10 - 20        0.07493        17.309
20 - 30        0.16392        37.866
30 - 40        0.27312        63.091
40 - 50        0.26440        61.076
50 - 60        0.15682        36.225
60 - 70        0.06681        15.433

The value of the chi-square can then be computed.

Age         Observed                   Expected freq
category    freq (fo)    Probability   (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
10 - 20     16           0.07493       17.309            1.7135       0.0990
20 - 30     44           0.16392       37.866           37.6260       0.9937
30 - 40     61           0.27312       63.091            4.3723       0.0693
40 - 50     56           0.26440       61.076           25.7658       0.4219
50 - 60     35           0.15682       36.225            1.5006       0.0414
60 - 70     19           0.06681       15.433           12.7235       0.8244
                                             Σ (fo - fe)²/fe =        2.4497

4. Do not reject Ho, because 2.4497 is less than 7.81. The ages of purchasers of soccer game tickets are
normally distributed.
3. The instructor for an Introductory Statistics course attempts to construct the final examination so that the
grades are normally distributed with a mean of 65. From the sample of grades appearing in the
accompanying frequency distribution table, can you conclude that the instructor has achieved this
objective? Use α = 0.05.

Grade       30-40   40-50   50-60   60-70   70-80   80-90
Frequency     4       17      29      49      33      18

Solution
1. Ho: The grades of students are normally distributed with a mean of 65.
Ha: The grades of students are not normally distributed with a mean of 65.
2. α = 0.05
ν = K - 1 - m = 5 - 1 - 1 = 3 (five categories remain after combining; the standard deviation is estimated
from the sample)
χ²(α, ν) = χ²(0.05, 3) = 7.81
Reject Ho if the sample χ² > 7.81
3. Sample χ²

Grade    Observed freq (f)   Midpoint (m)   fm       fm²
30-40      4                 35               140       4,900
40-50     17                 45               765      34,425
50-60     29                 55             1,595      87,725
60-70     49                 65             3,185     207,025
70-80     33                 75             2,475     185,625
80-90     18                 85             1,530     130,050
Total    150                          Σfm = 9,690   Σfm² = 649,750

$$S = \sqrt{\frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}} = \sqrt{\frac{649{,}750 - \frac{9{,}690^2}{150}}{150 - 1}} = 12.63$$
With $Z = \frac{X - \mu}{S}$ and μ = 65, the expected probability of each category can be obtained as follows.
Each area is the normal-table area between the mean and z. When both limits lie on the same side of the mean
the two areas are subtracted; when they lie on opposite sides they are added.

Category   z at lower limit            Area      z at upper limit            Area      Expected probability
30-40      (30 - 65)/12.63 = -2.77     0.49720   (40 - 65)/12.63 = -1.98     0.47615   0.49720 - 0.47615 = 0.02105
40-50      (40 - 65)/12.63 = -1.98     0.47615   (50 - 65)/12.63 = -1.19     0.38298   0.47615 - 0.38298 = 0.09317
50-60      (50 - 65)/12.63 = -1.19     0.38298   (60 - 65)/12.63 = -0.40     0.15542   0.38298 - 0.15542 = 0.22756
60-70      (60 - 65)/12.63 = -0.40     0.15542   (70 - 65)/12.63 =  0.40     0.15542   0.15542 + 0.15542 = 0.31084
70-80      (70 - 65)/12.63 =  0.40     0.15542   (80 - 65)/12.63 =  1.19     0.38298   0.38298 - 0.15542 = 0.22756
80-90      (80 - 65)/12.63 =  1.19     0.38298   (90 - 65)/12.63 =  1.98     0.47615   0.47615 - 0.38298 = 0.09317

The six probabilities do not sum to 1.00. Even though observed frequencies were obtained only for these
six categories, getting a grade less than 30 or greater than 90 is also possible.
For < 30:
Probability between 30 and the mean = 0.02105 + 0.09317 + 0.22756 + 0.15542 = 0.49720
Probability < 30 = 0.5 - 0.49720 = 0.00280
For > 90:
Probability between the mean and 90 = 0.15542 + 0.22756 + 0.09317 = 0.47615
Probability > 90 = 0.5 - 0.47615 = 0.02385

Then, the expected frequencies can be obtained by multiplying each expected probability by the total
frequency (150) as follows:

Grade     Probability   Expected freq (fe = n·pi)   Combined expected freq
< 30      0.00280         0.4200
30-40     0.02105         3.1575
40-50     0.09317        13.9755                      17.553
50-60     0.22756        34.1340                      34.134
60-70     0.31084        46.6260                      46.626
70-80     0.22756        34.1340                      34.134
80-90     0.09317        13.9755                      17.553
> 90      0.02385         3.5775

Since the < 30, 30-40 and > 90 categories have expected frequencies of less than 5, they must be combined
with the adjacent categories. As a result, the < 30 and 30-40 categories become part of the 40-50 category,
and the > 90 category becomes part of the 80-90 category.
Grade     Probability   Expected freq (fe = n·pi)
40-50     0.11702        17.553
50-60     0.22756        34.134
60-70     0.31084        46.626
70-80     0.22756        34.134
80-90     0.11702        17.553

The value of the chi-square can then be computed. The observed frequency for the combined 40-50 category
is 4 + 17 = 21.

Grade       Observed                   Expected freq
category    freq (fo)    Probability   (fe = n·pi)     (fo - fe)²   (fo - fe)²/fe
40-50       21           0.11702       17.5530          11.8818       0.6769
50-60       29           0.22756       34.1340          26.3580       0.7722
60-70       49           0.31084       46.6260           5.6359       0.1209
70-80       33           0.22756       34.1340           1.2860       0.0377
80-90       18           0.11702       17.5530           0.1998       0.0114
                                             Σ (fo - fe)²/fe =        1.6190

4. Do not reject Ho, because 1.6190 is less than 7.81. Yes, the instructor has achieved the objective: the
grades of students are normally distributed with a mean of 65.
