BIOSTATISTICS-II
HYPOTHESIS TESTING
Introduction, Statistical problem, null and alternative hypothesis, Type-I and Type-II
errors, level of significance, Test statistics, acceptance and rejection regions, general
procedure for testing of hypothesis.
1. A statement about one or more populations is called Hypothesis.
2. The conjecture or supposition that motivates the research is called Research
hypothesis
3. A hypothesis that is stated in such a way that it may be evaluated by
appropriate statistical techniques is called Statistical hypothesis
4. A hypothesis that is to be tested is called the Null hypothesis
5. The hypothesis designated by the symbol H0 is the Null hypothesis
6. A statement of agreement with (or no difference from) conditions presumed to
be true in the population of interest is the Null hypothesis
7. Can we conclude that a certain population mean is not 50? The null
hypothesis is µ = 50
8. Can we conclude that a certain population mean is not 50? The alternate
hypothesis is µ ≠ 50
9. Can we conclude that a certain population mean is greater than 50? The null
hypothesis is µ ≤ 50
10. Can we conclude that a certain population mean is less than 50? The null
hypothesis is µ ≥ 50
11. A statistic that may be computed from the data of the sample in hypothesis
testing is called a Test statistic
12. If the value of the test statistic that we compute from our sample is one of the
values in the rejection region, the decision rule tells us to Reject the null
hypothesis
13. If the value of the test statistic that we compute from our sample is one of the
values in the non-rejection region, the decision rule tells us to Fail to reject
the null hypothesis
14. The area under the curve of the distribution of the test statistic that is above
the values on the horizontal axis constituting the rejection region is Level of
significance
15. The probability of rejecting a true null hypothesis is the Level of significance
16. The more frequently encountered values of α are 0.01, 0.05, and 0.10
17. The error committed when a true null hypothesis is rejected is called Type I
error
18. The error committed when a false null hypothesis is not rejected is called
Type II error
19. The probability that the computed value of a test statistic is at least as
extreme as a specified value of the test statistic when the null hypothesis is
true is the P-value
20. When sampling is from a normally distributed population and the population
variance is known, the test statistic for testing H0: µ = µ0 is the z test
21. When sampling is from a normally distributed population and the population
variance is not known, the test statistic for testing H0: µ = µ0 is the t test
23. The values of the test statistic that separate the rejection and non-rejection
regions are called Critical value
24. A statement of belief used in the evaluation of a population is a Hypothesis
25. A claim that there is no difference between the population mean and the
hypothesized value is Null hypothesis
26. The non-rejection region is denoted by 1 – α
27. The quantity 1 – β is referred to as the Power of the test
29. Is a new drug superior to a standard drug?: One sided test
30. Is there a difference between the cholesterol levels of men and women? Two
tailed test
31. A quantity obtained by applying certain rule or formula is known as Test
Statistics
32. 1-α is the probability of Acceptance Region
33. If we reject the null hypothesis when it is true, we might be making Type-I
Error
34. Which of the following is an assumption underlying the use of the t-
distributions? The samples are drawn from a normally distributed
population
35. The condition for applying the Central Limit Theorem (CLT), which approximates
the sampling distribution of the mean with a normal distribution, is n > 30
36. The critical value of a test statistic is determined from The sampling
distribution of the statistic assuming the Null Hypothesis is true
37. Which of the following symbols represents a population parameter? µ
38. Which of the following statements sounds like a null hypothesis? There is no
difference between male and female incomes in the population
39. Which of the following is the researcher usually interested in supporting when
he or she is engaging in hypothesis testing? The alternative hypothesis
40. When p<0.05 is reported in a journal article that you read for an observed
relationship, it means that the author has Rejected the null hypothesis
41. Identify which of the following steps would NOT be included in hypothesis
testing Eliminate all outliers
42. The cutoff the researcher uses to decide whether to reject the null hypothesis
is called the Significance level (alpha level)
43. Which of the following statements is/are true according to the logic of
hypothesis testing? Both b and c are true
44. The “equals” sign (=) is included in which hypothesis when conducting
hypothesis testing? Null
45. A result is called “statistically significant” whenever The p-value is less than
or equal to the significance level.
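The general procedure in this section (state H0 and Ha, fix α, compute the test statistic, then compare the p-value with α) can be sketched in Python. This is only an illustration under stated assumptions: the function name z_test and the sample figures (sample mean 52, σ = 6, n = 36) are made up for the example; only the hypothesis µ = 50 comes from the questions above.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z test for a mean when the population sigma is known.

    tail: "two" (Ha: mu != mu0), "upper" (Ha: mu > mu0), "lower" (Ha: mu < mu0).
    Returns (z, p_value, reject_h0).
    """
    std_normal = NormalDist()              # standard normal: mean 0, sd 1
    z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    # Decision rule: reject H0 when the p-value is at most alpha
    return z, p_value, p_value <= alpha


# Hypothetical data: H0: mu = 50 versus Ha: mu != 50
z, p_value, reject = z_test(xbar=52, mu0=50, sigma=6, n=36, alpha=0.05)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these made-up numbers H0 is rejected at α = 0.05 but would not be rejected at α = 0.01, which is why α is chosen before the data are examined.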
TESTING OF HYPOTHESIS- SINGLE POPULATION
Introduction, testing of hypothesis and confidence interval about the population mean
and proportion for small and large samples.
1. What is the Z-value associated with a 95% confidence interval? 1.96
2. When estimating the population mean with a small sample, the student t
distribution may be used with how many degrees of freedom? n-1
3. If the p-value is greater than alpha in a two-tail test, what conclusion should
you draw? The null hypothesis should not be rejected.
4. If the p-value is less than alpha in a one-tail test, what conclusion should you
draw? The null hypothesis should be rejected.
5. A p-value > 0.05 means the Result is not significant (at α = 0.05)
6. When sampling is from a normally distributed population and the population
variance is known, the test statistic for testing H0: µ = µ0 is Z test
7. When sampling is from a normally distributed population and the population
variance is unknown, the test statistic for testing H0: µ = µ0 is T test
8. When sampling is from a population that is not normally distributed, the test
for H0 is a Non-parametric test
9. σ/√n is Standard error of x̄
10. H0: µ = µ0 where µ0 is Hypothesized value
11. If a one-tail Z test for a single population mean is performed, what conclusion
should be drawn if p = 0.03? Result is significant at α = 0.05
12. If you were performing a two-tail z test for a single population mean, what
would the critical z value be if alpha was chosen as 5%? ±1.96
13. If you were performing a two-tail z test for a single population mean, what
would the critical z value be if alpha was chosen as 1%? ± 2.5758
14. If you were performing a one-tail z test for a single population mean, what
would the critical z value be if alpha was chosen as 5%? +1.6449 (upper tail)
or -1.6449 (lower tail)
15. If you were performing a one-tail z test for a single population mean, what
would the critical z value be if alpha was chosen as 10%? +1.2816 (upper tail)
or -1.2816 (lower tail)
16. When testing a null hypothesis by means of a two-sided confidence interval,
we reject H0 at the α level of significance if Hypothesized parameter is not
contained within the 100(1- α) percent confidence interval
17. When testing a null hypothesis by means of a two-sided confidence interval,
we cannot reject H0 at the α level of significance if Hypothesized parameter
is contained within the 100(1- α) percent confidence interval
18. What does it mean when you calculate a 95% confidence interval? (The
process you used will capture the true parameter 95% of the time in the long
run / You can be “95% confident” that your interval will include the population
parameter / You can be “5% confident” that your interval will not include the
population parameter) All of the above statements are true
19. What would happen (other things equal) to a confidence interval if you
calculated a 99 percent confidence interval rather than a 95 percent
confidence interval? It will become wider
20. As a general rule, researchers tend to use ____ percent confidence intervals
95%
21. _________ are the values that mark the boundaries of the confidence interval
Confidence limits
22. A ________ is a range of numbers inferred from the sample that has a certain
probability of including the population parameter over the long run.
Confidence interval
23. As sample size goes up, what tends to happen to 95% confidence intervals?
They become more precise and narrower
24. For a paired comparison (t-test) with 40 participants, the appropriate Df is 39
25. What is the correct decision in a hypothesis if the data produce a t-statistic
that is in the critical region? Reject H0
26. What effect would increasing the sample size have on a confidence interval?
The confidence interval would decrease in size.
27. The general format for a confidence interval is Point estimate ± (critical
value)(standard error).
28. Which of the following will increase the width of a confidence interval
(assuming that everything else remains constant)? Decreasing the sample
size
29. In a situation where the population standard deviation is known and we wish
to estimate the population mean with 90 percent confidence, what is the
appropriate critical value to use? z = 1.645
30. If a decision maker wishes to reduce the margin of error associated with a
confidence interval estimate for a population mean, she can Increase the
sample size
31. When small samples are used to estimate a population mean, in cases where
the population standard deviation is unknown The t-distribution must be
used to obtain the critical value.
32. If an economist wishes to determine whether there is evidence that average
family income in a community exceeds $25,000, the best null hypothesis is μ
≤ 25,000
33. If the p value is less than α in a two-tailed test, The null hypothesis should
be rejected.
34. A hypothesis test is to be conducted using an alpha = .05 level. This means
There is a maximum 5 percent chance that a true null hypothesis will be
rejected.
35. The reason for using the t-distribution in a hypothesis test about the
population mean is The population standard deviation is unknown
36. ABC Food Company believes that it has a market share of 25%. The
appropriate null and alternate hypotheses are: H0: p = 0.25 versus Ha: p ≠ 0.25
37. The one-sample z statistic is used instead of the one-sample t statistic when
______. σ is known
38. Which of the following is an option for a null hypothesis? H0: µ = k
39. Which of the following is the first step in hypothesis testing? Developing a
null and alternative hypothesis
40. A statistics instructor believes that fewer than 20% of ABC College students
attended the opening night midnight showing of the latest Harry Potter movie.
She surveys 84 of her students and finds that 11 attended the midnight
showing. An appropriate alternative hypothesis is p<0.20
41. Of a random sample of 75 health insurance firms, 50 have reported an
increase in profit last year. Of a second random sample of 75 car insurance
firms 44 have reported an increase in profit in the same period. You want to
test the hypothesis that proportion of health insurance firms (PH) reporting an
increase in profit is not different from that of the other (PC). The alternative
hypothesis (𝐻1) for the test to be performed is 𝐻1: 𝑃𝐻 − 𝑃𝐶 ≠ 0
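The critical z values quoted in this section, and the one-proportion test from the midnight-showing question (11 of 84 students, H0: p = 0.20 versus Ha: p < 0.20), can be checked with a short Python sketch. It relies on the normal approximation to the binomial, and α = 0.05 is an assumption because the question does not state a significance level; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Critical z values: a two-tail test puts alpha/2 in each tail, a one-tail test puts alpha in one
z_two_tail_05 = std_normal.inv_cdf(1 - 0.05 / 2)   # about 1.96
z_two_tail_01 = std_normal.inv_cdf(1 - 0.01 / 2)   # about 2.5758
z_one_tail_05 = std_normal.inv_cdf(1 - 0.05)       # about 1.6449
z_one_tail_10 = std_normal.inv_cdf(1 - 0.10)       # about 1.2816

# One-proportion z test: H0: p = 0.20 versus Ha: p < 0.20
n, x, p0 = 84, 11, 0.20
alpha = 0.05                           # assumed; not given in the question
p_hat = x / n                          # sample proportion
se_h0 = sqrt(p0 * (1 - p0) / n)        # standard error computed under H0
z = (p_hat - p0) / se_h0               # test statistic
p_value = std_normal.cdf(z)            # lower-tail p-value
reject_h0 = p_value <= alpha

# 95% confidence interval: point estimate +/- (critical value)(standard error)
se_hat = sqrt(p_hat * (1 - p_hat) / n)
ci_95 = (p_hat - z_two_tail_05 * se_hat, p_hat + z_two_tail_05 * se_hat)

print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
print(f"95% CI for p: ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```

The standard error for the test uses the hypothesized proportion p0, while the confidence interval uses the sample proportion, which is the usual convention for large-sample proportion inference.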
TESTING OF HYPOTHESES-TWO OR MORE POPULATIONS
Introduction, Testing of hypothesis and confidence intervals about the difference of
population means and proportions for small and large samples, Analysis of Variance
and ANOVA Table.
1. H0: µ1 - µ2 = 0, Ha: µ1 - µ2 ≠ 0 is an example of Difference between two
population means of independent samples
2. When two independent simple random samples have been drawn from
normally distributed populations with unknown but equal variances, the
degrees of freedom are n1 + n2 - 2
3. When each of two independent simple random samples has been drawn from
a normally distributed population and the two populations have equal but
unknown variances, s²p is the Pooled sample variance
4. Two random samples have sizes of n=49 and n=36 respectively. Which of the
following is true for a 95% confidence interval? The confidence interval for
the sample of n=49 is narrower.
5. The following are fat (mg) found in 5 samples of each of two brands of baby
food:
A: 5.7, 4.5, 6.2, 6.3, 7.3
B: 6.3, 5.7, 5.9, 6.4, 5.1
Which of the following procedures is appropriate to test the hypothesis of
equal average fat content in the two brands of baby food? Independent t-test
6. The following are fat(mg) content found in 5 samples of each of two brands of
baby food:
A: 5.7, 4.5, 6.2, 6.3, 7.3
B: 6.3, 5.7, 5.9, 6.4, 5.1
What is the degree of freedom? 8
7. ANOVA is a test for equality of Means
8. The analysis of variance is a statistical test that is used to compare how many
group means? Three or more
9. A statistical test used to compare more than 2 group means is known as One-
way analysis of variance
10. Which of the following assumptions are required if an independent t-test is to
be used? (Samples are drawn from a normally distributed population /
Homogeneity of variances (equal variances) / The data are either interval or
ratio scales) All the above assumptions (A, B and C) are required.
11. If we are testing for the difference between the means of two independent
populations with samples of n1 = 20 and n2 = 20, the number of degrees of
freedom is equal to 38.
12. Which of the following is the correct pair of null and alternative hypotheses to
determine if the average biostatistics score of batch 1 DPT students differs
from the average biostatistics score of batch 2?
H0: µ1 – µ2 = 0 versus H1: µ1 – µ2 ≠ 0
13. A hypothesis test for the difference between two means is considered a two-
tailed test when The null hypothesis states that the population means are
equal.
14. Suppose that a group of 10 people join a weight loss program for 3 months.
Each person’s weight is recorded at the beginning and at the end of the 3
month program. To test whether the weight loss program is effective, the data
should be treated as Paired samples using the t-distribution
15. If we are testing for the difference between the means of paired populations
with samples of n = 20, the number of degrees of freedom is equal to 19.
16. In testing for differences between the means of two paired populations, the
null hypothesis is H0: µD = 0.
17. Which of the following is an assumption for the one-way analysis of variance
experimental design? Populations are normally distributed/ The populations
have equal variances/ The observations are independent/ All of the above.
18. Which of the following is the appropriate alternative hypothesis for ANOVA?
Not all population means are equal.
19. x̄1 = 62.1, x̄2 = 58.94, x̄3 = 71.2. The appropriate test to conduct to determine
if the population means are equal is One-way analysis of variance
20. In conducting a one-way analysis of variance where the test statistic is less
than the critical value, which of the following is correct? Fail to reject the
hypothesis that all means are the same; there is no need to conduct the
post-hoc procedure.
21. Prior to conducting a one-way analysis of variance test, it is a good idea to
test to see whether the population variances are equal. One method for doing
this is to use Hartley’s F-max test.
22. In a one-way analysis of variance test in which the levels of the factor being
analyzed are randomly selected from a large set of possible factors, the
design is referred to as A random-effects design.
23. What do ANOVA calculate? F ratios
24. How many levels must there be in one independent variable for an ANOVA to
be used? At least 2 (with exactly 2 levels, ANOVA is equivalent to the
independent t-test)
25. How many dependent variables must you have for an ANOVA to be
conducted? 1 continuous variable
26. Which of the following assumptions must be met to use an ANOVA? (There
is homogeneity of variance / The data must be normally distributed / The
dependent variable must be interval or ratio) All of these
27. What must a Levene's test be in order to use an ANOVA? Non-significant
(p > 0.05)
28. Where would you look on an ANOVA output to determine if there is an overall
significant difference? The Sig. column of the ANOVA table
29. What would you use to determine whether significant differences were
observed between all levels of your independent variable? Post-hoc tests
30. What are the two types of effects you must be able to identify from an
ANOVA? Main effects and interactions
31. Analysis of variance is a statistical method of comparing the ______ of several
populations. Means
32. In a study, subjects are randomly assigned to one of three groups: control,
experimental A, or experimental B. After treatment, the mean scores for the
three groups are compared. The appropriate statistical test for comparing
these means is: The Analysis Of Variance
33. The F ratio is typically used to test differences between Three or more
means.
34. The greater the value of the F ratio The less the sample distributions
overlap.
35. The ______ sum of squares measures the variability of the observed values
around their respective treatment means. Error
36. The ________ sum of squares measures the variability of the sample
treatment means around the overall mean. Treatment
37. To determine whether the test statistic of ANOVA is statistically significant, it
can be compared to a critical value. What two pieces of information are
needed to determine the critical value? Sample size, number of groups
38. The error deviations within the SSE statistic measure distances: Within
groups
39. Which of the following is not an assumption for one-way analysis of variance?
Constant variance
40. The treatment sum of squares measures the ______ variability. Between-
treatment
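The baby-food data from questions 5 and 6 of this section can be used to sketch both the pooled-variance independent t-test and the one-way ANOVA sums of squares in Python. This is a minimal illustration that assumes equal population variances; the critical value 2.306 is the two-tailed t table value for α = 0.05 and 8 degrees of freedom, and α = 0.05 is itself an assumption since the questions do not state a level.

```python
from math import sqrt
from statistics import mean, variance

brand_a = [5.7, 4.5, 6.2, 6.3, 7.3]
brand_b = [6.3, 5.7, 5.9, 6.4, 5.1]
n1, n2 = len(brand_a), len(brand_b)

# Pooled-variance independent t-test (assumes equal population variances)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(brand_a) + (n2 - 1) * variance(brand_b)) / df
t_stat = (mean(brand_a) - mean(brand_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRITICAL = 2.306  # t table value, two-tailed, alpha = 0.05 (assumed), df = 8
reject_h0 = abs(t_stat) > T_CRITICAL

# One-way ANOVA on the same two groups
groups = [brand_a, brand_b]
grand_mean = mean(brand_a + brand_b)
# Treatment (between-group) sum of squares: group means around the overall mean
ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Error (within-group) sum of squares: observations around their own group mean
ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
df_treatment = len(groups) - 1
df_error = n1 + n2 - len(groups)
f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
print(f"F = {f_ratio:.4f} with ({df_treatment}, {df_error}) degrees of freedom")
```

With only two groups the F ratio equals the square of the pooled t statistic, so both procedures lead to the same decision; ANOVA becomes necessary when three or more means are compared.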
TESTING OF HYPOTHESIS-INDEPENDENCE OF ATTRIBUTES
Introduction, Contingency Tables, Testing of hypothesis about the Independence of
attributes.
1. The value of chi square will always be Non-negative
2. Chi-square is used to analyze Frequencies
3. On which of the following does the critical value for a chi-square statistic rely?
The degrees of freedom
6. Using a chi square test, we can assess whether a set of obtained frequencies
differ from a set of Expected frequencies
7. Imagine you conducted a study to look at the association between
expectant mothers in two different age groups (18–30 and 31–43 years) and
the gender of the first-born child. Which of the following options would be the
most appropriate method of analyzing these data? Chi-square test
8. When using the chi-square test for differences in two proportions with a
contingency table that has r rows and c columns, the degrees of freedom for
the test statistic will be (r - 1)(c - 1).
9. The degrees of freedom for the Chi-Square test statistic when testing for
independence in a contingency table with 4 rows and 4 columns would be 9
10. What type of data do you need for a chi-square test? Categorical
11. What symbol is used to represent chi-square? χ²
12. The χ²-test should not be used if any expected frequency is Less than 5
13. If the observed and expected frequencies of all classes are the same, the value
of Chi-square is Zero
14. In order to carry out a χ²-test on data in a contingency table, the observed
values in the table should be Frequencies
15. The degrees of freedom for χ² are (r-1)(c-1) for a contingency table with r
rows and c columns. So for a 2x2 contingency table there is One degree
of freedom
16. For an r x c contingency table the number of degrees of freedom equals
(r-1)(c-1)
17. For a 3 x 3 contingency table, the number of cells in the table is 9
18. The null hypothesis of independence between the variables is tested using
the χ²-statistic where calculated χ² = ∑(O – E)²/E, if the degrees of freedom,
(r – 1)(c – 1), are greater than 1
19. The shape of the chi-square distribution depends upon Degree of freedom
20. The total area under the curve of a chi-square distribution is 1
21. Chi-square curve ranges from 0 to ∞
22. The value of chi-square statistic is always Non-negative
23. In testing independence in a 2 x 3 contingency table, the number of degrees
of freedom in the χ²-distribution is 2
24. Given χ² = 5.8, df = 1, χ²0.05(1) = 3.841, χ²0.01(1) = 6.635, we make the
following statistical decision We reject H0 at α = 0.05 but not at α = 0.01
25. If χ² = 13.95, df = 4, χ²0.05(4) = 9.488, χ²0.01(4) = 13.277, we make the
following statistical decision We reject H0 at α = 0.01 and α = 0.05
26. What kind of variables would you cross tabulate? Two or more categorical.
27. Which statistical test is used to identify whether there is a relationship between
two categorical variables? Pearson’s Chi-square test.
28. What does the statistic Cramer’s V indicate? The strength of association
between two categorical variables.
29. What is the null hypothesis for a Chi-square test? Both variables are
independent.
30. If df = 1 and α = 0.05, the critical value of Chi square is 3.841
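The chi-square test of independence, with expected frequencies computed as (row total × column total) / grand total and degrees of freedom (r - 1)(c - 1), can be sketched in Python as follows. The 2 x 2 table of counts is hypothetical and the function name is illustrative; no continuity correction is applied, and the critical values are the df = 1 table values quoted in this section.

```python
def chi_square_independence(observed):
    """Return (chi2, df) for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under independence of rows and columns
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed frequencies (made up for illustration)
observed = [[30, 20],
            [20, 30]]
chi2, df = chi_square_independence(observed)

CRITICAL_05, CRITICAL_01 = 3.841, 6.635   # chi-square table values for df = 1
print(f"chi-square = {chi2:.3f}, df = {df}")
print(f"reject H0 at 0.05: {chi2 > CRITICAL_05}, at 0.01: {chi2 > CRITICAL_01}")
```

Every expected frequency in this made-up table is 25, comfortably above the usual minimum of 5, so the chi-square approximation is reasonable here.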
REGRESSION AND CORRELATION
Introduction, cause and effect relationships, examples, simple linear regression,
estimation of parameters and their interpretation. r and R². Correlation. Coefficient of
linear correlation, its estimation and interpretation. Multiple regression and
interpretation of its parameters.
1. In order for accurate measures of the linear relationship between two variables
to be achieved, what type of data are required if using Pearson’s correlation
coefficient? Continuous
2. A Pearson correlation of r=-0.6 indicates an increase in X is accompanied
by a decrease in Y; the relationship is moderate.
3. A statistical test used to determine whether a correlation coefficient is
statistically significant is called the t-test for correlation coefficients
4. The strength (degree) of the correlation between a set of independent
variables X and a dependent variable Y is measured by Coefficient of
Correlation
5. The percent of total variation of the dependent variable Y explained by the set
of independent variables X is measured by Coefficient of Determination
6. A coefficient of correlation computed to be -1 means that the relationship
between the two variables is perfect but negative
7. Let the coefficient of determination computed to be 0.39 in a problem
involving one independent variable and one dependent variable. This result
means that 39% of the total variation is explained by the independent
variable
8. Relationship between correlation coefficient and coefficient of determination is
that The coefficient of determination is the coefficient of correlation
squared
9. A process by which we estimate the value of dependent variable on the basis
of one or more independent variables is called Regression
10. All data points falling along a straight line is called Linear relationship
11. In a simple regression equation, the number of variables involved is Two
12. The dependent variable is also called Regressand
13. In the regression equation Y = a+bX, the Y is called Dependent variable
14. In the regression equation Y = a +bX, a is called Y-intercept
15. The graph showing the paired points of (Xi, Yi) is called Scatter diagram
16. The purpose of simple linear regression analysis is to Predict one variable
from another variable
17. A measure of the strength of the linear relationship that exists between two
variables is called Correlation coefficient
18. If both variables X and Y increase or decrease simultaneously, then the
coefficient of correlation will be Positive
19. If the points on the scatter diagram indicate that as one variable increases the
other variable tends to decrease the value of r will be Negative
20. The value of the coefficient of correlation r lies between -1 and +1
21. The range of regression coefficient is -∞ to +∞
22. In the regression equation Y = a + bX, b is called Slope/ Regression
coefficient
23. If the figure +1 signifies perfect positive correlation and the figure -1 signifies a
perfect negative correlation, then the figure 0 signifies No correlation
24. A perfect positive correlation is signified by +1
25. A scatterplot shows Scores on one variable plotted against scores on a
second variable.
26. Suppose the correlation between height and weight for adults is +0.80. What
proportion of the variability in weight can be explained by the relationship with
height? 64%
27. The dependent variable in simple linear regression is also called the? (one
correct choice) Criterion or response or outcome
28. In regression analysis, if the independent variable is measured in kilograms,
the dependent variable Can be any units
29. If the coefficient of determination is equal to 1, then the correlation coefficient
can be either -1 or +1
30. If two variables, x and y, have a very strong linear relationship, then there
might still be no causal relationship between x and y
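The quantities in this section (intercept a, slope b, correlation coefficient r and coefficient of determination r²) can be computed from the sums of squares and cross-products, as in the Python sketch below. The five data pairs are hypothetical and the function name is illustrative; the least-squares formulas b = Sxy / Sxx and a = ȳ - b·x̄ are the standard ones for Y = a + bX.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Least-squares fit of Y = a + bX; returns (a, b, r, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx                # slope (regression coefficient)
    a = y_bar - b * x_bar        # Y-intercept
    r = sxy / sqrt(sxx * syy)    # Pearson correlation coefficient, between -1 and +1
    return a, b, r, r ** 2       # r squared is the coefficient of determination


# Hypothetical paired observations (Xi, Yi), made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, r, r_squared = simple_linear_regression(x, y)
print(f"Y = {a:.2f} + {b:.2f}X, r = {r:.4f}, r^2 = {r_squared:.2f}")

# Height and weight question: r = +0.80 gives the proportion of variability explained
explained = 0.80 ** 2
print(f"proportion of variability explained = {explained:.2f}")
```

The slope can take any real value, while r is always confined to the interval from -1 to +1 and carries the same sign as the slope.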
REFERENCE BOOKS
• Walpole, R. E. 1982. "Introduction to Statistics", 3rd Ed., Macmillan
Publishing Co., Inc. New York.
• Muhammad, F. 2005. "Statistical Methods and Data Analysis", KitabMarkaz,
Bhawana Bazar Faisalabad.