Understanding Null Hypothesis and Errors

Q: What factors can increase the statistical power of a study, thereby reducing the risk of a Type II error?

Increasing the sample size, reducing measurement error, raising the significance level, and studying larger effect sizes all increase the statistical power of a study. Higher power reduces the risk of a Type II error.

Q: What is the primary difference between a Type I error and a Type II error in hypothesis testing?

A Type I error occurs when the null hypothesis is rejected when it is actually true, leading to a false positive conclusion. Conversely, a Type II error occurs when the null hypothesis is not rejected when it is false, resulting in a false negative conclusion.

Q: Why are parametric tests like the t-test used for normally distributed data, and how does it differ when sample sizes are small versus large?

Parametric tests like the t-test assume that the data are normally distributed, which makes them suitable for such data. With large samples, the z distribution is often used, but for small samples (n ≤ 30) the t distribution is preferred because it accounts, through its degrees of freedom, for the extra uncertainty of estimating variability from few observations.

Q: How does sample size influence the reliability of statistical tests and the errors involved?

Larger sample sizes increase the reliability of statistical tests by reducing sampling error, which enhances statistical power and reduces the probability of a Type II error. This makes it easier to detect true effects if present.

Q: In what way does the concept of statistical power relate to the likelihood of detecting a true effect in a study?

Statistical power is the probability of correctly rejecting a false null hypothesis, i.e., detecting a true effect when it exists. Higher power increases the likelihood that a study will identify actual effects, reducing the risk of a Type II error.

Q: Why is it important to choose the correct alpha value when determining statistical significance?

Choosing the correct alpha value is crucial because it defines the threshold for statistical significance and directly affects the probability of committing a Type I error. An inappropriate alpha level may lead to false positive or false negative conclusions about the hypothesis being tested.

Q: Explain how a confidence interval can provide additional context to inferential statistics beyond a point estimate.

A confidence interval gives a range of values around a point estimate to show the precision of the estimate and the potential variation if the experiment is repeated. This provides more context than a single point estimate by indicating the range where the true population parameter is likely to lie.
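
As a rough illustration of the point above, the sketch below computes a point estimate and an approximate 95% confidence interval for a mean. The function name and the sample values are hypothetical, and the normal (z) critical value is used only to keep the example dependency-free; for a sample this small a t critical value would give a somewhat wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate CI for a population mean using the normal (z) critical value."""
    n = len(sample)
    point_estimate = mean(sample)
    standard_error = stdev(sample) / sqrt(n)      # sample SD / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return point_estimate - z * standard_error, point_estimate + z * standard_error


# Hypothetical measurements, for illustration only.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0]
low, high = mean_confidence_interval(sample)
print(f"point estimate = {mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval is centred on the point estimate and narrows as the sample grows, which is the extra context a single number cannot give.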

Q: What are the potential consequences of failing to consider both Type I and Type II errors when designing a study?

Neglecting to consider both errors can lead to inaccurate conclusions: a Type I error may result in adopting ineffective interventions with potential harm, while a Type II error may lead researchers to overlook beneficial interventions. Poor study design may consequently waste resources and negatively impact public health or scientific understanding.

Q: Discuss the use and interpretation of relative risk in cohort studies.

Relative risk (RR) is a measure of the strength of association in cohort studies, calculated as the incidence of an outcome in the exposed group divided by the incidence in the non-exposed group. An RR greater than 1 indicates higher risk among the exposed, while an RR less than 1 suggests a protective effect.
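
The calculation described above can be sketched in a few lines. The function name and the cohort counts are hypothetical and serve only to show the arithmetic.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Incidence among the exposed divided by incidence among the non-exposed."""
    incidence_exposed = exposed_cases / exposed_total
    incidence_unexposed = unexposed_cases / unexposed_total
    return incidence_exposed / incidence_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 non-exposed develop the outcome.
rr = relative_risk(30, 100, 10, 100)
print(f"RR = {rr:.2f}")
```

Here the incidence is 30% against 10%, so the exposed group carries about three times the risk.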

Q: How does the choice of significance level (alpha) affect the probability of committing Type I and Type II errors?

Lowering the significance level (alpha) reduces the probability of committing a Type I error but increases the probability of committing a Type II error. Conversely, increasing the significance level decreases the likelihood of a Type II error while increasing the risk of a Type I error.
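
To make the trade-off concrete, here is a minimal sketch that computes power and beta for a two-sided one-sample z-test at several alpha levels. The setting is an assumption chosen for simplicity (known standard deviation, a hypothetical true shift of 0.5, n = 25) and is not taken from the document.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha):
    """Power of a two-sided one-sample z-test when the true mean shift is `effect`."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)   # rejection cut-off for |z|
    shift = effect * sqrt(n) / sigma               # where the z statistic is centred
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)


for alpha in (0.01, 0.05, 0.10):
    power = z_test_power(effect=0.5, sigma=1.0, n=25, alpha=alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

As alpha rises, power rises and beta falls, which is the trade-off stated in the answer. The same function shows that a larger n raises power without touching alpha.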

The document discusses the concept of null hypothesis in research, explaining its role in hypothesis testing and the importance of distinguishing between null and alternative hypotheses. It also covers Type I and Type II errors, their implications, and the significance of statistical power in minimizing these errors. Additionally, various methods of randomization in clinical trials are outlined, emphasizing their advantages and disadvantages.

Uploaded by

ArunMarshalin

NULL HYPOTHESIS

Introduction:
A hypothesis is a broad assumption about something in the world around us. The goal of hypothesis testing is to make inferences about a population based on a sample. Often, we infer whether there’s an effect in the population by looking at differences between groups or relationships between variables in the sample. It’s critical in research to write strong hypotheses.

Null Hypothesis:
Also known as the hypothesis of no difference. It nullifies the claim that experimental results are different from, worse than or better than another. The null hypothesis (H0) answers “No, there’s no effect in the population.” The alternative hypothesis (Ha) answers “Yes, there is an effect in the population.”
If the sample provides enough evidence against the claim that there’s no effect in the population (p ≤ α), then we can ‘reject’ the null hypothesis. Otherwise, we ‘fail to reject’ the null hypothesis.
Null hypotheses often include phrases such as “no effect,” “no difference,” or “no relationship.” When written in mathematical terms, they always include an equality (usually =, but sometimes ≥ or ≤).

Type 1 and type 2 errors in null hypothesis:
You can never know with complete certainty whether there is an effect in the population. Some percentage of the time, your inference about the population will be incorrect. When you incorrectly reject the null hypothesis, it’s called a type I error. When you incorrectly fail to reject it, it’s a type II error.

Examples of Null hypothesis:

1. Research question: Does tooth flossing affect the number of cavities?
General null hypothesis (H0): Tooth flossing has ‘no effect’ on number of cavities.
Test-specific null hypothesis (t-test): The mean number of cavities per person does not differ between the flossing group (µ1) and the non-flossing group (µ2) in the population; µ1 = µ2.

2. Research question: Does daily meditation decrease the incidence of depression?
General null hypothesis (H0): Daily meditation ‘does not’ decrease the incidence of depression.
Test-specific null hypothesis (two-proportions z test): The proportion of people with depression in the daily meditation group (p1) is greater than or equal to that in the no meditation group (p2) in the population; p1 ≥ p2.

3. Research question: Does the amount of text highlighted in the textbook affect exam scores?
General null hypothesis (H0): The amount of text highlighted in the textbook has ‘no effect’ on exam scores.
Test-specific null hypothesis (linear regression): There is no relationship between the amount of text highlighted and exam scores in the population; β1 = 0.
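
The meditation example (H0: p1 ≥ p2 against Ha: p1 < p2) can be sketched as a left-tailed two-proportions z test. The counts below are invented purely for illustration, and the function name is hypothetical; the pooled standard error is the usual textbook form for this test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(events1, n1, events2, n2):
    """Left-tailed test of H0: p1 >= p2 against Ha: p1 < p2."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)       # common proportion under H0
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = NormalDist().cdf(z)                  # area to the left of z
    return z, p_value


# Hypothetical data: 18 of 200 meditators and 34 of 200 non-meditators have depression.
z, p_value = two_proportion_z_test(events1=18, n1=200, events2=34, n2=200)
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

The decision line mirrors the rule in the text: reject the null hypothesis when p ≤ α, otherwise fail to reject it.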
Similarities between Null & alternate hypothesis:
• They’re both answers to the research question.
• They both make claims about the population.
• They’re both evaluated by statistical tests.

Differences between Null & alternate hypothesis:

Aspect | Null hypothesis | Alternate hypothesis
Definition | A claim that there is no effect in the population. | A claim that there is an effect in the population.
Also known as | H0 | Ha
Typical phrases used | No effect; No change; No difference; No relationship; Does not increase; Does not decrease | An effect; A change; A difference; A relationship; Increases; Decreases
Symbols used | Equality symbol (=, ≥, or ≤) | Inequality symbol (≠, <, or >)
p ≤ α | Rejected | Supported
p > α | Failed to reject | Not supported

TYPE I AND TYPE II ERRORS

Introduction:
In statistics, a Type I error is a false positive conclusion, while a Type II error is a false negative conclusion. Making a statistical decision always involves uncertainties, so the risks of making these errors are unavoidable in hypothesis testing. The probability of making a Type I error is the significance level, or alpha (α), while the probability of making a Type II error is beta (β). These risks can be minimized through careful planning in the study design.
Example:
You decide to get tested for COVID-19 based on mild symptoms. There are two errors that could potentially occur:
Type I error (false positive): the test result says you have coronavirus, but you actually don’t.
Type II error (false negative): the test result says you don’t have coronavirus, but you actually do.

Error in statistical decision making:
Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population - this is the null hypothesis. It’s always paired with an alternative hypothesis, which is your research prediction of an actual difference between groups or a true relationship between variables.
Example: Null and alternative hypothesis:
You test whether a new drug intervention can alleviate symptoms of an autoimmune disease. In this case, the null hypothesis (H0) is that the new drug has no effect on symptoms of the disease. The alternative hypothesis (Ha) is that the drug is effective for alleviating symptoms of the disease.
Then, we decide whether the null hypothesis can be rejected based on the data and the results of a statistical test. If the results show statistical significance, that means they are ‘very unlikely’ to occur if the null hypothesis is true. In this case, you would reject your null hypothesis. But sometimes, this may actually be a Type I error.
If your findings do not show statistical significance, they have a high chance of occurring if the null hypothesis is true. Therefore, you fail to reject your null hypothesis. But sometimes, this may be a Type II error.

Null hypothesis is actually | Rejected | Not rejected
True | Type 1 error (false positive), probability = α | Correct decision (true negative), probability = 1-α
False | Correct decision (true positive), probability = 1-β | Type 2 error (false negative), probability = β

Type I error:
‘Rejecting the null hypothesis’ when it is actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.
The risk of committing this error is the significance level (α) you choose. If the p value of your test is lower than the significance level, it means your results are statistically significant and consistent with the alternative hypothesis. If your p value is higher than the significance level, then your results are considered statistically non-significant.
Example: Statistical significance and Type I error
In your clinical study, you compare the symptoms of patients who received the new drug intervention or a control treatment. Using a t test, you obtain a p value of 0.035. This p value is lower than your alpha of 0.05, so you consider your results statistically significant and reject the null hypothesis.
However, the p value means that there is a 3.5% chance of obtaining results at least this extreme if the null hypothesis is true. Therefore, there is still a risk of making a Type I error.

Type 1 error rate:
The null hypothesis distribution curve shows the probabilities of obtaining all possible results if the study were repeated with new samples and the null hypothesis were true in the population.
At the tail end, the shaded area represents alpha. It’s also called a critical region in statistics.
If your results fall in the critical region of this curve, they are considered statistically significant and the null hypothesis is rejected. However, this is a false positive conclusion, because the null hypothesis is actually true in this case.

Type 2 error:
Not rejecting the null hypothesis when it’s actually false, that is, failing to conclude there was an effect when there actually was. In reality, your study may not have had enough statistical power to detect an effect of a certain size. Power is the extent to which a test can correctly detect a real effect when there is one. A power level of 80% or higher is usually considered acceptable.
The risk of a Type II error is inversely related to the statistical power of a study. The higher the statistical power, the lower the probability of making a Type II error.
Statistical power is determined by:
• Size of the effect: Larger effects are more easily detected.
• Measurement error: Systematic and random errors in recorded data reduce power.
• Sample size: Larger samples reduce sampling error and increase power.
• Significance level: Increasing the significance level increases power.
To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level.

Type 2 error rate:
The alternative hypothesis distribution curve shows the probabilities of obtaining all possible results if the study were repeated with new samples and the alternative hypothesis were true in the population.
The Type II error rate is beta (β), represented by the shaded area on the left side. The remaining area under the curve represents statistical power, which is 1 – β. Increasing the statistical power of your test directly decreases the risk of making a Type II error.
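
The idea of repeating the study with new samples can be imitated by simulation. The sketch below is only an illustration under simplifying assumptions that are not in the text: a two-sided z-test of H0: mean = 0 with a known standard deviation of 1, n = 20, and a hypothetical true effect of 0.5 when H0 is false. All names are made up for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_rejection_rate(true_mean, n=20, alpha=0.05, trials=4000, seed=0):
    """Fraction of simulated studies that reject H0: mean = 0 (sigma known to be 1)."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # edge of the critical region
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)              # z statistic under sigma = 1
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


# H0 true: every rejection is a false positive, so the rate estimates alpha.
type_1_rate = simulate_rejection_rate(true_mean=0.0)
# H0 false: every rejection is a true positive, so the rate estimates power (1 - beta).
power = simulate_rejection_rate(true_mean=0.5)
print(f"Type I error rate ~ {type_1_rate:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

With the null hypothesis true, the rejection rate should sit close to the chosen alpha; with a real effect present, the share of non-rejections estimates beta.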
Relation between Type 1 and Type 2 errors:
• Setting a lower significance level decreases a Type I error risk, but increases a Type II error risk.
• Increasing the power of a test by raising the significance level decreases a Type II error risk, but increases a Type I error risk.

Consequences of Type 1 error:
Based on the incorrect conclusion that the new drug intervention is effective, over a million patients are prescribed the medication, despite risks of severe side effects and inadequate research on the outcomes. The consequences of this Type I error also mean that other treatment options are rejected in favor of this intervention.

Consequences of Type 2 error:
If a Type II error is made, the drug intervention is considered ineffective when it can actually improve symptoms of the disease. This means that a medication with important clinical significance doesn’t reach a large number of patients who could tangibly benefit from it.

STUDENTS T TEST

Introduction:
It was first designed by [Link] whose pen name was ‘student’. The ratio of the observed difference between two sample means to the standard error of the difference is denoted by ‘t’. The ‘t’ table gives the highest obtained values of t under different probabilities (p = 0.1, 0.05, 0.01, 0.001) corresponding to the degrees of freedom, serially numbered. If the calculated t value exceeds the value given under p = 0.05 in the table, it is said to be significant at the 5% level, and the null hypothesis is rejected & the alternate hypothesis is accepted.

Degrees of freedom:
It is the quantity which is one less than the independent number of observations in the sample. In Unpaired ‘t’ test, df = n1 + n2 - 2. In Paired ‘t’ test, df = n-1.

Criteria for applying ‘t’ test:
1. Random sample
2. Quantitative data
3. Variable normally distributed
4. Sample size, n < 30 ( or < 60 or < 100).

Types of t test:
1. Unpaired t test: Independent t test to compare the means of 2 groups.
2. Paired t test: Compares 2 means of the same group at different times (eg: 3 months apart).
3. One sample t test: Mean of the group is compared with a known / population mean.

Unpaired t test / Two sample t test / Students t test:
This test is done for independent observations made on participants of 2 different groups to test if the difference between the 2 means is real or by chance.
Eg: Comparison of effects of 2 Anti-Hypertensives. As per the null hypothesis, it is assumed that there is no difference between the means of 2 samples if the samples are taken from the population randomly.

Assumption of Unpaired t test:
1. The 2 samples are different. There is no relationship between them.
2. The two groups of samples being compared are normally distributed.
3. Variances of the 2 samples are equal.

Hypothesis of a Two sample t test:
The corresponding null hypotheses are as follows,
Two tailed t test - µ1 = µ2
Left tailed - µ1 ≥ µ2
Right tailed - µ1 ≤ µ2
The corresponding alternate hypotheses are as follows,
Two tailed t test - µ1 ≠ µ2
Left tailed - µ1 < µ2
Right tailed - µ1 > µ2

Steps of calculation:
1. Find the observed difference between the means of the 2 samples (x1 - x2).
2. Calculate the standard error of the difference between the 2 means.
3. Calculate the t value = ratio of the observed difference between the 2 means to the calculated SE.

t = (x1 - x2) / √( s² × (1/n1 + 1/n2) )

t is the t value, x1 and x2 are the means of the two groups being compared, s² is the pooled variance of the two groups, and n1 and n2 are the number of observations in each of the groups.
4. Determine the pooled degrees of freedom, df = n1 + n2 - 2.
5. Compare the calculated value with the table value at the particular df to find the level of significance.
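
The steps of calculation for the unpaired t test can be sketched as follows, assuming equal variances as the test requires. The function name and the two small data sets (reductions in blood pressure under two anti-hypertensives) are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group1, group2):
    """Pooled-variance (Student's) t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    difference = mean(group1) - mean(group2)                 # step 1
    pooled_variance = ((n1 - 1) * variance(group1)
                       + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))  # step 2
    t_value = difference / standard_error                    # step 3
    df = n1 + n2 - 2                                         # step 4
    return t_value, df


# Hypothetical reductions in systolic pressure (mmHg) for two drugs.
drug_a = [12, 15, 11, 14, 13]
drug_b = [9, 10, 8, 11, 7]
t_value, df = unpaired_t(drug_a, drug_b)
print(f"t = {t_value:.2f}, df = {df}")
```

Step 5 is then a table lookup: the calculated t is compared with the tabulated value for the same df at the chosen probability.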
RANDOMIZATION

Introduction:
According to Sir Ronald Aylmer Fisher, “Randomization relieves the experimenter from the anxiety of considering innumerable causes by which data may be disturbed”. Randomization is the random allocation of treatment, which means all participants have the same chance of being allocated to each of the study groups. It is an essential tool for testing the efficacy of an intervention.

Basic Benefits of Randomization:
1. Elimination of selection bias: Allocation concealment should be done. Selection bias leads to reduced validity and efficiency of the study.
2. Balances arms with respect to variables: If the intervention groups of a clinical trial are non-comparable, then results would be erroneous.
3. Forms the basis for assumption-free statistical tests.

Criteria for Randomization:
1. Unpredictability: Each participant has the same chance of receiving the intervention.
2. Balance: Treatment groups are of similar size & composition.
3. Simplicity: Easy for investigator to interpret.

Methods of Randomization:

1. Simple Randomization:
Every unit of the population has an equal chance of being selected by methods of unrestricted random sampling: lottery method, coin flip method, random numbers table, computer generated random numbers. The sample is drawn unit by unit. It suits a mostly homogenous and readily available population. It is used in clinical trials for testing efficacy of drugs.
Advantages:
• Simple and easy to implement.
Disadvantages:
• There may be imbalance in the number of subjects on each treatment at any point of time.

2. Systematic Randomization:
It is applied to field studies when the population is large, scattered and non-homogenous. One random number is chosen as the starting point and every Kth unit after it is chosen for the sample (K = sample interval, K = Total population / desirable sample size).
Advantages:
• Simple and easy to implement.
• Time and labour required are relatively small
• Accurate
Disadvantages:
• Complete list of population with numbering must be available.

3. Stratified Randomization:
The population under study is first divided into homogenous groups called strata, and a sample is drawn at random from each stratum in proportion to its size. No sub sample should be < 30 in size.
Advantages:
• Useful for smaller clinical trials
• Great accuracy
Disadvantages:
• Can’t be applied to large clinical trials
• Needs man power, time consuming
• Sampling becomes complicated if many covariates are present.

4. Multistage Randomization:
Sampling is carried out in several stages, using a random sampling technique at each stage. Used in cases of large country surveys.
Advantages:
• Flexibility in sampling
• Can use divisions and subdivisions to reduce labour.
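
The sampling interval rule for systematic randomization (K = total population / desirable sample size) can be sketched as below. The function name and the numbered list of 100 participants are hypothetical, and the sketch assumes the population size is at least the sample size.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick a random start within the first interval, then take every K-th unit."""
    k = len(population) // sample_size           # sampling interval K
    start = random.Random(seed).randrange(k)     # random starting position in [0, K)
    return [population[start + i * k] for i in range(sample_size)]


# Hypothetical numbered list of 100 participants; 10 are wanted, so K = 10.
participants = list(range(1, 101))
chosen = systematic_sample(participants, sample_size=10, seed=1)
print(chosen)
```

Only the starting point is random; every later pick follows from it, which is why a complete numbered list of the population must exist beforehand.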
5. Cluster Randomization:
This method uses natural / geographical units of population such as villages, wards, schools, factories, etc. This method was used in the expanded vaccination programme and Universal immunization programme by WHO. Eg: 210 children were surveyed - 7 from each cluster (total 30 cluster groups from entire population).
Immunization coverage = (number of immunized children / 210 eligible children in surveyed houses) x 100.
Advantages:
• Less expensive
• Travelling & expenditure are reduced
• More accurate results
Disadvantages:
• Requires proper planning & execution
• Higher sampling error can occur
• May fail to reflect the diversity in sampling frame.

6. Block Randomization:
This is used to equalize the number of subjects on each treatment. Block size is determined by the researcher and should be a multiple of the number of groups (i.e with 2 treatment groups, block size should be 4, 6 or 8).
Eg: Two treatments A & B with block size 2x2=4. Possible treatment allocations within each block are AABB, BBAA, ABAB, BABA, ABBA, BAAB.
Advantage:
• Balance between the groups is guaranteed even if the trial is terminated before enrollment is complete.
Disadvantage:
• Analysis of data is complicated
• People may predict the group assignments of the participants.

7. Minimization Randomization:
Described by Pocock and Simon, it is a technique of adaptive stratified sampling. The aim is to minimize the imbalance in samples between the groups. It can be done in small sample sizes with multiple prognostic variables.
Example:
There are 3 stratification factors:
Gender - 2 levels
Age - 3 levels
Disease stage - 3 levels.
After enrollment of the first 50 patients, the list is as follows.

Factor | Level | Treatment A | Treatment B
Gender | Male | 16 | 14
Gender | Female | 10 | 10
Gender | Total | 26 | 24
Age | ≤ 40 | 13 | 12
Age | 41-60 | 9 | 6
Age | ≥ 61 | 4 | 6
Age | Total | 26 | 24
Disease stage | Stage 1 | 6 | 4
Disease stage | Stage 2 | 13 | 16
Disease stage | Stage 3 | 7 | 4
Disease stage | Total | 26 | 24

Suppose the 51st person to enroll in the study is a male, 61 years old, with stage 3 disease.

Level | Treatment A | Treatment B | Difference
Male | 16 | 14 | +
Age ≥ 61 | 4 | 6 | -
Stage 3 | 7 | 4 | +
Total | 27 | 24 |

Two possible criteria of assignment:
1. Count the difference in each category. A is ahead of B in 2 categories. So assign the next patient to B.
2. Add up the counts over all categories. A is ahead of B (27 > 24). So assign the next patient to B.
Both criteria will lead to reasonable balance.

Randomization method recommendations:
1. Large study (several 100 participants):
One centre - Block randomization
Multicentre - Stratified / Block.
2. Small study (n≈100)
Stratified / Block - One centre & multicentre.
3. Very small study (n≈50)
Adaptive minimization.
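
The two assignment criteria from the minimization example can be written out as a short sketch. The function names are hypothetical, and returning "tie" when the arms are level is an assumption of this sketch; in that case an actual trial would need its own rule, such as a random choice.

```python
def arm_behind(score_a, score_b):
    """Return the arm that is currently behind, or 'tie' if the arms are level."""
    if score_a > score_b:
        return "B"
    if score_b > score_a:
        return "A"
    return "tie"


def minimization_assignment(counts_a, counts_b):
    """Apply both criteria to the counts for the new patient's own factor levels."""
    ahead_a = sum(a > b for a, b in zip(counts_a, counts_b))
    ahead_b = sum(b > a for a, b in zip(counts_a, counts_b))
    by_category = arm_behind(ahead_a, ahead_b)             # criterion 1
    by_total = arm_behind(sum(counts_a), sum(counts_b))    # criterion 2
    return by_category, by_total


# 51st patient: male, age >= 61, stage 3 (counts taken from the table above).
counts_a = [16, 4, 7]   # Treatment A: Male, Age >= 61, Stage 3
counts_b = [14, 6, 4]   # Treatment B: Male, Age >= 61, Stage 3
print(minimization_assignment(counts_a, counts_b))
```

Both criteria point to Treatment B for this patient, matching the worked example: A leads in two of the three categories and in the summed counts (27 against 24).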
CHI SQUARE TEST

Introduction:
It was developed by Karl Pearson. It is a non-parametric test. The name Chi-square (χ²) is derived from the Greek letter chi (χ), pronounced as kye. It is a statistical test for categorical data. Categorical data is also called nominal data, as we use labels instead of numbers. So, to describe central tendency, only the mode can be used.

Types of Chi square test:
1. The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations.
2. The chi-square test of independence is used to test whether two categorical variables are related to each other.

Formula for Chi square test:
χ² = Σ (O - E)² / E
O - observed frequency of each cell
E - expected frequency of each cell

Calculation of Chi square test:
1. Make a contingency table. Tabulate the observed frequencies appropriately, and complete the table.

 | Column 1 | Column 2 | Total
Row 1 | A | B | Row total 1 (RT1)
Row 2 | C | D | Row total 2 (RT2)
Total | Column total 1 (CT1) | Column total 2 (CT2) | N

2. Calculate the expected value.
Expected value (E) = CT x RT / Sample total (N)
Eg: Expected value of cell A, E = CT1 x RT1 / N
Expected value of cell B, E = RT1 x CT2 / N
3. Calculate the Chi square of each cell by the formula.
4. Sum up the χ² of all cells.
5. Calculate the degrees of freedom, (c-1)(r-1).
6. Refer to the Fisher table for χ² at that df for the probability.
7. If the calculated χ² is higher than the value given in the table for that df, then the value is taken as significant. Then the null hypothesis is rejected and the alternate hypothesis is accepted.
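
Steps 1 to 5 of the calculation can be sketched as follows. The function name and the observed counts are hypothetical; the table lookup in steps 6 and 7 is left to the reader, and no continuity correction is applied.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n    # E = RT x CT / N
            chi2 += (observed - expected) ** 2 / expected   # (O - E)^2 / E
    df = (len(table) - 1) * (len(table[0]) - 1)             # (r-1)(c-1)
    return chi2, df


# Hypothetical 2 x 2 table: rows are two treatment groups, columns are outcome yes / no.
observed = [[20, 30],
            [10, 40]]
chi2, df = chi_square(observed)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

For this table every expected count is 15 or 35, so each cell contributes 25/15 or 25/35 to the sum. The result is then compared with the tabulated value for df = 1.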
Applications in medical field:
1. Test of proportion:
To determine if there is a significant difference between the population proportions of two or more groups. Eg: To compare the incidence of ankle edema in patients receiving Amlodipine, clinidipine and benidipine. The null hypothesis is that there is no difference in occurrence of ankle edema in the 3 treatment groups. The alternate hypothesis is that there is a difference in incidence of ankle edema.
2. Test of Association:
To determine if there is any probability of association or relationship between two separate attributes. It is a hypothesis test of independence. Eg: Amlodipine and ankle edema. Null hypothesis - The 2 variables are not associated i.e independent. Alternate hypothesis - the 2 variables are associated. The chi square test measures the likelihood of association. It is used for binomial or multinomial samples.
3. Test for Goodness of fit:
It is used to find out whether the observed value of an occurrence is significantly different from the expected values. It compares the observed sample distribution with the expected probability distribution and determines how well they fit each other. Eg: How far the observed values of ankle edema with clinidipine, amlodipine and benidipine differ from the expected values. Null hypothesis - No significant difference between observed and expected values. Alternate hypothesis - There is a significant difference i.e observed data is not consistent with the hypothetical / expected distribution.

Requirements for performing Chi square test:
• Random sample
• Qualitative data or categorical variable
• Lowest expected frequencies not < 5 in any cells.

Limitations in applications:
1. The chi square test done in a 2 x 2 contingency table is not reliable if the frequency of cells is less than 5. Even after Yates correction, the test may be misleading.
2. Yates correction can’t be done for tables larger than 2x2.
3. Interpret chi square with caution if total sample size is less than 50.
4. The chi square test identifies only whether there is any association between two variables, but does not tell about the strength of association.
5. The statistical finding of association does not indicate cause and effect.
6. Yates correction must be applied if the expected value of any cell is < 10.

NON - PARAMETRIC TESTS

Introduction:
In non-parametric statistics, data is not required to fit a normal distribution. It uses ordinal data, so it relies on a rank or order. If the median is the best measure of central tendency, then a non-parametric test should be applied irrespective of the sample size.

Types of Non-Parametric tests:

1. Sign test:
It is the oldest distribution-free test, which can be used either in the one-sample or in the paired-sample setting. The null hypothesis of the sign test is that, given a pair of measurements (xi, yi), xi and yi are equally likely to be larger than each other. It is used in ‘in vivo’ experiments to evaluate whether one treatment is superior to the other. The sign test may be used in clinical trials to know whether either of the two treatments that are provided to study subjects is favored over the other.

2. Signed rank sum tests:
The major disadvantage of the sign test is that it considers only the direction of difference between pairs of observations, not the size of the difference. Ranking the observations and then carrying out the statistical analysis can solve this issue. The signed rank sum test is more powerful than the sign test.

3. Wilcoxon signed-rank test:
It is the non-parametric analogue to the paired t-test. The null hypothesis of the Wilcoxon signed-rank test is that the median difference between pairs of observations is zero.

4. Fisher’s exact test:
It is used in the analysis of contingency tables with small sample sizes. It is similar to the chi square test, since both tests deal with nominal variables. In Fisher’s exact test, it is assumed that the value of the first unit sampled has no effect on the value of the second unit.

5. Mann-Whitney’s U test:
It is the non-parametric equivalent of the unpaired Student’s t-test for comparing two groups. Mann-Whitney’s U test works well in the analysis of data obtained from toxicity studies, where the number of animals in each group is 27 or less. By Mann-Whitney’s U test, a significant difference (one-sided test) can be detected even with three animals in each group. The power to detect a significant difference is greater with Mann-Whitney’s U test than with Fisher’s test.

6. Kruskal - Wallis Non parametric ANOVA by Ranks:
It is identical to one-way ANOVA with the data replaced by their ranks. It has also been stated that this test is an extension of the two-group Mann-Whitney’s U (Wilcoxon rank) test. It assumes that the observations in each group come from populations with the same shape of distribution.

7. Dunn’s multiple comparison test for more than three groups:
It is used to find the difference of means of 3 or more groups. The difference between the two mean scores is compared with the Probability (critical) value. If the difference between the two mean scores is greater than the Probability (critical) value, then the difference is considered significant.

8. Steel’s multiple comparison test for more than three groups:
The power of Steel’s test is higher than that of the other multiple comparison tests. Usually the number of groups employed is four (three treatment groups + one control group) in most animal studies. For a parameter which shows a strong dose-related pattern, a significant difference can be detected by Steel’s test even if the number of animals in a group is as low as four.

Advantages of Non-parametric tests:
1. Easy to use. As there is no need to use parameters, the data becomes more applicable to a large variety of tests.
2. This type of statistics can be used without the knowledge of mean and standard deviation.
3. Relies on median or use of ordinal data.
4. They are valid with small sample size even if data are not normally distributed.

Disadvantages of Non-parametric tests:
1. Lower degree of confidence
2. Lower power - chance of type 2 error is more.

CASE CONTROL STUDY

Introduction:
It is a type of analytical study. In a Case control study, both exposure as well as outcome have occurred when the study has begun: First we take outcome into consideration, and then go back in time taking exposure into consideration; then compare exposure in both diseased (cases) and non-diseased (controls).
Also called by many names: Retrospective study, Backward looking study, Effect to cause study, Outcome to exposure study, Disease to risk factor study, TROHOC study.

Controls in a Case control study:
In a case control study, selection of controls is a prerequisite. If the study group is small - 4 controls per case (In larger studies - cases and controls 1 : 1 is sufficient). Cases are diseased individuals; Controls are those free from the disease under study. Controls must be similar to cases as much as possible, except for the absence of the disease under study.

Sources of controls:
• Hospital controls: are often a ‘source of selection bias’.
• Neighbourhood controls: provide similar socio-economic and living conditions
• Relatives: Sibling controls are unsuitable in genetic studies.
• General population: by choosing a random sample
• Best friends controls

Strength of Association in a Case Control Study:
A Case Control Study cannot provide incidences, so Relative Risk cannot be calculated; so in a Case Control Study, we calculate ‘an estimate of Relative Risk’, known as ‘Odds Ratio’ (CROSS PRODUCT RATIO). Odds Ratio in a 2 × 2 table for a case control study:
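
The odds ratio (cross product ratio) for a case control study can be sketched as below. The cell labels a, b, c, d follow the conventional 2 × 2 layout, which is assumed here because the source table is not reproduced in the text; the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross product ratio for a 2 x 2 case control table.

    a = exposed cases,     b = exposed controls,
    c = non-exposed cases, d = non-exposed controls.
    """
    return (a * d) / (b * c)


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
estimate = odds_ratio(a=40, b=20, c=60, d=80)
print(f"Odds ratio = {estimate:.2f}")
```

An odds ratio above 1 suggests that exposure is more common among cases than among controls; it estimates relative risk without needing incidence data.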
Advantages:
• Easy to carry out
• Rapid & Inexpensive
• No risk to subjects
• Minimal ethical problems
• No loss to follow-up (attrition)
• ‘Particularly suitable to investigate rare diseases’
Disadvantages:
• Selection of an appropriate control group may be
difficult
• Cannot measure incidence: can only estimate Odds
ratio
• Chances of recall bias

Nested Case Control Study
It is a hybrid design where ‘a case control study is nested
in a cohort study’. Is predominantly a type of Cohort
study (due to forward direction). Usefulness limited for
studies involving ‘rare diseases AND whose diagnostic
tests are very expensive’.
Study design:
1. A population is identified and baseline data is
obtained from interviews, blood or urine tests, etc.
2. Population is then followed up for a period of time
(Cohort study) for development of the disease
under study
3. A Case control study is then carried out.
4. Cases: people who developed the disease.
5. Controls: Sample from those who did not develop
the disease
6. Samples/ history collected at baseline are then
examined
Advantages:
1. Elimination of problem of Recall bias: Interviews are
performed at the beginning of the study (at baseline), and
data are obtained before the disease has developed.
2. Maintenance of temporal association: If any disease or
abnormality in a biological characteristic is noted, it is
more likely that it represents risk factors or other
pre-morbid characteristics rather than a manifestation of
early, sub-clinical disease
3. Economical to conduct: Expensive tests need not be
conducted on entire population; only carried out among
cases and controls.

COHORT STUDY

Introduction:
It is a type of analytical (observational) study used for
‘hypothesis testing’. It is known by several synonyms:
Prospective study, Forward looking study, Cause to effect
study, Exposure to outcome study, Risk factor to disease
study, Incidence study, Follow up study.

Types of Cohort studies:

1. Prospective cohort study:
Known as ‘Current cohort study’ or ‘Concurrent cohort
study’. Outcome has not yet occurred when the study has
begun: Only exposure has occurred; we look for
development of same disease in both exposed and
non-exposed groups.
Examples:
1. Framingham heart study
2. Doll & Hill’s prospective study on smoking and lung
cancer

2. Retrospective cohort study:
Known as ‘Historical cohort study’ or ‘Non-concurrent
cohort study’. Combines advantages of both Cohort study
and Case control study. Both exposure as well as outcome
have occurred when the study has begun: First we go
back in time and take only exposure into consideration
(cohorts identified from past hospital/college records),
then look for development of same disease in both
exposed and non-exposed groups. Sample size required is
same as that of prospective cohort study.
Examples:
1. Effect of fetal monitoring on neonatal deaths
2. PVC exposure and angiosarcoma of liver

3. Combined prospective-retrospective cohort study:
Known as ‘Mixed cohort study’. Combines designs of
both prospective cohort study and retrospective cohort
study. Both exposure as well as outcome have occurred
when the study has begun: First we go back in time and
take only exposure into consideration (cohorts identified
from past hospital/college records), then look for
development of same disease in both exposed and
non-exposed groups; later cohort is followed
prospectively into future for outcome.
Examples: Court-Brown & Doll study on effects of
radiation therapy.

Strength of association in cohort study:
1. Relative risk (RR) = Incidence among exposed /
Incidence among non-exposed.
RR = I exposed / I non-exposed.
Interpretation of RR: Incidence of lung disease among
exposed is so many times higher as compared to that
among non-exposed.
2. Attributable risk (AR) = (Incidence among exposed -
Incidence among non-exposed) / Incidence among
exposed × 100.
AR = (I exposed - I non-exposed) / I exposed × 100.
Interpretation of AR: So much disease can be attributed to
exposure.
3. Population attributable risk (PAR) = (Incidence among
total - Incidence among non-exposed) / Incidence among
total × 100.
PAR = (I total - I non-exposed) / I total × 100.
Interpretation of PAR: If risk factor is modified or
eliminated, there will be so much annual reduction in
incidence of disease in the given population.
Interpretation of Relative Risk:

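
The three cohort measures of association given in these notes (RR, AR and PAR) can be sketched as a small calculation. This is a minimal illustration that applies the formulas exactly as stated in the text; the function name, argument names and counts are hypothetical.

```python
def cohort_measures(exposed_cases: int, exposed_total: int,
                    unexposed_cases: int, unexposed_total: int) -> dict:
    """Relative risk, attributable risk (%) and population
    attributable risk (%) from cohort counts."""
    # Incidence in each group and in the whole cohort
    i_exposed = exposed_cases / exposed_total
    i_unexposed = unexposed_cases / unexposed_total
    i_total = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)

    return {
        # RR = I exposed / I non-exposed
        "RR": i_exposed / i_unexposed,
        # AR = (I exposed - I non-exposed) / I exposed x 100
        "AR": (i_exposed - i_unexposed) / i_exposed * 100,
        # PAR = (I total - I non-exposed) / I total x 100
        "PAR": (i_total - i_unexposed) / i_total * 100,
    }


# Hypothetical cohort: 40 cases among 1000 exposed, 10 among 1000 non-exposed
measures = cohort_measures(40, 1000, 10, 1000)
print(measures)
```

With these made-up counts the incidence among the exposed is about four times that among the non-exposed, roughly three quarters of the disease among the exposed is attributed to the exposure, and about 60% of the incidence in the whole cohort is attributed to it.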
CONFIDENCE INTERVAL

Introduction:
A confidence interval is the range of values within which
the estimate is expected to fall a certain percentage of
the time if the experiment is run again or the population
is re-sampled in the same way. It is the mean of the
estimate plus and minus the variation in that estimate.
The desired confidence level is usually one minus the
alpha (α) value you used in your statistical test:
Confidence level = 1 − α
So if we use an alpha value of p < 0.05 for statistical
significance, then the confidence level would be 1 −
0.05 = 0.95, or 95%.
The confidence level is the percentage of times you expect
to reproduce an estimate between the upper and lower
bounds of the confidence interval, and it is set by the
alpha (α) value.

When to use Confidence Interval:
CI can be calculated for many kinds of statistical
estimates, such as:
1. Proportions
2. Population means
3. Differences between population means or proportions
4. Estimates of variation among groups
These are point estimates. Confidence intervals are useful
for communicating the variation around a point estimate.

Calculating a Confidence Interval:

Most statistical programs will include the confidence
interval of the estimate when we run a statistical test.
If we want to calculate a confidence interval on our own,
we need to know:
1. The point estimate we are constructing the confidence
interval for.
2. The critical values for the test statistic.
3. The standard deviation of the sample.
4. The sample size.

Point estimate
The point estimate of your confidence interval will be
whatever statistical estimate you are making (e.g.,
population mean, the difference between population
means, proportions, variation among groups).
Example:
In the TV-watching example, the point estimate is the
mean number of hours watched: 35.

Finding the critical value
Critical values tell you how many standard deviations
away from the mean you need to go in order to reach the
desired confidence level for your confidence interval.
There are three steps to find the critical value.
1. Choose your alpha (α) value.
The alpha value is the probability threshold for statistical
significance. The most common alpha value is p = 0.05,
but 0.1, 0.01, and even 0.001 are sometimes used.
2. Decide if you need a one-tailed interval or a two-tailed
interval.
You will most likely use a two-tailed interval unless you
are doing a one-tailed t test.
For a two-tailed interval, divide your alpha by two to get
the alpha value for the upper and lower tails.
3. Look up the critical value that corresponds with the
alpha value.
If your data follows a normal distribution, or if you have
a large sample size (n > 30) that is approximately
normally distributed, you can use the z distribution to find
your critical values.
If you are using a small data set (n ≤ 30) that is
approximately normally distributed, use the t distribution
instead. The t distribution follows the same shape as the z
distribution, but corrects for small sample sizes. For the t
distribution, you need to know your degrees of freedom
(sample size minus 1).
For symmetric distributions, like the t distribution and z
distribution, the critical value is the same on either side of
the mean.

Finding the standard deviation
Most statistical software will have a built-in function to
calculate your standard deviation, but to find it by hand
you can first find your sample variance, then take the
square root to get the standard deviation.
1. Find the sample variance
Sample variance is defined as the sum of squared
differences from the mean divided by the sample size
minus one (sometimes loosely called the
mean-squared-error, MSE):

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \hat{x})^2}{n - 1}$$

2. Find the standard deviation.
The standard deviation of your estimate (s) is equal to the
square root of the sample variance ($s^2$):

$$s = \sqrt{s^2}$$

Sample size
The sample size is the number of observations in your
data set.

Confidence interval for the mean of normally-distributed
data:

$$CI = \bar{X} \pm Z^{*} \frac{\sigma}{\sqrt{n}}$$

Where:
CI = the confidence interval
$\bar{X}$ = the population mean
Z* = the critical value of the z distribution
σ = the population standard deviation
√n = the square root of the sample size
The confidence interval for the t distribution follows the
same formula, but replaces the Z* with the t*.
In real life, you never know the true values for the
population (unless you can do a complete census). Instead,
we replace the population values with the values from our
sample data, so the formula becomes:

$$CI = \hat{x} \pm Z^{*} \frac{s}{\sqrt{n}}$$

Where:
$\hat{x}$ = the sample mean
s = the sample standard deviation
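
The steps for a confidence interval around a sample mean (two-tailed critical value, sample standard deviation, sample size) can be sketched with Python's standard library. This is a minimal illustration under the large-sample z assumption described in the text; the function names and the data are hypothetical. For small samples the t distribution would be needed, which the standard library does not provide (SciPy's `scipy.stats.t.ppf` is one option).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_critical(alpha: float = 0.05, two_tailed: bool = True) -> float:
    """Critical value of the standard normal (z) distribution."""
    # For a two-tailed interval, alpha is split between the two tails
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)


def mean_confidence_interval(data, alpha: float = 0.05):
    """CI = sample mean +/- Z* x s / sqrt(n), assuming n > 30."""
    n = len(data)
    sample_mean = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 denominator)
    margin = z_critical(alpha) * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical data set of 40 observations (illustration only)
hours = [30 + (i % 11) for i in range(40)]
lower, upper = mean_confidence_interval(hours, alpha=0.05)
print(round(z_critical(0.05), 2), lower, upper)
```

For alpha = 0.05 the two-tailed z critical value comes out at about 1.96, which is the figure usually quoted for a 95% interval.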

Confidence interval for proportions

The confidence interval for a proportion follows the same
pattern as the confidence interval for means, but in place
of the standard deviation you use the sample proportion
times one minus the proportion:

$$CI = \hat{p} \pm Z^{*} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Where:
$\hat{p}$ = the proportion in your sample (e.g. the proportion
of respondents who said they watched any television at all)
Z*= the critical value of the z distribution
n = the sample size
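
As a sketch of the confidence interval for a proportion described here, the function below uses the z critical value with the sample proportion times one minus the proportion, divided by the sample size, under the square root. The function name and the counts are hypothetical, and the normal approximation is assumed to be adequate.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, alpha: float = 0.05):
    """CI = p_hat +/- Z* x sqrt(p_hat x (1 - p_hat) / n)."""
    p_hat = successes / n
    # Two-tailed critical value: alpha is split between both tails
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 70 of 100 respondents answered "yes"
lower, upper = proportion_confidence_interval(70, 100)
print(round(lower, 3), round(upper, 3))
```

With these made-up counts the 95% interval runs from roughly 0.61 to 0.79.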

Confidence interval for non-normally distributed data

To calculate a confidence interval around the mean of
data that is not normally distributed, you have two
choices:
1. You can find a distribution that matches the shape of
your data and use that distribution to calculate the
confidence interval.
2. You can perform a transformation on your data to
make it fit a normal distribution, and then find the
confidence interval for the transformed data.
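
The second option can be illustrated with a log transformation, one common choice for right-skewed, strictly positive data. This is only a sketch under that assumption: the function name and data are hypothetical, the z critical value is used (so a reasonably large sample is assumed), and the back-transformed bounds describe the geometric mean rather than the arithmetic mean.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev


def log_transformed_interval(data, alpha: float = 0.05):
    """Interval computed on log-transformed data, then back-transformed."""
    logs = [log(x) for x in data]  # requires all values > 0
    n = len(logs)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * stdev(logs) / sqrt(n)
    centre = mean(logs)
    # Exponentiate to return to the original scale of the data
    return exp(centre - margin), exp(centre + margin)


# Hypothetical right-skewed data (illustration only)
skewed = [2 ** (i % 6) for i in range(36)]
lower, upper = log_transformed_interval(skewed)
print(lower, upper)
```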
Reporting confidence intervals
Confidence intervals are sometimes reported in papers,
though researchers more often report the standard
deviation of their estimate.
If you are asked to report the confidence interval, you
should include the upper and lower bounds of the
confidence interval.
One place that confidence intervals are frequently used is
in graphs. When showing the differences between groups,
or plotting a linear regression, researchers will often
include the confidence interval to give a visual
representation of the variation around the estimate.
Confidence intervals are sometimes interpreted as saying
that the ‘true value’ of your estimate lies within the
bounds of the confidence interval. This is not strictly
what the interval says.
The confidence interval only tells you what range of
values you can expect to find if you re-do your sampling
or run your experiment again in the exact same way.

Common questions