Basic Biostatistics Part 2
1st March, 2017
Content
• Part 1 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression
Part 1 Summary
• What were the key learning points from Part 1?
− In groups, identify 3 key learning points from
the first session
Sampling
Sampling
• An investigation of a population is said to be a survey or study of the
population
• A population is a group of individuals or objects that meets a set of
pre-defined criteria; e.g.
- All people with permanent residence in the UK
- All patient records held in a database
- All patients with schizophrenia
- All staff members of an organisation
- All patients registered to a particular specialist
- All members of the population diagnosed with a particular health
condition
• A survey or study that collects information from every member of a
population is referred to as a census
Sampling
• Not always possible to collect information from every
member of a population due to time and resources
• A ‘good sample’ can be used to reliably estimate
characteristics (e.g. the mean) of the population
• Sample – any subset of a population
[Diagram: the sample shown as a subset of the population]
Sampling Error
• Errors in surveys can be divided into two categories
• Sampling error - error due to taking a sample rather
than studying the whole population
- e.g. if a psychiatrist randomly selects a sample of
patients and records the duration of each appointment,
the average treatment time can be calculated
- if the times for all patients were recorded (i.e. the entire
population) then the population average would most
likely differ from the sample average
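The appointment example above can be sketched as a small simulation. The population here is hypothetical (1,000 simulated appointment durations with an assumed mean of 30 minutes); it only illustrates that a sample mean usually differs a little from the population mean.

```python
import random
import statistics

# Hypothetical population: appointment durations (minutes) for 1,000 patients.
# These values are simulated for illustration only; they are not real data.
rng = random.Random(2017)
population = [rng.gauss(30, 8) for _ in range(1000)]

# A census uses every member, so it gives the population mean exactly.
population_mean = statistics.mean(population)

# A simple random sample of 40 patients gives only an estimate of that mean.
sample = rng.sample(population, 40)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the population value.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f} minutes")
print(f"Sample mean:     {sample_mean:.2f} minutes")
print(f"Sampling error:  {sampling_error:+.2f} minutes")
```

Drawing a different random sample would give a different sampling error; larger samples tend to give smaller ones.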
Non-sampling error
• Non-Sampling error is error due to:
- poor selection of strata or sample (coverage errors)
- poor data entry (processing errors)
- inaccurate responses (measurement errors)
- non-response errors
• In surveys, non-sampling errors can be more of a
problem than sampling errors
Statistical Hypothesis Tests
Hypothesis Testing
• A process called Hypothesis Testing is used
to quantify the evidence against a particular
hypothesis about the population
• There are many different types of hypothesis
tests
• Five stages for hypothesis testing can be
defined:
5 Stages
1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to
values from a known probability
distribution
5. Interpret the P-value and results
The Null Hypothesis
• The Null Hypothesis is tested which assumes
no effect (e.g. the difference in means equals
zero) in the population
• Example: Comparing the rates of
hallucinations in men and women in the
population
− Null Hypothesis (H0): rates of hallucinations
are the same in men and women in the
population
The Alternative Hypothesis
• The Alternative Hypothesis holds if the Null
Hypothesis is not true
• Example
− Alternative Hypothesis (H1): rates of
hallucinations are different in men and
women in the population
The test statistic
• After data collection, the sample data is used
to calculate a test statistic
• The test statistic is effectively the amount of
evidence in the data against H0
• Generally, the larger the value (irrespective of
sign), the greater the evidence against H0
The P-value
• The test statistic is compared to values from
the relevant probability distribution to obtain a
P-value
• The P-value is the probability of obtaining
our results, or something more extreme, if
the Null Hypothesis is true
• The smaller the P-value, the greater the
evidence against H0
Rejecting H0
• Conventionally, if the P-value < 0.05, there is
sufficient evidence to reject H0
• There is only a small chance of the results
occurring if H0 is true
– H0 is rejected, the results are statistically
significant at the 5% level
Not rejecting H0
• If the P-value ≥ 0.05, there is insufficient
evidence to reject H0
– H0 is not rejected, the results are not
statistically significant at the 5% level
• NB: This does not mean that the null
hypothesis is true, simply that we do not have
enough evidence to reject it!
Parametric vs. Non-Parametric tests
• Tests which are based on the assumption that the
data follows a known probability distribution (often the
Normal) are known as parametric tests
• When the data do not satisfy this assumption,
non-parametric tests can be used instead
• Non-Parametric tests make no assumption about
the probability distribution
Non-parametric tests
• Useful when:
− sample size is small
− data is measured on a categorical scale (though
they are used for numerical data as well)
• However:
− they have less power to detect a real difference
than the equivalent parametric tests
− they lead to decisions rather than generating a true
understanding of the data
Statistical tests
• Numerical data (Parametric tests)
– One-sample t-test
– Independent t-test
– Paired t-test
– One-way ANOVA
Statistical tests
• Numerical data (Non-parametric tests)
– Sign test
– Wilcoxon signed rank test
– Wilcoxon rank sum test
– Kruskal-Wallis test
Statistical tests
• Categorical data
– z-test for a proportion
– Sign test
– McNemar’s test
– Chi-squared test
– Chi-squared trend test
– Fisher’s exact test
Choosing a statistical test
• Useful medical statistical books will contain a
flowchart to provide guidance
• Considerations include:
– what is the data type?
– how many groups of data are there?
– can a probability distribution be assumed?
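The considerations above can be sketched as a small lookup for numerical data. The function name and the exact pairing of parametric and non-parametric tests are assumptions based on common textbook flowcharts; only tests named in these slides are returned, and a published flowchart should be preferred in practice.

```python
def choose_numerical_test(groups: int, paired: bool = False, normal: bool = True) -> str:
    """Suggest a test for numerical data (hypothetical helper, not a full flowchart).

    groups: number of groups of data
    paired: whether the groups are paired (only meaningful for two groups)
    normal: whether a Normal distribution can reasonably be assumed
    """
    if groups < 1:
        raise ValueError("groups must be at least 1")
    if groups > 2 and paired:
        # Paired designs with more than two groups are not covered in these slides.
        raise ValueError("no test listed for more than two paired groups")
    if groups == 1:
        return "One-sample t-test" if normal else "Sign test"
    if groups == 2 and paired:
        return "Paired t-test" if normal else "Wilcoxon signed rank test"
    if groups == 2:
        return "Independent t-test" if normal else "Wilcoxon rank sum test"
    # Three or more independent groups
    return "One-way ANOVA" if normal else "Kruskal-Wallis test"


print(choose_numerical_test(groups=2, paired=False, normal=True))
print(choose_numerical_test(groups=3, normal=False))
```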
Errors in Hypothesis Testing
Making a wrong decision
• There is the possibility of making a wrong
decision when conducting a Hypothesis test
• A wrong decision may be made when rejecting
or not rejecting the Null Hypothesis
• The possible mistakes that can be made are a:
– Type I error
– Type II error
Type I error
• Rejecting the Null Hypothesis when it is true
• Concluding that there is an effect (difference)
when in reality there is none
• The maximum chance of making a Type I error
is denoted by alpha (α)
• α is the significance level of the test; we reject
the Null Hypothesis if the P-value is less than
the significance level
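A minimal simulation of the Type I error rate, under stated assumptions: every simulated study is drawn from a population where the Null Hypothesis is true, and a z-test with known standard deviation stands in for whichever test would really be used. The sample size, number of studies and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05            # significance level
N = 25                  # observations per simulated study (assumed)
MU0, SIGMA = 0.0, 1.0   # H0 is true: the data really do have mean MU0
TRIALS = 4000           # number of simulated studies

rng = random.Random(1)
std_normal = NormalDist()


def two_sided_p(sample):
    """P-value of a z-test of H0: mean = MU0, treating SIGMA as known."""
    z = (mean(sample) - MU0) / (SIGMA / N ** 0.5)
    return 2 * (1 - std_normal.cdf(abs(z)))


# Count how often H0 is rejected even though it is true in every study.
rejections = sum(
    two_sided_p([rng.gauss(MU0, SIGMA) for _ in range(N)]) < ALPHA
    for _ in range(TRIALS)
)
type_i_rate = rejections / TRIALS
print(f"Proportion of true Null Hypotheses rejected: {type_i_rate:.3f}")
```

The proportion should settle close to α, which is what "the chance of a Type I error" means in the long run.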
Type II error
• Not rejecting the Null Hypothesis when it is
false
• Concluding that there is no effect (difference)
when one really exists
• The chance of making a Type II error is
denoted by beta (β)
• Its complement, 1 − β, is the Power of the test
Power and Sample Size
Power of the test
• The Power is the probability of rejecting the
Null Hypothesis when it is false
– i.e. the probability of making a correct decision
• The ideal power of the test is 100%
• However there is always a possibility of making
a Type II error
Sample size
• If the number of patients/samples in the study is small,
there may be inadequate power to detect an important
existing effect – wasted resources
• If the sample is too large, the study may be
unnecessarily time-consuming, expensive or
unethical
• Need to choose an optimal sample size that strikes a
balance between the implications of making a Type I or
Type II error
Calculating an optimal sample size for a test
• The following quantities need to be specified at
the design stage of the investigation in order to
calculate an optimal sample size:
– The Power
– Significance Level
– Variability
– Smallest effect of interest
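As a sketch of how these four quantities combine, the block below uses the common Normal-approximation formula for comparing two means, n per group = 2 × ((z for significance + z for power) × SD / effect)². This formula is an assumption (the slides do not state one), and the function name and example values are hypothetical; dedicated software typically applies a small correction based on the t distribution.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(power: float, alpha: float, sd: float, effect: float) -> int:
    """Approximate sample size per group for a two-sided comparison of two means.

    power:  desired power (1 - beta), e.g. 0.80
    alpha:  significance level, e.g. 0.05
    sd:     assumed standard deviation of the response (variability)
    effect: smallest difference in means that is of interest
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Hypothetical design: detect a difference of half a standard deviation.
print(n_per_group(power=0.80, alpha=0.05, sd=1.0, effect=0.5))
print(n_per_group(power=0.90, alpha=0.05, sd=1.0, effect=0.5))
```

Asking for more power, a smaller significance level, or a smaller detectable effect all increase the required sample size, as does greater variability.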
Recall: 5 stages
1. Define the Null & Alternative Hypotheses
2. Collect data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to
values from a known probability distribution
5. Interpret the P-value and results
Examples
Scenario 1
• A randomised double blind trial to determine
the effect of inhaled corticosteroids on
wheezing episodes in children
• An inhaled beclomethasone dipropionate was
compared to a Placebo
• Response variable was average forced
expiratory volume (FEV) over a 6 month
period
• Sample sizes: Treatment group =50, Placebo
group = 48
Stages 1 and 2
• Stage 1: Define H0 and H1:
H0: the mean FEV in the population of children is
the same in the two groups
H1: the mean FEV in the population of children is
different in the two groups
• Stage 2: Collect data
Graphical Analysis
[Figure: boxplots comparing Forced Expiratory Volume (FEV) in the treated group and the control group]
Selecting a test
• What is the data type? Numerical
• How many groups are there? 2
• Are the groups Paired or Independent?
Independent
• Is Normality and equal variances of the data
assumed? Yes
→Unpaired (Independent) t-test
Analysis Output
Stages 3 and 4: Calculate the test statistic and compare to
values from a known probability distribution
Sample   N    Mean    StDev   SE Mean
1        50   1.640   0.286   0.040
2        48   1.537   0.246   0.035
Difference = mu (1) - mu (2)
Estimate for difference: 0.1033
95% CI for difference: (-0.0038, 0.2104)
T-Test of difference = 0 (vs not =): T-Value = 1.91 P-Value = 0.059 DF = 96
Both use Pooled StDev = 0.2670
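The output above can be approximately reproduced from its own summary statistics. This is a sketch only: the P-value comes from a simple numerical integration of the t density (the helper names are hypothetical), and because the inputs are the rounded means and standard deviations, the last digit may differ slightly from the software output.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 - 2 * (area under the t density from 0 to |t|).

    The area is approximated with Simpson's rule (steps must be even).
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


# Summary statistics taken from the output above
n1, mean1, sd1 = 50, 1.640, 0.286   # treated group
n2, mean2, sd2 = 48, 1.537, 0.246   # placebo group

df = n1 + n2 - 2
pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)
t_value = (mean1 - mean2) / se_diff
p_value = two_sided_p(t_value, df)

print(f"Pooled StDev = {pooled_sd:.4f}, DF = {df}")
print(f"T-Value = {t_value:.2f}, P-Value = {p_value:.3f}")
```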
Stage 5: Interpret the results
• The P-value is 0.059
• There is insufficient evidence (just!) to reject H0
at the 5% level
• There is insufficient statistical evidence of a
difference between the 2 groups
• The Power of the Test should be checked
• A Type II error may be made when not
rejecting H0
Scenario 2
• A study was conducted to determine if a heart condition
influences the age at which children start to walk
• Response variable was age the children started to walk
• 30 children with a specific heart condition were analysed in
the study
• Children (in general) are known to start walking at an age
of 11.4 months
• Does the heart condition influence the age at which
children start to walk?
Stages 1 and 2
• Stage 1: Define H0 and H1
H0: the mean walking age of the children with
the heart condition = 11.4 months
H1: the mean walking age of the children with
the heart condition ≠ 11.4 months
• Stage 2: Collect data
Graphical Analysis
[Figure: histogram showing walking age of children; horizontal axis Months, vertical axis Frequency]
Selecting a test
• What is the data type? Numerical
• How many groups are there? 1
• Is Normality of the data assumed? Yes
→One-sample t-test
Analysis Output
One-Sample T
Stages 3 and 4: Calculate the test statistic and compare to
values from a known probability distribution
Test of mu = 11.4 vs not = 11.4
N    Mean     StDev   SE Mean   95% CI             T      P
30   13.158   2.583   0.472     (12.193, 14.123)   3.73   0.001
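The standard error, t statistic and confidence interval in this output can be checked by hand from N, Mean and StDev. In this sketch the critical value 2.045 (two-sided 5% point of the t distribution with 29 degrees of freedom) is taken from standard t tables rather than computed; small differences in the last digit come from rounding of the inputs.

```python
from math import sqrt

# Summary statistics from the output above
n, sample_mean, sample_sd = 30, 13.158, 2.583
mu0 = 11.4        # hypothesised mean walking age under H0 (months)
T_CRIT = 2.045    # t table value for 29 degrees of freedom, two-sided 5% level

se_mean = sample_sd / sqrt(n)                 # standard error of the mean
t_value = (sample_mean - mu0) / se_mean       # test statistic
ci_low = sample_mean - T_CRIT * se_mean       # 95% confidence interval
ci_high = sample_mean + T_CRIT * se_mean
reject_h0 = abs(t_value) > T_CRIT             # decision at the 5% level

print(f"SE Mean = {se_mean:.3f}")
print(f"T = {t_value:.2f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"Reject H0 at the 5% level: {reject_h0}")
```

The interval excludes 11.4 months, which agrees with rejecting the Null Hypothesis at the 5% level.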
Stage 5: Interpret the results
• The P-value is 0.001
• There is strong evidence to reject H0
• There is statistical evidence that the heart
condition influences the age at which children
start to walk
• If the Null Hypothesis were true, results at least this
extreme would occur with a probability of about 0.1%
(NB: this is not the probability that a Type I error
has been made)
Correlation and Regression
Correlation and Regression
• Correlation
– measures the strength of association
between two variables
• Regression
– models a relationship between two or
more variables
Correlation
• The degree of association between two variables is
called their correlation
• Positive correlation - when the points appear in a
band running from lower left to upper right (when x
increases, y increases)
• Negative correlation - when the points appear in a
band from upper left to lower right (when x increases,
y decreases)
• No correlation - when the points are randomly
scattered about the graph
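One common way to put a number on these patterns is Pearson's correlation coefficient r, which runs from -1 (perfect negative) through 0 (none) to +1 (perfect positive). The slides do not name a specific coefficient, so its use here is an assumption, and the small data sets below are made up purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative (made-up) data
x = [1, 2, 3, 4, 5, 6]
rising = [2, 4, 5, 4, 7, 8]     # tends to increase with x: positive correlation
falling = [9, 7, 8, 5, 4, 2]    # tends to decrease with x: negative correlation

print(f"r (rising)  = {pearson_r(x, rising):+.2f}")
print(f"r (falling) = {pearson_r(x, falling):+.2f}")
```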
Correlation and “Line of best fit”
[Figure: example scatter plots with lines of best fit]
Be Careful!
"Correlation does not imply causality"
• In other words, the scatter plot may show that
a relationship exists, but it does not and cannot
prove that one factor is causing the other
• The scatter plot can only provide a clue that
two factors may be “cause and effect”
Correlation - example
• Driving test scores – written paper
• Outcome compared by plotting scores against
number of lessons (1-10)
– does score improve as the number of lessons
increases?
Scatter plot for learner drivers
[Figure: scatter plot of marks (vertical axis) against number of classes (horizontal axis)]
Linear Regression
• Investigates a straight line (linear) association
between variables
• The straight line fitted to the scatter diagram is the
regression line, described by the regression equation
• Least squares – the sum of the squared
differences between the observed and
predicted values is minimised
Medical example
• Does increasing hardness improve abrasion resistance
for composites?
• Does increasing etch time improve bond strength to
enamel?
• Both questions require a regression approach
– using just two or three materials of different hardness
is not acceptable
– using just two etch times would not provide answers
Data
Composite   Hardness   Wear rate
1           120        56
2           168        46
3           290        21
4           42         98
5           78         80
6           90         65
7           130        32
Regression equation 1
A regression equation is: wear = 94.6 - 0.288 hardness
Fitted Line Plot
wear = 94.65 - 0.2882 hardness
S = 14.5829   R-Sq = 75.4%   R-Sq(adj) = 70.4%
[Figure: fitted line plot of wear (vertical axis) against hardness (horizontal axis)]
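The fitted line for the composite data can be reproduced with the least squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). This is a minimal sketch using the seven data points from the Data slide; the variable names are illustrative.

```python
from math import sqrt

# Data from the Data slide: hardness and wear rate for seven composites
hardness = [120, 168, 290, 42, 78, 90, 130]
wear = [56, 46, 21, 98, 80, 65, 32]

n = len(hardness)
mean_x = sum(hardness) / n
mean_y = sum(wear) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in hardness)
syy = sum((y - mean_y) ** 2 for y in wear)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hardness, wear))

# Least squares estimates: these minimise the sum of squared residuals
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R-squared: proportion of the variation in wear explained by the line
r_squared = sxy ** 2 / (sxx * syy)

# S: residual standard deviation, with n - 2 degrees of freedom
residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(hardness, wear))
s = sqrt(residual_ss / (n - 2))

print(f"wear = {intercept:.2f} - {abs(slope):.4f} hardness")
print(f"S = {s:.4f}, R-Sq = {100 * r_squared:.1f}%")
```

The negative slope means that wear falls as hardness rises across these seven composites.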
Regression equation 2
• Etch time 5 to 60 s
• Bond strength 15 to 26 MPa
Regression equation: bond strength = 17.3 + 0.110 etch time
Fitted Line Plot
bond strength = 17.31 + 0.1103 etch time
S = 2.51095   R-Sq = 35.2%   R-Sq(adj) = 32.2%
[Figure: fitted line plot of bond strength (vertical axis) against etch time (horizontal axis)]
Summary
• Part 2 Summary
• Sampling
• Statistical Hypothesis Tests
• Errors in Hypothesis Tests
• Power and Sample Size
• Examples
• Correlation and Regression