MSc Public Health
LDP7004 - Introduction to Epidemiology
Week 1: Lecture 1 - Statistics in epidemiology
Module Director(s):
Lecturer(s):
#WeAreYSJ
@YorkStJohn
@YorkStJohnUniversity
Getting to know you
Please scan this barcode
Or join using code: 7951 0105
Objectives
1. Understand the contributions of statistics to epidemiology
2. Introduction to key statistical applications and concepts used in epidemiological studies
Link to prior learning
Introduction to epidemiology (what and why)
Types of epidemiology
Introduction
What is statistics?
Statistics and epidemiology?
Introduction
Epidemiological studies are either observational or experimental.
• Observational studies may be descriptive, which generate hypotheses for further studies, or analytic, which test hypotheses and can report associations.
• Experimental studies are used to test hypotheses and report cause-and-effect relationships.
Introduction - statistics in epidemiology
• Statistics can be a valuable tool for addressing research questions of interest in epidemiological studies.
• Statistics can be used to draw conclusions or to make predictions about what may happen in a population.
• Different types of data and information call for different types of statistics.
Why biostatistics in epidemiology?
• The key role of statistical modelling in epidemiology and public health is indisputable.
• Biostatistics, also called biometry, studies biological phenomena through the use of mathematical principles, statistical modelling, methodologies and processes.
• The methods and tools of biostatistics are used extensively to understand disease development, uncover aetiology, and evaluate new strategies for preventing and controlling disease.
• Through data analysis, epidemiology can steer decision-making processes, guide health and healthcare policy, and help plan and manage the care of health and disease in individuals.
(Matranga, Bono, & Maniscalco, 2021; Kier, 2011)
Statistical analysis
• The type of data determines the statistical procedures that are used.
• Data are typically described in terms of their type, distribution, location and variation.
• Most epidemiological data are now analysed using powerful statistical models that require computers and specialist software, for example:
SPSS
SAS
Stata
MLwiN - specialist multilevel modelling software
(Kier, 2011)
Description of data in epidemiology: types of data (variables)
• Nominal data - the simplest type of data; used to label variables without any order or quantitative value, e.g. nationality, gender, occupation or race/ethnicity.
• Ordinal data - qualitative data whose values have a relative position (an order), e.g. educational level, socioeconomic status, number of children and stage of disease.
• Interval data - have a fixed mathematical difference between each point but no true zero, e.g. temperature in degrees Celsius lies on an interval scale.
• Ratio data/continuous data - unlike interval data, have a definite (true) zero point, e.g. height, weight or age. They are measurable and can take the form of fractions or decimals.
• Discrete data - are countable and finite; they are whole numbers (integers), e.g. number of students in a class.
(Matranga, Bono, & Maniscalco, 2021; Kier, 2011)
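The practical difference between interval and ratio data can be illustrated with a short Python sketch (Python is our choice here; the lecture itself uses no code). The temperatures and weights are hypothetical example values. Converting Celsius to Kelvin, a scale with a true zero, suggests that 20 °C is not "twice as hot" as 10 °C, whereas 80 kg really is twice 40 kg.

```python
# Hypothetical example values, used only to contrast interval and ratio scales.

def celsius_to_kelvin(temp_c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale, true zero at 0 K)."""
    return temp_c + 273.15


# Interval scale: the ratio of Celsius values is misleading.
naive_ratio_c = 20.0 / 10.0
# On the Kelvin scale the same two temperatures differ by only about 3.5%.
true_ratio_k = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

# Ratio scale: 0 kg means "no weight", so the ratio is meaningful.
weight_ratio = 80.0 / 40.0

# Differences are meaningful on both scales: 10 degrees either way.
diff_c = 20.0 - 10.0
diff_k = celsius_to_kelvin(20.0) - celsius_to_kelvin(10.0)

print(f"Ratio using Celsius: {naive_ratio_c:.3f}")
print(f"Ratio using Kelvin:  {true_ratio_k:.3f}")
print(f"Weight ratio:        {weight_ratio:.3f}")
print(f"Difference: {diff_c:.2f} C vs {diff_k:.2f} K")
```

In short, differences can be interpreted on both scales, but statements such as "twice as much" only make sense for ratio data.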
Types of data analysis
Descriptive statistics
• Summarise data and enable quicker understanding of what the data represent, allowing us to understand how representative the data set is of the population we are interested in.
• These can include the distribution, measures of central tendency (averages) and measures of dispersion.
• Can be presented as graphs, histograms and frequency polygons.
Inferential statistics
• Used when it is impracticable to have all the data for the population you are interested in. This might be due to the size of the population: larger populations will be more difficult and costly to measure.
• Inferential statistics involves estimating parameters and hypothesis testing.
(Kier, 2011)
Descriptive analysis - frequency distribution
Measures of central tendency
• Mean - the sum of values divided by the number of cases. The mean is a better estimate of the centre of the data for normally distributed data.
• Median - the value at the midpoint of a distribution. The median is a more reliable estimate of the centre of the data for skewed data.
• Mode - the value that occurs most frequently in a distribution.
Measures of dispersion/spread
• Range - the difference between the largest (maximum) and smallest (minimum) values in a set of data. In the epidemiologic community, the range is usually reported as "from (the minimum) to (the maximum)", that is, as two numbers rather than one.
• Percentiles, quartiles and the interquartile range (the difference between the third and first quartiles).
• Standard deviation - the measure of spread used with the arithmetic mean. It is usually calculated only when the data are more or less "normally distributed", i.e. the data fall into a typical bell-shaped curve.
(Kier, 2011)
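The measures above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical set of patient ages that includes one older outlier, so the data are right-skewed; the quartiles use the module's default ("exclusive") method, and other software may interpolate slightly differently.

```python
import statistics

# Hypothetical ages (years) of 11 patients; the 71-year-old skews the data.
ages = [23, 25, 25, 27, 29, 30, 31, 33, 35, 38, 71]

# Measures of central tendency
mean_age = statistics.mean(ages)      # sum of values / number of cases
median_age = statistics.median(ages)  # middle value once the data are sorted
mode_age = statistics.mode(ages)      # most frequently occurring value

# Measures of dispersion/spread
minimum, maximum = min(ages), max(ages)
q1, q2, q3 = statistics.quantiles(ages, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                                  # interquartile range
sd_age = statistics.stdev(ages)                # sample SD (n - 1 denominator)

print(f"Mean {mean_age:.1f}, median {median_age}, mode {mode_age}")
print(f"Range: from {minimum} to {maximum}")   # reported as two numbers
print(f"Quartiles {q1}, {q2}, {q3}; IQR {iqr}")
print(f"Standard deviation {sd_age:.1f}")
```

Here the single outlier pulls the mean (about 33.4) above the median (30), which is why the median is preferred as the centre of skewed data.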
Frequency distribution
• The idea of a normal (also called a Gaussian) distribution is very important to understand.
• It is a theoretical distribution that can be described as a bell-shaped curve that is symmetrical around the mean value of the sample or population.
• In a normal distribution, the median will be the same as the mean and the mode.
• In a normal distribution, 68.3% of data will lie within one standard deviation either side of the mean, and 95.4% within two standard deviations.
(Kier, 2011)
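The 68.3% and 95.4% figures can be checked with `statistics.NormalDist` from the Python standard library. The blood pressure distribution below (mean 120 mmHg, SD 15 mmHg) is a hypothetical example; any normal distribution gives the same proportions when distances are measured in standard deviations.

```python
from statistics import NormalDist

# Hypothetical variable: systolic blood pressure, mean 120 mmHg, SD 15 mmHg.
bp = NormalDist(mu=120, sigma=15)


def proportion_within(k, dist):
    """Proportion of a normal distribution lying within k SDs either side of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


# Prints 68.3%, 95.4% and 99.7% for k = 1, 2 and 3.
for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {proportion_within(k, bp):.1%}")

# The curve is symmetrical, so the mean and median coincide (as does the mode, at the peak).
print(bp.mean, bp.median)
```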
Commonly used terms/concepts in inferential statistics
Statistical significance - means that a result is unlikely to have arisen by chance alone.
P-value - the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Therefore, if the p-value is < 0.05, there is less than a 5% chance that a result at least this extreme would have occurred if there was no relationship between the variables being studied.
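As an illustration, the sketch below converts a test statistic into a two-sided p-value. It assumes a z-test, i.e. a test statistic that follows a standard normal distribution when the null hypothesis is true; the lecture does not specify a particular test, and the z values are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional significance level


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic, assuming a standard normal null distribution.

    This is the probability, if the null hypothesis were true, of a result at
    least as extreme as the one observed, in either direction.
    """
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (0.5, 1.5, 2.5):  # hypothetical test statistics
    p = two_sided_p_value(z)
    verdict = "reject" if p < ALPHA else "do not reject"
    print(f"z = {z}: p = {p:.4f} -> {verdict} the null hypothesis")
```

A z value of about 1.96 corresponds to p of about 0.05, the usual threshold for statistical significance.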
The null hypothesis - often used to explain the idea of statistical significance; this is the hypothesis that there is no relationship between variables. The null hypothesis is rejected if a statistically significant result is found.
Type I error - falsely rejecting the null hypothesis when there is no relationship between the variables being studied. This can be understood as a false positive.
Type II error - failing to reject the null hypothesis when it is actually false. This can be understood as a false negative: not finding a relationship between the variables being studied when there is one.
(Matranga, Bono, & Maniscalco, 2021).
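Type I and Type II errors can be made concrete with a small simulation. This is a sketch under stated assumptions: each hypothetical "study" draws 30 normally distributed values with a known SD of 1 and applies a two-sided one-sample z-test of the null hypothesis that the mean is 0, at a 5% significance level. The true mean of 0.3 used for the Type II case is an arbitrary illustrative effect size.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, assuming the population SD is known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=30, trials=4000, alpha=0.05, seed=1):
    """Share of simulated studies in which the null hypothesis (mean = mu0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials


# Null hypothesis true (no effect): every rejection is a Type I error.
type_i_rate = rejection_rate(true_mu=0.0)
# Null hypothesis false (true mean 0.3): every failure to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mu=0.3)

print(f"Estimated Type I error rate:  {type_i_rate:.3f} (expected to be close to 0.05)")
print(f"Estimated Type II error rate: {type_ii_rate:.3f}")
```

When there is truly no effect, about 5% of studies still reach p < 0.05; with a small true effect and only 30 participants, most studies miss it, which is a high Type II error rate.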
Confidence limits (confidence interval)
Epidemiologists usually conduct studies not only to measure characteristics of the subjects studied, but also to make generalisations about the larger population from which these subjects came. This process is called inference.
In epidemiology, a common way to indicate a measurement's precision is to provide a confidence interval.
A narrow confidence interval indicates high precision; a wide confidence interval indicates low precision.
In public health, investigators generally want a high level of confidence and usually set the confidence level at 95%.
Epidemiologists commonly interpret a 95% confidence interval as the range of values consistent with the data from their study.
(Matranga, Bono, & Maniscalco, 2021).
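The link between sample size and precision can be sketched in Python. This example assumes a normal approximation (mean ± 1.96 × standard error) for the 95% confidence interval of a mean; for small samples a t-distribution critical value would usually be more appropriate. The blood pressure readings are simulated, hypothetical data.

```python
import random
from statistics import NormalDist, mean, stdev


def ci_95_mean(sample):
    """Approximate 95% confidence interval for a population mean.

    Normal approximation: mean +/- z * SD / sqrt(n), with z about 1.96.
    For small samples a t-distribution critical value gives a slightly wider interval.
    """
    n = len(sample)
    z = NormalDist().inv_cdf(0.975)  # two-sided 95%: about 1.96
    se = stdev(sample) / n ** 0.5     # standard error of the mean
    centre = mean(sample)
    return centre - z * se, centre + z * se


# Hypothetical systolic blood pressure readings (mmHg) from the same population
# (mean 120, SD 15); only the sample size differs.
rng = random.Random(42)
small_sample = [rng.gauss(120, 15) for _ in range(25)]
large_sample = [rng.gauss(120, 15) for _ in range(2500)]

for label, sample in (("n = 25", small_sample), ("n = 2500", large_sample)):
    low, high = ci_95_mean(sample)
    print(f"{label}: 95% CI {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

Because the standard error shrinks with the square root of n, the larger sample gives a much narrower interval, i.e. a more precise estimate.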
Feedforward activities
PRE-SEMINAR TASK:
• Reflection on choice of data analysis
• Carrying out statistical analysis
Guetterman, T. C. (2019). Basics of statistics for primary care research. Family Medicine and Community Health, 7, e000067. doi:10.1136/fmch-2018-000067
Kier, K. L. (2011). Biostatistical applications in epidemiology. Pharmacotherapy, 31(1), 9–22. [Link]
Conclusion of objectives
1. Understand the contributions of statistics to epidemiology
2. Introduction to key statistical applications and concepts used in epidemiological studies
Support available
• Academic Quality to check in with Iain Pullar following recruitment summer 2023
References
• Guetterman, T. C. (2019). Basics of statistics for primary care research. Family Medicine and Community Health, 7, e000067. doi:10.1136/fmch-2018-000067
• Kier, K. L. (2011). Biostatistical applications in epidemiology. Pharmacotherapy, 31(1), 9–22. [Link]
• Matranga, D., Bono, F., & Maniscalco, L. (2021). Statistical Advances in Epidemiology and Public Health. International Journal of Environmental Research and Public Health, 18(7), 3549. Available from [Link] (Accessed 9 May 2023)