LECTURE OUTLINE
Experiment
Test or Series of tests in which:
variables) are changed
response on the output
Examples:
DATA COLLECTION/RECORDING
Thus, knowledge of experimental design (DOE) is in high demand.
Experimental Precision and Accuracy:
Precision: the level of reproducibility of a given observation or analysis.
Accuracy: how close the obtained experimental data are to the expected or theoretical value.
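The distinction can be made concrete with a small sketch. The replicate readings and the reference value below are hypothetical, chosen only for illustration: the spread of the replicates reflects precision, while the offset of their mean from the reference reflects accuracy.

```python
import statistics

# Hypothetical replicate readings of a standard whose true value is assumed to be 10.0
reference_value = 10.0
replicates = [10.4, 10.5, 10.4, 10.6, 10.5]

mean_reading = statistics.mean(replicates)

# Precision: spread of the replicates around their own mean (smaller = more precise)
precision_sd = statistics.stdev(replicates)

# Accuracy: offset of the mean reading from the reference value (closer to 0 = more accurate)
bias = mean_reading - reference_value

print(f"mean = {mean_reading:.2f}, SD = {precision_sd:.3f}, bias = {bias:+.2f}")
```

In this made-up case the readings are precise (small SD) but not very accurate (the mean sits about 0.5 above the reference).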
STATISTICS
Field of study that deals with data:
Data can be obtained through observation/experimentation
Biostatistics is the statistical analysis of biological data.
TYPES OF STATISTICS
1. Descriptive Statistics:
Describes the central tendency of the data, i.e. the position or value of the center point, e.g. mode, mean
and median.
Describes the arrangement or dispersion of the data, i.e. how close together or far apart the values are.
Describes the shape of the data distribution, e.g. skewness (tail of the probability curve), kurtosis
(peak of the probability curve), symmetrical, logarithmic, bell-shaped, sigmoid, etc.
2. Inferential (Estimation) Statistics
Estimates population parameters from sample statistics.
Tests for significant differences between two or more populations.
Defines or tests relationships between variables.
SOME STATISTICAL TERMINOLOGY
Population:
A population is the collection of objects or subjects under study, for example products, people, animals,
microbes, proteins, explants, etc.
Population size, normally denoted by N, can vary from as few as tens to as many as billions or
trillions.
Example of a study population: all car owners in Nigeria.
Sample:
A sample is a subset of the population under study.
Sample size is represented by (n)
For example, 16 randomly selected Sprague Dawley rats, or 5 species out of 50 isolates.
Mode: the data value or attribute with the highest observed frequency.
Median: the middle value or attribute of the ordered distribution.
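As a minimal illustration of mode and median, the sketch below uses Python's standard `statistics` module on hypothetical colony counts (the data are invented for the example).

```python
import statistics

# Hypothetical colony counts from nine plates
counts = [12, 15, 12, 18, 20, 12, 15, 22, 17]

mode_value = statistics.mode(counts)      # most frequently observed value
median_value = statistics.median(counts)  # middle value once the data are sorted

print("mode:", mode_value)
print("median:", median_value)
```

Sorting the counts gives 12, 12, 12, 15, 15, 17, 18, 20, 22, so the middle (fifth) value is the median and the value that occurs three times is the mode.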
Variables
Variables are the objects (e.g. plant species, microbial isolates, etc.) or traits (characteristics such as yield,
stability, transformation efficiency, etc.) that are being measured or observed.
For example: body weight, age, bird species, temperature, lipid content, molecular weight, enzyme activity,
number of amplicons per PCR cycle.
Types of Variables
Variables with only two options, such as true or false, or yes or no, are called dichotomous variables.
Variables can also be dependent or independent.
The dependent (response) variable, mostly plotted on the Y-axis, is the main trait under observation.
The independent variables are the input factors or treatments administered, normally placed on the X-axis.
Example: the effect of a chicken feed supplement on the animal's weight. The dependent variable is the
animal's weight, while the independent variable is the type of supplemented feed.
Analysis with one dependent variable is called univariate, while analysis with multiple dependent variables is
described as multivariate.
Variables can also be classified as either:
Qualitative (categorical): mostly consist of text, and are further classified into:
Nominal: label variables, e.g. species isolates, cultivar type, gender, etc. We can only count these
variables, nothing more.
Ordinal: we can count them and rank them by level or stage, e.g. education, disease condition,
reproduction, life cycle and growth profile, etc.
Quantitative: numerical values that can be used in calculations, e.g. temperature, pressure, age,
weight, etc. They are further classified into:
Interval: numerical variables measured on a scale with equal intervals; we can count them, rank them and
take differences. E.g. temperature: we can compute the difference between a normal body temperature of 37°C and
that of a feverish body (42°C). Other examples are the age of a microbial culture (the difference between a
1-day-old and a 3-day-old culture) and differences in experimental time. Note that interval variables have no
true zero point; zero is just a reference. E.g. a day-zero microbial culture simply marks the initial day, and
0°C does not mean no temperature, but rather the reference point at which water freezes.
Ratio: make up most of the quantitative variables; they can be counted and ranked, differences and ratios can
be taken, and they have a true zero point. E.g. weight, age, etc.
Discrete: normally obtained by counting, e.g. number of colonies per plate. They are always whole
numbers, i.e. you can have 10, 50, 40 or 300, but NOT 10.5, 9.1 or 3.7.
Continuous: can take any value within a range (1, 1.4, 2, 4.5, ...); they are normally obtained by measurement,
e.g. number of amplicons per PCR cycle, weight and age.
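The practical consequence of the interval versus ratio distinction can be checked numerically. The sketch below is only an illustration, with two arbitrary temperatures: differences are the same in Celsius (interval scale, arbitrary zero) and Kelvin (ratio scale, true zero), but ratios are only meaningful on the Kelvin scale.

```python
def celsius_to_kelvin(temp_c: float) -> float:
    """Convert a Celsius reading to Kelvin, which has a true zero point."""
    return temp_c + 273.15

# Two arbitrary temperatures chosen for illustration
low_c, high_c = 20.0, 40.0

# Differences agree on both scales, so subtraction is valid for interval data
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios disagree: 40°C is not "twice as hot" as 20°C
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(f"difference: {diff_c:.1f} °C vs {diff_k:.1f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```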
MEAN
The population mean ($\mu$) or sample mean ($\bar{X}$) is the expected value of the statistical data. In other
words, it is the average of the data under analysis, i.e.:
$\mu = \frac{\sum X}{N}$ (Eq. 1)    $\bar{X} = \frac{\sum X}{n}$ (Eq. 2)
where N or n is the respective population or sample size, and $\sum X = X_1 + X_2 + X_3 + \dots + X_i$.
Example: the following data describe ISI publications from a certain university over a period of 5 years:
2009 (520), 2010 (313), 2011 (648), 2012 (587) and 2013 (130).
$\text{mean} = \frac{520 + 313 + 648 + 587 + 130}{5} = 439.6$
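The same arithmetic can be reproduced in a few lines. This is a minimal sketch that uses the five values appearing in the worked sum (520, 313, 648, 587, 130); the variable names are illustrative.

```python
# Yearly publication counts taken from the worked sum in the example
publications = [520, 313, 648, 587, 130]

# Mean = sum of the observations divided by the number of observations
total = sum(publications)
mean = total / len(publications)

print(total)  # 2198
print(mean)   # 439.6
```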
Variance (population variance $\sigma^2$, and sample variance $S^2$)
A measure of how far a set of numbers is spread out from one another and from the mean.
Variance is never negative.
A variance of zero means all responses have the same value.
A small variance means the response values are close to the mean and to each other.
A large variance means the responses are widely or unevenly spread out from each
other and from the mean.
Population variance is calculated as follows:
1. Find the difference between each response value and the population mean.
2. Square the differences and sum them.
3. Divide the sum by the population size (N).
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$\text{variance} = \frac{80.4^2 + (-126.6)^2 + 208.4^2 + 147.4^2 + (-309.6)^2}{5} = 36700.24$
Note: If the variance is calculated from a sample taken out of a larger population, then the denominator
is n - 1.
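The three steps, and the effect of the n - 1 denominator, can be sketched as follows. The values are the five publication counts used in the worked calculation; the variable names are illustrative only.

```python
values = [520, 313, 648, 587, 130]
n = len(values)
mean = sum(values) / n

# Step 1: difference of each value from the mean
deviations = [x - mean for x in values]

# Step 2: square the differences and sum them
sum_of_squares = sum(d ** 2 for d in deviations)

# Step 3: divide by N for a population, or by n - 1 for a sample
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

print(f"population variance = {population_variance:.2f}")
print(f"sample variance     = {sample_variance:.2f}")
```

Treating the same five values as a sample rather than a whole population gives a larger variance, because the sum of squares is divided by 4 instead of 5.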
Standard Deviation (population SD = σ, sample SD = S)
The square root of the variance $\sigma^2$.
Describes how the data are spread out from one another and from the mean.
$SD = \sqrt{36700.24} = 191.57$
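As a quick check of the figure above, Python's standard `statistics` module provides both forms: `pstdev` divides by N (population) and `stdev` divides by n - 1 (sample). The data are the five values from the worked example.

```python
import statistics

values = [520, 313, 648, 587, 130]

# Population SD: square root of the population variance (denominator N)
population_sd = statistics.pstdev(values)

# Sample SD: square root of the sample variance (denominator n - 1)
sample_sd = statistics.stdev(values)

print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")
```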
In conclusion
r research to
others in a comprehensible way.
variables.
Standard deviation, explain the concept of statistics.
SAMPLING IN STATISTICS
Samples are parts of a population. For example, you might have a list of information on 100 people (your
“sample”) out of 10,000 people (the “population”).
ideal i.e. not too large or too small.
Then once you’ve decided on a sample size, you must use a sound technique to collect the sample from
the population.
Probability sampling uses randomization to select sample members. For example, the chance of picking a red
apple out of 100 apples in a basket.
Non-probability sampling uses non-random techniques (i.e. the judgment of the researcher). You can’t
calculate the odds of any particular item, person or thing being included in your sample.
COMMON SAMPLING TYPES
Bernoulli samples: an independent Bernoulli trial (an experiment with two outcomes) is run on each population
element, and elements are selected based on the trial outcomes. The sample size in Bernoulli sampling follows a
binomial distribution.
Cluster samples: divide the population into groups (clusters), then choose a random sample of the
clusters. It is used when researchers don't know the individuals in a population but do know the population
subsets or groups.
Systematic samples: you select sample elements from an ordered list of participants at a regular interval.
Simple Random Sampling (SRS): Select items completely randomly, so that each element has the same
probability of being chosen as any other element.
Stratified sampling: like cluster sampling, you divide the main population into groups, here homogeneous
subpopulations (strata). You then apply simple random or systematic sampling to choose a sample from each
subpopulation independently.
Stratified Randomization: a sub-type of stratified sampling used in clinical trials. First,
divide patients into strata, then randomize with permuted block randomization.
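Three of the probability designs described above (simple random, systematic and stratified) can be sketched with Python's standard `random` module. The population of 100 numbered subjects, the two strata and the 10% sampling fraction are all hypothetical choices made for the illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 subject IDs split into two strata
population = list(range(1, 101))
strata = {"stratum_a": population[:60], "stratum_b": population[60:]}

# Simple random sampling: every element has the same chance of selection
simple_random = rng.sample(population, k=10)

# Systematic sampling: random start, then every step-th element of the list
step = len(population) // 10
start = rng.randrange(step)
systematic = population[start::step]

# Stratified sampling: an independent simple random sample within each stratum,
# here proportional to stratum size (10% of each)
stratified = {
    name: rng.sample(members, k=len(members) // 10)
    for name, members in strata.items()
}

print("simple random:", sorted(simple_random))
print("systematic:   ", systematic)
print("stratified:   ", {k: sorted(v) for k, v in stratified.items()})
```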
Bootstrap Sample: select a smaller sample from a larger sample with bootstrapping (a type of resampling
where you draw large numbers of smaller samples of the same size, with replacement, from a single original
sample).
Maximum Variation Samples: used when you want to include extremes (like rich/poor or young/old).
Respondent Driven Sampling: a chain-referral sampling method where participants recommend other people
they know.
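Bootstrapping, as described above, can be sketched in a few lines. This is an illustration only: the original sample reuses the five publication counts from the mean example, and the choice of 1000 resamples is arbitrary.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

original_sample = [520, 313, 648, 587, 130]
n_resamples = 1000  # arbitrary number of bootstrap resamples

bootstrap_means = []
for _ in range(n_resamples):
    # Draw a resample of the same size as the original, WITH replacement
    resample = rng.choices(original_sample, k=len(original_sample))
    bootstrap_means.append(statistics.mean(resample))

# The spread of the resampled means estimates the standard error of the mean
bootstrap_se = statistics.stdev(bootstrap_means)
print(f"bootstrap standard error of the mean: {bootstrap_se:.1f}")
```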
SAMPLE ERROR
margin of error.
out of a 1000, and you got 19.357%. If the actual
percentage is 19.300%, the difference (19.357 - 19.300) of 0.057, or about 0.3% of the actual value, is the
margin of error. $\text{margin of error} = \frac{1}{\sqrt{n}}$, where n is the sample size.
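Assuming the formula intended here is the common rule of thumb, margin of error = 1/√n, the sketch below shows how the margin shrinks as the sample grows. The function name and the sample sizes are illustrative.

```python
import math

def margin_of_error(n: int) -> float:
    """Rule-of-thumb margin of error, assumed here to be 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

# Illustrative sample sizes
for n in (100, 400, 1000, 10000):
    print(f"n = {n:>5}: margin of error = {margin_of_error(n):.3f}")
```

Under this rule, quadrupling the sample size only halves the margin of error, which is why very small margins require very large samples.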
of error, except in cluster sampling, where it may increase due
to similarities among cluster members.
Non-sampling error is another reason why the sample may differ from the
population. It arises from poor data collection methods (like faulty instruments or inaccurate data recording),
selection bias, nonresponse bias (where individuals don't want to or can't respond to a survey), or other
mistakes in collecting the data.
The key is to avoid making the errors in the first
place with a well-planned design of the survey or experiment.
Computing Sample Size
sampling so that marginal errors are minimized.
From the example presented in CI, we have seen that the margin of error is affected by the sample or replicate
size: the larger the sample, the smaller the margin of error.
or can be
calculated.
be obtained either by:
a clinical study, you may be able to use a table
published in Machin et al.'s Sample Size Tables for Clinical Studies, Third Edition.
you know (or don't know)
about your population. If you know some parameters about your population (like a known standard deviation),
you can use the techniques below. If you don’t know much about your population, use Slovin’s formula.
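Slovin's formula is usually written as n = N / (1 + N·e²), where N is the population size and e is the tolerated margin of error. The sketch below assumes that form and rounds up to a whole subject; the population of 10,000 and the 5% margin are illustrative values.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula, n = N / (1 + N * e^2), rounded up."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# Illustrative inputs: a population of 10,000 and a 5% margin of error
print(slovin_sample_size(10_000, 0.05))  # 385
```

Rounding up rather than to the nearest integer is a design choice here, so that the achieved margin of error is never larger than the one requested.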