B105 Applied Statistical Modeling
William Baker Morrison
July 2024
Week 4 - Experimental design
Experimental design
● Identify the key components of statistical experimental design, including the formulation of hypotheses.
● Analyze and compare different experimental design strategies.
● Evaluate the impact of sample size, randomization, and replication on the validity and reliability of experimental results.
Discuss
How can we know something to be true?
Falsification
In science, we typically follow a process of “falsification” (i.e. trying to disprove something): if we can disprove a claim, its opposite gains support, although nothing is ever proven true with absolute certainty.
Introduced by Karl Popper in 1934, falsifiability requires that a theory (or hypothesis) be refutable by an empirical test. That is to say, a useful theory must be testable.
In this way, a single piece of contrary evidence is enough to disprove a whole theory; we don’t have to collect all the data.
For example, I have a new theory:
“The centre of the earth is made of marshmallow”
Can I disprove this theory?
“In my past life, I was a knight in the crusades”
Hypothesis testing
Hypothesis formulation
Null hypothesis (H₀) - The statement we assume to be true unless shown otherwise (i.e. the boring case: nothing is happening)
Alternative hypothesis (H₁) - The statement we accept as true if we reject the null hypothesis (i.e. something interesting is happening!)
Example Scenario: Does a new drug reduce symptoms of lung cancer?
H₀ - The new drug does not reduce symptoms of lung cancer.
H₁ - The new drug reduces symptoms of lung cancer.
Hypothesis formulation
What would be the null and alternative hypotheses for the following scenarios…
Scenario 1 - Does a new teaching method improve students' math proficiency?
Scenario 2 - Is there a relationship between hours of sleep and job performance?
Scenario 3 - Does a marketing campaign increase customer engagement on a company's website?
Scenario 4 - Is the universe a simulation running on a PhD student’s laptop?
Hypothesis formulation
How do we accept or reject our hypotheses?
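We assign a probability to each outcome. For example…
Example Scenario: Can people tell the difference between the taste of Pepsi and Coca-Cola when blind tested?
Let p be the probability that a taster correctly identifies the drink (p = 0.5 is pure guessing).
H₀ (p<=0.5) People cannot tell the difference
H₁ (p>0.5) People can tell the difference
In this case, the observed proportion of correct identifications is our “test statistic”: we judge whether it is high enough to be unlikely under H₀.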
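As a hedged sketch of how this example could be tested in R, suppose (hypothetically) that 50 people take the blind test and 32 of them correctly identify the drink. These counts are invented for illustration. Under H₀ every taster is guessing, so the number of correct answers follows a Binomial(50, 0.5) distribution; base R’s `binom.test()` with `alternative = "greater"` matches the one-sided H₁ (p > 0.5).

```r
# Hypothetical blind-taste-test data (invented for illustration)
n_tasters <- 50   # number of people tested
n_correct <- 32   # number who correctly identified Pepsi vs Coca-Cola

# One-sided exact binomial test:
# H0: p <= 0.5 (guessing), H1: p > 0.5 (can tell the difference)
result <- binom.test(x = n_correct, n = n_tasters, p = 0.5,
                     alternative = "greater")

result$estimate  # observed proportion correct (our test statistic)
result$p.value   # chance of >= 32 correct out of 50 if everyone were guessing
```

With these made-up numbers the p-value is roughly 0.03, so whether we call the result significant depends on the significance level chosen in advance.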
Hypothesis formulation
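When we run our statistical tests, we do so in an attempt to reject a null hypothesis (i.e. to learn something new).
A common criterion for judging a statistical test is the “p-value”: the probability of observing data at least as extreme as ours if H₀ were true. We compare it with a significance level (α) chosen before running the test.
A p-value <= 0.01 (a strict α; 0.05 is also widely used) shows that our statistical test is significant.
In such a case we can REJECT our NULL hypothesis and ACCEPT our ALTERNATIVE hypothesis.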
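The decision rule above can be written as a small helper. This is a minimal sketch assuming α = 0.01 as on this slide; the helper name `decide()` and the example question (does fuel efficiency, `mpg`, differ between automatic and manual cars in the built-in `mtcars` data?) are illustrative choices, not part of the course material.

```r
# Hypothetical helper: compare a p-value with a pre-chosen significance level
decide <- function(p_value, alpha = 0.01) {
  if (p_value <= alpha) {
    "Reject H0 (statistically significant)"
  } else {
    "Fail to reject H0 (not significant)"
  }
}

# Illustrative question using the built-in mtcars data:
# H0: mean mpg is the same for automatic (am = 0) and manual (am = 1) cars
# H1: mean mpg differs between the two groups
test <- t.test(mpg ~ am, data = mtcars)  # Welch two-sample t-test

test$p.value          # p-value from the test
decide(test$p.value)  # apply the decision rule with alpha = 0.01
```

Note the wording “fail to reject”: a non-significant result means the data are not strong enough to reject H₀, not that H₀ has been proven true, which ties back to falsification.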
R basic syntax
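Now complete the swirl tutorial
- lessons 10 & 11 of the Statistical Inference course
install.packages("swirl")  # only needed the first time
library("swirl")
install_course("Statistical Inference")  # download the course
swirl()  # follow the prompts to choose the course and lesson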
Experimental design strategies
Experimental design
Statistical models
Search for “statistical test cheatsheet”
Sampling techniques
Sampling
As we explored earlier, it is typically not feasible to analyse ALL the data in a target population.
For example, if we wanted to understand ice cream flavour preferences in Germany, we wouldn’t ask every single person.
Instead, we would take a SAMPLE from our TARGET POPULATION.
Sampling techniques
How should we select our samples from our target population?
1. Simple random sampling
Randomly selecting samples from our target population, so that every individual has an equal chance of selection
sample()
2. Systematic sampling
Selecting samples at a fixed interval from an ordered list (e.g. every 10th individual)
3. Stratified sampling
Selecting random samples from within specific subpopulations (strata)
strata() (sampling package) / stratified() (splitstackshape package)
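Base R’s `sample()` covers simple random sampling directly, but there is no built-in function for systematic sampling. A minimal sketch of both, assuming a hypothetical population of 100 numbered individuals; the `population` data frame and the `systematic_sample()` helper are invented for illustration. Choosing the starting point at random stops systematic sampling from always picking the same individuals.

```r
set.seed(123)  # make the random draws reproducible

# Hypothetical target population: 100 numbered individuals
population <- data.frame(id = 1:100, score = rnorm(100))

# 1. Simple random sampling: every individual equally likely, no repeats
srs <- population[sample(nrow(population), 10), ]

# 2. Systematic sampling: random start, then every k-th individual
systematic_sample <- function(data, k) {
  start <- sample.int(k, 1)  # random start between 1 and k
  data[seq(from = start, to = nrow(data), by = k), , drop = FALSE]
}
sys <- systematic_sample(population, k = 10)

srs$id  # 10 ids scattered at random
sys$id  # 10 ids exactly 10 apart
```

Systematic sampling can be biased if the list has a repeating pattern that lines up with the interval k, which is one reason to prefer simple random or stratified sampling when in doubt.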
R basic syntax
Now complete the swirl tutorial
- lessons 12 & 13 of the Basic syntax course
Data selection and sampling
TASK: Simple Random Sampling
a. From the built-in dataset mtcars, randomly select 10 rows using simple random sampling. Set seed to 123 for reproducibility.
b. Compare the mean mpg (miles per gallon) of your sample with the mean mpg of the entire dataset.
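One possible approach to the simple random sampling task, using only base R: `sample()` draws 10 distinct row numbers without replacement, and the sample mean is compared with the mean of all 32 cars. The exact rows drawn after `set.seed(123)` can differ between R versions, because R 3.6.0 changed the default sampling algorithm, so your sample mean may not match a classmate’s on an older installation.

```r
set.seed(123)  # fix the random number generator for reproducibility

# a. Simple random sample of 10 rows (without replacement)
srs_rows <- sample(nrow(mtcars), size = 10)
srs <- mtcars[srs_rows, ]

# b. Compare the sample mean with the full-dataset mean
sample_mean <- mean(srs$mpg)
population_mean <- mean(mtcars$mpg)  # all 32 cars
c(sample = sample_mean, full = population_mean,
  difference = sample_mean - population_mean)
```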
TASK: Stratified Sampling
a. Using the built-in dataset mtcars, let's say we want to ensure that our sample has the same proportion of 4, 6, and 8 cylinder cars as the full dataset. Perform stratified sampling to select 5 rows that fulfill this criterion as closely as possible.
b. Compare the proportions of each cylinder type (4, 6, and 8) in the stratified sample and the original dataset.
e.g. report the split after stratification in the form 4 (33%), 6 (33%), 8 (33%)
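One possible base-R approach to the stratified task, offered as a sketch rather than the only valid answer. The sample size for each cylinder group is set proportional to its share of `mtcars` (11, 7 and 14 of the 32 cars have 4, 6 and 8 cylinders), which for 5 rows rounds to 2, 1 and 2. For other sample sizes rounding may not add up to the target, hence the check. The packaged functions `strata()` (sampling package) or `stratified()` (splitstackshape package) could replace the manual loop.

```r
set.seed(123)  # reproducible draws

# a. Proportional allocation of the 5 rows across cylinder groups
full_props <- prop.table(table(mtcars$cyl))   # shares in the full dataset
sizes <- setNames(as.vector(round(5 * full_props)), names(full_props))
stopifnot(sum(sizes) == 5)                    # rounding must hit the target

# Simple random sample within each stratum (cylinder group)
strat_rows <- unlist(lapply(names(sizes), function(lvl) {
  rows <- which(mtcars$cyl == as.numeric(lvl))  # rows in this stratum
  rows[sample.int(length(rows), sizes[[lvl]])]  # avoids sample()'s length-1 quirk
}))
strat_sample <- mtcars[strat_rows, ]

# b. Compare cylinder proportions (%) in the sample and the full dataset
sample_props <- prop.table(table(factor(strat_sample$cyl,
                                        levels = names(full_props))))
comparison <- rbind(sample = as.vector(sample_props),
                    full   = as.vector(full_props))
colnames(comparison) <- paste0(names(full_props), " cyl")
round(100 * comparison, 1)
```

With only 5 rows the shares can only approximate the full data: the sample is 40%, 20% and 40%, against roughly 34%, 22% and 44% in mtcars.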