
Experimental Design in Statistical Modeling

The document outlines key components of experimental design in statistical modeling, including hypothesis formulation, testing, and the impact of sample size and randomization. It discusses the concept of falsification in science and provides examples of null and alternative hypotheses. Additionally, it covers various sampling techniques and tasks related to simple and stratified sampling using R programming.


Am I recording?

1
B105 Applied Statistical Modeling

William Baker Morrison


July 2024

2
Week 4 - Experimental design

3
Experimental design

● Identify the key components of statistical experiment design, including the formulation of hypotheses.

● Analyze and compare different experimental design strategies.

● Evaluate the impact of sample size, randomization, and replication on the validity and reliability of experimental results.

4
Discuss

How can we know something to be true?

5
Falsification
In science, we typically follow a process of “falsification”: rather than trying to prove a claim directly, we try to disprove it, and disproving a claim lends support to its alternative.

Introduced by Karl Popper in 1934, falsifiability requires that a theory (or hypothesis) be refutable by an empirical test. That is to say, a useful theory must be testable.

In this way, we need only a single piece of contradicting evidence to disprove a whole theory; we don’t have to collect all the data.

For example, I have a new theory:


“The centre of the earth is made of marshmallow”

Can I disprove this theory?


“In my past life, I was a knight in the crusades”

6
Hypothesis testing

7
Hypothesis formulation

Null hypothesis (H₀) - The statement we assume to be true unless the evidence shows otherwise (i.e. boring, nothing happening)

Alternative hypothesis (H₁) - The statement we accept as true if we reject the null hypothesis (i.e. something interesting happening!)

Example Scenario: Does a new drug reduce symptoms of lung cancer?

H₀ - The new drug does not reduce symptoms of lung cancer.


H₁ - The new drug reduces symptoms of lung cancer.

8
Hypothesis formulation
What would be the null and alternative hypotheses for the following scenarios…

Scenario 1 - Does a new teaching method improve students' math proficiency?

Scenario 2 - Is there a relationship between hours of sleep and job performance?

Scenario 3 - Does a marketing campaign increase customer engagement on a company's website?

Scenario 4 - Is the universe a simulation running on a PhD student’s laptop?

9
Hypothesis formulation
How do we accept or reject our hypotheses?

We assign probability to each outcome. For example…

Example Scenario: Can people tell the difference between the taste of Pepsi and Coca-Cola in a blind test?

H₀ (p <= 0.5) People cannot tell the difference

H₁ (p > 0.5) People can tell the difference

Here p is the probability of a correct identification, and the observed proportion of correct answers is our “test statistic”
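The Pepsi vs Coca-Cola test above can be sketched in R with an exact binomial test. The counts below (50 tasters, 32 correct) are made up for illustration and are not from the lecture.

```r
# Hypothetical data: 50 blind tastings, 32 correct identifications
n_trials  <- 50
n_correct <- 32

# H0: p <= 0.5 (guessing), H1: p > 0.5 (people can tell the difference)
result <- binom.test(n_correct, n_trials, p = 0.5, alternative = "greater")

result$estimate  # observed proportion of correct answers
result$p.value   # probability of a result at least this extreme under H0
```

The one-sided `alternative = "greater"` matches H₁ (p > 0.5); a two-sided test would answer a different question.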

10
Hypothesis formulation
When we run our statistical tests, we do so in an attempt to reject a null hypothesis (i.e. to learn something new).

A common criterion is the “p-value”, which we compare against a chosen significance level.

A p-value <= 0.01 indicates that our result is statistically significant at the 1% level.

In such a case we can REJECT our NULL hypothesis and ACCEPT our ALTERNATIVE hypothesis.
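To see what a p-value measures, here is a minimal simulation sketch. It reuses a made-up blind taste test (50 tastings, 32 correct) and asks how often pure guessing would do at least as well. The counts, the seed and the number of simulations are all assumptions for illustration.

```r
set.seed(1)  # arbitrary seed so the simulation is repeatable

n_trials  <- 50    # hypothetical number of tastings
n_correct <- 32    # hypothetical number of correct answers
alpha     <- 0.01  # significance threshold used on the slide

# Simulate 10,000 experiments in which H0 is true (everyone is guessing)
null_counts <- rbinom(10000, size = n_trials, prob = 0.5)

# p-value: share of simulated experiments at least as extreme as ours
p_value_sim <- mean(null_counts >= n_correct)

# Decision rule: reject H0 only if the p-value is at or below alpha
reject_null <- p_value_sim <= alpha
p_value_sim
reject_null
```

With these made-up counts the simulated p-value lands above 0.01, so we would not reject the null hypothesis at that threshold.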

11
R basic syntax
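Now complete the swirl tutorial

- 10 & 11 of the Statistical Inference course

library("swirl")  # load the swirl package
install_course("Statistical Inference")  # download the course (only needed once)
swirl()  # start the interactive session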

12
Experimental design strategies

13
Experimental design

14
Statistical models

Search for “statistical test cheatsheet”

15
Sampling techniques

16
Sampling

As we explored earlier, it is typically not feasible to analyse ALL the data in a target population.

Say, for example, we wanted to understand ice cream flavour preferences in Germany. We wouldn’t ask every single person.

We would take a SAMPLE from our TARGET POPULATION

17
Sampling techniques

How should we select our samples from our target population?

1. Simple random sampling
Randomly selecting samples from our target population
sample()

2. Systematic sampling
Selecting samples at a fixed interval (e.g. every 10th individual)

3. Stratified sampling
Selecting random samples from specific subpopulations
strata() / stratified()
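
The three techniques above can be sketched in base R. The `population` data frame below (100 people in two regions) is hypothetical. The stratified step uses `split()` and `sample.int()` rather than `strata()` / `stratified()`, so that no extra package is assumed.

```r
set.seed(42)  # arbitrary seed for repeatable draws

# Hypothetical target population: 100 people, 60 north and 40 south
population <- data.frame(
  id     = 1:100,
  region = rep(c("north", "south"), times = c(60, 40))
)

# 1. Simple random sampling: 10 individuals, each equally likely
simple_ids <- sample(population$id, size = 10)

# 2. Systematic sampling: random start, then every 10th individual
start          <- sample(1:10, size = 1)
systematic_ids <- seq(from = start, to = nrow(population), by = 10)

# 3. Stratified sampling: draw 10% at random within each region
by_region      <- split(population$id, population$region)
stratified_ids <- unlist(lapply(by_region, function(ids) {
  ids[sample.int(length(ids), size = length(ids) %/% 10)]
}))

# Regions appear in the stratified sample in their population proportions
table(population$region[stratified_ids])
```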
18
R basic syntax
Now complete the swirl tutorial

- 12 & 13 of Basic syntax course

19
Data selection and sampling

TASK: Simple Random Sampling

a. From the built-in dataset mtcars, randomly select 10 rows using simple random sampling. Set the seed to 123 for reproducibility.

b. Compare the mean mpg (miles per gallon) of your sample with the mean mpg of the entire dataset.
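
One possible approach to the simple random sampling task, as a sketch rather than a model answer. The variable names are illustrative.

```r
set.seed(123)  # seed requested in the task

# a. Draw 10 distinct row indices and pull those rows from mtcars
sample_rows <- sample(nrow(mtcars), size = 10)
srs         <- mtcars[sample_rows, ]

# b. Compare the sample mean mpg with the mean of the full dataset
sample_mean <- mean(srs$mpg)
full_mean   <- mean(mtcars$mpg)

c(sample = sample_mean, full = full_mean, difference = sample_mean - full_mean)
```

Changing the seed gives a different sample, so the size of the difference will vary from run to run.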

TASK: Stratified Sampling

a. Using the built-in dataset mtcars, let's say we want to ensure that our sample has the same proportion of
4, 6, and 8 cylinder cars as the full dataset. Perform stratified sampling to select 5 rows that fulfill this
criterion.

b. Compare the proportions of each cylinder type (4, 6, and 8) in the stratified sample and the original
dataset.

e.g. report the split after stratification in the form 4 (33%), 6 (33%), 8 (33%). This shows the format only; your actual proportions will differ.
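
One possible approach to the stratified sampling task, sketched in base R. It assumes proportional allocation with rounding, which for 5 rows gives 2, 1 and 2 cars with 4, 6 and 8 cylinders. The seed and variable names are illustrative.

```r
set.seed(123)  # arbitrary seed for repeatable draws

total_n    <- 5
cyl_counts <- table(mtcars$cyl)  # cars per cylinder group in the full data

# Rows to draw per group, proportional to group size (rounded)
alloc <- setNames(
  as.integer(round(total_n * cyl_counts / sum(cyl_counts))),
  names(cyl_counts)
)

# Sample the allocated number of row indices within each cylinder group
rows_by_cyl <- split(seq_len(nrow(mtcars)), mtcars$cyl)
picked <- unlist(lapply(names(rows_by_cyl), function(k) {
  rows <- rows_by_cyl[[k]]
  rows[sample.int(length(rows), size = alloc[[k]])]
}))
strat_sample <- mtcars[picked, ]

# b. Compare cylinder proportions: full dataset vs stratified sample
round(prop.table(table(mtcars$cyl)), 2)
round(prop.table(table(strat_sample$cyl)), 2)
```

With only 5 rows the sample proportions can only approximate those of the full dataset, because each row accounts for 20% of the sample.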

20
