STAT 201 Formula Sheet Winter 2023
Chapter 1

Sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Sample variance
$$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sample standard deviation
$$s_x = \sqrt{s_x^2}$$

Inter-quartile range
$$\mathrm{IQR} = Q_3 - Q_1$$

1.5IQR rule
$$Q_1 - 1.5\,\mathrm{IQR}, \qquad Q_3 + 1.5\,\mathrm{IQR}$$

Chapter 2

For any event $A$,
$$0 \le P(A) \le 1 \quad \text{and} \quad P(A^c) = 1 - P(A).$$

For an empty set $\emptyset$,
$$P(\emptyset) = 0.$$

For any two events $A$ and $B$,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

If $A$ and $B$ are mutually exclusive, then
$$P(A \cap B) = 0.$$

For any two events $A$ and $B$,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$

If $A$ and $B$ are independent, then
$$P(B \mid A) = P(B) \quad \text{and} \quad P(A \cap B) = P(A)P(B).$$

The law of total probability
$$P(B) = P(A_1)P(B \mid A_1) + \cdots + P(A_k)P(B \mid A_k)$$
or a special case
$$P(B) = P(A)P(B \mid A) + P(A^c)P(B \mid A^c)$$

Bayes rule
$$P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + \cdots + P(A_k)P(B \mid A_k)}$$
or a special case
$$P(A \mid B) = \frac{P(A)P(B \mid A)}{P(A)P(B \mid A) + P(A^c)P(B \mid A^c)}$$

De Morgan's Law
$$(A \cup B)^c = A^c \cap B^c \quad \text{and} \quad (A \cap B)^c = A^c \cup B^c$$

For a discrete r.v. $X$, its pmf is
$$p(x) = P(X = x).$$
$p(x)$ is non-negative and
$$\sum_{\text{all } x} p(x) = 1.$$
The mean and variance of $X$ are
$$\mu_X = \sum_{\text{all } x} x \cdot p(x)$$
$$\sigma_X^2 = \sum_{\text{all } x} (x - \mu_X)^2 \cdot p(x) = \Big[\sum_{\text{all } x} x^2 \cdot p(x)\Big] - \mu_X^2.$$

For a continuous r.v. $X$ with a pdf $f(x)$ and two real numbers $a$ and $b$ where $a < b$,
$$P(a < X < b) = \int_a^b f(x)\,dx.$$
$f(x)$ is non-negative and
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
The mean and variance of $X$ are
$$\mu_X = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$
$$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx = \Big[\int_{-\infty}^{\infty} x^2 \cdot f(x)\,dx\Big] - \mu_X^2.$$
The median $x_m$ of $X$ can be solved with
$$\int_{-\infty}^{x_m} f(x)\,dx = \frac{1}{2}.$$
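As a minimal sketch of the continuous r.v. formulas above, the R code below checks them numerically for an assumed pdf, $f(x) = 2e^{-2x}$ for $x \ge 0$. This pdf is an illustrative choice and is not taken from the course notes. The integrals use integrate(), and the median equation is solved with uniroot(); the search interval passed to uniroot() is also an assumption.

```r
# Assumed example pdf (hypothetical): exponential with rate 2, zero for x < 0
f <- function(x) ifelse(x >= 0, 2 * exp(-2 * x), 0)

# The pdf integrates to 1 (it is zero below 0, so integrate from 0)
total <- integrate(f, 0, Inf)$value

# P(a < X < b) with a = 0.5 and b = 1
prob <- integrate(f, 0.5, 1)$value

# Mean and variance from the integral definitions
mu    <- integrate(function(x) x * f(x), 0, Inf)$value
var_x <- integrate(function(x) x^2 * f(x), 0, Inf)$value - mu^2

# Median: solve integral of f from -Inf to xm equal to 1/2
cdf <- function(q) integrate(f, 0, q)$value
xm  <- uniroot(function(q) cdf(q) - 0.5, interval = c(0.01, 10), tol = 1e-9)$root

print(c(total = total, prob = prob, mean = mu, variance = var_x, median = xm))
```

For this particular pdf the exact mean, variance, and median are 1/2, 1/4, and log(2)/2, so the numerical results can be compared against those values.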
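The law of total probability and Bayes rule from Chapter 2 can be sketched in a few lines of R. All probabilities below are made-up numbers for illustration, and the names prior, lik, p_B, and posterior are hypothetical.

```r
# Hypothetical partition A1, A2, A3 with P(Ai) and P(B | Ai)
prior <- c(A1 = 0.5, A2 = 0.3, A3 = 0.2)
lik   <- c(A1 = 0.02, A2 = 0.03, A3 = 0.05)

# Law of total probability: P(B) = sum of P(Ai) * P(B | Ai)
p_B <- sum(prior * lik)

# Bayes rule: P(Ai | B) = P(Ai) * P(B | Ai) / P(B)
posterior <- prior * lik / p_B

# Special case with A and its complement (again, made-up numbers)
p_A <- 0.01; p_B_given_A <- 0.95; p_B_given_Ac <- 0.05
p_A_given_B <- p_A * p_B_given_A /
  (p_A * p_B_given_A + (1 - p_A) * p_B_given_Ac)

print(p_B)
print(posterior)
print(p_A_given_B)
```

The posterior probabilities sum to 1, which is a quick sanity check when doing these calculations by hand.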
Chapter 3

If $Y = X + c$, then
$$\sigma_Y = \sigma_X.$$

If $Y = cX$, then
$$\sigma_Y = |c|\,\sigma_X.$$

If $Y = c_1 X_1 + \cdots + c_n X_n$, where the $X_i$'s are independent measurements and each has uncertainty $\sigma_{X_i}$, then
$$\sigma_Y = \sqrt{c_1^2 \sigma_{X_1}^2 + \cdots + c_n^2 \sigma_{X_n}^2}.$$

If
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n},$$
where each and every $X_i$ has the same uncertainty $\sigma$, then
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$

If $Y = u(X)$, where $u(\cdot)$ is a nonlinear function, then
$$\sigma_Y \approx \left|\frac{du(X)}{dX}\right| \sigma_X.$$

Chapter 4

Bernoulli Distributions
$X \sim \mathrm{Bernoulli}(p)$, then
$$p(x) = p^x (1-p)^{1-x} \quad \text{for } x = 0, 1$$
$$\mu_X = p \quad \text{and} \quad \sigma_X^2 = p(1-p).$$

Binomial Distributions
$X \sim \mathrm{Bin}(n, p)$, then
$$p(x) = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x} \quad \text{for } x = 0, 1, \ldots, n$$
$$\mu_X = np \quad \text{and} \quad \sigma_X^2 = np(1-p).$$
P(X = x) = dbinom(x, n, p)
P(X ≤ x) = pbinom(x, n, p)

Poisson Distributions
$X \sim \mathrm{Poisson}(\lambda)$, then
$$p(x) = \frac{e^{-\lambda} \lambda^x}{x!} \quad \text{for } x = 0, 1, \ldots$$
$$\mu_X = \lambda \quad \text{and} \quad \sigma_X^2 = \lambda.$$
P(X = x) = dpois(x, λ)
P(X ≤ x) = ppois(x, λ)

Normal Distributions
$X \sim N(\mu, \sigma^2)$, then
$$\mu_X = \mu \quad \text{and} \quad \sigma_X^2 = \sigma^2.$$
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
P(X ≤ x) = pnorm(x, µ, σ)
If P(X ≤ x) = q, then
x = qnorm(q, µ, σ)
z-score is
$$z = \frac{x - \mu}{\sigma}$$

Sampling Distribution of $\bar{X}$
Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ be the sample mean. Then
$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}.$$
CLT: $\bar{X}$ follows a normal distribution approximately when $n$ is sufficiently large.
z-score is
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Sampling Distribution of $S_n$
Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Let $S_n$ be the sample total. Then
$$\mu_{S_n} = n\mu \quad \text{and} \quad \sigma_{S_n}^2 = n\sigma^2.$$
CLT: $S_n$ follows a normal distribution approximately when $n$ is sufficiently large.
z-score is
$$z = \frac{s_n - n\mu}{\sqrt{n}\,\sigma}$$
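The two sampling-distribution z-scores above can be illustrated with a short R sketch. The population values (mean 50, standard deviation 10), the sample size (36), and the cut-offs are made-up numbers chosen so that the mean and total versions describe the same event.

```r
# Hypothetical population and sample size
mu <- 50; sigma <- 10; n <- 36

# Sample mean: approximate P(Xbar <= 52) via the CLT
xbar   <- 52
z_mean <- (xbar - mu) / (sigma / sqrt(n))
p_mean <- pnorm(z_mean)

# Same probability without standardizing by hand
p_mean_direct <- pnorm(xbar, mu, sigma / sqrt(n))

# Sample total: Xbar <= 52 is the same event as Sn <= n * 52
s_n     <- n * xbar
z_total <- (s_n - n * mu) / (sqrt(n) * sigma)
p_total <- pnorm(z_total)

print(c(z_mean = z_mean, z_total = z_total))
print(c(p_mean = p_mean, p_mean_direct = p_mean_direct, p_total = p_total))
```

Both z-scores agree because the sample total is just n times the sample mean, so either form can be used for the same question.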
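The Chapter 3 uncertainty rules reduce to simple arithmetic, sketched here in R. The coefficients, uncertainties, and the function $u(X) = X^2$ are hypothetical, and evaluating the derivative at the measured value is an assumption about how the nonlinear rule is applied.

```r
# Linear combination Y = c1*X1 + c2*X2 with independent measurements
coefs  <- c(2, -3)       # hypothetical coefficients c1, c2
sigmas <- c(0.3, 0.4)    # hypothetical uncertainties of X1, X2
sigma_Y_linear <- sqrt(sum(coefs^2 * sigmas^2))

# Sample mean of n measurements that share the same uncertainty sigma
sigma <- 0.5
n     <- 25
sigma_mean <- sigma / sqrt(n)

# Nonlinear function u(X) = X^2, so du/dX = 2X
u_prime    <- function(x) 2 * x
x_measured <- 5          # hypothetical measured value
sigma_X    <- 0.1
sigma_Y_nonlinear <- abs(u_prime(x_measured)) * sigma_X

print(c(linear = sigma_Y_linear, mean = sigma_mean, nonlinear = sigma_Y_nonlinear))
```

Note that the sign of a coefficient does not matter in the linear rule, since each coefficient is squared.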
Chapter 5

Under proper conditions (that is, you will need to consider sample sizes, population distributions, etc.):

$(1-\alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is known:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}.$$

$(1-\alpha)100\%$ confidence interval for $\mu$ when $\sigma$ is unknown:
$$\bar{x} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}.$$

The multipliers can be found via R:
$z_{\alpha/2}$ = qnorm(α/2, lower.tail = F),
$t_{n-1,\alpha/2}$ = qt(α/2, n − 1, lower.tail = F).

The sample size required for a $(1-\alpha)100\%$ confidence interval for $\mu$ to have a precision of $\pm m$ is
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{m}\right)^2.$$

If two samples are independent, $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ when population variances are not equal:
$$(\bar{x} - \bar{y}) \pm t_{df,\alpha/2} \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}},$$
where
$$df = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}.$$

If two samples are independent, $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$ when population variances are equal:
$$(\bar{x} - \bar{y}) \pm t_{df,\alpha/2}\, s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}},$$
where
$$df = n_X + n_Y - 2 \quad \text{and} \quad s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$

If sampled data are matched pairs, $(1-\alpha)100\%$ confidence interval for $\mu_X - \mu_Y$:
$$\bar{D} \pm t_{n_d - 1,\alpha/2} \frac{s_d}{\sqrt{n_d}}.$$

If a set of data has been given, you may use t.test() in R to find a CI. Please refer to R documentation for more info.

Chapter 6

Conclusion 1:
Reject $H_0$
Statistically significant
p-value ≤ α
$(1-\alpha)100\%$ CI doesn't contain $\mu_0$
|test statistic| ≥ critical point
Observed results likely not due to chance
Data not consistent with $H_0$

Conclusion 2:
Fail to reject $H_0$
Not statistically significant
p-value > α
$(1-\alpha)100\%$ CI contains $\mu_0$
|test statistic| < critical point
Observed results likely due to chance
Data consistent with $H_0$

Under proper conditions (that is, you will need to consider sample sizes, population distributions, etc.):

The test statistic for testing $H_0: \mu = \mu_0$ when $\sigma$ is known:
$$z_{ts} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}.$$

The test statistic for testing $H_0: \mu = \mu_0$ when $\sigma$ is unknown:
$$t_{ts} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}.$$

If $H_1: \mu > \mu_0$, then
p-value = $P(T \ge t_{ts})$ = pt(t_ts, df, lower.tail = F).

If $H_1: \mu < \mu_0$, then
p-value = $P(T \le t_{ts})$ = pt(t_ts, df).
If $H_1: \mu \ne \mu_0$, then
p-value = $P(|T| \ge |t_{ts}|)$ = 2 * pt(|t_ts|, df, lower.tail = F).

If $H_1$ is one-sided, then the critical point is
qt(α, df, lower.tail = F).

If $H_1$ is two-sided, then the critical point is
qt(α/2, df, lower.tail = F).

Type I error is falsely rejecting a true null hypothesis. Type II error is falsely failing to reject a false null hypothesis when the alternative hypothesis is true.

To test $H_0: \mu_X - \mu_Y = 0$:

If two samples are independent with unequal population variances, then the test statistic is
$$t_{ts} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}},$$
with
$$df = \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{(s_X^2/n_X)^2}{n_X - 1} + \frac{(s_Y^2/n_Y)^2}{n_Y - 1}}.$$

If two samples are independent with equal population variances, then the test statistic is
$$t_{ts} = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}},$$
where
$$df = n_X + n_Y - 2 \quad \text{and} \quad s_p = \sqrt{\frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}}.$$

If sampled data are matched pairs, then the test statistic is
$$t_{ts} = \frac{\bar{D}}{s_d / \sqrt{n_d}}.$$

If a set of data has been given, you may use t.test() in R to perform a test. Please refer to R documentation for more info.

To test $H_0: \frac{\sigma_X^2}{\sigma_Y^2} = 1$, the test statistic is
$$F_{ts} = \frac{s_X^2}{s_Y^2}.$$

If $H_1: \frac{\sigma_X^2}{\sigma_Y^2} > 1$, then
p-value = pf(F_ts, nX − 1, nY − 1, lower.tail = F).

If $H_1: \frac{\sigma_X^2}{\sigma_Y^2} < 1$, then
p-value = pf(F_ts, nX − 1, nY − 1).

If $H_1: \frac{\sigma_X^2}{\sigma_Y^2} \ne 1$, then
p-value = 2 * pf(F_ts, nX − 1, nY − 1, lower.tail = F), if F_ts > 1;
p-value = 2 * pf(F_ts, nX − 1, nY − 1), if F_ts < 1.

If you use the F distribution table to find the p-value, then you need to refer to the lecture notes method to form the test statistic.

If a set of data has been given, you may use var.test() in R to perform a test. Please refer to R documentation for more info.

Chapter 9

You will always need to check proper conditions so that the following formulae can be used.

One way ANOVA model is
$$X_{ij} = \mu_i + \epsilon_{ij}$$
where $i = 1, \ldots, I$ for levels and $j = 1, \ldots, J$ for replicates.

To test $H_0: \mu_1 = \ldots = \mu_I$, use the one way ANOVA table as follows:

[Table: one-way ANOVA table]

where
$$SSTr = \sum_{i=1}^{I} J_i (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2$$
$$SSE = \sum_{i=1}^{I} (J_i - 1) s_i^2$$
$$SSTotal = SSTr + SSE$$

The critical value for one way ANOVA F test is
qf(α, I − 1, N − I, lower.tail = F).

If the factor/treatment effect is significant, compute the CI for each $\mu_i$ as
$$\bar{x}_{i\cdot} \pm t_{N-I,\alpha/2} \sqrt{\frac{MSE}{J_i}}.$$

Two way ANOVA model is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$
where $i = 1, \ldots, I$ for levels of factor A, $j = 1, \ldots, J$ for levels of factor B and $k = 1, \ldots, K$ for replicates.

To test interaction effects and main effects, use the two way ANOVA table as follows:

[Table: two-way ANOVA table]

The estimated effects are
$$\hat{\alpha}_i = \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\hat{\beta}_j = \bar{x}_{\cdot j \cdot} - \bar{x}_{\cdot\cdot\cdot}$$
$$\hat{\gamma}_{ij} = \bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j \cdot} + \bar{x}_{\cdot\cdot\cdot}$$

The critical value for a two way ANOVA F test is
qf(α, df_numerator, df_denominator, lower.tail = F).

If the interaction effect is not significant but a factor main effect is significant, compute the CI for the factor means of the significant factor(s) as
$$\bar{x}_{i\cdot\cdot} \pm t_{IJ(K-1),\alpha/2} \sqrt{\frac{MSE}{JK}}$$
and/or
$$\bar{x}_{\cdot j \cdot} \pm t_{IJ(K-1),\alpha/2} \sqrt{\frac{MSE}{IK}}.$$

Block design and $2^3$ factorial experiments use similar techniques as shown in one-way or two-way factor analysis. Please study these two sections for more information.

If data are given, you may use lm(), anova() and confint(), etc. in R to perform ANOVA tests. Please refer to R documentation for more info.

Chapter 7

You will always need to check proper conditions so that the following formulae can be used.

Correlation coefficient is
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_X}\right)\left(\frac{y_i - \bar{y}}{s_Y}\right) = \text{cor(x, y)}.$$

The fitted regression line is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x,$$
where
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r \frac{s_y}{s_x}$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

The coefficient of determination is
$$r^2 = \frac{SSR}{SSTotal} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2 - \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

The residual is
$$e_i = y_i - \hat{y}_i.$$

The standard deviation of errors is
$$s = \sqrt{\frac{(1 - r^2) \sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 2}}$$

The standard deviation of $\hat{\beta}_0$ is
$$s_{\hat{\beta}_0} = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
The standard deviation of $\hat{\beta}_1$ is
$$s_{\hat{\beta}_1} = \frac{s}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

Inference about $\beta_1$ can be obtained through
$$\hat{\beta}_1 \pm t_{n-2,\alpha/2}\, s_{\hat{\beta}_1}$$
$$t_{ts} = \frac{\hat{\beta}_1 - \beta_1^{claim}}{s_{\hat{\beta}_1}}$$

Inference about $\beta_0$ can be obtained through
$$\hat{\beta}_0 \pm t_{n-2,\alpha/2}\, s_{\hat{\beta}_0}$$
$$t_{ts} = \frac{\hat{\beta}_0 - \beta_0^{claim}}{s_{\hat{\beta}_0}}$$

Inference about the mean response $y$ for a given $x$ is
$$\hat{y} \pm t_{n-2,\alpha/2}\, s_{\hat{y}}$$
$$t_{ts} = \frac{\hat{y} - \mu_0}{s_{\hat{y}}}$$
where
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
$$s_{\hat{y}} = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
and $\mu_0$ is the claimed mean response value for the given $x$ value.

Prediction interval for a future $y$ at a given $x$ is
$$\hat{y} \pm t_{n-2,\alpha/2}\, s_{pred}$$
where
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
$$s_{pred} = s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}.$$

R functions

x and y refer to generic variables.
You will need to provide proper values at __.

Descriptive Statistics
mean(x) sd(x)
var(x) median(x)
min(x) max(x)
quantile(x) summary(x)
hist(x) boxplot(x)

Test of one or two Means
t.test(x, alternative=__, mu=__, conf.level=__)
t.test(x, y, alternative=__, mu=__, paired=__, conf.level=__)
t.test(Resp~Factor, data, alternative=__, mu=__, paired=__, conf.level=__)
pt(q, df, lower.tail=__)
qt(p, df, lower.tail=__)

Test of Equality of Variance
var.test(x, y, alternative=__, conf.level=__)
pf(q, df1, df2, lower.tail=__)
qf(p, df1, df2, lower.tail=__)

ANOVA Models
You may need to use the factor() function in some cases.

tapply(Resp, list(FactorA), mean)
fit<-lm(Resp~FactorA, data)
anova(fit)
fit1<-lm(Resp~FactorA-1, data)
confint(fit1)

tapply(Resp, list(FactorA), mean)
tapply(Resp, list(FactorB), mean)
tapply(Resp, list(FactorA,FactorB), mean)
fit<-lm(Resp~FactorA*FactorB, data)
anova(fit)
fit<-lm(Resp~FactorA+FactorB, data)
anova(fit)
pf(q, df1, df2, lower.tail=FALSE)
qf(p, df1, df2, lower.tail=FALSE)

Simple Linear Regression Models
cor(x,y)
model<-lm(Resp~Expl, data)
summary(model)
anova(model)
Expl<-data.frame(Expl=new_value)
predict(model, newdata=Expl, interval=__)
confint(model, level=__)

If functions that you would like to use aren't provided in this list, please feel free to search for them in the lecture notes or R documentation.
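As a rough illustration of how the simple linear regression formulas relate to the R functions listed above, the sketch below computes the slope, intercept, error standard deviation, a 95% CI for the slope, and a 95% prediction interval by hand, then fits the same model with lm(). The data values, the column names Expl and Resp, and the new value x0 are made up for this example.

```r
# Hypothetical data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
n <- length(x)
sxx <- sum((x - mean(x))^2)

# Formulas from the sheet
r    <- cor(x, y)
b1   <- r * sd(y) / sd(x)                 # slope
b0   <- mean(y) - b1 * mean(x)            # intercept
s    <- sqrt((1 - r^2) * sum((y - mean(y))^2) / (n - 2))
s_b1 <- s / sqrt(sxx)
s_b0 <- s * sqrt(1 / n + mean(x)^2 / sxx)

# 95% CI for the slope (alpha = 0.05)
t_mult <- qt(0.05 / 2, n - 2, lower.tail = FALSE)
ci_b1  <- b1 + c(-1, 1) * t_mult * s_b1

# 95% prediction interval for a future y at x0
x0      <- 4.5
y_hat   <- b0 + b1 * x0
s_pred  <- s * sqrt(1 + 1 / n + (x0 - mean(x))^2 / sxx)
pi_hand <- y_hat + c(-1, 1) * t_mult * s_pred

# Same quantities from the built-in functions
d     <- data.frame(Expl = x, Resp = y)
model <- lm(Resp ~ Expl, data = d)
pi_lm <- predict(model, newdata = data.frame(Expl = x0),
                 interval = "prediction", level = 0.95)

print(coef(model))
print(confint(model, level = 0.95))
print(pi_lm)
```

The hand-computed values should agree with coef(), summary(), confint(), and predict() up to rounding, which makes this a convenient way to check a manual calculation.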