Statistics Complete Guide

This study guide covers essential statistical concepts including confidence intervals, hypothesis testing, ANOVA, and chi-square tests, providing solved examples and formula shortcuts. It is structured into chapters that explain each topic in detail, starting from basic definitions to complex applications. The guide aims to assist beginners in understanding and applying statistical methods effectively.

Uploaded by

aashay
Copyright
© All Rights Reserved

STATISTICS

Complete Beginner's Study Guide

Confidence Intervals • Hypothesis Testing • ANOVA • Chi-Square

With Solved Examples · Easy to Hard · Exam Hints & Formula Shortcuts

TABLE OF CONTENTS

■ Chapter 1: Confidence Interval Estimation

■ Chapter 2: Hypothesis Testing — One-Sample Tests

■ Chapter 3: Two-Sample Tests

■ Chapter 4: One-Way ANOVA

■ Chapter 5: Chi-Square Tests

■ Chapter 6: Exam Keyword → Formula Cheat Sheet

Statistics Study Guide | Page 1


CHAPTER 1: CONFIDENCE INTERVAL ESTIMATION

What Is a Confidence Interval?


A confidence interval (CI) is a range of values that we are fairly sure contains the true population
parameter (like the mean or proportion). Instead of saying "the average is exactly 50", we say "we are 95%
confident the average is between 47 and 53."

■ Key Concept

95% CI means: If we repeated the study 100 times, about 95 of those intervals would contain the true
value.

Wider interval: More confidence but less precision.

Narrower interval: Less confidence but more precision.
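The repeated-sampling meaning of "95% confident" can be checked with a small simulation. This is a minimal Python sketch, not part of the original guide: the population (mean 50, SD 10), the sample size of 25, the number of trials and the helper name `simulate_coverage` are all illustrative assumptions. It draws many samples, builds a 95% Z-interval from each one, and counts how often the interval captures the true mean.

```python
import random
import statistics

def simulate_coverage(true_mean=50.0, sigma=10.0, n=25, trials=2000,
                      z_star=1.96, seed=1):
    """Fraction of Z-intervals (sigma known) that contain the true mean.

    All default values are illustrative assumptions, not data from the guide.
    """
    rng = random.Random(seed)          # fixed seed so runs are reproducible
    se = sigma / n ** 0.5              # standard error of the sample mean
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        lower, upper = xbar - z_star * se, xbar + z_star * se
        if lower <= true_mean <= upper:
            hits += 1
    return hits / trials

coverage = simulate_coverage()
print(f"Share of 95% intervals containing the true mean: {coverage:.3f}")
```

With enough trials the printed share should land close to 0.95; any single run wobbles a little around that value.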

Core Formulas
CI for Mean (population SD known — use Z)

CI = x̄ ± Z* × (σ / √n)

x̄ = sample mean

Z* = critical value (1.645 for 90%, 1.96 for 95%, 2.576 for 99%)

σ = population standard deviation

n = sample size

CI for Mean (population SD unknown — use t)

CI = x̄ ± t* × (s / √n)

s = sample standard deviation

t* = t-critical value with df = n − 1 (from t-table)

When to use t: σ is unknown (this matters most when n < 30; for large n, t* and Z* are nearly equal)

CI for Proportion

CI = p̂ ± Z* × √(p̂(1−p̂) / n)

p̂ = sample proportion (number of successes / n)

Use Z*: 1.96 for 95% CI (most common)

Quick Reference: Critical Values


Confidence Level | α | Z* (two-tailed)

90% | 0.10 | 1.645

95% | 0.05 | 1.960

99% | 0.01 | 2.576
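If a Z-table is not at hand, the critical values above can be recomputed from the standard normal distribution. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper name `z_critical` is hypothetical. For a two-tailed interval, Z* is the point that leaves α/2 in the upper tail.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value Z* for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Put alpha/2 in the upper tail and invert the standard normal CDF
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z* = {z_critical(level):.3f}")
# 90%: Z* = 1.645
# 95%: Z* = 1.960
# 99%: Z* = 2.576
```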

EXAMPLE 1 (Easy) — CI for Mean with Z

A company measures the weight of 36 bags of chips. The sample mean is 150g. The population SD is
known to be 12g. Construct a 95% confidence interval for the true mean weight.

Step 1 Identify values: x̄ = 150, σ = 12, n = 36, Z* = 1.96 (for 95%)

Step 2 Calculate SE (Standard Error): SE = σ/√n = 12/√36 = 12/6 = 2

Step 3 Calculate margin of error: ME = Z* × SE = 1.96 × 2 = 3.92

Step 4 CI = x̄ ± ME = 150 ± 3.92 → (146.08, 153.92)

Answer: We are 95% confident the true mean weight is between 146.08 g and 153.92 g.
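The steps of Example 1 (standard error, margin of error, interval) translate directly into code. A minimal Python sketch; the function name `z_interval` is illustrative, and the 1.96 default assumes a 95% confidence level.

```python
import math

def z_interval(xbar, sigma, n, z_star=1.96):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    se = sigma / math.sqrt(n)   # standard error
    me = z_star * se            # margin of error
    return xbar - me, xbar + me

# Example 1: 36 bags, sample mean 150 g, sigma = 12 g, 95% confidence
low, high = z_interval(150, 12, 36)
print(f"({low:.2f}, {high:.2f})")   # (146.08, 153.92)
```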

EXAMPLE 2 (Hard) — CI for Mean with t + Proportion

A random sample of 16 students scored: Mean = 72, s = 8. Population SD is unknown. Construct a 95%
CI for the true mean score. Also, if 10 out of 16 students passed, find the 95% CI for the pass proportion.

Part A — CI for Mean (t-distribution)


Step 1 σ is unknown, n=16 < 30 → use t-distribution

Step 2 df = n−1 = 16−1 = 15; t* = 2.131 (from t-table, df=15, 95%)

Step 3 SE = s/√n = 8/√16 = 8/4 = 2

Step 4 ME = t* × SE = 2.131 × 2 = 4.262

Step 5 CI = 72 ± 4.262 → (67.74 , 76.26)

Answer: 95% CI for mean score: (67.74, 76.26)
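Part A translates into code with the t critical value plugged in. Python's standard library has no t-distribution, so this sketch takes t* = 2.131 straight from the t-table (with SciPy installed, `scipy.stats.t.ppf(0.975, 15)` gives the same value to three decimals). The function name `t_interval` is hypothetical.

```python
import math

def t_interval(xbar, s, n, t_star):
    """CI for a mean when sigma is unknown; t_star is read from a t-table (df = n - 1)."""
    se = s / math.sqrt(n)       # standard error using the sample SD
    me = t_star * se            # margin of error
    return xbar - me, xbar + me

# Example 2, Part A: n = 16, mean 72, s = 8, t* = 2.131 (df = 15, 95%)
low, high = t_interval(72, 8, 16, t_star=2.131)
print(f"({low:.2f}, {high:.2f})")   # (67.74, 76.26)
```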

Part B — CI for Proportion


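Step 1 p̂ = 10/16 = 0.625; n = 16; Z* = 1.96

Step 2 SE = √(p̂(1−p̂)/n) = √(0.625×0.375/16) = √(0.01465) = 0.121

Step 3 ME = 1.96 × 0.121 = 0.237

Step 4 CI = 0.625 ± 0.237 → (0.388, 0.862)

Answer: 95% CI for pass rate: (38.8%, 86.2%). Treat this as approximate: a common rule of thumb asks for at least 10 successes and 10 failures, and here there are only 6 failures (16 − 10).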
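Part B in code: a minimal sketch of the normal-approximation (Wald) interval for a proportion. The helper name `proportion_interval` and the warning threshold of 10 successes and 10 failures are assumptions; the threshold is a common textbook rule of thumb, and some courses use 5 instead.

```python
import math

def proportion_interval(successes, n, z_star=1.96, min_count=10):
    """Normal-approximation (Wald) CI for a population proportion.

    min_count is an assumed rule-of-thumb threshold for the approximation.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_star * se
    if successes < min_count or n - successes < min_count:
        print("Warning: few successes or failures; interval is only approximate.")
    return p_hat - me, p_hat + me

# Example 2, Part B: 10 of 16 students passed
low, high = proportion_interval(10, 16)
print(f"({low:.3f}, {high:.3f})")   # (0.388, 0.862)
```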



CHAPTER 2: HYPOTHESIS TESTING — ONE-SAMPLE TESTS

What Is Hypothesis Testing?


Hypothesis testing is a method to make decisions about a population using sample data. We start with
an assumption (null hypothesis) and check if data gives enough evidence to reject it.

■ Key Concepts

H₀ (Null Hypothesis): The "no effect / no difference" claim. Example: µ = 50

H₁ (Alternative Hypothesis): The claim we look for evidence to support. Example: µ ≠ 50 or µ > 50 or µ < 50

p-value: Probability of getting results at least as extreme as those observed, assuming H₀ is true.

α (significance level): Usually 0.05. If p-value < α → Reject H₀.

Steps for ANY Hypothesis Test


Step 1 State H₀ and H₁

Step 2 Choose α (significance level, usually 0.05)

Step 3 Calculate the test statistic (Z or t)

Step 4 Find p-value OR compare test stat with critical value

Step 5 Decision: If p-value < α → Reject H₀. Otherwise → Fail to Reject H₀

Step 6 Conclusion in plain English

One-Sample Z-Test (σ known)


Z = (x̄ − µ₀) / (σ / √n)

µ₀ = the claimed/hypothesized population mean

Use when: σ is known (and either the population is roughly normal or n is large, n ≥ 30)

One-Sample t-Test (σ unknown)


t = (x̄ − µ₀) / (s / √n), df = n − 1

Use when: σ is unknown and only the sample SD s is available (the usual case, especially when n < 30)

df = degrees of freedom = n − 1

Types of Hypothesis Tests (Tails)


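Type | H₁ looks like | Reject H₀ if

Two-tailed | µ ≠ µ₀ | |Z| > Z* (α split between both tails, e.g. Z* = 1.96 for α = 0.05) or p < α

Right-tailed | µ > µ₀ | Z > Z* (all of α in the right tail, e.g. Z* = 1.645 for α = 0.05) or p < α

Left-tailed | µ < µ₀ | Z < −Z* or p < α

For a t-test, use t and the t-table value t* in place of Z and Z*.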

EXAMPLE 3 (Easy) — One-Sample Z Test

A factory claims its bolts have a mean length of 5 cm (µ₀ = 5). A sample of 49 bolts shows x̄ = 5.1 cm.
The population SD is σ = 0.35 cm. At α = 0.05, is there enough evidence to reject the factory's claim?
(Two-tailed test)

Step 1 H₀: µ = 5 | H₁: µ ≠ 5 (two-tailed)

Step 2 α = 0.05; Critical Z* = ±1.96

Step 3 Z = (x̄ − µ₀)/(σ/√n) = (5.1 − 5)/(0.35/√49) = 0.1/(0.35/7) = 0.1/0.05 = 2.0

Step 4 |Z| = 2.0 > 1.96 (critical value) → Reject H₀

Answer: There IS sufficient evidence to reject the claim. The data suggest the mean bolt length differs from 5 cm.
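Example 3 can also be decided with a p-value instead of a critical value. This Python sketch (hypothetical helper `one_sample_z`) uses the standard-library `statistics.NormalDist` to compute the two-tailed p-value, 2 × P(Z > |z|).

```python
import math
from statistics import NormalDist

def one_sample_z(xbar, mu0, sigma, n):
    """Z statistic and two-tailed p-value for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Example 3: claimed mean 5 cm, sample mean 5.1 cm, sigma = 0.35 cm, n = 49
z, p = one_sample_z(5.1, 5, 0.35, 49)
alpha = 0.05
print(f"Z = {z:.2f}, p = {p:.4f}")   # Z = 2.00, p = 0.0455
print("Reject H0" if p < alpha else "Fail to reject H0")
```

The p-value of about 0.0455 is just under α = 0.05, which agrees with the critical-value decision (|Z| = 2.0 > 1.96).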

EXAMPLE 4 (Hard) — One-Sample t Test

A diet company claims its program reduces weight by more than 5 kg on average. A sample of 20
participants lost: x̄ = 5.8 kg, s = 1.5 kg. Test at α = 0.05 if the claim is supported. (Right-tailed test)

Step 1 H₀: µ ≤ 5 | H₁: µ > 5 (right-tailed)

Step 2 n = 20, σ unknown → t-test. df = 20−1 = 19. t* = 1.729 (one-tail, α=0.05, df=19)

Step 3 t = (x̄ − µ₀)/(s/√n) = (5.8 − 5)/(1.5/√20) = 0.8/(1.5/4.472) = 0.8/0.3354 ≈ 2.385

Step 4 t = 2.385 > t* = 1.729 → Reject H₀

Answer: There is sufficient evidence that the diet reduces weight by MORE than 5 kg on average.
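A minimal sketch of the right-tailed t-test in Example 4. The helper name `one_sample_t` is hypothetical, and the critical value 1.729 is copied from the t-table because Python's standard library has no t-distribution.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """t statistic and degrees of freedom for H0: mu = mu0 when sigma is unknown."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Example 4: H1: mu > 5, sample mean 5.8 kg, s = 1.5 kg, n = 20
t, df = one_sample_t(5.8, 5, 1.5, 20)
t_star = 1.729   # one-tailed, alpha = 0.05, df = 19 (from the t-table)
print(f"t = {t:.3f}, df = {df}")   # t = 2.385, df = 19
print("Reject H0" if t > t_star else "Fail to reject H0")
```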



CHAPTER 3: TWO-SAMPLE TESTS

What Are Two-Sample Tests?


We compare two groups to see if their means (or proportions) are different. Example: Do men and
women differ in average salary? Do two drugs differ in effectiveness?

■ Independent vs Paired

Independent samples: Two completely separate groups. Example: Group A gets Drug X, Group B gets
Drug Y.

Paired samples: Same people measured twice (before/after). Example: weight before and after diet.

Independent Two-Sample t-Test


t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

H₀: µ₁ = µ₂ (no difference between groups)

H₁: µ₁ ≠ µ₂ or µ₁ > µ₂ or µ₁ < µ₂

df (approx): Use the smaller of n₁−1 and n₂−1 (conservative) or the Welch formula

Paired t-Test (Same subjects, two measurements)


t = d̄ / (s_d / √n) where d̄ = mean of the (x₁ − x₂) differences

d_i = difference for each pair = x₁ᵢ − x₂ᵢ

d̄ = mean of all differences

s_d = standard deviation of the differences

df = n − 1 (n = number of pairs)

EXAMPLE 5 (Easy) — Independent Two-Sample t-Test

Group A (n=10): x̄₁=78, s₁=5. Group B (n=10): x̄₂=74, s₂=6. Test whether Group A scores significantly
higher at α=0.05. (Right-tailed)

Step 1 H₀: µ₁ = µ₂ | H₁: µ₁ > µ₂

Step 2 df = min(10−1, 10−1) = 9; t* = 1.833 (one-tail, α=0.05, df=9)

Step 3 t = (78−74)/√(25/10 + 36/10) = 4/√(2.5+3.6) = 4/√6.1 = 4/2.470 ≈ 1.62

Step 4 t ≈ 1.62 < t* = 1.833 → Fail to Reject H₀

Answer: Not enough evidence that Group A scores significantly higher than Group B.
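The unpooled two-sample statistic, with both degree-of-freedom options the guide mentions, can be sketched as follows. The helper name `two_sample_t` is hypothetical; the Welch df uses the standard Welch-Satterthwaite formula.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Unpooled two-sample t statistic with two choices of degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2        # squared standard error of each mean
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df_conservative = min(n1 - 1, n2 - 1)   # the guide's conservative rule
    # Welch-Satterthwaite approximation (usually not a whole number)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df_conservative, df_welch

# Example 5: Group A (mean 78, s 5, n 10) vs Group B (mean 74, s 6, n 10)
t, df_c, df_w = two_sample_t(78, 5, 10, 74, 6, 10)
print(f"t = {t:.3f}, conservative df = {df_c}, Welch df = {df_w:.2f}")
# t = 1.620, conservative df = 9, Welch df = 17.43
```

Here the Welch df (about 17.4) is larger than the conservative df of 9, so its one-tailed critical value is smaller (about 1.740 for df = 17 at α = 0.05), but t ≈ 1.62 is still below it and the conclusion does not change.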



EXAMPLE 6 (Hard) — Paired t-Test

A trainer records weights (kg) of 5 clients BEFORE and AFTER a 6-week program. Test if the program
led to significant weight loss at α=0.05. (Right-tailed on d = Before − After: weight loss means d > 0)

Client | Before | After | d = Before − After
1 | 85 | 82 | 3
2 | 90 | 86 | 4
3 | 78 | 75 | 3
4 | 95 | 91 | 4
5 | 88 | 84 | 4

Mean d̄ = 3.6

Step 1 H₀: µ_d = 0 | H₁: µ_d > 0 (before > after means weight was lost)

Step 2 d values: 3, 4, 3, 4, 4. d̄ = (3+4+3+4+4)/5 = 18/5 = 3.6

Step 3 s_d = √[Σ(d−d̄)²/(n−1)]. Deviations: −0.6, 0.4, −0.6, 0.4, 0.4 → Σ(dev²) =
0.36+0.16+0.36+0.16+0.16 = 1.20. s_d = √(1.20/4) = √0.30 = 0.548

Step 4 t = d̄/(s_d/√n) = 3.6/(0.548/√5) = 3.6/(0.548/2.236) = 3.6/0.245 ≈ 14.7

Step 5 df = 4; t* = 2.132 (one-tail, α=0.05, df=4). t ≈ 14.7 >> 2.132 → Reject H₀

Answer: Strong evidence of significant weight loss over the program (mean loss 3.6 kg).
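Example 6 in code: a minimal sketch (hypothetical helper `paired_t`) that forms the differences d = Before − After and then applies t = d̄/(s_d/√n). `statistics.stdev` divides by n − 1, matching the s_d formula in Step 3.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic on d = before - after, with df = number of pairs - 1."""
    d = [b - a for b, a in zip(before, after)]
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)                 # sample SD (divides by n - 1)
    t = d_bar / (s_d / math.sqrt(len(d)))
    return t, len(d) - 1

# Example 6: weights (kg) before and after the 6-week program
before = [85, 90, 78, 95, 88]
after = [82, 86, 75, 91, 84]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")   # t = 14.70, df = 4
```

Small differences from a hand calculation come only from rounding s_d and the standard error along the way.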



CHAPTER 4: ONE-WAY ANOVA (Analysis of Variance)

What Is ANOVA?
ANOVA tests whether 3 or more group means are all equal. Running multiple t-tests instead inflates the overall
chance of a false positive (Type I error); ANOVA checks all groups in one test.

■ Key Concepts

H₀: All group means are equal: µ₁ = µ₂ = µ₃ = ...

H₁: At least one group mean is different

Test Statistic: F-ratio = variation BETWEEN groups / variation WITHIN groups

Reject H₀ if: F > F_critical (from F-table) or p-value < α

ANOVA Formulas
F = MSB / MSW

SSB (Between): Σ nⱼ(x̄ⱼ − x̄_grand)² [How far group means are from the grand mean]

SSW (Within): ΣΣ (xᵢⱼ − x̄ⱼ)² [How far each value is from its group mean]

MSB: SSB / (k−1) [k = number of groups]

MSW: SSW / (N−k) [N = total number of observations]

df Between: k − 1

df Within: N − k

ANOVA Table Structure


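Source | SS | df | MS = SS/df | F = MSB/MSW

Between Groups | SSB | k−1 | MSB | F

Within Groups | SSW | N−k | MSW | -

Total | SST | N−1 | - | -

SST = SSB + SSW, and the df add up the same way: (k−1) + (N−k) = N−1.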

EXAMPLE 7 (Easy) — One-Way ANOVA

Three teaching methods are tested on students. Scores: Method A: 70, 75, 80 | Method B: 85, 90, 95 |
Method C: 60, 65, 70. Test at α=0.05 if the methods differ significantly.

Step 1 Group means: x̄_A = 75, x̄_B = 90, x̄_C = 65. Grand mean x̄ = (225+270+195)/9 =
690/9 ≈ 76.67 (exactly 230/3)

Step 2 SSB = n[(x̄_A−x̄)² + (x̄_B−x̄)² + (x̄_C−x̄)²] =
3[(75−76.67)² + (90−76.67)² + (65−76.67)²] = 3[2.78 + 177.78 + 136.11] = 3 × 316.67 = 950.00

Step 3 SSW: A: (70−75)²+(75−75)²+(80−75)² = 50. B: (85−90)²+(90−90)²+(95−90)² = 50.
C: (60−65)²+(65−65)²+(70−65)² = 50. SSW = 150

Step 4 df_B = 3−1 = 2, df_W = 9−3 = 6. MSB = 950.00/2 = 475.00. MSW = 150/6 = 25.0

Step 5 F = 475.00/25.0 = 19.00. F_critical(2, 6, α=0.05) ≈ 5.14

Answer: F = 19.00 > 5.14 → Reject H₀. At least one teaching method has a significantly different mean score.
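The ANOVA quantities for Example 7 can be computed in a few lines. A minimal sketch with a hypothetical helper `one_way_anova`; it returns SSB, SSW and F, and the decision still uses the F-table value 5.14 (with SciPy available, `scipy.stats.f.ppf(0.95, 2, 6)` gives about 5.14).

```python
import statistics

def one_way_anova(groups):
    """Return (SSB, SSW, F) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between: group size times squared distance of each group mean from the grand mean
    ssb = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared distance of every value from its own group mean
    ssw = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)        # df between = k - 1
    msw = ssw / (n_total - k)  # df within = N - k
    return ssb, ssw, msb / msw

# Example 7: teaching methods A, B, C
ssb, ssw, f_stat = one_way_anova([[70, 75, 80], [85, 90, 95], [60, 65, 70]])
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.2f}")
# SSB = 950.00, SSW = 150.00, F = 19.00
```

Carrying full precision gives SSB = 950 exactly; rounding the grand mean to 76.67 before squaring is what makes a hand calculation drift slightly above that.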



CHAPTER 5: CHI-SQUARE TESTS

Two Types of Chi-Square Tests


Test Type | Purpose | Data Type

Goodness of Fit | Does the data follow an expected distribution? | One categorical variable

Test of Independence | Are two categorical variables related? | Two categorical variables (contingency table)

The Chi-Square Formula (Same for Both Tests!)


χ² = Σ (O − E)² / E

O = Observed frequency (actual count from data)

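E = Expected frequency (what we expect under H₀)

For Test of Independence: E = (Row Total × Column Total) / Grand Total

For Goodness of Fit: E = total count × expected proportion for that category (e.g. 60 × 1/6 = 10 per face for a fair die)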

df for Goodness of Fit: k − 1 (k = number of categories)

df for Independence: (rows − 1) × (columns − 1)

EXAMPLE 8 (Easy) — Chi-Square Goodness of Fit

A die is rolled 60 times. Each face should appear 10 times (expected). Observed: 1→8, 2→12, 3→9,
4→11, 5→7, 6→13. Test if the die is fair at α=0.05.

Face | O | E | (O−E) | (O−E)² | (O−E)²/E
1 | 8 | 10 | −2 | 4 | 0.40
2 | 12 | 10 | 2 | 4 | 0.40
3 | 9 | 10 | −1 | 1 | 0.10
4 | 11 | 10 | 1 | 1 | 0.10
5 | 7 | 10 | −3 | 9 | 0.90
6 | 13 | 10 | 3 | 9 | 0.90
Total | 60 | 60 | - | - | χ² = 2.80

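Step 1 H₀: Die is fair (each face equally likely). H₁: Die is not fair.

Step 2 df = k−1 = 6−1 = 5. χ²_critical(5, α=0.05) = 11.07

Step 3 χ² = 2.80 < 11.07 → Fail to Reject H₀

Answer: No evidence the die is unfair, so the results are consistent with a fair die. (Failing to reject H₀ does not prove fairness; these 60 rolls simply show no significant departure from it.)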
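Example 8 as a minimal Python sketch. The helper `chi_square_statistic` is hypothetical; the expected counts assume a fair die (60 rolls ÷ 6 faces = 10 per face), and the critical value 11.07 is taken from the chi-square table.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 8: 60 rolls; a fair die expects 60 / 6 = 10 per face
observed = [8, 12, 9, 11, 7, 13]
expected = [sum(observed) / len(observed)] * len(observed)
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1
critical = 11.07   # chi-square table, df = 5, alpha = 0.05
print(f"chi-square = {chi2:.2f}, df = {df}")   # chi-square = 2.80, df = 5
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```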

EXAMPLE 9 (Hard) — Chi-Square Test of Independence



A survey asks 200 people about their gender (Male/Female) and preference for Tea/Coffee. Is there a
relationship between gender and drink preference? (α=0.05)

Observed | Tea | Coffee | Row Total
Male | 30 | 70 | 100
Female | 50 | 50 | 100
Col Total | 80 | 120 | 200

Step 1 H₀: Gender and drink preference are independent. H₁: They are related.

Step 2 Expected values E = (Row Total × Col Total) / Grand Total: E(M,Tea) = 100×80/200 = 40.
E(M,Coffee) = 100×120/200 = 60. E(F,Tea) = 100×80/200 = 40.
E(F,Coffee) = 100×120/200 = 60.

Step 3 χ² = (30−40)²/40 + (70−60)²/60 + (50−40)²/40 + (50−60)²/60 = 100/40 + 100/60 + 100/40 +
100/60 = 2.5 + 1.667 + 2.5 + 1.667 = 8.333

Step 4 df = (2−1)(2−1) = 1. χ²_critical(1, α=0.05) = 3.841

Answer: χ² = 8.333 > 3.841 → Reject H₀. Gender and drink preference ARE related!
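A minimal sketch of the test of independence for Example 9, using a hypothetical helper `chi_square_independence` that builds the expected counts from the row and column totals. If you use SciPy's `scipy.stats.chi2_contingency` instead, note that by default it applies Yates' continuity correction to 2×2 tables, so its statistic will be smaller than the 8.333 computed by hand; pass `correction=False` to match.

```python
def chi_square_independence(table):
    """Chi-square statistic, expected counts and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, expected, df

# Example 9: rows = Male, Female; columns = Tea, Coffee
chi2, expected, df = chi_square_independence([[30, 70], [50, 50]])
print(expected)                                # [[40.0, 60.0], [40.0, 60.0]]
print(f"chi-square = {chi2:.3f}, df = {df}")   # chi-square = 8.333, df = 1
```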



CHAPTER 6: EXAM KEYWORD → FORMULA CHEAT SHEET

This is the most important section for exams! Learn to recognize keywords in the question that tell you
which formula to use.

Exam Keyword / Hint | Use This Formula | Short Formula

"Estimate the mean", "95% confident", "construct an interval" | Confidence Interval for Mean | x̄ ± Z*(σ/√n) or x̄ ± t*(s/√n)

"Proportion", "percentage", "fraction of people who..." | CI for Proportion | p̂ ± Z*√(p̂(1−p̂)/n)

"Population SD is known" or "n ≥ 30" (large-sample approximation) | Use Z-test / Z-interval | Z = (x̄ − µ₀)/(σ/√n)

"Population SD unknown" or "n < 30" or "sample SD given" | Use t-test / t-interval | t = (x̄ − µ₀)/(s/√n), df = n − 1

"Test if mean equals / differs from a specific value" | One-Sample t or Z Test | H₀: µ = µ₀

"Compare two groups", "is there a difference between A and B" | Two-Sample t-Test (Independent) | t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂)

"Before and after", "same subjects", "paired", "matched pairs" | Paired t-Test | t = d̄/(s_d/√n)

"3 or more groups", "compare methods/treatments", "ANOVA" | One-Way ANOVA F-Test | F = MSB/MSW

"Follows a distribution?", "goodness of fit", "expected vs observed" | Chi-Square Goodness of Fit | χ² = Σ(O−E)²/E, df = k−1

"Relationship between two categories?", "independent?", "contingency table" | Chi-Square Independence Test | χ² = Σ(O−E)²/E, df = (r−1)(c−1)

"Right-tailed" or "greater than" in H₁ | Right-Tailed Test | Reject if test stat > +critical value

"Left-tailed" or "less than" in H₁ | Left-Tailed Test | Reject if test stat < −critical value

"Not equal" or "different from" in H₁ | Two-Tailed Test | Reject if |test stat| > critical value

ALL FORMULAS AT A GLANCE — Quick Reference

Formula Name | Formula | When to Use

CI Mean (Z) | x̄ ± Z*(σ/√n) | σ known

CI Mean (t) | x̄ ± t*(s/√n) | σ unknown

CI Proportion | p̂ ± Z*√(p̂q̂/n), where q̂ = 1 − p̂ | Proportions

One-sample Z | Z = (x̄ − µ₀)/(σ/√n) | σ known, test mean

One-sample t | t = (x̄ − µ₀)/(s/√n) | σ unknown, test mean

Two-sample t | t = (x̄₁ − x̄₂)/√(s₁²/n₁ + s₂²/n₂) | Compare 2 groups

Paired t | t = d̄/(s_d/√n) | Before/After

ANOVA F | F = MSB/MSW | 3+ groups

Chi-Square | χ² = Σ(O−E)²/E | Categorical data

MEMORY TRICKS FOR BEGINNERS

■ Memory Tricks

Z vs t rule: "If they GIVE you σ (population SD) → use Z. If you only have s (sample SD) → use t."

CI vs HT: "CI = estimate a range. HT = test a claim about a number."

ANOVA trick: "3+ groups? Think F! F = Between ÷ Within."

Chi-Square trick: "Two categorical variables in a table? Always chi-square!"

Paired test trick: "Same people, two measurements? ALWAYS paired t-test!"

p-value rule: "p is low, H₀ must go! (If p < α, reject H₀)"
