
Probability Statistics Notes

This document provides comprehensive study notes on Probability and Statistics for BSCS 3rd Semester, covering key topics such as Frequency Distribution, Coefficient of Variation, Standard Deviation, Linear Regression, and Hypothesis Testing. Each topic is explained in simple language with definitions, formulas, examples, and memory tips to aid understanding. The notes are designed to be student-friendly and easy to remember.

Uploaded by

inoonmr
Copyright
© All Rights Reserved

📊 PROBABILITY & STATISTICS

Complete Study Notes — BSCS 3rd Semester


Student-Friendly | Simple Language | Easy to Remember

📋 TOPICS COVERED IN THIS DOCUMENT

➤ Topic 1: Frequency Distribution
➤ Topic 2: Coefficient of Variation
➤ Topic 3: Standard Deviation
➤ Topic 4: Mean, Median, Mode
➤ Topic 5: Linear Regression & Correlation
➤ Topic 6: Z-Test & T-Test (Hypothesis Testing)
➤ Topic 7: Probability
➤ Topic 8: Sampling Distribution
📌 TOPIC 1: FREQUENCY DISTRIBUTION
🎯 What is it? Frequency Distribution = Organizing data into a table to show HOW MANY
TIMES each value occurs.

Think of it like this: Imagine you asked 20 students their marks. Instead of listing all 20 marks
randomly, you GROUP them (like 60-70, 70-80, etc.) and COUNT how many students fall in each
group. That organized table is called a Frequency Distribution.

Key Terms You Must Know


Term | Simple Meaning
Class / Class Interval | A range or group, e.g. 60–70, 70–80
Frequency (f) | Number of data values that fall in that class
Class Width/Size | Upper limit − Lower limit (e.g. 70 − 60 = 10)
Tally Marks | Counting marks used to record frequency (||||)
Relative Frequency | Frequency ÷ Total (gives a fraction or %)
Cumulative Frequency | Running total of frequencies from top to bottom
Class Boundaries | True edges of a class when the stated limits leave gaps (e.g. 59.5–69.5 for the class 60–69)
Class Midpoint | (Lower limit + Upper limit) ÷ 2 = middle of the class

How to Make a Frequency Distribution Table


➤ Step 1: Find the RANGE = Highest value − Lowest value
➤ Step 2: Decide number of CLASSES (usually 5 to 10)
➤ Step 3: Find CLASS WIDTH = Range ÷ Number of Classes
➤ Step 4: List classes starting from the smallest value
➤ Step 5: Count how many data values fall in each class (Tally)
➤ Step 6: Write the Frequency for each class

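📝 Example: Marks of 10 students: 55, 62, 70, 75, 80, 63, 58, 72, 85, 68
Range = 85 − 55 = 30 | Classes = 3 | Width = 30 ÷ 3 = 10
Each class includes its lower limit but not its upper limit; the last class also includes 85.
55–65: frequency = 4 (55, 58, 62, 63) | 65–75: frequency = 3 (68, 70, 72) | 75–85: frequency = 3 (75, 80, 85)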

Types of Frequency
➤ Simple Frequency: Just the count (f)
➤ Relative Frequency: f ÷ total — tells proportion out of 1
➤ Cumulative Frequency: Add each frequency to the previous total
➤ Percentage Frequency: Relative Frequency × 100
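
The table-building steps and the four types of frequency can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function name `frequency_table` and its parameters are made up for this example, and it assumes each class includes its lower limit but not its upper limit (the last class also includes its upper limit). Under that convention the marks example gives frequencies 4, 3, 3.

```python
marks = [55, 62, 70, 75, 80, 63, 58, 72, 85, 68]

def frequency_table(data, lowest, width, num_classes):
    """Build rows with simple, relative, percentage and cumulative frequency.

    Assumption: every value is >= lowest, and each class is [lower, upper)
    except the last one, which also includes its upper limit.
    """
    counts = [0] * num_classes
    for x in data:
        # Values equal to the highest limit are placed in the last class.
        index = min(int((x - lowest) // width), num_classes - 1)
        counts[index] += 1

    total = len(data)
    rows = []
    running_total = 0
    for i, f in enumerate(counts):
        lower = lowest + i * width
        running_total += f  # cumulative frequency = running total
        rows.append({
            "class": f"{lower}-{lower + width}",
            "f": f,
            "relative": f / total,
            "percent": 100 * f / total,
            "cumulative": running_total,
        })
    return rows

table = frequency_table(marks, lowest=55, width=10, num_classes=3)
for row in table:
    print(row)
```

The last cumulative frequency always equals the total number of values (here 10), which is a quick way to check that no value was missed.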

💡 Memory Tip: 'FREQUENCY = How FREQUENTLY something appears!' Think of a frequency distribution as a SORTED COUNTING table.
📌 TOPIC 2: COEFFICIENT OF VARIATION (CV)
🎯 What is it? CV = A % value that tells us HOW SPREAD OUT the data is relative to its
mean. Used to COMPARE variation between two datasets.

Real Life Example: Suppose Group A has average salary = 50,000 with SD = 5,000 and Group B
has average salary = 200,000 with SD = 10,000. Which group has more variation? You can't
compare raw SDs because the means are different. CV solves this by converting SD into a
percentage!

📐 Formula for Coefficient of Variation


CV = (Standard Deviation ÷ Mean) × 100
CV = (σ / μ) × 100 [for population] OR CV = (s / x̄) × 100 [for sample]

How to Calculate CV (Step by Step)


➤ Step 1: Calculate the Mean (x̄ ) of your data
➤ Step 2: Calculate the Standard Deviation (SD or σ)
➤ Step 3: Apply formula: CV = (SD ÷ Mean) × 100
➤ Step 4: Result is in PERCENTAGE (%)

📝 Example: Dataset: 10, 20, 30, 40, 50
Mean = (10+20+30+40+50) ÷ 5 = 30
SD = 14.14 (approx, population formula)
CV = (14.14 ÷ 30) × 100 = 47.13%

Interpretation Rules
➤ LOW CV (e.g. <15%): Data is CONSISTENT / less spread — values are close to each other
➤ HIGH CV (e.g. >30%): Data is SPREAD OUT / more variable — values differ a lot
➤ When comparing two groups: Lower CV = More consistent
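
As a rough check of the formula and the interpretation rules, the sketch below computes CV for the dataset 10, 20, 30, 40, 50 and for the two salary groups from the real-life example. The helper name `coefficient_of_variation` is invented for this illustration, and the population SD (`statistics.pstdev`) is assumed for the dataset.

```python
import statistics

def coefficient_of_variation(mean, sd):
    """CV as a percentage: (SD / Mean) * 100. Assumes mean is not zero."""
    return 100 * sd / mean

# Dataset from the worked example, using the population SD.
data = [10, 20, 30, 40, 50]
mean = statistics.fmean(data)
sd = statistics.pstdev(data)
cv_data = coefficient_of_variation(mean, sd)
# About 47.14; rounding the SD to 14.14 first gives 47.13.
print(round(cv_data, 2))

# Salary groups: only the means and SDs are given in the notes.
cv_group_a = coefficient_of_variation(50_000, 5_000)
cv_group_b = coefficient_of_variation(200_000, 10_000)
print(cv_group_a, cv_group_b)

# Lower CV = more consistent, even though Group B has the larger raw SD.
more_consistent = "B" if cv_group_b < cv_group_a else "A"
print("More consistent group:", more_consistent)
```

Group A comes out at 10% and Group B at 5%, so Group B is the more consistent group even though its SD is twice as large.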

⚠️ Important: CV has NO UNITS. It is always expressed as a %. This makes it perfect for comparing datasets with different units or scales!

💡 Memory Tip: Think of CV as a 'Consistency Score'. Low CV = Consistent. High CV = Chaotic/Varied. Formula trick: C-V = CV, stands for Comparing Variation!
📌 TOPIC 3: STANDARD DEVIATION (SD)
🎯 What is it? Standard Deviation = The TYPICAL DISTANCE of the data points from the MEAN (formally, the square root of the average squared distance). It tells us how SPREAD OUT the data is.

Think of it like this: If all students got exactly 70 marks, SD = 0 (no spread). If students got 40, 60, 70,
80, 100 — they are spread out, so SD is larger.

Steps to Calculate Standard Deviation


➤ Step 1: Find the Mean (x̄ ) = Sum of all values ÷ count
➤ Step 2: Subtract mean from each value: (x - x̄ )
➤ Step 3: Square each difference: (x - x̄ )²
➤ Step 4: Find average of squared differences = Variance (σ²)
➤ Step 5: Take square root of Variance = Standard Deviation (σ)

📐 Standard Deviation Formula (Population)


σ = √[ Σ(x − μ)² ÷ N ]
σ = sigma (population SD), μ = population mean, N = total count, Σ = sum of all

📐 Standard Deviation Formula (Sample)


s = √[ Σ(x - x̄ )² ÷ (n-1) ]
Use (n-1) instead of N for SAMPLE data — this is called Bessel's Correction

📝 Example: Data: 2, 4, 4, 4, 5, 5, 7, 9
Mean = 40 ÷ 8 = 5
Differences: −3, −1, −1, −1, 0, 0, 2, 4
Squared: 9, 1, 1, 1, 0, 0, 4, 16 → Sum = 32
Variance = 32 ÷ 8 = 4
SD = √4 = 2
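
The five steps can be followed line by line in Python. The sketch below uses the same data (2, 4, 4, 4, 5, 5, 7, 9), computes the population SD by hand, and then shows what changes when the same numbers are treated as a sample and divided by (n − 1). Variable names are chosen for this illustration only.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)                      # Step 1: mean
deviations = [x - mean for x in data]             # Step 2: subtract the mean
squared = [d ** 2 for d in deviations]            # Step 3: square each difference
population_variance = sum(squared) / len(data)    # Step 4: divide by N
population_sd = math.sqrt(population_variance)    # Step 5: square root

# Sample version: same sum of squares, but divide by (n - 1).
sample_variance = sum(squared) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(round(sample_variance, 3), round(sample_sd, 3))

# The standard library gives the same answers.
assert math.isclose(population_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))
```

The sample SD is always a little larger than the population SD for the same numbers, because dividing by (n − 1) gives a bigger variance than dividing by N.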

Variance vs Standard Deviation


Variance (σ²) | Standard Deviation (σ)
Squared units | Same units as data
Mean of squared deviations | Square root of variance
Harder to interpret directly | Easy to interpret
Always ≥ 0 | Always ≥ 0

💡 Memory Tip: SD Formula Memory: 'Subtract, Square, Sum, Divide, Root' = S-S-S-D-R. The last three letters, SDR, stand for Standard Deviation Recall!
📌 TOPIC 4: MEAN, MEDIAN & MODE
🎯 These are called MEASURES OF CENTRAL TENDENCY — they find the CENTER or
TYPICAL value of a dataset.

MEAN — The Average


Mean = Sum of all values ÷ Number of values | Formula: x̄ = Σx ÷ n

➤ Add all numbers together, then divide by how many there are
➤ Most commonly used measure
➤ Affected by EXTREME values (outliers)
📝 Example: Marks: 70, 80, 90, 60, 100 → Mean = (70+80+90+60+100)÷5 = 400÷5 = 80

MEDIAN — The Middle Value


Median = Middle value when data is arranged in ORDER

➤ STEP 1: Arrange data in ascending order
➤ STEP 2: If ODD count: middle value is median
➤ STEP 3: If EVEN count: average of two middle values
📝 Example:
ODD: 3, 5, 7, 9, 11 → Median = 7 (3rd value)
EVEN: 3, 5, 7, 9 → Median = (5+7) ÷ 2 = 6

MODE — The Most Frequent


Mode = The value that appears MOST OFTEN in the data

➤ No formula needed: just find the most repeated value
➤ A dataset can have: No mode | One mode (Unimodal) | Two modes (Bimodal) | Many modes (Multimodal)
📝 Example:
Data: 4, 5, 5, 6, 7, 7, 7, 8 → Mode = 7 (appears 3 times)
Data: 1, 2, 3, 4 → No mode (all appear once)
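
The three worked examples above can be reproduced with Python's `statistics` module. One detail is an assumption of this sketch: `statistics.multimode` returns every value when nothing repeats, so a small helper (named `modes` here, invented for this example) maps that case to "no mode" to match the convention used in these notes.

```python
import statistics

# MEAN: sum of all values divided by the count.
marks = [70, 80, 90, 60, 100]
mean_marks = statistics.mean(marks)

# MEDIAN: middle value (odd count) or average of the two middle values (even count).
median_odd = statistics.median([3, 5, 7, 9, 11])
median_even = statistics.median([3, 5, 7, 9])

def modes(data):
    """Return the most frequent value(s), or [] when no value repeats."""
    most_common = statistics.multimode(data)
    if data and data.count(most_common[0]) == 1:
        return []  # every value appears once: treated as "no mode"
    return most_common

mode_example = modes([4, 5, 5, 6, 7, 7, 7, 8])
no_mode_example = modes([1, 2, 3, 4])
bimodal_example = modes([1, 1, 2, 3, 3])

print(mean_marks, median_odd, median_even)
print(mode_example, no_mode_example, bimodal_example)
```

`statistics.median` sorts the data internally, so the list does not need to be arranged in order first.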

Measure | When to Use It and Why
Mean | Normal data, no outliers (most accurate)
Median | When there are outliers or skewed data
Mode | For categorical data (e.g. most popular color)
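
To see why the median is preferred when outliers are present, the sketch below adds one extreme value to a small list. The salary figures are hypothetical numbers made up for this illustration.

```python
import statistics

# Hypothetical monthly salaries (in thousands); not taken from real data.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [500]   # one extreme value added

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("Without outlier: mean =", mean_before, "median =", median_before)
print("With outlier:    mean =", mean_after, "median =", median_after)
```

The single outlier drags the mean from 35 up to 112.5, while the median only moves from 35 to 36.5. That is the reason the table recommends the median for skewed data.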

💡 Memory Tip: Mo-de = Mo-st. Median = Middle. Mean = Average. The 3 M's: Most, Middle, Mean (average)!
📌 TOPIC 5: LINEAR REGRESSION & CORRELATION
🎯 Correlation = HOW STRONGLY two variables are related
Regression = PREDICTING one variable using another

Real Life: Hours studied (X) vs Marks scored (Y). Are they related? That's correlation. If a student
studies 6 hours, what marks will they get? That's regression!

CORRELATION — Measuring the Relationship


Pearson's Correlation Coefficient (r) measures the strength and direction of a LINEAR relationship
between two variables.

📐 Correlation Coefficient Formula


r = Σ[(x − x̄)(y − ȳ)] ÷ √[ Σ(x − x̄)² × Σ(y − ȳ)² ]
Value of r is always between -1 and +1

Value of r | Meaning
r = +1 | Perfect Positive Correlation: both increase together
r between 0.7 and 1 | Strong Positive Correlation
r between 0.3 and 0.7 | Moderate Positive Correlation
r near 0 | No/Weak Correlation: little or no LINEAR relationship
r between -0.3 and -0.7 | Moderate Negative Correlation
r between -0.7 and -1 | Strong Negative Correlation
r = -1 | Perfect Negative Correlation: one increases, other decreases
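
The correlation formula can be applied directly in code. The sketch below is a plain translation of the formula; the hours and marks values are hypothetical numbers invented for this illustration, and `pearson_r` is just a name chosen here.

```python
import math

# Hypothetical data: hours studied (X) and marks scored (Y) for 5 students.
hours = [1, 2, 3, 4, 5]
marks = [50, 55, 65, 70, 80]

def pearson_r(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - x_bar) ** 2 for x in xs)
    sum_sq_y = sum((y - y_bar) ** 2 for y in ys)
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)

r = pearson_r(hours, marks)
print(round(r, 4))

# Swapping X and Y gives the same value: correlation is symmetric.
r_swapped = pearson_r(marks, hours)
```

For this made-up data r is about 0.99, which falls in the "strong positive" row of the table.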

LINEAR REGRESSION — Making Predictions


📐 Regression Line (Line of Best Fit)
Ŷ = a + bX
Ŷ = predicted value of Y | a = Y-intercept (value of Y when X=0) | b = slope (change in Y per unit X)

📐 How to find b (slope) and a (intercept)


b = [nΣXY − ΣXΣY] ÷ [nΣX² − (ΣX)²]
a = (ΣY − bΣX) ÷ n
n = number of data pairs
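
The slope and intercept formulas can be checked with a short script. The data is hypothetical (hours studied vs marks for 5 students, invented for this sketch), and the function names `fit_line` and `predict` are illustrative only.

```python
# Hypothetical data: hours studied (X) and marks scored (Y).
hours = [1, 2, 3, 4, 5]
marks = [50, 55, 65, 70, 80]

def fit_line(xs, ys):
    """Return (a, b) for the line Y-hat = a + bX using the sums formulas."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                    # intercept
    return a, b

def predict(a, b, x):
    """Predicted Y for a given X."""
    return a + b * x

a, b = fit_line(hours, marks)
predicted_for_6_hours = predict(a, b, 6)
print("a =", a, "b =", b, "prediction for 6 hours =", predicted_for_6_hours)
```

Here ΣX = 15, ΣY = 320, ΣXY = 1035 and ΣX² = 55, so b = 375 ÷ 50 = 7.5 and a = (320 − 112.5) ÷ 5 = 41.5. The line Ŷ = 41.5 + 7.5X predicts 86.5 marks for 6 hours of study. Predictions far outside the range of the data (for example 20 hours) should be treated with caution.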
Difference: Correlation vs Regression
Correlation | Regression
Measures strength of relationship | Predicts value of one variable
Result: r value (-1 to +1) | Result: Equation of a line
No cause-effect implied | Treats X as the predictor of Y (does not by itself prove cause-effect)
Symmetric: r(X,Y) = r(Y,X) | Not symmetric: X predicts Y ≠ Y predicts X

💡 Memory Tip: 'CoRReLation = R value'. Regression = R-egression = R-easonable guess. 'Line of Best Fit = Best Prediction Line'
📌 TOPIC 6: HYPOTHESIS TESTING — Z-TEST & T-TEST
🎯 Hypothesis Testing = A statistical METHOD to decide whether SAMPLE DATA gives enough evidence to REJECT a CLAIM about a population.

Understanding Hypothesis
➤ Null Hypothesis (H₀): The DEFAULT claim. 'Nothing special is happening.' Example: 'Average
height = 170cm'
➤ Alternative Hypothesis (H₁ or Hₐ): What you WANT to prove. Example: 'Average height ≠
170cm'

Steps of Hypothesis Testing


➤ Step 1: State H₀ and H₁
➤ Step 2: Choose significance level α (usually 0.05 = 5%)
➤ Step 3: Calculate test statistic (Z or T value)
➤ Step 4: Find critical value from Z or T table
➤ Step 5: Compare → If |test stat| > critical value → REJECT H₀
➤ Step 6: State conclusion in plain language

Z-TEST
Use Z-Test when: Sample size n ≥ 30 AND Population SD (σ) is KNOWN

📐 Z-Test Formula
Z = (x̄ − μ) ÷ (σ ÷ √n)
x̄ = sample mean | μ = claimed population mean | σ = population SD | n = sample size

T-TEST
Use T-Test when: Sample size n < 30 OR Population SD is UNKNOWN (use sample
SD instead)

📐 T-Test Formula
t = (x̄ − μ) ÷ (s ÷ √n)
s = sample standard deviation | Degrees of freedom df = n − 1
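
A small sketch of the t-test calculation follows. The eight height values are hypothetical, invented for this illustration, and the critical value 2.365 is the standard t-table entry for df = 7 at α = 0.05 (two-tail); Python's standard library has no t-distribution, so the value is typed in from the table rather than computed.

```python
import math
import statistics

# Hypothetical sample of 8 heights in cm (small sample, sigma unknown).
sample = [168, 172, 171, 175, 169, 174, 173, 170]
claimed_mean = 170            # H0: average height = 170 cm

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample SD, divides by (n - 1)
df = n - 1                    # degrees of freedom

t = (x_bar - claimed_mean) / (s / math.sqrt(n))

# From a t-table: df = 7, alpha = 0.05, two-tail.
critical_value = 2.365
reject_h0 = abs(t) > critical_value

print("x_bar =", x_bar, "s =", round(s, 3), "t =", round(t, 3), "df =", df)
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

For this sample x̄ = 171.5 and s ≈ 2.449, so t ≈ 1.732. Since 1.732 < 2.365, H₀ is not rejected: the sample does not give enough evidence that the average height differs from 170cm.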

Z-Test | T-Test
n ≥ 30 (large sample) | n < 30 (small sample)
Population σ is known | Population σ is unknown
Uses normal distribution | Uses t-distribution (wider/flatter)
Critical value from Z-table | Critical value from T-table (based on df)
More accurate for large samples | More accurate for small samples

📝 Example: Z-Test
Claim: Average height = 170cm. Sample: n = 40, x̄ = 172, σ = 5
Z = (172 − 170) ÷ (5 ÷ √40) = 2 ÷ 0.79 = 2.53
Critical value (α = 0.05, two-tail) = 1.96
Since 2.53 > 1.96 → REJECT H₀ → the data suggests the average height is NOT 170cm
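
The same Z-test can be run in Python. In this sketch `statistics.NormalDist` replaces the Z-table for the critical value, and a two-tailed p-value is added as an extra check; the function name `z_test` and its return layout are choices made for this example.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, claimed_mean, sigma, n, alpha=0.05):
    """Two-tailed Z-test. Returns (z, critical value, p-value, reject H0?)."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - claimed_mean) / standard_error
    standard_normal = NormalDist()                      # mean 0, SD 1
    critical = standard_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical

# Values from the height example: n = 40, x_bar = 172, sigma = 5, claim = 170.
z, critical, p_value, reject_h0 = z_test(172, 170, 5, 40)
print("Z =", round(z, 2), "critical =", round(critical, 2), "p =", round(p_value, 4))
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The p-value route gives the same decision as comparing Z with the critical value: H₀ is rejected exactly when the p-value is smaller than α.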

💡 Memory Tip: 'Z = large, T = tiny' — Z-test for BIG samples, T-test for TINY samples. Also: Z is for
known σ, T is for unknown σ (need to estimate it from sample).
📌 TOPIC 7: PROBABILITY
🎯 Probability = The CHANCE / LIKELIHOOD that something will happen. Always between 0
(impossible) and 1 (certain).

📐 Basic Probability Formula


P(Event) = Number of Favorable Outcomes ÷ Total Possible Outcomes
Example: P(Head on coin) = 1 ÷ 2 = 0.5 = 50%

Key Probability Rules


➤ Rule 1 — Range: 0 ≤ P(A) ≤ 1 always
➤ Rule 2 — Certain Event: P(sure thing) = 1 | P(impossible) = 0
➤ Rule 3 — Complement: P(not A) = 1 − P(A)
➤ Rule 4 — Addition (Mutually Exclusive): P(A or B) = P(A) + P(B)
➤ Rule 5 — Addition (NOT Mutually Exclusive): P(A or B) = P(A) + P(B) − P(A and B)
➤ Rule 6 — Multiplication (Independent): P(A and B) = P(A) × P(B)
➤ Rule 7 — Conditional: P(A|B) = P(A and B) ÷ P(B)
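
The rules above can be verified by listing every outcome of an experiment and counting. The sketch below uses two fair dice (36 equally likely outcomes); the events A, B and C are chosen only for this illustration, and `Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 (first die, second die) pairs.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes)."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def a(o):  # A: first die shows 6
    return o[0] == 6

def b(o):  # B: sum of both dice is at least 10
    return o[0] + o[1] >= 10

def c(o):  # C: second die shows 6 (independent of A)
    return o[1] == 6

p_a, p_b, p_c = prob(a), prob(b), prob(c)
p_a_and_b = prob(lambda o: a(o) and b(o))
p_a_or_b = prob(lambda o: a(o) or b(o))
p_a_and_c = prob(lambda o: a(o) and c(o))

assert prob(lambda o: not a(o)) == 1 - p_a    # Rule 3: complement
assert p_a_or_b == p_a + p_b - p_a_and_b      # Rule 5: general addition
assert p_a_and_c == p_a * p_c                 # Rule 6: independent events
p_a_given_b = p_a_and_b / p_b                 # Rule 7: conditional

print(p_a, p_b, p_a_and_b, p_a_or_b, p_a_given_b)
```

Here P(A) = 1/6 but P(A|B) = 1/2: knowing that the sum is at least 10 makes a 6 on the first die much more likely, so A and B are dependent events.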

Important Probability Terms


Term | Meaning & Example
Sample Space (S) | All possible outcomes. Coin toss: S = {H, T}
Event | A specific outcome or set of outcomes
Mutually Exclusive | Events that CANNOT happen at the same time. E.g., getting Head and Tail on one coin toss
Independent Events | One event does NOT affect the other. E.g., two separate coin tosses
Dependent Events | One event AFFECTS the other. E.g., drawing cards without replacement
Conditional Probability | Probability of A given B already happened: P(A|B)
Complementary Event | Opposite of the event. P(not A) = 1 − P(A)

Types of Probability
➤ Classical Probability: Based on EQUALLY LIKELY outcomes (theoretical)
➤ Empirical/Experimental Probability: Based on ACTUAL EXPERIMENTS and observed
frequency
➤ Subjective Probability: Based on PERSONAL JUDGMENT or expert opinion
📝 Example: Deck of cards:
P(drawing an Ace) = 4/52 = 1/13 ≈ 0.077
P(drawing a Heart) = 13/52 = 1/4 = 0.25
P(Ace OR Heart) = 4/52 + 13/52 − 1/52 = 16/52 = 4/13
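
The card example can be confirmed by building a 52-card deck and counting, which is classical probability in its most direct form. The rank and suit labels below are just strings chosen for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))   # 13 ranks x 4 suits = 52 cards

def prob(condition):
    """Favorable cards divided by total cards, as an exact fraction."""
    favorable = sum(1 for card in deck if condition(card))
    return Fraction(favorable, len(deck))

p_ace = prob(lambda card: card[0] == "A")
p_heart = prob(lambda card: card[1] == "Hearts")
p_ace_and_heart = prob(lambda card: card == ("A", "Hearts"))
p_ace_or_heart = prob(lambda card: card[0] == "A" or card[1] == "Hearts")

print(p_ace, p_heart, p_ace_and_heart, p_ace_or_heart)
```

Counting directly gives 16 cards that are an Ace or a Heart, which matches 4 + 13 − 1: the Ace of Hearts is subtracted once because it would otherwise be counted twice.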

💡 Memory Tip: 'Probability = Favorable ÷ Total'. Think FORTUNE: Favorable Over Remarkable
Total. P is always between 0 and 1!
📌 TOPIC 8: SAMPLING DISTRIBUTION
🎯 Sampling Distribution = Distribution of a statistic (like the MEAN) computed from ALL
POSSIBLE SAMPLES of same size from a population.

Imagine a school of 1000 students. You take many samples of 30 students each and calculate the
mean marks of each sample. Those sample means form a distribution → That is the Sampling
Distribution of the Mean!

The Central Limit Theorem (CLT) — Most Important Theorem!


📢 CENTRAL LIMIT THEOREM: If you take sufficiently large samples (rule of thumb: n ≥ 30) from a population of ANY shape (as long as its variance is finite), the sampling distribution of the sample mean will be APPROXIMATELY NORMAL, regardless of the original population's shape!
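
A quick simulation can make the theorem feel less abstract. The sketch below draws samples from a strongly skewed population (an exponential distribution with mean 1 and SD 1, chosen only for this illustration) and looks at how the sample means behave. The seed, the sample size of 30 and the 2000 repetitions are arbitrary choices, and the printed numbers will vary slightly with other seeds.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

def sample_mean(n):
    """Mean of n draws from a skewed population (exponential, mean 1, SD 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)   # sigma / sqrt(n)

# For a normal distribution, roughly 68% of values lie within 1 SD of the mean.
within_one_se = sum(abs(m - 1.0) <= theoretical_se for m in sample_means) / NUM_SAMPLES

print("Mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3), "vs sigma/sqrt(n):", round(theoretical_se, 3))
print("Share within 1 SE of the population mean:", round(within_one_se, 3))
```

Even though the population itself is far from normal, the sample means cluster around the population mean with a spread close to σ ÷ √n, and about two thirds of them fall within one standard error.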

Key Properties of Sampling Distribution


➤ Mean of Sampling Distribution (μx̄ ) = Population Mean (μ) — Sample means average out to
population mean
➤ Standard Error (SE) = σ ÷ √n — Spread of sample means. Larger n → smaller SE

📐 Standard Error Formula


SE = σ ÷ √n (or SE = s ÷ √n for sample)
σ = population SD | n = sample size | SE decreases as n increases

Why is Sampling Distribution Important?


➤ It is the FOUNDATION of hypothesis testing (Z and T tests)
➤ It tells us how much sample means VARY from the true population mean
➤ Larger samples → less variation → more reliable estimates
➤ Connects sample statistics to population parameters

Population vs Sampling Distribution | Description
Population Distribution | All individual data points in the population
Sample Distribution | Data points in ONE sample
Sampling Distribution | Distribution of a STATISTIC (like mean) from MANY samples
Mean: μ | Population mean: fixed, usually unknown
Mean: x̄ | Sample mean: varies from sample to sample
Standard Error (SE) | SD of the sampling distribution of means

📝 Example: Population: heights with μ = 170cm, σ = 10cm. Take 100 samples, each of size n = 25.
Mean of sample means ≈ 170cm (close to the population mean)
SE = 10 ÷ √25 = 10 ÷ 5 = 2cm
By CLT: Distribution of sample means is approximately Normal with mean = 170, SD = 2
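
The standard error in this example, and how it shrinks with larger samples, can be computed directly. The sketch also uses `statistics.NormalDist` to ask how often a sample mean would land between 166cm and 174cm, assuming the sampling distribution is normal with mean 170 and SD 2 as stated in the example; the 166 to 174 range is chosen here only for illustration.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma = 170, 10

se_25 = standard_error(sigma, 25)     # 10 / 5
se_100 = standard_error(sigma, 100)   # 10 / 10: four times the sample size halves the SE

# Sampling distribution of the mean for n = 25 (assumed normal).
sampling_dist = NormalDist(mu=mu, sigma=se_25)
p_within_2_se = sampling_dist.cdf(174) - sampling_dist.cdf(166)

print("SE for n=25:", se_25, "| SE for n=100:", se_100)
print("P(166 <= sample mean <= 174):", round(p_within_2_se, 4))
```

About 95% of sample means fall within 2 standard errors (4cm) of the population mean, which is where the 1.96 critical value in the Z-test comes from.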

💡 Memory Tip: 'CLT = Central Limit THEOREM = Center becomes Normal!' Bigger sample →
sampling distribution becomes more normal and narrower. SE = σ/√n — as n grows, error shrinks!
⚡ QUICK REVISION CHEAT SHEET ⚡

Topic | Key Formula / Key Point
Frequency Distribution | Organize data in classes. Cumulative freq = running total.
Coefficient of Variation | CV = (SD ÷ Mean) × 100%. Lower CV = more consistent.
Standard Deviation | σ = √[Σ(x − μ)² ÷ N]. Measures spread around mean.
Mean | x̄ = Σx ÷ n. Sum divided by count.
Median | Middle value when arranged in order.
Mode | Most frequently occurring value.
Correlation (r) | r = between -1 and +1. Measures relationship strength.
Regression Line | Ŷ = a + bX. b = slope, a = intercept.
Z-Test | Z = (x̄ − μ) ÷ (σ ÷ √n). Use when n ≥ 30, σ known.
T-Test | t = (x̄ − μ) ÷ (s ÷ √n). Use when n < 30, σ unknown.
Probability | P(A) = Favorable ÷ Total. Range: 0 to 1.
Sampling Distribution | SE = σ ÷ √n. CLT: large samples → normal dist.

📚 Good Luck in Your Exams! You've Got This! 🎓


BSCS 3rd Semester — Probability & Statistics Complete Notes
