Mathematics IV — Unit 3
Statistical Techniques I · BAS303/BAS403 · Complete Notes
Central Tendency | Moments | Skewness & Kurtosis | Curve Fitting | Correlation | Regression
1. Central Tendency — Measures of Average
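Measure | Formula | Notes
Arithmetic Mean | AM = Sfi*xi / Sfi | Shortcut: AM = a + Sfd/Sf, where d = x - a and a = assumed mean
Median (Grouped) | Median = l + [(N/2 - c) / f] * i | l = lower limit of the median class, N = total frequency, c = CF of the class before the median class, f = frequency of the median class, i = class width
Mode (Grouped) | Mode = l + [(f - f-1) / (2f - f-1 - f1)] * i | l = lower limit of the modal class, f-1 = frequency of the class before, f1 = frequency of the class after
Geometric Mean | G = (x1*x2*...*xn)^(1/n) |
Harmonic Mean | 1/H = (1/n)(1/x1 + 1/x2 + ... + 1/xn) |
Empirical Relation | Mean - Mode = 3[Mean - Median] |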
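The grouped formulas above can be checked numerically with a short sketch. It assumes equal-width classes given by their lower limits; the function name `grouped_stats` and the sample frequencies are illustrative, not taken from the notes.

```python
def grouped_stats(lowers, width, freqs):
    """Mean, median and mode for equal-width classes [l, l + width)."""
    n = sum(freqs)
    mids = [l + width / 2 for l in lowers]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n

    # Median class: first class whose cumulative frequency reaches N/2.
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lowers, freqs):
        if cf + f >= n / 2:
            median = l + (n / 2 - cf) / f * width
            break
        cf += f

    # Modal class: class with the highest frequency.
    k = freqs.index(max(freqs))
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    mode = lowers[k] + (freqs[k] - f_prev) / (2 * freqs[k] - f_prev - f_next) * width
    return mean, median, mode


# Illustrative data: classes 0-10, 10-20, ..., 40-50.
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 15, 16, 6]
mean, median, mode = grouped_stats(lowers, 10, freqs)
print(mean, median, round(mode, 3))
```

For this data N = 50, so the median class is 20-30 (c = 13, f = 15) and the modal class is 30-40 (f = 16, f-1 = 15, f1 = 6).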
2. Dispersion — Range, MD, SD, Variance, CV
Measure | Formula
Range | R = L - S (Largest - Smallest)
Mean Deviation | MD = Sf|xi - x_bar| / Sf
Std Deviation | sigma = sqrt(Sf(x - x_bar)^2 / Sf)
Shortcut SD | sigma = sqrt[ Sfd^2/N - (Sfd/N)^2 ]
Variance | sigma^2 = mu_2 (2nd moment about mean)
Coefficient of Variation (CV) | CV = (sigma / x_bar) * 100
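As a rough check of the shortcut method, the sketch below computes the mean, SD and CV from deviations d = x - a. The helper name `sd_shortcut`, the mid-values and the assumed mean a = 25 are illustrative assumptions.

```python
from math import sqrt


def sd_shortcut(xs, freqs, a):
    """Mean and SD from deviations d = x - a about an assumed mean a."""
    n = sum(freqs)
    sfd = sum(f * (x - a) for x, f in zip(xs, freqs))
    sfd2 = sum(f * (x - a) ** 2 for x, f in zip(xs, freqs))
    mean = a + sfd / n
    sigma = sqrt(sfd2 / n - (sfd / n) ** 2)
    return mean, sigma


xs = [5, 15, 25, 35, 45]      # class mid-values (illustrative)
freqs = [5, 8, 15, 16, 6]
mean, sigma = sd_shortcut(xs, freqs, a=25)
cv = sigma / mean * 100       # coefficient of variation, in percent
print(mean, round(sigma, 3), round(cv, 2))
```

Here Sfd = 100 and Sfd^2 = 6800 with N = 50, so the variance is 136 - 4 = 132. Any other choice of a gives the same mean and SD, which is why the shortcut is safe to use.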
3. Moments (3 Types)
Type | Formula | Notes
About Mean (mu_r) | mu_r = (1/N) * Sf_i(xi - x_bar)^r | mu_1 = 0, mu_2 = sigma^2
Raw Moment (mu'_r) | mu'_r = (1/N) * Sf_i(xi - a)^r | a = assumed mean
About Origin (V_r) | V_r = (1/N) * Sf_i * xi^r | V_1 = x_bar
Relations: Raw Moments → Central Moments
mu'_1 = x_bar - a | mu_2 = mu'_2 - (mu'_1)^2
mu_3 = mu'_3 - 3*mu'_2*mu'_1 + 2*(mu'_1)^3
mu_4 = mu'_4 - 4*mu'_3*mu'_1 + 6*mu'_2*(mu'_1)^2 - 3*(mu'_1)^4
About Origin: V_1=x_bar | V_2=mu_2+x_bar^2 | V_3=mu_3+3*mu_2*x_bar+x_bar^3
Trick: mu_1 is always 0 (about the mean). Raw mu'_1 = x_bar - a. Remember this for the exam!
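The raw-to-central relations can be sketched in code as follows. The function names and the small frequency table are hypothetical, chosen only to show the conversion.

```python
def raw_moments(xs, freqs, a):
    """First four raw moments mu'_r = (1/N) * S f (x - a)^r about a."""
    n = sum(freqs)
    return tuple(
        sum(f * (x - a) ** r for x, f in zip(xs, freqs)) / n
        for r in (1, 2, 3, 4)
    )


def central_from_raw(m1, m2, m3, m4):
    """Convert raw moments (mu'_1..mu'_4) to central moments (mu_2..mu_4)."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


xs = [1, 2, 3, 4, 5]          # illustrative values
freqs = [2, 3, 3, 1, 1]
a = 3                         # assumed mean
m1, m2, m3, m4 = raw_moments(xs, freqs, a)
mu2, mu3, mu4 = central_from_raw(m1, m2, m3, m4)
x_bar = a + m1                # since mu'_1 = x_bar - a
print(x_bar, mu2, mu3, mu4)
```

With this table the raw moments are -0.4, 1.6, -1.0 and 5.2, and the central moments come out close to 1.44, 0.792 and 5.0592, matching a direct calculation about x_bar = 2.6.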
4. Skewness & Kurtosis
Karl Pearson's Sk
Sk = (Mean - Mode) / sigma = 3(Mean - Median) / sigma
Range: -3 to +3
Sk = 0: Symmetric | Sk > 0: +ve skew | Sk < 0: -ve skew
beta & gamma Coefficients
beta_1 = mu_3^2 / mu_2^3 [Skewness]
beta_2 = mu_4 / mu_2^2 [Kurtosis]
gamma_1 = +/- sqrt(beta_1)
gamma_2 = beta_2 - 3
Bowley's Coefficient
(Q3 + Q1 - 2*Median) / (Q3 - Q1)
Range: -1 to +1
Use for open-end or unequal CI
Kurtosis Types
beta_2 = 3 --> Mesokurtic (Normal)
beta_2 > 3 --> Leptokurtic
beta_2 < 3 --> Platykurtic
gamma_2 = 0 --> Normal
Kurtosis describes the peakedness of a frequency distribution. gamma_1 is the moment coefficient of skewness obtained from Karl Pearson's beta_1; it takes the sign of mu_3.
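A minimal sketch of these coefficients, assuming the central moments are already known. The function names and the input values (mu_2 = 1.44, mu_3 = 0.792, mu_4 = 5.0592, and the quartiles passed to `bowley`) are illustrative assumptions.

```python
from math import sqrt, copysign


def beta_gamma(mu2, mu3, mu4):
    """beta and gamma coefficients from central moments."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    gamma1 = copysign(sqrt(beta1), mu3)   # sign of gamma_1 follows mu_3
    gamma2 = beta2 - 3                    # excess kurtosis
    return beta1, beta2, gamma1, gamma2


def kurtosis_type(beta2, tol=1e-9):
    """Classify a distribution by beta_2 relative to 3."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


def bowley(q1, median, q3):
    """Bowley's quartile coefficient of skewness."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


beta1, beta2, gamma1, gamma2 = beta_gamma(1.44, 0.792, 5.0592)
shape = kurtosis_type(beta2)
sk_b = bowley(20, 30, 50)
print(round(gamma1, 4), round(gamma2, 4), shape, round(sk_b, 4))
```

Here mu_3 > 0, so gamma_1 is positive (positive skew), and beta_2 is below 3, so the distribution is platykurtic.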
5. Curve Fitting — Method of Least Squares
Curve Type | Normal Equations | Trick / Transform
y = a + bx (Straight) | Sy = na + bSx; Sxy = aSx + bSx^2 | Direct, no transform
y = a + bx + cx^2 (Parabola) | Sy = na + bSx + cSx^2; Sxy = aSx + bSx^2 + cSx^3; Sx^2y = aSx^2 + bSx^3 + cSx^4 | 3 equations, solve for a, b, c
y = a + bx^2 | Put X = x^2 => y = a + bX; Sy = na + bSX; SXy = aSX + bSX^2 | Substitute X = x^2 first
y = ax + b/x | Sxy = aSx^2 + nb; S(y/x) = na + bS(1/x^2) | Derive via dS/da = 0, dS/db = 0
y = ax + bx^2 | Sxy = aSx^2 + bSx^3; Sx^2y = aSx^3 + bSx^4 | Derive via dS/da = 0, dS/db = 0. Dividing by x first (y/x = a + bx) is an alternative, but it leads to different normal equations: S(y/x) = na + bSx; Sy = aSx + bSx^2
y = ab^x (Exponential) | log y = log a + x*log b, i.e. Y = A + Bx; SY = nA + BSx; SxY = ASx + BSx^2 | Take log10; A = log a, B = log b; antilog at end
y = ae^(bx) | ln y = ln a + bx, i.e. Y = A + bx; SY = nA + bSx; SxY = ASx + bSx^2 | Take natural log (ln); A = ln a => a = e^A
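The straight-line normal equations, and the log transform for y = ab^x, can be sketched as below. The function names and the data points are illustrative; the points were chosen to lie exactly on y = 1 + 2x and y = 2 * 3^x so the result is easy to verify by hand.

```python
from math import log10


def fit_line(xs, ys):
    """Solve Sy = n*a + b*Sx and Sxy = a*Sx + b*Sx^2 for a and b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


def fit_exponential(xs, ys):
    """Fit y = a * b**x by fitting Y = A + B*x with Y = log10(y)."""
    A, B = fit_line(xs, [log10(y) for y in ys])
    return 10 ** A, 10 ** B   # antilog at the end


a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
a_exp, b_exp = fit_exponential([0, 1, 2, 3], [2, 6, 18, 54])
print(a, b, round(a_exp, 6), round(b_exp, 6))
```

For y = ae^(bx) the same helper applies with the natural log in place of log10, and only a needs the final exponentiation (a = e^A).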
6. Correlation — Karl Pearson & Spearman Rank
Covariance: Cov(x,y) = S(xi*yi)/n - (Sxi/n)*(Syi/n)
Karl Pearson r = SXY / sqrt(SX^2 * SY^2) where X=x-x_bar, Y=y-y_bar
General Formula: r = [N*Sx'y' - Sx'*Sy'] / [sqrt(N*Sx'^2 - (Sx')^2) * sqrt(N*Sy'^2 - (Sy')^2)], where x' = x - a, y' = y - b
Grouped Data: r = [N*SfX'Y' - (SfX')(SfY')] / [sqrt(N*SfX'^2 - (SfX')^2) * sqrt(N*SfY'^2 - (SfY')^2)]
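A short sketch of the general formula, showing that r does not change when the deviations are taken from arbitrary origins a and b. The function name `pearson_r` and the five data pairs are illustrative assumptions.

```python
from math import sqrt


def pearson_r(xs, ys, a=0, b=0):
    """Karl Pearson's r using deviations x' = x - a and y' = y - b."""
    n = len(xs)
    xd = [x - a for x in xs]
    yd = [y - b for y in ys]
    sx, sy = sum(xd), sum(yd)
    sxy = sum(p * q for p, q in zip(xd, yd))
    sx2 = sum(p * p for p in xd)
    sy2 = sum(q * q for q in yd)
    return (n * sxy - sx * sy) / (
        sqrt(n * sx2 - sx ** 2) * sqrt(n * sy2 - sy ** 2)
    )


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_raw = pearson_r(xs, ys)                # origins a = b = 0
r_shifted = pearson_r(xs, ys, a=3, b=4)  # deviations from assumed means
print(round(r_raw, 4), round(r_shifted, 4))
```

Both calls give the same value (about 0.7746 here), so a convenient assumed mean can be picked to keep the table arithmetic small.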
Spearman's Rank Correlation
r = 1 - 6*Sd^2 / [n(n^2 - 1)] (d = R1 - R2, difference of ranks)
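With ties: r = 1 - 6[Sd^2 + (1/12)(m1^3 - m1) + (1/12)(m2^3 - m2) + ...] / [n(n^2 - 1)], where m1, m2, ... are the numbers of items sharing each tied value
Tie handling: Give items with the same value the average of the ranks they would occupy. E.g., if the 5th and 6th items are equal, both get rank 5.5, and the next rank is 7.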
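The tie rule and the correction term can be sketched as follows. The helper names are hypothetical, rank 1 is assigned to the largest value (an assumption; ranking from the smallest gives the same r), and the ten score pairs are illustrative.

```python
def average_ranks(values):
    """Rank 1 = largest value; tied values share the average rank."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1        # first position the value occupies
        count = order.count(v)            # m = number of tied items
        ranks[v] = first + (count - 1) / 2
    return [ranks[v] for v in values]


def spearman(xs, ys):
    """Spearman's rank correlation with the (m^3 - m)/12 tie correction."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sd2 = sum((p - q) ** 2 for p, q in zip(rx, ry))
    correction = 0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            correction += (m ** 3 - m) / 12   # zero when m = 1
    return 1 - 6 * (sd2 + correction) / (n * (n * n - 1))


xs = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
ys = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
r_s = spearman(xs, ys)
print(round(r_s, 4))
```

In this data 75 appears twice and 64 three times in x, and 68 twice in y, so Sd^2 = 72 and the corrections add 0.5 + 2 + 0.5 = 3, giving r = 1 - 6(75)/990.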
7. Regression — Lines of Best Fit
Line | Equation | Coefficient
Regression line y on x (predict y) | y - y_bar = byx*(x - x_bar) | byx = r*(sigma_y/sigma_x) = SXY / SX^2
Regression line x on y (predict x) | x - x_bar = bxy*(y - y_bar) | bxy = r*(sigma_x/sigma_y) = SXY / SY^2
r = +/- sqrt(byx * bxy), taking the common sign of byx and bxy | r = SXY / (N*sigma_x*sigma_y)
Property | Statement
Same sign | bxy, byx and r always have the same sign
Product | bxy * byx = r^2, so sqrt(bxy * byx) <= 1
Sum | bxy + byx >= 2r when the coefficients are positive; in general |bxy + byx| >= 2|r|
Independence | Regression coefficients are independent of origin but NOT of scale
Angle | tan(theta) = (1-r^2)/r * sigma_x*sigma_y/(sigma_x^2+sigma_y^2)
Intersection | Both regression lines pass through (x_bar, y_bar)
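The coefficients and a few of the listed properties can be checked with a small sketch. The function name `regression_summary` and the data pairs are illustrative assumptions.

```python
from math import sqrt, copysign


def regression_summary(xs, ys):
    """Means, both regression coefficients and r from raw data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    byx, bxy = sxy / sxx, sxy / syy
    r = copysign(sqrt(byx * bxy), byx)    # r carries the sign of byx and bxy
    return x_bar, y_bar, byx, bxy, r


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, byx, bxy, r = regression_summary(xs, ys)

y_at_6 = y_bar + byx * (6 - x_bar)   # predict y from the y on x line
x_at_5 = x_bar + bxy * (5 - y_bar)   # predict x from the x on y line
print(byx, bxy, round(r, 4), y_at_6, x_at_5)
```

Here byx = 0.6 and bxy = 1.0, so byx * bxy = 0.6 <= 1 and byx + bxy = 1.6 exceeds 2r (about 1.549), in line with the product and sum properties.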
Mind Map — Unit 3 Overview
Statistical Techniques I
Central Tendency: AM = a+Sfd/Sf | Median = l+(N/2-c)/f*i | Mode = l+(f-f-1)/(2f-f-1-f1)*i | GM, HM | Mean-Mode = 3[Mean-Median]
Dispersion: Range = L-S | MD = Sf|x-x_bar|/Sf | SD = sqrt(Sfd^2/N-(Sfd/N)^2) | Variance = sigma^2 | CV = (sigma/x_bar)*100
Moments (3 Types): About Mean: mu_r = (1/N)Sf(x-x_bar)^r | Raw (mu'_r): a = assumed mean | About Origin: V_r | Relations: mu via mu'
Skewness & Kurtosis: Pearson's Sk = (Mean-Mode)/sigma | Bowley = (Q3+Q1-2M)/(Q3-Q1) | beta_1 = mu_3^2/mu_2^3 | beta_2 = mu_4/mu_2^2 | gamma_2 = beta_2-3
Curve Fitting: y = a+bx (straight) | y = a+bx+cx^2 (parabola) | y = ab^x => log transform | y = ae^(bx) => ln transform | y = ax+b/x
Correlation: Covariance formula | Karl Pearson's r | Grouped data formula | Spearman Rank r = 1-6Sd^2/[n(n^2-1)] | Tie correction
Regression: y on x: byx = r*sigma_y/sigma_x | x on y: bxy = r*sigma_x/sigma_y | r = +/- sqrt(bxy*byx) | Both lines pass through (x_bar, y_bar)
Shortcut for Solving Questions (4-Step Techniques)
1. Central Tendency / Dispersion Questions
1 Make a table with columns: x, f, mid-value, d = (x - a), fd, fd^2, CF
2 Compute the sums: Sf, Sfd, Sfd^2
3 Apply the formulas: AM = a + Sfd/Sf, SD = sqrt(Sfd^2/N - (Sfd/N)^2)
4 CV/Skewness: use the step 3 results directly
2. Moments Questions (mu' => mu => beta)
1 Find the raw moments: mu'_1 = Sfd/N, mu'_2 = Sfd^2/N, mu'_3 = Sfd^3/N, mu'_4 = Sfd^4/N
2 Convert to central moments: mu_2 = mu'_2 - (mu'_1)^2, then the mu_3 and mu_4 formulas
3 beta_1 = mu_3^2 / mu_2^3 (skewness) and beta_2 = mu_4 / mu_2^2 (kurtosis)
4 gamma_1 = +/- sqrt(beta_1) and gamma_2 = beta_2 - 3 (excess kurtosis)
3. Regression Lines Given => Find r, sigma
1 Both lines pass through (x_bar, y_bar) => solve them as simultaneous equations to get x_bar, y_bar
2 Express as x = f(y) => identify bxy | Express as y = f(x) => identify byx
3 r = +/- sqrt(bxy * byx), with the sign of bxy and byx. Check: |r| <= 1; if not, swap the assignment of the two lines!
4 r * (sigma_y/sigma_x) = byx => find sigma_y (if sigma_x is given)
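The four steps above can be sketched for two given regression lines. The lines 8x - 10y = -66 and 40x - 18y = 214 and sigma_x = 3 are illustrative numbers, not from the notes; the function name and the (A, B, C) tuple layout for A*x + B*y = C are assumptions of this sketch.

```python
from math import sqrt, copysign


def analyse_lines(line1, line2):
    """Each line is (A, B, C), meaning A*x + B*y = C."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2

    # Step 1: the intersection of the two lines is (x_bar, y_bar).
    det = a1 * b2 - a2 * b1
    x_bar = (c1 * b2 - c2 * b1) / det
    y_bar = (a1 * c2 - a2 * c1) / det

    # Step 2: try line1 as y on x (slope dy/dx) and line2 as x on y (dx/dy).
    byx, bxy = -a1 / b1, -b2 / a2

    # Step 3: if the product exceeds 1 the assignment is wrong, so swap.
    if byx * bxy > 1:
        byx, bxy = -a2 / b2, -b1 / a1
    r = copysign(sqrt(byx * bxy), byx)
    return x_bar, y_bar, byx, bxy, r


x_bar, y_bar, byx, bxy, r = analyse_lines((8, -10, -66), (40, -18, 214))

# Step 4: byx = r * sigma_y / sigma_x, so sigma_y = byx * sigma_x / r.
sigma_x = 3
sigma_y = byx * sigma_x / r
print(x_bar, y_bar, byx, bxy, round(r, 4), round(sigma_y, 4))
```

Passing the two lines in the opposite order triggers the swap branch and returns the same coefficients, which mirrors the exam rule about reassigning the lines when the product exceeds 1.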
4. Curve Fitting: Which Formula to Use?
1 Look at the equation: Linear? => Direct normal equations. Non-linear? => Substitute/transform first
2 y = ab^x => take log10 | y = ae^(bx) => take ln | y = ax^2 + b => put X = x^2
3 Write the normal equations and make a table => compute the summation values
4 Solve for a, b (and c) => substitute into the original equation => final answer!
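For the parabola y = a + bx + cx^2, steps 3 and 4 amount to building the sums and solving three normal equations. The sketch below does this in plain Python with Cramer's rule; the helper names and the five data points are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, v):
    """Solve m * [a, b, c] = v by Cramer's rule."""
    d = det3(m)
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]       # replace column k with the right-hand side
        solution.append(det3(mk) / d)
    return solution


def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    n = len(xs)
    sx, sx2, sx3, sx4 = (sum(x ** p for x in xs) for p in (1, 2, 3, 4))
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    normal = [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]]
    return solve3(normal, [sy, sxy, sx2y])


a, b, c = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
print(round(a, 2), round(b, 2), round(c, 2))
```

For these points the sums are Sx = 10, Sx^2 = 30, Sx^3 = 100, Sx^4 = 354, Sy = 12.9, Sxy = 37.1 and Sx^2y = 130.3, and the fitted curve is approximately y = 1.42 - 1.07x + 0.55x^2.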
Last Minute Cheatsheet: All Important Formulas
Topic | Formula | Key Note
AM Shortcut | x_bar = a + Sfd/Sf | a = assumed mean
SD Shortcut | sigma = sqrt[Sfd^2/N - (Sfd/N)^2] | Faster than direct method
Pearson Corr. | r = [NSx'y' - Sx'Sy'] / [sqrt(NSx'^2 - (Sx')^2) * sqrt(NSy'^2 - (Sy')^2)] | x' = x - a, y' = y - b
Spearman's Rank | r = 1 - 6Sd^2 / [n(n^2-1)] | With tie: add (m^3-m)/12
Pearson's Sk | Sk = (Mean - Mode) / sigma | Or 3(Mean-Median)/sigma
Bowley's Coeff. | (Q3+Q1-2M) / (Q3-Q1) | For open-end/unequal CI
beta_1 (Skewness) | beta_1 = mu_3^2 / mu_2^3 | gamma_1 = +/-sqrt(beta_1)
beta_2 (Kurtosis) | beta_2 = mu_4 / mu_2^2 | gamma_2 = beta_2 - 3
Regression byx | byx = r*sigma_y/sigma_x = SXY/SX^2 | y on x line slope
Regression bxy | bxy = r*sigma_x/sigma_y = SXY/SY^2 | x on y line slope
r from b values | r = +/- sqrt(byx * bxy) | Check: |r| <= 1 always!
Normal Eqns (y=a+bx) | Sy = na + bSx; Sxy = aSx + bSx^2 | n = no. of data points
y=ab^x fitting | Y = A + Bx; A = log a, B = log b | Antilog at final step
Exam Tip: If |r| comes out greater than 1 => the assignment of the lines is wrong => SWAP them! Both lines always pass through (x_bar, y_bar).
Tie in Rank Correlation: For each value repeated m times, add (m^3-m)/12 to Sd^2 inside the bracket.
Curve Fitting Golden Rule: If the equation is non-linear => transform first (log/ln/substitution), THEN write the normal equations.
Mathematics IV — Unit 3: Statistical Techniques I | BAS303/BAS303H/BAS403/BAS403H | Bitwise Learning