
Module 2-4

This document covers the concepts of sampling distributions, including the definitions of statistics and parameters, the central limit theorem, and various distributions such as normal, chi-square, t, and F distributions. It explains the properties and conditions for these distributions, including their means, variances, and relationships to one another. Additionally, it provides mathematical formulations and the moment-generating functions for these distributions.

Uploaded by

wixaco9705
Copyright
© All Rights Reserved

Module 2

Sampling distributions

Statistic and parameter

Any function of the population values is called a parameter, e.g. µ (population mean) and σ (population s.d.).

Any function of the sample values is known as a sample statistic, e.g. x̄ (sample mean) and s (sample s.d.).

Sampling distribution

The probability distribution of a sample statistic is called the sampling distribution of that statistic, e.g. the z, t, χ² and F distributions.

Standard error

The standard deviation of the sampling distribution of a sample statistic is called the standard error of that statistic.

Central limit theorem

Let x1, x2, …, xn be n independent random variables, all having the same distribution with common mean µ and common s.d. σ. Then, for large n, the mean of these variables,

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n},$$

approximately follows a normal distribution with mean µ and s.d. σ/√n.

Conditions for central limit theorem

1) The variables must be independent.
2) All variables should have a common mean and a common s.d.
3) n must be very large.

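
The statement of the theorem can be checked empirically. The sketch below is only an illustration under assumed settings (a Uniform(0, 1) parent distribution, n = 50, 20000 repetitions and a fixed seed, none of which come from the notes): it compares the s.d. of the simulated sample means with the predicted value σ/√n.

```python
import random
import statistics


def simulate_sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from Uniform(0, 1) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]


# Uniform(0, 1) has mean 0.5 and s.d. sqrt(1/12) (assumed parent distribution).
mu, sigma = 0.5, (1 / 12) ** 0.5
n = 50

means = simulate_sample_means(n, trials=20000)
observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = sigma / n ** 0.5  # standard error sigma / sqrt(n)

print("mean of sample means:", observed_mean, "expected:", mu)
print("s.d. of sample means:", observed_sd, "predicted:", predicted_sd)
```

The two printed pairs should agree closely; a histogram of `means` would also look roughly bell-shaped even though the parent distribution is flat.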
Sampling distribution of mean of samples taken from normal population
[distribution of sample mean]

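Let x1, x2, …, xn be a random sample taken from a normal population with mean µ and standard deviation σ, and let x̄ be the sample mean. Then

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}.$$

The mgf of a normal distribution with mean µ and s.d. σ is

$$M_X(t) = e^{\mu t + \frac{t^2\sigma^2}{2}}.$$

Every observation in the random sample follows the same normal distribution, therefore all the x_i have the same mgf, i.e.

$$M_{x_i}(t) = e^{\mu t + \frac{t^2\sigma^2}{2}}, \quad i = 1, 2, 3, \dots, n.$$

Since the x_i are independent,

$$\begin{aligned}
M_{\bar{x}}(t) &= M_{\frac{x_1 + x_2 + \dots + x_n}{n}}(t) \\
&= M_{x_1/n}(t)\, M_{x_2/n}(t) \cdots M_{x_n/n}(t) \\
&= M_{x_1}(t/n)\, M_{x_2}(t/n) \cdots M_{x_n}(t/n) \\
&= e^{\frac{\mu t}{n} + \frac{t^2\sigma^2}{2n^2}} \cdot e^{\frac{\mu t}{n} + \frac{t^2\sigma^2}{2n^2}} \cdots e^{\frac{\mu t}{n} + \frac{t^2\sigma^2}{2n^2}} \\
&= e^{\frac{n\mu t}{n} + \frac{n t^2\sigma^2}{2n^2}} = e^{\mu t + \frac{t^2\sigma^2}{2n}}.
\end{aligned}$$

This is the mgf of a normal distribution with mean µ and s.d. σ/√n.

$$\therefore f(\bar{x}) = \frac{1}{\frac{\sigma}{\sqrt{n}}\sqrt{2\pi}}\, e^{-\frac{(\bar{x}-\mu)^2}{2\sigma^2/n}} = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-\frac{(\bar{x}-\mu)^2}{2\sigma^2/n}}$$

i.e., if X ~ N(µ, σ) then x̄ ~ N(µ, σ/√n).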

Distribution of the variance of samples taken from a normal population


[distribution of sample variance]

Let x1, x2, …, xn be a random sample taken from a normal population with mean µ and variance σ². Let x̄ be the sample mean and let

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

be the sample variance. Then the distribution of the sample variance is given by

$$f(s^2) = \frac{\left(\frac{n}{2\sigma^2}\right)^{\frac{n-1}{2}}}{\Gamma\left(\frac{n-1}{2}\right)}\, e^{-\frac{n s^2}{2\sigma^2}}\, (s^2)^{\frac{n-1}{2}-1}, \quad 0 < s^2 < \infty.$$

Chi-square (χ2) distribution

A continuous random variable χ² is said to follow a χ² distribution if its probability density function is

$$f(\chi^2) = \frac{\left(\frac{1}{2}\right)^{n/2}}{\Gamma(n/2)}\, e^{-\chi^2/2}\, (\chi^2)^{\frac{n}{2}-1}, \quad 0 < \chi^2 < \infty,$$

where n is the parameter, called the degrees of freedom.

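Mgf of χ² distribution

$$\begin{aligned}
M_{\chi^2}(t) &= E\left[e^{t\chi^2}\right] \\
&= \int_0^\infty e^{t\chi^2} f(\chi^2)\, d\chi^2 \\
&= \int_0^\infty e^{t\chi^2}\, \frac{\left(\frac{1}{2}\right)^{n/2}}{\Gamma(n/2)}\, e^{-\chi^2/2}\, (\chi^2)^{\frac{n}{2}-1}\, d\chi^2 \\
&= \frac{\left(\frac{1}{2}\right)^{n/2}}{\Gamma(n/2)} \int_0^\infty e^{-\chi^2(1-2t)/2}\, (\chi^2)^{\frac{n}{2}-1}\, d\chi^2
\end{aligned}$$

Using the gamma integral $\int_0^\infty e^{-mx} x^{k-1}\, dx = \frac{\Gamma(k)}{m^k}$ with m = (1 − 2t)/2 and k = n/2 (valid for t < 1/2),

$$M_{\chi^2}(t) = \frac{\left(\frac{1}{2}\right)^{n/2}}{\Gamma(n/2)} \cdot \frac{\Gamma(n/2)}{\left(\frac{1-2t}{2}\right)^{n/2}} = \frac{1}{(1-2t)^{n/2}} = (1-2t)^{-n/2}.$$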

Mean and variance of χ2 distribution

We have

$$\begin{aligned}
M_{\chi^2}(t) = (1-2t)^{-n/2} &= 1 + \frac{\frac{n}{2}(2t)}{1!} + \frac{\frac{n}{2}\left(\frac{n}{2}+1\right)(2t)^2}{2!} + \cdots \\
&= 1 + \frac{nt}{1!} + \frac{\frac{n}{2}\left(\frac{n}{2}+1\right) 4t^2}{2!} + \cdots \\
&= 1 + \frac{nt}{1!} + \frac{n(n+2)t^2}{2!} + \cdots
\end{aligned}$$

µ′r = coefficient of t^r/r!

µ′1 = coefficient of t/1! = n

µ′2 = coefficient of t²/2! = n(n+2)

∴ mean of χ² distribution = n

Variance (µ2) = µ′2 − (µ′1)²

= n(n+2) − n²

= n² + 2n − n²

= 2n
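
The same two moments can be recovered numerically from the mgf, since µ′1 = M′(0) and µ′2 = M″(0). The following is a minimal sketch, assuming central finite differences with step h = 1e-4 and n = 6 chosen only for illustration; the function names are hypothetical.

```python
def chi2_mgf(t, n):
    """mgf of the chi-square distribution with n degrees of freedom (needs t < 1/2)."""
    return (1 - 2 * t) ** (-n / 2)


def raw_moments(n, h=1e-4):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m_plus, m_zero, m_minus = chi2_mgf(h, n), chi2_mgf(0.0, n), chi2_mgf(-h, n)
    first = (m_plus - m_minus) / (2 * h)              # approximates mu'_1
    second = (m_plus - 2 * m_zero + m_minus) / h ** 2  # approximates mu'_2
    return first, second


n = 6
mu1, mu2 = raw_moments(n)
mean = mu1
variance = mu2 - mu1 ** 2

print("mean ~", round(mean, 3), "(theory: n =", n, ")")
print("variance ~", round(variance, 3), "(theory: 2n =", 2 * n, ")")
```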

Additive property of χ2 distribution


Let U and V be two independent random variables following χ² distributions with n1 and n2 degrees of freedom respectively. Then X = U + V follows a χ² distribution with (n1 + n2) degrees of freedom.

Proof:

Let U ~ χ²(n1) and V ~ χ²(n2) be independent.

Let X = U + V. Then

$$M_X(t) = M_{U+V}(t) = M_U(t)\, M_V(t) = (1-2t)^{-n_1/2} \cdot (1-2t)^{-n_2/2} = (1-2t)^{-(n_1+n_2)/2}.$$

This is the mgf of a χ² distribution with n1 + n2 degrees of freedom.

Distribution of square of a random variable following N(0,1)

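Let X be a random variable following N(0,1). We have to find the distribution of X².

Let X ~ N(0,1) and Y = X². Then

$$\begin{aligned}
M_Y(t) &= E\left(e^{tY}\right) = E\left[e^{tX^2}\right] \\
&= \int_{-\infty}^{\infty} e^{tx^2} f(x)\, dx \\
&= \int_{-\infty}^{\infty} e^{tx^2}\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2 - x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}(1-2t)}\, dx
\end{aligned}$$

Using $\int_{-\infty}^{\infty} e^{-ax^2}\, dx = \sqrt{\pi/a}$ with a = (1 − 2t)/2 (valid for t < 1/2),

$$M_Y(t) = \frac{1}{\sqrt{2\pi}} \times \frac{\sqrt{\pi}}{\sqrt{\frac{1-2t}{2}}} = \frac{1}{\sqrt{1-2t}} = (1-2t)^{-1/2}.$$

This is the mgf of a χ² distribution with one degree of freedom.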

Note

If x1, x2, …, xn are n independent standard normal variates, then x1² + x2² + … + xn² follows a χ² distribution with n degrees of freedom.
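
As a rough empirical check of this note, the sketch below sums the squares of n simulated standard normal variates and compares the sample mean and variance of the result with n and 2n. The choices n = 5, 40000 repetitions and the seed are assumptions made only for this illustration.

```python
import random
import statistics


def sum_of_squared_normals(n, rng):
    """Return x1^2 + ... + xn^2 for n independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5
draws = [sum_of_squared_normals(n, rng) for _ in range(40000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.pvariance(draws)

print("sample mean:", sample_mean, "theory:", n)
print("sample variance:", sample_var, "theory:", 2 * n)
```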

Properties of χ2 distribution

1) It is a sampling distribution.
2) It is a continuous probability distribution.
3) The parameter of the χ² distribution is n.
4) The mean of the χ² distribution is n and the variance is 2n.
5) For large values of n, the χ² distribution is approximately symmetric.
6) As the degrees of freedom increase, the χ² distribution approaches the normal distribution.

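Student's t-distribution

A continuous random variable t is said to follow a t-distribution if its probability density function is

$$f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}, \quad -\infty < t < \infty,$$

where n is the degrees of freedom.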

Properties of t-distribution

1) The t-distribution is a sampling distribution.
2) All odd moments of the distribution that exist are zero.
3) The mean of the distribution is zero (for n > 1) and the variance is n/(n − 2) for n > 2.
4) The t-curve attains its maximum at t = 0.
5) For large samples, the t-distribution approaches the normal distribution.

F-distribution
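A continuous random variable F is said to follow an F-distribution if its probability density function is given by

$$f(F) = \frac{\left(\frac{n_1}{n_2}\right)^{\frac{n_1}{2}} F^{\frac{n_1}{2}-1}}{\beta\left(\frac{n_1}{2}, \frac{n_2}{2}\right) \left(1 + \frac{n_1}{n_2}F\right)^{\frac{n_1+n_2}{2}}}, \quad 0 < F < \infty,$$

where (n1, n2) are the degrees of freedom.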

Properties of F- distribution

1) It is a sampling distribution.
2) If F follows an F-distribution with (n1, n2) degrees of freedom, then 1/F follows an F-distribution with (n2, n1) degrees of freedom.
3) The mean of the F-distribution is n2/(n2 − 2) for n2 > 2, where (n1, n2) are the degrees of freedom.
4) The F-curve is J-shaped when n1 ≤ 2 and bell-shaped when n1 > 2.

Relationship b/w z, t, χ2 and F distributions

1) When X follows a normal distribution with mean µ and s.d. σ, then Z = (X − µ)/σ follows the standard normal distribution.
2) The square of a normal variable with mean zero and unit variance follows a χ² distribution with one degree of freedom: if X ~ N(0,1), then X² ~ χ²(1).
3) If x1, x2, …, xn are n independent standard normal variables, then x1² + x2² + … + xn² follows a χ² distribution with n degrees of freedom.
4) If X is a random variable following the normal distribution with mean zero and unit variance, and Y is a random variable, independent of X, following a χ² distribution with n degrees of freedom, then X/√(Y/n) follows a t-distribution with n degrees of freedom.
5) If Y1 and Y2 are two independent random variables following χ² distributions with n1 and n2 degrees of freedom, then (Y1/n1)/(Y2/n2) follows an F-distribution with (n1, n2) degrees of freedom.
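
Relationships 4) and 5) can be illustrated by building t and F variates directly from normal and χ² variates. This is only a simulation sketch under assumed settings (n = 10 for t, (n1, n2) = (5, 10) for F, 30000 draws, fixed seed); it checks the simulated moments against the variance n/(n − 2) of t and the mean n2/(n2 − 2) of F quoted in the properties above.

```python
import random
import statistics


def chi_square_variate(n, rng):
    """Sum of squares of n independent N(0, 1) draws, i.e. a chi-square(n) variate."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


def t_variate(n, rng):
    """X / sqrt(Y / n) with X ~ N(0, 1) and Y ~ chi-square(n), independent."""
    x = rng.gauss(0.0, 1.0)
    y = chi_square_variate(n, rng)
    return x / (y / n) ** 0.5


def f_variate(n1, n2, rng):
    """(Y1 / n1) / (Y2 / n2) with independent chi-square variates Y1, Y2."""
    return (chi_square_variate(n1, rng) / n1) / (chi_square_variate(n2, rng) / n2)


rng = random.Random(2)
n, n1, n2 = 10, 5, 10
t_draws = [t_variate(n, rng) for _ in range(30000)]
f_draws = [f_variate(n1, n2, rng) for _ in range(30000)]

t_mean = statistics.fmean(t_draws)
t_var = statistics.pvariance(t_draws)
f_mean = statistics.fmean(f_draws)

print("t: mean", t_mean, "variance", t_var, "theory:", 0, n / (n - 2))
print("F: mean", f_mean, "theory:", n2 / (n2 - 2))
```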

1. Obtain the expression for the mean of a Poisson distribution.

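For a Poisson distribution with parameter λ,

$$E(X) = \sum_{x=0}^{\infty} x\, \frac{e^{-\lambda}\lambda^x}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$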

2. Obtain the mean of a uniform distribution in continuous setup.

For a continuous uniform distribution on [a, b],

$$E(X) = \int_a^b \frac{x}{b-a}\, dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}.$$

3. Conditions under which the Binomial distribution tends to Normal distribution.

A Binomial B(n, p) tends to Normal when:

• n is large,
• p is not extremely close to 0 or 1,
• as a common rule of thumb: np ≥ 5 and n(1 − p) ≥ 5.

Then, with mean np and variance np(1 − p):

$$B(n,p) \approx N(np,\; np(1-p)).$$
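
To see how close the approximation is, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.5, k = 55 are assumed for illustration, and the continuity correction (using k + 0.5) is a standard refinement that is not part of the answer above.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using N(np, np(1-p)) with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return normal_cdf((k + 0.5 - mean) / sd)


n, p, k = 100, 0.5, 55  # here np = n(1-p) = 50, well above the rule-of-thumb 5
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print("exact:", exact)
print("normal approximation:", approx)
```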

4. What are the commonly used sampling distributions?

Common sampling distributions:

• t-distribution
• Chi-square distribution
• F-distribution
• Normal distribution (sampling distribution of the mean for large samples)

5. Point out some uses of F-distribution.

Uses:

• Testing equality of two population variances
• ANOVA (Analysis of Variance)
• Testing overall significance of a regression
• Testing significance of multiple coefficients

6. Relation between Normal and t variable.

If:

• Z ~ N(0,1)
• V ~ χ²(ν), independently of Z

Then the t-variable is:

$$T = \frac{Z}{\sqrt{V/\nu}}.$$

The t-distribution approaches the Normal as ν → ∞.

7. Define point estimation.

Point estimation is the process of using sample data to compute a single numerical
value (a point estimate) that serves as the best guess of an unknown population
parameter.

8. Define efficiency.

Efficiency of an estimator is a measure of how well it estimates a parameter compared to other unbiased estimators.
An unbiased estimator T1 is more efficient than another unbiased estimator T2 if:

$$\operatorname{Var}(T_1) < \operatorname{Var}(T_2).$$

9. Confidence interval for population variance in sampling from normal
population.

For a normal population with sample variance s² and sample size n, a 100(1 − α)% confidence interval for σ² is

$$\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2}},$$

where s² is the sample variance with divisor n − 1, and χ²_p denotes the p-th quantile of the χ² distribution with n − 1 degrees of freedom.

10. Define composite hypothesis.

A composite hypothesis is one that does not specify the distribution completely,
meaning the parameter lies in a range instead of a single value.
Example: H0: µ ≥ 10.

11. Uses of chi-square test.

• Test of independence in contingency tables
• Test of goodness of fit
• Test for population variance
• Test of homogeneity

12. Test statistic used when goodness of fit is applied.

For goodness of fit, the chi-square statistic is:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i},$$

where O_i = observed frequency and E_i = expected frequency.

13. If X is Bernoulli with P(X=1)=0.6 and P(X=0)=0.4, find mean and variance.

For Bernoulli(p):

$$E(X) = p = 0.6$$

$$\operatorname{Var}(X) = p(1-p) = 0.6(0.4) = 0.24$$

14. Wages ~ N(70, 5). Estimate number of workers with wages between 69
and 72.

Convert to Z-scores (taking the s.d. as 5):

$$Z_1 = \frac{69-70}{5} = -0.2,\quad Z_2 = \frac{72-70}{5} = 0.4$$

From Z-table:

• P(Z < 0.4) = 0.6554
• P(Z < −0.2) = 0.4207

Probability in interval:

$$0.6554 - 0.4207 = 0.2347$$

Workers (out of a total of 1000):

$$1000 \times 0.2347 = 235\ \text{(approx)}$$
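
The table lookups can be reproduced without a Z-table by using the error function. This is a minimal sketch; the total of 1000 workers is the figure used in the computation above, and the helper name `normal_cdf` is hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X < x) for X ~ N(mean, sd), computed from the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


mean, sd = 70, 5
total_workers = 1000  # total used in the worked answer

prob = normal_cdf(72, mean, sd) - normal_cdf(69, mean, sd)
expected_workers = round(total_workers * prob)

print("P(69 < X < 72) =", round(prob, 4))
print("expected number of workers:", expected_workers)
```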

15. In a Normal distribution, 17% below 30 and 17% above 60. Find mean
and SD.

Given:

$$P(X<30) = 0.17 \Rightarrow Z = -0.95$$

$$P(X>60) = 0.17 \Rightarrow P(X<60) = 0.83 \Rightarrow Z = +0.95$$

Equations:

$$30 = \mu - 0.95\sigma$$

$$60 = \mu + 0.95\sigma$$

Subtract:

$$30 = 1.9\sigma \Rightarrow \sigma = \frac{30}{1.9} = 15.79$$

Mean:

$$\mu = 30 + 0.95\sigma = 30 + 15 = 45$$

Mean = 45, SD ≈ 15.8
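
The same two equations can be solved with quantiles computed by Python's `statistics.NormalDist` instead of a table. The function name `solve_mean_sd` is hypothetical. Because the exact quantile is about 0.954 rather than the rounded table value 0.95, the s.d. from this sketch comes out slightly smaller (about 15.7) than the 15.79 obtained above.

```python
from statistics import NormalDist


def solve_mean_sd(x_low, p_below, x_high, p_above):
    """Solve x_low = mu + z_low*sigma and x_high = mu + z_high*sigma for (mu, sigma)."""
    z_low = NormalDist().inv_cdf(p_below)       # negative quantile for the lower tail
    z_high = NormalDist().inv_cdf(1 - p_above)  # positive quantile for the upper tail
    sigma = (x_high - x_low) / (z_high - z_low)
    mu = x_low - z_low * sigma
    return mu, sigma


mu, sigma = solve_mean_sd(30, 0.17, 60, 0.17)
print("mean:", round(mu, 2), "s.d.:", round(sigma, 2))
```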

16. Properties of chi-square distribution.

• Defined as the sum of squares of independent standard normals.
• Only one parameter: degrees of freedom (ν).
• Non-negative (support x > 0).
• Mean = ν; Variance = 2ν.
• Right-skewed; becomes approximately symmetric when ν is large.
• Additivity: if X ~ χ²(ν1) and Y ~ χ²(ν2) are independent, then

$$X + Y \sim \chi^2_{\nu_1+\nu_2}.$$

17. Write down the pdf of t-distribution.

For the t-distribution with ν degrees of freedom:

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad -\infty < t < \infty$$

18. Find the MLE of a and b in U(a,b).

Sample: x1, x2, …, xn from U(a, b).

The likelihood is L(a, b) = (b − a)^(−n) when a ≤ min(x_i) and max(x_i) ≤ b, and 0 otherwise. It is maximized by making b − a as small as possible, so:

$$\hat{a} = \min(x_i), \qquad \hat{b} = \max(x_i)$$

19. Derive the confidence interval for proportion of a Binomial population.

Sample proportion:

$$\hat{p} = \frac{x}{n}$$

Approximate 100(1 − α)% CI:

$$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

(For large n; normal approximation.)
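
The interval above translates directly into a few lines of code. In this sketch the function name `proportion_ci` and the counts x = 60, n = 200 are hypothetical, chosen only to show the calculation; the critical value Z comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 60 successes in 200 trials.
lower, upper = proportion_ci(x=60, n=200)
print("95% CI for p:", round(lower, 4), "to", round(upper, 4))
```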

20. Explain the procedure for testing independence of attributes.

Chi-square test of independence steps:

1. Form a contingency table of observed frequencies O_ij.
2. Compute expected frequencies:

$$E_{ij} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

3. Compute test statistic:

$$\chi^2 = \sum \frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

4. Degrees of freedom, for a table with r rows and c columns:

$$(r-1)(c-1)$$

5. Compare with the critical value χ²_α or compute the p-value.
6. If χ² exceeds the critical value → reject independence → attributes are associated.
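
Steps 1 to 4 and the comparison in step 5 can be sketched as follows. The 2×2 table is hypothetical, and 3.841 is the standard 5% critical value of χ² with 1 degree of freedom; the function name is an assumption for this illustration.

```python
def chi_square_independence(observed):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            statistic += (o_ij - e_ij) ** 2 / e_ij

    dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return statistic, dof


# Hypothetical observed frequencies for two attributes with two levels each.
table = [[10, 20],
         [20, 10]]
statistic, dof = chi_square_independence(table)

critical_5pct_1df = 3.841
reject_independence = statistic > critical_5pct_1df
print("chi-square =", round(statistic, 4), "df =", dof, "reject:", reject_independence)
```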

21. Two colleges: 46/200 and 48/250 fail. University failure rate = 18%.
Examine if colleges differ significantly.

Let p0 = 0.18.

Combine colleges:

$$x = 46+48 = 94,\quad n = 200+250 = 450$$

$$\hat{p} = \frac{94}{450} = 0.2089$$

Test statistic:

$$Z = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$$

Compute SE:

$$\sqrt{\frac{0.18 \times 0.82}{450}} = 0.01811$$

$$Z = \frac{0.2089 - 0.18}{0.01811} = 1.60$$

For 5% level, critical value = 1.96.

Since:

$$1.60 < 1.96$$

Fail to reject H₀.
The pooled failure rate of the two colleges does NOT differ significantly from the university failure rate.
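
The arithmetic of this test can be checked with a short script. It follows the pooled one-sample approach used in the answer (both colleges combined and compared with p0 = 0.18); the function name is hypothetical.

```python
import math


def one_sample_proportion_z(x, n, p0):
    """Z statistic for testing H0: p = p0 from x successes in n trials."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return (p_hat - p0) / se


z = one_sample_proportion_z(x=46 + 48, n=200 + 250, p0=0.18)
reject = abs(z) > 1.96  # two-sided test at the 5% level

print("Z =", round(z, 3), "reject H0:", reject)
```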
