Note 3

Statistical inference involves methods for making decisions about populations based on sample data, primarily through parameter estimation and hypothesis testing. Estimation can be point or interval-based, with characteristics of good estimators including unbiasedness, efficiency, and consistency. The document also discusses the concepts of standard error, mean square error, and different types of interval estimates, such as confidence, tolerance, and prediction intervals.

STATISTICAL INFERENCE
• Preamble: Methods used in making decisions or drawing conclusions about
a population constitute the field of statistical inference
• Statistical inference is categorized into 2 major areas:
• Parameter estimation
• Hypothesis testing
• Definitions:
• Statistical inference is the process by which we infer population properties from the
properties of a sample
• It is the process of drawing conclusions about a population from samples that are subject
to random variation
• Population in statistics refers to the collection of all objects or elements under study in a
given situation
• The population property of interest must be observable and measurable
Estimation
• Estimation theory is the aspect or branch of statistics that deals with
estimating the values of parameters based on empirical or measured
data that has a random component
• The objective of estimation is to approximate the value of a
population parameter on the basis of a sample statistic
• E.g.: The sample mean $\bar{X}$ could be used to estimate the population mean $\mu$
• Note: The entire purpose of estimation is to arrive at an estimator (a
function that maps the sample space to a set of estimates), which
takes the sample as input and produces an estimate (a single value
calculated based on samples and used to estimate a population value)
of the parameters with the corresponding accuracy.
TYPES OF ESTIMATE
• Point Estimator:
• Here, inference is drawn by using a
single value to estimate an unknown
parameter
• Interval Estimator:
• Here, an interval of values is used in
estimating an unknown parameter
Point Estimator
• A point estimator draws inference about a population by estimating
the value of an unknown parameter using a single value or point
• A point estimate is one of the possible values a point estimator can
assume
• Mathematically, if there is a fixed parameter $\theta$ that needs to be
estimated and $X$ is a random variable corresponding to the observed
data, then an estimator of $\theta$, usually denoted by $\hat{\theta}$, is a function of
the random variable $X$ and as such is itself a random variable.
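The distinction between an estimator (a function of the sample) and an estimate (one value it produces) can be sketched in Python. This is only an illustration: the normal population with mean 10 and standard deviation 2, and the helper names `sample_mean` and `draw_sample`, are assumptions and not part of the notes.

```python
import random

def sample_mean(sample):
    """Point estimator: maps a sample to a single estimate of the population mean."""
    return sum(sample) / len(sample)

def draw_sample(n, rng, mu=10.0, sigma=2.0):
    """Draw n observations from a hypothetical normal population (assumed values)."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Three samples from the same population give three different estimates,
# which is why the estimator itself is a random variable.
estimates = [sample_mean(draw_sample(25, rng)) for _ in range(3)]
print(estimates)
```

Each printed number is a point estimate; the function `sample_mean` is the point estimator.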
Characteristics of a good Point Estimator
• Unbiasedness
• Efficiency
• Consistency
Unbiasedness
• An estimator should be close to the true value of the unknown
parameter
• An estimator $\hat{\theta}$ is said to be an unbiased estimator of a parameter $\theta$ if
the expected value of the estimator equals the parameter
• $E(\hat{\theta}) = \theta$
• When there exists a difference between the expected value of the
estimator $E(\hat{\theta})$ and the parameter $\theta$, the estimator is said to be
biased and the difference is called the bias
• $E(\hat{\theta}) - \theta = \text{bias}$
• For an unbiased estimator, the bias is zero
Example 1: If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, and
$X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population
represented by $X$, show that the sample mean $\bar{X}$ and sample variance
$s^2$ are unbiased estimators of $\mu$ and $\sigma^2$ respectively
• Solution:
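• Recall that from the definition of the mean of a continuous random variable,
• $E(X) = \int x \, f(x) \, dx = \mu$
• Similarly, from the definition of the variance of a continuous random variable
• $\mathrm{var}(X) = \int (x - \mu)^2 f(x) \, dx$
• $= E[(X - \mu)^2]$
• $= E(X^2) - [E(X)]^2$
• If $\bar{X} = \dfrac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$ with $E(X_i) = \mu$ for $i = 1, 2, 3, \ldots, n$
• Then $E(\bar{X}) = \dfrac{1}{n}\sum_{i=1}^{n} E(X_i) = \dfrac{1}{n}(n\mu) = \mu$, so $\bar{X}$ is an unbiased estimator of $\mu$
• $E(s^2) = E\left[\dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}\right] = \dfrac{1}{n-1} E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
• $= \dfrac{1}{n-1} E\left[\sum_{i=1}^{n}\left(X_i^2 + \bar{X}^2 - 2\bar{X}X_i\right)\right]$
• $= \dfrac{1}{n-1} E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$
• $= \dfrac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right]$ {from Mean of a linear combination}
• But $E(X_i^2) = \mu^2 + \sigma^2$ while $E(\bar{X}^2) = \mu^2 + \dfrac{\sigma^2}{n}$ (both follow from $E(X^2) = \mathrm{var}(X) + [E(X)]^2$, with $\mathrm{var}(\bar{X}) = \dfrac{\sigma^2}{n}$)
• $\therefore E(s^2) = \dfrac{1}{n-1}\left[\sum_{i=1}^{n}(\mu^2 + \sigma^2) - n\left(\mu^2 + \dfrac{\sigma^2}{n}\right)\right]$
• $= \dfrac{1}{n-1}\left(n\mu^2 + n\sigma^2 - n\mu^2 - \sigma^2\right) = \sigma^2$
• Therefore, the sample variance $s^2$ is an unbiased estimator of the population
variance $\sigma^2$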
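The result of Example 1 can be checked numerically. The sketch below averages the sample variance over many simulated samples. The normal population ($\sigma^2 = 4$), the sample size, the number of trials and the function name `average_variance_estimates` are illustrative assumptions. For comparison it also averages the variance computed with divisor $n$, which is biased.

```python
import random
import statistics

def average_variance_estimates(n=5, trials=10000, mu=0.0, sigma=2.0, seed=1):
    """Average two variance estimators over many simulated samples."""
    rng = random.Random(seed)
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total_unbiased += statistics.variance(sample)   # divisor n - 1
        total_biased += statistics.pvariance(sample)    # divisor n
    return total_unbiased / trials, total_biased / trials

avg_s2, avg_biased = average_variance_estimates()
print("average of s^2 (divisor n-1):", avg_s2)      # should be near sigma^2 = 4
print("average with divisor n:     ", avg_biased)   # should be near (n-1)/n * 4 = 3.2
```

The average with divisor $n - 1$ settles near $\sigma^2$, while the average with divisor $n$ settles near $\frac{n-1}{n}\sigma^2$, which is the bias the derivation removes.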
Efficiency
• Given two unbiased estimators for a parameter, the estimator with a
smaller variance is more efficient
• For the same parameter $\theta$, an unbiased point estimator $\hat{\theta}_1$ is more
efficient than another unbiased point estimator $\hat{\theta}_2$ if
• $\mathrm{Var}[\hat{\theta}_1(x)] < \mathrm{Var}[\hat{\theta}_2(x)]$
• Therefore, if we have several unbiased estimators, a typical principle of
estimation is to choose the estimator that has minimum variance.
• Such an estimator is known as the Minimum Variance Unbiased Estimator (MVUE)
• Among all unbiased estimators, the MVUE is the most likely to produce an
estimate, $\hat{\theta}$, that is close to the true value of $\theta$
MVUEs of different distributions
• For a normal distribution with unknown mean and variance:
• Sample mean $\bar{X}$ is the MVUE for population mean $\mu$
• $E(\bar{X} - \mu) = E(\bar{X}) - \mu = 0$
• Sample variance $S^2$ is the MVUE for population variance $\sigma^2$
• $E(S^2 - \sigma^2) = E(S^2) - \sigma^2 = 0$
• For other distributions, the sample mean and sample variance are not,
in general, MVUEs
Illustrative example: Suppose we have a random sample of $n$
observations $X_1, X_2, X_3, \ldots, X_n$ and we wish to compare two
possible unbiased estimators for $\mu$: the sample mean $\bar{x}$ and a
single observation from the sample, say $x_i$.
• For the sample mean: $V(\bar{x}) = \dfrac{\sigma^2}{n}$
• For the single observation: $V(x_i) = \sigma^2$
• Since $\dfrac{\sigma^2}{n} < \sigma^2$, we can conclude that for sample sizes $n \ge 2$, the
sample mean is a better estimator of $\mu$ than a single observation $x_i$.
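The comparison in the illustrative example can be sketched by simulation. The population ($\mu = 5$, $\sigma = 3$), the sample size $n = 10$ and the function name `compare_estimators` are assumptions chosen for illustration. With these values the theory above gives $V(\bar{x}) = 9/10 = 0.9$ and $V(x_i) = 9$.

```python
import random
import statistics

def compare_estimators(n=10, trials=10000, mu=5.0, sigma=3.0, seed=2):
    """Empirical variance of two unbiased estimators of mu over repeated samples."""
    rng = random.Random(seed)
    means, singles = [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)   # estimator 1: the sample mean
        singles.append(sample[0])       # estimator 2: a single observation
    return statistics.pvariance(means), statistics.pvariance(singles)

var_mean, var_single = compare_estimators()
print("variance of sample mean:       ", var_mean)    # theory: sigma^2 / n = 0.9
print("variance of single observation:", var_single)  # theory: sigma^2 = 9
```

Both estimators are centred on $\mu$, but the sample mean varies far less from sample to sample, which is what "more efficient" means here.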
Consistency
• An estimator is said to be consistent if it converges in probability to the
target population parameter; informally, the difference between the estimator
and the target population parameter becomes smaller as the sample size increases.
• For example:
• The variance of the sample mean is $\dfrac{\sigma^2}{n}$. This means that as the sample size $n$
increases, the variance decreases toward zero. Since the sample mean is also unbiased,
it is a consistent estimator of $\mu$
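Consistency of the sample mean can be illustrated by counting how often it lands within a fixed distance of $\mu$ as $n$ grows. The population ($\mu = 0$, $\sigma = 2$), the tolerance `eps = 0.5`, the sample sizes and the function name `fraction_within` are assumptions for this sketch.

```python
import random

def fraction_within(n, eps=0.5, trials=2000, mu=0.0, sigma=2.0, seed=3):
    """Fraction of simulated sample means that fall within eps of mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if abs(xbar - mu) < eps:
            hits += 1
    return hits / trials

# The fraction should rise toward 1 as the sample size increases.
fractions = {n: fraction_within(n) for n in (5, 50, 500)}
for n, frac in fractions.items():
    print(f"n = {n:3d}: fraction within 0.5 of mu = {frac:.3f}")
```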
Reporting a Point Estimate:
Standard Error
• When the numerical value or point estimate of a parameter is reported, it
is usually desirable to give some idea of the precision of estimation.
• The measure of precision usually employed is the standard error of the
estimator that has been used.
• The standard error of an estimator, $\hat{\theta}$, is its standard deviation, given by:
• $\sigma_{\hat{\theta}} = \sqrt{V(\hat{\theta})}$
• If the standard error involves unknown parameters that can be estimated,
substitution of these values into $\sigma_{\hat{\theta}}$ produces an estimated standard error
denoted by $\hat{\sigma}_{\hat{\theta}}$
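For the sample mean, $V(\bar{X}) = \sigma^2/n$, so its standard error is $\sigma/\sqrt{n}$. When $\sigma$ is unknown, substituting the sample standard deviation $s$ gives the estimated standard error $s/\sqrt{n}$. A minimal sketch follows; the data values and the function name `estimated_standard_error` are hypothetical.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    return s / math.sqrt(n)

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]  # hypothetical measurements
xbar = statistics.fmean(data)
se = estimated_standard_error(data)
print(f"point estimate = {xbar:.3f}, estimated standard error = {se:.3f}")
```

Reporting the point estimate together with its estimated standard error gives the reader a sense of how precise the estimate is.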
Mean Square Error
• When an estimator follows a normal distribution, one can be reasonably confident that
the true value of the parameter lies within 2 standard errors of the estimate
• Sometimes, it is necessary to use a biased estimator and in such cases, the mean square
error of the estimator is employed
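• Mean Square Error (MSE) of an estimator $\hat{\theta}$ of the parameter $\theta$ is defined as:
• $MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2$
• Rewritten as
• $MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = E[\hat{\theta} - E(\hat{\theta})]^2 + [\theta - E(\hat{\theta})]^2$
• $= V(\hat{\theta}) + \text{bias}^2$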
• MSE is, therefore, the variance of the estimator plus the squared bias
• A good estimator should have a small mean square estimation error because this implies
that the estimator values are clustered around the parameter 𝜃
• If $\hat{\theta}$ is an unbiased estimator, $\text{bias} = 0$ and then $MSE(\hat{\theta}) = V(\hat{\theta})$
• In comparing two estimators, the relative efficiency is calculated
• The relative efficiency of $\hat{\theta}_2$ to $\hat{\theta}_1$ is defined as:
• $\dfrac{MSE(\hat{\theta}_1)}{MSE(\hat{\theta}_2)}$
• If this relative efficiency is less than 1, then $\hat{\theta}_1$ is a more efficient
estimator of $\theta$ than $\hat{\theta}_2$ in the sense that it has a smaller mean square
error
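The decomposition $MSE = V + \text{bias}^2$ and the relative efficiency can be sketched by simulation. As an assumed example (not taken from the notes), two estimators of $\sigma^2$ are compared on normal data: $\hat{\theta}_1$ divides the sum of squares by $n$ (biased) and $\hat{\theta}_2$ divides by $n - 1$ (unbiased). The function names and the values $n = 5$, $\sigma = 2$ are illustrative.

```python
import random
import statistics

def mse_parts(estimates, theta):
    """Return (MSE, variance, bias) of a list of estimates of theta."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.fmean([(e - mean_est) ** 2 for e in estimates])
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    return mse, variance, bias

def simulate(n=5, trials=10000, sigma=2.0, seed=4):
    """Simulate both variance estimators on the same normal samples."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)
        unbiased.append(ss / (n - 1))  # theta_hat_2: divisor n - 1
        biased.append(ss / n)          # theta_hat_1: divisor n
    return unbiased, biased

theta = 2.0 ** 2  # true variance sigma^2
unbiased, biased = simulate()
mse_u, var_u, bias_u = mse_parts(unbiased, theta)
mse_b, var_b, bias_b = mse_parts(biased, theta)

# Relative efficiency as defined in the text: MSE(theta_hat_1) / MSE(theta_hat_2)
relative_efficiency = mse_b / mse_u
print("biased:   MSE, variance + bias^2 =", mse_b, var_b + bias_b ** 2)
print("unbiased: MSE, variance + bias^2 =", mse_u, var_u + bias_u ** 2)
print("relative efficiency =", relative_efficiency)
```

In this setting the ratio comes out below 1, so the biased estimator has the smaller mean square error. This is one situation in which a biased estimator may be preferred.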
Interval Estimator
• An interval estimator of a population parameter under random
sampling consists of two random variables which are called the
UPPER and LOWER limits of the interval estimators
• These upper and lower limits determine the intervals expected to
contain the parameter estimated
• Interval estimates are all the ranges an interval estimator can assume.
Each interval estimate states a range within which a population
parameter probably lies.
Assessing Interval estimators
• Accuracy (confidence level)
• Precision (Margin of error)
Confidence level of the estimator
• Refers to the probability that the interval estimator will
contain the value of the population parameter
• An interval estimate (one possible outcome of the interval estimator) together
with its confidence level defines the type of estimate
• E.g.: An interval estimate with confidence level $(1 - \alpha)$ is called a
$(1 - \alpha)$ confidence interval
Margin of error
• This is measured by half the width of the interval estimate
• That is, half the difference between the upper and lower limits of the
confidence interval
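As a sketch of how confidence level and margin of error fit together, consider a $(1 - \alpha)$ confidence interval for the mean of a normal population with known $\sigma$. This standard case is an assumption here; the notes do not give the formula. The interval is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, and the margin of error is the half-width $z_{\alpha/2}\,\sigma/\sqrt{n}$. The data values and the function name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample, sigma, confidence=0.95):
    """CI for the mean of a normal population with known sigma (assumed case)."""
    n = len(sample)
    xbar = sum(sample) / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile z_{alpha/2}
    margin = z * sigma / math.sqrt(n)         # margin of error = half the interval width
    return xbar - margin, xbar + margin, margin

lower, upper, margin = mean_confidence_interval([9.0, 10.0, 11.0, 10.0], sigma=2.0)
print(f"95% CI: ({lower:.3f}, {upper:.3f}), margin of error = {margin:.3f}")
```

Raising the confidence level increases $z_{\alpha/2}$ and widens the interval, while a larger $n$ shrinks the margin of error.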
Types of Interval estimate
• Confidence interval
• We cannot be certain that the interval contains the true, unknown population
parameter—we only use a sample from the full population to compute the point
estimate and the interval.
• The confidence interval is constructed so that we have high confidence that it does
contain the unknown population parameter
• A confidence interval bounds population or distribution parameters
• Tolerance interval
• We need to account for the potential error in each point estimate to form a tolerance
interval for the distribution
• A tolerance interval bounds a selected proportion of a distribution
• Prediction interval
• A prediction interval bounds future observations from the population or distribution
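The difference between a confidence interval and a prediction interval can be sketched for a normal population with known $\sigma$ (an assumed simplification). The confidence interval for $\mu$ has half-width $z_{\alpha/2}\,\sigma/\sqrt{n}$, while the prediction interval for one future observation has half-width $z_{\alpha/2}\,\sigma\sqrt{1 + 1/n}$, because it must also cover the variability of the new observation itself. Tolerance intervals need additional factors and are not covered here. The function name `interval_half_widths` is hypothetical.

```python
import math
from statistics import NormalDist

def interval_half_widths(n, sigma, confidence=0.95):
    """Half-widths of the CI for mu and the prediction interval (known sigma assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci_half = z * sigma / math.sqrt(n)            # bounds the parameter mu
    pred_half = z * sigma * math.sqrt(1 + 1 / n)  # bounds one future observation
    return ci_half, pred_half

ci_half, pred_half = interval_half_widths(n=4, sigma=2.0)
print(f"confidence interval half-width: {ci_half:.3f}")
print(f"prediction interval half-width: {pred_half:.3f}")
```

As $n$ grows, the confidence interval shrinks toward zero width, but the prediction interval never gets narrower than $z_{\alpha/2}\,\sigma$.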
