3/18/2024
Chapter 2: Statistical Estimation
By: Meseret A.
Debre Markose University
College of Business and Economics
Department of Management
December, 2016

2.0. Introduction
• Populations are described by their probability distributions and parameters.
– For quantitative populations, the location and shape are described by μ and σ.
– For binomial populations, the location and shape are determined by p.
• If the values of the parameters are unknown, we make inferences about them using sample information.
• Therefore, statistical inference is the process of using sample results to draw conclusions about the parameters of a population.
• Generally, statistical inference has two basic tasks:
  • Statistical estimation
  • Statistical hypothesis testing
Estimation: estimating or predicting the value of the population parameter.
E.g.: what are the most likely values of μ and P?
Hypothesis testing: deciding about the value of a parameter based on some preconceived idea.

2.1. The Concept of Estimation
• Estimation is a process whereby we select a random sample from a population and use an estimator (a sample statistic) to estimate a population parameter.
Important Terms
– Estimator: a formula (rule) or strategy that is used to estimate a population parameter
– Estimate: a number computed by using the data from the sample
The Objective of Estimation
• To determine the approximate value of a population parameter
• To determine the size of a sample (n) that will be enough for reliable estimation
• To determine methods for sampling statistical units from a population

Process of Estimation
i. The relevant population is carefully defined,
ii. The parameter(s) of interest is (are) identified,
iii. A representative sample is selected,
iv. Data are collected from all members of the sample,
v. Sample statistic(s) that serve(s) as estimate(s) is (are) computed from the data collected,
vi. The population parameter(s) is (are) estimated using the sample statistic(s) (estimates).
2.2. Types of Estimation
• There are two types of estimation (estimators):
– Point estimation (estimator)
– Interval estimation (estimator)

2.2.1. Point estimator (Estimation)
A point estimator draws inference about a population by estimating the value of an unknown parameter using a single value or a point.
Point estimation deals with the task of selecting a specific sample value as an estimate for a population parameter.
A point estimate is a specific value of a sample statistic that is used to estimate the unknown population parameter.

E.g. the point estimate of a parameter θ generally is $\hat{\theta}$, or Est θ = $\hat{\theta}$; for example, Est µ is $\bar{x}$ and Est P is $\hat{p}$.
Where,
θ is any population parameter
$\hat{\theta}$ is the corresponding sample statistic
N.B: in every case the appropriate point estimator of a parameter is simply the corresponding sample statistic.
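As a minimal sketch of point estimation, the snippet below computes the sample mean as an estimate of µ and the sample proportion as an estimate of P. The function names and the data are hypothetical, invented only for illustration.

```python
from statistics import mean

def point_estimate_mean(sample):
    """Point estimate of the population mean: the sample mean (x-bar)."""
    return mean(sample)

def point_estimate_proportion(outcomes):
    """Point estimate of the population proportion: the sample proportion (p-hat).

    `outcomes` is a sequence of 1/0 (or True/False) flags marking "success".
    """
    return sum(1 for o in outcomes if o) / len(outcomes)

# Hypothetical data, invented purely for illustration.
weights = [52.0, 48.5, 50.0, 51.5, 49.0]
defective = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

x_bar = point_estimate_mean(weights)          # estimate of mu
p_hat = point_estimate_proportion(defective)  # estimate of P
print(f"Est mu = {x_bar}, Est P = {p_hat}")
```

Each result is a single number, which is exactly what makes it a point estimate rather than an interval estimate.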
Properties (qualities) of the best estimators
• These properties refer to the desirable qualities of estimators.
• Therefore, the best estimator satisfies (meets) the following conditions:
a) Unbiasedness
b) Consistency
c) Sufficiency
d) Efficiency

a) Unbiasedness
• An unbiased estimator of a population parameter is an estimator whose expected value is equal to the value of the parameter, i.e., $E(\hat{\theta}) = \theta$
• An unbiased estimator of θ is a sample characteristic which satisfies the condition:
$$\lim_{n \to \infty} \hat{\theta} = \theta$$

b) Consistency
• An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger.
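Unbiasedness and consistency of the sample mean can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal with hypothetical values µ = 10 and σ = 4, and the function name and seed are chosen for illustration.

```python
import random

def simulate_sample_means(mu, sigma, n, reps, rng):
    """Draw `reps` samples of size `n` from N(mu, sigma) and return the sample means."""
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

rng = random.Random(42)   # fixed seed so the run is reproducible
MU, SIGMA = 10.0, 4.0     # hypothetical population parameters

results = {}
for n in (5, 50, 500):
    means = simulate_sample_means(MU, SIGMA, n, reps=1000, rng=rng)
    avg_of_means = sum(means) / len(means)                       # close to MU: unbiasedness
    avg_abs_error = sum(abs(m - MU) for m in means) / len(means) # shrinks with n: consistency
    results[n] = (avg_of_means, avg_abs_error)
    print(n, round(avg_of_means, 3), round(avg_abs_error, 3))
```

The average of the sample means stays near µ for every n, while the typical distance between the estimate and µ shrinks as n grows.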
c) Sufficiency
• An estimator is said to be sufficient for estimating a population parameter if it contains all the information in the sample about the parameter.
$$\lim_{n \to \infty} \sigma_{\hat{\theta}} = 0$$

d) Efficiency
• If we have two unbiased estimators of the same parameter, the point estimator with the smaller variance is said to have greater efficiency.
• Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be unbiased estimators of θ. If $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$, then $\hat{\theta}_1$ is the more efficient estimator of θ.

• However, there are three drawbacks to using point estimators:
a) It is virtually certain that the estimate will be wrong, i.e., $P(\hat{\theta} = \theta) = 0$
b) A point estimate gives no information about how close the estimator is to the parameter.
c) Point estimates cannot reflect the effects of a larger sample size.
As a consequence, we use the second method of estimating a population parameter: the interval estimator.
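To illustrate the efficiency property d), the sketch below compares two estimators of the centre of a normal population: the sample mean and the sample median. Both are unbiased here because the normal distribution is symmetric. The population (standard normal), sample size, and seed are assumptions chosen for illustration.

```python
import random
from statistics import median, pvariance

rng = random.Random(7)   # fixed seed for reproducibility
N, REPS = 25, 4000       # hypothetical sample size and number of repeated samples

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    means.append(sum(sample) / N)
    medians.append(median(sample))

# Variance of each estimator across the repeated samples.
var_mean = pvariance(means)
var_median = pvariance(medians)
print(f"Var(mean) = {var_mean:.4f}, Var(median) = {var_median:.4f}")
```

The sample mean shows the smaller variance across repeated samples, so for a normal population it is the more efficient of the two estimators.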
2.3. Interval Estimator
• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
• Thus, it is a rule for determining (based on sample information) a range or interval in which the parameter is likely to fall.
N.B: an interval estimator (a.k.a. confidence interval estimator):
• is stated in terms of probability
• gives information about closeness to the unknown population parameter
• Thus, the corresponding estimate is called an interval estimate.
• The corresponding process is called interval estimation; it establishes an interval consisting of a lower limit and an upper limit within which the population parameter is expected to fall.

• The general form of an interval estimate is as follows:
Point estimate ± Margin of error
Where, the margin of error (ME) is the maximum error of estimation at a given confidence level.
• More generally, we can define a confidence interval for any required probability content less than 1: the limits a and b are chosen such that
P(a < θ < b) = 1 - α
• Where,
α is the error probability, the risk of estimation. It takes a value between 0 and 1 and is distributed equally to both sides of the normal curve, giving α/2 for each side.
1 - α is the probability content, level of confidence, or confidence coefficient for the interval: a measure of the confidence we have that the interval does indeed contain the parameter of interest.
N.B: a confidence interval calculated in this way is called a (1-α)100% confidence interval for θ.
• Thus, the interval estimate of θ at the 1-α CL is:
P(a ≤ θ ≤ b) = 1 - α
where: a and b are the lower and upper limits of the interval
α is the risk of estimation
[Figure: density curve f(θ) with an area of α/2 in each tail, outside the limits a and b]

• And it will have the general form given by:
$$\hat{\theta} \pm \text{Margin of Error}$$
Where;
– Margin of error is the maximum error of estimation, or the difference between $\hat{\theta}$ and θ
• And the margin of error is
$$ME = Z_{\alpha/2}\,\sigma_{\hat{\theta}}$$
– Where;
• $\sigma_{\hat{\theta}}$ is the standard error of $\hat{\theta}$
• $Z_{\alpha/2}$ is the critical value of Z, the z value that cuts off a right-tail area of α/2 under the standard normal curve at the 1-α confidence level
• Since $\hat{\theta}$ is a point estimator of θ, the confidence interval of θ at the 1-α CL is:
$$\hat{\theta} \pm Z_{\alpha/2}\,\sigma_{\hat{\theta}}$$
Process of constructing CI
i. Compute the sample statistic $\hat{\theta}$
ii. Compute the standard error of the statistic, $\sigma_{\hat{\theta}}$
iii. Make a decision about the level of confidence desired (usually 90%, 95%, 98%, or 99%)
iv. Find the table value for the desired CL
v. Multiply the standard error of the statistic by the tabled value (compute the margin of error)
vi. Form the interval by adding and subtracting the calculated value (margin of error) to and from the statistic:
$$\hat{\theta} \pm Z_{\alpha/2}\,\sigma_{\hat{\theta}}$$

Commonly used CL and Zα/2
1-α     α       α/2      Zα/2
0.90    0.10    0.05     Z0.05  = 1.645
0.95    0.05    0.025    Z0.025 = 1.96
0.98    0.02    0.01     Z0.01  = 2.33
0.99    0.01    0.005    Z0.005 = 2.575

• Then, for a proportion 1 - α of all the values of $\hat{\theta}$ obtained in repeated sampling from a distribution, the interval
$$\left(\hat{\theta} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \hat{\theta} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
includes (covers) the value of the population parameter.
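The six construction steps can be sketched in a few lines of Python for the case where the statistic is a sample mean and σ is known. The function names and the data are hypothetical. The critical value comes from `statistics.NormalDist().inv_cdf`, which returns slightly more precise values than a rounded table (for example 2.326 rather than 2.33).

```python
from math import sqrt
from statistics import NormalDist, mean

def z_critical(confidence):
    """z value that cuts off a right-tail area of alpha/2 under the standard normal curve."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for a mean with known population standard deviation."""
    x_bar = mean(sample)                   # step i: sample statistic
    std_err = sigma / sqrt(len(sample))    # step ii: standard error
    z = z_critical(confidence)             # steps iii-iv: confidence level and critical value
    margin = z * std_err                   # step v: margin of error
    return x_bar - margin, x_bar + margin  # step vi: statistic -/+ margin

# Hypothetical sample, with sigma assumed known and equal to 4.
lower, upper = z_interval([98.0, 102.0, 100.0, 100.0], sigma=4.0, confidence=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With a sample mean of 100 and a standard error of 2, the margin of error is about 1.96 × 2 = 3.92.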
2.3.1. Interval estimator of μ when σ is known
• Assume
the sampling distribution of the mean $\bar{x}$ is normal
σ is known
the observed sample mean is $\bar{x}$
then the (1-α)100% CI for μ is given by
$$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
N.B: in this form µ sits in the middle of the inequality, between the limits created by adding and subtracting $z_{\alpha/2}$ standard errors to and from $\bar{x}$.
This equation says that, with repeated sampling from this population, the proportion of values of $\bar{x}$ for which the interval includes μ is equal to 1 − α.
This form of probability statement is very useful because it is the confidence interval estimator of μ.

Interval Estimator of μ when σ is unknown
• If σ is unknown and n < 30, we have to use the t-distribution with (n-1) degrees of freedom, given by:
$$t_{(n-1)} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
N.B: df is the number of observations that are free to vary after the sample mean has been calculated.
• Assume:
The sampling distribution of $\bar{x}$ is normal
σ is unknown and is replaced with the sample standard deviation s
• Then the (1-α)100% CI for μ when σ is unknown is given by:
$$\bar{x} - t_{(n-1),\,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{(n-1),\,\alpha/2}\frac{s}{\sqrt{n}}$$
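A small-sample t interval can be sketched as follows. Python's standard library has no t-distribution, so this sketch assumes the critical value is read from a t table and passed in. The value 2.262 is the tabled $t_{0.025}$ for 9 degrees of freedom. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """CI for the mean when sigma is unknown; `t_crit` is looked up for n-1 df."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)                 # sample standard deviation (n-1 divisor)
    margin = t_crit * s / sqrt(n)     # t critical value times estimated standard error
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10 observations, so df = 9.
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 14.0, 13.0]
T_CRIT_9DF_95 = 2.262                 # tabled t value for alpha/2 = 0.025, df = 9

lower, upper = t_interval(sample, T_CRIT_9DF_95)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Here the sample mean is 13.5 and the estimated standard error is 0.5, so the margin of error is 2.262 × 0.5 = 1.131.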
• However, for large degrees of freedom, the t-distribution is approximated by the Z-distribution given by:
$$Z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
• Where,
s is the sample standard deviation
$s/\sqrt{n}$ is the estimated standard error of $\bar{x}$
• Therefore, a large-sample (1-α)100% CI for μ is given by:
$$\bar{x} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

• How to interpret the confidence interval estimate?
To interpret the confidence interval estimate properly, remember that the confidence interval estimator was derived from the sampling distribution of the sample mean.
The sampling distribution was used to make probability statements about the sample mean.
Although the form has changed, the confidence interval estimator is also a probability statement about the sample mean. It states that there is a 1 − α probability that the sample mean will be equal to a value such that the interval $\bar{x} - Z_{\alpha/2}\frac{s}{\sqrt{n}}$ to $\bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$ will include the population mean.
Once the sample mean is computed, the interval acts as the lower and upper limits of the interval estimate of the population mean.
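The repeated-sampling interpretation can be checked with a simulation: build many intervals from independent samples and count how often they contain µ. This is a sketch under stated assumptions: the population is normal with hypothetical µ = 50 and σ = 10, and σ is treated as known so that the nominal coverage holds exactly.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu, sigma, n, confidence, reps, rng):
    """Fraction of simulated confidence intervals that contain the true mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage(mu=50.0, sigma=10.0, n=30, confidence=0.95,
                reps=4000, rng=random.Random(1))
print(f"Share of intervals covering mu: {rate:.3f}")
```

The share should come out close to 0.95. Any single interval either contains µ or it does not; the 95% describes the long-run behaviour of the procedure.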
2.3.2. Interval Estimator of Population Proportion (Large Sample Size)
• Assume
$\hat{p}$ is the observed proportion of "success" in a random sample of size n
The sampling distribution of $\hat{p}$ is normal
• Then the (1-α)100% CI for the population proportion P is given by:
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

2.3.3 Confidence Interval for (μ1 – μ2) (for large samples and known variances)
• Let:
– $\bar{X}_1$ represent the observed sample mean of a random sample of size n1 from a population with mean μ1
– $\bar{X}_2$ represent the observed sample mean of a random sample of size n2 from a population with mean μ2
– the sample sizes be large and the population variances known
• Then, the (1-α)100% confidence interval for (μ1 – μ2) is given by:
$$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
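Both large-sample intervals on these slides follow the same pattern of point estimate ± $Z_{\alpha/2}$ × standard error. The sketch below assumes the standard error of $\bar{X}_1 - \bar{X}_2$ is $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, which is the usual result for independent samples. The function names and the input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """z value with right-tail area alpha/2 under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample CI for a population proportion P."""
    p_hat = successes / n
    margin = z_critical(confidence) * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def mean_difference_interval(x_bar1, x_bar2, var1, var2, n1, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2 with known population variances var1 and var2."""
    diff = x_bar1 - x_bar2
    margin = z_critical(confidence) * sqrt(var1 / n1 + var2 / n2)
    return diff - margin, diff + margin

# Hypothetical inputs, invented for illustration.
p_low, p_high = proportion_interval(successes=60, n=100)
d_low, d_high = mean_difference_interval(80.0, 75.0, var1=36.0, var2=64.0, n1=36, n2=64)
print(f"P: ({p_low:.3f}, {p_high:.3f});  mu1 - mu2: ({d_low:.3f}, {d_high:.3f})")
```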
2.3.3 Confidence Interval for (μ1 – μ2) (for small samples and unknown variances)
• Suppose again that
– we have independent random samples of n1 and n2 observations from normal populations with means μ1 and μ2,
– the populations have a common unknown variance ($\sigma^2$),
– inference about the population means is based on the difference $\bar{X}_1 - \bar{X}_2$ between the two sample means.
• Therefore, this random variable has a normal distribution with mean (μ1 – μ2) and variance
$$var(\bar{X}_1 - \bar{X}_2) = var(\bar{X}_1) + var(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = \sigma^2\left(\frac{n_1 + n_2}{n_1 n_2}\right)$$

• It therefore follows that the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\frac{n_1 + n_2}{n_1 n_2}\right)}}$$
has a standard normal distribution.
• Since this variance is common to the two populations, the two sets of sample information can be pooled together to estimate it. The estimator used is:
$$S^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
• Replacing the unknown $\sigma^2$ gives the random variable
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{n_1 + n_2}{n_1 n_2}}}$$
N.B: this random variable obeys the Student's t distribution with (n1+n2-2) degrees of freedom.
• Then, the (1-α)100% confidence interval for the difference between the population means (μ1 – μ2) can be obtained through:
$$(\bar{X}_1 - \bar{X}_2) - t_{(n_1+n_2-2),\,\alpha/2}\,S\sqrt{\frac{n_1 + n_2}{n_1 n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t_{(n_1+n_2-2),\,\alpha/2}\,S\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$

2.3.4 Confidence Interval for (P1 – P2) (for large samples)
• Let:
– $\hat{p}_1$ be the observed proportion of success in a random sample of size n1 from a population with proportion P1
– $\hat{p}_2$ be the observed proportion of success in a random sample of size n2 from a population with proportion P2
– the sample sizes be large
• Then, the (1-α)100% confidence interval for (P1 – P2) is given by:
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
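The two-sample intervals can be sketched as below. As before, the t critical value is assumed to come from a table (2.120 is the tabled $t_{0.025}$ for 16 degrees of freedom). The interval for P1 – P2 uses the usual large-sample standard error $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$. All function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pooled_variance(s1_sq, s2_sq, n1, n2):
    """Pooled estimate of the common population variance."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_interval(x_bar1, x_bar2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 with a common unknown variance; t_crit has n1+n2-2 df."""
    s_pooled = sqrt(pooled_variance(s1_sq, s2_sq, n1, n2))
    margin = t_crit * s_pooled * sqrt((n1 + n2) / (n1 * n2))
    diff = x_bar1 - x_bar2
    return diff - margin, diff + margin

def proportion_difference_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for P1 - P2 from success counts x1 and x2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

# Hypothetical inputs: n1 = 8, n2 = 10, so df = 16 and tabled t(0.025) = 2.120.
m_low, m_high = pooled_t_interval(25.0, 20.0, s1_sq=9.0, s2_sq=16.0,
                                  n1=8, n2=10, t_crit=2.120)
q_low, q_high = proportion_difference_interval(60, 100, 45, 100)
print(f"mu1 - mu2: ({m_low:.3f}, {m_high:.3f});  P1 - P2: ({q_low:.3f}, {q_high:.3f})")
```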
Information and the Width of the Interval
• Interval estimation is designed to convert data into information.
• The width of the confidence interval estimate is a function of the population standard deviation, the confidence level, and the sample size.
– If there is a great deal of variation in the random variable (measured by a large standard deviation), it is more difficult to accurately estimate the population mean. That difficulty is translated into a wider interval.
– Decreasing the confidence level narrows the interval; increasing it widens the interval. However, a large confidence level is generally desirable because that means a larger proportion of confidence interval estimates will be correct in the long run.
– A larger sample size provides more potential information. The increased amount of information is reflected in a narrower interval.

Sample-Size Determination
• Before determining the necessary sample size, three questions must be answered:
How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
What do you want the desired confidence level (1-α) to be so that the distance between your estimate and the parameter is less than or equal to B?
What is your estimate of the variance (or standard deviation) of the population in question?
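The three effects on the width can be made concrete with the z interval for a mean, whose width is $2 Z_{\alpha/2}\,\sigma/\sqrt{n}$. The sketch below varies one factor at a time; the baseline values (σ = 10, n = 100, 95% confidence) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(sigma, n, confidence):
    """Width of the z interval for a mean: 2 * z(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / sqrt(n)

base = interval_width(sigma=10.0, n=100, confidence=0.95)
wider_sigma = interval_width(sigma=20.0, n=100, confidence=0.95)  # more variation
higher_cl = interval_width(sigma=10.0, n=100, confidence=0.99)    # higher confidence level
larger_n = interval_width(sigma=10.0, n=400, confidence=0.95)     # larger sample

print(f"base={base:.2f}, sigma x2={wider_sigma:.2f}, "
      f"99% CL={higher_cl:.2f}, n x4={larger_n:.2f}")
```

Doubling σ doubles the width, raising the confidence level widens the interval, and quadrupling n halves the width.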
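Error of Estimation
• Sampling error is the difference between the sample and the population that exists because of the observations that happened to be selected for the sample.
• Now we can define the sampling error as the difference between an estimator and a parameter.
• We can also define this difference as the error of estimation ($\bar{X} - \mu$).
• In our derivation of the confidence interval estimator of μ, we expressed the probability
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
• which can also be expressed as
$$P\left(-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
• This tells us that the difference between $\bar{X}$ and μ lies between $-Z_{\alpha/2}\,\sigma/\sqrt{n}$ and $+Z_{\alpha/2}\,\sigma/\sqrt{n}$ with probability 1 − α.
• Expressed another way, we have with probability 1 − α,
$$|\bar{X} - \mu| < Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
• We interpret this to mean that $Z_{\alpha/2}\,\sigma/\sqrt{n}$ is the maximum error of estimation that we are willing to tolerate.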
• We label this value B, which stands for the bound on the error of estimation:
$$B = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Determining the Sample Size
• Solving for n, we obtain the minimum required sample size for estimating the population mean μ:
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{B}\right)^2$$
• With the same sort of transformation and calculation, the minimum required sample size for estimating the population proportion is given by:
$$n = \frac{Z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{B^2}$$
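The sample-size calculations can be sketched as follows, assuming the standard results $n = (Z_{\alpha/2}\,\sigma/B)^2$ for a mean and $n = Z_{\alpha/2}^2\,\hat{p}(1-\hat{p})/B^2$ for a proportion. The result is rounded up because n must be a whole number that keeps the error within the bound. The function names and the planning values are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_critical(confidence):
    """z value with right-tail area alpha/2 under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Minimum n so that the error of estimating mu is at most `bound`."""
    return ceil((z_critical(confidence) * sigma / bound) ** 2)

def sample_size_for_proportion(p_guess, bound, confidence=0.95):
    """Minimum n so that the error of estimating P is at most `bound`.

    `p_guess` is a planning value for the proportion; 0.5 gives the largest n.
    """
    z = z_critical(confidence)
    return ceil(z ** 2 * p_guess * (1.0 - p_guess) / bound ** 2)

# Hypothetical planning values: sigma = 6 with B = 1, and p = 0.5 with B = 0.03.
n_mean = sample_size_for_mean(sigma=6.0, bound=1.0)
n_prop = sample_size_for_proportion(p_guess=0.5, bound=0.03)
print(f"n for mean = {n_mean}, n for proportion = {n_prop}")
```

Using 0.5 as the planning value for the proportion is a conservative choice, since $\hat{p}(1-\hat{p})$ is largest there.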