
BA UNIT 2 Notes

Statistical sampling methods are essential for selecting a representative subset of data from a larger population to make valid inferences about the whole group. The document outlines two main types of sampling methods: probability sampling, which allows for strong statistical inferences, and non-probability sampling, which is easier but carries a higher risk of bias. It also discusses the importance of understanding sampling error, which affects the validity and reliability of research findings.

Uploaded by

pavanbhimaraju4
Copyright
© All Rights Reserved

UNIT-II

Statistical Sampling Methods


Statistical sampling is a method of selecting a representative subset of data from a larger
population to estimate characteristics, analyze behavior, or draw conclusions about the whole group. It
uses random, objective, and mathematical techniques to ensure the subset is unbiased, allowing for
high-confidence inferences at lower costs and faster speeds than a complete census.

When you conduct research about a group of people, it’s rarely possible to collect data from
every person in that group. Instead, you select a sample. The sample is the group of individuals who
will actually participate in the research.
To draw valid conclusions from your results, you have to carefully decide how you will select
a sample that is representative of the group as a whole. This is called a sampling method. There are
two primary types of sampling methods that you can use in your research:
 Probability sampling involves random selection, allowing you to make strong statistical
inferences about the whole group.
 Non-probability sampling involves non-random selection based on convenience or other
criteria, allowing you to easily collect data.
You should clearly explain how you selected your sample in the methodology section of your
paper or thesis, as well as how you approached minimizing research bias in your work.
Population vs. sample
First, you need to understand the difference between a population and a sample, and identify
the target population of your research.
 The population is the entire group that you want to draw conclusions about.
 The sample is the specific group of individuals that you will collect data from.
The population can be defined in terms of geographical location, age, income, or many other
characteristics.
If the population is very large, demographically mixed, and geographically dispersed, it might
be difficult to gain access to a representative sample. A lack of a representative sample affects
the validity of your results, and can lead to several research biases, particularly sampling bias.
Sampling frame
The sampling frame is the actual list of individuals that the sample will be drawn from.
Ideally, it should include the entire target population (and nobody who is not part of that population).
Example: Sampling frame. You are doing research on working conditions at a social media
marketing company. Your population is all 1000 employees of the company. Your sampling frame is
the company’s HR database, which lists the names and contact details of every employee.
Sample size
The number of individuals you should include in your sample depends on various factors,
including the size and variability of the population and your research design. There are
different sample size calculators and formulas depending on what you want to achieve with statistical
analysis.
The target population can be very broad or quite narrow: maybe you want to make inferences about
the whole adult population of your country; maybe your research focuses on customers of a certain
company, patients with a specific health condition, or students in a single school.
It is important to carefully define your target population according to the purpose and
practicalities of your project.
Probability sampling methods
Probability sampling means that every member of the population has a chance of being
selected. It is mainly used in quantitative research. If you want to produce results that are
representative of the whole population, probability sampling techniques are the most valid choice.
There are four main types of probability sample.
1. Simple random sampling
In a simple random sample, every member of the population has an equal chance of being
selected. Your sampling frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or other
techniques that are based entirely on chance.
Example: Simple random sampling. You want to select a simple random sample of 100
employees of a social media marketing company. You assign a number to every employee in the
company database from 1 to 1000, and use a random number generator to select 100 numbers.
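The draw in this example can be sketched in a few lines of Python. This is only an illustration: the list `employee_ids` is a hypothetical stand-in for the company database, and the seed is fixed purely so the result can be reproduced.

```python
import random

# Hypothetical sampling frame: employee numbers 1..1000, as in the example above.
employee_ids = list(range(1, 1001))

rng = random.Random(42)  # fixed seed only so the draw is reproducible

# random.sample draws without replacement: no employee is picked twice,
# and every employee has the same chance of being selected.
simple_random_sample = rng.sample(employee_ids, k=100)

print(sorted(simple_random_sample)[:10])  # first few selected employee numbers
```

Sampling without replacement is the natural choice here, because the same employee should not be surveyed twice.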
2. Systematic sampling
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to
conduct. Every member of the population is listed with a number, but instead of randomly generating
numbers, individuals are chosen at regular intervals.
Example: Systematic sampling. All employees of the company are listed in alphabetical order.
From the first 10 numbers, you randomly select a starting point: number 6. From number 6 onwards,
every 10th person on the list is selected (6, 16, 26, 36, and so on), and you end up with a sample of
100 people.
If you use this technique, it is important to make sure that there is no hidden pattern in the list
that might skew the sample. For example, if the HR database groups employees by team, and team
members are listed in order of seniority, there is a risk that your interval might skip over people in
junior roles, resulting in a sample that is skewed towards senior employees.
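A minimal sketch of the systematic procedure, assuming the same hypothetical list of 1000 numbered employees; the variable names are illustrative and not part of the original notes.

```python
import random

employee_ids = list(range(1, 1001))          # hypothetical alphabetical list, numbered 1..1000
sample_size = 100
interval = len(employee_ids) // sample_size  # sampling interval k = 10

rng = random.Random(6)                       # fixed seed only for reproducibility
start = rng.randint(1, interval)             # random starting point among the first k numbers

# Take the start-th person, then every k-th person after that.
systematic_sample = employee_ids[start - 1::interval]

print(start, systematic_sample[:4], len(systematic_sample))
```

Only the starting point is random; every later selection is fixed by the interval, which is why a hidden pattern in the list order can skew the sample.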
3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may differ in
important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is
properly represented in the sample.
To use this sampling method, you divide the population into subgroups (called strata) based
on the relevant characteristic (e.g., gender identity, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be
sampled from each subgroup. Then you use random or systematic sampling to select a sample from
each subgroup.
Example: Stratified sampling. The company has 800 female employees and 200 male
employees. You want to ensure that the sample reflects the gender balance of the company, so you sort
the population into two strata based on gender. Then you use random sampling on each group,
selecting 80 women and 20 men, which gives you a representative sample of 100 people.
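The proportional allocation in this example can be sketched as follows. The frame is a hypothetical list of (gender, id) pairs built to match the 800/200 split; with other stratum sizes the rounded allocations may not add up exactly to the target and would need adjusting.

```python
import random

# Hypothetical frame matching the example: 800 women and 200 men.
frame = [("F", i) for i in range(800)] + [("M", i) for i in range(200)]
total_sample_size = 100

# Group the frame into strata by the chosen characteristic.
strata = {}
for gender, emp_id in frame:
    strata.setdefault(gender, []).append((gender, emp_id))

rng = random.Random(0)  # fixed seed only for reproducibility
stratified_sample = []
for gender, members in strata.items():
    # Proportional allocation: the stratum's share of the population times the sample size.
    n_stratum = round(total_sample_size * len(members) / len(frame))
    # Simple random sample within the stratum.
    stratified_sample.extend(rng.sample(members, n_stratum))

counts = {g: sum(1 for s in stratified_sample if s[0] == g) for g in strata}
print(counts)  # {'F': 80, 'M': 20}
```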
4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each subgroup
should have similar characteristics to the whole population. Instead of sampling individuals from
each subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If
the clusters themselves are large, you can also sample individuals from within each cluster using one
of the techniques above. This is called multistage sampling.
This method is good for dealing with large and dispersed populations, but there is more risk
of error in the sample, as there could be substantial differences between clusters. It’s difficult to
guarantee that the sampled clusters are really representative of the whole population.
Example: Cluster sampling. The company has offices in 10 cities across the country (all with
roughly the same number of employees in similar roles). You don’t have the capacity to travel to
every office to collect your data, so you use random sampling to select 3 offices; these are your
clusters.
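As a rough illustration, the sketch below selects 3 of 10 offices at random and then shows both the single-stage and the multistage variant. The office names, the 100 employees per office, and the 20 employees drawn per office in the second stage are all hypothetical.

```python
import random

# Hypothetical frame: 10 city offices with 100 employees each.
offices = {
    f"office_{c}": [f"office_{c}_emp_{i}" for i in range(100)]
    for c in range(1, 11)
}

rng = random.Random(1)  # fixed seed only for reproducibility
sampled_offices = rng.sample(sorted(offices), k=3)  # randomly select 3 whole clusters

# Single-stage cluster sample: include every employee in the selected offices.
cluster_sample = [emp for office in sampled_offices for emp in offices[office]]

# Multistage variant: draw a simple random sample of 20 within each selected office.
multistage_sample = [
    emp for office in sampled_offices for emp in rng.sample(offices[office], 20)
]

print(sampled_offices, len(cluster_sample), len(multistage_sample))
```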
Non-probability sampling methods
In a non-probability sample, individuals are selected based on non-random criteria, and not
every individual has a chance of being included.
This type of sample is easier and cheaper to access, but it has a higher risk of sampling bias.
That means the inferences you can make about the population are weaker than with probability
samples, and your conclusions may be more limited. If you use a non-probability sample, you should
still aim to make it as representative of the population as possible.
Non-probability sampling techniques are often used in exploratory and qualitative research. In
these types of research, the aim is not to test a hypothesis about a broad population, but to develop an
initial understanding of a small or under-researched population.

1. Convenience sampling
A convenience sample simply includes the individuals who happen to be most accessible to
the researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the
sample is representative of the population, so it can’t produce generalizable results. Convenience
samples are at risk for both sampling bias and selection bias.
Example: Convenience sampling. You are researching opinions about student support services
in your university, so after each of your classes, you ask your fellow students to complete a survey on
the topic. This is a convenient way to gather data, but as you only surveyed students taking the same
classes as you at the same level, the sample is not representative of all the students at your university.
2. Voluntary response sampling
Similar to a convenience sample, a voluntary response sample is mainly based on ease of
access. Instead of the researcher choosing participants and directly contacting them, people volunteer
themselves (e.g. by responding to a public online survey).
Voluntary response samples are always at least somewhat biased, as some people will
inherently be more likely to volunteer than others, leading to self-selection bias.
Example: Voluntary response sampling. You send out the survey to all students at your
university and a lot of students decide to complete it. This can certainly give you some insight into the
topic, but the people who responded are more likely to be those who have strong opinions about the
student support services, so you can’t be sure that their opinions are representative of all students.
3. Purposive sampling
This type of sampling, also known as judgement sampling, involves the researcher using their
expertise to select a sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge
about a specific phenomenon rather than make statistical inferences, or where the population is very
small and specific. An effective purposive sample must have clear criteria and rationale for inclusion.
Always make sure to describe your inclusion and exclusion criteria and beware of observer
bias affecting your arguments.
Example: Purposive sampling. You want to know more about the opinions and experiences of
disabled students at your university, so you purposefully select a number of students with different
support needs in order to gather a varied range of data on their experiences with student services.
4. Snowball sampling
If the population is hard to access, snowball sampling can be used to recruit participants via
other participants. The number of people you have access to “snowballs” as you get in contact with
more people. The downside here is also representativeness, as you have no way of knowing how
representative your sample is due to the reliance on participants recruiting others. This can lead
to sampling bias.
Example: Snowball sampling. You are researching experiences of homelessness in your city.
Since there is no list of all homeless people in the city, probability sampling isn’t possible. You meet
one person who agrees to participate in the research, and she puts you in contact with other homeless
people that she knows in the area.
5. Quota sampling
Quota sampling relies on the non-random selection of a predetermined number or proportion
of units. This is called a quota.
You first divide the population into mutually exclusive subgroups (called strata) and then
recruit sample units until you reach your quota. These units share specific characteristics, determined
by you prior to forming your strata. The aim of quota sampling is to control what or who makes up
your sample.
Example: Quota sampling. You want to gauge consumer interest in a new produce delivery
service in Boston, focused on dietary preferences. You divide the population into meat eaters,
vegetarians, and vegans, drawing a sample of 1000 people. Since the company wants to cater to all
consumers, you set a quota of 200 people for each dietary group. In this way, all dietary preferences
are equally represented in your research, and you can easily compare these groups. You continue
recruiting until you reach the quota of 200 participants for each subgroup.

Estimating population parameters



SAMPLING ERROR:
Sampling error is a fundamental concept in statistics that refers to the discrepancy between a sample
statistic and the true population parameter it aims to estimate. It arises from the fact that we are
observing only a subset of the population rather than the entire population itself. Understanding the
nature and implications of sampling error is essential for researchers, analysts, and decision-makers
across various fields.
Sampling error encompasses random fluctuations that occur when different samples are drawn from
the same population. It reflects the variability inherent in the sampling process and impacts the
accuracy and reliability of research findings. By recognizing and quantifying sampling errors,
researchers can assess the precision of their estimates and make informed decisions based on the
reliability of the data.
Importance of Understanding Sampling Error
Understanding sampling error is crucial for several reasons:
 Validity of Inferences: Sampling error directly affects the validity of statistical inferences
drawn from sample data. Researchers must recognize the potential for error and assess its
impact on the reliability of research findings.
 Precision of Estimates: Sampling error quantifies the uncertainty associated with sample
estimates of population parameters. Recognizing the magnitude of sampling error helps
researchers gauge the accuracy of their estimates and establish confidence intervals around
them.
 Data Quality Assurance: Awareness of sampling error prompts researchers to implement
appropriate sampling techniques and validation procedures to minimize error and ensure the
quality and integrity of research data.
 Decision-Making Confidence: Decision-makers rely on accurate and reliable data to make
informed decisions. Understanding sampling error provides decision-makers with insights
into data reliability and enhances their confidence in using research findings to inform
policies, strategies, and actions.
Types of Sampling Error
Sampling error can take various forms, each with unique characteristics and implications for data
analysis. Understanding these types is crucial for effectively addressing and mitigating their impact on
research outcomes.
Random Sampling Error
Random sampling error occurs when the sample selected for analysis is not perfectly representative of
the entire population due to chance. Despite careful selection procedures, there is always a degree of
randomness inherent in sampling. This randomness can lead to fluctuations in sample characteristics
compared to the population parameters.
Example:
Suppose you're conducting a survey on voting preferences in a city. You randomly select 500
individuals from the voter registry to participate. However, due to chance, your sample ends up with
slightly more young voters compared to the population. This discrepancy is a result of a random
sampling error.

To minimize random sampling error:


 Increase Sample Size: Larger samples reduce the impact of random fluctuations, leading to
more reliable estimates of population parameters.
 Randomization Techniques: Employ randomization methods such as simple random
sampling or stratified random sampling so that every member of the population has a
known, non-zero chance of being included in the sample.
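The effect of sample size on random sampling error can be illustrated with a small simulation. This is a sketch under invented assumptions: a hypothetical population of 100,000 voters of whom 40% are young, and 300 repeated samples at each size.

```python
import random
import statistics

rng = random.Random(123)  # fixed seed only for reproducibility

# Hypothetical population: 100,000 voters, 40% of whom are "young" (coded 1).
population = [1] * 40_000 + [0] * 60_000

def spread_of_estimates(n, repeats=300):
    """Standard deviation of the sample proportion across repeated random samples of size n."""
    estimates = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(estimates)

small = spread_of_estimates(50)
large = spread_of_estimates(2000)
print(f"spread with n=50: {small:.3f}   spread with n=2000: {large:.3f}")
```

The estimates from the larger samples cluster much more tightly around the true 40%, which is what "reducing the impact of random fluctuations" means in practice.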
Systematic Sampling Error
Systematic sampling error occurs when there is a consistent bias in the selection process, leading to
results that consistently overestimate or underestimate the true population parameters. Unlike random
sampling error, which is due to chance, systematic error arises from flaws in the sampling
methodology or data collection process.
Example:
Imagine you're conducting a survey on household income levels in a country. Instead of randomly
selecting households, you only survey individuals from urban areas, inadvertently excluding rural
populations. As a result, your sample systematically underrepresents low-income households, leading
to an overestimation of average income levels.

To mitigate systematic sampling error:


 Diversify Sampling Methods: Use a combination of sampling techniques (e.g., stratified
sampling, cluster sampling) to ensure a more representative sample.
 Validate Sampling Frame: Thoroughly assess the sampling frame to ensure it accurately
reflects the entire population, addressing any biases or omissions.
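Unlike random error, a systematic error does not shrink when the sample grows. The sketch below mimics the household income example with invented numbers (incomes in arbitrary units, urban households richer on average than rural ones) and samples from an urban-only frame.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed only for reproducibility

# Hypothetical incomes: urban households earn more on average than rural ones.
urban = [rng.gauss(60, 10) for _ in range(6000)]
rural = [rng.gauss(35, 10) for _ in range(4000)]
population = urban + rural
true_mean = statistics.fmean(population)

# Flawed design: the sampling frame covers urban households only.
for n in (100, 1000, 5000):
    biased_estimate = statistics.fmean(rng.sample(urban, n))
    # The gap to the true mean stays roughly the same however large n becomes.
    print(n, round(biased_estimate - true_mean, 1))
```

Increasing the sample size only makes the estimate a more precise measurement of the wrong quantity; the remedy is to fix the sampling frame.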
Non-Sampling Error
Non-sampling error encompasses errors that are not directly related to the sampling process but can
still impact the accuracy of research findings. These errors can arise from various sources, including
data collection, measurement, and processing.

Non-sampling errors can manifest in different forms:


 Measurement Error: Inaccuracies or inconsistencies in measuring variables of interest,
leading to distorted results.
 Selection Bias: Systematic differences between individuals or units included in the sample
and those excluded, resulting in biased estimates.
 Non-Response Bias: This occurs when individuals chosen for the sample do not respond to
the survey, potentially skewing the results.
 Coverage Error: Arises when certain segments of the population are not adequately
represented in the sampling frame, leading to a biased sample.
Understanding and distinguishing between sampling and non-sampling errors is essential for
accurately interpreting research findings and implementing appropriate corrective measures. While
sampling errors can be minimized through careful sampling techniques, addressing non-sampling
errors often requires rigorous validation procedures and data quality checks throughout the research
process.
Examples of Sampling Errors
Understanding sampling error through real-world examples can provide valuable insights into its
effects on research outcomes and decision-making processes. Let's explore a range of examples that
illustrate different scenarios where sampling errors may arise.
Example 1: Political Polling
Consider a scenario where a polling organization conducts a survey to estimate the proportion of
voters supporting a particular candidate in an upcoming election. Due to limitations in resources and
time, the organization selects a random sample of registered voters from a specific geographic region.
However, the sample inadvertently overrepresents urban areas and underrepresents rural areas.

Effect: The survey results may reflect a higher level of support for the candidate than what exists in
the entire population. This discrepancy arises from sampling error, as the sample fails to accurately
represent the demographic diversity and voting preferences of the entire electorate.
Example 2: Quality Control in Manufacturing
In a manufacturing plant, quality control inspectors conduct random inspections of finished products
to assess their compliance with quality standards. However, due to time constraints, inspectors tend to
focus more on products from certain production lines or shifts.

Effect: Sampling error may occur if products from certain production lines or shifts exhibit different
quality characteristics than those from others. As a result, the sampled products may not accurately
represent the overall quality of the entire production process, leading to biased quality assessments
and potentially overlooking quality issues.
Example 3: Public Health Surveys
A public health agency conducts a survey to estimate the prevalence of a specific health condition in a
community. The agency randomly selects households from a list of residential addresses and invites
residents to participate in the survey. However, some residents decline to participate due to privacy
concerns or other reasons.
Effect: Non-response bias (strictly a non-sampling error, as classified above) may distort the
results if the individuals who decline to participate differ systematically from those who agree to
participate. Depending on the characteristics of non-respondents, the survey results may
underestimate or overestimate the true prevalence of the health condition in the community.
Example 4: Market Research
A market research firm conducts a survey to gather feedback on a new product launch. The firm
distributes online surveys to a random sample of customers who have purchased similar products in
the past. However, respondents who choose to participate may have stronger opinions or different
purchasing behaviors than those who do not participate.
Effect: Self-selection bias may distort the results if the opinions and behaviors of survey
respondents differ systematically from those of non-respondents. The survey results may overstate or
understate the level of interest or satisfaction with the new product, affecting the validity of market
research insights.
Sampling Distribution
The sampling distribution is a central concept in inferential statistics. A sampling distribution is
the probability distribution of a statistic (such as the mean or standard deviation) that is calculated
from multiple random samples of a given population. It helps us to understand how a statistic varies
across different samples and is crucial for making inferences about the population.

Because it describes samples of a fixed, finite size, it is also known as a finite-sample distribution.
Important Terminologies in Sampling Distribution
Some important terminologies related to sampling distribution are given below:
 Statistic: Summary value from a sample (e.g., mean, median).
 Parameter: Summary value from a population.
 Sample: A subset of a population.
 Population: The entire group being studied.
 Sampling Distribution: Distribution of a statistic across many samples.
 Central Limit Theorem (CLT): The distribution of sample means approaches a normal
distribution as the sample size increases.
 Standard Error: Standard deviation of the sampling distribution.
 Bias: Systematic error causing deviation from the true value.
 Confidence Interval: Range likely to contain the population parameter.
 Sampling Method: How samples are chosen (random, stratified, etc.).
 Inferential Statistics: Drawing conclusions about a population from samples.
 Hypothesis Testing: Making decisions about population parameters using sample data.
Factors Influencing Sampling Distribution
The variability of a sampling distribution is measured by its variance or by its standard deviation,
which is called the standard error. Both measure how spread out the values of the statistic are
around their mean.
Main factors influencing the variability of a sampling distribution are:
1. Number Observed in a Population: The symbol for this variable is "N." It is the number of
observations in the whole population.
2. Number Observed in Sample: The symbol for this variable is "n." It is the number of
observations in a random sample drawn from that population.
3. Method of Choosing Sample: How the samples are chosen can account for variability in
some cases.
Types of Distributions
The three main types of sampling distributions are:
 Sampling Distribution of Mean
 Sampling Distribution of Proportion
 T-Distribution
Sampling Distribution of Mean
The sampling distribution of the mean refers to the probability distribution of sample means that you
get by repeatedly taking samples (of the same size) from a population and calculating the mean of
each sample.
Key concepts of Sampling Distribution of Mean
 Population Mean (μ): The average of the entire population.
 Sample Mean (x̄ ): The average of a sample taken from the population.
 Sampling Distribution of the Mean: If you take multiple samples and plot their means, that
plot will form the sampling distribution of the mean.
For any population with mean µ and standard deviation σ:
 Mean, or center of the sampling distribution of x̄, is equal to the population mean, µ.
$$\mu_{\bar{x}} = \mu$$

There is no tendency for a sample mean to fall systematically above or below µ, even if the
distribution of the raw data is skewed. Thus, the mean of the sampling distribution is an unbiased
estimate of the population mean µ.
 Standard deviation of the sampling distribution is σ/√n, where n is the sample size.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Where
 σx̄ - Standard deviation of the sampling distribution (the standard error)
 σ - Population standard deviation
 n - Sample size
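Both properties of the sampling distribution of the mean can be checked numerically. The sketch below is an illustration only: it assumes a hypothetical, deliberately skewed population (exponential with mean 1), samples of size 40, and 4000 repeated samples.

```python
import math
import random
import statistics

rng = random.Random(2024)  # fixed seed only for reproducibility

# Hypothetical skewed population: exponential with mean 1.
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 40
# Sampling with replacement (rng.choices) keeps the draws independent,
# which is the setting in which the standard error equals sigma / sqrt(n).
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(4000)]

print("mean of sample means:", round(statistics.fmean(sample_means), 3),
      "vs mu:", round(mu, 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 3),
      "vs sigma/sqrt(n):", round(sigma / math.sqrt(n), 3))
```

Even though the raw data are skewed, the sample means center on µ and their spread is close to σ/√n.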

Sampling Distribution of Proportion


Sampling distribution of a proportion focuses on proportions in a population. Here, you select samples
and calculate their corresponding proportions. The mean of the sample proportions across samples
estimates the proportion in the entire population.
The sample proportion (often denoted as p̂) is calculated as:
p̂ = x/n
Where:
 p̂ is the Sample Proportion
 x is the Number of "successes" or occurrences of the Event of Interest in the Sample
 n is Sample Size
This formula calculates the proportion of occurrences of a certain event (e.g., success, positive
outcome) within a sample.
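A short sketch of the calculation, using invented counts. The standard error line uses the usual estimate √(p̂(1 − p̂)/n), which is a standard result but is not stated in the notes above.

```python
import math

x = 130  # hypothetical number of "successes" in the sample
n = 400  # hypothetical sample size

p_hat = x / n  # sample proportion, p̂ = x / n

# Estimated standard error of p̂ (standard result, added here for illustration).
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

print(p_hat, round(standard_error, 4))
```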
T-Distribution
The t-distribution is used when the sample is small or when little is known about the population, in
particular when the population standard deviation is unknown. It is used to estimate the mean of the
population and in related procedures such as confidence intervals, tests of statistical differences,
and linear regression. T-distribution uses a t-score to evaluate data that wouldn't be appropriate
for a normal distribution.
The formula for the t-score, denoted as t, is:
t = [x̄ - μ] / [s /√(n)]
Where:
 x̄ is the Sample Mean
 μ is Population Mean (or an estimate of it)
 s is the Sample Standard Deviation
 n is Sample Size
This formula calculates the difference between the sample mean and the population mean, scaled by
the standard error of the sample mean. The t-score helps to assess whether the observed difference
between the sample and population means is statistically significant.
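The t-score formula can be applied directly, as in the sketch below. The eight measurements and the hypothesised mean of 12.0 are invented for illustration; judging significance would additionally require comparing the result with a t-table value for n − 1 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample of 8 measurements and a hypothesised population mean.
sample = [12.1, 11.6, 12.4, 12.9, 11.8, 12.3, 12.7, 12.2]
mu_0 = 12.0

n = len(sample)
x_bar = statistics.fmean(sample)  # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 in the denominator)

# t = (x̄ - μ) / (s / √n)
t_score = (x_bar - mu_0) / (s / math.sqrt(n))

print(round(x_bar, 3), round(s, 3), round(t_score, 3), "df =", n - 1)
```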
Solved Examples of Sampling Distribution
Example 1: Mean and standard deviation of the tax value of all vehicles registered in a certain state
are μ=$13,525 and σ=$4,180. Suppose random samples of size 100 are drawn from the population of
vehicles.
Find
 mean μx̄
 standard deviation σx̄ of the sample mean x̄
Solution:
Since n = 100, the formulas yield
μx̄ = μ = $13,525
σx̄ = σ / √n = $4180 / √100
σx̄ = $418
Example 2: A prototype automotive tire has a design life of 38,500 miles with a standard deviation of
2,500 miles. Five such tires are manufactured and tested. On the assumption that the actual population
mean is 38,500 miles and the actual population standard deviation is 2,500 miles, find the probability
that the sample mean will be less than 36,000 miles. Assume that the distribution of lifetimes of such
tires is normal.
Solution:
Here, we use units of thousands of miles.
Then sample mean x̄ has
 Mean: μx̄ = μ = 38.5
 Standard Deviation: σx̄ = σ/√n = 2.5/√5 = 1.11803
Since the population is normally distributed, so is x̄, hence,
P(x̄ < 36) = P(Z < {36 - μx̄}/σx̄)
P(x̄ < 36) = P(Z < {36 - 38.5}/1.11803)
P(x̄ < 36) = P(Z < -2.24)
P(x̄ < 36) = 0.0125
Therefore, if the tires perform as designed then there is only about a 1.25% chance that the average of
a sample of this size would be so low.
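The tire calculation can be reproduced with Python's standard library. Because the sketch keeps the unrounded z-score (about -2.236 rather than -2.24), the probability comes out marginally above the table value of 0.0125.

```python
import math
from statistics import NormalDist

mu, sigma, n = 38.5, 2.5, 5            # thousands of miles, from Example 2
standard_error = sigma / math.sqrt(n)  # 2.5 / sqrt(5), about 1.118

z = (36 - mu) / standard_error         # standardised value of the threshold
probability = NormalDist().cdf(z)      # P(Z < z) for the standard normal

print(round(z, 3), round(probability, 4))
```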

Interval estimation is a fundamental concept in statistics that involves estimating a range within
which a population parameter is expected to lie, rather than providing a single-point estimate. This
method provides a more comprehensive understanding of the potential values of the parameter,
offering insight into the precision and reliability of the estimate.
By considering the variability in sample data, interval estimation helps in making more informed
decisions based on statistical analysis. Whether we are dealing with means, proportions, or
variances, interval estimation plays a crucial role in inferential statistics, allowing researchers to
quantify the uncertainty in their estimates.
What is Interval Estimation?
Interval estimation refers to the statistical technique used to estimate a population parameter by
calculating an interval within which the parameter is expected to fall with a specified level of
confidence. This interval is known as the confidence interval. The width of the interval reflects the
precision of the estimate: narrower intervals indicate more precise estimates, while wider intervals
suggest greater uncertainty.
Types of Interval Estimation
There are three types of interval:
 Confidence Interval
 Prediction Interval
 Tolerance Interval
Confidence Interval
A Confidence Interval (CI) is a range of values derived from the sample statistics that is likely to
contain the value of an unknown population parameter. It provides an estimate of the parameter's
value with a certain level of confidence, typically expressed as a percentage. The interval is
constructed so that if the same procedure is repeated numerous times, a specified percentage of the
intervals would contain the parameter.
For the Population Mean μ with Known Population Standard Deviation σ:
$$CI = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
 x̄: Sample mean
 Zα/2: Z-score corresponding to the desired confidence level
 σ: Population standard deviation
 n: Sample size
For the Population Mean μ with Unknown Population Standard Deviation (estimated by the sample
standard deviation s):
$$CI = \bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$$
 tα/2: The t-distribution value corresponding to the desired confidence level and degrees of
freedom df = n - 1
 s: Sample standard deviation
For the Population Proportion p:
$$CI = \hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
 p̂: Sample proportion
Example:
Suppose we are estimating the average height of adult males in a city. We take a random sample and
calculate the mean height as 175 cm. We also calculate the 95% confidence interval for the mean
height to be between 173 cm and 177 cm. This means we can be 95% confident that the true
average height of all adult males in the city lies within this interval of 173 cm to 177 cm.
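The z-based confidence intervals for a mean (known σ) and for a proportion can be sketched as below. The function names are illustrative, and the inputs σ = 10 cm and n = 96 are assumptions chosen so that the result lands close to the 173 cm to 177 cm interval of the height example; the notes do not state them.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """CI for the mean with known population sd: x_bar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical inputs: mean height 175 cm, sigma = 10 cm, n = 96 men.
low, high = z_confidence_interval(175, 10, 96)
print(round(low, 1), round(high, 1))

# Hypothetical poll: 50% support in a sample of 100.
p_low, p_high = proportion_confidence_interval(0.5, 100)
print(round(p_low, 3), round(p_high, 3))
```

The t-based interval has the same shape, with the critical value taken from the t-distribution with n − 1 degrees of freedom instead of the normal distribution.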
Prediction Interval
A Prediction Interval (PI) is a range of values that is likely to contain the value of a single new
observation from the given population. Unlike a confidence interval, which estimates a parameter, a
prediction interval estimates where a new data point will fall, accounting for both the variability in
the data and the uncertainty about the parameter estimates.
For a Single New Observation:
$$PI = \bar{x} \pm t_{\alpha/2} \times s \times \sqrt{1 + \frac{1}{n}}$$
 $t_{\alpha/2}$: t-distribution value corresponding to the desired confidence level
 $s$: Sample standard deviation
 $n$: Sample size
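The sketch below evaluates the single-observation formula and compares its half-width with that of the confidence interval for the mean. As before, the t critical value is assumed to come from a t-table, and the sample values (mean 175, s = 10, n = 25) and function name are illustrative.

```python
from math import sqrt


def prediction_interval(x_bar, s, n, t_crit):
    """Prediction interval for one new observation; t_crit is a table value for df = n - 1."""
    margin = t_crit * s * sqrt(1 + 1 / n)
    return x_bar - margin, x_bar + margin


x_bar, s, n, t_crit = 175, 10, 25, 2.064  # 95% level, df = 24
pi_low, pi_high = prediction_interval(x_bar, s, n, t_crit)

# Half-widths of the two intervals, for comparison.
ci_margin = t_crit * s / sqrt(n)
pi_margin = (pi_high - pi_low) / 2

print(f"95% PI: ({pi_low:.2f}, {pi_high:.2f})")
print(f"CI half-width: {ci_margin:.3f}, PI half-width: {pi_margin:.3f}")
```

The prediction interval is much wider because it has to cover the spread of individual observations, not only the uncertainty in the estimated mean.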
Example:
Imagine we have a dataset of students' test scores and have built a regression model to
predict future scores. If we predict a future student's test score based on their previous performance,
the prediction interval might be 65 to 85. This means there is a high probability that the new
student's actual score will fall within this range.
Tolerance Interval
A Tolerance Interval (TI) provides a range within which a specified proportion of the population falls
with a certain level of confidence. It accounts for the variability in the population as well as the
sample size, aiming to ensure that a specified percentage of the population is covered by the interval.
Tolerance intervals are useful for understanding the distribution and spread of the data within
the population.
General Form of Tolerance Interval:
$$TI = \bar{x} \pm k \times s$$
 $k$: Tolerance factor
 s: Sample standard deviation
Example:
Consider a manufacturing process that produces bolts with varying diameters. After taking a
sample, we calculate a tolerance interval with 95% confidence and 99% coverage for the diameters to
be between 4.95 mm and 5.05 mm. This means we are 95% confident that 99% of all future bolts
produced will have diameters within this range.
How to Calculate Interval Estimates?
Calculating Confidence Interval
The confidence interval (CI) for the mean when the population standard deviation is known is given
by:
$$CI = \bar{x} \pm z \left( \frac{\sigma}{\sqrt{n}} \right)$$
where $\bar{x}$ is the sample mean, $z$ is the z-score corresponding to the desired confidence level,
$\sigma$ is the population standard deviation, and $n$ is the sample size.
Calculating Prediction Interval
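The prediction interval builds on the confidence interval (CI) for the mean when the population
standard deviation is unknown, which is given by:
$$CI = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right)$$
where $\bar{x}$ is the sample mean, $t$ is the t-score corresponding to the desired confidence level
with n - 1 degrees of freedom, $s$ is the sample standard deviation, and $n$ is the sample size.
For a single new observation, the term $s/\sqrt{n}$ is replaced by $s\sqrt{1 + 1/n}$, which gives
the prediction interval:
$$PI = \bar{x} \pm t \cdot s \sqrt{1 + \frac{1}{n}}$$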
Calculating Tolerance Interval
The formula for the tolerance interval is more complex and depends on the desired confidence level
and the proportion of the population to be covered. For normal distributions it generally takes the form:
$$TI = \bar{x} \pm k \cdot s$$
where $k$ is taken from tolerance factor tables based on the sample size, the desired confidence level,
and the desired proportion.
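Once the tolerance factor has been looked up, the calculation itself is a one-liner. In the sketch below every number is hypothetical: the mean (5.00 mm), the standard deviation (0.02 mm) and especially k = 2.5 are placeholders, since the real k must come from a tolerance factor table for the actual sample size, confidence level and coverage.

```python
def tolerance_interval(x_bar, s, k):
    """Two-sided tolerance interval x_bar +/- k * s.

    k is the tolerance factor; it must be taken from a table (or dedicated
    software) for the given sample size, confidence level and coverage.
    """
    return x_bar - k * s, x_bar + k * s


# Hypothetical inputs for illustration only; k = 2.5 is NOT a table value.
low, high = tolerance_interval(x_bar=5.00, s=0.02, k=2.5)
print(f"Tolerance interval: ({low:.2f} mm, {high:.2f} mm)")
```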
Confidence Level and Margin of Error
 The Confidence Level indicates the degree of certainty that the interval contains the
population parameter. Common confidence levels are 90%, 95% and 99%.
 The Margin of Error is the half-width of the interval, that is, the largest expected distance
between the sample estimate and the true parameter at the chosen confidence level. It is
influenced by the confidence level and the sample size and can be calculated as:
$$\text{Margin of Error} = z \cdot \frac{\sigma}{\sqrt{n}}$$
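To see how the confidence level and the sample size drive the margin of error, the formula can be tabulated for a few cases. This is a sketch with an assumed σ of 10; the function name `margin_of_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, level):
    """Margin of error z * sigma / sqrt(n) for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)


# A higher confidence level widens the margin; a larger sample narrows it.
for level in (0.90, 0.95, 0.99):
    for n in (25, 100, 400):
        moe = margin_of_error(sigma=10, n=n, level=level)
        print(f"level={level:.2f}, n={n:>3}: margin of error = {moe:.3f}")
```

Because n appears under a square root, quadrupling the sample size only halves the margin of error.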
Solved Practice Questions
Question 1: Given a sample mean of 50, a population standard deviation of 10, and a sample
size of 100, calculate the 95% confidence interval for the population mean.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 50 \pm 1.96 \times \frac{10}{\sqrt{100}} = 50 \pm 1.96 \times 1 = 50 \pm 1.96$$
Confidence Interval = (48.04, 51.96)
Question 2: A sample of 25 students has a mean score of 80 with a standard deviation of 5.
Find the 90% confidence interval for the true mean score.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$
With df = 24, the t-value for a 90% confidence level is 1.711.
$$\text{Confidence Interval} = 80 \pm 1.711 \times \frac{5}{\sqrt{25}} = 80 \pm 1.711 \times 1 = 80 \pm 1.711$$
Confidence Interval = (78.289, 81.711)
Question 3: A factory tests the breaking strength of 50 samples of a material, with a mean
breaking strength of 3000 psi and a standard deviation of 100 psi. Calculate the 99% confidence
interval for the true mean breaking strength.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 3000 \pm 2.576 \times \frac{100}{\sqrt{50}} = 3000 \pm 2.576 \times 14.142 = 3000 \pm 36.43$$
Confidence Interval = (2963.57, 3036.43)
Question 4: Determine the 95% confidence interval for the mean of a sample with a
mean of 200, a sample size of 30, and a standard deviation of 15.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$
$$\text{Confidence Interval} = 200 \pm 2.045 \times \frac{15}{\sqrt{30}} = 200 \pm 2.045 \times 2.739 = 200 \pm 5.60$$
Confidence Interval = (194.40, 205.60)
Question 5: A researcher collects data from 40 participants and finds a mean score of 75
with a standard deviation of 8. What is the 99% confidence interval for the population mean?
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 75 \pm 2.576 \times \frac{8}{\sqrt{40}} = 75 \pm 2.576 \times 1.265 = 75 \pm 3.26$$
Confidence Interval = (71.74, 78.26)
Question 6: Calculate the 95% confidence interval for the average height of a population given
a sample mean of 175 cm, a standard deviation of 10 cm, and a sample size of 25.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$
$$\text{Confidence Interval} = 175 \pm 2.064 \times \frac{10}{\sqrt{25}} = 175 \pm 2.064 \times 2 = 175 \pm 4.128$$
Confidence Interval = (170.872 cm, 179.128 cm)
Question 7: A survey of 100 customers yields an average satisfaction score of 85 with a
standard deviation of 7. Find the 99% confidence interval for the true mean satisfaction
score.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 85 \pm 2.576 \times \frac{7}{\sqrt{100}} = 85 \pm 2.576 \times 0.7 = 85 \pm 1.8032$$
Confidence Interval = (83.1968, 86.8032)
Question 8: Determine the 95% confidence interval for the average number of hours
studied per week by a group of 50 students, given a mean of 20 hours and a standard
deviation of 4 hours.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 20 \pm 1.96 \times \frac{4}{\sqrt{50}} = 20 \pm 1.96 \times 0.566 = 20 \pm 1.11$$
Confidence Interval = (18.89 hours, 21.11 hours)
Question 9: Calculate the 90% confidence interval for the mean IQ score given a sample
mean of 110, a standard deviation of 15, and a sample size of 60.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$
$$\text{Confidence Interval} = 110 \pm 1.645 \times \frac{15}{\sqrt{60}} = 110 \pm 1.645 \times 1.9365 = 110 \pm 3.19$$
Confidence Interval = (106.81, 113.19)
Question 10: A sample of 40 plants has a mean height of 35 cm with a standard deviation
of 5 cm. Determine the 95% confidence interval for the true mean height of the plants.
Solution:
$$\text{Confidence Interval} = \bar{x} \pm t_{\alpha/2,\,n-1} \times \frac{s}{\sqrt{n}}$$
$$\text{Confidence Interval} = 35 \pm 2.023 \times \frac{5}{\sqrt{40}} = 35 \pm 2.023 \times 0.7906 = 35 \pm 1.599$$
Confidence Interval = (33.401 cm, 36.599 cm)
Prediction interval
If we are interested in capturing the uncertainty about the random variable y, then we should refer to
the prediction interval. In this case, we typically rely on the LLN and the assumed distribution for the
random variable y. For example, if we know that $y \sim N(\mu, \sigma^2)$, then based on our sample we
can construct a prediction interval of the width 1−α:
$$y \in \left(\bar{y} + z_{\alpha/2}\, s,\ \bar{y} + z_{1-\alpha/2}\, s\right), \tag{6.5}$$
where $z_{\alpha/2}$ is the z-statistic (quantile of the standard normal distribution) for the level α/2,
$\bar{y}$ is the sample estimate of μ and s is the sample estimate of σ. The graphical presentation of
such an interval is shown in Figure 6.11.

Figure 6.11: Artificial data, mean, confidence and prediction intervals.


Figure 6.11 shows the 95% prediction interval on two plots: the linear plot of values vs observation
id and the histogram. In both cases the prediction intervals are the dashed orange lines, lying
further away from the sample mean (the solid blue line). The two solid red lines around the mean
represent the 95% confidence interval for the mean (discussed in Section 6.4). As can be seen, the
prediction interval shows where 95% of observations are expected to lie. As a result, several
observations lie outside the bounds (given the sample of 100 observations, we would expect 5 of them
to lie outside, but this will vary from one sample to another). In contrast, the confidence interval shows
where the expectation of the population will lie in 95% of the cases, if the interval is constructed
many times for random samples.
Formula (6.5) relies on the assumption of normality. If it does not hold, the formula would
change. In a way, constructing the prediction interval comes down to taking the quantiles of the
assumed distribution based on the estimated parameters. In some cases, when some of the assumptions
do not hold, we might switch to more advanced methods for prediction interval construction.
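The idea that the interval is just a pair of quantiles of the fitted distribution can be sketched as follows. The data here are artificial (100 draws from a normal distribution with mean 100 and standard deviation 15, chosen arbitrarily), and the variable names are illustrative.

```python
import random
from statistics import NormalDist, fmean, stdev

# Artificial sample of 100 observations; the true parameters are arbitrary.
rng = random.Random(42)
y = [rng.gauss(100, 15) for _ in range(100)]

y_bar, s = fmean(y), stdev(y)  # sample estimates of mu and sigma
alpha = 0.05

# Quantiles of the fitted normal distribution N(y_bar, s^2).
fitted = NormalDist(mu=y_bar, sigma=s)
lower, upper = fitted.inv_cdf(alpha / 2), fitted.inv_cdf(1 - alpha / 2)

# The same bounds written as y_bar + z * s with standard normal quantiles.
z_low = NormalDist().inv_cdf(alpha / 2)
z_high = NormalDist().inv_cdf(1 - alpha / 2)

outside = sum(1 for value in y if value < lower or value > upper)
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
print(f"Observations outside the interval: {outside} of {len(y)}")
```

If a different distribution were assumed, the same steps would apply with that distribution's quantile function in place of the normal one.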
