Sampling Distributions in Data Analytics

The document discusses sampling distributions and the central limit theorem. It begins by defining key terms like population, sample, statistic, and sampling distribution. It then provides examples to illustrate concepts like the mean and standard deviation of the sample mean. The main points are that as sample size increases, the sampling distribution of the mean takes on a bell shape even if the population is not normally distributed, and for samples of 30 or more the sample mean is approximately normally distributed. It concludes with an example problem to demonstrate applying the central limit theorem.

Uploaded by

Jewel Galvez

Romblon State University

College of Engineering and Technology

Civil Engineering Department

Engineering Data Analysis

Chapter 6
Sampling Distributions

Prepared by:

Engr. Jeffy Jones F. Fetalvero

Lecturer
ENGINEERING DATA ANALYSIS SAMPLING DISTRIBUTIONS

Topic Overview

A statistic, such as the sample mean or the sample standard deviation, is a
number computed from a sample. Since a sample is random, every statistic is a
random variable: it varies from sample to sample in a way that cannot be
predicted with certainty. As a random variable it has a mean, a standard
deviation, and a probability distribution. The probability distribution of a statistic is
called its sampling distribution. Typically, sample statistics are not ends in
themselves, but are computed in order to estimate the corresponding population
parameters. This chapter introduces the concepts of the mean, the standard
deviation, and the sampling distribution of a sample statistic, with an emphasis on
the sample mean.

Intended Learning Outcomes

At the end of this chapter, the students are expected to:

1. Become familiar with the concept of the probability distribution of the
sample mean.
2. Understand the meaning of the formulas for the mean and standard
deviation of the sample mean.
3. Learn what the sampling distribution of 𝑋̅ is when the sample size is large.
4. Learn what the sampling distribution of 𝑋̅ is when the population is normal.
5. Understand the meaning of the formulas for the mean and standard
deviation of the sample proportion.
6. Learn what the sampling distribution of 𝑝̂ is when the sample size is large.


6.1: The Mean and Standard Deviation of the Sample Mean

Suppose we wish to estimate the mean μ of a population. In actual practice we
would typically take just one sample. Imagine however that we take sample after
sample, all of the same size n, and compute the sample mean 𝑥̅ each time. The
sample mean 𝑥̅ is a random variable: it varies from sample to sample in a way
that cannot be predicted with certainty. We will write 𝑋̅ when the sample mean
is thought of as a random variable, and write 𝑥̅ for the values that it takes. The
random variable 𝑋̅ has a mean, denoted 𝜇𝑋̅ , and a standard deviation, denoted
𝜎𝑋̅ . Here is an example with such a small population and small sample size that
we can actually write down every single sample.

Example 6.1.1
A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds.
Find all possible random samples with replacement of size two and compute the
sample mean for each one. Use them to find the probability distribution, the
mean, and the standard deviation of the sample mean 𝑋̅.

Solution
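The following table shows all possible samples with replacement of size two, along
with the mean of each:

Sample      Mean    Sample      Mean    Sample      Mean    Sample      Mean
152, 152    152     156, 152    154     160, 152    156     164, 152    158
152, 156    154     156, 156    156     160, 156    158     164, 156    160
152, 160    156     156, 160    158     160, 160    160     164, 160    162
152, 164    158     156, 164    160     160, 164    162     164, 164    164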

The table shows that there are seven possible values of the sample mean 𝑋̅. The
value 𝑥̅ = 152 happens only one way (the rower weighing 152 pounds must be
selected both times), as does the value 𝑥̅ = 164, but the other values happen
more than one way, hence are more likely to be observed than 152 and 164 are.
Since the 16 samples are equally likely, we obtain the probability distribution of
the sample mean just by counting:

𝑥̅       152    154    156    158    160    162    164
P(𝑥̅)    1/16   2/16   3/16   4/16   3/16   2/16   1/16


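For 𝜇𝑋̅ , we obtain

𝜇𝑋̅ = Σ 𝑥̅ P(𝑥̅) = 152(1/16) + 154(2/16) + 156(3/16) + 158(4/16) + 160(3/16) + 162(2/16) + 164(1/16) = 158

For 𝜎𝑋̅ , we first compute

Σ 𝑥̅² P(𝑥̅) = 152²(1/16) + 154²(2/16) + 156²(3/16) + 158²(4/16) + 160²(3/16) + 162²(2/16) + 164²(1/16)

which is 24,974, so that

𝜎𝑋̅ = √(Σ 𝑥̅² P(𝑥̅) − 𝜇𝑋̅²) = √(24,974 − 158²) = √10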

The mean and standard deviation of the population {152,156,160,164} in the
example are μ = 158 and 𝜎 = √20. The mean of the sample mean 𝑋̅ that we have
just computed is exactly the mean of the population. The standard deviation of
the sample mean 𝑋̅ that we have just computed is the standard deviation of the
population divided by the square root of the sample size: √10 = √20/√2. These
relationships are not coincidences, but are illustrations of the following formulas.
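The rowing example is small enough to check by brute force. The following is a minimal sketch, not part of the original module, that enumerates all 16 samples, builds the probability distribution of the sample mean by counting, and compares its mean and standard deviation with μ and σ/√n. The variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

weights = [152, 156, 160, 164]  # population from Example 6.1.1
n = 2                           # sample size

# All ordered samples of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(weights, repeat=n))

# Probability distribution of the sample mean: each sample is equally likely,
# so every occurrence of a mean contributes 1/16.
distribution = {}
for sample in samples:
    mean = Fraction(sum(sample), n)
    distribution[mean] = distribution.get(mean, 0) + Fraction(1, len(samples))

mu_xbar = sum(x * p for x, p in distribution.items())
second_moment = sum(x**2 * p for x, p in distribution.items())
sigma_xbar = sqrt(second_moment - mu_xbar**2)

# Population mean and standard deviation, for comparison.
mu = Fraction(sum(weights), len(weights))
sigma = sqrt(sum((w - mu) ** 2 for w in weights) / len(weights))

print(mu_xbar, second_moment)        # 158 24974
print(sigma_xbar, sigma / sqrt(n))   # both are sqrt(10), about 3.162
```

Exact fractions are used for the counting step so that the mean and the second moment come out as whole numbers rather than rounded floats.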

Suppose random samples of size n are drawn from a population with mean μ and
standard deviation σ. The mean 𝜇𝑋̅ and standard deviation 𝜎𝑋̅ of the sample
mean 𝑋̅ satisfy

𝜇𝑋̅ = 𝜇 and 𝜎𝑋̅ = 𝜎/√𝑛

The first equation says that if we could take every possible sample from the
population and compute the corresponding sample mean, then those numbers
would center at the number we wish to estimate, the population mean μ. The
second equation says that averages computed from samples vary less than
individual measurements on the population do, and quantifies the relationship.


Example 6.1.2
The mean and standard deviation of the tax value of all vehicles registered in a
certain state are μ = $13,525 and σ = $4,180. Suppose random samples of size 100
are drawn from the population of vehicles. What are the mean 𝜇𝑋̅ and standard
deviation 𝜎𝑋̅ of the sample mean 𝑋̅?

Solution
Since n = 100, the formulas yield

𝜇𝑋̅ = 𝜇 = $13,525 and 𝜎𝑋̅ = 𝜎/√𝑛 = $4,180/√100 = $418

6.2: The Sampling Distribution of the Sample Mean

In Example 6.1.1, we constructed the probability distribution of the sample mean
for samples of size two drawn from the population of four rowers. The probability
distribution is:

𝑥̅       152    154    156    158    160    162    164
P(𝑥̅)    1/16   2/16   3/16   4/16   3/16   2/16   1/16

Figure 6.2.1 shows a side-by-side comparison of a histogram for the original
population and a histogram for this distribution. Whereas the distribution of the
population is uniform, the sampling distribution of the mean has a shape
approaching the shape of the familiar bell curve. This phenomenon of the
sampling distribution of the mean taking on a bell shape even though the
population distribution is not bell-shaped happens in general. Here is a somewhat
more realistic example.

Figure 6.2.1. Distribution of a Population and a Sample Mean


Suppose we take samples of size 1, 5, 10, or 20 from a population that consists
entirely of the numbers 0 and 1, half the population 0, half 1, so that the
population mean is 0.5. The sampling distributions are:

Histograms illustrating these distributions are shown in Figure 6.2.2.
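The sampling distributions for this 0/1 population can be computed directly. The sketch below is an illustration only, assuming independent draws (sampling with replacement, or a population large enough that this is a fair approximation), so that the number of ones in a sample follows a binomial distribution. The function name sample_mean_distribution is hypothetical and does not come from the chapter.

```python
from math import comb

def sample_mean_distribution(n, p=0.5):
    """Return {x_bar: probability} for samples of size n from a 0/1 population.

    Assumes independent draws, so the number of ones in the sample is
    binomial(n, p) and the sample mean is that count divided by n.
    """
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for n in (1, 5, 10, 20):
    dist = sample_mean_distribution(n)
    print(f"n = {n}")
    for x_bar, prob in dist.items():
        print(f"  x_bar = {x_bar:.2f}  P = {prob:.4f}")
```

Printing the four distributions side by side makes it easy to see the probabilities at the two ends shrink relative to those in the middle as n grows.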

Figure 6.2.2. Distributions of the Sample Mean


As n increases the sampling distribution of 𝑋̅ evolves in an interesting way: the
probabilities on the lower and the upper ends shrink and the probabilities in the
middle become larger in relation to them. If we were to continue to increase n
then the shape of the sampling distribution would become smoother and more
bell-shaped.

What we are seeing in these examples does not depend on the particular
population distributions involved. In general, one may start with any population
distribution that has a finite mean and standard deviation, and the sampling
distribution of the sample mean will increasingly resemble the bell-shaped normal
curve as the sample size increases. This is the content of the Central Limit
Theorem.

The Central Limit Theorem

For samples of size 30 or more, the sample mean is approximately normally
distributed, with mean 𝜇𝑋̅ = 𝜇 and standard deviation 𝜎𝑋̅ = 𝜎/√𝑛, where n is the
sample size. The larger the sample size, the better the approximation. The Central
Limit Theorem is illustrated for several common population distributions in Figure
6.2.3.

Figure 6.2.3. Distribution of Populations and Sample Means


The dashed vertical lines in the figures locate the population mean. Regardless of
the distribution of the population, as the sample size is increased the shape of the
sampling distribution of the sample mean becomes increasingly bell-shaped,
centered on the population mean. Typically, by the time the sample size is 30 the
distribution of the sample mean is practically the same as a normal distribution.
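A small simulation can make this behavior concrete. The sketch below is an illustration under stated assumptions, not the author's method: the population is a hypothetical exponential distribution with mean 1 and standard deviation 1 (a strongly skewed shape), and the helper simulate_sample_means is an illustrative name. It repeatedly draws samples of size n and compares the spread of the resulting sample means with σ/√n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(draw, n, repetitions=20_000):
    """Draw `repetitions` samples of size n and return their sample means."""
    return [statistics.fmean(draw() for _ in range(n)) for _ in range(repetitions)]

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0

def draw():
    return random.expovariate(1.0)

for n in (1, 5, 30):
    means = simulate_sample_means(draw, n)
    print(
        f"n = {n:2d}  mean of sample means = {statistics.fmean(means):.3f}  "
        f"sd of sample means = {statistics.pstdev(means):.3f}  "
        f"sigma/sqrt(n) = {sigma / n ** 0.5:.3f}"
    )
```

Plotting a histogram of each list of means (for example with a plotting library of your choice) would show the skew fading and the bell shape emerging as n increases.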

The importance of the Central Limit Theorem is that it allows us to make probability
statements about the sample mean, specifically in relation to its value in
comparison to the population mean, as we will see in the examples. But to use
the result properly we must first realize that there are two separate random
variables (and therefore two probability distributions) at play:

1. X, the measurement of a single element selected at random from the
population; the distribution of X is the distribution of the population, with mean the
population mean μ and standard deviation the population standard deviation σ;

2. 𝑋̅, the mean of the measurements in a sample of size n; the distribution of 𝑋̅ is
its sampling distribution, with mean 𝜇𝑋̅ = 𝜇 and standard deviation 𝜎𝑋̅ = 𝜎/√𝑛.

Example 6.2.1
Let 𝑋̅ be the mean of a random sample of size 50 drawn from a population with
mean 112 and standard deviation 40.

1. Find the mean and standard deviation of 𝑋̅.
2. Find the probability that 𝑋̅ assumes a value between 110 and 114.
3. Find the probability that 𝑋̅ assumes a value greater than 113.

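Solution:
1. By the formulas in the previous section

𝜇𝑋̅ = 𝜇 = 112 and 𝜎𝑋̅ = 𝜎/√𝑛 = 40/√50 = 5.65685

2. Since the sample size is at least 30, the Central Limit Theorem applies: 𝑋̅ is
approximately normally distributed. We compute probabilities using the normal
distribution in the usual way, just being careful to use 𝜎𝑋̅ and not σ when we
standardize:

P(110 < 𝑋̅ < 114) = P((110 − 112)/5.65685 < Z < (114 − 112)/5.65685)
                  = P(−0.35 < Z < 0.35)
                  = 0.6368 − 0.3632
                  = 0.2736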


3. Similarly

P(𝑋̅ > 113) = P(Z > (113 − 112)/5.65685)
            = P(Z > 0.18)
            = 1 − P(Z < 0.18)
            = 1 − 0.5714
            = 0.4286

Note that if in the above example we had been asked to compute the probability
that the value of a single randomly selected element of the population exceeds
113, that is, to compute the number P(X > 113), we would not have been able to
do so, since we do not know the distribution of X, but only that its mean is 112 and
its standard deviation is 40. By contrast we could compute P(𝑋̅ > 113) even without
complete knowledge of the distribution of X because the Central Limit Theorem
guarantees that 𝑋̅ is approximately normal.
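The probabilities in Example 6.2.1 can also be checked numerically. This is a minimal sketch using Python's standard library class statistics.NormalDist; the variable names are illustrative. Because it does not round z to two decimal places, its results differ slightly in the third decimal from values read off a standard normal table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50          # values from Example 6.2.1

sigma_xbar = sigma / sqrt(n)        # standard deviation of the sample mean
xbar = NormalDist(mu, sigma_xbar)   # normal approximation from the Central Limit Theorem

p_between = xbar.cdf(114) - xbar.cdf(110)   # P(110 < X-bar < 114)
p_greater = 1 - xbar.cdf(113)               # P(X-bar > 113)

print(f"sigma_xbar = {sigma_xbar:.5f}")
print(f"P(110 < X-bar < 114) = {p_between:.4f}")
print(f"P(X-bar > 113) = {p_greater:.4f}")
```

Note that the distribution is built with sigma_xbar, not sigma. Using the population standard deviation here would answer a question about a single element X, whose distribution is unknown in this example.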

Normally Distributed Populations

The Central Limit Theorem says that no matter what the distribution of the
population is, as long as the sample is “large,” meaning of size 30 or more, the
sample mean is approximately normally distributed. If the population is normal to
begin with then the sample mean also has a normal distribution, regardless of the
sample size.

For samples of any size drawn from a normally distributed population, the sample
mean is normally distributed, with mean 𝜇𝑋̅ = 𝜇 and standard deviation 𝜎𝑋̅ = 𝜎/√𝑛,
where n is the sample size.

The effect of increasing the sample size is shown in Figure 6.2.4.


Figure 6.2.4: Distribution of Sample Means for a Normal Population

Example 6.2.2
An automobile battery manufacturer claims that its midgrade battery has a mean
life of 50 months with a standard deviation of 6 months. Suppose the distribution
of battery lives of this particular brand is approximately normal.

1. On the assumption that the manufacturer’s claims are true, find the probability
that a randomly selected battery of this type will last less than 48 months.
2. On the same assumption, find the probability that the mean of a random
sample of 36 such batteries will be less than 48 months.

Solution:
1. Since the population is known to have a normal distribution

P(X < 48) = P(Z < (48 − 50)/6) = P(Z < −0.33) = 0.3707


2. The sample mean has mean 𝜇𝑋̅ = 𝜇 = 50 and standard deviation 𝜎𝑋̅ = 𝜎/√𝑛 = 6/√36 = 1. Thus

P(𝑋̅ < 48) = P(Z < (48 − 50)/1) = P(Z < −2) = 0.0228
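The contrast between a single battery and the mean of 36 batteries in Example 6.2.2 can be reproduced in a few lines. This is a sketch only, again using statistics.NormalDist from the standard library, with illustrative variable names. The first probability is computed without rounding z, so it differs slightly in the third decimal from a table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 6, 36   # claimed mean life, standard deviation, sample size

single = NormalDist(mu, sigma)                 # distribution of one battery life X
sample_mean = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean

p_single = single.cdf(48)      # P(X < 48)
p_mean = sample_mean.cdf(48)   # P(X-bar < 48)

print(f"P(X < 48)     = {p_single:.4f}")
print(f"P(X-bar < 48) = {p_mean:.4f}")
```

The same cutoff of 48 months is far less likely for the sample mean than for an individual battery, because the sample mean has standard deviation 1 month rather than 6.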

6.3: The Sample Proportion

Often sampling is done in order to estimate the proportion of a population that
has a specific characteristic, such as the proportion of all items coming off an
assembly line that are defective or the proportion of all people entering a retail
store who make a purchase before leaving. The population proportion is denoted
p and the sample proportion is denoted 𝑝̂ . Thus, if in reality 43% of people entering
a store make a purchase before leaving,

p = 0.43

if in a sample of 200 people entering the store, 78 make a purchase,

𝑝̂ = 78/200 = 0.39

The sample proportion is a random variable: it varies from sample to sample in a
way that cannot be predicted with certainty. Viewed as a random variable it will
be written 𝑃̂. It has a mean 𝜇𝑃̂ and a standard deviation 𝜎𝑃̂ . Here are formulas for
their values.

Suppose random samples of size n are drawn from a population in which the
proportion with a characteristic of interest is p. The mean 𝜇𝑃̂ and standard
deviation 𝜎𝑃̂ of the sample proportion 𝑃̂ satisfy

𝜇𝑃̂ = p and 𝜎𝑃̂ = √(pq/n)


where q = 1 − p.

The Central Limit Theorem has an analogue for the sample proportion 𝑃̂ . To
see how, imagine that every element of the population that has the characteristic
of interest is labeled with a 1, and that every element that does not is labeled with
a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly
the proportion of the population with the special characteristic is the proportion
of the numerical population that are ones; in symbols,

p = (number of 1s)/N

where N is the number of elements in the population. But of course, the sum of all
the zeros and ones is simply the number of ones, so the mean μ of the numerical
population is

μ = Σx/N = (number of 1s)/N

Thus, the population proportion p is the same as the mean μ of the corresponding
population of zeros and ones. In the same way the sample proportion 𝑝̂ is the
same as the sample mean 𝑥̅ . Thus, the Central Limit Theorem applies to 𝑝̂ .
However, the condition that the sample be large is a little more complicated than
just being of size at least 30.

The Sampling Distribution of the Sample Proportion

For large samples, the sample proportion is approximately normally distributed,
with mean 𝜇𝑃̂ = p and standard deviation 𝜎𝑃̂ = √(pq/n).

A sample is large if the interval [p − 3𝜎𝑃̂, p + 3𝜎𝑃̂] lies wholly within the interval
[0,1].

In actual practice p is not known, hence neither is 𝜎𝑃̂ . In that case in order to
check that the sample is sufficiently large we substitute the known quantity 𝑝̂ for
p. This means checking that the interval

[𝑝̂ − 3√(𝑝̂(1 − 𝑝̂)/n), 𝑝̂ + 3√(𝑝̂(1 − 𝑝̂)/n)]


lies wholly within the interval [0,1]. This is illustrated in the examples.

Figure 6.3.1 shows that when p = 0.1, a sample of size 15 is too small but a sample
of size 100 is acceptable.

Figure 6.3.1: Distribution of Sample Proportions

Figure 6.3.2 shows that when p = 0.5 a sample of size 15 is acceptable.

Figure 6.3.2: Distribution of Sample Proportions for p = 0.5 and n = 15
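The large-sample condition is easy to check mechanically. The sketch below is an illustration with a hypothetical helper name, is_large_sample, applied to the three cases mentioned for Figures 6.3.1 and 6.3.2. When p is unknown, the same function can be called with the sample proportion in its place.

```python
from math import sqrt

def is_large_sample(p, n):
    """Check whether [p - 3*sigma, p + 3*sigma] lies wholly within [0, 1]."""
    sigma = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
    return 0 <= p - 3 * sigma and p + 3 * sigma <= 1

print(is_large_sample(0.1, 15))    # False: the interval extends below 0
print(is_large_sample(0.1, 100))   # True: the interval is [0.01, 0.19]
print(is_large_sample(0.5, 15))    # True
```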

Example 6.3.1
Suppose that in a population of voters in a certain region 38% are in favor of a
particular bond issue. Nine hundred randomly selected voters are asked if they
favor the bond issue.

1. Verify that the sample proportion 𝑝̂ computed from samples of size 900 meets
the condition that its sampling distribution be approximately normal.
2. Find the probability that the sample proportion computed from a sample of
size 900 will be within 5 percentage points of the true population proportion.


Solution:
1. The information given is that p = 0.38, hence q = 1 − p = 0.62. First, we use the
formulas to compute the mean and standard deviation of 𝑝̂ :

𝜇𝑃̂ = p = 0.38 and 𝜎𝑃̂ = √(pq/n) = √((0.38)(0.62)/900) = 0.01618

Then 3𝜎𝑃̂ = 3(0.01618) = 0.04854 ≈ 0.05 so

[p − 3𝜎𝑃̂, p + 3𝜎𝑃̂] = [0.38 − 0.05, 0.38 + 0.05] = [0.33, 0.43]

which lies wholly within the interval [0,1], so it is safe to assume that 𝑝̂ is
approximately normally distributed.

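2. To be within 5 percentage points of the true population proportion 0.38 means
to be between 0.38 − 0.05 = 0.33 and 0.38 + 0.05 = 0.43. Thus

P(0.33 < 𝑃̂ < 0.43) = P((0.33 − 0.38)/0.01618 < Z < (0.43 − 0.38)/0.01618)
                    = P(−3.09 < Z < 3.09)
                    = 0.9990 − 0.0010
                    = 0.9980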
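The probability in part 2 of Example 6.3.1 can be checked numerically as well. This is a minimal sketch with illustrative variable names, using statistics.NormalDist from the Python standard library and the normal approximation to the sampling distribution of the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.38, 900                      # population proportion and sample size

sigma_p_hat = sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
p_hat = NormalDist(p, sigma_p_hat)    # normal approximation for large samples

# Probability of landing within 5 percentage points of p.
prob = p_hat.cdf(p + 0.05) - p_hat.cdf(p - 0.05)

print(f"sigma_p_hat = {sigma_p_hat:.5f}")
print(f"P(0.33 < P-hat < 0.43) = {prob:.4f}")
```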



CEP233 M09 Meridian
No ratings yet
CEP233 M09 Meridian
11 pages
Statistics and Data Analysis Overview
No ratings yet
Statistics and Data Analysis Overview
39 pages
Continuous Random Variables Overview
No ratings yet
Continuous Random Variables Overview
17 pages
Solutions of Differential Equations
No ratings yet
Solutions of Differential Equations
7 pages
Instantaneous Center in Plane Motion
No ratings yet
Instantaneous Center in Plane Motion
23 pages
Moment of Inertia Calculations
No ratings yet
Moment of Inertia Calculations
4 pages
Chapter 2 2-002 (Basis)
No ratings yet
Chapter 2 2-002 (Basis)
2 pages
CIE 115 Civil Engineering Numerical Solutions Guide
No ratings yet
CIE 115 Civil Engineering Numerical Solutions Guide
88 pages
Engineering Math V2 by Gillesania1 PDF
No ratings yet
Engineering Math V2 by Gillesania1 PDF
238 pages
Dynamics of Rigid Bodies Overview
No ratings yet
Dynamics of Rigid Bodies Overview
11 pages
Statistical Intervals in Engineering Analysis
No ratings yet
Statistical Intervals in Engineering Analysis
113 pages
Binomial Distribution Problems and Solutions
No ratings yet
Binomial Distribution Problems and Solutions
2 pages
Highway and Railroad Engineering Overview
No ratings yet
Highway and Railroad Engineering Overview
20 pages
Numerical Solutions for CE Quiz 1
No ratings yet
Numerical Solutions for CE Quiz 1
19 pages
Linear Functions of Random Variables
No ratings yet
Linear Functions of Random Variables
41 pages
Engineering Data Analysis Problem Set 1
0% (1)
Engineering Data Analysis Problem Set 1
2 pages
Importance of Engineering Economy in Civil Engineering
No ratings yet
Importance of Engineering Economy in Civil Engineering
29 pages
Z-Test Fundamentals and Applications
No ratings yet
Z-Test Fundamentals and Applications
4 pages
ES 209 Engineering Data Analysis Quiz
No ratings yet
ES 209 Engineering Data Analysis Quiz
3 pages
Joint Probability Distributions: Chapter Outline
No ratings yet
Joint Probability Distributions: Chapter Outline
12 pages
Differential Equations Exam Review
No ratings yet
Differential Equations Exam Review
8 pages
Sinusoidal Voltage and AC Circuit Analysis
No ratings yet
Sinusoidal Voltage and AC Circuit Analysis
4 pages
Differential Equations Examples and Problems
No ratings yet
Differential Equations Examples and Problems
9 pages
First Order Linear Differential Equations
No ratings yet
First Order Linear Differential Equations
8 pages
Road Patterns in Highway Engineering
No ratings yet
Road Patterns in Highway Engineering
51 pages
Incremental Search Method in MATLAB
No ratings yet
Incremental Search Method in MATLAB
7 pages
Velocity of Projected Particle A
No ratings yet
Velocity of Projected Particle A
4 pages
Geometrical Design of Horizontal Curves
No ratings yet
Geometrical Design of Horizontal Curves
90 pages
Engineering Data Analysis Overview
No ratings yet
Engineering Data Analysis Overview
38 pages
Statics Problems with Solutions
50% (2)
Statics Problems with Solutions
16 pages
Civil Engineering Laws and Ethics Syllabus
No ratings yet
Civil Engineering Laws and Ethics Syllabus
24 pages
Understanding Stadia Surveying Methods
No ratings yet
Understanding Stadia Surveying Methods
6 pages
Differential Equations of Curves
No ratings yet
Differential Equations of Curves
14 pages
Engineering Data Analysis and Statistics
No ratings yet
Engineering Data Analysis and Statistics
32 pages
Simple Curves - Surveying and Transportation Engineering
No ratings yet
Simple Curves - Surveying and Transportation Engineering
7 pages
Rectilinear Motion Overview and Examples
No ratings yet
Rectilinear Motion Overview and Examples
18 pages
Centroid and Moment of Inertia Calculations
No ratings yet
Centroid and Moment of Inertia Calculations
9 pages
CMT Properties and Testing Methods
No ratings yet
CMT Properties and Testing Methods
8 pages
Understanding Shearing Stress in Mechanics
No ratings yet
Understanding Shearing Stress in Mechanics
15 pages
Engineering Data Analysis Course Modules
No ratings yet
Engineering Data Analysis Course Modules
19 pages
Engineering Data Analysis Problem Set
No ratings yet
Engineering Data Analysis Problem Set
8 pages
ECE 069 Engineering Data Analysis Quiz
No ratings yet
ECE 069 Engineering Data Analysis Quiz
4 pages
Linear Interpolation Examples and Methods
No ratings yet
Linear Interpolation Examples and Methods
10 pages
Dota 2021 AniMajor Statistics Analysis
No ratings yet
Dota 2021 AniMajor Statistics Analysis
2 pages
Engineering Economy Review: Interest & Annuities
No ratings yet
Engineering Economy Review: Interest & Annuities
2 pages
Symmetrical Parabolic Curve Analysis
0% (1)
Symmetrical Parabolic Curve Analysis
13 pages
Selecting Lightest W-Shape Beams
No ratings yet
Selecting Lightest W-Shape Beams
20 pages
Understanding Torsion in Engineering
No ratings yet
Understanding Torsion in Engineering
16 pages
Greenberg and Greenshields Traffic Models
No ratings yet
Greenberg and Greenshields Traffic Models
8 pages
Radioactive Decay and Its Applications
No ratings yet
Radioactive Decay and Its Applications
4 pages
Physics of Motion: Key Concepts and Problems
No ratings yet
Physics of Motion: Key Concepts and Problems
21 pages
Methods for Measuring Horizontal Distances
No ratings yet
Methods for Measuring Horizontal Distances
22 pages
Hydrology Assignment: Virgin Flow Analysis
No ratings yet
Hydrology Assignment: Virgin Flow Analysis
1 page
Probability in Engineering Data Analysis
No ratings yet
Probability in Engineering Data Analysis
16 pages
Rounding Numbers to Significant Figures
No ratings yet
Rounding Numbers to Significant Figures
38 pages
Joint Probability Distributions Overview
No ratings yet
Joint Probability Distributions Overview
33 pages
Statistical Analysis Methods Overview
No ratings yet
Statistical Analysis Methods Overview
11 pages
Civil Engineering Sampling Methods
No ratings yet
Civil Engineering Sampling Methods
30 pages
Sampling Distributions and Estimation
No ratings yet
Sampling Distributions and Estimation
13 pages
Understanding Sampling Distributions
No ratings yet
Understanding Sampling Distributions
7 pages
Bias-Variance Decomposition Explained
No ratings yet
Bias-Variance Decomposition Explained
55 pages
Backtesting Investment Strategies Guide
No ratings yet
Backtesting Investment Strategies Guide
30 pages
Summer 2012 ST107 Statistics Exam Paper
No ratings yet
Summer 2012 ST107 Statistics Exam Paper
12 pages
Bentler and Bonett, 1980
No ratings yet
Bentler and Bonett, 1980
19 pages
Moving Human Detection and Tracking From Thermal Video Through Intelligent Surveillance System For Smart Applications
No ratings yet
Moving Human Detection and Tracking From Thermal Video Through Intelligent Surveillance System For Smart Applications
21 pages
NIST Uncertainty Machine User Manual
No ratings yet
NIST Uncertainty Machine User Manual
19 pages
SPC vs. Acceptance Sampling Explained
No ratings yet
SPC vs. Acceptance Sampling Explained
8 pages
JMeter Timers: Optimize Request Delays
No ratings yet
JMeter Timers: Optimize Request Delays
36 pages
Normal Distribution Problem Solving Guide
No ratings yet
Normal Distribution Problem Solving Guide
29 pages
Asymptotic Normality of OLS Estimators
No ratings yet
Asymptotic Normality of OLS Estimators
10 pages
Benford's Law: Impact of Scatter & Regularity
No ratings yet
Benford's Law: Impact of Scatter & Regularity
18 pages
FRK: R Package for Spatial Prediction
No ratings yet
FRK: R Package for Spatial Prediction
41 pages
Homogeneity and Normality Tests in Statistics
No ratings yet
Homogeneity and Normality Tests in Statistics
6 pages
When to Use Mean, Median, or Mode
No ratings yet
When to Use Mean, Median, or Mode
26 pages
Introduction to Elementary Statistics
No ratings yet
Introduction to Elementary Statistics
60 pages
Statistical Analysis of Employee Data
No ratings yet
Statistical Analysis of Employee Data
9 pages
Buonopane e Schafer (2006) - Reliability Hot-Rolled
No ratings yet
Buonopane e Schafer (2006) - Reliability Hot-Rolled
10 pages
Cognitive Radio Capacity in Fading Environments
No ratings yet
Cognitive Radio Capacity in Fading Environments
9 pages
Measurement Uncertainty in Conformity Assessment
No ratings yet
Measurement Uncertainty in Conformity Assessment
14 pages
Understanding Normal Distribution
No ratings yet
Understanding Normal Distribution
31 pages
Normal Distribution and Hypothesis Testing
No ratings yet
Normal Distribution and Hypothesis Testing
5 pages
7 Anthropometry and Workplace Design
No ratings yet
7 Anthropometry and Workplace Design
22 pages
CA Foundation Mock Test Paper 3
No ratings yet
CA Foundation Mock Test Paper 3
19 pages
Pavement Condition and Roughness Index Analysis
No ratings yet
Pavement Condition and Roughness Index Analysis
9 pages
Item Response Theory 1st Edition R. Darrell Bock Completed Chapters
100% (3)
Item Response Theory 1st Edition R. Darrell Bock Completed Chapters
176 pages
Understanding Normal Distribution Properties
100% (1)
Understanding Normal Distribution Properties
4 pages
Probability and Statistics Exam Review
No ratings yet
Probability and Statistics Exam Review
5 pages
Normal Distributions, Revisited: Section 5A
No ratings yet
Normal Distributions, Revisited: Section 5A
15 pages
Neural Network Model for Residual Life
No ratings yet
Neural Network Model for Residual Life
11 pages
SEM Method Comparison: CB-SEM vs PLS-SEM
No ratings yet
SEM Method Comparison: CB-SEM vs PLS-SEM
18 pages

Sampling Distributions in Data Analytics

Uploaded by

Sampling Distributions in Data Analytics

Uploaded by

Romblon State University

College of Engineering and Technology

Engineering Data Analysis

Engr. Jeffy Jones F. Fetalvero

A statistic, such as the sample mean or the sample standard deviation, is a

Intended Learning Outcomes

At the end of this chapter, the students are expected to:

1. To become familiar with the concept of the probability distribution of the

ENGR. JEFFY JONES F. FETALVERO 2

6.1: The Mean and Standard Deviation of the Sample Mean

Suppose we wish to estimate the mean μ of a population. In actual practice we

ENGR. JEFFY JONES F. FETALVERO 3

For 𝜇𝑋̅ , we obtain

For 𝜎𝑋̅ , we first compute

which is 24,974, so that

The mean and standard deviation of the population {152,156,160,164} in the

ENGR. JEFFY JONES F. FETALVERO 4

6.2: The Sampling Distribution of the Sample Mean

In Example 6.1.1, we constructed the probability distribution of the sample mean

Figure 6.2.1 shows a side-by-side comparison of a histogram for the original

Figure 6.2.1. Distribution of a Population and a Sample Mean
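The comparison between the population and the sample mean can also be checked numerically. The following is a minimal Python sketch, assuming that Example 6.1.1 draws samples of size 2 with replacement from the population {152, 156, 160, 164}. The sample size is an assumption, chosen because it reproduces the value 24,974 quoted in Section 6.1; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = [152, 156, 160, 164]
n = 2  # ASSUMED sample size, sampling with replacement

# Enumerate every equally likely ordered sample and tally its mean.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)
dist = {xbar: Fraction(c, len(samples)) for xbar, c in sorted(counts.items())}

# Mean and standard deviation of the sample mean from its distribution.
mean_xbar = sum(x * p for x, p in dist.items())
second_moment = sum(x**2 * p for x, p in dist.items())  # sum of xbar^2 * P(xbar)
sd_xbar = sqrt(second_moment - mean_xbar**2)

# Mean and standard deviation of the population itself.
pop_mean = Fraction(sum(population), len(population))
pop_var = sum((x - pop_mean) ** 2 for x in population) / len(population)
pop_sd = sqrt(pop_var)

for xbar, p in dist.items():
    print(f"xbar = {float(xbar):5.1f}   P(xbar) = {p}")
print("mean of sample mean:", mean_xbar, " population mean:", pop_mean)
print("sum of xbar^2 * P(xbar):", second_moment)
print("sd of sample mean:", sd_xbar, " sigma / sqrt(n):", pop_sd / sqrt(n))
```

Under these assumptions the mean of the sample mean equals the population mean, and its standard deviation equals the population standard deviation divided by the square root of n, which is why the histogram of the sample mean is narrower than that of the population.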

Suppose we take samples of size 1, 5, 10, or 20 from a population that consists

Histograms illustrating these distributions are shown in Figure 6.2.2.

Figure 6.2.2. Distributions of the Sample Mean

As n increases the sampling distribution of 𝑋̅ evolves in an interesting way: the

The Central Limit Theorem

Figure 6.2.3. Distribution of Populations and Sample Means
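The behaviour described by the Central Limit Theorem can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes a strongly skewed exponential population with mean 1 and standard deviation 1, a sample size of 30, and 5,000 simulated samples, all chosen purely for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30                  # sample size (illustrative choice)
reps = 5000             # number of simulated samples (illustrative choice)

# ASSUMED population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# Draw `reps` samples of size n and record the mean of each one.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

print("mean of sample means:", fmean(sample_means), " expected:", mu)
print("sd of sample means:  ", pstdev(sample_means), " expected:", sigma / sqrt(n))
```

Even though the assumed population is far from bell-shaped, the simulated sample means cluster around the population mean with a spread close to σ/√n. Plotting a histogram of `sample_means` would show the approximately normal shape.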

1. X, the measurement of a single element selected at random from the

2. 𝑋̅, the mean of the measurements in a sample of size n; the distribution of 𝑋̅ is

1. Find the mean and standard deviation of 𝑋̅.

Normally Distributed Populations

The effect of increasing the sample size is shown in Figure 6.2.4.

Figure 6.2.4: Distribution of Sample Means for a Normal Population

6.3: The Sample Proportion

Often sampling is done in order to estimate the proportion of a population that

if in a sample of 200 people entering the store, 78 make a purchase,

The sample proportion is a random variable: it varies from sample to sample in a

The Sampling Distribution of the Sample Proportion

A sample is large if the interval [p − 3𝜎𝑃̂, p + 3𝜎𝑃̂] lies wholly within the interval [0, 1].

Figure 6.3.1: Distribution of Sample Proportions

Figure 6.3.2 shows that when p = 0.5 a sample of size 15 is acceptable.

Figure 6.3.2: Distribution of Sample Proportions for p = 0.5 and n = 15
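The acceptability of a sample size can be checked directly. The sketch below assumes the usual three-standard-deviation criterion, namely that the interval from p − 3σ to p + 3σ must lie inside [0, 1], where σ = √(p(1 − p)/n) is the standard deviation of the sample proportion. The function name `is_large_sample` and the second test case (p = 0.1) are hypothetical, added only for contrast.

```python
from math import sqrt

def is_large_sample(p: float, n: int) -> bool:
    """Return True if [p - 3*sigma, p + 3*sigma] lies wholly within [0, 1]."""
    sigma = sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion
    lower, upper = p - 3 * sigma, p + 3 * sigma
    return 0.0 <= lower and upper <= 1.0

print(is_large_sample(0.5, 15))  # True: the case shown in Figure 6.3.2
print(is_large_sample(0.1, 15))  # False: hypothetical case, the lower endpoint is negative
```

The closer p is to 0 or 1, the larger n must be before the interval fits inside [0, 1], which is why n = 15 suffices for p = 0.5 but not for a proportion near the edge.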

Then 3𝜎𝑃̂ = 3(0.01618) = 0.04854 ≈ 0.05 so

2. To be within 5 percentage points of the true population proportion 0.38 means
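Under the normal approximation, the probability that the sample proportion falls within 5 percentage points of 0.38 can be sketched as follows. The values p = 0.38 and 𝜎𝑃̂ = 0.01618 are taken from the text. The helper `normal_cdf` is an illustrative name, and treating P̂ as approximately normal is an assumption consistent with the large-sample check above.

```python
from math import erf, sqrt

p = 0.38              # population proportion (from the text)
sigma_phat = 0.01618  # standard deviation of the sample proportion (from the text)
margin = 0.05         # "within 5 percentage points"

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Standardize the endpoints p - margin and p + margin.
z = margin / sigma_phat
prob = normal_cdf(z) - normal_cdf(-z)

print(f"interval: [{p - margin:.2f}, {p + margin:.2f}]")
print(f"z = {z:.2f}, P(within margin) = {prob:.4f}")
```

Because the margin is roughly three standard deviations of P̂, the resulting probability is very close to 1.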

Common questions

How would you determine the probability of a sample mean falling within a specific range, given a large sample size, when the original population parameters are known?

Discuss the practical implications of using the Central Limit Theorem in engineering statistical analysis, particularly when considering sample sizes and population unknowns.

Under what conditions can the sampling distribution of a sample proportion be considered approximately normal, and how do these conditions differ in complexity from those for the sample mean?

In what ways does the Central Limit Theorem facilitate statistical inference about populations based on sample statistics?

What differences emerge between the distribution of individual population elements and the sampling distribution of the sample mean as explained by the Central Limit Theorem?

How does increasing the sample size influence the certainty of statistical estimates derived from sample proportions?

Explain the difference between the computation of a population's standard deviation and the standard deviation of the sample mean, using the formulas highlighted in the text.

How does the sampling distribution of a sample mean behave as the sample size increases, according to the Central Limit Theorem?

What insights can be gained by comparing a histogram of a population distribution to a histogram of a sample mean distribution?

In the context of the Central Limit Theorem, why is it possible to use the normal distribution to estimate probabilities about sample means even when the population distribution is unknown?
