
Understanding Probability Distributions

Uploaded by

yuqirinsong23
Copyright
© All Rights Reserved

PROBABILITY DISTRIBUTIONS

Understanding the probability distribution tool is crucial as we look at
occurrences that are likely to occur. A probability distribution provides the
entire range of values that can occur given an experiment. It practically lists
all the outcomes of an experiment vis-à-vis the probability values associated
with each outcome. As an analogy, a probability distribution is similar to a
relative frequency distribution. However, instead of tabulating the frequency
of what has already happened, we tabulate the likelihood of some future
event.

Random Variables

Before we proceed with further discussion on probability distributions, you
need to understand the concept of a random variable. We have been
implicitly applying this concept since Chapter 1, but this section is the best
time to formally introduce this terminology. A random variable is any
variable, denoted by x, resulting from a given experiment or observation
that, by chance, can take different values. It is called a random variable
because, in any experiment of chance, outcomes happen randomly (note:
holding deterministic input constant).

For instance, let rolling a single six-sided die be the experiment. As you roll
the die, any of the six possible outcomes can happen (i.e., 1, 2, 3, 4, 5, or 6).
There are experiments that result in quantitative outcomes (e.g., age, height,
weight, value of investments, price of stock, and number of employees,
among others) or qualitative outcomes (e.g., marital status, religion, gender
preference, and color choice, among others).

For example, suppose our experiment is determining the number of tails that
will show face up on three coin tosses. Let x be our random variable that can
assume these possible outcomes: zero tails, one tail, two tails, and three
tails. We find the probability distribution for the number of tails. Using the
multiplication formula (Equation 3.7), we know that there are eight possible
outcomes determined by (2)(2)(2), which we list in Table 3.3.
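
The listing of the eight outcomes can be reproduced with a short enumeration. The following is a minimal Python sketch, not part of the original text. The variable names are illustrative, and it assumes a fair coin so that all eight outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All (2)(2)(2) = 8 equally likely outcomes of three coin tosses (assumes a fair coin).
outcomes = list(product("HT", repeat=3))

# Count how many outcomes produce each number of tails (the random variable x).
tails_counts = Counter(outcome.count("T") for outcome in outcomes)

# P(x) = number of outcomes with x tails / total number of outcomes.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(tails_counts.items())
}

for x, p in distribution.items():
    print(f"x = {x} tails: P(x) = {p}")
```

Running the sketch lists x = 0, 1, 2, and 3 tails with probabilities 1/8, 3/8, 3/8, and 1/8, which sum to 1.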

A discrete random variable, as defined in Chapter 1, is a random variable
that can only take particular, clearly distinct values. For example, let x be a
random variable indicating the degree of concurrence, from 1 to 5, with 5
being the highest, that Korean boy groups are effective product endorsers in
the Philippines. Such a random variable will generate a discrete probability
distribution.

Likewise, a continuous random variable is a random variable that can
assume, with certain limitations, one of an infinitely large number of values.
For example, let x be the number of hours you devote to relaxation per day.
As such, any value between 0 and 24 can reasonably occur. Such a random
variable will generate a continuous probability distribution.

Hence, you need to identify whether your random variable is discrete or
continuous to know which probability distribution to use.

DISCRETE PROBABILITY DISTRIBUTION

Mean, Variance, and Standard Deviation of a Discrete Probability Distribution

Let us illustrate how a discrete probability distribution is analyzed. Let us say
that a media research company conducted a study on Filipinos’ attitudes
toward Korean boy groups. The survey asked: Are you in agreement that
Korean boy groups are effective product endorsers in the Philippines? You
were given a Likert scale to indicate your answer. It has a scale of 1 to 5,
with 5 being the highest (strongly agree) and 1 being the lowest (strongly
disagree).

Hence, let x be a random variable indicating the degree of concurrence that
Korean boy groups are effective product endorsers in the Philippines. After
the survey period, suppose Table 3.5 tabulates the survey results.

Table 3.5. Survey Results

Outcome              Random Variable, x    Number of Respondents    Probability, P(x)
Strongly Agree               5                      40               40/100 = 0.40
Agree                        4                      30               30/100 = 0.30
Neutral                      3                      15               15/100 = 0.15
Disagree                     2                      10               10/100 = 0.10
Strongly Disagree            1                       5                5/100 = 0.05
Total                                              100              100/100 = 1.00

From Table 3.5, we can describe that there is a 40 percent chance that a
randomly selected individual will strongly agree that Korean boy groups are
effective product endorsers in the Philippines. Meanwhile, there is a 5
percent chance that a randomly selected individual will strongly disagree.
We can further our description of the survey results by determining the
mean, variance, and standard deviation of our probability distribution.

A probability distribution’s mean indicates where the data are most
concentrated, while its variance and standard deviation show how spread out
the data are. We use the same symbols for the mean (μ), variance (σ²), and
standard deviation (σ). By now, you should be familiar with such symbols
whenever you see them, particularly in a statistical context.

Similar to our discussion in Chapter 2, the mean is a typical value
representative of the central location of a probability distribution. It is also
referred to as the expected value or the long-run average of the random
variable. It is represented by Equation 3.10.

Equation 3.10

\[ \mu = \sum [x P(x)] = x_1 P(x_1) + x_2 P(x_2) + \dots + x_n P(x_n) \]

Mean of a probability distribution

That is, the mean is the sum of the products of the random variable’s value
and its probability of occurrence (i.e., a weighted mean). Do not simply get
the arithmetic mean of the answers of your respondents in computing the mean
of the probability distribution presented in Table 3.5. See the accompanying
spreadsheet by scanning the QR code and/or using the provided URL for
Chapter 3 on page 322 (refer to Sheet 3.2. Discrete PD). You can compute
the mean manually or using Excel by simply using the formula =sumproduct().
Refer to the Excel manual for the full syntax.
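
As a cross-check on the spreadsheet approach, the weighted mean of Equation 3.10 can be sketched in a few lines of Python. This is an illustrative sketch rather than the textbook's own worksheet; the dictionary layout and variable names are assumptions, while the counts come from Table 3.5.

```python
# Survey results from Table 3.5: value of x -> number of respondents.
respondents = {5: 40, 4: 30, 3: 15, 2: 10, 1: 5}
total = sum(respondents.values())

# P(x) is the share of respondents who gave each answer.
probabilities = {x: count / total for x, count in respondents.items()}

# Equation 3.10: the mean is the sum of x * P(x), which is what a
# SUMPRODUCT over the x column and the P(x) column computes.
mean = sum(x * p for x, p in probabilities.items())

print(round(mean, 2))  # 3.9
```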

We have computed a mean value of 3.9. This means that, typically, you can
expect a response from an individual that is within neutral and agree, or
most likely agree. Of course, given the survey data and the probability
distribution that you have, the mean value of 3.9 should not be interpreted
literally because there is no 3.9 on the scale of 1 to 5 (i.e., the scale is
discrete). It serves as direction on what response you can reasonably expect.
That is why the mean is also known as the expected value.

We have established that the measure of location should be supported by
the measure of variation. Hence, it is also necessary to determine the
variance and standard deviation. Equation 3.11 represents the formula to
compute the variance. As we learned in Chapter 2, the standard deviation is
equal to the square root of the variance.

Equation 3.11

\[ \sigma^{2} = \sum [(x - \mu)^{2} P(x)] = (x_1 - \mu)^{2} P(x_1) + (x_2 - \mu)^{2} P(x_2) + \dots + (x_n - \mu)^{2} P(x_n) \]

Variance of a probability distribution

That is, the variance is the sum of the products of the random variable’s
squared difference from the mean and its probability of occurrence. See the
accompanying spreadsheet for information on calculating the variance of the
probability distribution shown in Table 3.5 (refer to Sheet 3.2). For the
required columns, the Excel formula =sumproduct() was also used.
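
The variance and standard deviation for Table 3.5 can be checked the same way. The sketch below is an assumption-laden illustration (the names are made up for this example); it applies Equation 3.11 directly and then takes the square root.

```python
import math

# P(x) values from Table 3.5, keyed by the value of the random variable x.
probabilities = {5: 0.40, 4: 0.30, 3: 0.15, 2: 0.10, 1: 0.05}

# Equation 3.10: mean of the probability distribution.
mean = sum(x * p for x, p in probabilities.items())

# Equation 3.11: sum of squared deviations from the mean, weighted by P(x).
variance = sum((x - mean) ** 2 * p for x, p in probabilities.items())

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

The sketch gives a variance of about 1.39 and a standard deviation of about 1.18.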

The standard deviation is more intuitive to interpret. The standard deviation
value of 1.18 that we have solved is often interpreted as a measure of risk.
That is, compared to the standard deviation of another probability
distribution from, say, another sample, a larger standard deviation indicates
a greater probability that the random variable x is different from the mean or
expected value. Suppose the same survey was given to non-Filipinos and had
a standard deviation of 1.9. We can conclude that there is more variability in
the responses of non-Filipinos relative to Filipinos.

Binomial Probability Distribution

One specific type of discrete probability distribution is the binomial
probability distribution. A binomial probability distribution is applied to a
random variable that satisfies the following characteristics:

The outcome of each trial in an experiment is either one of two mutually
exclusive categories (e.g., yes or no, head or tail, true or false, success or
failure). Hence, the prefix “bi-” in “binomial” indicates two. When there are
only two possible outcomes, we call it the Bernoulli process.

The random variable counts the number of successes in a fixed number of
trials.

Probability values for success and failure remain the same for each trial.

Trials are statistically independent.

To construct a particular binomial probability distribution, you need
information on the number of trials and the probability of success on each
trial. The binomial probability distribution is computed by Equation 3.12.

Equation 3.12

\[ P(x) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x} = \frac{n!}{x!(n - x)!} \pi^{x} (1 - \pi)^{n - x} \]

Binomial probability

Where: the binomial coefficient n!/(x!(n - x)!) is the number of
combinations of n trials taken x at a time;

n is the number of trials;

x is the random variable defined as the number of successes; and

π is the Greek lowercase letter “pi” that represents the probability of success
on each trial. It also denotes a binomial population parameter. Do not
confuse it with the mathematical constant 3.141593.

For example, let us say we want to calculate the likelihood that four heads
will land face up in five flips of a coin. This gives n = 5 tosses, x = 4 heads,
P(H) = 0.5, and P(T) = 1 - P(H) = 0.5. Applying Equation 3.12, we have:

\[ P(4 \text{ heads in 5 tosses}) = \frac{5!}{4!(5 - 4)!} (0.5)^{4} (1 - 0.5)^{5 - 4} = 0.15625 \]

Therefore, the probability of getting four heads in five tosses of a coin is
15.63 percent.
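
The coin-toss computation can be written as a small reusable function. This is a minimal Python sketch of Equation 3.12; the function name `binomial_pmf` and its parameter names are hypothetical choices for this illustration, and `math.comb` supplies the number of combinations.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Equation 3.12: probability of exactly x successes in n trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# Four heads in five tosses of a fair coin.
print(binomial_pmf(4, 5, 0.5))  # 0.15625
```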

Let us do a more comprehensive example. Suppose an airline company has
five flights from Clark International Airport (CRK) to Mactan-Cebu
International Airport (CEB) daily. Suppose further that the probability that any
of its flights will arrive late is 0.2. Calculate the likelihood that there will be
no late flights, one late flight, and a maximum of three late flights today.

Let π = 0.2, n = 5, and x be the random variable that denotes the number of
successes; in our case, a “success” is a late flight. Therefore, if there are no
late flights, x = 0.

To solve the probability that no flight is late today, we apply Equation 3.12:

\[ P(0) = \frac{n!}{x!(n - x)!} \pi^{x} (1 - \pi)^{n - x} = \frac{5!}{0!(5 - 0)!} (0.2)^{0} (1 - 0.2)^{5 - 0} = 0.3277 \]

Consequently, there is a 32.77 percent likelihood that no flight is late
today. This can also be solved using the Excel formula
=binom.dist(x, n, π, FALSE). FALSE is for the probability mass function
(i.e., exactly x). TRUE is for the cumulative distribution function (i.e., at
most x). Refer to the accompanying spreadsheet by scanning the QR code
and/or using the provided URL for Chapter 3 on page 322 (see Sheet 3.3.
Binomial PD).

To solve the probability that exactly one flight is late today:

\[ P(1) = \frac{5!}{1!(5 - 1)!} (0.2)^{1} (1 - 0.2)^{5 - 1} = 0.4096 \]

There is a 40.96 percent chance that exactly one flight is late today. This can
also be solved using the Excel formula =binom.dist(x, n, π, FALSE). FALSE is
for the probability mass function (i.e., exactly x). TRUE is for the cumulative
distribution function (i.e., at most x). Refer to the accompanying spreadsheet
(see Sheet 3.3).

In order to determine the likelihood that no more than three flights will be
late today, we must calculate the probabilities for all values of x and
tabulate them into the binomial probability distribution for n = 5 and
π = 0.2, as shown in Table 3.6. We employed the =binom.dist() formula in the
accompanying spreadsheet (see Sheet 3.3).

Table 3.6. Binomial Probability Distribution for n = 5, π = 0.2

Number of Late Flights, x    Probability (PMF), P(x)    Probability (CDF), Cumulative P(x)
            0                        0.32768                        0.32768
            1                        0.40960                        0.73728
            2                        0.20480                        0.94208
            3                        0.05120                        0.99328
            4                        0.00640                        0.99968
            5                        0.00032                        1.00000

We add the probabilities for zero, one, two, and three late flights, or
P(x ≤ 3) = P(3) + P(2) + P(1) + P(0) = 0.99328, or simply refer to the
cumulative probability distribution at x = 3. Thus, there is a 99.33 percent
likelihood that at most three flights will arrive late.
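
The PMF and CDF columns of the late-flights distribution can be regenerated programmatically. The following Python sketch is illustrative only: it mirrors what the FALSE and TRUE options of the Excel formula return, using a running total for the cumulative column. The variable names are assumptions.

```python
from math import comb

n, pi = 5, 0.2  # five daily flights, each late with probability 0.2

rows = []
cumulative = 0.0
for x in range(n + 1):
    # PMF: probability of exactly x late flights (Equation 3.12).
    pmf = comb(n, x) * pi**x * (1 - pi) ** (n - x)
    # CDF: probability of at most x late flights (running total of the PMF).
    cumulative += pmf
    rows.append((x, pmf, cumulative))
    print(f"{x}  {pmf:.5f}  {cumulative:.5f}")

# Probability that at most three flights are late.
p_at_most_three = rows[3][2]
```

The fourth printed row (x = 3) carries the cumulative value 0.99328 used in the text.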

We can further our description of the late flights example by determining the
mean, variance, and standard deviation of our binomial probability
distribution.

The mean and variance of a binomial probability distribution are given by
Equations 3.13 and 3.14. The standard deviation is simply the square root of
the variance.

Equation 3.13

\[ \mu = n\pi \]

Mean of a binomial probability distribution

Equation 3.14

\[ \sigma^{2} = n\pi(1 - \pi) \]

Variance of a binomial probability distribution

Hence, from our example of late flights, the mean of our binomial probability
distribution is 1.0, found by (5)(0.2), and the variance is 0.8, found by
(5)(0.2)(1 - 0.2). Of course, you can verify these computations using the
formulas for mean and variance in Equations 3.10 and 3.11, respectively.
Refer to the accompanying spreadsheet (see Sheet 3.3). The same manner
of interpreting the statistic holds.
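
The verification mentioned above can be sketched in Python by computing the mean and variance twice, once with the general formulas and once with the binomial shortcuts. The variable names are illustrative assumptions, not taken from the accompanying spreadsheet.

```python
from math import comb, sqrt

n, pi = 5, 0.2

# Full binomial distribution for the late-flights example.
pmf = {x: comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)}

# General formulas for any discrete distribution (Equations 3.10 and 3.11).
mean_general = sum(x * p for x, p in pmf.items())
variance_general = sum((x - mean_general) ** 2 * p for x, p in pmf.items())

# Binomial shortcuts (Equations 3.13 and 3.14).
mean_shortcut = n * pi
variance_shortcut = n * pi * (1 - pi)

print(round(mean_general, 4), round(mean_shortcut, 4))
print(round(variance_general, 4), round(variance_shortcut, 4))
print(round(sqrt(variance_shortcut), 4))  # standard deviation
```

Both routes give a mean of 1.0 and a variance of 0.8, so the standard deviation is about 0.894 late flights.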

Poisson Probability Distribution

Another specific type of discrete probability distribution is the Poisson
probability distribution. The Poisson probability distribution describes the
number of times an event happens during a specified interval (i.e., a random
variable). It is based on two assumptions: that intervals are independent and
that the probability is proportional to the length of the interval. For
instance, using this probability distribution, we can describe situations in
which clients arrive independently in a bank during a certain time interval,
and the number of arrivals depends on the length of the time interval.

The Poisson probability distribution is mathematically represented by
Equation 3.15.

Equation 3.15

\[ P(x) = \frac{\mu^{x} e^{-\mu}}{x!} \]

Poisson probability distribution

Where: μ is the Greek lowercase letter “mu” representing the mean number of
occurrences or successes in a particular interval;

e is Euler’s constant of 2.71828, which is the base of the Napierian
(natural) logarithmic system;

x is the number of occurrences or successes; and

P(x) is the probability for a specified value of x.

Note that the variance of the Poisson probability distribution is also equal to
its mean (Lind et al. 2006, 174). For example, suppose a loan officer at a
bank calculates that 0.025 of applicants will not be able to pay back their
installment loans based on his/her years of expertise. Last month, he/she
approved 40 loan applications. Determine the probability that there will be
three defaulted loans and that at least three loans will be defaulted.

From this example, the mean and the variance for the number of loans
defaulted are both 1, found by μ = nπ = (40)(0.025) = 1, where n = 40 and
π = 0.025.

As such,

\[ P(3) = \frac{\mu^{x} e^{-\mu}}{x!} = \frac{1^{3} e^{-1}}{3!} = 0.0613 \]

Thus, there is a 6.13 percent likelihood that three loans will be defaulted.
This can also be solved using the Excel formula =poisson.dist(x, μ, FALSE).
Refer to the accompanying spreadsheet by scanning the QR code and/or using
the provided URL for Chapter 3 on page 322 (see Sheet 3.4. Poisson PD).

To solve for the probability of at least three defaulted loans:

\[ P(x \geq 3) = 1 - P(2) - P(1) - P(0) = 1 - \frac{1^{2} e^{-1}}{2!} - \frac{1^{1} e^{-1}}{1!} - \frac{1^{0} e^{-1}}{0!} = 1 - 0.18394 - 0.36788 - 0.36788 = 0.0803 \]

Thus, there is an 8.03 percent likelihood that at least three loans will be
defaulted.
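
Both loan probabilities can be reproduced with a direct translation of Equation 3.15. This is a minimal Python sketch; the function name `poisson_pmf` is a hypothetical choice for this illustration, and the "at least three" case is obtained through the complement, as in the text.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Equation 3.15: probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)


n, pi = 40, 0.025
mu = n * pi  # mean number of defaulted loans

# Exactly three defaulted loans.
p_exactly_three = poisson_pmf(3, mu)

# At least three: one minus the probability of zero, one, or two defaults.
p_at_least_three = 1 - sum(poisson_pmf(x, mu) for x in range(3))

print(round(p_exactly_three, 4))   # 0.0613
print(round(p_at_least_three, 4))  # 0.0803
```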

Think of another illustration. Assume that two vehicles arrive at the Mexico
Exit of the North Luzon Expressway (NLEX) every minute. The arrival
distribution is assumed to resemble a Poisson distribution. Find the
probability that no automobiles will come at a specific minute and the
probability that at least one car will arrive at a specific minute.

To determine the likelihood that no cars will arrive during a particular minute,
we know that μ = 2. Hence:

\[ P(0) = \frac{2^{0} e^{-2}}{0!} = \frac{1(0.1353)}{1} = 0.1353 = 13.53\% \]

To determine the likelihood that at least one car arrives during a particular
minute:

\[ P(x \geq 1) = 1 - P(0) = 1 - \frac{2^{0} e^{-2}}{0!} = 1 - 0.1353 = 0.8647 = 86.47\% \]

This can also be solved using the Excel formula =poisson.dist(x, μ, FALSE).
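
For readers working outside Excel, the vehicle-arrival example reduces to two lines of arithmetic. The Python sketch below is illustrative; the variable names are assumptions, and μ = 2 is the mean number of arrivals per minute given in the example.

```python
from math import exp, factorial

mu = 2  # mean number of vehicles arriving per minute

# P(0): no vehicle arrives during a particular minute (Equation 3.15 with x = 0).
p_no_cars = mu**0 * exp(-mu) / factorial(0)

# P(x >= 1): complement of no arrivals.
p_at_least_one = 1 - p_no_cars

print(round(p_no_cars, 4), round(p_at_least_one, 4))  # 0.1353 0.8647
```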

Note that the Poisson probability distribution is always positively skewed
(i.e., skewed to the right). Also, it has no specific upper limit (e.g., no upper
limit on the number of loans that can be defaulted or the number of cars that
will arrive at Mexico Exit).

CONTINUOUS PROBABILITY DISTRIBUTION

Mean, Variance, and Standard Deviation of a Continuous Probability Distribution

We have emphasized in our discussion on discrete probability distribution
that our random variables of interest can assume only clearly separated
values (i.e., discrete variables). We use the continuous probability distribution
to account for random variables that arise from measuring something (i.e.,
continuous variables).

The uniform probability distribution, the normal probability distribution, and
the exponential probability distribution are the three families of continuous
probability distributions that will be covered in this section. Continuous
probability distributions describe the probability that a continuous random
variable with an infinite number of possible values will fall within a
specified range.

Uniform Probability Distribution

The uniform probability distribution is the most basic distribution for a
continuous random variable. Its shape is rectangular and is defined by
minimum and maximum values. For example, the time it takes for a flight
from Ninoy Aquino International Airport (MNL) to Iloilo International Airport
(ILO) ranges from 70 to 90 minutes. The MNL-ILO flight time, expressed in
minutes, is the random variable. Keep in mind that the flight time is
continuous between 70 and 90 minutes.

The mean of the uniform distribution can be found in the middle of the
interval between the minimum (a) and maximum (b) values, computed in a
similar fashion to that of the median. See Equation 3.16.

Equation 3.16

\[ \mu = \frac{a + b}{2} \]

Mean of the uniform probability distribution

Meanwhile, the standard deviation, or the square root of the variance in the
uniform probability distribution, is also related to the interval between the
maximum (b) and minimum (a) values. This is represented by Equation 3.17.

Equation 3.17

\[ \sigma = \sqrt{\frac{(b - a)^{2}}{12}} \]

Standard deviation of the uniform probability distribution

The height of the distribution, P(x), is equal for all values of the random
variable, x. It can be computed using Equation 3.18.

Equation 3.18

\[ P(x) = \frac{1}{b - a} \text{ if } a \leq x \leq b, \text{ and } 0 \text{ elsewhere} \]

Height of the uniform probability distribution

Because of the rectangular shape of the uniform probability distribution, it
allows us to use the formula for the area of a rectangle to determine the
areas within the distribution representing probabilities. Hence, the area can
be computed using Equation 3.19.

Equation 3.19

\[ \text{Area} = \text{base} \times \text{height} = (b - a) P(x) = (b - a) \frac{1}{b - a} = 1 \]

Area of the uniform probability distribution

Equation 3.19 tells us that the total area within a continuous probability
distribution is always equal to 1. This is the same as our previous discussion
that the sum of probabilities of all outcomes must always be equal to 1.

Imagine, for instance, that students at a state university in Metro Manila
have access to jeepney service while they are on campus. A jeepney arrives
at the dorms every 30 minutes between 6:00 a.m. and 11:00 a.m. on
weekdays. Students arrive at the jeepney stop randomly. A student’s waiting
time is uniformly distributed from 0 to 30 minutes. Determine a student’s
typical waiting time and its corresponding standard deviation. Also calculate
the likelihood that a student will wait more than 25 minutes, as well as the
likelihood that they will wait between 10 and 20 minutes. Let x be the
continuous random variable waiting time.

To determine a student’s typical waiting time (i.e., mean waiting time), we
apply Equation 3.16. Let a = 0 and b = 30:

\[ \mu = \frac{a + b}{2} = \frac{0 + 30}{2} = \frac{30}{2} = 15 \]

The mean of the distribution is 15 minutes. Thus, the usual waiting time for
the jeepney service is 15 minutes.

To determine the standard deviation of the waiting time, we apply Equation
3.17:

\[ \sigma = \sqrt{\frac{(b - a)^{2}}{12}} = \sqrt{\frac{(30 - 0)^{2}}{12}} = \sqrt{\frac{30^{2}}{12}} = 8.66 \]

The standard deviation of the distribution is 8.66 minutes. This measures how
much the students’ waiting time typically varies around the 15-minute mean.
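
The mean and standard deviation of the waiting time follow directly from the two endpoints. The Python sketch below is illustrative (the variable names are assumptions) and applies Equations 3.16 and 3.17 to a = 0 and b = 30.

```python
from math import sqrt

a, b = 0, 30  # minimum and maximum waiting time in minutes

# Equation 3.16: the mean sits at the middle of the interval.
mean = (a + b) / 2

# Equation 3.17: the standard deviation depends only on the interval width.
std_dev = sqrt((b - a) ** 2 / 12)

print(mean, round(std_dev, 2))  # 15.0 8.66
```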

In order to calculate the likelihood that a student will wait longer than 25
minutes, we need to know the area under the distribution over the range of
25 to 30 minutes. Hence, we apply Equations 3.18 and 3.19:

\[ P(25 < x < 30) = \text{height} \times \text{base} = \frac{1}{30 - 0} (30 - 25) = 0.1667 \]

A student has a 16.67 percent probability of waiting between 25 and 30
minutes.

In computing the probability that a student will wait between 10 and 20
minutes, we need the area within the distribution for the interval 10 to 20
minutes. Hence, we apply Equations 3.18 and 3.19:

\[ P(10 < x < 20) = \text{height} \times \text{base} = \frac{1}{30 - 0} (20 - 10) = 0.3333 \]

A student has a 33.33 percent probability of waiting between 10 and 20
minutes.
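
Because every probability here is the area of a rectangle, one small helper covers both questions. This is a hedged Python sketch; the function name `uniform_probability` and its parameters are hypothetical, and the interval is clipped to the range from a to b, where the height is nonzero.

```python
def uniform_probability(lower: float, upper: float, a: float, b: float) -> float:
    """Area between lower and upper under a uniform distribution on [a, b]."""
    height = 1 / (b - a)  # Equation 3.18
    # Only the part of the interval inside [a, b] contributes any area.
    base = max(0.0, min(upper, b) - max(lower, a))
    return base * height  # area of a rectangle: base times height


print(round(uniform_probability(25, 30, 0, 30), 4))  # 0.1667
print(round(uniform_probability(10, 20, 0, 30), 4))  # 0.3333
print(round(uniform_probability(0, 30, 0, 30), 4))   # 1.0
```

The last call covers the whole interval and returns the total area of 1, in line with Equation 3.19.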

Normal Probability Distribution

Next, we take into account a probability distribution that is characterized by
its mean and standard deviation rather than its minimum and maximum
values. This distribution is known as the normal probability distribution and is
mathematically represented by Equation 3.20.

Equation 3.20

\[ P(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}} \]

Normal probability distribution

The terms comprising Equation 3.20 are not new to you anymore. We have
defined all of them previously. This equation generates the probabilities
found in the areas under the normal curve, as seen in Figure 3.3. This is
also the mechanism used by the Excel formula =normdist() in returning
probability values. Refer to the Excel manual for the full syntax.

The normal probability distribution is characterized as follows:

It is bell-shaped and has a single peak at the center of the distribution. The
arithmetic mean, median, and mode are equal and located in the center of
the distribution. Hence, 50 percent of the area under the normal curve is to
the right of this center point, and the other 50 percent is to the left of it.

It is symmetrical about the mean. The area to the left of the center point is a
mirror image of the area to the right.

It is asymptotic wherein the curve’s tails, both left and right, infinitely and
indefinitely approach the x-axis but never touch or intersect it.

The location of a normal distribution is anchored on the mean, µ. The
dispersion of the distribution is anchored on the standard deviation, σ.

Because the normal probability distribution is a continuous probability
distribution, areas below the curve define probabilities. Thus, the sum of
probabilities to the left of the center point is 0.5, and the sum of probabilities
to the right of the center point is 0.5. So, the sum of all probabilities (i.e.,
total area) under the normal curve is always 1.
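
The claim that each half of the curve holds an area of 0.5 can be checked numerically. The Python sketch below is an illustration under stated assumptions: it uses the standard normal density formula with mean μ and standard deviation σ, a simple midpoint-rule approximation of the area, and hypothetical values μ = 10 and σ = 2 that do not come from the text. The tails are cut off at 8 standard deviations, where the remaining area is negligible.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return coefficient * math.exp(exponent)


def area_under_curve(lower, upper, mu, sigma, steps=10_000):
    """Approximate the area between lower and upper with the midpoint rule."""
    width = (upper - lower) / steps
    heights = (normal_pdf(lower + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width


mu, sigma = 10.0, 2.0  # hypothetical values chosen only for illustration
far_left, far_right = mu - 8 * sigma, mu + 8 * sigma

left_area = area_under_curve(far_left, mu, mu, sigma)
right_area = area_under_curve(mu, far_right, mu, sigma)

print(round(left_area, 4), round(right_area, 4), round(left_area + right_area, 4))
```

The three printed values round to 0.5, 0.5, and 1.0, and the same result holds for any other choice of μ and σ.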

Figure 3.1 illustrates the normal distribution reflecting the characteristics
enumerated above.

Figure 3.1. The Normal Probability Distribution

Because the normal probability distribution is defined by the mean and
standard deviation, there is a family of normal probability distributions
instead of just one. This is because different normal probability distributions
can have the same mean but different standard deviations. For example, we
compare the normal probability distribution of the length of stay, in number
of days, of foreign tourists visiting Coron, El Nido, and Puerto Princesa.
Because all of them are in the province of Palawan, there is a chance that the
mean number of days of stay is the same for the three destinations, but the
standard deviations are different. On the other hand, it is also possible that
the mean numbers of days of stay are different and the standard deviations
are the same. They may also have different means and standard deviations.
See Figure 3.2 for the illustration.

Figure 3.2. Family of Normal Probability Distributions

Standard Normal Probability Distribution

Because there is a family of normal probability distributions, with each
having a different mean, standard deviation, or both, it follows that there are
an infinite number of normal probability distributions. In contrast to the
binomial and Poisson discrete probability distributions, we are unable to build
probability tables for every one of them. The probabilities for all normal
probability distributions can be calculated using one member of the normal
probability distribution family, though. We call this member the standard
normal probability distribution.

The standard normal probability distribution is unique because it has a mean
of 0 and a standard deviation of 1. This is often denoted as the random
variable x ~ N(0, 1).

Therefore, any normal probability distribution can be transformed into its
corresponding standard normal probability distribution by subtracting the
mean from each observation and dividing the difference by the standard
deviation. The resulting quotient is referred to as one of the following: z
score, z value, z statistic, standard normal deviate, standard normal value, or
normal deviate. In our discussion, we simply refer to this as our z score.

The signed distance between a chosen value, denoted x, and the mean μ,
divided by the standard deviation σ, is how we determine our z score.
Therefore, a z score is the distance from the mean measured in units of
standard deviation. It is expressed mathematically as Equation 3.21.

Equation 3.21

\[ z = \frac{x - \mu}{\sigma} \]

Standard Normal Value (z score)

Where: x is the value of any particular observation or measurement;

μ is the mean of the distribution; and

σ is the standard deviation of the distribution.

Table of Standard Normal Probabilities for Negative Z-scores

Table of Standard Normal Probabilities for Positive Z-scores

Note: Notice the shaded area of the bell curve. It means that the probabilities
given in this table represent the area to the LEFT of the z score. The area to
the RIGHT of a z score is equal to 1 minus the area to the left of the z score.

Figure 3.3. Standard Normal Table

Note that once normally distributed observations are standardized, the z
scores are normally distributed with a mean of 0 and a standard deviation of
1. The standard normal table (usually appended at the end of most textbooks
in statistics) contains the probabilities for the standard normal probability
distribution (Glen, n.d.), as seen in Figure 3.3. The standard normal table is
appended at the end of this textbook. It is also easily accessible online and
through any statistical software. To avoid confusion, I prefer using a standard
normal table that presents the probabilities for positive and negative z scores
separately. These probability values can also be retrieved using the Excel
formula =normsdist(z). Refer to the Excel manual for the full syntax.
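
Where neither a printed table nor Excel is at hand, the left-tail area can be computed from the error function in Python's standard library. This is a sketch, not the method used to build the textbook's table; the function name `standard_normal_cdf` is a hypothetical choice, and the identity used is the standard relation between the normal cumulative probability and `math.erf`.

```python
from math import erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the LEFT of the z score."""
    return 0.5 * (1 + erf(z / sqrt(2)))


print(round(standard_normal_cdf(0.0), 4))      # 0.5
print(round(standard_normal_cdf(-1.0), 4))     # 0.1587
# Area to the RIGHT of z = 1.0 is one minus the area to the left.
print(round(1 - standard_normal_cdf(1.0), 4))  # 0.1587
```

The last two lines agree because the curve is symmetrical about the mean.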

Before we demonstrate the application of the z score, we recall our
discussion of the empirical rule in Chapter 2 (Figure 2.9). Recall that the
empirical rule established that:

1. About 68 percent of the area under the normal curve is within one
standard deviation of the mean, written as (μ ± 1σ).

2. About 95 percent of the area under the normal curve is within two standard
deviations of the mean, written as (μ ± 2σ).

3. Almost all the area under the normal curve is within three standard
deviations of the mean, written as (μ ± 3σ).

As shown in Figure 3.4, the scale changes when measurements are converted
to the standard normal. In other words, (μ ± 1σ) is transformed to a z score
of ±1.0, (μ ± 2σ) to a z score of ±2.0, (μ ± 3σ) to a z score of ±3.0, and the
center to a z score of 0.0, showing no departure from the mean, μ.

[Figure: bell curve marked in standard deviations (s.d.) from the mean. About
68.3% of the area lies within ±1 s.d., 95.4% within ±2 s.d., and 99.7% within
±3 s.d. Only 3 points in 1,000 will fall outside the area 3 standard
deviations either side of the center line.]

Figure 3.4. Transformed Measurements to Standard Normal

Consider, for instance, that a battery manufacturing company conducts a
study on the life span of its products in accordance with the company’s
quality assurance procedure. For an AAA battery, the mean life is 19 hours.
Suppose further that the useful life of the battery is approximately normally
distributed with a standard deviation of 1.2 hours. Using the empirical rule,
we know the following:

1. About 68 percent of the batteries will fail between 17.8 and 20.2 hours,
computed by 19 ± 1(1.2).

2. About 95 percent of the batteries will fail between 16.6 and 21.4 hours,
computed by 19 ± 2(1.2).

3. Practically all batteries will fail between 15.4 and 22.6 hours, computed
by 19 ± 3(1.2).
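The battery intervals above can be checked with a short Python sketch. It is an illustration only: the function name is hypothetical, and it assumes Python 3.8 or later for statistics.NormalDist. Besides the interval limits, it reports the exact area within one, two, and three standard deviations, which the empirical rule rounds to 68 percent, 95 percent, and almost all.

```python
from statistics import NormalDist


def empirical_rule_intervals(mean: float, std_dev: float) -> dict:
    """Return {k: (low, high, area)} for k = 1, 2, 3 standard deviations."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    intervals = {}
    for k in (1, 2, 3):
        low = mean - k * std_dev
        high = mean + k * std_dev
        # Exact area under the normal curve within k standard deviations.
        area = standard_normal.cdf(k) - standard_normal.cdf(-k)
        intervals[k] = (low, high, area)
    return intervals


# Battery example: mean life of 19 hours, standard deviation of 1.2 hours.
for k, (low, high, area) in empirical_rule_intervals(19.0, 1.2).items():
    print(f"within {k} s.d.: {low:.1f} to {high:.1f} hours, area = {area:.4f}")
```

The exact areas are approximately 0.6827, 0.9545, and 0.9973, which is why the rule is stated with the words "about" and "almost all."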

Because there are an endless number of potential normal probability
distributions, standardization is significant from a statistical standpoint.
Standardization facilitates comparison from a practical standpoint. We give
an example to further clarify this.

We all know that before applying to law school, students need to take the
LSAT (Law School Admission Test). Before applying to medical school,
students need to take the MCAT (Medical College Admission Test). Assume
that the scores are normally distributed. Suppose that the mean score for the
LSAT is 151 with a standard deviation of 10, and the mean score for the
MCAT is 25.1 with a standard deviation of 6.4. Suppose further that a
prospective student took both exams. He/She scored 172 on the LSAT and 37
on the MCAT. On which test did the prospective student do better?

To answer this question, we cannot simply compare his/her scores on the LSAT
and MCAT outright because they are not comparable. We need to convert the
scores to allow direct comparison. Here, we can use the standard normal by
computing the respective z scores for the LSAT and MCAT. For i = LSAT, MCAT,
define x_i as the prospective student’s score on each examination, μ_i as the
population mean score, and σ_i as the population standard deviation.

The prospective student’s z score for the LSAT is found by applying Equation
3.21: z = (x − μ)/σ = (172 − 151)/10 = 2.1. His/her z score for the MCAT is
found by z = (x − μ)/σ = (37 − 25.1)/6.4 = 1.9. Hence, we can conclude that
the prospective student performed relatively better on the LSAT than on the
MCAT.
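The comparison can also be sketched in a few lines of Python. The dictionary layout and variable names below are assumptions for illustration; the scores, means, and standard deviations are the ones given in the example.

```python
# (score, population mean, population standard deviation) for each exam.
exams = {
    "LSAT": (172.0, 151.0, 10.0),
    "MCAT": (37.0, 25.1, 6.4),
}

# Standardize each score with Equation 3.21 so the two become comparable.
z_scores = {
    name: (score - mean) / std_dev
    for name, (score, mean, std_dev) in exams.items()
}

# The exam with the larger z score is the relatively better performance.
better_exam = max(z_scores, key=z_scores.get)

for name, z in z_scores.items():
    print(f"{name}: z = {z:.2f}")
print(f"Relatively better performance: {better_exam}")
```

Because both scores are expressed in standard deviations above their own means, the raw scales of the two exams no longer matter.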

To demonstrate the full application of the standard normal probability
distribution, we can also inquire on the probability or proportion of the
sample or population assuming a specific value of a random variable.

For illustration, let us say that an FM radio station discovers that listeners’
duration of tuning in (measured in minutes) follows a distribution that is
roughly normal. The distribution’s mean is 15.0 minutes, and its standard
deviation is 3.5 minutes. We want to find the probability that a random
listener will tune in (a) for more than 20 minutes; (b) for 20 minutes or
less; and (c) for between 10 and 12 minutes. The duration of a listener’s
attention, x, is our random variable.

We let x = 20, μ = 15, and σ = 3.5 and do the following steps to obtain
P(x > 20), which is the probability that a specific listener will tune in for
more than 20 minutes:

1. Find the z score by applying Equation 3.21: z = (x − μ)/σ = (20 −
15)/3.5 = 1.43 (if you are manually calculating and using the standard
normal table, round the result to two decimal places; if you are using
Excel, leave the result as it is).

2. Find the probability value that matches your calculated z score. If you
are using the standard normal table, use the table for positive z scores
because the computed z score is positive.

3. Because the z score is 1.43, move down the left margin of the standard
normal table to the row 1.4 and across that row to the column headed
0.03. The corresponding probability value is P(z < 1.43) = 0.9236. The
same can be determined using the Excel formula =normsdist(1.43),
which is equal to a more precise probability value of 0.92364149….

4. This probability value represents the area to the left of the z score, as
illustrated in Figure 3.5. However, we are interested in the probability
that a particular listener will tune in for more than 20 minutes. Thus,
we are interested in the area below the normal curve that is to the
RIGHT of the z score, which is 1 − 0.9236 = 0.0764.

5. Therefore, P(x > 20) = 7.64 percent. There is a 7.64 percent probability
that a listener will tune in for more than 20 minutes.
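The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the variable names are hypothetical, statistics.NormalDist (Python 3.8 or later) stands in for the standard normal table, and the z score is rounded to two decimal places to imitate the manual table lookup.

```python
from statistics import NormalDist

mean_minutes = 15.0
std_dev_minutes = 3.5
x = 20.0

# Step 1: z score from Equation 3.21, rounded as for a table lookup.
z = (x - mean_minutes) / std_dev_minutes
z_table = round(z, 2)

# Steps 2 and 3: area to the LEFT of the z score.
left_area = NormalDist().cdf(z_table)

# Step 4: "more than 20 minutes" is the area to the RIGHT.
right_area = 1.0 - left_area

# Step 5: report the probability as a percentage.
print(f"z = {z_table}, P(x > 20) = {right_area:.4f} ({right_area:.2%})")
```

Skipping the rounding in step 1 changes the result only slightly, which is the difference between the table method and a software calculation.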

We let x = 20, μ = 15, and σ = 3.5 and perform the following procedures to
ascertain the probability that a specific listener will tune in for 20 minutes
or less, P(x ≤ 20):

1. Find the z score by applying Equation 3.21: z = (x − μ)/σ = (20 −
15)/3.5 = 1.43.

2. Find the probability value that matches your calculated z score.

3. Because the z score is 1.43, move down the left margin of the standard
normal table to the row 1.4 and across that row to the column headed
0.03. The corresponding probability value is P(z < 1.43) = 0.9236. The
same can be determined using the Excel formula =normsdist(1.43),
which is equal to a more precise probability value of 0.92364149….

4. This probability value represents the area to the left of the z score, as
illustrated in Figure 3.5. Here, we are interested in the probability
that a particular listener will tune in for 20 minutes or less. Thus, we
are interested in the area below the normal curve that is to the LEFT of
the z score, so no further adjustment is needed.

5. Therefore, P(x ≤ 20) = 92.36 percent. There is a 92.36 percent
probability that a listener will tune in for 20 minutes or less.

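[Figure: standard normal curve with z = 1.43 marked. The area to the left is
0.9236 = 92.36%; the area to the right is 1 − 0.9236 = 0.0764 = 7.64%.]

Figure 3.5. Determining Probabilities Using the Standard Normal Table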

We calculate the z score for each limit to estimate the probability that a
specific listener will tune in between 10 and 12 minutes, P(10 < x < 12). We
let x = 10 and x = 12, μ = 15, and σ = 3.5 and proceed as follows:

1. Find the z score at 10 by applying Equation 3.21: z = (x − μ)/σ = (10 −
15)/3.5 = −1.43. Meanwhile, the z score at 12 is z = (x − μ)/σ =
(12 − 15)/3.5 = −0.86.

2. Find the probability values that match your computed z scores. If you
are using the standard normal table, use the table for negative z scores
because the computed z scores are negative.

3. Because the first z score is −1.43, move down the left margin of the
standard normal table to the row −1.4 and across that row to the column
headed 0.03. The corresponding probability value is P(z < −1.43) =
0.0764. The same can be determined using the Excel formula
=normsdist(-1.43), which is equal to a more precise probability value of
0.07635851….

4. Because the second z score is −0.86, move down the left margin of the
standard normal table to the row −0.8 and across that row to the
column headed 0.06. The corresponding probability value is P(z < −0.86) =
0.1949. The same can be determined using the Excel formula
=normsdist(-0.86), which is equal to a more precise probability value of
0.19489452….

5. These probability values represent the area to the left of their
respective z scores, as illustrated in Figure 3.6. However, we are
interested in the probabilities found in the area between the z scores of
−1.43 and −0.86. Hence, we subtract the area to the left of z = −1.43
from the area to the left of z = −0.86. So, we have 0.1949 − 0.0764 =
0.1185.

6. In light of this, P(10 < x < 12) = 11.85 percent. There is an 11.85
percent probability that a listener will tune in between 10 and 12
minutes.
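The "area between two z scores" procedure can be sketched in Python as shown below. The function names are assumptions for illustration, and statistics.NormalDist (Python 3.8 or later) replaces the table lookup. Two versions are given: one that rounds each z score to two decimal places as in the manual table method, and one that keeps full precision.

```python
from statistics import NormalDist


def table_probability_between(lower, upper, mean, std_dev):
    """Table method: round each z score to two decimals, then subtract areas."""
    standard_normal = NormalDist()
    z_lower = round((lower - mean) / std_dev, 2)
    z_upper = round((upper - mean) / std_dev, 2)
    # Area left of the upper limit minus area left of the lower limit.
    return standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)


def exact_probability_between(lower, upper, mean, std_dev):
    """Same subtraction without rounding the z scores."""
    distribution = NormalDist(mu=mean, sigma=std_dev)
    return distribution.cdf(upper) - distribution.cdf(lower)


table_result = table_probability_between(10.0, 12.0, 15.0, 3.5)
exact_result = exact_probability_between(10.0, 12.0, 15.0, 3.5)
print(f"table method: {table_result:.4f}")
print(f"full precision: {exact_result:.4f}")
```

The two results differ only slightly. The gap comes entirely from rounding the z scores before looking them up.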
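
[Figure: standard normal curve with z = −1.43, z = −0.86, and z = 0 marked.
The area to the left of z = −1.43 is 0.0764 = 7.64%; the area to the left of
z = −0.86 is 0.1949 = 19.49%; the area between them is 0.1185 = 11.85%.]

Figure 3.6. Determining Probabilities in between Using the Standard Normal
Table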

Exponential Probability Distribution

The exponential probability distribution, commonly known as the negative
exponential distribution, is the last continuous probability distribution we
discuss (Render et al. 2003, 61). It answers questions such as the following:
What is the likelihood that a certain event will take place within the next x
hours or days? What is the likelihood that a certain event will take place
between x_1 and x_2 hours? What is the likelihood that the event will take
more than x_1 hours? That is, the random variable x equals (a) the time
between events or (b) the time needed to complete an action (e.g., serve a
customer). The distribution is often concerned with the amount of time until
some specific event occurs or with queuing problems (Lumen Learning, n.d.).

Its probability distribution function, Equation 3.22, depicts a continuous
probability distribution, while Equations 3.23 and 3.24 provide its mean and
variance.

Equation 3.22

P(x) = m e^(−mx), for x ≥ 0

Exponential probability distribution

Equation 3.23

m = 1/μ, or equivalently μ = 1/m

Mean of the exponential probability distribution

Equation 3.24

σ² = 1/m² = μ²

Variance of the exponential probability distribution

Where: x is the random variable;

e is Euler’s number, approximately 2.71828, which is the base of the Napierian
(natural) logarithmic system;

m is the decay factor, which measures how fast the likelihood of an event
decreases as the random variable x increases; when x measures service time, m
is the mean number of units that can be handled in a specific period of time;

μ is the mean of the exponential probability distribution; and

σ² is the variance of the exponential probability distribution.

Situations that resemble an exponential probability distribution include the
amount of time left until a specific volcano erupts again; the duration, in
minutes, of long-distance business calls from Manila to Davao; the number of
months a car battery lasts; and the time needed to serve customers waiting in
line, among other things.

The general shape of the exponential probability distribution represented by
Equation 3.22 is illustrated in Figure 3.7.

[Figure: negative exponential curve; the vertical axis is probability density.]

Figure 3.7. Negative Exponential Distribution

As an example, let our continuous random variable x be the length of time (in
minutes) that a bank teller spends serving a customer. The time is distributed
exponentially with a mean duration of four minutes; that is, μ = 4 minutes.

You first need to determine the decay factor m from the mean by applying
Equation 3.23. Hence, m = 1/μ = 1/4 = 0.25. The variance of an exponential
distribution with decay factor m is σ² = 1/m² = 1/(0.25)² = 16. From Equation
3.22, P(x) = m e^(−mx) = 0.25e^(−0.25x), where x ≥ 0. Suppose x = 5. Then
P(5) = 0.25e^(−0.25(5)) = 0.072. In other words, P(x) has a value of 0.072
when x = 5, as in Figure 3.8.

Figure 3.8. Negative Exponential Distribution at x = 5; μ = 4; m = 0.25
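The density calculation for the bank teller can be sketched in Python. The function name and its parameters are assumptions for illustration; the formula is Equation 3.22 with the decay factor m taken as 1 divided by the mean service time.

```python
import math


def exponential_density(x: float, mean: float) -> float:
    """Equation 3.22, P(x) = m * e^(-m * x), with decay factor m = 1 / mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if x < 0:
        return 0.0  # the density is zero for negative times
    m = 1.0 / mean
    return m * math.exp(-m * x)


# Bank teller example: mean service time of 4 minutes, so m = 0.25.
density_at_5 = exponential_density(5.0, 4.0)
print(f"P(5) = {density_at_5:.3f}")  # P(5) = 0.072
```

Note that this value is the height of the curve at x = 5, not a probability for an interval. Probabilities for a range of service times come from areas under the curve.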


Using this information, we can compute the probability that a teller services
a random client in four to five minutes, P(4 < x < 5) using the cumulative
distribution function (CDF), which gives the area to the left.

To find P(4 < x < 5), note that the CDF is given by P(x < k) = 1 − e^(−mk),
and P(x ≥ k) = e^(−mk), since P(x = k) = 0 for a continuous random variable.
As such:

P(x < 4) = 1 − e^((−0.25)(4)) = 0.6321
P(x < 5) = 1 − e^((−0.25)(5)) = 0.7135
P(4 < x < 5) = 0.7135 − 0.6321 = 0.0814 = 8.14%

Therefore, there is an 8.14 percent likelihood that a teller spends four to five
minutes with a random client. See Figure 3.9 for the illustration.
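As a sketch, the cumulative calculation can be written in Python as follows. The function name is hypothetical; it implements P(x < k) = 1 − e^(−mk) with m equal to 1 divided by the mean, as in the example.

```python
import math


def exponential_cdf(k: float, mean: float) -> float:
    """P(x < k) = 1 - e^(-m * k) for an exponential with decay factor m = 1 / mean."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    if k <= 0:
        return 0.0  # no probability mass at or below zero
    m = 1.0 / mean
    return 1.0 - math.exp(-m * k)


# Bank teller example: mean service time of 4 minutes.
p_below_4 = exponential_cdf(4.0, 4.0)
p_below_5 = exponential_cdf(5.0, 4.0)
p_between = p_below_5 - p_below_4

print(f"P(x < 4) = {p_below_4:.4f}")
print(f"P(x < 5) = {p_below_5:.4f}")
print(f"P(4 < x < 5) = {p_between:.4f}")
```

The subtraction follows the same logic used earlier with the standard normal table: the area between two values is the area to the left of the larger value minus the area to the left of the smaller one.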

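[Figure: negative exponential curve with m = 0.25; the shaded area between
x = 4 and x = 5 is the difference between the cumulative areas 0.7135 and
0.6321.]

Figure 3.9. Negative Exponential Distribution at 4 < x < 5; m = 0.25; μ = 4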


The exponential probability distribution, in theory and practice, will be
discussed further in detail when you take more advanced statistical or
cognate courses, where its application is more pronounced. In the meantime,
it is important that you are aware of its definition and general use as one of
the continuous probability distributions.
