relative frequency of the number of times A wins out of n trials of the game as $A_n/n$, where
$A_n$ is the number of times A wins and n is the total number of trials. Then, the law of
large numbers states the following:

$$\lim_{n \to \infty} P\left(\left|\frac{A_n}{n} - p\right| > \varepsilon\right) = 0 \quad \text{for every } \varepsilon > 0,$$

where p denotes the theoretical probability of A winning (in this game, that is 0.45).
This is saying that the probability that the relative frequency $A_n/n$ differs from p by more
than any fixed margin $\varepsilon$ tends toward 0 as the number of trials grows. What is important
to note in the law of large numbers is that it connects
the empirical probability of an outcome to the theoretical probability of an outcome.
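The convergence described by the law of large numbers can be watched in a short simulation. The sketch below is only an illustration: it assumes each game is an independent trial that A wins with probability 0.45, and the function name `relative_frequency` and the fixed seed are hypothetical choices, not part of the investigation.

```python
import random

def relative_frequency(n, p=0.45, seed=0):
    """Play n independent games in which A wins with probability p
    (an assumed model of the game) and return A_n / n."""
    rng = random.Random(seed)
    wins = sum(rng.random() < p for _ in range(n))
    return wins / n

# The relative frequency should settle near p = 0.45 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: A_n/n = {relative_frequency(n):.4f}")
```

For small n the printed values can sit far from 0.45, while for large n they should stay close to it, which is the behavior the law describes.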
INVESTIGATION SUMMARY:
The main concepts developed in the last banana investigation are:
1. The empirical probability computed through simulations will converge on the
theoretical probability as the number of simulations performed increases.
2. Simulations can be used to estimate probabilities for complex situations.
3. If X and Y are independent events, then P(X and Y) = P(X) • P(Y).
4. The complement rule is given by the following formula: P(X) = 1 - P(not X).
Investigation 3A.4: Game Board³
Goals of this investigation: Introduce the notion of probability in a
complex situation through simulations.
The committee for a school fundraiser is organizing a game to raise money for the school.
To play the game, parents purchase tickets. For each ticket they purchase, they get one
turn at the game. The game the committee designs is the following:
A wooden peg board is constructed and a player releases one ball from the top of the
board. The ball follows some pathway and ends in a bin at the bottom of the board.
3 This activity was adapted from an activity developed by Wendy Weber at Central College, Iowa.
156 | Statistics and Data Science For Teachers
Image created via [Link]
The pegs on the board are placed in such a way that there is an equal chance for a ball to
fall to the left past the peg or to the right past the peg. At the bottom, each bin has either a
prize or no prize associated with it. A total of two prizes need to be placed in the bottom
bins. The students of the school want to create a game so that the chance of winning a
prize is low and the chance of not winning a prize is high. The students pose the following
investigative question:
Where should the two prizes be placed in order to have the lowest probability of winning?
To help answer the investigative question, the students decide to simulate a few drops of
the balls in the game board. They set up the following simulation:
Each student receives a diagram of the game board showing all possible pathways the ball
could take, a coin, and a ball. The following picture shows the game board that is used by
the students:
Unit 3A: Probability Introduction | 157
All the students start their ball on the start square. To advance down the board, the
students flip the coin. If the student flips heads, then they advance their ball to the right.
If the student flips tails, then they advance to the left. The players keep going until they
end in a bin at the bottom.
Each player plays the game 10 times. For each game, the players mark in what bin they
ended. Here is an example of 20 simulations that two students carried out:
Based on the 20 simulated games, the students can compute the relative frequency of
landing in each of the cells. For the simulated game depicted, the relative frequencies are:
• Bin 0 happened 2/20 = 0.10
• Bin 1 happened 3/20 = 0.15
• Bin 2 happened 6/20 = 0.30
• Bin 3 happened 6/20 = 0.30
• Bin 4 happened 3/20 = 0.15
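The hand simulation with a coin can also be mimicked in software. The following is a minimal sketch, assuming the bins are numbered 0 to 4 from left to right so that the final bin equals the number of heads (moves to the right); the function name `drop_ball` and the seed are hypothetical, and the tallies it prints will not match the students' 20 games.

```python
import random
from collections import Counter

def drop_ball(rows=4, rng=random):
    """Simulate one ball: one fair coin flip per row.
    Heads moves the ball right, tails moves it left.
    Assumption: bins are numbered 0..rows from left to right, so the
    final bin equals the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(rows))

rng = random.Random(1)  # fixed seed so the run can be repeated
results = [drop_ball(rng=rng) for _ in range(20)]
tally = Counter(results)

# Relative frequency of each bin over the 20 simulated games.
for b in range(5):
    print(f"Bin {b} happened {tally[b]}/20 = {tally[b] / 20:.2f}")
```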
After going through the simulation in pairs, the students decide to combine their results
into a single data set. They get the following relative frequencies for the class:
• Bin 0 happened 23/300 = 0.077
• Bin 1 happened 84/300 = 0.280
• Bin 2 happened 100/300 = 0.333
• Bin 3 happened 76/300 = 0.253
• Bin 4 happened 17/300 = 0.057
While having 300 simulations is indeed a lot, the students want to find out if these
relative frequencies hold in the long run. To test this, they find an online game that mimics
their game at [Link]. They then
run the simulation 10,000 times. The following results are obtained:
For this large simulation, the following relative frequencies and empirical probabilities
are computed:
• P(Bin 0) = 623/10,000 = 0.062
• P(Bin 1) = 2472/10,000 = 0.247
• P(Bin 2) = 3716/10,000 = 0.372
• P(Bin 3) = 2557/10,000 = 0.256
• P(Bin 4) = 632/10,000 = 0.063
These represent the empirical probabilities of landing in each cell. Next, we can use
formulas and rules to compute the theoretical probabilities. We then can compare how close
our empirical probabilities are to the theoretical ones.
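To see how the relative frequencies settle as the number of drops grows, the three sets of counts reported above can be placed side by side. This sketch uses only the counts given in the text; the dictionary labels and the helper `relative_frequencies` are hypothetical names introduced for illustration.

```python
# Bin counts (bins 0..4) reported in the text at each stage.
counts = {
    "20 hand drops (one pair)": [2, 3, 6, 6, 3],
    "300 hand drops (class)": [23, 84, 100, 76, 17],
    "10,000 software drops": [623, 2472, 3716, 2557, 632],
}

def relative_frequencies(bin_counts):
    """Divide each bin count by the total number of drops."""
    total = sum(bin_counts)
    return [c / total for c in bin_counts]

for label, bin_counts in counts.items():
    freqs = relative_frequencies(bin_counts)
    print(f"{label:>26}: " + "  ".join(f"{f:.3f}" for f in freqs))
```

Reading down each column shows how the values for a given bin change as the sample grows from 20 to 300 to 10,000 drops.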
To begin, the students list their sample space, which consists of all possible pathways from
the start to the bins at the end.
There are 16 possible pathways to the end bins. They are:
ACF0 BEI4
ACF1 BEI3
ACG1 BEH3
ACG2 BEH2
ADG1 BDH3
ADG2 BDH2
ADH2 BDG2
ADH3 BDG1
The first thing to notice is that although each pathway is equally likely (each has the same
probability), the bins are not equally likely landing spots. Instead, we
see that many more pathways lead to bin two than to bin zero. In fact, bins zero and four are
each the ending bin for one pathway, bins one and three are each the ending bin for four pathways,
and bin two is the ending bin for six pathways. From this, we can compute the following
theoretical probabilities:
$$P(\text{Bin 0}) = P(\text{Bin 4}) = \frac{1}{16} = 0.0625$$

$$P(\text{Bin 1}) = P(\text{Bin 3}) = \frac{4}{16} = 0.25$$

$$P(\text{Bin 2}) = \frac{6}{16} = 0.375$$
We note that these theoretical probabilities were very well approximated by the empirical
probabilities found in the 10,000 simulations. From these results, the students decide to
put the prizes in bin zero and bin four, the two least likely bins, which gives a probability
of winning of 2/16 = 0.125.
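The pathway counting and the prize decision can be checked by brute force. The sketch below is one possible way to do so, assuming fair pegs and bins numbered 0 to 4 from left to right (so a pathway's bin is its number of right moves); it lists all 16 pathways and then computes the probability of winning for every way of placing two prizes.

```python
from fractions import Fraction
from itertools import combinations, product

ROWS = 4

# Each pathway is a sequence of left/right moves, one per row.
paths = list(product("LR", repeat=ROWS))

# Assumption: bins are numbered 0..4 from left to right, so a pathway
# ends in the bin whose number equals its count of right moves.
paths_per_bin = [0] * (ROWS + 1)
for path in paths:
    paths_per_bin[path.count("R")] += 1

prob = [Fraction(c, len(paths)) for c in paths_per_bin]

# Probability of winning for every way of placing the two prizes.
win_prob = {pair: prob[pair[0]] + prob[pair[1]]
            for pair in combinations(range(ROWS + 1), 2)}
best = min(win_prob, key=win_prob.get)

print(paths_per_bin)                 # [1, 4, 6, 4, 1]
print(best, float(win_prob[best]))   # (0, 4) 0.125
```

Exact fractions are used so that the comparison between prize placements is not affected by rounding.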
As an additional note to this investigation, teachers could explore the connections to the
binomial coefficient and Pascal’s triangle by considering a generalization of the Plinko
board to incorporate more rows. The online simulation app allows one to increase or
decrease the number of rows in the Plinko board. The board in the investigation has four
rows with five bins; however, what would the probability of landing in specific bins be
if there were seven rows? Ten rows? Fifty rows? n rows? We can generalize our results to
the case of n rows using counting rules and noticing patterns.
First, if there are n rows in a Plinko board, how many bins will be at the bottom? The
answer is that there are n + 1 bins.
Next, in the previous, four-row example, we found that there were 16 total possible
pathways to get to the bottom. We found this by writing out all of the pathways; however, we
could recognize that for four rows, we have a total of $2^4 = 16$ possible pathways. In the
general case of n rows, this means that there are a total of $2^n$ possible pathways. This will
give us our denominator for the theoretical probabilities.
For the numerators, we have to somehow express the number of pathways that lead to
each bin in terms of n and the selected bin. As before, we can label our n + 1 bins from
0 to n. By simulating, we notice that the middle bins always have more balls landing
in them and that the numbers of pathways leading to the bins are symmetric about the
middle bin(s). From this, we know that the number of paths leading to bin zero will be
the same as the number of paths leading to bin n, bin one is the same as bin n − 1, bin two
is the same as bin n − 2, etc.
We can first reduce the number of rows and look at the number of pathways leading to the
bins for these lower numbers. For example, for the Plinko board with two rows, we
would have
with one possible pathway leading to bin zero, two possible pathways leading to bin
one, and one possible pathway leading to bin two. Adding one more row to the Plinko
board, we have the following:
For this Plinko board, we can count the pathways again and we see that there is one
pathway to bin zero, three pathways to bin one, three pathways to bin two, and one pathway
to bin three.
At this point, we observe a familiar pattern beginning to emerge: Pascal’s triangle is at play!
In fact, our example Plinko board with four rows has the possible pathways as one to
bin zero, four to bin one, six to bin two, four to bin three, and one to bin four, which is
exactly the next row in Pascal’s triangle. Using our knowledge about Pascal’s triangle,
we know that the number of pathways to the n + 1 bins for the n-row Plinko board
must therefore be given by the binomial coefficient. Thus for bin k, the number of
pathways leading to it in an n-row Plinko board is:
$$C(n, k) = {}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
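The link between the row-by-row counting and the binomial coefficient can be checked numerically. The sketch below is an illustration under the same fair-peg assumption: the hypothetical helper `pascal_row` builds the pathway counts one row at a time, and `bin_probabilities` divides the binomial coefficients by the $2^n$ total pathways.

```python
from math import comb

def pascal_row(n):
    """Pathway counts for an n-row board, built row by row: each bin is
    reached from the one or two positions directly above it."""
    row = [1]
    for _ in range(n):
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

def bin_probabilities(n):
    """P(bin k) = C(n, k) / 2**n for k = 0..n, assuming fair pegs."""
    return [comb(n, k) / 2**n for k in range(n + 1)]

# The row-by-row counts agree with the binomial coefficients.
for n in (2, 3, 4, 7):
    assert pascal_row(n) == [comb(n, k) for k in range(n + 1)]
    print(n, pascal_row(n))

print(bin_probabilities(4))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

Calling `bin_probabilities(10)` or `bin_probabilities(50)` gives the corresponding values for the larger boards asked about earlier.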
These connections to the binomial coefficient and Pascal’s triangle are additional
mathematical ideas that can be easily connected to probability. These types of
connections are not statistical in nature; however, they do offer math teachers and
math-teacher educators ways to make deeper connections between mathematics
and probability. While these concepts are interesting to explore, the main point of
the game board investigation is not to derive or work with the binomial formula or
Pascal’s triangle, but instead to illustrate how simulations can be useful in computing
probabilities and how the empirical probabilities tend toward the theoretical
probabilities as the number of repetitions increases.
INVESTIGATION SUMMARY:
The main concepts developed in the game board investigation are:
1. Probability calculations can help make decisions.
2. Implementing hand simulations is useful to build conceptual understanding
before moving to software to carry out a large number of simulations.
Investigation 3A.5: Random Exams
Goals of this investigation: Use probabilistic modeling and compute
long-run probabilities through theory and simulation.
Jessica is a high-school teacher who recently collected and graded tests for her statistics
class. While doing this, she noticed that out of all the tests she collected, four of them
were turned in without names. When Jessica handed back the tests to the class, there
were four students who did not receive a test, because they had forgotten to put their
names on them. At this point, Jessica decides to randomly hand back the four tests to the
four students and hope that the students get the test they turned in. She wonders what the
chances are that each student will receive the correct test and decides to investigate the
following questions with her class:
How often will all four students be given their correct test?
This same question could be rephrased as follows:
What is the probability that all four students received the correct test?
This scenario highlights probabilistic modeling. While the investigative questions ask for
computations of probabilities, the underlying idea in the scenario is randomness. Students
are being handed back tests randomly—a random process is occurring in this setting. The
outcome from this random process of handing back the tests is not predictable, because
we do not know whether each student will be handed back the correct test. The
computation of the probability of obtaining four matches is based on the notion that if the random
phenomenon of handing the tests back was repeated over and over again, we would want
to identify the chances of getting four out of four correct matches.
With the class, Jessica can simulate the random process of handing back the tests over and
over again and observe how many matches happen in each simulation, or she can
compute the probability of four matches directly using theoretical rules. She begins by simulating
the process over and over again, first by hand and then using software. She then can
compare how closely the empirical probability that she derived through the simulation matches
the theoretical probability she computed using probability rules and formulas.
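The software version of this comparison could look something like the following sketch. It assumes that every ordering of the four tests is equally likely; the function name `all_match_rate`, the number of trials, and the seed are hypothetical choices made for illustration.

```python
import random
from fractions import Fraction
from itertools import permutations

def all_match_rate(trials, n_tests=4, seed=0):
    """Shuffle the tests `trials` times and return the fraction of
    shuffles in which every student receives their own test."""
    rng = random.Random(seed)
    students = list(range(n_tests))
    hits = 0
    for _ in range(trials):
        tests = students[:]
        rng.shuffle(tests)
        hits += tests == students  # True counts as 1
    return hits / trials

# Theoretical value: list all 4! = 24 equally likely orderings and
# count the ones in which all four tests match.
orderings = list(permutations(range(4)))
exact = Fraction(sum(o == tuple(range(4)) for o in orderings),
                 len(orderings))

print(exact, float(exact))      # theoretical probability, 1/24
print(all_match_rate(100_000))  # empirical estimate
```

With a large number of trials, the empirical estimate should land close to the theoretical value of 1/24, or about 0.042.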
To simulate by hand, she takes four cards and labels them 1 through 4, each
representing one of the tests, and four stick figures, labeled 1 through 4, each representing one
of the four students (see Figure 1). She can mix the cards without looking at the labels