
Insights Beyond Boxplots Explained

The document discusses the limitations of boxplots, emphasizing that they do not provide information about the number of data points or the distribution pattern within quartiles. It illustrates these points with examples, including a comparison of age distributions for Best Actor/Actress Oscar winners, highlighting differences in median ages and variability. Ultimately, it argues that actresses tend to win Oscars at younger ages than actors, despite some overlap in age distributions.

Uploaded by jellycreamus
Copyright © All Rights Reserved

What a boxplot does not tell us

At this point, you should know how to:

• Create a boxplot from a five-number summary.
• Use a boxplot to identify and interpret quartiles.
• Identify the median and the IQR of a distribution from a boxplot.

Now we want to focus on what a boxplot does not tell us. A boxplot does not give us information
about the following:

• The number of data points in the data set.
• The number of data points within each quartile (although each quartile contains roughly the same
number of data points, about one quarter of the total).
• The pattern of the data within each quartile.

EXAMPLE The same boxplots but very different distributions

Here are four data sets that illustrate these ideas.

How are these data sets similar? Notice that the four data sets have the same boxplot. This is
because the five-number summary is the same for each data set. The data sets have identical
minimum values, quartile marks, and maximum values, so we can say that these data sets have the
same center and spread.

• Center: Each data set has a median of 10.
• Spread: In each data set, the middle half of the data varies from 7 to 14, so the IQR is 7.
Overall, the data in each set vary from 4 to 19, so the range is 15.

How are these data sets different? The data sets do not have the same number of data points.
Also, the shape of each distribution is different. For small data sets, the boxplot does not give us
reliable information about the shape.
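The idea that different data sets can share one boxplot can be checked directly. The sketch below uses three small hypothetical data sets (not the ones pictured in the original example) built so that all three have the five-number summary 4, 7, 10, 14, 19. The helper `five_number_summary` is also hypothetical and assumes the "median of each half" quartile rule, in which the overall median is left out of both halves when n is odd; statistical software offers several other quartile rules that can give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]              # values below the median position
    upper = xs[half + n % 2:]      # skip the middle value when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Hypothetical data sets of different sizes and patterns.
data_sets = {
    "A": [4, 7, 7, 10, 14, 14, 19],                 # clumped at 7 and 14
    "B": [4, 6, 8, 9, 11, 13, 15, 19],              # even n
    "C": [4, 5, 7, 8, 9, 10, 11, 12, 14, 18, 19],   # more evenly spread
}

for name, values in data_sets.items():
    lo, q1, med, q3, hi = five_number_summary(values)
    print(f"{name}: n={len(values):2d}  summary=({lo:g}, {q1:g}, {med:g}, {q3:g}, {hi:g})  "
          f"IQR={q3 - q1:g}  range={hi - lo:g}")
```

All three lines report the same summary, IQR, and range, even though n differs (7, 8, and 11) and data set A clusters its values while C spreads them out. None of that difference would be visible in the boxplot.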

EXAMPLE Comparing ages for Best Actor/Actress Oscar winners

Here we examine the distributions of ages for Best Actor and Best Actress Oscar winners for the
years 1929-2019.

Best Actress/Actor age summary (ages in years)

Group      n    Min   Q1     Median   Q3    Max
Actress    92   21    28.5   33       41    80
Actor      92   29    37.5   42       49    76

We start our analysis by making observations about shape, center and spread. We focus on
comparing the two distributions, instead of just making a list of disconnected observations.

Shape: For both men and women, the distribution of ages is skewed to the right because the
upper half of the data has more variability than the lower half. The long upper tails come from
older actresses and actors who are outliers. In both cases, the shape shows that relatively few
older actresses and actors win the Oscar for best acting.
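One way to check the right-skew claim using only the table is to compare how wide each quarter of the data is. The sketch below is a hypothetical helper, not part of the original analysis; it uses the five-number summaries from the table. A much wider upper half is consistent with right skew, although a five-number summary alone cannot reveal the full shape.

```python
# Five-number summaries from the age table: (min, Q1, median, Q3, max), in years.
summaries = {
    "Actress": (21, 28.5, 33, 41, 80),
    "Actor": (29, 37.5, 42, 49, 76),
}

def quarter_widths(summary):
    """Width of each quarter of the data: min->Q1, Q1->median, median->Q3, Q3->max."""
    lo, q1, med, q3, hi = summary
    return (q1 - lo, med - q1, q3 - med, hi - q3)

for name, s in summaries.items():
    widths = quarter_widths(s)
    lower_half = widths[0] + widths[1]   # min to median
    upper_half = widths[2] + widths[3]   # median to max
    print(f"{name}: quarter widths {widths}; "
          f"lower half spans {lower_half:g}, upper half spans {upper_half:g} years")
```

For actresses the upper half spans 47 years compared with 12 for the lower half (34 compared with 13 for actors). Most of that width sits in the top quarter, which is where the outliers are.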

Center: Actresses tend to win the Oscar at a younger age than do actors. The median age for
actresses (33) is lower than the median for actors (42). The gap is substantial: the actresses'
median is even below Q1 for actors (37.5), so it is lower than the ages of about 75% of the
winning actors.

Spread: The actresses have more variability in their overall ages (range = 59) than the actors
(range = 47). But the variability in the middle half of each distribution is similar: the IQR is
12.5 years for the actresses’ ages and 11.5 years for the actors’ ages.

The intervals of ages for the middle halves of the two distributions overlap. Typical ages for
actors range from 37.5 to 49 years, compared to 28.5 to 41 years for actresses. This overlap, from
37.5 to 41 years, is where the third quarter of the actresses’ ages (33 to 41) meets the second
quarter of the actors’ ages (37.5 to 42).

The 1st quartile mark for actresses (28.5) is almost equal to the minimum age for actors (29), which
means that about 25% of the winning actresses were younger than the youngest winning actor.
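The spread comparisons above are simple arithmetic on the table, so they can be reproduced in a few lines. This sketch, with hypothetical helper names, computes each range and IQR, the overlap of the two middle halves (the intervals [Q1, Q3]), and the comparison between the actresses' Q1 and the youngest actor's age.

```python
# Five-number summaries from the age table: (min, Q1, median, Q3, max), in years.
summaries = {
    "Actress": (21, 28.5, 33, 41, 80),
    "Actor": (29, 37.5, 42, 49, 76),
}

def range_and_iqr(summary):
    """Return (range, IQR) computed from (min, Q1, median, Q3, max)."""
    lo, q1, _, q3, hi = summary
    return hi - lo, q3 - q1

for name, s in summaries.items():
    rng, iqr = range_and_iqr(s)
    print(f"{name}: range = {rng:g}, IQR = {iqr:g}")

# The middle half of each distribution is the interval [Q1, Q3].
actress_mid = (summaries["Actress"][1], summaries["Actress"][3])
actor_mid = (summaries["Actor"][1], summaries["Actor"][3])
# Intersection of the two intervals (empty if the start exceeds the end).
overlap = (max(actress_mid[0], actor_mid[0]), min(actress_mid[1], actor_mid[1]))
print(f"Middle halves overlap from {overlap[0]:g} to {overlap[1]:g} years")

# About 25% of actresses lie below their Q1; compare it with the youngest actor.
print("Actress Q1 is below the youngest actor's age:",
      summaries["Actress"][1] < summaries["Actor"][0])
```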

Outliers: Both distributions have outliers. There is only one high outlier in the actors’
distribution, compared with six high outliers in the actresses’ distribution.
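The document does not say which rule it uses to flag outliers. The sketch below assumes the common 1.5 × IQR rule, in which values beyond Q1 − 1.5·IQR or Q3 + 1.5·IQR count as outliers, and applies it to the table. From the summary alone we can only tell whether any outliers exist on each side. Counting them (six and one) would require the individual ages, which are not included here. The helper name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR outlier rule (assumed k = 1.5)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# (min, Q1, Q3, max) from the age table, in years.
groups = {
    "Actress": (21, 28.5, 41, 80),
    "Actor": (29, 37.5, 49, 76),
}

for name, (lo, q1, q3, hi) in groups.items():
    lower, upper = iqr_fences(q1, q3)
    # min below the lower fence <=> at least one low outlier; same idea for max.
    print(f"{name}: fences {lower:g} to {upper:g}; "
          f"low outliers: {lo < lower}, high outliers: {hi > upper}")
```

Under this assumed rule, both maximum ages (80 and 76) lie above their upper fences (59.75 and 66.25), and both minimum ages lie above their lower fences. This is consistent with the text's observation that all of the outliers are high.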

Now let’s pull these observations together into a paragraph that compares the two distributions
and supports a central thesis. Our paragraph demonstrates a good use of course concepts and
vocabulary, which makes it less appropriate for a general audience who may not be trained in
statistics.

We are going to argue that actresses tend to win the Oscar for best acting at younger ages than
actors.

Our paragraph:

In general, actresses win the Best Actress Oscar at a younger age than actors win the Best Actor
Oscar. The median age for actresses is 33, lower than both the median (42) and the first quartile
mark (37.5) for actors. There is some overlap in the middle halves of the two distributions, which
can be interpreted as representing typical winning ages. Typical actors win this Oscar between the
ages of 37.5 and 49, compared to typical actresses, who are between the ages of 28.5 and 41. For
both sexes, the typical winning ages are similarly consistent, as seen by comparing the IQRs
(actress IQR = 12.5; actor IQR = 11.5). Despite this overlap and the similar spread in the middle
half of the distributions, it is important to note that about 25% of the winning actresses are
younger than the youngest winning actor. Though winning actresses are arguably younger than winning
actors, both distributions have older winners who are outliers. These unusually old winners skew
the distributions of ages to the right.
