Data Set Summarization Techniques

Chapter 3 discusses summarizing data sets through various statistical measures including location (mean, median, mode), dispersion (range, variance, standard deviation), and shape (skewness). It explains how to calculate these measures and provides examples for better understanding. Additionally, it covers noncentral tendency measures such as quartiles, quintiles, and percentiles, emphasizing their importance in data analysis.

Uploaded by

danar.talabani

Chapter 3

Summarizing Data Sets

24 January 2024

Statistics - Spring semester 2023-2024

Outline

- Numerical data may be summarized according to several characteristics.

We will work with each in turn and then all of them together.
Measures of Location
- Measures of central tendency: Mean; Median; Mode
- Measures of noncentral tendency (quantiles): Quartiles; Quintiles; Percentiles
Measures of Dispersion
- Range
- Interquartile range
- Variance
- Standard deviation
- Coefficient of variation
Measures of Shape
- Skewness
- Five-number summary (boxplot)
Scatter Plots and Correlation
Measures of Location
Measures of Central Tendency
• Measures of location place the data set on the scale of real numbers.
• Measures of central tendency (i.e., central location) help find the approximate
center of the dataset.
• These include the mean, the median, and the mode.

The Mean: The mean of a data set is the sum of all values in the set divided by the number of values in the set.
1. Arithmetic mean: the mean of a set of n numbers x1, x2, …, xn is denoted by x̄ ("x bar") and defined as:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example-1: A scientist recorded eight measurements of the diameter of a cylinder: 38.8, 40.9, 39.2, 39.7, 40.2, 39.2, 39.8, and 40.6 millimeters. Find the arithmetic mean.
Solution:
mean = (38.8+40.9+39.2+39.7+40.2+39.2+39.8+40.6)/8 = 318.4/8 = 39.8 mm
Example-2: For the data 1, 2, 2, 3, 51, calculate the mean. Note: n = 5 (five observations).
Solution: ∑Xi = 1 + 2 + 2 + 3 + 51 = 59
$\bar{X}$ = 59 / 5 = 11.8
Here we see that the mean is affected by extreme values.
2. Weighted mean: Sometimes we associate with the numbers x1, x2, …, xn certain weighting factors (weights) w1, w2, …, wn. The weighted mean is then:

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

If the numbers x1, x2, …, xn occur f1, f2, …, fn times respectively, then the mean is:

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$

Example-1: A student's marks were: math (85), statistics (92), computer (90), and English (88). The weights for these subjects were 4, 2, 2, and 3 respectively. Find the weighted mean mark. (Ans. $\bar{X}$ = 88)
Example-2: If 5, 8, 6, and 2 occur with frequencies 3, 2, 4, and 1, find the mean. (Ans. $\bar{X}$ = 5.7)
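The weighted-mean formula can be checked numerically. The following is a minimal Python sketch; the function name `weighted_mean` is an illustrative choice rather than anything defined in the lecture, and the inputs are the two examples above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example-1: marks weighted by subject weight.
marks_mean = weighted_mean([85, 92, 90, 88], [4, 2, 2, 3])
# Example-2: values weighted by how often they occur.
freq_mean = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])
print(marks_mean, freq_mean)
```

The same function covers both cases because a frequency is simply a weight that counts occurrences.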
The Median: The median is the middle value of the ordered data.

➢ To get the median, we must first rearrange the data into an ordered array (in
ascending or descending order). Generally, we order the data from the
lowest value to the highest value.
➢ Therefore, the median is the data value such that half of the observations
are larger and half are smaller. It is also the 50th percentile (we will be
learning about percentiles in a bit).
To find the median:
• Case 1: When the data is a simple list of numbers arranged in ascending order, the median is the middle value (if there is an odd number of values) or the average of the middle two values (if there is an even number of values).

Example-1: 43, 47, 48, 50, 51, 51, 75
median = 50

Example-2: 1, 2, 2, 2, 3, 3, 4, 6
median = (2 + 3)/2 = 2.5

o The mean and the median are unique for a given set of data. There will be
exactly one mean and one median.
o Unlike the mean, the median is not affected by extreme values.

Q: What happens to the median if we change the 75 in Example-1 to 7500?

Ans: The median will still be 50.
• Case 2: When the data is arranged in a frequency distribution, calculate the cumulative frequency and use the following formula:

$$\text{Median} = L_1 + \frac{\frac{N}{2} - F_{i-1}}{f_i} \cdot C$$

Where:
• L1 = lower boundary of the median class.
• F_{i-1} = cumulative frequency of the class before the median class.
• fi = frequency of the median class.
• C = class length.
• The median class is the interval in which the middle item (N/2) lies.
Example: Find the median for the frequency distribution shown below:
[Frequency table not recovered; the median class is marked in it.]
Solution: N/2 = 50


The Mode: The mode is the most frequent value.

Example-1: Find the mode for each of the following.

A: 3, 5, 2, 6, 5, 9, 5, 2, 8, and 6 (mode = 5)
B: 51, 48, 59, 49, and 47 (no mode)

Example-2: 5, 5, 5, 6, 8, 10, 10, 10.

The mode is: 5, 10.
There are two modes. This is a bi-modal dataset.

▪ The mode is different from the mean and the median in that those measures
always exist and are always unique. For any numeric data set there will be one
mean and one median.
▪ The mode may not exist. (Example-1 B)
▪ The mode may not be unique. (Example-2)
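
The three measures of central tendency can be compared on the examples above. This is a minimal Python sketch using the standard `statistics` module; the helper `modes` is a hypothetical name, and returning an empty list when no value repeats is an assumption chosen to match the "no mode" convention of Example-1 B.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every most-frequent value, or [] when no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so there is no mode
    return sorted(value for value, count in counts.items() if count == highest)


# One extreme value pulls the mean far more than the median.
skewed = [1, 2, 2, 3, 51]
print(mean(skewed), median(skewed))

# Replacing 75 with 7500 leaves the median unchanged.
print(median([43, 47, 48, 50, 51, 51, 75]), median([43, 47, 48, 50, 51, 51, 7500]))

print(modes([3, 5, 2, 6, 5, 9, 5, 2, 8, 6]))   # one mode
print(modes([51, 48, 59, 49, 47]))             # no mode
print(modes([5, 5, 5, 6, 8, 10, 10, 10]))      # bi-modal
```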


Finding the mode for a frequency distribution:

$$\text{Mode} = L_1 + \frac{D_1}{D_1 + D_2} \cdot C$$

Where:
▪ Mode class = the class with the largest frequency.
▪ L1 = lower boundary of the mode class.
▪ D1 = the difference between the frequency of the mode class and the frequency of the previous class.
▪ D2 = the difference between the frequency of the mode class and the frequency of the next class.
▪ C = class length.
Example: Find the mode for the frequency distribution shown below:
[Frequency table and worked solution not recovered.]


The mode can also be obtained from a histogram.
• Step 1: Identify the modal class and the bar representing it.
• Step 2: Draw two crossing lines: one from the top-left corner of the modal bar to the top-left corner of the next bar, and one from the top-right corner of the modal bar to the top-right corner of the previous bar.
• Step 3: Drop a perpendicular from the intersection of the two lines until it touches the horizontal axis.
• Step 4: Read the mode from the horizontal axis.


Measures of Noncentral Tendency
➢ Quantiles are measures of noncentral location used to summarize a set of data.

➢ Examples of commonly used quantiles:

• Quartiles
• Quintiles
• Deciles
• Percentiles

Quartiles
• Quartiles split a set of ordered data into four equal parts.
• Imagine cutting a chocolate bar into four equal pieces. How many cuts would you make? (Yes, 3.)
• Q1 is the First Quartile: 25% of the observations are smaller than Q1 and 75% of the observations are larger.
• Q2 is the Second Quartile: 50% of the observations are smaller than Q2 and 50% of the observations are larger. It is the same as the median and the 50th percentile.
• Q3 is the Third Quartile: 75% of the observations are smaller than Q3 and 25% of the observations are larger.


• A quartile, like the median, either takes the value of one of the observations or the value halfway between two observations.

• If n/4 is an integer, the first quartile (Q1) has the value halfway between the (n/4)th observation and the next observation.

• If n/4 is not an integer, the first quartile has the value of the observation whose position corresponds to the next highest integer. For example, with n = 10, n/4 = 2.5, so Q1 is the 3rd observation.

• The third quartile (Q3) follows the same rule with 3n/4 in place of n/4.


Example-1: Computer Sales (n = 12 salespeople)
Original Data: 3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6
Compute the mean, median, mode, quartiles.

First order the data:

0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 12
∑Xi = 76
$\bar{X}$ = 76 / 12 = 6.33 computers sold
Median = 6.5 computers
Mode = 10 computers
Q1 = 3.5 computers, Q3 = 9.5 computers
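
The quartile convention stated earlier (average two neighbours when the position is an integer, otherwise round the position up) can be sketched in Python. The function name `quantile` and its `(k, parts)` arguments are illustrative assumptions; statistical libraries generally use other interpolation conventions by default, so their results may differ from this rule.

```python
def quantile(data, k, parts):
    """k-th of the (parts - 1) cut points, e.g. k=1, parts=4 for Q1."""
    if not data or not 0 < k < parts:
        raise ValueError("need non-empty data and 0 < k < parts")
    ordered = sorted(data)
    position, remainder = divmod(len(ordered) * k, parts)
    if remainder == 0:
        # Integer position: halfway between that observation and the next
        # (positions are 1-based, list indices are 0-based).
        return (ordered[position - 1] + ordered[position]) / 2
    # Non-integer position: round up to the next observation.
    return ordered[position]


sales = [3, 10, 2, 5, 9, 8, 7, 12, 10, 0, 4, 6]
q1, q2, q3 = (quantile(sales, k, 4) for k in (1, 2, 3))
print(q1, q2, q3)
```

Using integer arithmetic (`divmod`) avoids floating-point surprises when the same function is reused for deciles or percentiles.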


Other Quantiles
• Similar to what we just learned about quartiles, where 3 quartiles split the data into 4 equal parts:
  • There are 9 deciles dividing the distribution into 10 equal portions (tenths).
  • There are 4 quintiles dividing the distribution into 5 equal portions (fifths).
  • There are 99 percentiles dividing the data set into 100 equal portions.
• In all these cases, the convention is the same. The point, be it a quartile, decile,
or percentile, takes the value of one of the observations or it has a value
halfway between two adjacent observations. It is never necessary to split the
difference between two observations more finely.
➢ Percentiles are used in analyzing the results of standardized exams. For
instance, a score of 40 on a standardized test might seem like a terrible grade,
but if it is the 99th percentile, don’t worry about telling your parents.

➢ Which percentile is Q1? Q2 (the median)? Q3?


Example-2: For the following data:
1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8, 10
Compute the mean, median, mode, quartiles.
Solution: n=16
1 1 2 2 ┋ 2 2 3 3 ┋ 4 4 5 5 ┋ 6 7 8 10
Mean = 65/16 = 4.06
Median = 3.5
Mode = 2
Q1 = 2, Q2 = Median = 3.5, Q3 = 5.5

Example-3: Data: number of absences

0, 5, 3, 2, 1, 2, 4, 3, 1, 0, 0, 6, 12
Compute the mean, median, mode, quartiles.
Solution: First order the data (n=13):
0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 12
Mean = 39/13 = 3.0 absences
Median = 2 absences
Mode = 0 absences
Q1 = 1 absence (n/4 = 3.25 is not an integer, so Q1 is the 4th observation)
Q3 = 4 absences (3n/4 = 9.75 is not an integer, so Q3 is the 10th observation)


Example: For the frequency distribution table, find:
1. The first quartile (Q1).
2. The 6th decile.
3. The 80th percentile.

$$Q_1 = L_1 + \frac{\frac{\sum f}{4} - F_{Q_1 - 1}}{f_{Q_1}} \cdot c$$

L1 = lower boundary of the first quartile class
F_{Q1-1} = cumulative frequency of the class before the first quartile class
f_{Q1} = frequency of the first quartile class
c = class length
Note: the first quartile class is the interval in which the first quartile item (N/4) lies.
Solution:
1. Q1:

$$\frac{N}{4} = \frac{\sum f}{4} = \frac{100}{4} = 25$$

$$Q_1 = 66 + \frac{25 - 23}{42} \cdot 3 = 66.143$$

2. 6th decile:

$$\frac{6N}{10} = \frac{6}{10} \cdot 100 = 60$$

$$D_6 = 66 + \frac{60 - 23}{42} \cdot 3 = 68.643$$

3. 80th percentile:

$$\frac{80N}{100} = \frac{80}{100} \cdot 100 = 80$$

$$P_{80} = 69 + \frac{80 - 65}{27} \cdot 3 = 70.667$$
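
The grouped-data calculations in this example all apply one interpolation formula with a different target position. A minimal Python sketch follows; the function name `grouped_quantile` is hypothetical, and because the frequency table itself is not reproduced here, the class values (lower boundary, cumulative frequency before, class frequency) are copied by hand from the worked solution.

```python
def grouped_quantile(lower_boundary, cum_freq_before, class_freq, class_length, target):
    """Interpolate inside the class that contains the target position."""
    if class_freq <= 0:
        raise ValueError("class frequency must be positive")
    return lower_boundary + (target - cum_freq_before) / class_freq * class_length


N = 100  # total frequency, as in the solution

# Class starting at 66: 23 observations before it, 42 inside it, length 3.
first_quartile = grouped_quantile(66, 23, 42, 3, N / 4)
sixth_decile = grouped_quantile(66, 23, 42, 3, 6 * N / 10)
# Class starting at 69: 65 observations before it, 27 inside it, length 3.
percentile_80 = grouped_quantile(69, 65, 27, 3, 80 * N / 100)

print(round(first_quartile, 3), round(sixth_decile, 3), round(percentile_80, 3))
```

The median of grouped data is the same computation with a target of N/2.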


Measures of Dispersion
• Dispersion is the amount of spread, or variability, in a set of data.
• Why do we need to look at measures of dispersion?
• Consider this example:
A company is about to buy computer chips that must have an average life of 10 years. The company has a choice of two suppliers. Whose chips should they buy? They take a sample of 10 chips from each of the suppliers and test them. See the data.

Supplier A chips     Supplier B chips
(life in years)      (life in years)
11                   170
11                   1
10                   1
10                   160
11                   2
11                   150
11                   150
11                   170
10                   2
12                   140

$\bar{X}_A$ = 10.8 years     $\bar{X}_B$ = 94.6 years
MedianA = 11 years     MedianB = 145 years
sA = 0.63 years        sB = 80.6 years
RangeA = 2 years       RangeB = 169 years

We see that supplier B's chips have a longer average life.
However, what if the company offers a 3-year warranty?
Then, computers manufactured using the chips from supplier A will have no returns, while using supplier B will result in 4/10 or 40% returns.
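
The supplier comparison can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that a chip is returned when its life is shorter than the 3-year warranty; the names `summarize` and `WARRANTY_YEARS` are illustrative.

```python
from statistics import mean, median, stdev

supplier_a = [11, 11, 10, 10, 11, 11, 11, 11, 10, 12]
supplier_b = [170, 1, 1, 160, 2, 150, 150, 170, 2, 140]
WARRANTY_YEARS = 3  # assumed warranty period from the example


def summarize(lives):
    """Location and dispersion measures plus the share of warranty returns."""
    failures = sum(1 for life in lives if life < WARRANTY_YEARS)
    return {
        "mean": mean(lives),
        "median": median(lives),
        "stdev": stdev(lives),              # sample standard deviation (n - 1)
        "range": max(lives) - min(lives),
        "return_rate": failures / len(lives),
    }


for name, lives in (("A", supplier_a), ("B", supplier_b)):
    print(name, summarize(lives))
```

Supplier B wins on the mean alone, but its standard deviation and range show why the average hides the early failures.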
• We will study these five measures of dispersion
▪ Range
▪ Interquartile Range
▪ Standard Deviation
▪ Variance
▪ Coefficient of Variation

▪ Range: The range is the difference between the largest and smallest values in a set of data. (Range = Largest Value – Smallest Value)

Example: The mean of the following data in section A and B is equal to 19.
Determine the range for both sections.
Section A : 6 9 11 13 15 21 23 28 29 35
Section B: 15 16 16 17 18 19 20 21 23 25
The range for Section A = 35 - 6 = 29
The range for Section B = 25 – 15 = 10
• The range is simple to use and to explain to others.
• One problem with the range is that it is influenced by extreme values at
either end.
▪ Inter-Quartile Range (IQR): The Inter-Quartile Range is the difference between
the values of third and first quartile. (IQR = Q3 – Q1)

• Example (n = 15):
0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
Q1 = 3, Q3 = 22
IQR = 22 – 3 = 19 (Range = 98)

• This is basically the range of the central 50% of the observations in the
distribution.
• Problem: The interquartile range does not take into account the
variability of the total data (only the central 50%). We are “throwing
out” half of the data.


▪ Standard Deviation
• The standard deviation, s, measures a kind of “average” deviation about the
mean. It is not really the “average” deviation, even though we may think of it
that way.
• Why can't we simply compute the average deviation about the mean, if that's what we want?

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$$

• If you take a simple mean, and then add up the deviations about the mean, as above, this sum will be equal to 0. Therefore, a measure of "average deviation" will not work.

• Instead, we use:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

• This is the "definitional formula" for standard deviation.

• The standard deviation has lots of nice properties, including:
o By squaring the deviations, we eliminate the problem of the deviations summing to zero.
o In addition, the sum of squared deviations about the mean is a minimum. No value other than the mean, subtracted from each X and squared, will result in a smaller sum of squared deviations. This is called the "least squares property."
Estimation of Variance and Standard Deviation from a Sample
When we calculate the results from a sample (i.e., a part of the population) we do not usually know the population mean, so we must find a way to use the sample mean, which we can calculate.

Note that σ is the population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

S is the sample standard deviation:

$$S = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

For data organized in a frequency distribution table, where xi represents the class mid-point:

$$S = \sqrt{\frac{\sum_{i=1}^{N} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{N} f_i - 1}}$$


For faster calculation:

$$S = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N - 1}} \qquad S = \sqrt{\frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}}{\sum f_i - 1}}$$

Example: Two data sets, X and Y. Which of the two data sets has greater variability? Calculate the standard deviation for each.

We note that both sets of data have the same mean:

$\bar{X} = 3$, $\bar{Y} = 3$

$$S_X = \sqrt{\frac{10}{4}} = 1.58$$

$$S_Y = \sqrt{\frac{80}{4}} = 4.47$$


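Variance: The variance, s², is the standard deviation (s) squared. Conversely, $s = \sqrt{\text{variance}}$.

$$S^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1} \quad \text{or} \quad S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N - 1}$$

For data organized in a frequency distribution table:

$$S^2 = \frac{\sum_{i=1}^{N} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{N} f_i - 1} \quad \text{or} \quad S^2 = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}}{\sum f_i - 1}$$
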

Coefficient of Variation: A dimensionless quantity, the coefficient of variation is the ratio between the standard deviation and the mean for the same set of data, expressed as a percentage. This can be either (σ / μ) or (s / x̄), whichever is appropriate, multiplied by 100%.

$$COV = \frac{\sigma}{\mu} \times 100\% \ \text{(for population)} \qquad COV = \frac{S}{\bar{x}} \times 100\% \ \text{(for sample)}$$


• The problem with s2 and s is that they are both, like the mean, in the “original”
units.
• This makes it difficult to compare the variability of two data sets that are in
different units or where the magnitude of the numbers is very different in the
two sets. For example,
• Suppose you wish to compare two stocks and one is in dollars and the other
is in ID; if you want to know which one is more volatile, you should use the
coefficient of variation.
• It is also not appropriate to compare the standard deviations of two stocks with vastly different prices, even if both are in the same units.
• The standard deviation for a stock that sells for around $300 is going to be on a different scale from one with a price of around $25.
• The coefficient of variation will be a better measure of dispersion in these cases than the standard deviation.
COV is in terms of a percent. What we are in effect calculating is what percent
of the sample mean is the standard deviation. If COV is 100%, this indicates that
the sample mean is equal to the sample standard deviation. This would
demonstrate that there is a great deal of variability in the data set. 200% would
obviously be even worse.

Example: Stock Prices
Which stock is more volatile?
Closing prices over the last 8 months:

Month    Stock A    Stock B
JAN      $1.00      $180
FEB      1.50       175
MAR      1.90       182
APR      .60        186
MAY      3.00       188
JUN      .40        190
JUL      5.00       200
AUG      .20        210
Mean     $1.70      $188.88
s2       2.61       128.41
s        $1.62      $11.33

Answer: The standard deviation of B is higher than for A, but A is more volatile:
COVA = (1.62/1.7) x 100% = 95.3%
COVB = (11.33/188.88) x 100% = 6.0%
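
The stock comparison can be reproduced as follows. This is a minimal Python sketch; `coefficient_of_variation` is an illustrative helper name. Because it works from the unrounded mean and standard deviation, the figure for Stock A comes out a little below the 95.3% obtained from the rounded values in the table.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100


stock_a = [1.00, 1.50, 1.90, 0.60, 3.00, 0.40, 5.00, 0.20]
stock_b = [180, 175, 182, 186, 188, 190, 200, 210]

cov_a = coefficient_of_variation(stock_a)
cov_b = coefficient_of_variation(stock_b)
print(f"COV A = {cov_a:.1f}%, COV B = {cov_b:.1f}%")
```

Stock B has the larger standard deviation in dollars, yet relative to its price level it varies far less than Stock A.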


Example: Test Scores
Data (n=10): 0, 0, 40, 50, 50, 60, 70, 90, 100, 100
Compute the mean, median, mode, quartiles (Q1, Q2, Q3), range,
interquartile range, variance, standard deviation, and coefficient of variation.
We shall refer to all these as the descriptive (or summary) statistics for a set
of data.

Solution: First order the data:

0, 0, 40, 50, 50 ┋ 60, 70, 90, 100, 100
• Mean: ∑Xi = 560 and n = 10, so $\bar{X}$ = 560/10 = 56
• Median = Q2 = 55
• Q1 = 40; Q3 = 90
• Mode = 0, 50, 100
• Range = 100 – 0 = 100
• IQR = 90 – 40 = 50
• s2 = 11,840/9 = 1315.56
• s = √1315.56 = 36.27
• COV = (36.27/56) x 100% = 64.8%
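
The variance in this example can be obtained from either the definitional formula or the faster computational formula, and the two should agree. A minimal Python sketch on the test-score data (variable names are illustrative):

```python
import math

scores = [0, 0, 40, 50, 50, 60, 70, 90, 100, 100]
n = len(scores)
mean_score = sum(scores) / n

# Definitional formula: sum of squared deviations about the mean.
ss_definitional = sum((x - mean_score) ** 2 for x in scores)

# Computational formula: sum of squares minus (sum)^2 / n.
ss_computational = sum(x * x for x in scores) - sum(scores) ** 2 / n

variance = ss_definitional / (n - 1)   # sample variance
std_dev = math.sqrt(variance)
cov_percent = std_dev / mean_score * 100

print(ss_definitional, ss_computational)
print(round(variance, 2), round(std_dev, 2), round(cov_percent, 1))
```

The computational form needs only the running totals of x and x², which is why it is convenient for hand calculation.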


Example: Compute the variance and standard deviation for the data shown in the frequency distribution table.

Solution:

$$S^2 = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{\sum f_i}}{\sum f_i - 1}$$

$$S^2 = \frac{197306 - \frac{2537.5^2}{33}}{33 - 1} = 68.36$$

$$S = \sqrt{68.36} = 8.27$$


Measures of Shape
• A third important property of data – after location and dispersion - is its
shape.
• Shape can be described by degree of asymmetry (i.e., skewness).
• mean > median: positive or right-skewness
• mean = median: symmetric or zero-skewness
• mean < median: negative or left-skewness

• Positive skewness can arise when the mean is increased by some unusually high values.
[Figure: positive skew]
• Symmetric or zero-skewness can arise when the values of the mean, median, and mode are equal. There is an equal number of values on both sides of the mean, and the two sides mirror each other.
[Figure: symmetric or zero-skewness]
• Negative skewness can arise when the mean is decreased by some unusually low values.
[Figure: negative skew]
Example: The following data represent the number of hours to complete a task. Describe the shape of the data.
Data (for n=12 employees):
2 3 8 ┋ 8 9 10 ┋ 10 12 15 ┋ 18 22 63 (the employee at 63 hours took a VERY long time!)

$\bar{X}$ = 180/12 = 15 hours
Median = 10 hours

The (extremely slow) employee who took 63 hours to complete the task skewed the entire distribution to the right.


Five Number Summary
• When examining a distribution for shape, sometimes the five-number summary is useful:
Smallest | Q1 | Median | Q3 | Largest
• Example: The data from the previous example.
Data (for n=12 employees):
2 3 8 ┋ 8 9 10 ┋ 10 12 15 ┋ 18 22 63

$\bar{X}$ = 15, Q1 = 8, median = 10, Q3 = 16.5

5-number summary: 2 | 8 | 10 | 16.5 | 63

This data is right-skewed.
In right-skewed distributions, the distance from Q3 to Xlargest (16.5 to 63) is significantly greater than the distance from Xsmallest to Q1 (2 to 8).
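
The five-number summary and the tail comparison for this example can be sketched in Python. The helper `quartile` is a hypothetical name that applies the chapter's position rule (average two neighbours at an integer position, otherwise round up); comparing the two tail lengths is only the informal check described above, not a formal skewness statistic.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of already-sorted data."""
    position, remainder = divmod(len(ordered) * k, 4)
    if remainder == 0:
        return (ordered[position - 1] + ordered[position]) / 2
    return ordered[position]


hours = sorted([2, 3, 8, 8, 9, 10, 10, 12, 15, 18, 22, 63])
smallest, largest = hours[0], hours[-1]
q1, med, q3 = (quartile(hours, k) for k in (1, 2, 3))

print(smallest, q1, med, q3, largest)

lower_tail = q1 - smallest   # distance from the smallest value to Q1
upper_tail = largest - q3    # distance from Q3 to the largest value
mean_hours = sum(hours) / len(hours)
print("right-skewed" if upper_tail > lower_tail and mean_hours > med else "not right-skewed")
```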


Boxplot: The boxplot is a way to graphically portray a distribution of data by
means of its five-number summary.
• Vertical line drawn within the box is the median
• Vertical line at the left side of box is Q1
• Vertical line at the right side of box is Q3
• Line on left connects left side of box with Xsmallest (lower 25% of data)
• Line on right connects right side of box with Xlargest (upper 25% of data)

[Figure: boxplot for the previous example]

A boxplot can be drawn horizontally or vertically.

• A "bell-shaped" symmetric data distribution would look like this:
[Figure: boxplot of a symmetric, bell-shaped distribution]


Scatter Plots and Correlation

▪ A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.
• A scatter plot is used to determine whether a relationship exists between the two variables.


Analyzing the Scatter Plot
1. A positive linear relationship exists when the points fall
approximately in an ascending straight line from left to right
and both the x and y values increase at the same time.
2. A negative linear relationship exists when the points fall
approximately in a descending straight line from left to right.

3. A nonlinear relationship exists when the points fall in a
curved line.
4. No relationship is said to exist when there is no discernible pattern in the points.


Problem: The table shows the depth (in cm) of the foundations of a large building. Construct the frequency, relative frequency, cumulative frequency, and cumulative relative frequency distributions, and find:
a) the number of foundations with depth less than 70 cm.
b) the percentage of foundations with depth less than 70 cm.
c) the number of foundations with depth more than 60 cm.
d) the number of foundations with depth more than 60 cm and less than 80 cm.

Depth class (cm)    Frequency (f)
50-55               6
55-60               4
60-65               8
65-70               8
70-75               6
75-80               6
80-85               12
Total               50
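
One way to set up this problem is to build the four distributions from the table and then add up whole classes. The Python sketch below assumes that every query limit (60, 70, 80 cm) coincides with a class boundary and that a depth lying exactly on a boundary belongs to the class that starts there; the helper `count_between` is a hypothetical name.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each depth class in cm.
classes = [(50, 55, 6), (55, 60, 4), (60, 65, 8), (65, 70, 8),
           (70, 75, 6), (75, 80, 6), (80, 85, 12)]

frequencies = [f for _, _, f in classes]
total = sum(frequencies)
relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / total for c in cumulative]


def count_between(low, high):
    """Total frequency of the classes lying entirely inside [low, high]."""
    return sum(f for lower, upper, f in classes if lower >= low and upper <= high)


less_than_70 = count_between(50, 70)            # part (a)
percent_less_than_70 = 100 * less_than_70 / total   # part (b)
more_than_60 = count_between(60, 85)            # part (c)
between_60_and_80 = count_between(60, 80)       # part (d)

print(cumulative, less_than_70, percent_less_than_70, more_than_60, between_60_and_80)
```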


Example: Draw the scatter plot for the following data of wet
bike accidents and then state the type of the relationship.
