
Topic 2 - Descriptive Statistics

The document covers descriptive statistics, focusing on tabular and graphical displays for both categorical and quantitative data. It explains methods for summarizing data, including frequency distributions, bar charts, pie charts, and histograms, as well as measures of location such as mean, median, and mode. Additionally, it provides examples and exercises to reinforce understanding of these concepts.

Uploaded by

Mustafa Hussien

Quantitative Methods

Topic 2: Descriptive Statistics


Descriptive Statistics I: Tabular and Graphical Display
Outlines

• Recall:
  • Categorical data are labels or names that identify an attribute of each element, e.g., gender, race.
  • Quantitative data indicate how many or how much and are always numeric, e.g., temperature, age.

• To summarize data:
  • For a categorical variable:
    • Frequency distribution; relative frequency distribution; percent frequency distribution
    • Bar chart; pie chart
  • For a quantitative variable:
    • Frequency distribution; relative frequency distribution; percent frequency distribution
    • Histogram; cumulative distributions
Categorical Variable: Example
Example: Marada Inn
• Guests staying at Marada Inn were asked to rate the quality of their accommodations as
being excellent, above average, average, below average, or poor.

• The ratings provided by a sample of 20 guests are:

Below Average, Average, Above Average, Above Average, Above Average,
Above Average, Above Average, Below Average, Below Average, Average,
Poor, Poor, Above Average, Excellent, Above Average,
Average, Above Average, Average, Above Average, Average
Categorical Variable: Frequency Distributions
• A frequency distribution is a tabular summary of data showing the number (frequency)
of observations in each of several non-overlapping categories or classes.

• \( \text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n} \)

• The percent frequency of a class is the relative frequency multiplied by 100.

Rating          Frequency   Relative Frequency   Percent Frequency
Poor             2          .10                   10
Below Average    3          .15                   15
Average          5          .25                   25
Above Average    9          .45                   45
Excellent        1          .05                    5
Total           20          1.00                 100
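
The table above can be reproduced programmatically. The following is a minimal Python sketch, not part of the original slides; the variable names (`ratings`, `classes`, `table`) are illustrative choices, and the data are the 20 Marada Inn ratings listed earlier.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample.
ratings = [
    "Below Average", "Average", "Above Average", "Above Average", "Above Average",
    "Above Average", "Above Average", "Below Average", "Below Average", "Average",
    "Poor", "Poor", "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average", "Above Average", "Average",
]

# Fixed display order for the classes (worst to best).
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)   # frequency of each class
n = len(ratings)            # total number of observations

# Each row: (class, frequency, relative frequency, percent frequency).
table = [(c, counts[c], counts[c] / n, 100 * counts[c] / n) for c in classes]

for name, freq, rel, pct in table:
    print(f"{name:<14} {freq:>3} {rel:>6.2f} {pct:>5.0f}")
```

Keeping the class order in a separate list avoids relying on the order in which the ratings happen to appear in the sample.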
Categorical Variable: Bar Chart
• A bar chart is a graphical display for depicting qualitative data.

• On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.

• A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).

• Using a bar of fixed width drawn above each class label, we extend the height appropriately.

• The bars are separated to emphasize the fact that each class is a separate category.

[Figure: bar chart "Marada Inn Quality Ratings"; horizontal axis: rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: frequency, 0 to 10]
Categorical Variable: Pie chart
• The pie chart is commonly used to represent relative frequency and percent frequency distributions for categorical data.

• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.

• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

[Figure: pie chart "Percent Frequency" of the Marada Inn ratings: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
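
The sector-angle rule can be applied to every class at once. This is a small illustrative Python sketch, assuming the relative frequencies from the Marada Inn table; the dictionary names are hypothetical.

```python
# Relative frequencies from the Marada Inn frequency table.
relative_freq = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each sector's angle is its relative frequency times 360 degrees.
angles = {label: rf * 360 for label, rf in relative_freq.items()}

for label, angle in angles.items():
    print(f"{label:<14} {angle:6.1f} degrees")
```

Because the relative frequencies sum to 1, the sector angles sum to 360 degrees.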
Test your understanding
• Assume there are only 4 classes and the total sample size is 50. Fill in the blanks in the table.
• Make a bar chart with percent frequency on the Y-axis.
• Make a pie chart.

Class   Relative frequency   Frequency      Percent frequency
A       .20                  .20(50) = 10   20
B       .18                  .18(50) = 9    18
C       .40                  .40(50) = 20   40
D       .22                  .22(50) = 11   22

The relative frequencies must sum to 1, so class D has 1 − (.20 + .18 + .40) = .22.

[Figures: bar chart and pie chart of percent frequency by class: A 20%, B 18%, C 40%, D 22%]
Excel
• Use Excel to fill in the table

• Make bar and pie charts

• Useful Excel function: to find a frequency, use =COUNTIF(range, "criterion")
Quantitative Variable: Example
• The manager of Hudson Auto would like to gain a better understanding of
the cost of parts used in the engine tune-ups performed in the shop.

• She examines 50 customer invoices for tune-ups.

• The costs of parts, rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Quantitative Variable: Frequency Distributions
1. Determine the number of non-overlapping classes (usually between 5 and 20).

2. Determine the width of each class:
\[ \text{class width} \approx \frac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}} \]
Say we choose 6 classes. Then class width = (109 − 52)/6 = 9.5, which we round up to 10.

3. Determine the class limits.

Parts Cost ($)   Frequency
50-59             2
60-69            13
70-79            16
80-89             7
90-99             7
100-109           5
Total            50
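
The three steps can be sketched in Python as follows. This is an illustrative example rather than the course's own procedure: the names are hypothetical, the first lower class limit of 50 is taken from the table above, and the class upper limits assume whole-dollar data.

```python
import math

# Parts costs from the 50 Hudson Auto tune-up invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6   # step 1: chosen number of classes
# Step 2: approximate width, rounded up to a whole number: (109 - 52) / 6 = 9.5 -> 10.
width = math.ceil((max(costs) - min(costs)) / num_classes)
lower = 50        # step 3: first lower class limit, as in the table

n = len(costs)
rows = []
for k in range(num_classes):
    lo = lower + k * width
    hi = lo + width - 1                         # whole-dollar data: 50-59, 60-69, ...
    freq = sum(lo <= c <= hi for c in costs)    # observations falling in this class
    rows.append((f"{lo}-{hi}", freq, freq / n, 100 * freq / n))

for label, freq, rel, pct in rows:
    print(f"{label:<8} {freq:>3} {rel:>5.2f} {pct:>4.0f}")
```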


Quantitative Variable: Relative & Percent Frequency Distributions
Parts Cost ($)   Frequency   Relative Frequency   Percent Frequency
50-59             2          .04 = 2/50             4 = .04(100)
60-69            13          .26                   26
70-79            16          .32                   32
80-89             7          .14                   14
90-99             7          .14                   14
100-109           5          .10                   10
Total            50          1.00                 100

• Insights Gained from the Percent Frequency Distribution, for example:


• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79
class.
• 10% of the parts costs are $100 or more.
Histogram
• A histogram is just a bar chart with no separation between bars.
[Figure: histogram "Tune-up Parts Cost, Hudson Auto Repair"; horizontal axis: Parts Cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: Frequency, 0 to 18]
Histogram and Skewness
• Symmetric, e.g., exam grades, height, and weight
• Skewed to the left, e.g., exam scores
• Skewed to the right, e.g., salaries


Cumulative Distributions
• Cumulative frequency: number of observations <= upper limit of each class. Last entry = total
number of observations

• Cumulative relative frequency: proportion with values <= upper limit of each class. Last entry = 1

• Cumulative percent frequency: percent with values <= upper limit of each class. Last entry = 100

Cost ($)   Cumulative frequency   Cumulative relative frequency   Cumulative percent frequency
<=59        2                     .04                               4
<=69       15 = 2 + 13            .30 = 15/50                      30 = .30(100)
<=79       31                     .62                              62
<=89       38                     .76                              76
<=99       45                     .90                              90
<=109      50                     1.00                            100
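
Cumulative distributions are running totals of the class frequencies. The sketch below, in Python with illustrative variable names, rebuilds the three cumulative columns from the Hudson Auto class frequencies.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto distribution.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

cum_freq = list(accumulate(frequencies))       # running total of frequencies
cum_rel = [cf / n for cf in cum_freq]          # proportion at or below each limit
cum_pct = [100 * cf / n for cf in cum_freq]    # percent at or below each limit

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<={limit:<4} {cf:>3} {cr:>5.2f} {cp:>4.0f}")
```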
Test your understanding
The sales records of a real estate company for the month of May show the
following house prices (rounded to the nearest $1,000). Values are in thousands of
dollars: 105, 55, 45, 85, 75, 30, 60, 75, 79, [...]. Using 5 classes, with 20-39 as the
first class, answer the following:

1. Show a frequency distribution

2. Show a percent frequency distribution.

3. Show a cumulative frequency distribution

4. Show a cumulative percent frequency distribution.

5. What percentage of the houses are sold at a price below $80,000?


Test your understanding
Sales price   1) Frequency   2) Percent Frequency   3) Cumulative Frequency   4) Cumulative Percent Frequency
20-39         1              10                      1                         10
40-59         2              20                      3                         30
60-79         4              40                      7                         70
80-99         2              20                      9                         90
100-119       1              10                     10                        100

5) 70% of the houses were sold for less than $80,000.


Excel
• Frequency, and cumulative percent frequency distributions.

• Histogram.
Descriptive Statistics II: Numerical Measures
Foundation
• Recall: we summarized data using tabular and graphical presentations

• Now, we use numerical measures (location and dispersion) to summarize data

• If a measure is computed from a sample, it is called a sample statistic

• If a measure is computed from a population, it is called a population parameter

• Recall: in statistical inference, we use a sample statistic as a point estimator of the corresponding population parameter
Measures of Location: Mean
• Mean (average value)
• Sample mean: \( \bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \)
• Population mean: \( \mu = \frac{\sum x_i}{N} \)

• Example: suppose the starting monthly salaries for a sample of 5 business graduates are $3000, $2800, $3300, $3200, $2700. Find the average monthly salary for this sample of business school graduates.
• Sample mean: \( \bar{x} = \frac{\sum x_i}{n} = \frac{3000 + 2800 + 3300 + 3200 + 2700}{5} = \$3000 \)
Measures of Location: Weighted Mean
• Weighted mean: “observations are not weighted equally”
• \( \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \), where \( w_i \) is the weight of observation \( i \)

• Example: suppose five purchases of a raw material over the last 3 months are as follows. What is the average cost per pound of the raw material?

Purchase   Cost per pound ($)   Number of pounds   Cost × weight ($)
1          3.00                 1200                3600
2          3.40                  500                1700
3          2.80                 2750                7700
4          2.90                 1000                2900
5          3.25                  800                2600
Total                           6250               18,500

• Using the simple mean formula, the average cost per pound is (3.00 + 3.40 + 2.80 + 2.90 + 3.25)/5 = $3.07 (WRONG), because it ignores how many pounds were bought at each price.
• Weighted mean: \( \bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{18500}{6250} = \$2.96 \)
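
The contrast between the simple and the weighted mean can be checked numerically. The following is a minimal Python sketch with illustrative variable names, using the five purchases from the example.

```python
# Cost per pound ($) and number of pounds for the five purchases.
cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]

# Simple mean: treats every purchase equally, regardless of quantity.
simple_mean = sum(cost_per_pound) / len(cost_per_pound)

# Weighted mean: each cost is weighted by the number of pounds purchased.
weighted_mean = sum(w * x for w, x in zip(pounds, cost_per_pound)) / sum(pounds)

print(f"simple mean:   ${simple_mean:.2f}")
print(f"weighted mean: ${weighted_mean:.2f}")
```

The weighted mean is lower here because the largest purchase (2750 pounds) was made at the lowest price.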
Test your understanding
1. Find the mean apartment rent in Bethlehem given this sample: $600, $650, $700, $700, $750, $800
• \( \bar{x} = \frac{\sum x_i}{n} = \frac{600 + 650 + 700 + 700 + 750 + 800}{6} = \$700 \)

2. Find Alex’s GPA if he got 9 hrs of A, 15 hrs of B, 33 hrs of C, and 3 hrs of D. Note A=4, B=3, C=2, D=1, F=0

Grade \( x_i \)   Weight \( w_i \) (hrs)
4 (A)              9
3 (B)             15
2 (C)             33
1 (D)              3
0 (F)              0
Total             60

• \( \bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{9(4) + 15(3) + 33(2) + 3(1)}{60} = \frac{150}{60} = 2.5 \)
Measures of Location: Mode and Median
• Mode: “the most frequent value”
  • What is the most common car make for Lehigh’s students?
  • In the apartment rent example: $700

• Median
  • Sort the data in ascending order
  • For an odd number of observations, median = middle value
  • For an even number of observations, median = average of the two middle values

• Example: find the median class size for the sample of five college classes: 38, 42, 46, 54, 38
  • Sort: 38, 38, 42, 46, 54 → the middle value is 42 = median
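
The measures of location above are available in Python's standard `statistics` module. This is a short illustrative sketch using the apartment rent and class size samples; the variable names are arbitrary.

```python
import statistics

rents = [600, 650, 700, 700, 750, 800]   # apartment rent sample ($)
class_sizes = [38, 42, 46, 54, 38]       # class size sample

mean_rent = statistics.mean(rents)       # sum of values divided by n
mode_rent = statistics.mode(rents)       # most frequent value

# Even number of observations: average of the two middle values.
median_rent = statistics.median(rents)

# Odd number of observations: middle value of the sorted data.
median_size = statistics.median(class_sizes)

print(mean_rent, mode_rent, median_rent, median_size)
```

`statistics.median` sorts the data internally, so the sample does not need to be sorted first.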
Measures of Location: Median and Percentile
• Why might the median be preferred?
  • The mean is sensitive to extreme values
  • Salary example: $3000, $2800, $3300, $3200, $2700
  • Assume the highest value is $10,000 instead of $3300: $3000, $2800, $10000, $3200, $2700
  • The mean was $3000 and is now $4340
  • The median was, and still is, $3000

• Percentile
  • The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more. Example: SAT test scores
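
Both points can be illustrated with a few lines of Python. This sketch is an assumption-laden illustration, not material from the slides: `satisfies_percentile` is a hypothetical helper that only checks the two conditions in the percentile definition above; it does not compute a percentile.

```python
import statistics

salaries = [3000, 2800, 3300, 3200, 2700]
with_outlier = [3000, 2800, 10000, 3200, 2700]   # $3300 replaced by $10,000

# The mean moves with the extreme value; the median does not.
mean_before, mean_after = statistics.mean(salaries), statistics.mean(with_outlier)
median_before, median_after = statistics.median(salaries), statistics.median(with_outlier)

def satisfies_percentile(data, value, p):
    """Return True if at least p% of data are <= value and
    at least (100 - p)% of data are >= value."""
    n = len(data)
    at_or_below = sum(x <= value for x in data)
    at_or_above = sum(x >= value for x in data)
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n

print(mean_before, mean_after, median_before, median_after)
print(satisfies_percentile(with_outlier, median_after, 50))
```

The last line checks that the median meets the definition of the 50th percentile.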
Excel
• Data: Ch. 3

• Useful functions
• Mean: =average(range)
• Median: =median(range)
• Mode: =mode(range)
• Percentile: =[Link](range, percentile)
Measures of Variability: Range
• Why do we need them?

• Range = largest value – smallest value

• The range uses only two observations and is very sensitive to extreme values

• Given the following data, find the range: 15, 20, 25, 25, 27, 28, 30, 34
  • Range = 34 – 15 = 19
Measures of Variability: Variance
• Variance: utilizes all the data
• Population variance: \( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \); sample variance: \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \)

• Steps: find the mean, compute the deviations about the mean, square them, and divide the sum of squared deviations about the mean by \( N \) (if population) or \( n - 1 \) (if sample)

• Why square the deviations about the mean?
  • The sum of the deviations about the mean always equals zero (positive deviations cancel out negative deviations)

• Interpretation: the variance is hard to interpret because it is in squared units


Test your understanding
• Given the following sample data, find the variance.
• 46, 54, 42, 46, 32

\( x_i \)   \( \bar{x} \)   \( x_i - \bar{x} \)   \( (x_i - \bar{x})^2 \)
46          44                2                     4
54          44               10                   100
42          44               -2                     4
46          44                2                     4
32          44              -12                   144
Total                         0                   256

• \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{256}{4} = 64 \)
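
The same computation can be followed step by step in Python. This is a minimal sketch with illustrative names; the final line cross-checks the result against `statistics.variance`, which uses the n − 1 denominator.

```python
import statistics

data = [46, 54, 42, 46, 32]   # sample data from the exercise
n = len(data)

mean = sum(data) / n                      # step 1: find the mean
deviations = [x - mean for x in data]     # step 2: deviations about the mean
squared = [d ** 2 for d in deviations]    # step 3: square the deviations

# Step 4: divide the sum of squared deviations by n - 1 (sample variance).
sample_variance = sum(squared) / (n - 1)

print(mean, sum(deviations), sum(squared), sample_variance)
print(statistics.variance(data))          # library cross-check
```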
Measures of Variability: Standard Deviation

• Standard deviation
  • Is the positive square root of the variance
  • Sample: \( s = \sqrt{s^2} \); population: \( \sigma = \sqrt{\sigma^2} \)
  • Is measured in the same units as the original data

• What is the standard deviation in the previous example?
  • \( s = \sqrt{s^2} = \sqrt{64} = 8 \)
Measures of Variability: Coefficient of Variation

• Coefficient of variation: how large the standard deviation is relative to the mean

• \( \text{CV} = \frac{\text{standard deviation}}{\text{mean}} \times 100\% \)

• In the previous example, the CV is (8/44) × 100% = 18.18%, which means the sample standard deviation is 18.18% of the value of the sample mean.
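
As a quick numerical check of the standard deviation and the coefficient of variation, here is a short Python sketch (variable names are illustrative) using the same five-value sample.

```python
import statistics

data = [46, 54, 42, 46, 32]

s = statistics.stdev(data)        # sample standard deviation (n - 1 denominator)
mean = statistics.mean(data)
cv = s / mean * 100               # coefficient of variation, in percent

print(f"s = {s:.2f}, mean = {mean}, CV = {cv:.2f}%")
```

Because the CV is a ratio, it is unit-free, which makes it convenient for comparing variability across data sets measured on different scales.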
Test your understanding
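• The hourly wages of a sample of 130 system analysts are summarized as follows: mean = 60, range = 20, mode = 73, variance = 324, median = 74. The coefficient of variation equals
a. 0.30%. b. 30%. c. 5.4%. d. 54%.
Answer: b. The standard deviation is \( \sqrt{324} = 18 \), so CV = (18/60) × 100% = 30%.

• The standard deviation of a sample of 100 observations equals 64. The variance of the sample equals
a. 8. b. 10. c. 6400. d. 4,096.
Answer: d. The variance is the square of the standard deviation: \( 64^2 = 4096 \).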
Excel
• Data: Ch. 3

• Useful functions
• Max: =max(range)
• Min: =min(range)
• Std Dev.: =stdev.S(range)
• Variance: =var.s(range)
