UNIT 2.
DATA ORGANIZATION AND PRESENTATION
DESCRIPTIVE SUMMARY STATISTICS
Descriptive statistics: Techniques used to
organize and summarize a set of data in a more
comprehensible and meaningful way.
– Organization of data
– Summarization of data
– Presentation of data
Numbers that have not been summarized and
organized are called raw data
Raw data
Definition
Data that have been collected or
recorded but have not been arranged or
processed yet are called raw data
Example 1: Ages of 50 students in years
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Ordered array
Ordered array: a simple arrangement
of individual observations in order of
magnitude
- Example: Ages of 50 students (read down each column)
18 19 19 21 22 23 23 25 26 31
18 19 20 21 22 23 24 25 27 33
18 19 20 21 22 23 24 25 27 34
19 19 20 21 22 23 25 25 28 37
19 19 21 21 22 23 25 26 29 37
An ordered array is very difficult to build and read with a large sample
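A minimal sketch of building the ordered array by machine rather than by hand, using the 50 ages from Example 1. The variable names are illustrative only.

```python
# Ages of the 50 students, copied from Example 1 (raw data).
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]

# An ordered array is simply the observations sorted by magnitude.
ordered_array = sorted(ages)

print("n =", len(ordered_array))
print("smallest =", ordered_array[0], "largest =", ordered_array[-1])
```

Sorting scales to any sample size, but the result is still a long list of numbers, which is why frequency tables are introduced next.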
Presentation of data
Data
– Qualitative data: tabular methods, graphical methods
– Quantitative data: tabular methods, graphical methods
Frequency Distribution
Frequency distribution: a table that summarizes
raw data into non-overlapping classes or categories along
with their corresponding class frequencies.
Class frequency: The number of observations that fall
into the class
The objective is to provide insights about the data that
cannot be quickly obtained by looking only at the original
data
Frequency Distribution for categorical
variables
Count the number of observations (frequency) in each
category and present as relative frequencies.
Often presented in the form of Table, Bar and
Pie charts.
Frequency Distribution for categorical
variables
A relative frequency distribution: Shows the proportion
of counts that fall into each class or category.
For nominal and ordinal data, frequency distributions are
often used as a summary.
The percentage of times that each value occurs, or the
relative frequency, is often listed.
Tables make it easier to see how the data
are distributed.
Example 1: Nominal data
Table 1: Type of hospitals owned by MOH in
Ethiopia in 2006/07
Source: Health and health related indicator
Example 2: Ordinal data
Table 2: Level of satisfaction with nursing
care among 475 psychiatric in-patients, 1991
Frequency Distribution for numerical
variables
A frequency distribution can also show the number of
observations at different values or within certain
ranges
There are two types of frequency distribution:
– Single value (ungrouped frequency)
– Interval type (classes) – grouped frequency
A) Ungrouped Frequency Distribution
Ungrouped frequency distribution: Lists each
distinct data value with its respective frequency
Can be used when the range of values in the
data set is not large
Classes are one unit in width
Example:
Leisure time in hours per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21
16 15 19 20 22 14 13 10 19 27
29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
Construct a frequency distribution table.
Leisure time (hours) Frequency
10 1
12 1
13 1
14 2
15 2
16 3
18 2
19 4
20 2
21 3
22 2
23 3
24 2
25 1
26 1
27 2
28 2
29 1
31 1
32 1
34 1
36 1
38 1
Total 40
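The table above can be reproduced with a short tally. This is a sketch using Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Leisure time (hours per week) for the 40 college students in the example.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
    16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
    29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
    16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

# One class per distinct value: count how often each value occurs.
frequency = Counter(leisure_hours)

print("Leisure time (hours)  Frequency")
for value in sorted(frequency):
    print(f"{value:>20}  {frequency[value]:>9}")
print(f"{'Total':>20}  {sum(frequency.values()):>9}")
```

With 23 distinct values for only 40 observations, the table is long, which motivates grouping the values into classes.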
B) Grouped Frequency Distribution
Can be used when the range of values in the
data set is large.
The data must be grouped into classes that are
more than one unit in width.
Grouped Frequency Distribution
Steps in Constructing Frequency Distribution
Tables
Step 1: Determine the range of the data.
- R = Highest Value – Lowest Value
Step 2: Determine the number of classes (K) and the
corresponding width (W). We may use:
K = 1 + 3.322 (log n)
W = (L – S) / K
Where;
K = number of class intervals n = no. of observations
W = width of the class interval L = the largest value
S = the smallest value
log = logarithm to base 10
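A small sketch of Step 2, assuming the rule K = 1 + 3.322 log10(n), which is the form used in the worked example in these slides, and W = (L – S) / K. The function names are hypothetical, and rounding K to the nearest whole number is an assumption of this sketch.

```python
import math


def number_of_classes(n):
    """K = 1 + 3.322 * log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)


def class_width(largest, smallest, k):
    """W = (L - S) / K, before rounding to a convenient width."""
    return (largest - smallest) / k


# Values from the leisure-time example: n = 40, L = 38, S = 10.
k_exact = number_of_classes(40)
k = round(k_exact)            # assumption: round to the nearest integer
w = class_width(38, 10, k)

print(f"K = {k_exact:.2f}, rounded to {k}")
print(f"W = {w:.2f}")
```

In practice W is usually rounded to a convenient whole number, so the final number of classes may differ slightly from K.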
Step 3: For each class, count the number of
observations (class frequency)
Step 4: Determine the relative frequency for each
class
Relative frequency = (Frequency of each class interval) / (Total number of observations)
Grouped Frequency Distribution
The classes must be mutually exclusive
The classes must be continuous
The classes must be equal in width
Example:
Leisure time (hours) per week for 40 college
students:
23 24 18 14 20 36 24 26 23 21
16 15 19 20 22 14 13 10 19 27
29 22 38 28 34 32 23 19 21 31
16 28 19 18 12 27 15 21 25 16
Maximum value = 38, Minimum value = 10
K = 1 + 3.322 (log 40) = 6.32 ≈ 6
Width W = (38 – 10) / 6 = 4.67
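A sketch of Steps 3 and 4 for this example. The class width of 5 hours and the starting point of 10 are assumptions made here for illustration (4.67 rounded up to a convenient whole number); the slides' own grouped table is not reproduced in this text.

```python
# Leisure time (hours per week) for the 40 college students.
leisure_hours = [
    23, 24, 18, 14, 20, 36, 24, 26, 23, 21,
    16, 15, 19, 20, 22, 14, 13, 10, 19, 27,
    29, 22, 38, 28, 34, 32, 23, 19, 21, 31,
    16, 28, 19, 18, 12, 27, 15, 21, 25, 16,
]

WIDTH = 5  # assumed width: (38 - 10) / 6 = 4.67, rounded up
n = len(leisure_hours)

# Build equal-width, non-overlapping classes starting at the minimum.
classes = []
lower = min(leisure_hours)
while lower <= max(leisure_hours):
    upper = lower + WIDTH - 1
    count = sum(lower <= x <= upper for x in leisure_hours)
    classes.append((lower, upper, count, count / n))
    lower += WIDTH

print("Class    Frequency  Relative frequency")
for lower, upper, count, relative in classes:
    print(f"{lower}-{upper}  {count:>9}  {relative:>18.3f}")
```

Every observation falls into exactly one class, so the frequencies add up to 40 and the relative frequencies add up to 1.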
Cumulative frequency: The total obtained when the
frequencies of a class and of all preceding classes are added
Cumulative relative frequency: The proportion of the
total number of observations that have a value less than
or equal to the upper limit of the interval
Mid-point: The value of the interval which lies
midway between the lower and the upper limits of a
class
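These three quantities can be computed directly from a frequency table. The sketch below uses the home-run frequency distribution that appears in the histogram example of these slides; variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies from the home-run example (30 teams).
class_limits = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]
frequencies = [6, 13, 4, 4, 3]
total = sum(frequencies)

# Running total of the frequencies, class by class.
cumulative = list(accumulate(frequencies))

# Proportion of observations at or below each class's upper limit.
cumulative_relative = [c / total for c in cumulative]

# Mid-point: halfway between the lower and upper limits of a class.
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]

for (lower, upper), f, cf, crf, mid in zip(
    class_limits, frequencies, cumulative, cumulative_relative, midpoints
):
    print(f"{lower}-{upper}  f={f:>2}  cf={cf:>2}  crf={crf:.3f}  mid={mid}")
```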
True limits: Are those limits that make an
interval of a continuous variable continuous in
both directions
Used to smooth the class intervals
For data recorded to the nearest whole unit, subtract 0.5
from the lower limit and add 0.5 to the upper limit
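A sketch of the true-limit calculation, using the class limits from the home-run example in these slides. The helper `true_limits` and its `unit` parameter are hypothetical names; `unit=1` assumes values recorded to the nearest whole number.

```python
def true_limits(lower, upper, unit=1):
    """Extend a class by half a measurement unit in both directions."""
    half = unit / 2
    return lower - half, upper + half


class_limits = [(124, 145), (146, 167), (168, 189), (190, 211), (212, 233)]
boundaries = [true_limits(lower, upper) for lower, upper in class_limits]

for (lower, upper), (true_lower, true_upper) in zip(class_limits, boundaries):
    print(f"{lower}-{upper}  ->  {true_lower}-{true_upper}")

# The upper true limit of one class equals the lower true limit of the next,
# so there are no gaps between adjacent classes.
gap_free = all(
    boundaries[i][1] == boundaries[i + 1][0] for i in range(len(boundaries) - 1)
)
print("No gaps between classes:", gap_free)
```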
Graphical presentation
Importance of Graphical presentation:
Diagrams have greater attraction than mere figures
They give quick overall impression of the data.
They are easier to remember than mere
figures
They facilitate comparison
Used to understand patterns and trends
Types of graphs
Categorical data
– Bar chart
– Pie-chart
Quantitative data
– Histogram
– Frequency Polygon
– Ogive
– Stem-and-leaf plot
– Box plot
– Scatter Diagram
Bar chart
Definition:
A graph made of bars whose heights represent the
frequencies of respective categories is called a
bar graph.
Bar chart
Used to display the frequencies in the
frequency distribution of a categorical variable
Each bar represents one category and its height is the
frequency or relative frequency
o y-axis: Frequency, relative frequency or percentage
o x-axis: Category
Bar chart
Rules
o Bars should be separated
o The gap between each bar is uniform
o All bars should be of the same width
o All the bars should rest on the same line called the base
o It is very important that the y-axis begins at 0
o Label both axes clearly
Simple bar chart
The simple bar chart is appropriate if only one
variable is to be shown
[Simple bar chart. y-axis: Percentage (0 to 60); x-axis: First trimester, Second trimester, Third trimester; bar labels: 53.9, 40.6, 5.5]
Figure 1: First ANC booking time among pregnant women in
Debre Berhan Town, Ethiopia, 2017
Clustered bar chart
[Clustered bar chart. y-axis: Percent (0 to 95); x-axis: Residence (Urban, Rural); legend: First day, Second and subsequent days; bar labels: 90.0, 74.3, 25.7, 10.0]
Figure 2: Timing of health care seeking
for U5 children by place of residence,
in Jeldu District, Ethiopia
Pie-chart
A pie chart is a circle that is divided into
sections according to the percentage of
frequencies in each category of the distribution
Used to show the relative frequencies of a single
categorical variable
Each slice of the pie corresponds to the
relative frequency of one category of the variable
Pie-chart
Steps to construct a pie-chart
Construct a frequency table
Change the frequency into percentage (P)
Change the percentages into degrees, where:
degree = (Percentage / 100) × 360°
Draw a circle and divide it accordingly
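A sketch of the percentage-to-degree step. The six percentages are the slice sizes shown in the cause-of-death pie chart example in these slides; the function name is hypothetical.

```python
def percentage_to_degrees(percentage):
    """Convert a slice's percentage into its angle at the centre of the circle."""
    return percentage / 100 * 360


# Slice percentages as shown in the pie chart example (they sum to 100).
percentages = [42, 30, 13, 8, 4, 3]
angles = [percentage_to_degrees(p) for p in percentages]

for p, angle in zip(percentages, angles):
    print(f"{p:>3}%  ->  {angle:6.1f} degrees")
print(f"Total: {sum(angles):.1f} degrees")
```

Because the percentages sum to 100, the angles sum to a full circle of 360 degrees, which is a quick check on the arithmetic.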
Example
[Pie chart. Slices: Circulatory system 42%, Neoplasms 30%, Respiratory system 13%; remaining slices Digestive system, Injury and poisoning, Others with labels 8%, 4% and 3%]
Figure 3: Distribution of cause of death for
females, in England and Wales, 1989
Histogram
Histograms are frequency distributions with
continuous class intervals that have been
turned into graphs
To construct a histogram, we draw:
A) The interval boundaries on a horizontal line, and
B) The frequencies on a vertical line
Histogram
In a histogram, the bars are drawn adjacent to
(touching) each other, to show the
underlying continuity of the data
In a histogram, the area of each bar is proportional to
the frequency of observations in the interval
Example
Using the following frequency distribution
of the home runs hit by Major League
Baseball teams during the 2002 season,
construct the histogram
Total Home Runs   f
124 – 145         6
146 – 167         13
168 – 189         4
190 – 211         4
212 – 233         3
Class boundaries and their frequency and cumulative
frequency distributions
Total Home Runs   Class Boundaries   Frequency   Cumulative frequency
124 – 145         123.5 – 145.5      6           6
146 – 167         145.5 – 167.5      13          19
168 – 189         167.5 – 189.5      4           23
190 – 211         189.5 – 211.5      4           27
212 – 233         211.5 – 233.5      3           30
Total                                30
Histogram
[Histogram. y-axis: Frequency (0 to 15); x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
Figure 4: Total home runs hit by all players of each of
the 30 Major League Baseball teams during the 2002 season
Frequency polygon
Frequency polygon: Is a graph formed by joining the
midpoints of the tops of successive bars in a
histogram with straight lines
The total area under the frequency polygon is
equal to the area under the histogram
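A numerical check of the equal-area statement, using the home-run classes from the histogram example. It assumes the usual convention that the polygon is closed at both ends with zero-frequency points one class width beyond the first and last midpoints; that convention is an assumption here, not something stated on the slide.

```python
# Class boundaries and frequencies from the home-run example.
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]
width = boundaries[1] - boundaries[0]  # all classes are equally wide

# Polygon vertices: class midpoints, plus a zero point at each end (assumed).
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Histogram area: sum of bar height times bar width.
histogram_area = sum(f * width for f in frequencies)

# Polygon area: sum of the trapezoids between consecutive vertices.
points = list(zip(xs, ys))
polygon_area = sum(
    (x2 - x1) * (y1 + y2) / 2
    for (x1, y1), (x2, y2) in zip(points, points[1:])
)

print("Histogram area:", histogram_area)
print("Polygon area:  ", polygon_area)
```

The two areas agree because each triangle the polygon cuts off a bar is matched by an equal triangle it adds next to it, which holds when the classes are equal in width.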
Frequency polygon
[Frequency polygon. y-axis: Frequency (0 to 15); x-axis: class midpoints 134.5, 156.5, 178.5, 200.5, 222.5]
Figure 5: Total home runs hit by all players of each of
the 30 Major League Baseball teams during the 2002 season
Ogive
Ogive: Is a curve drawn for the
cumulative frequency distribution by joining
with straight lines the dots marked above
the upper boundaries of classes at heights
equal to the cumulative frequencies of
respective classes
Ogive
It is obtained as follows:
On a vertical axis we mark cumulative frequency
On a horizontal axis we mark the upper boundaries
of all classes.
However, the lower boundary of the first class will
be the starting point
Then, a smooth curve is drawn joining all these
points
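The points to plot can be listed before drawing. This sketch uses the class boundaries and frequencies from the home-run example in these slides; variable names are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from the home-run example.
boundaries = [123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
frequencies = [6, 13, 4, 4, 3]

# Start at the lower boundary of the first class with cumulative frequency 0,
# then pair each upper boundary with the cumulative frequency of its class.
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

for x, y in ogive_points:
    print(f"boundary {x:>6}  cumulative frequency {y:>2}")
```

The last point always sits at the total number of observations, here 30.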
Class boundaries and their frequency
and cumulative frequency
distributions
Total Home Runs   Class Boundaries   Frequency   Cumulative frequency
124 – 145         123.5 – 145.5      6           6
146 – 167         145.5 – 167.5      13          19
168 – 189         167.5 – 189.5      4           23
190 – 211         189.5 – 211.5      4           27
212 – 233         211.5 – 233.5      3           30
Total                                30
Ogive
[Ogive. y-axis: Cumulative frequency (0 to 30); x-axis: class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5]
Figure 6: Total home runs hit by all players of each of
the 30 Major League Baseball teams during the 2002 season
Stem-and-leaf plot
⬥ Another common tool for visually displaying continuous data is the “stem and leaf” plot
⬥ Allows for easier identification of individual values in the sample
⬥ Very similar to a histogram
⬥ Most effective with relatively small data sets
⬥ Helps to understand the nature of data
Stem-and-leaf plot
Can be constructed as follows:
(1) Separate each data point into a stem
component and a leaf component
The stem component consists of the
number formed by all but the rightmost
digit of the number, and the leaf
component consists of the rightmost
digit. Thus the stem of the number 483
is 48, and the leaf is 3
(2) Write the smallest stem in the data set in
the upper left-hand corner of the plot
Data of birth weights from
100 consecutive
deliveries
Stem-and-leaf plot for the
birth weight data (N=100)
Stem | Leaves
Stem-and-leaf plot can be constructed
as follows:
(3) Write the second stem, which equals the
first stem + 1, below the first stem
(4) Continue with step 3 until you reach the
largest stem in the data set
(5)Draw a vertical bar to the right of the
column of stems
(6)For each number in the data set, find the
appropriate stem and write the leaf to the
right of the vertical bar
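The construction steps can be sketched in code. The birth-weight data are not reproduced in this text, so the sketch uses the ages of the 50 students from Example 1 instead; the function name `stem_and_leaf` is hypothetical, and sorting the leaves within each stem is a choice made here for readability.

```python
from collections import defaultdict

# Ages of the 50 students from Example 1.
ages = [
    21, 19, 24, 25, 29, 34, 26, 27, 37, 33,
    18, 20, 19, 22, 19, 19, 25, 22, 25, 23,
    25, 19, 31, 19, 23, 18, 23, 19, 23, 26,
    22, 28, 21, 20, 22, 22, 21, 20, 19, 21,
    25, 23, 18, 37, 27, 23, 21, 25, 21, 24,
]


def stem_and_leaf(values):
    """Return the plot as a list of text lines, one per stem."""
    leaves = defaultdict(list)
    for value in sorted(values):
        # Stem: all but the rightmost digit. Leaf: the rightmost digit.
        leaves[value // 10].append(value % 10)

    lines = []
    # Write every stem from the smallest to the largest, one below the other.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}")
    return lines


plot = stem_and_leaf(ages)
for line in plot:
    print(line)
```

Each row's length shows how many observations share that stem, so the plot reads like a histogram turned on its side while still showing every individual value.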
Thank you