Descriptive Statistics
Dr Abrar Umar
1
In Last Lecture .......
⚫ What is variable?
⚫ What is data?
⚫ Data Types
2
DATA
⚫ Qualitative Data
   • Nominal Data
   • Ordinal Data
⚫ Quantitative Data
   • Discrete Data
   • Continuous Data
3
Descriptive Statistics
Statistical procedures used to summarise,
organise, and simplify the raw data.
⚫ Frequency tables
⚫ Graphical techniques
⚫ Measures of Central Tendency
⚫ Measures of Variation
4
FREQUENCY TABLE
⚫ A frequency distribution table shows the
categories/classes and the frequency and percentage
with which the observations fall into each category,
e.g.
Education        Frequency   Percentage
Illiterate       34          11%
Primary          56          18%
Middle           43          14%
Matric           67          21%
Inter            65          20%
Graduate         32          10%
Post Graduate    21          7%
Total            318         100%
5
Example: Frequency Table
Suppose we are interested in studying the number
of children in the families living in a community.
The following data has been collected based on a
random sample of n = 30 families from the
community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
5, 8, 6, 5, 4 , 2, 4, 4, 7, 6
Organize this data in a Frequency Table!
6
No. of Children   Count (Freq.)   Percentage
0                 2               7%
1                 3               10%
2                 5               17%
3                 5               17%
4                 6               20%
5                 4               13%
6                 2               7%
7                 2               7%
8                 1               3%
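The tallying above can be reproduced with a short script. This is a minimal sketch using Python's standard library; the variable names (`children`, `frequency_table`) are illustrative and not part of the lecture.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (from the example).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
            5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

counts = Counter(children)  # maps each value to the number of times it occurs
n = len(children)

# One row per distinct value: (value, frequency, percentage rounded to a whole percent).
frequency_table = [(value, counts[value], round(100 * counts[value] / n))
                   for value in sorted(counts)]

for value, freq, pct in frequency_table:
    print(f"{value:>8} {freq:>5} {pct:>4}%")
```

Sorting the distinct values keeps the rows in the same order as the table on the slide.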
7
Relative Frequency. A frequency count is a measure
of the number of times that an event occurs. To
compute relative frequency, one obtains
a frequency count for the total population and
a frequency count for a subgroup of the population,
and then divides the subgroup count by the total count.
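As a rough illustration, relative frequencies can be computed from the education table shown earlier by dividing each category count by the total. The dictionary name `education_counts` is an assumption made for this sketch.

```python
# Frequency counts taken from the education table earlier in the lecture.
education_counts = {
    "Illiterate": 34,
    "Primary": 56,
    "Middle": 43,
    "Matric": 67,
    "Inter": 65,
    "Graduate": 32,
    "Post Graduate": 21,
}

total = sum(education_counts.values())  # frequency count for the whole group

# Relative frequency = subgroup count / total count (a proportion between 0 and 1).
relative_frequency = {level: count / total
                      for level, count in education_counts.items()}

for level, rel in relative_frequency.items():
    print(f"{level:<14} {rel:.3f}")
```

Multiplying each relative frequency by 100 gives the percentage column of the table.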
8
9
Cumulative frequency is defined as a running total
of frequencies. The frequency of an element in a set
refers to how many of that element there are in the
set. Cumulative frequency can also be defined as the
sum of the current frequency and all previous
frequencies.
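A running total is easy to sketch with `itertools.accumulate`. The frequencies below are the counts from the number-of-children example; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies for 0, 1, 2, ..., 8 children (from the frequency table example).
values = [0, 1, 2, 3, 4, 5, 6, 7, 8]
frequencies = [2, 3, 5, 5, 6, 4, 2, 2, 1]

# Each entry is the current frequency plus all previous frequencies.
cumulative = list(accumulate(frequencies))

for value, freq, cum in zip(values, frequencies, cumulative):
    print(f"{value:>3} {freq:>3} {cum:>3}")
```

The last cumulative frequency always equals the total number of observations, here 30.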
10
11
Graphical Techniques
⚫ Graphs have a powerful impact on
people's imagination.
⚫ Diagrams are better retained in the
memory than statistical tables.
12
Bar charts
Bar charts are used for nominal or ordinal data.
[Figure: bar chart of cigarette consumption of persons 18 years of age or older, United States, 1900 - 1990; x-axis: Years, y-axis: No. of cigarettes]
13
Pie chart
Pie charts can also be used to display nominal or ordinal data.
[Figure: pie chart of gender distribution]
14
15
Histogram
A histogram depicts a frequency distribution for quantitative data.
[Figure: histogram showing the distribution of age (years)]
16
Difference between Pie Chart, Bar Chart and
Histogram
⚫ Pie charts and bar charts are graphical displays of
counts for categories
⚫ A histogram is a graphical display of counts for
ranges of data values
⚫ In a pie chart, the size of a slice depends on the number of
cases in the category.
⚫ In a bar chart, the length of a bar depends on the
number of cases in the category.
17
SUMMARIZATION
OF DATA
18
MEASURES OF CENTRAL TENDENCY
•Mean
•Median
•Mode
19
Criteria of a Satisfactory Central Value:
⚫ It should be based on all observations
⚫ Quickly and easily calculated
⚫ Not unduly influenced by abnormally large or small
values
⚫ Relatively stable in repeated samples
20
MEAN
⚫ The MEAN (or arithmetic mean) is also
known as the AVERAGE. It is calculated
by totaling the results of all the
observations and dividing by the total
number of observations.
Total of all observations
Mean = ---------------------------------------
Number of Observations
21
MEAN …..
• Can only be calculated for numerical data, e.g.
Measurement of height of 7 women:
141, 141, 143, 144, 145, 146, 155 cm
Total = 1015 cm
Mean = 1015/7 = 145 cm
• Is affected by extreme values, e.g.
141, 141, 143, 144, 145, 146, 185 cm
Total = 1045 cm
Mean = 1045/7 ≈ 149.3 cm
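The effect of a single extreme value on the mean can be checked directly. This is a minimal sketch; the helper `mean` and the list names are illustrative.

```python
def mean(values):
    """Arithmetic mean: total of all observations / number of observations."""
    return sum(values) / len(values)

# Heights of 7 women in cm (from the example).
heights = [141, 141, 143, 144, 145, 146, 155]
# Same data with the largest value replaced by an extreme value.
heights_extreme = [141, 141, 143, 144, 145, 146, 185]

mean_heights = mean(heights)          # 1015 / 7
mean_extreme = mean(heights_extreme)  # 1045 / 7

print(f"Mean without extreme value: {mean_heights:.1f} cm")
print(f"Mean with extreme value:    {mean_extreme:.1f} cm")
```

Changing one observation out of seven shifts the mean by more than 4 cm, which is why the mean is described as sensitive to extreme values.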
22
MEDIAN
The MEDIAN is the value that divides a distribution
into two equal halves.
⚫ The median is useful when some measurements are much
bigger or much smaller than the rest. The mean of such
data will be biased toward these extreme values.
⚫ The median is not influenced by extreme values.
23
MEDIAN – cont’d ……
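• Arrange the observations in ascending order
• Middle value if the number of observations (n) is odd
• Average of the middle two values if n is even
FORMULA: position of the median = (n+1)/2, if n is odd
e.g.
Weights of 7 pregnant women (in kg):
40, 41, 42, 43, 44, 47, 72
• Position of the median = (7+1)/2 = 4, i.e. the FOURTH VALUE
• MEDIAN = 43 kg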
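The rule for odd and even sample sizes can be sketched as a small function. The name `median` and its implementation are illustrative; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)  # the data must be in order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th value, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Weights of 7 pregnant women in kg (from the example).
weights = [40, 41, 42, 43, 44, 47, 72]
median_weight = median(weights)

print(f"Median weight: {median_weight} kg")
```

Note that the extreme value of 72 kg has no effect on the result; only its position in the ordering matters.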
24
Mean & Median
• When there are no extreme values in a
sample, the mean and the median of the
sample will be close in value.
25
MODE
• Most frequently occurring value
in a set of observations, e.g.
2, 5, 7, 5, 8, 5, 3, 6, 7, 9
Mode = 5 (it occurs three times)
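Finding the mode amounts to counting each value and picking the most frequent one. A minimal sketch using `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Observations from the example.
observations = [2, 5, 7, 5, 8, 5, 3, 6, 7, 9]

counts = Counter(observations)
# most_common(1) returns a list with the single (value, count) pair of highest count.
mode_value, mode_count = counts.most_common(1)[0]

print(f"Mode = {mode_value} (occurs {mode_count} times)")
```

If several values share the highest count, this sketch reports only one of them; data with ties has more than one mode.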
26
MEASURES OF VARIATION
▪ Range
▪ Standard Deviation
27
Range:
is defined as the difference between the
highest (maximum) and the lowest (minimum)
observation e.g.
141, 141, 143, 144, 145, 146, 155 cm
Range= 155 – 141
= 14 cm
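The same calculation as a short sketch, using the built-in `max` and `min` functions (the variable names are illustrative):

```python
# Heights of 7 women in cm (from the example).
heights = [141, 141, 143, 144, 145, 146, 155]

# Range = highest observation - lowest observation.
data_range = max(heights) - min(heights)

print(f"Range = {max(heights)} - {min(heights)} = {data_range} cm")
```

Because the range uses only the two most extreme observations, it changes as soon as either of them changes.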
28
Standard Deviation
SAMPLE
⚫ The STANDARD DEVIATION is a measure that
describes how much individual measurements differ,
on average, from the mean.
⚫ A large standard deviation shows that there is a wide
scatter of measured values around the mean, while a
small standard deviation shows that the individual
values are concentrated around the mean with little
variation among them.
29
By far the most common measure of variation for numerical data in statistics is the standard deviation.
The standard deviation measures how concentrated the data are around the mean; the more concentrated, the
smaller the standard deviation.
A Worked Example
Suppose you're given the data set 1, 2, 2, 4, 6. Work through each of the steps to find the standard deviation.
1. Calculate the mean of your data set.
The mean of the data is (1+2+2+4+6)/5 = 15/5 = 3.
2. Subtract the mean from each of the data values and list the differences.
Subtract 3 from each of the values 1, 2, 2, 4, 6:
1-3 = -2
2-3 = -1
2-3 = -1
4-3 = 1
6-3 = 3
Your list of differences is -2, -1, -1, 1, 3.
3. Square each of the differences from the previous step and make a list of the squares.
You need to square each of the numbers -2, -1, -1, 1, 3:
(-2)² = 4
(-1)² = 1
(-1)² = 1
1² = 1
3² = 9
Your list of squares is 4, 1, 1, 1, 9.
4. Add the squares from the previous step together.
You need to add 4+1+1+1+9 = 16.
5. Subtract one from the number of data values you started with.
You began this process (it may seem like a while ago) with five data values. One less than this is 5-1 = 4.
6. Divide the sum from step four by the number from step five.
The sum was 16, and the number from the previous step was 4. You divide these two numbers: 16/4 = 4.
7. Take the square root of the number from the previous step. This is the standard deviation.
Your standard deviation is the square root of 4, which is 2.
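The steps of the worked example can be followed one by one in code. This is a sketch of the sample standard deviation (dividing by n-1) for the data set 1, 2, 2, 4, 6; the variable names are illustrative, and `statistics.stdev` from the standard library is used only as a cross-check.

```python
import math
import statistics

data = [1, 2, 2, 4, 6]

# Step 1: mean of the data.
mean = sum(data) / len(data)

# Step 2: differences from the mean.
differences = [x - mean for x in data]

# Step 3: squares of the differences.
squares = [d ** 2 for d in differences]

# Step 4: sum of the squares.
sum_of_squares = sum(squares)

# Step 5: one less than the number of data values.
degrees = len(data) - 1

# Step 6: divide the sum of squares by n - 1 (this is the sample variance).
variance = sum_of_squares / degrees

# Step 7: the square root of the variance is the sample standard deviation.
std_dev = math.sqrt(variance)

print(f"mean={mean}, sum of squares={sum_of_squares}, variance={variance}, sd={std_dev}")
print("statistics.stdev agrees:", math.isclose(std_dev, statistics.stdev(data)))
```

Keeping each step in its own variable mirrors the table layout suggested in the tip and makes it easy to compare intermediate results with a hand calculation.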
Tip: It’s sometimes helpful to keep everything organized in a table, like the one shown below.
Data value   Deviation from mean   Squared deviation
1            -2                    4
2            -1                    1
2            -1                    1
4            1                     1
6            3                     9
Then sum the column of the squared deviations. Next divide by one less than the number of data values, and then take the square root.
31
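The sample standard deviation is given by
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
where $x$ represents each value in the sample, $\bar{x}$ is the mean value of the
sample, $\Sigma$ is the summation (or total), and $n-1$ is the number of values in the sample
minus 1.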
32
Example: Standard Deviation
POPULATION
⚫ Standard Deviation
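Variance:
$$ \sigma^2 = \frac{\sum (X - \mu)^2}{n} = \frac{106.55}{20} \approx 5.33 $$
Standard deviation:
$$ \sigma = \sqrt{\sigma^2} = \sqrt{5.33} \approx 2.31 $$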
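For the population case, the division is by n rather than n-1. The sketch below uses only the two summary figures given on the slide (a sum of squared deviations of 106.55 and n = 20), since the underlying data values are not shown; the variable names are illustrative.

```python
import math

# Summary figures from the slide; the raw data values are not given.
sum_squared_deviations = 106.55  # sum of (X - mu)^2 over the population
n = 20                           # number of values in the population

# Population variance: divide by n (not n - 1).
population_variance = sum_squared_deviations / n

# Population standard deviation: square root of the variance.
population_sd = math.sqrt(population_variance)

print(f"Variance = {population_variance:.2f}")
print(f"Standard deviation = {population_sd:.2f}")
```

The value 5.33 is the variance; taking its square root gives a standard deviation of about 2.31, expressed in the same units as the data.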
33
Thank You
34