Central Tendency and Variation Explained

The document discusses measures of central tendency, including the mean, median, and mode, highlighting their properties and appropriate usage in data analysis. It also covers measures of dispersion, explaining their importance in assessing the reliability of central tendency measures and comparing data sets. Additionally, it details various statistical calculations such as variance, standard deviation, and mean deviation, along with examples to illustrate these concepts.

Uploaded by

h34547256
Copyright
© All Rights Reserved
Properties and Uses of Central Tendency

The Mean

1. The mean is found by using all the values of the data.

2. The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.

3. The mean is used in computing other statistics, such as the variance.

4. The mean for the data set is unique and not necessarily one of the data values.

5. The mean cannot be computed for the data in a frequency distribution that has an open-ended class.

6. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.

The Median

1. The median is used to find the center or middle value of a data set.

2. The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.

3. The median is used for an open-ended distribution.

4. The median is affected less than the mean by extremely high or extremely low values.

The Mode

1. The mode is used when the most typical case is desired.

2. The mode is the easiest average to compute.

3. The mode can be used when the data are nominal or categorical, such as religious preference, gender, or political affiliation.

4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
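
Some of the properties listed above can be seen on a small data set. The sketch below uses Python's standard `statistics` module; the numbers and the category labels are hypothetical and were chosen only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical data, chosen only to illustrate the properties listed above.
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]   # the same data plus one extreme value


def summarize(data):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(data), median(data), multimode(data)


print(summarize(values))        # (5.4, 5, [5])
print(summarize(with_outlier))  # (14.5, 5.5, [5])

# The mode also works for nominal data and need not be unique.
party = ["A", "B", "A", "C", "B"]
print(multimode(party))         # ['A', 'B']
```

One extreme value moves the mean from 5.4 to 14.5, while the median changes only from 5 to 5.5 and the mode does not change at all.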

CHAPTER 4
MEASURES OF VARIATION (DISPERSION)

The scatter or spread of the items of a distribution is known as dispersion or variation. It is the degree to which numerical data tend to spread about an average value. A measure of central tendency alone is not enough to describe the nature of the data; it is better to report both a measure of central tendency and a measure of dispersion.

Purposes of Measures of Dispersion

• To give an idea about the reliability of the measure of central tendency. If the degree of scatter is large, an average is less reliable. If the value of the dispersion is small, it indicates that a central value is a good representative of all the values in the data set.

• To compare two or more sets of data with regard to their variability. Two or more data sets can be compared by calculating the same measure of dispersion, expressed in the same unit of measurement. The set with the smaller value possesses less variability, i.e. it is more uniform (or more consistent).

• To provide information about the structure of the data. A value of a measure of dispersion gives an idea about the spread of the observations. Further, one can form an idea of the limits within which the values in the data set spread.

• To pave the way for the use of other statistical measures. Measures of dispersion, especially the variance and standard deviation, lead to many statistical techniques such as correlation, regression and analysis of variance.

Example 4.1: Consider the following summary of the marks (out of 10) of five students from each of three sections.

Section   Marks             Mean   Variance
1         7, 7, 7, 7, 7     7      0
2         8, 8, 7, 6, 10    7.8    2.2
3         9, 7, 6, 8, 5     7      2.5

As shown above, sections 1 and 3 have the same mean and the mean of section 2 is close to it, yet the degree of uniformity or consistency of the marks differs from section to section, as the measure of dispersion (here the sample variance) shows.
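
The summary of Example 4.1 can be recomputed from the listed marks. This is a minimal sketch that assumes the "variation" column is the sample variance (divisor n - 1); with that assumption the variances 0, 2.2 and 2.5 are reproduced, and the marks listed for section 2 average 7.8.

```python
from statistics import mean, variance

# Marks of five students in each section, as listed in Example 4.1.
sections = {
    1: [7, 7, 7, 7, 7],
    2: [8, 8, 7, 6, 10],
    3: [9, 7, 6, 8, 5],
}

# Assumption: "variation" in the table means the sample variance (divisor n - 1).
summary = {sec: (mean(marks), variance(marks)) for sec, marks in sections.items()}

for sec, (avg, var) in summary.items():
    print(f"section {sec}: mean = {avg:.1f}, sample variance = {var:.1f}")
# section 1: mean = 7.0, sample variance = 0.0
# section 2: mean = 7.8, sample variance = 2.2
# section 3: mean = 7.0, sample variance = 2.5
```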
Types of measures of variation
• Absolute measures of variation
• Relative measures of variation

Absolute measures of variation: A measure of variation is said to be absolute when it shows the actual amount of variation of the items from a measure of central tendency. It is expressed in the same units as the data.

Relative measures of variation: A relative measure is the quotient obtained by dividing an absolute measure by the quantity about which the deviations were computed. It is a pure number (it has no units) and is used for making comparisons between different distributions.

Absolute Measures                  Relative Measures
Range                              Relative Range
Quartile Deviation                 Coefficient of Quartile Deviation
Mean Deviation                     Coefficient of Mean Deviation
Variance and Standard Deviation    Coefficient of Variation
                                   Standard Scores

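Range: The range is the largest score minus the smallest score. It is the quickest and simplest measure of variability.

Ungrouped data: $R = L - S$, where $L$ is the largest (maximum) value and $S$ is the smallest (minimum) value.

Grouped data: $R = UCL_{last} - LCL_{first}$, where $UCL_{last}$ is the upper class limit of the last class and $LCL_{first}$ is the lower class limit of the first class.

Relative Range (RR): It is also sometimes called the coefficient of range.

Ungrouped data: $$RR = \frac{L - S}{L + S}$$

Grouped data: $$RR = \frac{UCL_{last} - LCL_{first}}{UCL_{last} + LCL_{first}}$$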

Example 4.2: Consider the following distribution:

23,14,,13,45,14,1,54,45,46,23,24,48,15,17,18,20,41

a) Find the range


b) Find the coefficient of range
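
A sketch of how both parts of Example 4.2 could be computed, using range = largest - smallest and coefficient of range = (largest - smallest) / (largest + smallest). It assumes that the doubled comma in the printed list is a typing slip and not a missing value; the function names are illustrative.

```python
# Assumption: the doubled comma in the printed list is a typing slip,
# so the 17 values below are taken to be the whole data set.
data = [23, 14, 13, 45, 14, 1, 54, 45, 46, 23, 24, 48, 15, 17, 18, 20, 41]


def value_range(xs):
    """Range: largest value minus smallest value."""
    return max(xs) - min(xs)


def relative_range(xs):
    """Coefficient of range: (L - S) / (L + S)."""
    return (max(xs) - min(xs)) / (max(xs) + min(xs))


print(value_range(data))               # 53
print(round(relative_range(data), 3))  # 0.964
```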

Exercise 4.1: The range and the relative range of a set of data values are 6 and 0.2, respectively. Find the maximum and the minimum data values.

Quartile Deviation (Q.D): Sometimes known as the Semi-interquartile Range (SIR).

The interquartile range is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range is half of the interquartile range.

$$Q.D = \frac{Q_3 - Q_1}{2}$$

It gives the average amount by which the two quartiles differ from the median.

Q.D involves only the middle 50% of the observations, excluding the observations below the lower quartile and the observations above the upper quartile.

Coefficient of Quartile Deviation (C.Q.D): It gives the proportion of the average amount by which the two quartiles differ from the median.

$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Example 4.3: Given $Q_1 = 174.90$, $Q_2 = 190.23$ and $Q_3 = 203.83$, find Q.D and C.Q.D.

$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} = 14.47$$

Interpretation: The average distance by which the lower and the upper quartiles differ from the median is 14.47; that is, on average the lower quartile lies 14.47 below the median and the upper quartile lies 14.47 above it.

$$C.Q.D = \frac{2\,Q.D}{Q_3 + Q_1} = \frac{2 \times 14.47}{203.83 + 174.90} = 0.076$$

Interpretation: The proportion of the average distance by which the two quartiles differ from the median is 0.076; that is, the average distance of the lower and the upper quartiles from the median is about 7.6% of the midpoint of the two quartiles, $(Q_1 + Q_3)/2$, which here is close to the median itself.
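
The arithmetic of Example 4.3 can be checked with a few lines of Python. This is only a sketch; the function names are illustrative, and the formulas are Q.D = (Q3 - Q1)/2 and C.Q.D = (Q3 - Q1)/(Q3 + Q1).

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: half the distance between the quartiles."""
    return (q3 - q1) / 2


def coeff_quartile_deviation(q1, q3):
    """Coefficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 174.90, 203.83
qd = quartile_deviation(q1, q3)
cqd = coeff_quartile_deviation(q1, q3)
print(f"{qd:.3f} {cqd:.3f}")  # 14.465 0.076
```

The unrounded quartile deviation is 14.465, which the example reports as 14.47.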

Mean Deviation (M.D): It is the arithmetic mean of the absolute deviations of the observations from an average (the mean, the median or the mode). Depending on the type of average used, we have different mean deviations.

a. Mean deviation about the mean, $M.D(\bar{X})$

b. Mean deviation about the median, $M.D(\tilde{X})$

c. Mean deviation about the mode, $M.D(\hat{X})$

a. Mean deviation about the mean, $M.D(\bar{X})$:

Ungrouped data: $$M.D(\bar{X}) = \frac{\sum |X - \bar{X}|}{n}$$

Grouped data: $$M.D(\bar{X}) = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$

b. Mean deviation about the median, $M.D(\tilde{X})$:

Ungrouped data: $$M.D(\tilde{X}) = \frac{\sum |X - \tilde{X}|}{n}$$

Grouped data: $$M.D(\tilde{X}) = \frac{\sum f\,|X - \tilde{X}|}{\sum f}$$

c. Mean deviation about the mode, $M.D(\hat{X})$:

Ungrouped data: $$M.D(\hat{X}) = \frac{\sum |X - \hat{X}|}{n}$$

Grouped data: $$M.D(\hat{X}) = \frac{\sum f\,|X - \hat{X}|}{\sum f}$$

Example 4.4: The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
Find the mean deviation about the mean, the median and the mode.

Solution:
First calculate the three averages:

$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$

Xi            4     4     5     5     5     6     7     7     8     9     Total
|Xi - 6|      2     2     1     1     1     0     1     1     2     3     14
|Xi - 5.5|    1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|Xi - 5|      1     1     0     0     0     1     2     2     3     4     14

$$M.D(\bar{X}) = \frac{\sum |X - 6|}{n} = \frac{14}{10} = 1.4$$

$$M.D(\tilde{X}) = \frac{\sum |X - 5.5|}{n} = \frac{14}{10} = 1.4$$

$$M.D(\hat{X}) = \frac{\sum |X - 5|}{n} = \frac{14}{10} = 1.4$$
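
The three mean deviations of Example 4.4 can be verified with the sketch below. The helper `mean_deviation` is an illustrative name, not a library function; the averages come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Number of visits made by the ten mothers in Example 4.4.
visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]


def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations of data from center."""
    return sum(abs(x - center) for x in data) / len(data)


centers = {"mean": mean(visits), "median": median(visits), "mode": mode(visits)}
md = {name: mean_deviation(visits, c) for name, c in centers.items()}

for name in centers:
    # For this data set all three mean deviations are equal to 1.4.
    print(name, centers[name], md[name])
```

The three mean deviations coincide for this particular data set; in general they differ.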

Coefficient of Mean Deviation (C.M.D): The coefficient of mean deviation about a given average is the ratio of the mean deviation about that average to the average itself.

$$C.M.D(\bar{X}) = \frac{M.D(\bar{X})}{\bar{X}}$$

$$C.M.D(\tilde{X}) = \frac{M.D(\tilde{X})}{\tilde{X}}$$

$$C.M.D(\hat{X}) = \frac{M.D(\hat{X})}{\hat{X}}$$

Exercise 4.2: Calculate the C.M.D about the mean, median and mode for the data in Example 4.4 above.

Variance: The variance is the average squared deviation from the mean. The variance and the standard deviation are the most widely used measures of dispersion, and both measure the average dispersion of the observations around the mean.

Population variance ($\sigma^2$)

For ungrouped data: $$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$ where $N$ is the population size.

For grouped data: $$\sigma^2 = \frac{\sum f\,(X_{cm} - \mu)^2}{\sum f}$$ where $X_{cm}$ is the class mark.

Sample variance ($S^2$)

For ungrouped data: $$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$ where $n$ is the sample size.

For grouped data: $$S^2 = \frac{\sum f\,(X_{cm} - \bar{X})^2}{\sum f - 1}$$ where $X_{cm}$ is the class mark.

Standard Deviation: The standard deviation is the positive square root of the variance. It measures the typical distance of the observations from the mean, in the same units as the data.

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $S = \sqrt{S^2}$

If the standard deviation of the data is small, the values are concentrated near the mean; if it is large, the values are scattered away from the mean.

Example 4.5: Find the variance and standard deviation of the following sample data: 5, 17, 12, 10.

Solution: First find the mean.

$$\bar{X} = \frac{\sum X}{n} = \frac{5 + 17 + 12 + 10}{4} = 11$$

Xi             5     17    12    10    Total
(Xi - 11)^2    36    36    1     1     74

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{74}{3} = 24.67$$

$$S = \sqrt{S^2} = \sqrt{24.67} = 4.97$$
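
The result of Example 4.5 can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor n - 1. This is a sketch for checking the arithmetic, not part of the original solution.

```python
from statistics import mean, stdev, variance

sample = [5, 17, 12, 10]

x_bar = mean(sample)    # (5 + 17 + 12 + 10) / 4
s2 = variance(sample)   # sample variance, divisor n - 1
s = stdev(sample)       # positive square root of the sample variance

print(x_bar, round(s2, 2), round(s, 2))  # 11 24.67 4.97
```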

Example 4.6: Consider the following grouped frequency distribution of the weights of 75 individuals and find the standard deviation.

Weight (kg)   Frequency
40-44         7
45-49         10
50-54         22
55-59         15
60-64         12
65-69         6
70-74         3
Solution: First find the mean, the class marks and the sum of squared deviations from the mean.

$\bar{X} = 55$

Weight (kg)   Frequency (f)   Xcm   (Xcm - 55)^2   f(Xcm - 55)^2
40-44         7               42    169            1183
45-49         10              47    64             640
50-54         22              52    9              198
55-59         15              57    4              60
60-64         12              62    49             588
65-69         6               67    144            864
70-74         3               72    289            867
Total         75                                   4400

$$S^2 = \frac{\sum f\,(X_{cm} - \bar{X})^2}{\sum f - 1} = \frac{4400}{75 - 1} = \frac{4400}{74} = 59.46$$

$$S = \sqrt{S^2} = \sqrt{59.46} = 7.71$$

Interpretation

The standard deviation S = 7.71 indicates that, on average, the weight of an individual deviates from the mean weight ($\bar{X}$ = 55 kg) by about 7.71 kg.
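
The grouped-data computation of Example 4.6 can be sketched as follows. The class mark is taken as the midpoint of the class limits, and the sample divisor (sum of f) - 1 is used, as in the formula for grouped data; the function name is illustrative.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each weight class in kg
classes = [
    (40, 44, 7), (45, 49, 10), (50, 54, 22), (55, 59, 15),
    (60, 64, 12), (65, 69, 6), (70, 74, 3),
]


def grouped_sample_stats(rows):
    """Return (n, mean, sum of squared deviations, variance, std dev)."""
    marks = [(lo + hi) / 2 for lo, hi, _ in rows]   # class marks
    freqs = [f for _, _, f in rows]
    n = sum(freqs)
    x_bar = sum(f * x for f, x in zip(freqs, marks)) / n
    ss = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, marks))
    var = ss / (n - 1)                              # sample variance
    return n, x_bar, ss, var, sqrt(var)


n, x_bar, ss, var, sd = grouped_sample_stats(classes)
print(n, x_bar, ss, round(var, 2), round(sd, 2))  # 75 55.0 4400.0 59.46 7.71
```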

Special properties of standard deviations

1. If the standard deviation of $X_1, X_2, X_3, \dots, X_n$ is $S$, then the standard deviation of

a. $X_1 \pm K, X_2 \pm K, X_3 \pm K, \dots, X_n \pm K$ is also $S$

b. $KX_1, KX_2, KX_3, \dots, KX_n$ is $|K|S$

c. $b + KX_1, b + KX_2, b + KX_3, \dots, b + KX_n$ is $|K|S$

where $b$ and $K$ are any constants.
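
These three properties can be checked numerically. The sketch below reuses the sample 5, 17, 12, 10 and picks two arbitrary constants, which are assumptions made only for the illustration.

```python
from math import isclose
from statistics import stdev

x = [5, 17, 12, 10]   # sample data, used here only for illustration
k, b = -5, 3          # hypothetical constants

s = stdev(x)
shifted = stdev([xi + b for xi in x])      # property (a): adding a constant
scaled = stdev([k * xi for xi in x])       # property (b): multiplying by k
linear = stdev([b + k * xi for xi in x])   # property (c): b + k * x

print(isclose(shifted, s))           # True
print(isclose(scaled, abs(k) * s))   # True
print(isclose(linear, abs(k) * s))   # True
```

Adding a constant moves every value and the mean by the same amount, so the deviations from the mean do not change; multiplying by k multiplies every deviation by k, and the absolute value appears because a standard deviation cannot be negative.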

Exercise 4.3:

A. The mean and variance of n Tetracycline capsules $X_1, X_2, \dots, X_n$ are known to be 12 gm and 16 gm², respectively. A new set of capsules of another drug is obtained by the linear transformation $Y_i = 2X_i - 0.5$ ($i = 1, 2, \dots, n$). What will be the variance and standard deviation of the new set of capsules?

B. The mean and the standard deviation of a set of numbers are 500 and 10, respectively.

i. If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?

ii. If each of the numbers in the set is multiplied by -5, what will be the variance and standard deviation of the new set?

2. If the data are a sample and the distribution is normal (bell-shaped) or approximately normal, then

• Approximately 68% of the scores in the sample fall within one standard deviation of the mean, i.e. $\bar{X} \pm S$ will include approximately 68% of the data.
• Approximately 95% of the scores in the sample fall within two standard deviations of the mean, i.e. $\bar{X} \pm 2S$ will include approximately 95% of the data.
• Approximately 99.73% of the scores in the sample fall within three standard deviations of the mean, i.e. $\bar{X} \pm 3S$ will include approximately 99.73% of the data.
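
The three percentages are properties of the normal curve itself. As a sketch, for an exactly normal distribution the proportion within k standard deviations of the mean equals erf(k / sqrt(2)), which can be evaluated with Python's `math` module; the function name `normal_coverage` is illustrative.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, f"{100 * normal_coverage(k):.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

For real samples that are only approximately normal, the observed percentages will be close to these values and not exactly equal to them.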
Even though the standard deviation is preferred to the variance, there is one difficulty with it. If two or more distributions concern different variables (with different units of measurement), their variability cannot be compared by comparing the values of their standard deviations.

Coefficient of Variation (CV): The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage.

If two or more distributions differ in their units of measurement, their variability cannot be compared using any of the absolute measures given before, but it can be compared using the coefficient of variation.

For a population: $$CV = \frac{\sigma}{\mu} \times 100\%$$

For a sample: $$CV = \frac{S}{\bar{X}} \times 100\%$$

• The distribution with the smaller CV is said to be less variable, or more consistent, or more uniform.

• The distribution with the larger CV is said to be more variable, or less consistent, or less uniform.

Example 4.7: An analysis of the monthly wages paid (in Birr) to workers in two firms, A and B, belonging to the same industry gives the following results.

               Firm A   Firm B
Mean wage      52.5     47.5
Median wage    50.5     45.5
Variance       100      121

In which firm, A or B, is there greater variability in individual wages?

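Solution: Find the CV for both firms. The standard deviations are the square roots of the variances: $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.

$$CV_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$$

$$CV_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$$

Interpretation: Since $CV_A < CV_B$, there is greater variability in individual wages in firm B; that is, the wages paid in firm B are less consistent.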
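
The comparison in Example 4.7 can be reproduced with the sketch below. The function takes the mean and the variance, because the example reports variances; its name is illustrative.

```python
from math import sqrt


def coefficient_of_variation(mean, variance):
    """CV in percent: standard deviation divided by the mean, times 100."""
    return sqrt(variance) / mean * 100


cv_a = coefficient_of_variation(52.5, 100)   # firm A
cv_b = coefficient_of_variation(47.5, 121)   # firm B
print(round(cv_a, 2), round(cv_b, 2))        # 19.05 23.16
print("B" if cv_b > cv_a else "A")           # B: the firm with greater variability
```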

Exercise 4.4:

1. A meteorologist interested in the consistency of temperatures in three cities during a given week collected the following data. The temperatures for the five days of the week in the three cities were:

City 1    25    24    23    26    17
City 2    22    21    24    22    20
City 3    32    27    35    24    28

Which city has the most consistent temperature, based on these data?

2. Given two data sets:

Data Set A: 2 meters, 4 meters, 6 meters

Data Set B: 1000 liters, 800 liters, 900 liters

Which data set has less variability?

Standard Scores (Z-scores): For a given value, the Z-score gives its deviation from the mean in units of standard deviation; it tells us how many standard deviations the value lies above or below the mean.

• If $X_i$ is a measurement from a distribution with mean $\bar{X}$ and standard deviation $S$ (or $\mu$ and $\sigma$ for a population), then its value in standard units is

$$Z = \frac{X_i - \mu}{\sigma} \quad \text{for the population}$$

$$Z = \frac{X_i - \bar{X}}{S} \quad \text{for the sample}$$

• It also helps in hypothesis testing.

• It is used to compare two observations coming from different groups.

Example 4.8: Two sections were given an Introduction to Statistics examination. The following information was given.

Value    Section 1    Section 2
Mean     78           90
S.D      6            5

Student A from section 1 scored 90 and student B from section 2 scored 95. Relatively speaking, who performed better?

Solution: First find the Z-scores of both students.

$$Z_A = \frac{X_i - \mu}{\sigma} = \frac{90 - 78}{6} = 2$$

$$Z_B = \frac{X_i - \mu}{\sigma} = \frac{95 - 90}{5} = 1$$

Interpretation: Student A performed better relative to his section, because the score of student A is two standard deviations above the mean score of his section, while the score of student B is only one standard deviation above the mean score of his section.
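
The comparison in Example 4.8 can be written as a small function. This is a sketch; the name `z_score` is illustrative, and the scores 90 and 95 are the ones used in the solution.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(90, 78, 6)   # student A, section 1
z_b = z_score(95, 90, 5)   # student B, section 2
print(z_a, z_b)            # 2.0 1.0
print("A" if z_a > z_b else "B")  # A: the relatively better performance
```

Student B has the higher raw score, but student A has the higher standard score, which is why raw scores from different groups should not be compared directly.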
