7/19/2024
Measures of Central Tendency
Researchers are often interested in defining a value that
best describes some attribute of the population. Often
this attribute is a measure of central tendency or a
proportion. The three most commonly used measures of
central tendency are the mode, the median and the mean.
Mode
• The mode is the most frequently appearing value in the
population or sample.
• It is the value with the highest frequency.
• Example: consider five women having the following
weights: 100 kg, 100 kg, 130 kg, 140 kg, and 150 kg.
• The value with the highest frequency is 100 kg.
• Thus the mode would equal 100 kg.
Median
• The median is the midpoint of the data, since it splits the
data into two piles each containing half the values.
• To find the median, we arrange the observations in
order from smallest to largest value.
• If there is an odd number of observations, the median is
the middle value.
• If there is an even number of observations, the median is
the average of the two middle values.
• Thus, in the sample of five women, the median value
would be 130 kg, since 130 kg is the middle weight.
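The mode and median of the five weights can be checked with Python's standard `statistics` module. This is a minimal sketch: the sixth weight of 160 kg is a hypothetical value added only to show the even-count rule.

```python
from statistics import median, mode

# Weights (kg) of the five women in the example
weights = [100, 100, 130, 140, 150]

# mode(): the most frequently occurring value
print(mode(weights))            # 100

# median(): sorts the data, then takes the middle value (odd count)
# or the average of the two middle values (even count)
print(median(weights))          # 130

# Hypothetical sixth weight of 160 kg gives an even count:
# the two middle values are 130 and 140, so the median is 135.0
print(median(weights + [160]))  # 135.0
```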
Mean
• The sample mean is perhaps the most important of the
three measures.
• It represents the balance point (or centre of gravity) of a
distribution.
• The mean of a sample or a population is computed by
adding all of the observations and dividing by the
number of observations.
• Returning to the example of the five women, the mean
weight would equal (100 + 100 + 130 + 140 + 150)/5 =
620/5 = 124 kg.
Measures of Variability
• Measures of dispersion measure how spread out a set
of data is.
• They are important for describing the spread of the data,
or its amount of variation around a central value.
• For example, consider a population of four values
{5, 5, 5, 5}. Here all of the values are equal, so there is
no variation. The set {3, 5, 5, 7}, on the other hand, has
some variation, since some of its values differ.
• The three measures used to quantify the amount of
variation in a set of values are the range, the variance,
and the standard deviation.
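The mean computation for the five women, and the point that two sets can share a mean while differing in spread, can be checked in a few lines. This sketch uses Python's `statistics.mean` alongside the plain sum-divided-by-count definition; the variable names are illustrative only.

```python
from statistics import mean

weights = [100, 100, 130, 140, 150]
# Mean = sum of observations / number of observations
print(sum(weights) / len(weights))  # 124.0
print(mean(weights))                # 124

# The two sets from the variability example share the same mean (5),
# but only the second has non-zero deviations from that mean
no_variation = [5, 5, 5, 5]
some_variation = [3, 5, 5, 7]
for data in (no_variation, some_variation):
    m = mean(data)
    deviations = [x - m for x in data]
    print(m, deviations)
```

The mean alone cannot distinguish the two sets, which is why separate measures of variability are needed.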
The range
• The range is the simplest measure of variation.
• It is defined as the difference between the largest and
smallest sample values:
Range = Maximum value - Minimum value
• Therefore, the range of the four values {3, 5, 5, 7}
would be 7 - 3 = 4.
• As demonstrated, the range depends only on the extreme
values and provides no information about how the
remaining data are distributed.
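A minimal sketch of the range in Python. The function name `data_range` is hypothetical (chosen to avoid shadowing the built-in `range`), and the second data set is an invented illustration of the "extreme values only" limitation.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

print(data_range([3, 5, 5, 7]))  # 4

# A hypothetical set with different interior values but the same
# extremes has the same range, since only max and min are used
print(data_range([3, 3, 7, 7]))  # 4
```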
Computing standard deviation and variance 1
• First find the mean and the deviations about the mean.
• Add up these deviations to find out how far, on average,
the scores deviate from the mean.
• Whenever the deviations are added (in order to find the
average of the deviations), they will always sum to zero
(refer to the table).
• To avoid this, the squared deviations are added to get a
measure of overall variability in the distribution.

X        X - μ          (X - μ)²
1        1 - 3 = -2     4
2        2 - 3 = -1     1
3        3 - 3 = 0      0
4        4 - 3 = 1      1
5        5 - 3 = 2      4
μ = 3    Σ(X - μ) = 0   Σ(X - μ)² = 10

Computing standard deviation and variance 2
• The sum of the squared deviations is called the sum of
squares.
• Divide the sum of squares by the number of observations
to get the average squared deviation, or variance. In this
example it is 10/5 = 2.
• Take the square root of the variance to get the standard
deviation.
• The standard deviation is a typical deviation about the
mean (strictly, the square root of the average squared
deviation). For our example we take the square root of 2
and find that the standard deviation is about 1.41.
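The steps in the table can be traced one at a time in Python. This is a minimal sketch using only the standard library; it divides by N, matching the population formula used in this worked example.

```python
import math

scores = [1, 2, 3, 4, 5]
mu = sum(scores) / len(scores)           # 3.0

deviations = [x - mu for x in scores]    # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))                   # 0.0 (deviations always sum to zero)

squared = [d ** 2 for d in deviations]   # [4.0, 1.0, 0.0, 1.0, 4.0]
sum_of_squares = sum(squared)            # 10.0

variance = sum_of_squares / len(scores)  # 10 / 5 = 2.0
std_dev = math.sqrt(variance)            # square root of 2
print(variance, round(std_dev, 2))       # 2.0 1.41
```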
Population variance and standard deviation computational formulae
• $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
• The population standard deviation is the square root of
the population variance:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
► where σ² is the population variance,
► μ is the population mean,
► Xi is the ith element from the population,
► and N is the number of elements in the population.
• Thus, by definition, the variance of a random variable is
the average squared deviation from the population mean.
Sample variance and standard deviation computational formulae
• The variance of a sample is defined by a slightly
different formula: the sum of squares in the numerator is
divided by n - 1 instead of N.
• $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
► where s² is the sample variance,
► x̄ is the sample mean,
► xi is the ith element from the sample,
► and n is the number of elements in the sample.
• Sample standard deviation: $s = \sqrt{s^2}$
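The only difference between the two formulas is the divisor (N versus n - 1). The sketch below writes both out directly and compares them with Python's `statistics.pvariance` (population) and `statistics.variance` (sample). The helper names `population_variance` and `sample_variance` are hypothetical, and the data reuse the five scores from the worked table.

```python
import math
import statistics

def population_variance(xs):
    # sigma^2 = sum((X_i - mu)^2) / N
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    # s^2 = sum((x_i - xbar)^2) / (n - 1)
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

data = [1, 2, 3, 4, 5]
print(population_variance(data), statistics.pvariance(data))  # both equal 2
print(sample_variance(data), statistics.variance(data))       # both equal 2.5
print(math.sqrt(sample_variance(data)))                        # sample s, about 1.58
```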
Measures of position 1
• Measures of position tell where a specific data value
falls within the data set, or its relative position in
comparison with other data values.
• The most common measures of position are
percentiles, deciles, and quartiles.
• Quartiles, deciles and percentiles are just a
generalization of the median.
• The median is the middle value of the data set (assumed
to be arranged in ascending order).
Measures of position 2
• In a similar way, we define the quartiles as the quarter
values of the data set, the deciles as the one-tenth values
of the data set, and so on.
• Quartiles, deciles and percentiles (unlike the median,
which acts as a measure of central tendency) give us an
idea about the skewness of the data set.
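One way to see quartiles, deciles and percentiles as a generalization of the median is Python's `statistics.quantiles` (Python 3.8+), which splits sorted data into `n` equal parts. With `n=2` there is a single cut point, and for the five weights it coincides with the median. This is a sketch under that library's default interpolation method; other textbooks use slightly different conventions.

```python
from statistics import median, quantiles

weights = [100, 100, 130, 140, 150]

# Splitting into 2 equal parts gives one cut point: the median.
# n=4 would give quartiles, n=10 deciles, n=100 percentiles.
print(quantiles(weights, n=2))  # [130.0]
print(median(weights))          # 130
```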
Measures of position 3
• Standard Scores
– A standard score or z score is used when direct
comparison of raw scores is impossible.
– A standard score or z score for a value is obtained by
subtracting the mean from the value and dividing the
result by the standard deviation: $z = \frac{X - \mu}{\sigma}$
• Percentiles
– Percentiles are position measures used in
educational and health-related fields to indicate the
position of an individual in a group.
– A percentile, P, is an integer between 1 and 99 such
that the Pth percentile is a value where P% of the
data values are less than or equal to the value and
(100 - P)% of the data values are greater than or
equal to the value.
Quartiles
• Data (or the distribution) can be divided into FOUR parts,
and the cut points are called QUARTILES, denoted by
Q1, Q2, Q3.
► Q1 is the same as the 25th percentile;
► Q2 is the same as the 50th percentile, or the median;
► Q3 corresponds to the 75th percentile.
For example, consider the following 15 numbers:
3 6 7 11 13 22 30 40 44 50 52 61 68 80 94
      Q1          Q2          Q3
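The quartiles marked in the 15-number example can be reproduced with `statistics.quantiles` (Python 3.8+). Its default `"exclusive"` method places the k-th quartile at position k(n + 1)/4 in the sorted data; for n = 15 those positions are 4, 8 and 12, so no interpolation is needed. This is one convention among several, so other tools may return slightly different values on other data sets. The percentile check at the end applies the P% / (100 - P)% definition above to Q1.

```python
from statistics import quantiles

data = [3, 6, 7, 11, 13, 22, 30, 40, 44, 50, 52, 61, 68, 80, 94]

# Three cut points dividing the sorted data into four parts
q1, q2, q3 = quantiles(data, n=4)
print(q1, q2, q3)  # 11.0 40.0 61.0

# Percentile definition check for Q1 (P = 25):
# at least 25% of values at or below it, at least 75% at or above it
share_at_or_below = sum(x <= q1 for x in data) / len(data)
share_at_or_above = sum(x >= q1 for x in data) / len(data)
print(share_at_or_below, share_at_or_above)
```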
Quartiles
• The first quartile is Q1 = 11. The second quartile is
Q2 = 40 (this is also the median). The third quartile is
Q3 = 61.
• (To within 1 datum) one quarter of the data are below
Q1, two quarters below Q2, and three quarters below Q3.
• One quarter of the data (approximately) lies between Q1
and Q2, and so on.
Deciles
• In a data set, the deciles are the 9 values that divide the
sorted data into 10 equal parts/groups.
• Accordingly, they are called the 1st, 2nd, ..., 9th deciles
(also denoted D1, D2, ..., D9). If X1, X2, X3, ..., Xn are
the observed values, arranged in ascending (or
descending) order, then the corresponding definitions are
as follows:
D1 = 10th percentile (P10)
D2 = 20th percentile (P20)
D3 = 30th percentile (P30)
D4 = 40th percentile (P40)
D5 = 50th percentile (P50)
D6 = 60th percentile (P60)
D7 = 70th percentile (P70)
D8 = 80th percentile (P80)
D9 = 90th percentile (P90)
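Deciles can be computed the same way as quartiles, asking `statistics.quantiles` for 10 parts instead of 4. The sketch below uses the nine-value data set from the Deciles example; with the default `"exclusive"` method the k-th decile sits at position k(n + 1)/10, which for n = 9 is simply position k. As with quartiles, this is one interpolation convention, and other software may differ on other data.

```python
from statistics import quantiles

# Nine-value data set from the Deciles example
data = [1, 2, 4, 6, 7, 9, 10, 12, 14]

# n=10 returns the nine cut points D1..D9
deciles = quantiles(data, n=10)
print(deciles)  # [1.0, 2.0, 4.0, 6.0, 7.0, 9.0, 10.0, 12.0, 14.0]
```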
Deciles
• Let us consider the following example.
• Example 1: Data set: 1, 2, 4, 6, 7, 9, 10, 12, 14
• Deciles (with n = 9 values, the kth decile falls at position
k(n + 1)/10 = k in the sorted data):
D1 = 1, D2 = 2, D3 = 4,
D4 = 6, D5 = 7, D6 = 9,
D7 = 10, D8 = 12, D9 = 14
• Outliers
– An outlier is an extremely high or an extremely low data
value when compared with the rest of the data values.
– Outliers can be the result of measurement or
observational error.
– When a distribution is normal or bell-shaped, data
values that are more than three standard deviations
from the mean can be considered suspected outliers.
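The three-standard-deviation rule combines the z score defined earlier with the outlier definition. This is a minimal sketch, assuming the sample standard deviation (`statistics.stdev`, n - 1 divisor) and hypothetical helper names `z_scores` and `suspected_outliers`; the readings are invented illustration data. Note that with very small samples (10 or fewer values) no single value can reach |z| > 3 under the sample standard deviation, so the rule is only meaningful for larger data sets.

```python
from statistics import mean, stdev

def z_scores(values):
    # z = (value - mean) / standard deviation
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    return [(x - m) / s for x in values]

def suspected_outliers(values, k=3):
    # Flag values lying more than k standard deviations from the mean
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > k]

# Hypothetical data: twenty readings of 10 and one reading of 100
readings = [10] * 20 + [100]
print(suspected_outliers(readings))  # [100]
```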