
Statistics Handwritten Notes

The document provides an overview of statistical measures of central tendency, including arithmetic mean, geometric mean, harmonic mean, median, and mode, along with their calculations and properties. It also discusses measures of dispersion such as range, mean deviation, standard deviation, and variance, as well as correlation types and covariance. Additionally, it highlights the effects of changes in origin and scale on these statistical measures.

Uploaded by

tiwariji.a12
Copyright
© All Rights Reserved

NDA Shaurya 1.0 2026
Statistics
Measures of Central Tendency
A single value that describes the characteristics of the entire data set is known as an
average. The average of a distribution generally lies in its middle part, so such
values are known as measures of central tendency.
The five measures of central tendency are:
1. Arithmetic Mean
2. Geometric Mean
3. Harmonic Mean
4. Median
5. Mode

Arithmetic Mean
The arithmetic mean (or simple mean) of a set of observations is obtained by dividing
the sum of the values of observations by the number of observations.
(i) Arithmetic Mean for Unclassified (Ungrouped or Raw) Data If there are $n$
observations $x_1, x_2, x_3, \ldots, x_n$, then their arithmetic mean is
$$A \text{ or } \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
(ii) Arithmetic Mean for Discrete Frequency Distribution or Ungrouped Frequency
Distribution Let $f_1, f_2, \ldots, f_n$ be the corresponding frequencies of $x_1, x_2, \ldots, x_n$. Then the
arithmetic mean is
$$A = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_n f_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
(iii) Arithmetic Mean for Classified (Grouped) Data or Grouped Frequency Distribution
For classified data, we take the class marks $x_1, x_2, \ldots, x_n$ of the classes; the
arithmetic mean is then given by
(a) Direct Method
$$A = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
(b) Shortcut Method or Deviation Method
$$A = A_1 + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}$$
where, $A_1$ = assumed mean and $d_i$ = deviation = $x_i - A_1$
(c) Step Deviation Method
$$\bar{x} = A_1 + \frac{\sum_{i=1}^{n} f_i u_i}{\sum_{i=1}^{n} f_i} \times h$$
where, $A_1$ = assumed mean,
$u_i$ = step deviation = $\frac{x_i - A_1}{h}$ and $h$ = width of interval.

(iv) Combined Mean If $A_1, A_2, \ldots, A_r$ are the means of $n_1, n_2, \ldots, n_r$ observations
respectively, then the arithmetic mean of the combined group is called the combined
mean of the observations
$$A = \frac{n_1 A_1 + n_2 A_2 + \cdots + n_r A_r}{n_1 + n_2 + \cdots + n_r} = \frac{\sum_{i=1}^{r} n_i A_i}{\sum_{i=1}^{r} n_i}$$
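The arithmetic mean formulas above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the sample class marks and frequencies are made up for the example.

```python
def mean_raw(xs):
    """Arithmetic mean of raw observations: sum(x) / n."""
    return sum(xs) / len(xs)


def mean_freq(xs, fs):
    """Mean of a frequency distribution (direct method): sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_step_deviation(xs, fs, assumed_mean, h):
    """Step-deviation method: A1 + h * sum(f*u) / sum(f), with u = (x - A1) / h."""
    us = [(x - assumed_mean) / h for x in xs]
    return assumed_mean + h * sum(f * u for u, f in zip(us, fs)) / sum(fs)


def combined_mean(ns, means):
    """Combined mean of groups with sizes ns and group means `means`."""
    return sum(n * a for n, a in zip(ns, means)) / sum(ns)


# Hypothetical grouped data: class marks of 0-10, 10-20, ..., 40-50.
class_marks = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

print(mean_raw([2, 4, 6, 8]))
print(mean_freq(class_marks, freqs))
print(mean_step_deviation(class_marks, freqs, assumed_mean=15, h=10))
print(combined_mean([10, 20], [4.0, 7.0]))
```

The direct and step-deviation methods give the same mean for any choice of assumed mean; the step-deviation form only makes hand calculation easier.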

Properties of Arithmetic Mean


(i) Mean is dependent on change of origin and change of scale.
(ii) Algebraic sum of the deviations of a set of values from their arithmetic mean is
zero.
(iii) The sum of the squares of the deviations of a set of values is minimum when
taken from mean.
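These three properties can be checked numerically. The snippet below is a small sketch on made-up data; it illustrates the properties for one data set rather than proving them.

```python
# Illustrative data, made up for this sketch.
xs = [3, 7, 8, 10, 12]
mean = sum(xs) / len(xs)

# Property (i): a change of origin (+5) or of scale (x2) changes the mean.
shifted_mean = sum(x + 5 for x in xs) / len(xs)  # equals mean + 5
scaled_mean = sum(2 * x for x in xs) / len(xs)   # equals 2 * mean

# Property (ii): deviations taken from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in xs)


def sum_sq_dev(c):
    """Sum of squared deviations of xs taken from the point c."""
    return sum((x - c) ** 2 for x in xs)


# Property (iii): the sum of squares is smallest when taken from the mean.
print(mean, shifted_mean, scaled_mean, sum_of_deviations)
print(sum_sq_dev(mean), sum_sq_dev(mean - 1), sum_sq_dev(mean + 1))
```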

Geometric Mean
(i) If $x_1, x_2, \ldots, x_n$ are $n$ positive observations, then their geometric mean is defined as
$$G = \sqrt[n]{x_1 x_2 \cdots x_n}$$
(ii) Let $f_1, f_2, \ldots, f_n$ be the corresponding frequencies of positive observations
$x_1, x_2, \ldots, x_n$, then the geometric mean is defined as
$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{\frac{1}{N}}$$
where $N = \sum_{i=1}^{n} f_i$
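A minimal Python sketch of both geometric mean formulas is given below. It computes the mean through logarithms, which is algebraically the same as taking the N-th root of the product but avoids overflow for long data sets; the function name and sample values are assumptions made for illustration.

```python
import math


def geometric_mean(xs, fs=None):
    """Geometric mean of positive observations xs with optional frequencies fs.

    Uses G = exp(sum(f * ln x) / N), which equals (x1^f1 * ... * xn^fn)^(1/N).
    """
    if fs is None:
        fs = [1] * len(xs)  # raw data: every observation has frequency 1
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs positive observations")
    total = sum(fs)
    return math.exp(sum(f * math.log(x) for x, f in zip(xs, fs)) / total)


print(geometric_mean([2, 8]))          # square root of 16, about 4
print(geometric_mean([2, 4], [2, 1]))  # cube root of 2*2*4 = 16
```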

Harmonic Mean [HM]


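The harmonic mean of $n$ non-zero observations $x_1, x_2, \ldots, x_n$ is defined as
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
If their corresponding frequencies are $f_1, f_2, \ldots, f_n$ respectively, then
$$HM = \frac{f_1 + f_2 + \cdots + f_n}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \cdots + \frac{f_n}{x_n}} = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$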
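The two harmonic mean formulas can be sketched in Python as below. The function name and the sample speeds are assumptions made for the example.

```python
def harmonic_mean(xs, fs=None):
    """Harmonic mean: sum(f) / sum(f / x) for non-zero observations xs."""
    if fs is None:
        fs = [1] * len(xs)  # raw data: every observation has frequency 1
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean needs non-zero observations")
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Hypothetical speeds over two equal distances.
print(harmonic_mean([40, 60]))        # about 48
print(harmonic_mean([2, 4], [1, 2]))  # 3 / (1/2 + 2/4) = 3
```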

Median
The median of a distribution is the value of the middle observation, when the
observations are arranged in ascending or descending order.
(i) Median for Simple Distribution or Raw Data
Firstly, arrange the data in ascending or descending order and then find the
number of observations $n$.
(a) If $n$ is odd, then the $\left(\frac{n+1}{2}\right)$th term is the median.
(b) If $n$ is even, then there are two middle terms, namely the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th
terms.
Hence, Median = Mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations
$$\text{Median} = \frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}\right]$$
(ii) Median for Classified (Grouped) Data or Grouped Frequency Distribution
If in a continuous distribution, the total frequency be 𝑁, then the class whose
cumulative frequency is either equal to 𝑁/2 or is just greater than N/2 is called
median class.
For a continuous distribution, median
$$M_d = l + \frac{\frac{N}{2} - C}{f} \times h$$
where, $l$ = lower limit of the median class
$f$ = frequency of the median class
$N$ = total frequency = $\sum_{i=1}^{n} f_i$
$C$ = cumulative frequency of the class just before the median class
$h$ = length of the median class
Note: The intersection point of less than ogive and more than ogive is the median.
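The median rules above can be sketched in Python as follows. The function names and the sample class table are hypothetical, and the grouped version assumes equal class widths and classes listed in ascending order.

```python
def median_raw(xs):
    """Median of raw data: middle term (n odd) or mean of the two middle terms (n even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # the ((n + 1) / 2)th term, 1-based
    return (s[mid - 1] + s[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th terms


def median_grouped(lower_limits, freqs, h):
    """Median of a continuous distribution: l + ((N/2 - C) / f) * h."""
    total = sum(freqs)
    cumulative = 0  # C: cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if cumulative + f >= total / 2:  # first class whose cumulative frequency reaches N/2
            return l + (total / 2 - cumulative) / f * h
        cumulative += f
    raise ValueError("empty distribution")


# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
print(median_raw([7, 3, 9, 1, 5]))
print(median_raw([4, 1, 3, 2]))
print(median_grouped([0, 10, 20, 30, 40], [5, 8, 12, 10, 5], h=10))
```

For the sample table, N = 40, so the median class is 20-30 (cumulative frequency 25), with C = 13 and f = 12.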

Mode
The mode (𝑀𝑂 ) of a distribution is the value at the point about which the observations
tend to be most heavily concentrated. It is generally the value of the variable which
appears to occur most frequently in the distribution.
(i) Mode for a Simple Data or Raw Data
The value which is repeated maximum number of times, is the required mode.
e.g. Mode of the data 70, 80, 90, 96, 70, 96, 96, 90 is 96 as 96 occurs maximum
number of times.
(ii) Mode for Unclassified (Ungrouped) Frequency Distribution
Mode is the value of the variate corresponding to the maximum frequency.
(iii) Mode for Classified (Grouped) Distribution or Grouped Frequency Distribution
The class having the maximum frequency is called the modal class and the middle
point of the modal class is called the crude mode.
The class just before the modal class is called pre-modal class and the class after
the modal class is called the post-modal class.
Mode for classified data (Continuous Distribution) is given by
$$M_O = l + \frac{f_0 - f_1}{2f_0 - f_1 - f_2} \times h$$
where, $l$ = lower limit of the modal class
$f_0$ = frequency of the modal class
$f_1$ = frequency of the pre-modal class
$f_2$ = frequency of the post-modal class
$h$ = length of the class interval
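A short Python sketch of the mode rules follows. The function names and the class table are hypothetical. As an assumption of this sketch (not stated in the notes), a missing pre-modal or post-modal class is treated as having frequency 0, and ties are resolved by taking the first class with the maximum frequency.

```python
from collections import Counter


def mode_raw(xs):
    """Value that occurs the maximum number of times in raw data."""
    return Counter(xs).most_common(1)[0][0]


def mode_grouped(lower_limits, freqs, h):
    """Mode of a continuous distribution: l + (f0 - f1) / (2*f0 - f1 - f2) * h."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # index of the modal class
    f0 = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0               # pre-modal class (assumed 0 if absent)
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # post-modal class (assumed 0 if absent)
    return lower_limits[i] + (f0 - f1) / (2 * f0 - f1 - f2) * h


# The raw data set is the one used in the example above.
print(mode_raw([70, 80, 90, 96, 70, 96, 96, 90]))
# Hypothetical classes 0-10, 10-20, 20-30, 30-40, 40-50.
print(mode_grouped([0, 10, 20, 30, 40], [5, 8, 12, 10, 5], h=10))
```

The formula is undefined when 2f0 - f1 - f2 = 0, which this sketch does not guard against.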

Relation between Mean, Median and Mode


(i) Mean − Mode = 3 (Mean − Median)
(ii) Mode = 3 Median − 2 Mean
These are empirical relations that hold approximately for moderately skewed distributions.

Measure of Dispersion
The degree to which numerical data tend to spread about an average value is called the
dispersion of the data. The four measures of dispersion are
1. Range
2. Mean deviation
3. Standard deviation
4. Variance

Mean Deviation [MD]


The arithmetic mean of the absolute deviations of the values of the variable from a
measure of their average (mean, median, mode) is called Mean Deviation (MD). It is
denoted by 𝛿.
(i) For simple (raw) distribution
$$\delta = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
where, $n$ = number of terms, $\bar{x}$ = $A$ or $M_d$ or $M_O$
(ii) For unclassified frequency distribution
$$\delta = \frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{\sum_{i=1}^{n} f_i}$$
(iii) For classified distribution
$$\delta = \frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{\sum_{i=1}^{n} f_i}$$
where, $x_i$ is the class mark of the interval.
Note: The mean deviation is the least when measured from the median.
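The mean deviation formulas share one shape, so a single function can cover all three cases. The sketch below uses made-up data and a hypothetical function name, and also illustrates the note that the mean deviation is least about the median.

```python
def mean_deviation(xs, center, fs=None):
    """Mean deviation about `center`: sum(f * |x - center|) / sum(f)."""
    if fs is None:
        fs = [1] * len(xs)  # raw data: every observation has frequency 1
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)


# Illustrative raw data: mean = 4, median = 3.
data = [1, 2, 3, 4, 10]
md_about_mean = mean_deviation(data, center=4)
md_about_median = mean_deviation(data, center=3)
print(md_about_mean, md_about_median)
```

For grouped data, pass the class marks as `xs` and the class frequencies as `fs`.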

Range
The difference between the highest and the lowest observation of a data is called its
range.
i.e.
Range = 𝑋max − 𝑋min

Standard Deviation and Variance


Standard deviation is the square root of the arithmetic mean of the squares of
deviations of the terms from their AM and it is denoted by 𝜎. The square of standard
deviation is called the variance and it is denoted by the symbol $\sigma^2$.
(i) For simple distribution
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = \frac{1}{n}\sqrt{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
where, $n$ is the number of observations and $\bar{x}$ is the mean.
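(ii) For discrete frequency distribution
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}} = \frac{1}{N}\sqrt{N \sum_{i=1}^{n} f_i x_i^2 - \left(\sum_{i=1}^{n} f_i x_i\right)^2}$$
Shortcut Method
$$\sigma = \frac{1}{N}\sqrt{N \sum_{i=1}^{n} f_i d_i^2 - \left(\sum_{i=1}^{n} f_i d_i\right)^2}$$
where, $d_i$ = deviation from assumed mean = $x_i - A$ and $A$ = assumed mean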
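(iii) For continuous frequency distribution
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{N}}$$
where, $x_i$ is the class mark of the interval.
Shortcut Method
$$\sigma = \frac{h}{N}\sqrt{N \sum_{i=1}^{n} f_i u_i^2 - \left(\sum_{i=1}^{n} f_i u_i\right)^2}$$
where, $u_i = \frac{x_i - A}{h}$, $A$ = assumed mean and $h$ = width of the class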
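The definition and the shortcut (step-deviation) form of the standard deviation can be compared with a small Python sketch. Function names and data are made up for illustration; both functions divide by N, as in the formulas above.

```python
import math


def std_dev(xs, fs=None):
    """Standard deviation from the definition: sqrt(sum(f * (x - mean)^2) / N)."""
    if fs is None:
        fs = [1] * len(xs)  # raw data: every observation has frequency 1
    total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / total
    return math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / total)


def std_dev_shortcut(xs, fs, assumed_mean, h):
    """Shortcut form: (h / N) * sqrt(N * sum(f*u^2) - (sum(f*u))^2), u = (x - A) / h."""
    total = sum(fs)
    us = [(x - assumed_mean) / h for x in xs]
    s1 = sum(f * u for u, f in zip(us, fs))
    s2 = sum(f * u * u for u, f in zip(us, fs))
    return h / total * math.sqrt(total * s2 - s1 ** 2)


# Hypothetical class marks and frequencies.
class_marks = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

sigma = std_dev(class_marks, freqs)
print(sigma, sigma ** 2)  # standard deviation and variance
print(std_dev_shortcut(class_marks, freqs, assumed_mean=15, h=10))
```

Setting `h=1` in `std_dev_shortcut` gives the plain deviation form with $d_i = x_i - A$.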

Correlation
The tendency of simultaneous variation between two variables is called correlation (or
covariation). It denotes the degree of interdependence between variables.

Types of Correlation
1. Perfect Correlation
If the two variables vary in such a manner that their ratio is always constant,
then the correlation is said to be perfect.
2. Positive or Direct Correlation
If an increase (or decrease) in one variable corresponds to an increase (or decrease)
in the other, then the correlation is said to be positive.
3. Negative or Indirect Correlation
If an increase (or decrease) in one variable corresponds to a decrease (or increase) in
the other, then the correlation is said to be negative.

Effects of Change of Origin and Scale on Average and Dispersion


Measure               Change of origin    Change of scale
Mean                  Dependent           Dependent
Median                Dependent           Dependent
Mode                  Dependent           Dependent
Standard Deviation    Not dependent       Dependent
Variance              Not dependent       Dependent

Note
(i) Change of origin means adding a constant to, or subtracting a constant from, every observation.
(ii) Change of scale means multiplying or dividing every observation by a non-zero constant.
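The effect of a change of origin or scale can be checked with Python's standard `statistics` module. This is a sketch on made-up data; `pstdev` and `pvariance` are the population forms, which divide by n as in these notes.

```python
import statistics as st


def summary(values):
    """Averages and dispersion of a data set (population standard deviation and variance)."""
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "mode": st.mode(values),
        "sd": st.pstdev(values),
        "variance": st.pvariance(values),
    }


# Illustrative data, made up for this sketch.
xs = [2, 4, 4, 4, 5, 5, 7, 9]

base = summary(xs)
shifted = summary([x + 10 for x in xs])  # change of origin: add 10
scaled = summary([3 * x for x in xs])    # change of scale: multiply by 3

for name, result in (("base", base), ("shifted", shifted), ("scaled", scaled)):
    print(name, result)
```

Adding 10 moves the mean, median and mode by 10 and leaves the standard deviation and variance unchanged; multiplying by 3 multiplies the mean, median, mode and standard deviation by 3 and the variance by 9.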

Covariance
Let (𝑥𝑖 , 𝑦𝑖 ), 𝑖 = 1,2,3 … , 𝑛 be a bivariate distribution, where 𝑥1 , 𝑥2 , …, 𝑥𝑛 are the values
of variable 𝑥 and 𝑦1 , 𝑦2 , … , 𝑦𝑛 those of 𝑦, then the cov (𝑥, 𝑦) is given by
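(i) $$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
where, $\bar{x}$ and $\bar{y}$ are the means of variables $x$ and $y$.
(ii) $$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \left(\frac{1}{n} \sum_{i=1}^{n} x_i\right)\left(\frac{1}{n} \sum_{i=1}^{n} y_i\right)$$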
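Both covariance formulas can be sketched in Python and compared on the same data. The function names and the paired values are made up for the example.

```python
def covariance(xs, ys):
    """Formula (i): mean of the products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Formula (ii): mean of the products minus the product of the means."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - (sum(xs) / n) * (sum(ys) / n)


# Illustrative bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

print(covariance(xs, ys))
print(covariance_shortcut(xs, ys))
```

For this data the deviations give a sum of products of 6, so both forms evaluate to 6/5 = 1.2 up to floating-point rounding.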

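Correlation Coefficient
The correlation coefficient $r(x, y)$ between the variables $x$ and $y$ is given by
$$r(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\operatorname{var}(y)}} \quad \text{or} \quad \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$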

Properties of Correlation
(i) −1 ≤ 𝑟 ≤ 1
(ii) If 𝑟 = 1, then the correlation is perfectly positive.
(iii) If 𝑟 = −1, then correlation is perfectly negative.
(iv) The coefficient of correlation is independent of the change of origin and scale.
(v) Correlation coefficient has no unit and it is a pure number.
(vi) If −1 < 𝑟 < 1, it indicates the degree of linear relationship between 𝑥 and 𝑦,
whereas its sign tells about the direction of relationship.
(vii) If 𝑥 and 𝑦 are two independent variables, then 𝑟 = 0
(viii) If 𝑟 = 0, 𝑥 and 𝑦 are said to be uncorrelated. It does not imply that the two
variates are independent.
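Several of these properties can be illustrated with a short Python sketch. The function name and data are made up; the scale change uses positive multipliers, since a negative multiplier on one variable would flip the sign of r.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r = cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_perfect_pos = correlation(xs, [2 * x + 1 for x in xs])  # property (ii)
r_perfect_neg = correlation(xs, [-x for x in xs])         # property (iii)
# Property (iv): change of origin and (positive) scale leaves r unchanged.
r_transformed = correlation([3 * x + 2 for x in xs], [5 * y - 1 for y in ys])
# Property (viii): y = x^2 depends on x completely, yet r = 0 on symmetric data.
sym = [-2, -1, 0, 1, 2]
r_dependent = correlation(sym, [x * x for x in sym])

print(r, r_perfect_pos, r_perfect_neg, r_transformed, r_dependent)
```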

Regression
Regression helps to estimate or predict the unknown value of one variable from the
known values of the other related variables.

Lines of Regression
A line of regression is the straight line that gives the best fit, in the least-squares
sense, to the given set of data.

Regression Coefficient of 𝒚 on 𝒙 and 𝒙 on 𝒚
The regression coefficient shows the average change in the value of the $y$ (or $x$)
variable for a unit change in the value of the $x$ (or $y$) variable.
It is denoted by $b_{yx}$ (or $b_{xy}$).
$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}$$
and
$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{\operatorname{cov}(x, y)}{\sigma_y^2}$$

Regression Analysis
Regression Equation Regression equations are the algebraic formulation of regression
lines.
(i) Line of regression of $y$ on $x$ is
$$y - \bar{y} = r \frac{\sigma_y}{\sigma_x} (x - \bar{x})$$
(ii) Line of regression of $x$ on $y$ is
$$x - \bar{x} = r \frac{\sigma_x}{\sigma_y} (y - \bar{y})$$
(iii) Angle between two regression lines is given by
$$\theta = \tan^{-1}\left[\left(\frac{1 - r^2}{r}\right)\left(\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}\right)\right] = \tan^{-1}\left[\frac{1 - r^2}{b_{xy} + b_{yx}}\right]$$
(a) If $r = 0$, i.e. $\theta = \frac{\pi}{2}$, then the two regression lines are perpendicular to each other.
(b) If $r = 1$ or $-1$, i.e. $\theta = 0$, then the two regression lines coincide.
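The regression lines and the angle between them can be sketched in Python as follows. The function name and data are made up; the sketch returns the acute angle (it takes the absolute value of the sum of the coefficients) and treats the case r = 0 separately to avoid dividing by zero.

```python
import math


def regression_lines(xs, ys):
    """Return (b_yx, a_yx), (b_xy, a_xy), theta.

    y on x:  y = a_yx + b_yx * x     x on y:  x = a_xy + b_xy * y
    theta is the acute angle between the two lines, in radians.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    b_yx = cov / var_x
    b_xy = cov / var_y
    r_squared = b_yx * b_xy          # product of the coefficients equals r^2
    total = b_xy + b_yx
    if total == 0:                   # r = 0: the lines are perpendicular
        theta = math.pi / 2
    else:
        theta = math.atan((1 - r_squared) / abs(total))
    # Both lines pass through the point (mean_x, mean_y).
    return (b_yx, mean_y - b_yx * mean_x), (b_xy, mean_x - b_xy * mean_y), theta


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
(b_yx, a_yx), (b_xy, a_xy), theta = regression_lines(xs, ys)
print(f"y = {a_yx:.2f} + {b_yx:.2f} x")
print(f"x = {a_xy:.2f} + {b_xy:.2f} y")
print(math.degrees(theta))
```

For this data the lines are approximately y = 2.2 + 0.6x and x = -1 + y, and tan θ = 0.4 / 1.6 = 0.25.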

Properties of the Regression Coefficients


(i) Both regression coefficients and 𝑟 have the same sign.
(ii) Coefficient of correlation is the geometric mean between the regression
coefficients.
(iii) 0 < |𝑏𝑥𝑦 𝑏𝑦𝑥 | ≤ 1, if 𝑟 ≠ 0 i.e. if |𝑏𝑥𝑦 | > 1, then |𝑏𝑦𝑥 | < 1
(iv) Regression coefficients are independent of the change of origin but not of scale.
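(v) If the two regression coefficients have different signs, then r = 0.
(vi) In magnitude, the arithmetic mean of the regression coefficients is greater than
or equal to the correlation coefficient,
i.e. $\left|\frac{b_{yx} + b_{xy}}{2}\right| \geq |r|$; for $r \geq 0$ this reads $\frac{b_{yx} + b_{xy}}{2} \geq r$.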
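A few of these properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper; it illustrates the properties on one data set and is not a proof.

```python
import math


def coefficients(xs, ys):
    """Return (b_yx, b_xy, r) using population (divide-by-n) moments."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / var_x, cov / var_y, cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy, r = coefficients(xs, ys)

# (ii) r is the geometric mean of the coefficients: r^2 = b_yx * b_xy.
print(b_yx * b_xy, r ** 2)
# (vi) the arithmetic mean of the coefficients is at least |r| in magnitude.
print(abs(b_yx + b_xy) / 2, abs(r))
# (iv) a change of origin leaves the coefficients unchanged; a change of scale does not.
shifted = coefficients([x + 10 for x in xs], [y - 3 for y in ys])
scaled = coefficients([2 * x for x in xs], ys)
print(shifted[:2], scaled[:2])
```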
