Statistics
Statistics: the collection, analysis, and interpretation of data
• Classification of data by source: internal or external, primary or secondary
Classification of data
• Internal data : Internal data is information that is procured and
consolidated from different branches within your organization
• External data : External data is data that was not collected by your
organization.
• Primary : Data collected by the investigator during the investigation itself (original, first-hand information)
• Secondary : Data that has already been collected by someone else
Que9 II:
• I: collecting data from government offices
• II : Collecting data from public libraries
• III: Collecting data from telephonic interviews
• Que 30 II
Presentation of data
• Raw or ungrouped data : presented in a random manner (not arranged in any order)
• Grouped data : arranged in an orderly manner, such as ascending or descending order, in a frequency distribution table
Types of frequency distribution:
• Discrete frequency distribution: data are presented so that the exact measurement of each unit is clearly shown
• Continuous frequency distribution: data are arranged in classes or groups, because the individual values are not exactly measurable
Terms :
• Frequency: the number of observations falling in a particular class
• Class mark: the mid-value of a class, (lower limit + upper limit) / 2
• Cumulative frequency of a class interval: the sum of the frequencies of all classes up to and including that class
Graphical representation
• Bar diagrams: The length of each bar carries the information; the width has no specific meaning (but the width of each bar and the space between bars are kept constant)
[Figure: bar diagram of the number of vehicles registered (frequency) by type of vehicle: Cars, Bus, Scooters, Bikes]
Pie Diagram : Used to represent a relative frequency distribution. A pie diagram consists of a circle divided into as many sectors as there are classes in the frequency distribution.
$\text{Central angle} = \frac{\text{Frequency} \times 360^\circ}{\text{Total frequency}}$
[Figure: pie diagram with sectors Cars 40%, Bikes 35%, Bus 15%, Scooters 10%]
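As a minimal sketch of the central-angle rule (central angle = frequency × 360 / total frequency), the Python snippet below converts the vehicle shares from the pie diagram into sector angles. The function name `central_angles` is illustrative and not part of the original notes.

```python
def central_angles(frequencies):
    """Return the central angle (in degrees) of each sector of a pie diagram."""
    total = sum(frequencies.values())
    return {label: freq * 360 / total for label, freq in frequencies.items()}


# Shares read off the pie diagram (percentages act as frequencies here).
shares = {"Cars": 40, "Bikes": 35, "Bus": 15, "Scooters": 10}
angles = central_angles(shares)

for label, angle in angles.items():
    print(f"{label}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.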
Histogram : Mark off all class intervals along the x-axis. On each class interval erect a rectangle with height proportional to the frequency of the corresponding class, so that (for equal class widths) the area of each rectangle is proportional to the frequency of the class.
[Figure: histogram of frequency against the class intervals 10 to 20, 20 to 30, 30 to 40, 40 to 50]
Frequency Polygon
Cumulative frequency curve (Ogive)
MEASUREMENT OF CENTRAL TENDENCY
• Arithmetic mean
• Median
• Geometric mean
• Harmonic mean
• Mode
Arithmetic mean or Mean
Case I : Individual series
• $\bar{x} = \frac{\sum x_i}{n}$
• Or $\bar{x} = A + \frac{\sum d_i}{n}$
• Where A = assumed mean
• $d_i = x_i - A$
Example: Find the mean of 6,7,10,12,13,4,8,12.
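The example can be worked through in Python. This is a sketch that assumes the usual definitions: the direct mean is the sum of the observations divided by their number, and the assumed-mean shortcut is A plus the average of the deviations d_i = x_i - A. The function names and the choice A = 10 are illustrative.

```python
def mean_direct(values):
    """Direct method: sum of observations divided by their count."""
    return sum(values) / len(values)


def mean_assumed(values, assumed):
    """Shortcut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


data = [6, 7, 10, 12, 13, 4, 8, 12]
print(mean_direct(data))       # 9.0
print(mean_assumed(data, 10))  # 9.0
```

Both methods give the same mean whatever value is chosen for A; a convenient A simply keeps the deviations small.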
1. Arithmetic mean or Mean
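Case II: For discrete frequency distribution
• $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$
• Or $\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$
• Where A = assumed mean
• $d_i = x_i - A$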
Find mean
Xi fi
2 2
5 8
6 10
8 7
10 8
12 5
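A short Python sketch for this table, assuming the standard frequency-weighted mean (sum of f_i × x_i divided by the sum of f_i). The function name `frequency_mean` is illustrative.

```python
def frequency_mean(values, freqs):
    """Mean of a discrete frequency distribution."""
    total_frequency = sum(freqs)
    weighted_sum = sum(x * f for x, f in zip(values, freqs))
    return weighted_sum / total_frequency


xi = [2, 5, 6, 8, 10, 12]
fi = [2, 8, 10, 7, 8, 5]

print(sum(fi))                 # total frequency: 40
print(frequency_mean(xi, fi))  # 300 / 40 = 7.5
```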
Arithmetic mean or Mean
Case III: For Continuous frequency distribution
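• Step 1 : Convert each class interval into a single observation (the class mark) using the formula:
$\text{Class mark } x_i = \frac{\text{Lower limit of class} + \text{Upper limit of class}}{2}$
• Step 2: Repeat the procedure used for the discrete frequency distribution, with the class marks as the observations.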
Find the mean of the following distribution:
Class interval Frequency
0-10 22
10-20 38
20-30 46
30-40 35
40-50 20
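The two steps can be sketched in Python: each class is replaced by its mid-value, (lower limit + upper limit) / 2, and the discrete-frequency mean is then applied. The function name and the tuple layout `(lower, upper, frequency)` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of a continuous frequency distribution using class mid-values."""
    total_frequency = sum(f for _, _, f in classes)
    weighted_sum = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return weighted_sum / total_frequency


# (lower limit, upper limit, frequency) for each class in the table
classes = [(0, 10, 22), (10, 20, 38), (20, 30, 46), (30, 40, 35), (40, 50, 20)]

result = grouped_mean(classes)
print(round(result, 2))  # 3955 / 161, about 24.57
```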
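1. A Combined Mean
• If two sets of observations are given, then the combined mean of the two sets can be calculated by
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
• Where $\bar{x}$ = combined mean, $n_1$ and $n_2$ = sizes of the two sets, $\bar{x}_1$ and $\bar{x}_2$ = means of the two sets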
1. B Weighted arithmetic mean
• If $w_1, w_2, w_3, \ldots, w_n$ are the weights corresponding to the observations $x_1, x_2, x_3, \ldots, x_n$, then
• Weighted arithmetic mean = $\frac{\sum w_i x_i}{\sum w_i}$
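A minimal Python sketch of the combined mean and the weighted arithmetic mean, assuming the standard forms (n1·mean1 + n2·mean2) / (n1 + n2) and sum(w_i·x_i) / sum(w_i). The notes give no worked data for these two results, so the numbers below are made up purely for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups pooled together, from their sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def weighted_mean(values, weights):
    """Arithmetic mean in which each observation counts according to its weight."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: 10 observations with mean 50 pooled with 30 observations with mean 70.
print(combined_mean(10, 50, 30, 70))              # 65.0

# Hypothetical marks 10, 20, 30 carrying weights 1, 2, 3.
print(weighted_mean([10, 20, 30], [1, 2, 3]))     # 140 / 6
```

The combined mean is itself a weighted mean in which the group sizes act as the weights.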
2. Median
Case I: Median of an individual series
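• Step 1: Arrange the given numbers in ascending order
• Step 2:
• If the number of terms n is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation
• If the number of terms n is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations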
Q. Find the median of the following data:
3,9,5,3,12,10,18,4,7,19,21
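A Python sketch for this question, assuming the usual rule: after sorting, take the middle observation when the count is odd, or the average of the two middle observations when it is even. The function name `median_individual` is illustrative.

```python
def median_individual(values):
    """Median of an individual series."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the ((n + 1) / 2)th observation (index `middle` when counting from 0).
        return ordered[middle]
    # Even count: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [3, 9, 5, 3, 12, 10, 18, 4, 7, 19, 21]
print(sorted(data))             # [3, 3, 4, 5, 7, 9, 10, 12, 18, 19, 21]
print(median_individual(data))  # 9 (the 6th of 11 observations)
```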
Median
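Case II: Median of a discrete series
• Step 1: Arrange the values in ascending order (they are generally given in ascending order)
• Step 2: Calculate the cumulative frequency.
• Step 3 : Let N = $\sum f_i$ and locate the required observation in the cumulative frequency column.
• If N is odd, the median is the $\left(\frac{N+1}{2}\right)$th observation
• If N is even, the median is the average of the $\left(\frac{N}{2}\right)$th and $\left(\frac{N}{2}+1\right)$th observations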
Find median
xi fi CF
3 3 3
6 4 7
9 5 12
12 2 14
13 4 18
15 5 23
21 4 27
22 3 30
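The cumulative-frequency lookup can be sketched in Python. The sketch assumes that the k-th observation is the first value whose cumulative frequency reaches k, and that for an even total N the median is the average of the (N/2)th and (N/2 + 1)th observations. All function names are illustrative.

```python
from itertools import accumulate


def kth_observation(values, cum_freqs, k):
    """Return the value holding position k once the data are listed in order."""
    for x, cf in zip(values, cum_freqs):
        if cf >= k:
            return x
    raise ValueError("k exceeds the total frequency")


def median_discrete(values, freqs):
    """Median of a discrete frequency distribution (values in ascending order)."""
    cum_freqs = list(accumulate(freqs))
    n = cum_freqs[-1]
    if n % 2 == 1:
        return kth_observation(values, cum_freqs, (n + 1) // 2)
    lower = kth_observation(values, cum_freqs, n // 2)
    upper = kth_observation(values, cum_freqs, n // 2 + 1)
    return (lower + upper) / 2


xi = [3, 6, 9, 12, 13, 15, 21, 22]
fi = [3, 4, 5, 2, 4, 5, 4, 3]

print(list(accumulate(fi)))     # [3, 7, 12, 14, 18, 23, 27, 30], matching the CF column
print(median_discrete(xi, fi))  # 15th and 16th observations are both 13, so 13.0
```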
Median
Case III: Median of a continuous series
• Step 1: Arrange the classes in ascending order (they are generally given in ascending order)
• Step 2: Calculate the cumulative frequency.
• Step 3: Calculate n/2 (where n = $\sum f_i$).
• Step 4 : Select the class in which the n/2 value lies (the median class) and use the formula
$\text{Median} = l + \frac{\frac{n}{2} - C}{f} \times h$
• Where,
• l = lower limit of the selected class
• f = frequency of the selected class
• h = class width (height) of the selected class
• C = cumulative frequency of the class preceding the selected class
Find the median
Class Frequency
0-10 6
10-20 7
20-30 15
30-40 16
40-50 4
50-60 2
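A Python sketch for this table, assuming the standard grouped-median formula Median = l + ((n/2 - C) / f) × h, where the median class is the first class whose cumulative frequency reaches n/2. The function name and the `(lower, upper, frequency)` layout are assumptions for this illustration.

```python
def grouped_median(classes):
    """Median of a continuous frequency distribution."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one (C)
    for lower, upper, f in classes:
        if cumulative + f >= half:
            width = upper - lower  # h
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


classes = [(0, 10, 6), (10, 20, 7), (20, 30, 15), (30, 40, 16), (40, 50, 4), (50, 60, 2)]

# n = 50, n/2 = 25, median class 20-30 with l = 20, C = 13, f = 15, h = 10
result = grouped_median(classes)
print(round(result, 2))  # 20 + (25 - 13) / 15 * 10 = 28.0
```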
3. Mode:
Case I: Mode of an individual series:
• The value of the variable that is repeated more often than any other value in the series.
• E.g.
• 1,2,3,4,5 (Mode= None)
• 1,2,5,4,6,7,4,4,7,8 (Mode=4)
• 1,2,5,4,6,7,4,4,7,8,7 (Mode= 4,7)
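The three examples can be reproduced with a short Python sketch. It follows the convention used in the examples: if no value repeats there is no mode, and ties for the highest count are all reported. The function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the list of modes of an individual series (empty list if none)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no value is repeated
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 3, 4, 5]))                    # []
print(modes([1, 2, 5, 4, 6, 7, 4, 4, 7, 8]))     # [4]
print(modes([1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]))  # [4, 7]
```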
Mode:
Case II: Discrete series
• Value of variable for which the frequency is maximum
xi fi
1 5
3 3
5 4
7 8
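For a discrete series the mode is read straight from the frequency column. A small Python sketch for the table above (the function name is illustrative):

```python
def mode_discrete(values, freqs):
    """Return the value(s) with the maximum frequency."""
    highest = max(freqs)
    return [x for x, f in zip(values, freqs) if f == highest]


xi = [1, 3, 5, 7]
fi = [5, 3, 4, 8]
print(mode_discrete(xi, fi))  # [7], since 7 has the largest frequency (8)
```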
Mode:
Case III: Continuous series
Find the modal group (the class with the maximum frequency), then use the formula
$\text{Mode} = l + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h$
Where,
l = lower limit of modal group
h = size of modal group
$f_1$ = frequency of modal group
$f_0$ = frequency of the group before the modal group
$f_2$ = frequency of the group after the modal group
Mode of the following distribution
Class interval Frequency
0-20 17
20-40 28
40-60 32
60-80 24
80-100 19
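A Python sketch for this table, assuming the standard grouped-mode formula Mode = l + ((f1 - f0) / (2·f1 - f0 - f2)) × h, with f1 the frequency of the modal group and f0, f2 the frequencies of its neighbours. Treating a missing neighbour as frequency 0 is an assumption of this sketch, as are the names used.

```python
def grouped_mode(classes):
    """Mode of a continuous frequency distribution."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                    # position of the modal group
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0              # group before the modal group
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0 # group after the modal group
    width = upper - lower                          # h
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * width


classes = [(0, 20, 17), (20, 40, 28), (40, 60, 32), (60, 80, 24), (80, 100, 19)]

# Modal group 40-60: l = 40, h = 20, f1 = 32, f0 = 28, f2 = 24
result = grouped_mode(classes)
print(round(result, 2))  # 40 + (4 / 12) * 20, about 46.67
```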
Miscellaneous
4. Geometric mean
Case I: individual distribution
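• If $x_1, x_2, x_3, \ldots, x_n$ are n non-zero (positive) observations, then
$GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$
Or
$GM = \operatorname{antilog}\left(\frac{1}{n} \sum \log x_i\right)$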
Geometric mean
Case II: frequency distribution
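• $GM = \operatorname{antilog}\left[\frac{\sum f_i \log x_i}{\sum f_i}\right]$
• If $n_1$ and $n_2$ are the sizes and $G_1$ and $G_2$ the geometric means of two series respectively, the geometric mean G of the combined group is defined as follows:
$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$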
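A Python sketch of the geometric mean, assuming the standard log forms: GM = antilog(mean of log x_i), the frequency version weights each log by f_i, and the combined GM satisfies log G = (n1·log G1 + n2·log G2) / (n1 + n2). Base-10 logs are used so that "antilog" is simply a power of 10. The notes give no worked data here, so the numbers are made up for illustration.

```python
import math


def geometric_mean(values):
    """GM of positive observations via logarithms."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log  # antilog


def geometric_mean_freq(values, freqs):
    """GM of a frequency distribution."""
    mean_log = sum(f * math.log10(x) for x, f in zip(values, freqs)) / sum(freqs)
    return 10 ** mean_log


def combined_geometric_mean(n1, g1, n2, g2):
    """GM of two series pooled together, from their sizes and GMs."""
    log_g = (n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2)
    return 10 ** log_g


# Hypothetical data: the GM of 2 and 8 is the square root of 16.
print(round(geometric_mean([2, 8]), 6))
print(round(geometric_mean_freq([2, 8], [3, 1]), 6))
print(round(combined_geometric_mean(1, 2, 1, 8), 6))
```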
5. Harmonic mean
Case I: individual distribution
• If $x_1, x_2, x_3, \ldots, x_n$ are n observations then,
$HM = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \left(\frac{1}{x_i}\right)}$
Harmonic mean
Case II: frequency distribution
$HM = \frac{1}{\frac{1}{\sum_{i=1}^{n} f_i} \sum_{i=1}^{n} \left(\frac{f_i}{x_i}\right)}$
Q. Harmonic mean of 2, 3, 4
• $HM = \frac{3}{\frac{1}{2} + \frac{1}{3} + \frac{1}{4}} = \frac{36}{13} \approx 2.77$
Relation between AM,GM,HM
AM ≥ GM ≥ HM
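The harmonic mean of 2, 3, 4 and the ordering AM ≥ GM ≥ HM can be checked with a short Python sketch. It assumes the usual definition HM = n / sum(1 / x_i) for positive observations; the function names are illustrative.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """n divided by the sum of reciprocals (reciprocal of the mean reciprocal)."""
    return len(values) / sum(1 / x for x in values)


data = [2, 3, 4]
am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)

print(round(am, 3), round(gm, 3), round(hm, 3))  # AM 3.0, GM about 2.884, HM about 2.769
print(am >= gm >= hm)
```

The three means coincide only when all observations are equal.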
MEASURES OF DISPERSION
• Range
• Quartile deviation
• Mean deviation
• Standard deviation
1. RANGE
• The difference between the maximum and minimum observations is called the range
• Range = Maximum observation - Minimum observation
• Coefficient of range = $\frac{\text{Maximum} - \text{Minimum}}{\text{Maximum} + \text{Minimum}}$
Calculate the range and coefficient of range:
• 1,2,5,4,6,7,4,4,7,8,7
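A Python sketch for this exercise, assuming the standard coefficient of range, (maximum - minimum) / (maximum + minimum). The function name is illustrative.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) of a data set."""
    largest, smallest = max(values), min(values)
    data_range = largest - smallest
    coefficient = data_range / (largest + smallest)
    return data_range, coefficient


data = [1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]
data_range, coefficient = range_and_coefficient(data)

print(data_range)             # 8 - 1 = 7
print(round(coefficient, 3))  # 7 / 9, about 0.778
```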
2. Quartile deviation or semi-interquartile range
• Step 1: Calculate the cumulative frequency. Find the values of N/4 and 3N/4 to locate $Q_1$ and $Q_3$
• Quartile deviation $Q = \frac{Q_3 - Q_1}{2}$
• Coefficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
• Where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution
Find the quartile deviation of the following data:
xi fi
2 3
3 4
4 8
5 4
6 1
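A Python sketch for this table. It assumes one particular convention: Q1 is the first value whose cumulative frequency reaches N/4 and Q3 the first value whose cumulative frequency reaches 3N/4, with quartile deviation (Q3 - Q1) / 2 and coefficient (Q3 - Q1) / (Q3 + Q1). Other textbooks locate the quartiles at (N + 1)/4 and 3(N + 1)/4, which can give a different answer for the same data. Function names are illustrative.

```python
from itertools import accumulate


def value_at(values, cum_freqs, position):
    """First value whose cumulative frequency reaches the given position."""
    for x, cf in zip(values, cum_freqs):
        if cf >= position:
            return x
    return values[-1]


def quartile_deviation(values, freqs):
    """Return (Q1, Q3, quartile deviation, coefficient) using the N/4 convention."""
    cum_freqs = list(accumulate(freqs))
    n = cum_freqs[-1]
    q1 = value_at(values, cum_freqs, n / 4)
    q3 = value_at(values, cum_freqs, 3 * n / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


xi = [2, 3, 4, 5, 6]
fi = [3, 4, 8, 4, 1]

q1, q3, qd, coeff = quartile_deviation(xi, fi)
print(q1, q3, qd)  # N = 20: Q1 = 3 (5th item), Q3 = 4 (15th item), deviation 0.5
```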
3. Mean deviation
Case I: for individual series
• Mean deviation = $\frac{\sum |x_i - A|}{n}$
• Where, A= mean or median or mode
3. Mean deviation
Case II: for frequency distribution
• Mean deviation = $\frac{\sum f_i |x_i - A|}{\sum f_i}$
• Where, A= mean or median or mode
• Coefficient of mean deviation = $\frac{\text{Mean deviation}}{A}$
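A Python sketch of the mean deviation for both cases, assuming the standard definitions: the average absolute deviation from A (mean, median or mode), weighted by frequency for a frequency distribution, and the coefficient as mean deviation divided by A. The data are reused from the arithmetic mean examples earlier in these notes (means 9 and 7.5); the function names are illustrative.

```python
def mean_deviation(values, center):
    """Average absolute deviation of an individual series about `center`."""
    return sum(abs(x - center) for x in values) / len(values)


def mean_deviation_freq(values, freqs, center):
    """Average absolute deviation of a frequency distribution about `center`."""
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / sum(freqs)


# Individual series, deviations taken about its mean (9).
data = [6, 7, 10, 12, 13, 4, 8, 12]
md = mean_deviation(data, 9)
print(md)                # 22 / 8 = 2.75
print(round(md / 9, 3))  # coefficient of mean deviation, about 0.306

# Frequency distribution, deviations taken about its mean (7.5).
xi = [2, 5, 6, 8, 10, 12]
fi = [2, 8, 10, 7, 8, 5]
md_freq = mean_deviation_freq(xi, fi, 7.5)
print(round(md_freq, 2))  # 92 / 40 = 2.3
```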
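4. Standard deviation: (denoted by $\sigma$)
Case I : for ungrouped data
$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$
Or
$\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$
4. Standard deviation: (denoted by $\sigma$)
Case II : for grouped data
$\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}}$
Or
$\sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}$
Where N = $\sum f_i$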
Variance
Variance and coefficient of dispersion
• Variance = $\sigma^2$
• Coefficient of dispersion (or coefficient of variation or relative standard deviation) = $\frac{\sigma}{\bar{x}} \times 100$
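A Python sketch of variance, standard deviation and coefficient of variation. It assumes the population forms (division by n, or by N = sum of f_i for grouped data), with variance equal to the square of the standard deviation and coefficient of variation equal to (standard deviation / mean) × 100. The data are reused from the arithmetic mean examples earlier in these notes; the function names are illustrative.

```python
import math


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def variance_shortcut(values):
    """Equivalent form: mean of squares minus square of the mean."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2


def variance_freq(values, freqs):
    """Population variance of a frequency distribution."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total


data = [6, 7, 10, 12, 13, 4, 8, 12]
var = variance(data)
sd = math.sqrt(var)
cv = sd / (sum(data) / len(data)) * 100

print(var)           # 74 / 8 = 9.25
print(round(sd, 3))  # about 3.041
print(round(cv, 2))  # about 33.79 (percent)

xi = [2, 5, 6, 8, 10, 12]
fi = [2, 8, 10, 7, 8, 5]
print(round(variance_freq(xi, fi), 2))  # 7.15
```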
Some important results
Measure              Change of origin   Change of scale
Mean                 Dependent          Dependent
Median               Dependent          Dependent
Mode                 Dependent          Dependent
Standard Deviation   Not dependent      Dependent
Variance             Not dependent      Dependent
Mode = 3 Median - 2 Mean (empirical relation for moderately skewed distributions).
Quartile deviation is less affected by extreme values of the series.
Standard deviation ≤ Range, i.e., variance ≤ (Range)^2.
Mean deviation ≈ (4/5) × (standard deviation), for an approximately normal distribution.
Normal distribution and skewed
distribution
[Figure: normal distribution and right-skewed distribution]
• In a skewed distribution the mean is dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data.
Relation between Mean, Median, Mode for different cases
• Normal distribution: Mean = Median = Mode
Measurement of data level
• Qualitative
  • Nominal: cannot be arranged in any particular order. Best central tendency: Mode. Best measure of spread: None
  • Ordinal (ranking, order, scaling): can be arranged in a particular order. Best central tendency: Median, Mode. Best measure of spread: Interquartile range
• Quantitative
  • Interval: zero is arbitrary. Central tendency: Mean, Median, Mode. Spread: Range, Variance, Standard deviation, IQR
  • Ratio: zero is not arbitrary. Central tendency: Mean, Median, Mode. Spread: Range, Variance, Standard deviation, IQR
Best suitable methods of central
tendency
Sr. No.   Condition                                                               Best Method
1         Data distribution is continuous and symmetrical (normal distribution)   Mean
2         Non-skewed data (interval or ratio)                                     Mean
3         Nominal data                                                            Mode
4         Skewed data set (interval or ratio)                                     Median
5         Data has outliers                                                       Median
6         Ordinal data                                                            Median
Some important points to be remembered
• Mean:
• For a normal distribution the mean, median and mode all have the same value. Most of the time the mean is preferred over the other two, as it uses every value in the data set.
• The mean is the only measure of central tendency for which the sum of the deviations of each value from it is always zero.
• The mean is often not one of the actual values observed in the data set.
• Extreme values affect the mean. For example, if a player scores very high in some matches and very low in others, these extreme scores pull the mean away from the typical score.
• As the data become skewed, the mean loses its ability to provide the best central location for the data, because the skewed data drag it away from the typical value.