
Comprehensive Guide to Statistics Analysis

The document provides a comprehensive overview of statistics, including data classification into internal and external sources, and the presentation of data through various methods such as bar diagrams and pie charts. It covers measures of central tendency (mean, median, mode) and dispersion (range, standard deviation), along with their calculations for both individual and frequency distributions. Additionally, it discusses the importance of choosing appropriate measures based on data characteristics and distribution types.

Uploaded by

neerajsagora07
Copyright
© All Rights Reserved

Statistics

Statistics: the collection, analysis and interpretation of data
• Classification of data:
Source of data
• Internal / External
• Primary / Secondary
Classification of data
• Internal data: information procured and consolidated from different branches within your own organization.

• External data: data that was not collected by your organization.
• Primary data: data collected first-hand by the investigator in the course of the investigation (original information).

• Secondary data: data that has already been collected by another person.


Que9 II:
• I: collecting data from government offices
• II : Collecting data from public libraries
• III: Collecting data from telephonic interviews

• Que 30 II
Presentation of data
• Raw or ungrouped data: presented in a random manner (not arranged in any order).

• Grouped data: arranged in an orderly manner (a frequency distribution table), for example in ascending or descending order.
Types of frequency distribution:
• Discrete frequency distribution: the data are presented so that the exact measurement of each unit is shown, with a frequency against each individual value.

• Continuous frequency distribution: the data are arranged in classes or groups (class intervals), so the exact value of each observation is not shown.
Terms :
• Frequency: the number of observations falling in a particular class.

• Class mark: the mid-value of a class, (lower limit + upper limit) / 2.

• Cumulative frequency of a class interval: the sum of the frequencies of all classes up to and including that class.
Graphical representation
• Bar diagrams: the length of each bar carries the information. The width has no specific meaning, but the width of each bar and the space between bars are kept constant.
[Figure: bar diagram "No. of vehicles". Y-axis: number of vehicles registered (frequency). X-axis: type of vehicle (Cars, Bus, Scooters, Bikes).]
Pie diagram: used to represent a relative frequency distribution. A pie diagram consists of a circle divided into as many sectors as there are classes in the frequency distribution.

$\text{Central angle} = \frac{\text{Frequency} \times 360}{\text{Total frequency}}$
[Figure: pie diagram of vehicle types. Cars 40%, Bikes 35%, Bus 15%, Scooters 10%.]
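As a small illustration of the central-angle formula, the sketch below converts the shares shown in the vehicle pie diagram into sector angles. The variable names are illustrative and not part of the slides.

```python
# Central angle of each pie sector: frequency * 360 / total frequency.
# The shares are the percentages read from the vehicle pie diagram.
shares = {"Cars": 40, "Bikes": 35, "Bus": 15, "Scooters": 10}

total = sum(shares.values())
angles = {name: value * 360 / total for name, value in shares.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
# Cars: 144 degrees
# Bikes: 126 degrees
# Bus: 54 degrees
# Scooters: 36 degrees
```

The angles add up to 360 degrees, which is a quick check that no sector was missed.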
Histogram: mark off all class intervals along the x-axis. On each class interval erect a rectangle whose height is proportional to the frequency of that class (for equal class widths), so that the area of each rectangle is proportional to the class frequency.

[Figure: histogram of frequency against the class intervals 10 to 20, 20 to 30, 30 to 40, 40 to 50.]
Frequency Polygon
Cumulative frequency curve (Ogive)
MEASUREMENT OF CENTRAL TENDENCY
• Arithmetic mean
• Median
• Geometric mean
• Harmonic mean
• Mode
Arithmetic mean or Mean
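Case I : Individual series

• $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

• Or $\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n}$

• Where A = assumed mean

• $d_i = x_i - A$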
Example: Find the mean of 6,7,10,12,13,4,8,12.
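The example can be worked both ways, directly and with an assumed mean. This is a minimal sketch; the assumed mean A = 10 is an arbitrary choice made here for illustration.

```python
data = [6, 7, 10, 12, 13, 4, 8, 12]
n = len(data)

# Direct method: sum of observations divided by their number.
direct_mean = sum(data) / n

# Assumed-mean method: A is an arbitrary illustrative choice.
A = 10
deviations = [x - A for x in data]        # d_i = x_i - A
shortcut_mean = A + sum(deviations) / n

print(direct_mean, shortcut_mean)  # 9.0 9.0
```

Any value of A gives the same answer; a value near the middle of the data just keeps the deviations small.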
1. Arithmetic mean or Mean
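Case II: For discrete frequency distribution

• $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

• Or $\bar{x} = A + \frac{\sum f_i d_i}{\sum f_i}$

• Where A = assumed mean

• $d_i = x_i - A$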
Find mean
Xi fi
2 2
5 8
6 10
8 7
10 8
12 5
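A short sketch of this exercise, using the table values above. The assumed mean A = 6 is an illustrative choice, not something given in the slides.

```python
x = [2, 5, 6, 8, 10, 12]
f = [2, 8, 10, 7, 8, 5]

N = sum(f)                                   # total frequency
mean = sum(fi * xi for fi, xi in zip(f, x)) / N

# Assumed-mean check with an illustrative A.
A = 6
mean_shortcut = A + sum(fi * (xi - A) for fi, xi in zip(f, x)) / N

print(N, mean, mean_shortcut)  # 40 7.5 7.5
```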
Arithmetic mean or Mean
Case III: For continuous frequency distribution
• Step 1: Convert each class interval into a single observation (its class mark) using the formula
$x_i = \frac{\text{lower limit of class} + \text{upper limit of class}}{2}$

• Step 2: Repeat the procedure used for a discrete frequency distribution.
Find the mean of the following distribution:
Class interval Frequency
0-10 22
10-20 38
20-30 46
30-40 35
40-50 20
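The two steps can be sketched as follows for the table above: each class is replaced by its mid-value and the discrete formula is then applied. Names such as `marks` are illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freq = [22, 38, 46, 35, 20]

# Step 1: class mark (mid-value) of each class interval.
marks = [(lo + hi) / 2 for lo, hi in classes]

# Step 2: same formula as for a discrete frequency distribution.
mean = sum(f * x for f, x in zip(freq, marks)) / sum(freq)

print(round(mean, 2))  # 24.57
```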
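1. A Combined Mean
• If two sets of observations are given, the combined mean of the two sets can be calculated by

$\bar{x}_{12} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$

• Where $\bar{x}_{12}$ = combined mean, $n_1$, $n_2$ = sizes of the two sets and $\bar{x}_1$, $\bar{x}_2$ = their means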


1. B Weighted arithmetic mean
• If $w_1, w_2, w_3, \ldots, w_n$ are the weights corresponding to the observations $x_1, x_2, x_3, \ldots, x_n$, then

• Weighted arithmetic mean = $\frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
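A minimal sketch of the combined mean and the weighted arithmetic mean. All numbers below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Combined mean of two sets (hypothetical sizes and means).
n1, mean1 = 30, 50
n2, mean2 = 20, 60
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Weighted arithmetic mean (hypothetical observations and weights).
values = [60, 70, 80]
weights = [1, 2, 1]
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)

print(combined_mean, weighted_mean)  # 54.0 70.0
```

The combined mean is itself a weighted mean of the two set means, with the set sizes as weights.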


2. Median
Case I: Median of an individual series
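• Step 1: Arrange the given numbers in ascending order.
• Step 2:
• If the number of terms n is odd, the median is the $\left(\frac{n+1}{2}\right)$th observation.

• If the number of terms n is even, the median is the AVERAGE of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th observations.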
Q. Find the median of the following data:
3,9,5,3,12,10,18,4,7,19,21
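The two steps can be sketched directly in code for this data set. Positions in the slides are counted from 1, while Python lists are indexed from 0, hence the "- 1" below.

```python
data = [3, 9, 5, 3, 12, 10, 18, 4, 7, 19, 21]

ordered = sorted(data)          # Step 1: ascending order
n = len(ordered)

# Step 2: pick the middle observation(s).
if n % 2 == 1:
    median = ordered[(n + 1) // 2 - 1]                      # ((n+1)/2)th observation
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2    # average of the two middle ones

print(ordered)
print(n, median)  # 11 9
```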
Median
Case II: Median of a discrete series
• Step 1: Arrange the values in ascending order (they are generally already given in ascending order).
• Step 2: Calculate the cumulative frequency.
• Step 3: Let $N = \sum f_i$.
• If N is odd, the median is the $\left(\frac{N+1}{2}\right)$th observation.

• If N is even, the median is the AVERAGE of the $\left(\frac{N}{2}\right)$th and $\left(\frac{N}{2}+1\right)$th observations.
Find median
xi fi CF
3 3 3
6 4 7
9 5 12
12 2 14
13 4 18
15 5 23
21 4 27
22 3 30
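A sketch of this exercise, assuming the k-th observation is read off as the first value whose cumulative frequency reaches k. The helper `observation` is a hypothetical name introduced here.

```python
from bisect import bisect_left
from itertools import accumulate

x = [3, 6, 9, 12, 13, 15, 21, 22]
f = [3, 4, 5, 2, 4, 5, 4, 3]

cf = list(accumulate(f))     # cumulative frequencies, matches the CF column
N = cf[-1]

def observation(k):
    """Value of the k-th observation (1-based) using the cumulative frequencies."""
    return x[bisect_left(cf, k)]

if N % 2 == 1:
    median = observation((N + 1) // 2)
else:
    median = (observation(N // 2) + observation(N // 2 + 1)) / 2

print(cf)
print(N, median)  # 30 13.0
```

Here N = 30 is even, and both the 15th and 16th observations fall at x = 13, so the median is 13.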
Median
Case III: Median of a continuous series
• Step 1: Arrange the classes in ascending order (they are generally already given in ascending order).
• Step 2: Calculate the cumulative frequency.
• Step 3: Calculate n/2 (where $n = \sum f_i$).
• Step 4: Select the class in which the n/2 value lies and use the formula

$\text{Median} = l + \frac{\frac{n}{2} - C}{f} \times h$

• Where,
• l = lower limit of the selected class
• f = frequency of the selected class
• h = width (size) of the selected class
• C = cumulative frequency of the class preceding the selected class
Find the median
Class Frequency
0-10 6
10-20 7
20-30 15
30-40 16
40-50 4
50-60 2
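The steps for a continuous series can be sketched as below for this table. The median class is taken as the first class whose cumulative frequency reaches n/2; variable names follow the symbols in the formula.

```python
from itertools import accumulate

lower = [0, 10, 20, 30, 40, 50]    # lower limits of the classes
freq = [6, 7, 15, 16, 4, 2]
h = 10                             # class width

n = sum(freq)
half = n / 2
cum = list(accumulate(freq))       # cumulative frequencies

# Median class: first class whose cumulative frequency reaches n/2.
k = next(i for i, c in enumerate(cum) if c >= half)
l = lower[k]
f = freq[k]
C = cum[k - 1] if k > 0 else 0     # cumulative frequency of the preceding class

median = l + (half - C) * h / f

print(l, f, C, median)  # 20 15 13 28.0
```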
3. Mode:
Case I: Mode of an individual series:
• The value of the variable that is repeated more often than any other value in the series.

• E.g.
• 1,2,3,4,5 (Mode= None)
• 1,2,5,4,6,7,4,4,7,8 (Mode=4)
• 1,2,5,4,6,7,4,4,7,8,7 (Mode= 4,7)
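The three cases can be reproduced with a small helper. This is a sketch that follows the convention used in the examples (no mode when no value repeats); `modes` is a hypothetical function name.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 3, 4, 5]))                     # []
print(modes([1, 2, 5, 4, 6, 7, 4, 4, 7, 8]))      # [4]
print(modes([1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]))   # [4, 7]
```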
Mode:
Case II: Discrete series
• The value of the variable for which the frequency is maximum.

xi fi
1 5
3 3
5 4
7 8
Here the mode is 7, since its frequency (8) is the largest.
Mode:
Case III: Continuous series
Find the modal group (the class with the maximum frequency), then use the formula

$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$

Where,
l = lower limit of modal group
h = size of modal group
$f_1$ = frequency of modal group
$f_0$ = frequency of the group before the modal group
$f_2$ = frequency of the group after the modal group
Mode of the following distribution
Class interval Frequency
0-20 17
20-40 28
40-60 32
60-80 24
80-100 19
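A sketch of the grouped-mode formula applied to this table. It assumes the standard form Mode = l + (f1 - f0) / (2 f1 - f0 - f2) × h, and treats a missing neighbouring class as frequency 0.

```python
lower = [0, 20, 40, 60, 80]     # lower limits of the classes
freq = [17, 28, 32, 24, 19]
h = 20                          # class size

k = freq.index(max(freq))       # modal group: class with the largest frequency
l = lower[k]
f1 = freq[k]
f0 = freq[k - 1] if k > 0 else 0                 # class before the modal group
f2 = freq[k + 1] if k < len(freq) - 1 else 0     # class after the modal group

mode = l + (f1 - f0) / (2 * f1 - f0 - f2) * h

print(l, f0, f1, f2, round(mode, 2))  # 40 28 32 24 46.67
```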
Miscellaneous
4. Geometric mean
Case I: individual distribution
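• If $x_1, x_2, x_3, \ldots, x_n$ are n non-zero (positive) observations, then
$GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$

Or

$GM = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log x_i\right)$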
Geometric mean
Case II: frequency distribution
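• $GM = \text{antilog}\left[\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right]$, where $N = \sum_{i=1}^{n} f_i$

• If $n_1$ and $n_2$ are the sizes and $G_1$ and $G_2$ the geometric means of two series respectively, the geometric mean G of the combined group is defined as follows:
$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$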
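A minimal sketch of the geometric mean for an individual series and for a frequency distribution. The data and frequencies are hypothetical, and "antilog" is taken to mean base-10 here.

```python
import math

data = [2, 4, 8]                 # hypothetical observations
n = len(data)

# n-th root of the product of the observations.
gm_root = math.prod(data) ** (1 / n)

# Log form: antilog of the mean of the logs (base 10 assumed).
gm_log = 10 ** (sum(math.log10(x) for x in data) / n)

# Frequency distribution with hypothetical frequencies.
freq = [1, 2, 1]
N = sum(freq)
gm_freq = 10 ** (sum(f * math.log10(x) for x, f in zip(data, freq)) / N)

print(gm_root, gm_log, gm_freq)  # all approximately 4
```

The log form is the practical one for long series, because the raw product can become very large.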
5. Harmonic mean
Case I: individual distribution
• If $x_1, x_2, x_3, \ldots, x_n$ are n observations, then

$HM = \frac{1}{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{x_i}\right)}$
Harmonic mean
Case II: frequency distribution

$HM = \frac{1}{\frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n}\left(\frac{f_i}{x_i}\right)}$
Q. Find the harmonic mean of 2, 3, 4

• $HM = \frac{3}{\frac{1}{2}+\frac{1}{3}+\frac{1}{4}} = \frac{36}{13} \approx 2.77$
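The harmonic mean of 2, 3, 4 can be checked with exact fractions. This sketch uses Python's standard `fractions` module so that no rounding is involved.

```python
from fractions import Fraction

data = [2, 3, 4]
n = len(data)

# HM = n / (sum of reciprocals), computed exactly.
reciprocal_sum = sum(Fraction(1, x) for x in data)
hm = n / reciprocal_sum

print(reciprocal_sum, hm, round(float(hm), 2))  # 13/12 36/13 2.77
```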
Relation between AM,GM,HM

AM ≥ GM ≥ HM
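The inequality can be checked numerically on any set of positive numbers. The sketch below uses the values 2, 3, 4 and the functions of Python's standard `statistics` module.

```python
import statistics as st

data = [2, 3, 4]

am = st.mean(data)              # arithmetic mean
gm = st.geometric_mean(data)    # geometric mean
hm = st.harmonic_mean(data)     # harmonic mean

print(am, round(gm, 3), round(hm, 3))  # 3 2.884 2.769
print(am >= gm >= hm)                  # True
```

The three means are equal only when all observations are equal.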
MEASURES OF DISPERSION
• Range
• Quartile deviation
• Mean deviation
• Standard deviation
1. RANGE
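• The difference between the maximum and minimum observations is called the range.

• Range = Maximum observation – Minimum observation

• Coefficient of range = $\frac{\text{Maximum observation} - \text{Minimum observation}}{\text{Maximum observation} + \text{Minimum observation}}$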
Calculate the range and coefficient of range:
• 1,2,5,4,6,7,4,4,7,8,7
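A short sketch of this exercise, assuming the usual definition of the coefficient of range as (max - min) / (max + min).

```python
data = [1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]

largest, smallest = max(data), min(data)
value_range = largest - smallest
coefficient_of_range = value_range / (largest + smallest)

print(value_range, round(coefficient_of_range, 3))  # 7 0.778
```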
2. Quartile deviation or semi-interquartile range
• Step 1: Calculate the cumulative frequency. Find the values of N/4 and 3N/4 to locate $Q_1$ and $Q_3$.

• Quartile deviation $Q = \frac{Q_3 - Q_1}{2}$

• Coefficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

• Where $Q_1$ and $Q_3$ are the first and third quartiles of the distribution


Find the quartile deviation of the following data:
xi fi
2 3
3 4
4 8
5 4
6 1
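A sketch of this exercise under one common convention, which is an assumption here: Q1 and Q3 are taken as the first values whose cumulative frequency reaches N/4 and 3N/4. Other textbook conventions, such as using (N+1)/4, can give different quartiles for the same table.

```python
from bisect import bisect_left
from itertools import accumulate

x = [2, 3, 4, 5, 6]
f = [3, 4, 8, 4, 1]

cum = list(accumulate(f))     # cumulative frequencies
N = cum[-1]

def value_at(position):
    """First value whose cumulative frequency reaches the given position (assumed convention)."""
    return x[bisect_left(cum, position)]

q1 = value_at(N / 4)
q3 = value_at(3 * N / 4)

quartile_deviation = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)

print(cum, N, q1, q3, quartile_deviation)
```

With N = 20 the positions are 5 and 15, which gives Q1 = 3 and Q3 = 4 under this convention.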
3. Mean deviation
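Case I: for individual series
• Mean deviation = $\frac{1}{n}\sum_{i=1}^{n} |x_i - A|$

• Where A = mean or median or mode

3. Mean deviation
Case II: for frequency distribution
• Mean deviation = $\frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|$, where $N = \sum_{i=1}^{n} f_i$

• Where A = mean or median or mode

• Coefficient of mean deviation = $\frac{\text{Mean deviation}}{A}$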
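A minimal sketch of the mean deviation about the mean for an individual series. The data set is the one used in the range exercise; choosing the mean as A is an illustrative choice, and the median or mode could be used the same way.

```python
data = [1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]
n = len(data)

A = sum(data) / n                                   # here A is the mean
mean_deviation = sum(abs(x - A) for x in data) / n
coefficient = mean_deviation / A                    # coefficient of mean deviation

print(A, round(mean_deviation, 3), round(coefficient, 3))  # 5.0 1.818 0.364
```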
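4. Standard deviation: (denoted by σ)
Case I : for ungrouped data

$\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$

Or

$\sigma = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$
4. Standard deviation: (denoted by σ)
Case II : for grouped data

$\sigma = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2}$

Or

$\sigma = \sqrt{\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2}$

Where $N = \sum f_i$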
Variance and coefficient of dispersion
• Variance = $\sigma^2$ (the square of the standard deviation)

• Coefficient of dispersion (or coefficient of variation or relative standard deviation) = $\frac{\sigma}{\bar{x}} \times 100$
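A sketch of the variance, standard deviation and coefficient of variation for ungrouped data, using the data set from the range exercise. It assumes the population form (division by n) and computes the variance both from the definition and from the shortcut form.

```python
import math

data = [1, 2, 5, 4, 6, 7, 4, 4, 7, 8, 7]
n = len(data)
mean = sum(data) / n

# Definition: mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / n

# Shortcut: mean of the squares minus the square of the mean.
variance_shortcut = sum(x * x for x in data) / n - mean ** 2

sigma = math.sqrt(variance)
cv = sigma / mean * 100          # coefficient of variation, in percent

print(round(variance, 3), round(sigma, 3), round(cv, 2))  # 4.545 2.132 42.64
```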
Some important results
Measure              Change of origin   Change of scale
Mean                 Dependent          Dependent
Median               Dependent          Dependent
Mode                 Dependent          Dependent
Standard deviation   Not dependent      Dependent
Variance             Not dependent      Dependent
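A quick numerical check of how these measures respond to a change of origin (adding a constant) and a change of scale (multiplying by a constant). The data come from the arithmetic-mean example; the shift of 5 and the factor of 2 are arbitrary illustrative choices.

```python
import statistics as st

data = [6, 7, 10, 12, 13, 4, 8, 12]
shifted = [x + 5 for x in data]    # change of origin
scaled = [2 * x for x in data]     # change of scale

def summary(values):
    """Collect the five measures for one data set (population SD and variance)."""
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "mode": st.mode(values),
        "sd": st.pstdev(values),
        "variance": st.pvariance(values),
    }

for label, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    print(label, summary(values))
```

Adding 5 moves the mean, median and mode by 5 and leaves the standard deviation and variance unchanged. Multiplying by 2 doubles the mean, median, mode and standard deviation and quadruples the variance.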

Mode = 3 Median – 2 Mean (an empirical relation for moderately skewed distributions).

Quartile deviation is less affected by extreme values of the series.

Standard deviation ≤ Range, i.e., variance ≤ (Range)².

Mean deviation = (4/5) × (standard deviation), approximately, for a normal distribution.
Normal distribution and skewed distribution

[Figure: curves of a normal distribution and a right-skewed distribution.]

• In a skewed distribution the mean is dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data.
Relation between Mean, Median, Mode for different cases
• Normal distribution: Mean = Median = Mode
Measurement of data level
Qualitative
• Nominal: cannot be arranged in any particular order.
  Best central tendency: Mode
  Best measure of spread: None
• Ordinal (ranking, order, scaling): can be arranged in a particular order.
  Best central tendency: Median, Mode
  Best measure of spread: Interquartile range (IQR)
Quantitative
• Interval: zero is arbitrary.
  Central tendency: Mean, Median, Mode
  Spread: Range, variance, standard deviation, IQR
• Ratio: zero is not arbitrary.
  Central tendency: Mean, Median, Mode
  Spread: Range, variance, standard deviation, IQR
Best suitable methods of central tendency
Sr. No.   Condition                                                               Best method
1         Data distribution is continuous and symmetrical (normal distribution)   Mean
2         Non-skewed data (interval or ratio)                                     Mean
3         Nominal data                                                            Mode
4         Skewed data set (interval or ratio)                                     Median
5         Data has outliers                                                       Median
6         Ordinal data                                                            Median
Some important points to be remembered
• Mean –
• For a normal distribution the mean, median and mode all take the same value. Most of the time the mean is preferred over the other two, as it uses every value in the data set.

• The mean is the only measure of central tendency for which the sum of the deviations of each value from it is always zero.

• The mean is often not one of the actual values observed in the data set.

• Extreme values affect the mean. For example, if a player scores very high in a few matches and very low in the others, those extreme scores pull the mean away from the player's typical score.

• As the data become skewed, the mean loses its ability to give the best central location, because the skewed values drag it away from the typical value.
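Two of these points can be illustrated numerically: deviations from the mean sum to zero, and one extreme value moves the mean but not the median. The first data set comes from the arithmetic-mean example; the two lists of runs are hypothetical.

```python
import statistics as st

# Deviations from the mean always sum to zero.
data = [6, 7, 10, 12, 13, 4, 8, 12]
mean = sum(data) / len(data)
deviation_sum = sum(x - mean for x in data)
print(mean, deviation_sum)  # 9.0 0.0

# Hypothetical runs scored by a player, without and with one extreme innings.
steady = [40, 45, 50, 55, 60]
with_outlier = [40, 45, 50, 55, 210]

print(st.mean(steady), st.median(steady))              # 50 50
print(st.mean(with_outlier), st.median(with_outlier))  # 80 50
```

The single score of 210 raises the mean from 50 to 80, while the median stays at 50, which is why the median is preferred for skewed data or data with outliers.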
