lOMoARcPSD|53811382
Statistics
Business Statistics (University of Kerala)
Studocu is not sponsored or endorsed by any college or university
Downloaded by GOONER MOGLANIA (skibidikirmada@[Link])
STATISTICS SOME IMPORTANT POINTS
The F distribution was named by George W. Snedecor in honour of Sir Ronald A.
Fisher.
The chi square test (a non-parametric test) was introduced by Karl Pearson.
The concept of the normal distribution was given by Abraham de Moivre.
The concept of regression was given by Sir Francis Galton in 1877.
The concept of the t distribution was given by W. S. Gosset.
Data which are collected for the first time are primary data.
Statistics deals with data expressed in numerical (quantitative) form.
Secondary data are second-hand data, available in published or unpublished
form.
Sources of secondary data are also called paper sources.
Median and mode are positional averages.
Arithmetic mean, geometric mean, harmonic mean & weighted mean are
mathematical averages.
∑ is known as capital sigma.
Properties of arithmetic mean-:
1) The sum of the deviations of all observations from their arithmetic mean is
zero.
2) Combined mean = (n1x̄1 + n2x̄2)/(n1 + n2)
3) The sum of square of the deviations of the given set of observations is minimum when
taken from the arithmetic mean.
4) Mean is affected by both change of scale and change of origin.
5) AM ≥ GM ≥ HM for positive values (AM- arithmetic mean, GM- geometric mean, HM- harmonic
mean); the three are equal only when all observations are equal.
6) Mode = 3 median – 2 mean (empirical relationship for moderately skewed distributions)
One dimensional (1D) diagrams are those in which only length is considered. Examples are
line diagram, simple bar diagram, multiple (compound or cluster) bar diagram, sub divided
bar diagram (also called component bar diagram), percentage bar diagram and deviation
bar diagram.
Two dimensional (2D) diagrams are those in which both length and breadth are
considered, e.g. histogram, area diagrams, rectangles, squares, circles and pie
diagram.
The ogive and the frequency polygon are also 2D diagrams.
An ogive represents cumulative frequency, a histogram represents a frequency
distribution, and frequency polygon means a many-angled diagram.
Three dimensional (3D) diagrams are cubes, spheres, cylinders and cuboids.
Numerical characteristics of a population are called parameters and those of a
sample are called sample statistics (estimators).
Random sampling method is also called probability sampling method.
Non random sampling method is also called non-probability sampling method.
Random sampling methods are-
1) Simple random sampling method
2) Stratified random sampling method
3) Systematic sampling method
4) Cluster sampling method
5) Multistage sampling
Non random sampling methods are-
1) Convenience sampling or chunk or incidental sampling
2) Judgment sampling
3) Purposive sampling
4) Quota sampling
5) Snow ball sampling
The larger the sample, the more accurate the research will generally be.
Increasing the sample size decreases the sampling error.
Sample size N = 100
Principles of sampling-
1) Law of statistical regularity
2) Principle of inertia of large numbers
3) Principle of persistence of small numbers
4) Principle of validity
5) Principle of optimization
Type 1 error is committed by rejecting a true null hypothesis. It is also known as
producer's risk; its probability is alpha (α), the level of significance.
Type 2 error is committed by accepting a false null hypothesis. It is also called
consumer's risk; its probability is beta (β). The power of the test (power function,
power curve) is 1 - β.
Standard error (SE) is the standard deviation of the sampling distribution of the
sample mean. S.E = σ/√n
The normal distribution curve is bell shaped & symmetrical, i.e.
mean = median = mode.
Total area under the normal probability curve is 1 (0.5 + 0.5).
Since the curve is symmetrical, the coefficient of skewness is 0; the coefficient of
kurtosis is 3 (mesokurtic).
Range of the distribution is -∞ to +∞ but practically it is 6σ (µ - 3σ to µ + 3σ).
Points of inflexion are at x = µ ± σ.
Leptokurtic: kurtosis > 3, Platykurtic: kurtosis < 3, Mesokurtic: kurtosis = 3.
Area within µ ± 1σ = 68.27%, µ ± 2σ = 95.45%, µ ± 3σ = 99.73%
Z (standard normal variate) = (X - µ)/σ
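The area figures and the Z formula above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample values (X = 70, µ = 60, σ = 5) are illustrative assumptions, not part of the notes.

```python
import math

def normal_area_within(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2))

def z_score(x, mu, sigma):
    """Standardise x: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma covers {100 * normal_area_within(k):.2f}% of the area")

# Hypothetical example: X = 70 from a population with mu = 60 and sigma = 5.
print(z_score(70, 60, 5))
```

The loop reproduces the three percentages quoted in the notes, and the last line shows that X = 70 lies two standard deviations above the mean.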
The concept of the binomial distribution was given by James (Jacob)
Bernoulli.
The concept of the Poisson distribution was given by Simeon Denis Poisson.
In the Poisson distribution the mean and the variance are equal.
Standard deviation is also known as root mean square deviation.
Standard deviation is affected by change of scale & independent of
change of origin.
Positively skewed: mean > median > mode.
Negatively skewed: mode > median > mean.
Symmetrical (balanced) pattern: mean = median = mode.
Skewness means lack of symmetry, i.e. an asymmetrical distribution.
The coefficient of skewness was given by Karl Pearson.
Table values of Z (two tailed): 95% confidence = 1.96, 99% confidence = 2.58 (more precisely 2.576)
Z test is also called standard normal variable test, standard normal
deviate test, approximate test and large sample test.
Conditions for applying the z test are-
1) n > 30
2) n ≤ 30, provided the standard deviation of the population is given.
One tailed test is also known as directional test; it may be right tailed or
left tailed. F test and chi square test are usually applied as one tailed
(right tailed) tests.
Two tailed test is a non-directional test: no direction is mentioned in the
alternative hypothesis.
Conditions to apply t test-
1) n ≤ 30 and only the standard deviation of the sample is given.
2) To check the difference between means.
T test is also called student's t test, exact test and small sample test; its
statistic follows the t distribution.
Conditions to accept & reject the null hypothesis-:
1) Table value > calculated value = accept
2) Table value < calculated value = reject
Chi square is a non-parametric test.
Chi square lies from 0 to ∞.
Conditions to apply chi square test:-
1) Population mean & sample mean are not given in the question.
2) The question is stated in terms of degrees of freedom (degrees of freedom are
first used in the t test, where df = n - 1).
Chi square test is used as:-
1) Test of goodness of fit.
2) Test of independence in a contingency table.
3) Test for qualitative variables (attributes).
4) Basis for coefficients of association.
Degrees of freedom for a contingency table = (rows - 1)(columns - 1)
In the F test the larger variance is placed in the numerator, so the numerator is
always greater than or equal to the denominator.
Conditions to apply F test:-
1) Population mean, sample mean & standard deviation are not given in the
question.
2) The question compares two samples (the test compares their variances).
3) Its value lies between 0 and ∞.
The concept of correlation was given by Karl Pearson.
Correlation is denoted by r and its value lies between -1 and +1.
Rank correlation was given by Charles Edward Spearman.
Spearman denotes rank correlation by ρ (rho).
ρ = 1 - 6∑d²/(n(n² - 1))
If ranks are tied the formula becomes ρ = 1 - 6[∑d² + ∑m(m² - 1)/12]/(n(n² - 1)),
where m is the number of items sharing a tied rank.
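As a worked illustration of the rank correlation formula without ties, here is a minimal Python sketch. The function name and the two hypothetical sets of ranks (two judges ranking five items) are assumptions made for the example.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to the same five items.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_a, judge_b)
print(rho)  # sum of d^2 is 4, so rho = 1 - 24/120
```

Identical rankings give ρ = 1 and exactly reversed rankings give ρ = -1, which matches the stated range of -1 to +1.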
Karl Pearson's correlation formula is r = cov(x, y)/(σx·σy).
Correlation is independent of both change of scale & origin.
Regression coefficients are affected by change of scale & independent of change of
origin.
r² is the coefficient of determination.
Its value lies between 0 and 1.
r² = bxy × byx
Regression shows a causal effect, i.e. a cause & effect relationship.
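The relation between the correlation coefficient and the two regression coefficients can be verified on a small data set. This is a sketch under assumed data; `regression_summary` and the x, y values are hypothetical names and numbers chosen for illustration.

```python
def regression_summary(x, y):
    """Return (byx, bxy, r) computed from sums of squares and products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    byx = sxy / sxx                 # regression coefficient of y on x
    bxy = sxy / syy                 # regression coefficient of x on y
    r = sxy / (sxx * syy) ** 0.5    # Karl Pearson's correlation coefficient
    return byx, bxy, r

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

byx, bxy, r = regression_summary(x, y)
print(byx, bxy, r ** 2)
```

For this data r² equals bxy × byx, and the value lies between 0 and 1 as the notes state.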
Parametric tests: - Z Test, T Test, F Test.
Non Parametric Tests Or Distribution Free Tests: - Sign Test, Median Test,
Mann Whitney U Test, Run Test, K-S Test, Chi Square Test.
PAIRED T TEST is used for before & after comparisons, e.g. management training,
special coaching, productivity of a crop before & after a treatment.
Paired T Test assumes that the paired observations come from a Bivariate Normal
Distribution.
Nominal scale = mode used, ordinal scale = mode & median, interval & ratio scale
= mean, median, mode.
Kendall coefficient (1955) – a rank correlation coefficient that evaluates the degree
of similarity between two sets of ranks given to the same set of objects; it is a
non-parametric measure.
Friedman test= 0 to 1.
TOPIC WISE POINTS
'Statistics' means numerical presentation of facts. Its meaning is
divided into two forms - in plural form and in singular form. In
plural form, 'Statistics' means a collection of numerical facts or data,
for example price statistics, agricultural statistics, production statistics,
etc. In singular form, the word means the statistical methods with
the help of which collection, analysis and interpretation of data are
accomplished.
What is meant by ‘Data’?
Data refers to any group of measurements that happen to interest
us. These measurements provide information the decision maker
uses. Data are the foundation of any statistical investigation and the
job of collecting data is the same for a statistician as collecting stone,
mortar, cement, bricks etc. is for a builder.
Statistics is a body of methods of obtaining and analyzing data in order to base
decisions on them.
Statistics refers either to quantitative information or to a method of dealing
with quantitative information.
All statistics are numerical statements of facts but all numerical statements of
facts are not statistics.
Statistics is both an art & a science.
MEASURES OF CENTRAL TENDENCY
The central tendency of a variable means a typical value around
which other values tend to concentrate; hence this value
representing the central tendency of the series is called a measure of
central tendency or average. According to Clark, “Average is an
attempt to find one single figure to describe whole of figures.”
Average is frequently referred to as a Measure Of Central Tendency.
Measures of central value are also popularly known as measures of
central tendency because their value lies between the two extreme values.
Types of average-:
1) Arithmetic mean – simple mean & weighted mean
2) Median
3) Mode
4) Geometric mean
5) Harmonic mean
Arithmetic mean (x̄): The most popular and
widely used measure of representing the entire data by one value is
known as arithmetic mean. Its value is obtained by adding together
all the items and by dividing this total by the number of items.
AM = (X1 + X2 + X3 + ... + XN)/N or ∑x/n
To correct an incorrect value of ARITHMETIC MEAN – from the incorrect ∑x deduct the
wrong items and add the correct items, then divide the corrected ∑x by the number of
observations n.
The use of median and mode is better in open end distributions: because of the
difficulty of ascertaining the lower limit & upper limit in open end distributions, it is
suggested that the arithmetic mean should not be used for them.
Mathematical properties of Arithmetic mean-
1) The sum of the deviations of the items from the arithmetic mean is always zero.
2) Mean is characterized as a point of balance, i.e. the sum of the positive deviations from
it is equal to the sum of the negative deviations from it.
3) The sum of the squared deviations of the items from the arithmetic mean is minimum, that
is, less than the sum of the squared deviations of the items from any other value.
4) Combined mean = (n1x̄1 + n2x̄2)/(n1 + n2)
5) Mean is affected by both change of scale and change of origin. If every item of a series is
multiplied by a constant k, the mean m becomes mk; if a constant k is subtracted from
every item, the mean becomes m - k.
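The properties listed above can be checked on small numbers. The following is a minimal sketch using Python's standard `statistics` module; the two groups and the constant k = 3 are hypothetical values chosen so that the arithmetic is easy to follow by hand.

```python
from statistics import mean

# Two hypothetical groups of observations.
group_1 = [10, 20, 30]
group_2 = [40, 50, 60, 70, 80]

n1, n2 = len(group_1), len(group_2)
m1, m2 = mean(group_1), mean(group_2)

# Property 1: deviations from the mean add up to zero.
sum_of_deviations = sum(x - m1 for x in group_1)

# Property 4: combined mean from the group sizes and group means.
combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)

# Property 5: change of scale (multiply by k) and change of origin (subtract k).
k = 3
scaled_mean = mean([x * k for x in group_1])
shifted_mean = mean([x - k for x in group_1])

print(sum_of_deviations, combined_mean, scaled_mean, shifted_mean)
```

The combined mean obtained from the formula agrees with the mean of all eight observations pooled together, while the scaled and shifted means equal m1 × k and m1 - k.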
Median & Mode are positional averages.
Arithmetic mean, harmonic mean, geometric mean and weighted mean
are mathematical averages.
Mean is used in index numbers and in standardized birth & death rates.
MEDIAN- Median is that value of the variable which divides the group
into two equal parts, one part comprising all values greater than, and
the other all values less than the median.
It is the middle value of the distribution.
It is the 50th percentile, the value below which 50% of the values in the
sample fall.
Median is called the positional average.
If N is odd the median is an actual value, with the remainder of the series in two
equal parts on either side of it. If N is even the median is a derived figure, half the sum
of the two middle values.
Odd = value of the ((n+1)/2)th item of the ordered series
Even = mean of the (n/2)th and (n/2 + 1)th items of the ordered series
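The odd and even cases can be sketched in a few lines of Python. The function name and the sample series are illustrative assumptions; the standard library's `statistics.median` would give the same results.

```python
def median_by_position(values):
    """Median of a series: middle item if n is odd, mean of the two middle items if even."""
    ordered = sorted(values)          # the series must be arranged in order first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]        # the ((n+1)/2)th item, counting from 1
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_by_position([7, 3, 9, 1, 5]))  # odd n: an actual value of the series
print(median_by_position([7, 3, 9, 1]))     # even n: a derived figure
```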
Mathematical property of median is-
1) The sum of the deviations of the items from the median, ignoring signs, is the least.
Uses of median- in open end distributions, it is a more satisfactory measure of
central tendency than the mean.
It is the most appropriate average for dealing with qualitative data.
Quartiles = 4 equal parts, deciles = 10 equal parts, percentiles = 100 equal parts.
Median can be determined by graphic method also by OGIVES.
MODE
The MODE or the modal value is that value in a series of observation which
occurs with the greatest frequency.
The mode is often said to be the value which occurs most often that is with the
highest frequency.
Mode is the value which has the greatest frequency density in its immediate
neighborhood. For this reason mode is also called the most typical or fashionable
value of distributions.
For determining mode count the number of times the various values repeat
themselves & the value occurring the maximum number of times is the modal
value.
When there are two or more values having the same maximum frequency one
cannot say which is the modal value & hence the mode is said to be ill defined. Such a
series is also known as bimodal or multimodal.
Where the mode is ill defined its value may be ascertained by the formula based
upon the relationship between mean, median and mode: Mode = 3 median - 2 mean. This
measure is called the empirical mode.
We can locate the mode graphically using a histogram or a frequency polygon.
Mode is used in open end distributions and for qualitative phenomena.
Mode is the most meaningful measure of central tendency in case of highly
skewed or non-normal distributions, as it provides the best indication of the
point of maximum concentration.
GEOMETRIC MEAN
It is defined as the nth root of the product of n items or values.
Properties of geometric mean are –
1) The product of the values of series will remain unchanged when the
value of geometric mean is substituted for each individual value.
2) The sum of the deviations of the logarithms of the original
observations above the logarithm of the geometric mean is equal to the sum of
those below it. This also means that the value of the geometric mean is such as to
balance the ratio deviations of the observations from it. Because of this
property this measure of central value is especially adapted to averaging
ratios, rates of change & logarithmically distributed series.
Uses- to find average percentage increase in sales, production,
population, and in construction of index numbers.
Geometric mean is not computed when there are both negative &
positive values in a series or when one or more of the values is zero.
HARMONIC MEAN
The Harmonic Mean is based on the reciprocals of the numbers
averaged, it is defined as the reciprocal of the arithmetic mean of the
reciprocal of the individual observation.
HM = N/(1/x1 + 1/x2 + 1/x3 + ... + 1/xn)
Uses – It is useful for computing the average rate of increase in profits
of a concern or average speed at which a journey has been performed
or the average price at which an article has been sold. The rate usually
indicates the relation between two different types of measuring units
that can be expressed reciprocally.
Weighted harmonic mean= ∑w/∑(w/x)
Relationship among the averages- AM ≥ GM ≥ HM (equal only when all values are equal)
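The three averages and their ordering can be compared with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The values and the speed example below are hypothetical; the speed case assumes equal distances covered at each speed, which is the situation in which the harmonic mean is the appropriate average.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical positive observations.
values = [2, 4, 8]

am = mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(am, gm, hm)

# Average speed for a journey: the same distance covered at 40 km/h and at 60 km/h.
speeds = [40, 60]
average_speed = harmonic_mean(speeds)
print(average_speed)
```

The average speed comes out below the simple mean of 50 km/h because more time is spent travelling at the slower speed.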
HYPOTHESIS TESTING
A statistical hypothesis is some assumption or statement which may or
may not be true, about a population or equivalently about the probability
distribution characterizing the given population which we want to test on
the basis of the evidence from a random sample.
NULL HYPOTHESIS is the hypothesis which is tested for possible
rejection under the assumption that it is true. It is denoted by Ho.
ALTERNATIVE HYPOTHESIS is any hypothesis which is complementary to the
null hypothesis. It is very important to explicitly state the alternative hypothesis in
respect of any Null Hypothesis Ho, because the acceptance or rejection of Ho is
meaningful only if it is being tested against a rival hypothesis.
We make type 1 error by rejecting a true null hypothesis. It is also called
producer's risk; its probability is alpha (α), the level of significance.
We make type 2 error by accepting a false null hypothesis. It is also called
consumer's risk; its probability is beta (β), and the power of the test is 1 - β.
Z- Test
It is any statistical test for which the distribution of the test statistic
under the Null Hypothesis can be
approximated by a normal distribution. Because of the central limit
theorem, many test statistics are approximately normally distributed
for large samples.
Z- test is also known as Standard Normal Variable Test, Standard Normal
Deviate Test, Approximate Test and Large Sample Test.
Conditions for applying z-test-
1) n > 30
2) n ≤ 30, provided the standard deviation of the population is given.
Z = (x̄ - µ)/(σ/√n) = (x̄ - µ)/S.E (Standard Error).
Conditions for acceptance and rejection of Null Hypothesis:
1) If table value > calculated value, we accept the hypothesis.
2) If table value < calculated value, we reject the hypothesis.
The table value is 1.96 at the 5% level of significance (95% confidence, two tailed).
Location tests are the most familiar Z- Tests.
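A one-sample Z test following the formula and decision rule above might be sketched as follows. The function name and all figures (sample mean 52, hypothesised mean 50, population SD 10, n = 100) are assumptions made up for the illustration.

```python
import math

def z_statistic(sample_mean, mu, sigma, n):
    """Z = (sample mean - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

# Hypothetical large sample: n = 100, population SD known.
z = z_statistic(sample_mean=52, mu=50, sigma=10, n=100)

table_value = 1.96                       # two tailed, 5% level of significance
reject_null = abs(z) > table_value       # calculated value > table value means reject

print(z, reject_null)
```

Here the standard error is 10/√100 = 1, so the calculated value exceeds the table value and the null hypothesis is rejected at the 5% level.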
T- TEST
T-TEST is any statistical hypothesis test in which the test statistic
follows a student's t distribution if the Null Hypothesis is true. It
can be used to determine if two sets of data are significantly different
from each other & is most commonly applied when the test statistic would
follow a normal distribution if the value of a scaling term in the test
statistic were known.
T- test was given by W. S. Gosset in 1908.
Variants include the unpaired (independent samples) t test and Welch's t test.
Conditions for applying t-test are :-
1) n ≤ 30 and only the standard deviation of the sample is given.
2) To check the difference between means.
T- Test is also known as Student's T Test, Exact Test Or Small Sample
Test.
Degrees of freedom are first used in this test: df = n - 1
CHI SQUARE TEST
A Chi Square Test, also referred to as χ² test, is any statistical hypothesis test in
which the sampling distribution of the test statistic is a chi squared distribution
when the Null Hypothesis is true.
It was introduced by Karl Pearson.
Value of χ² can never be negative.
Pearson's Chi Square Test is used as the chi square goodness of fit
test & the chi squared test for independence.
Yates correction for continuity – To reduce the error in approximation, Frank
Yates suggested a correction for continuity that adjusts the formula for Pearson's
chi squared test by subtracting 0.5 from the absolute difference between each observed
value & its expected value in a 2×2 contingency table.
Mean of Chi Square = ν (the degrees of freedom)
Variance of Chi Square = 2ν
It is useful in dealing with more than two populations.
It enables us to test whether more than two population proportions are
equal.
The chi square distribution is determined by its only parameter, the number
of degrees of freedom.
df = (rows - 1)(columns - 1)
It should be noted that the chi square test only tells us whether two principles of
classification are significantly related or not; it does not measure the degree or form of
the relationship.
The arrangement of data according to attributes in cells is called a contingency
table.
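To make the contingency table calculation concrete, here is a minimal sketch of Pearson's chi square statistic for independence, without Yates correction. The function name and the 2×2 table of observed frequencies are hypothetical; expected frequencies are computed as (row total × column total)/grand total.

```python
def chi_square_independence(table):
    """Return (chi square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, degrees_of_freedom

# Hypothetical observed frequencies for two attributes with two classes each.
observed_table = [[30, 20],
                  [20, 30]]

chi_square, df = chi_square_independence(observed_table)
print(chi_square, df)
```

Every expected frequency in this table is 25, so each cell contributes 25/25 = 1 to the statistic. The calculated value is then compared with the table value for the given degrees of freedom, as in the acceptance rule stated earlier.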
Chi square test is used as:-
1) Test of goodness of fit
2) Test of independence in a contingency table
3) Test for qualitative variables (attributes)
4) Basis for coefficients of association.
Conditions for applying chi square test:-
1) Population & sample mean are not given in the question.
2) The question is stated in terms of degrees of freedom.
Chi square = n·s²/σ²
Chi square lies between 0 and ∞.
It is a non-parametric test.
One Tailed Test- In this test a direction is mentioned (< or >); it may be left tailed
or right tailed (F test and chi square test are usually right tailed tests).
Two Tailed Test- In this test no direction is mentioned (= against ≠); it is a
non-directional test.
F TEST
F tests mainly arise when models have been fitted to the data
using least squares.
Named by George W. Snedecor in honour of Sir Ronald A. Fisher.
It is used in ANOVA (Analysis Of Variance).
F can never be negative because it is a ratio of squares; the larger variance is
placed in the numerator, so F ≥ 1.
F = s1²/s2²
df = (n1 - 1, n2 - 1)
Conditions For Applying F- Test are:-
1) Population mean, sample mean & standard deviation are not given in the
question.
2) The question compares two samples (the test compares their variances).
3) Its value lies between 0 and ∞.
4) It is a one tailed test.
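The variance ratio can be sketched as below, placing the larger sample variance in the numerator as the notes describe. The function name and both samples are hypothetical, and the sketch assumes neither sample has zero variance.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """Return (F, (numerator df, denominator df)) with the larger variance on top."""
    var_a, var_b = variance(sample_a), variance(sample_b)   # sample variances (divisor n - 1)
    if var_a >= var_b:
        return var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
    return var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)

# Hypothetical samples.
sample_1 = [2, 4, 6, 8]
sample_2 = [4, 5, 6]

f_value, degrees_of_freedom = f_statistic(sample_1, sample_2)
print(f_value, degrees_of_freedom)
```

Swapping the two samples gives the same F, since the function always divides the larger variance by the smaller one.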
Paired-T test
Paired T Test may be applied to verify the necessity of a costly management
training for sales personnel by recording the sales of the selected trainees
before and after the management training, or the validity of special coaching for a
group of educationally backward students by verifying their progress before &
after the coaching programme, or the increase in productivity due to the
application of a particular kind of fertilizer by recording the productivity of a
crop before & after applying this fertilizer, & so on.
Paired T Test assumes that the paired observations come from a Bivariate Normal
Distribution.
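A before & after comparison of the kind described above can be sketched as follows. The statistic used is the usual paired form t = d̄/(s_d/√n) with df = n - 1, where d is the after minus before difference; this formula is not written out in the notes, and the function name and sales figures are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for paired observations, using differences d = after - before."""
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)   # stdev uses divisor n - 1
    return mean(differences) / standard_error, n - 1

# Hypothetical sales of five trainees before and after a training programme.
sales_before = [10, 12, 9, 11, 13]
sales_after = [12, 14, 10, 14, 14]

t_value, df = paired_t(sales_before, sales_after)
print(round(t_value, 4), df)
```

The calculated t is then compared with the table value of t for n - 1 degrees of freedom.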
PARAMETRIC TEST
T- Test, F-Test, Z- Test are called Parametric Tests.
Conditions of parametric tests are as follows: -
1) The population from which the samples have been drawn should be
normally distributed; this is known as the assumption of normality.
2) The variables involved must have been measured on an interval or ratio
scale.
3) The observations must be independent; the inclusion or exclusion of any case
in the sample should not unduly affect the results of the study.
4) The populations must have the same variance or in special cases must have a
known ratio of variances. This is called Homoscedasticity i.e. equal variance.
However in many cases where the above conditions are not met, it is advisable
to make use of non-parametric tests for comparing samples and to make inferences or to
test the significance or trustworthiness of the computed statistics. In other words the
use of non-parametric tests is recommended in the following situations-
1) Where n is quite small
2) When assumptions like normality of the distribution of scores in the population
are doubtful. It is this characteristic of non-parametric tests which enables them
to be called distribution free tests.
3) When the measurement of data is available only in the form of ordinal or
nominal scale.
Non-parametric tests are typically simpler & easier to carry out, but their use
should be restricted to those situations in which the required conditions for using
parametric tests are not met.
Non-parametric tests are less powerful (less able to detect a true difference
when it exists) than parametric tests in the same situations.
Non-parametric tests are as follows:-
1) Sign test. 2) Median test. 3) Mann Whitney U test. 4) Run test. 5) K-S test. 6)
Chi square test.
MEASURES OF DISPERSION
Dispersion is the measure of the variation of the items.
A measure of dispersion or variation is one that measures the extent to which there
are differences between individual observations & some central or average value. In
measuring variation we are interested in the amount of the variation or its degree but
not in its direction.
It is important for measuring the reliability of an average.
Methods of studying dispersion are as follows-:
1) The range
2) The interquartile range or the quartile deviation
3) The mean deviation or the average deviation
4) The standard deviation or the root mean square deviation
5) The Lorenz curve
Range- It is the difference between the largest item and the smallest item.
Range = highest - lowest
Co efficient of range = (highest - lowest)/(highest + lowest)
Inter Quartile Range Or Quartile Deviation- Inter quartile range represents the difference
between the third quartile and the first quartile. Inter quartile range = Q3 - Q1. Quartile
deviation = (Q3 - Q1)/2. Co efficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1).
Percentile range is also used as a measure of dispersion. Percentile range =
P90 - P10. Semi percentile range = (P90 - P10)/2
Range and quartile deviation do not show the scatter around an
average.
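The range and quartile based measures can be computed as in the sketch below. It uses `statistics.quantiles` from the Python standard library (Python 3.8 or later) with its default "exclusive" method, which locates Q1 and Q3 at the (n+1)/4 th and 3(n+1)/4 th positions; other quartile conventions can give slightly different values. The data series is hypothetical.

```python
from statistics import quantiles

# Hypothetical ordered series of seven items.
data = [2, 4, 6, 8, 10, 12, 14]

highest, lowest = max(data), min(data)
data_range = highest - lowest
coefficient_of_range = (highest - lowest) / (highest + lowest)

q1, q2, q3 = quantiles(data, n=4)           # quartiles; q2 is the median
interquartile_range = q3 - q1
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(data_range, coefficient_of_range)
print(q1, q3, interquartile_range, quartile_deviation, coefficient_of_qd)
```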
Mean deviation- The mean deviation is also known as the Average
Deviation. It is the average difference between the items in a distribution
and the median or mean of that series. There is an advantage in taking the
deviations from the median because the sum of the deviations of items from the
median is minimum when signs are ignored. The arithmetic mean is more
frequently used in calculating the value of average deviation & this is the
reason it is also called mean deviation. Mean deviation (MD) = ∑|d|/n,
d = (x - a), a = the average (median or mean) from which deviations are taken.
Co efficient of mean deviation = MD/Median if taken
from the median and MD/Mean if taken from the mean. The greatest
drawback of this method is that algebraic signs are ignored while taking
the deviations of the items, which makes the method non-algebraic. It is
especially effective in reports presented to the general public or to groups
not familiar with statistical methods.
Standard deviation- This concept was introduced by Karl Pearson in 1893. It is
also known as Root Mean Square Deviation for the reason that it is the square root of
the mean of the squared deviations from the arithmetic mean. It is denoted by the small
Greek letter σ (sigma).
The standard deviation measures the absolute dispersion or variability of a
distribution; the greater the amount of dispersion or variability the greater the standard
deviation, for the greater will be the magnitude of the deviations of the values from their
mean. A small SD means a high degree of uniformity of the observations as well as
homogeneity of a series; a large standard deviation means just the opposite.
Difference between mean deviation and standard deviation-
1) Algebraic signs are ignored while calculating mean deviation whereas in the calculation
of standard deviation signs are taken into account.
2) Mean deviation can be computed either from median or mean but standard deviation is
always computed from the arithmetic mean because the sum of the squares of the
deviations of items from the arithmetic mean is least.
Population standard deviation is denoted by σ whereas sample standard
deviation is denoted by s.
Standard deviation is affected by change of scale & independent of change of
origin.
Mathematical properties of standard deviation are as follows-
1) Just as it is possible to compute the combined mean of two or more groups, we
can also compute the combined standard deviation of two or more groups.
2) The standard deviation of the first n natural numbers is σ = √((n² - 1)/12)
3) The sum of the squares of deviations of items in the series from their arithmetic mean
is minimum. The sum of the squares of the deviations of items of any series from a value
other than the arithmetic mean would always be greater; this is the reason why
standard deviation is always computed from the arithmetic mean.
4) For symmetrical (normal) distributions
Mean ± 1σ = 68.27%, mean ± 2σ = 95.45%, mean ± 3σ = 99.73%
5) In a normal distribution there is a fixed relationship between the three most commonly
used measures of dispersion. The QD is smallest, the MD next & the SD is greatest: QD =
2/3σ, MD = 4/5σ, so QD < MD < SD
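Property 2 can be checked by computing the standard deviation of the first n natural numbers directly and comparing it with the formula. This is a small sketch with the standard library; n = 10 is an arbitrary choice, and `pstdev` is the population standard deviation (divisor n).

```python
import math
from statistics import pstdev

n = 10
first_n_naturals = list(range(1, n + 1))

sd_from_formula = math.sqrt((n ** 2 - 1) / 12)
sd_direct = pstdev(first_n_naturals)        # root mean square deviation from the mean

print(sd_from_formula, sd_direct)
```

Both values agree, which also illustrates that the formula refers to the population form of the standard deviation rather than the sample form.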
Co efficient of variation- The relative measure of SD is known as Co-Efficient Of
Variation. This measure was developed by Karl Pearson. The series for which the co efficient
of variation is greater is said to be more variable or less consistent, less uniform, less
stable or less homogenous. On the other hand the series for which co efficient of variation
is less is said to be less variable or more consistent, more uniform, more stable or more
homogenous. It is denoted by C.V = (σ/x̄) × 100.
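Comparing the consistency of two series with the coefficient of variation might look like the sketch below. The function name and the two series are hypothetical; both series have the same mean, so the difference in C.V. comes entirely from their spread.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = (standard deviation / mean) * 100, using the population SD."""
    return pstdev(values) / mean(values) * 100

# Hypothetical series with the same mean but different spread.
series_a = [48, 50, 52]
series_b = [30, 50, 70]

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)

more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```

The series with the smaller coefficient of variation is the more consistent one.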
Variance= square of standard deviation σ2. Smaller the value of σ2 the lesser the
variability or greater the uniformity in the population.
Standard deviation is the best measure of variation.
To correct an incorrect value of SD: from the incorrect ∑x and ∑x² deduct the wrong
values and add the right values, then recompute the SD using the number of observations.
LORENZ CURVE- It is devised by MAX O LORENZ. It is a graphic method of studying
dispersion. This curve was used by him for the first time to measure the distribution of
wealth & income. The most common use of this curve is the study of the degree of
inequality in the distribution of income & wealth between countries or between
different periods of time. It is a cumulative percentage curve in which the percentage of
items is combined with the percentage of other things such as wealth, profits & turnover.
As it is a graphical method, there is a line OP which is known as the line of equal
distribution. The line OP makes an angle of 45°. For any given distribution the
curve will never cross the line of equal distribution. It will always lie below OP unless
the distribution is uniform in which case it will coincide with OP. The greater the
variability the greater is the distance of the curve from OP. Thus a measure of variability of
the distribution is provided by the distance of the curve of the cumulated percentages of
the given distribution from the line of equal distribution.
SKEWNESS, MOMENTS & KURTOSIS
When a series is not symmetrical it is said to be Asymmetrical Or Skewed.
Skewness refers to the lack of symmetry.
In a symmetrical distribution the value of mean, median & mode coincide. The
spread of the frequencies is the same on the both sides of the center point of the
curve. Mean= median= mode.
A distribution which is not symmetrical is called a skewed distribution & such a
distribution could either be positively skewed or negatively skewed.
Symmetrical distribution: mean = median = mode
Positively skewed distribution: mean > median > mode (skewed right)
Negatively skewed distribution: mode > median > mean (skewed left)
In a moderately asymmetrical distribution the interval between the mean & the
median is approximately 1/3rd of the interval between the mean & the mode. It is this
relationship which provides a means of measuring the degree of skewness.
Dispersion is concerned with the amount of variation rather than with its direction.
Skewness tell us about the direction of the variation or the departure from
symmetry.
Measures of Skewness are dependent upon the amount of dispersion.
Skewness is present if-
1) Mean, median, Mode do not coincide.
2) When data are plotted on a graph they do not give the normal bell shaped curve.
3) The sum of positive deviations from median is not equal to the sum of the
negative deviations.
4) Quartiles are not equidistant from the median.
5) Frequencies are not equally distributed at points of equal deviations from the
mode.
Measures of Skewness- They tell us the direction & extent of asymmetry in a series.
There are absolute measures of skewness and relative measures of skewness.
Skewness can be measured in absolute terms by taking the difference between mean &
mode, in the same unit as the data. Absolute skewness = mean - mode; mean > mode =
positively skewed, mean < mode = negatively skewed.
If the absolute differences are expressed in relation to some measure of the spread
of values in their respective distributions, the measures are relative.
Four important measures of relative skewness are as follows-
1) Karl Pearson's coefficient of skewness
2) Bowley's coefficient of skewness
3) Kelley's coefficient of skewness
4) Measure of skewness based on moments
1) Karl Pearson's coefficient of skewness- also known as PEARSONIAN COEFFICIENT.
Developed by Karl Pearson. It is based upon the difference between mean & mode, which
is divided by the standard deviation to give a relative measure. Skp = (mean - mode)/SD
Mean = median = mode: coefficient of skewness is 0
Mean > median > mode: positive coefficient of skewness
Mean < median < mode: negative coefficient of skewness
Moderately skewed distribution: mode = 3 median - 2 mean
2) Bowley's coefficient of skewness- Developed by Bowley, it is based on quartiles and is
also called the quartile measure of skewness. Skb = (Q3 + Q1 − 2 median)/(Q3 − Q1). Its value lies between -1
and 1. It is useful in open end distributions and where extreme values are present. Bowley's measure is
limited between -1 and 1 while Pearson's measure has no such limit.
3) Kelley's coefficient of skewness- It is based on the 10th and 90th percentiles, i.e. the
1st and 9th deciles.
Percentiles: Skk = (P10 + P90 − 2 median)/(P90 − P10).
Deciles: Skk = (D1 + D9 − 2 median)/(D9 − D1)
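The three relative measures above can be sketched in Python as follows. This is only an illustration: the function names and the sample figures are assumptions made for the example, and the quartiles and percentiles are passed in directly rather than computed, since the notes do not say which interpolation rule to use.

```python
import statistics

def pearson_skewness(data):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    sd = statistics.pstdev(data)  # standard deviation with divisor n
    return (mean - mode) / sd

def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

def kelley_skewness(p10, median, p90):
    """Kelley's coefficient: (P10 + P90 - 2*median) / (P90 - P10)."""
    return (p10 + p90 - 2 * median) / (p90 - p10)

# Hypothetical data with a long right tail (mean 4, mode 3).
sample = [2, 3, 3, 3, 4, 5, 8]
skp = pearson_skewness(sample)      # positive, so positively skewed
skb = bowley_skewness(20, 30, 50)   # hypothetical quartiles
skk = kelley_skewness(10, 30, 70)   # hypothetical percentiles
```

A positive result from any of the three indicates positive skewness, a negative result negative skewness, and zero a symmetrical distribution.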
MOMENTS
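Moment is a term from mechanics which refers to the measure of a force with respect to its
tendency to produce rotation. The strength of this tendency depends on the amount of force and the distance
from the origin of the point at which the force is exerted.
Moment = ∑fx/n, f = force, x = distance. In statistics the frequencies (f) play the role of the
forces and the deviations (x) of the values from the mean or origin play the role of the distances.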
KURTOSIS
It is a Greek word meaning bulginess.
It refers to the degree of flatness or peakedness in the region about the mode
of a frequency curve.
If a curve is more peaked than the normal curve- leptokurtic
If a curve is more flat topped than the normal curve- platykurtic
The normal curve- mesokurtic
The condition of peakedness or flat-toppedness itself is known as
kurtosis or excess.
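As a rough illustration of how moments lead to a measure of kurtosis, the sketch below computes central moments and the moment coefficient β2 = µ4/µ2². The formula for β2 is the standard moment measure and is an assumption here (these notes only state that its value is 3 for the normal curve); the function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th moment about the arithmetic mean: sum((x - mean)^k) / n."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def beta2(data):
    """Moment coefficient of kurtosis: fourth moment / (second moment)^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def classify_kurtosis(b2, tol=1e-9):
    """Compare beta2 with 3, the value for the normal curve."""
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # most values at the centre, two far out

flat_kind = classify_kurtosis(beta2(flat))
peaked_kind = classify_kurtosis(beta2(peaked))
```

The first central moment is always zero, which is the same fact as the sum of deviations from the arithmetic mean being zero.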
THEORETICAL DISTRIBUTIONS:-
BINOMIAL DISTRIBUTIONS
It is also known as the Bernoulli distribution.
Developed by Jacob (James) Bernoulli.
The binomial distribution is a probability distribution expressing the probability of one set of
dichotomous alternatives, i.e. success or failure, over n independent trials.
Here p = probability of success and q = 1 − p = probability of failure in a single trial.
Mean of binomial distribution = np
Standard deviation of binomial distribution = √(npq)
Variance of binomial distribution= npq
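The results mean = np and variance = npq can be checked numerically. The sketch below builds the binomial probabilities from the usual formula nCr · p^r · q^(n−r) (assumed here, as the notes do not write it out) and computes the mean and variance directly; n = 10 and p = 0.3 are arbitrary illustrative values.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    """Probabilities of r = 0, 1, ..., n successes in n independent trials."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

n, p = 10, 0.3            # illustrative values only
q = 1 - p
pmf = binomial_pmf(n, p)

# Mean and variance computed directly from the probabilities.
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))
std_dev = sqrt(variance)
```

Up to floating-point rounding, `mean` agrees with n·p = 3, `variance` with n·p·q = 2.1 and `std_dev` with √(npq).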
POISSON DISTRIBUTION
It is a discrete probability distribution & is very widely used in statistical work.
Originated by Simeon Denis Poisson.
It deals with counting the number of occurrences of a particular event in a specific
time interval, region or space.
Mean of Poisson distribution = m
Standard deviation of Poisson distribution = √m, i.e. variance µ2 = m
Mean and variance are equal (m, m).
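That the mean and variance of a Poisson distribution are both equal to m can be verified with a short sketch. It uses the usual probability formula e^(−m) · m^r / r! (assumed here, since the notes do not state it) and truncates the infinite sum at r = 59, where the remaining probability is negligible for the illustrative value m = 4.

```python
from math import exp, factorial

def poisson_pmf(m, r):
    """Probability of exactly r occurrences when the mean is m."""
    return exp(-m) * m**r / factorial(r)

m = 4.0  # illustrative mean number of occurrences
probs = [poisson_pmf(m, r) for r in range(60)]  # tail beyond 59 is negligible

mean = sum(r * pr for r, pr in enumerate(probs))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(probs))
```

Both `mean` and `variance` come out equal to m up to rounding error, so the standard deviation is √m.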
NORMAL DISTRIBUTION
The normal distribution, also called the normal probability distribution, happens
to be the most useful theoretical distribution for continuous variables.
It was first discovered by De Moivre. It was also known to Laplace, and it has
been credited to Carl Friedrich (Karl) Gauss.
The normal distribution is also known as the Gaussian distribution (Gaussian
law of error).
Topography of the normal distribution is given by W. J. Youden.
The type of random variable which can take an infinite number of values is called a
continuous random variable & the probability distribution of such a variable is called
continuous probability distribution.
The normal distribution is one of the most versatile continuous probability distributions.
Properties of Normal distribution are as follows-
1) The normal curve is symmetrical about the mean (Skewness=0). If the curve
were folded along its vertical axis the two halves would coincide. The number of
cases below the mean in a normal distribution is equal to the number of cases
above the mean, which makes the mean and median coincide.
2) The height of normal curve is at its maximum at the mean. Hence the mean &
mode of the normal distribution coincide. Thus mean, median & mode all are
equal.
3) There is one maximum point of the normal curve which occurs at the mean. The
height of the curve declines as we go in either direction from the mean. The curve
approaches nearer and nearer to the base but it never touches it i.e. the curve is
asymptotic to the base on either side. Hence its range is unlimited or infinite in
both directions.
4) Since there is only one maximum point, the normal curve is unimodal, i.e. it has
only one mode.
5) The points of inflexion, i.e. the points where the change in curvature occurs,
are at x = µ ± σ.
6) The first and third quartiles are equidistant from the median.
7) The mean deviation is about 4/5th, or more precisely 0.7979, of the standard
deviation.
8) Area under the normal curve:
µ ± 1σ = 68.27%, µ ± 2σ = 95.45%, µ ± 3σ = 99.73%, µ ± 1.96σ = 95%,
µ ± 2.5758σ = 99%
Coefficient of skewness is 0.
Coefficient of kurtosis is 3, i.e. the curve is mesokurtic.
Since mean = median = mode = µ, the ordinate at x = µ divides the whole area into two equal
parts. Further, since the total area under the normal probability curve is 1, the area to the right
of the ordinate at x = µ and the area to its left are each 0.5 (0.5 + 0.5 = 1).
No portion of the curve lies below the x-axis, since probability can never be
negative.
The range of the distribution is from −∞ to ∞, but practically it is 6σ (µ − 3σ to µ + 3σ).
All odd-order moments about the mean of the normal distribution are zero: µ(2n+1) = 0.
The points of inflexion of the normal curve are at x = µ ± σ; they are equidistant from the
mean at a distance of one standard deviation.
Standard normal variate: z = (x − µ)/σ, where x = value of the variable,
µ = mean and σ = standard deviation of the distribution.
Properties of the probability function of a normal distribution are
1) p(x) = [1/(σ√(2π))] · e^(−z²/2), where z = (x − µ)/σ
2) A normal curve with mean 0 and unit standard deviation (mean and variance 0 and 1)
is known as the standard normal curve.
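The areas quoted for the normal curve can be reproduced with the standard library. The sketch below assumes the standard identity Φ(z) = ½[1 + erf(z/√2)] for the cumulative area under the standard normal curve, which is not stated in these notes; the function names and the marks example are illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_within(k):
    """Area between mean - k*sigma and mean + k*sigma."""
    return standard_normal_cdf(k) - standard_normal_cdf(-k)

def z_score(x, mu, sigma):
    """Standard normal variate z = (x - mu) / sigma."""
    return (x - mu) / sigma

one_sigma = area_within(1)      # about 0.6827
two_sigma = area_within(2)      # about 0.9545
three_sigma = area_within(3)    # about 0.9973

# Hypothetical marks: mean 50, standard deviation 10, observed value 70.
z = z_score(70, 50, 10)
```

Calling `area_within(1.96)` and `area_within(2.5758)` gives approximately 0.95 and 0.99, the two values used for 95% and 99% confidence.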
CORRELATION AND REGRESSION
CORRELATION
If two quantities vary in such a way that movements in one are accompanied by
movements in the other, these quantities are correlated.
The correlation analysis refers to the techniques used in measuring the closeness
of the relationship between the variables.
Correlation analysis deals with the association between two or more variables.
Correlation analysis attempts to determine the degree of relationship between
variables.
Correlation is an analysis of the covariation between two or more variables.
The coefficient of correlation is one of the most widely used and also one of the most
widely abused statistical measures; abused in the sense that correlation measures nothing but the strength of
the linear relationship and does not necessarily imply a cause & effect relationship.
Karl Pearson has given the coefficient of correlation.
The coefficient of correlation is denoted by “r”.
Correlation lies between -1 and 1.
Correlation analysis helps in determining the degree of relationship between two or
more variables; it does not tell us anything about cause and effect relationships.
Correlation does not necessarily imply causation or functional relationship, though the
existence of causation always implies correlation. It establishes only covariation.
Correlation observed between variables that cannot conceivably be causally related
is called spurious or nonsense correlation.
Types of correlation are as follows-
1) Positive or negative correlation
2) Simple, partial, multiple correlation
3) Linear and non-linear correlation
Positive or negative correlation- If both the variables are varying in the same
direction i.e. if as one variable is increasing the other on an average is also increasing it is
known as positive correlation. On the other hand if the variables are varying in opposite
directions i.e. as one variable is increasing the other is decreasing or vice versa,
correlation is said to be negative.
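Simple, partial or multiple correlation- when only two variables are studied it is a
problem of simple correlation; when three or more variables are studied it is a problem of
either multiple or partial correlation. In multiple correlation three or more variables are
studied simultaneously; in partial correlation more than two variables are recognised, but only
two are considered to be influencing each other, the effect of the others being kept constant.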
Linear or non-linear correlation- if the amount of change in one variable tends to be
a constant ratio to the amount of change in the other variable then correlation is said to be
linear. Correlation would be called non-linear or curvilinear if the amount of change in
one variable does not bear a constant ratio to the amount of change in the other
variable.
METHODS OF STUDYING CORRELATION
The following are the methods of studying correlation
1) Scatter diagram method
2) Graphic method
3) Karl Pearson coefficient of correlation.
4) Rank correlation
5) Concurrent deviation method
6) Method of least squares
1) Scatter diagram method- The simplest device for ascertaining whether two
variables are related is to prepare a dot chart called a scatter diagram. The greater the
scatter of the plotted points on the chart, the weaker is the relationship between the two
variables. The more closely the points come to a straight line, the higher the degree
of relationship.
If all the points lie on a straight line rising from the lower left hand corner to the
upper right hand corner, correlation is said to be perfect positive, r = +1.
If all the points lie on a straight line falling from the upper left hand corner to the lower
right hand corner, correlation is said to be perfect negative, r = -1.
If the plotted points lie on a straight line parallel to the x-axis or in a haphazard manner,
it shows absence of any relationship between the variables and it is called no
correlation, r = 0.
Perfect positive r = +1, perfect negative r = -1, positive r > 0, negative r < 0, no
correlation r = 0.
The closer r comes to zero, the weaker the relationship; this is called weak correlation or low
degree correlation.
The closer r comes to +1 or -1, the stronger the relationship; this is called strong correlation or high
degree correlation.
2) Graphic method- when the values are plotted on graph paper we obtain two curves, one
for the x variable and another for the y variable. If both the curves drawn on the graph are
moving in the same direction (either up or down) correlation is said to be positive. On the
other hand, if the curves are moving in opposite directions, correlation is said to
be negative.
3) Karl Pearson coefficient of correlation or product moment coefficient of
correlation- Karl Pearson's method is popularly known as the Pearsonian co-efficient of
correlation.
The Pearsonian co-efficient of correlation is denoted by the symbol r,
r = ∑xy/(n · σx · σy), where x and y are deviations from the respective means.
This method is to be applied only where the deviations of items are taken from
actual means and not from assumed means.
Value of co-efficient of correlation lies between -1 to 1.
The co-efficient of correlation describes not only the magnitude of correlation
but also its direction. r = ∑xy/√(∑x² · ∑y²).
The coefficient of correlation is said to be a measure of covariance between two series.
The covariance of two series x & y: covariance = ∑xy/n
In order to find out the value of correlation coefficient, first we calculate covariance &
then in order to convert it to a relative measure we divide the covariance by the product of the standard
deviations of the two series. The ratio so obtained is called Karl Pearson’s coefficient.
Correlation is independent of change of scale & origin.
r = cov(x, y)/(σx · σy)
Probable error: P.E.(r) = 0.6745 × (1 − r²)/√n
Standard error: S.E.(r) = (1 − r²)/√n
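A minimal sketch of the Pearsonian calculation is given below, using deviations from the actual means as the method requires. The function names and the five pairs of observations are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), x and y being deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def standard_error_r(r, n):
    """S.E.(r) = (1 - r^2) / sqrt(n)."""
    return (1 - r * r) / sqrt(n)

def probable_error_r(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * standard_error_r(r, n)

xs = [1, 2, 3, 4, 5]   # hypothetical x series
ys = [2, 4, 5, 4, 5]   # hypothetical y series
r = pearson_r(xs, ys)
pe = probable_error_r(r, len(xs))

# Change of origin and scale (x -> 10x + 5) leaves r unchanged.
r_shifted = pearson_r([10 * x + 5 for x in xs], ys)
```

Here ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r = 6/√60, roughly 0.77, and `r_shifted` equals `r` up to rounding, which illustrates independence from change of origin and scale.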
Co-efficient of Determination
Square of co-efficient of correlation is called co-efficient of determination.
Co-efficient of determination= r2
R2= explained variance/ total variance.
Co-efficient of determination (r²) means the percentage of variation in the dependent
variable (y) which is explained by the independent variable (x).
Y = α + βx, where y is the dependent variable, α is the intercept, β is the slope and x is the
independent variable.
Co-efficient of determination lies between 0 and 1.
r² = bxy · byx
The ratio of unexplained variance to total variance is frequently called the
co-efficient of non-determination (k²), so k² = 1 − r².
The square root of the co-efficient of non-determination is called the co-efficient of alienation, k.
Properties of coefficient of correlation are as follows-
1) The coefficient of correlation lies between -1 to 1.
2) The coefficient of correlation is independent of change of scale & origin.
3) The coefficient of correlation is the geometric mean of the two regression
coefficients, r = ±√(bxy · byx)
4) The degree of relationship between two variable is symmetric rxy= ryx.
RANK CORRELATION COEFFICIENT
Charles Edward Spearman has developed this.
Sometimes we are required to examine the extent of association between two
ordinally scaled variables such as two rank orderings.
A measure to ascertain the degree of association between the ranks of the two
variables x and y is called rank correlation.
Spearman denotes it by ρ (rho).
ρ = 1 − 6∑d²/(n³ − n), where d = difference between the two ranks of an item and n = number of pairs.
Features are as follows-
1) The sum of the differences of ranks between two variables shall be zero, ∑d = 0
2) It is distribution free or non-parametric
3) If ranks are equal (tied) then ρ = 1 − 6[∑d² + m(m² − 1)/12 + ...]/[n(n² − 1)], where m = number of
items sharing a rank and one such term is added for each tie.
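The untied formula can be sketched as follows. The function name and the two sets of ranks (say, two judges ranking five candidates) are hypothetical, and the tied-rank correction is deliberately left out of this sketch.

```python
def spearman_rho(rank_x, rank_y):
    """rho = 1 - 6*sum(d^2) / (n^3 - n) for two sets of untied ranks."""
    n = len(rank_x)
    d = [a - b for a, b in zip(rank_x, rank_y)]
    sum_d = sum(d)                      # should be zero for valid ranks
    sum_d2 = sum(v * v for v in d)
    return 1 - 6 * sum_d2 / (n**3 - n), sum_d

judge_1 = [1, 2, 3, 4, 5]   # hypothetical ranks given by the first judge
judge_2 = [2, 1, 4, 3, 5]   # hypothetical ranks given by the second judge

rho, sum_d = spearman_rho(judge_1, judge_2)
```

For these ranks ∑d = 0 and ∑d² = 4, so ρ = 1 − 24/120 = 0.8. Identical rankings give ρ = +1 and exactly reversed rankings give ρ = −1.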
CONCURRENT DEVIATION METHOD
It is the simplest method.
It finds out the direction of change of the x variable & the y variable.
rc = ±√[±(2c − n)/n]
c = number of concurrent deviations, n = number of pairs of deviations compared. The sign of
(2c − n) is used both inside and outside the square root.
When we observe numerical data in relation to time the set of observations so
obtained is known as time series.
The limits of the population correlation are given by r ± P.E.
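The concurrent deviation formula given above can be sketched as below. The sketch assumes that each value is compared with the one before it, that n counts those comparisons (one fewer than the number of observations), and that a pair counts as concurrent only when both series move in the same direction; the function name and data are illustrative.

```python
from math import sqrt

def concurrent_deviation(xs, ys):
    """rc = +/- sqrt(+/- (2c - n) / n), using the sign of (2c - n)."""
    dx = [b - a for a, b in zip(xs, xs[1:])]   # change from the previous x value
    dy = [b - a for a, b in zip(ys, ys[1:])]   # change from the previous y value
    n = len(dx)                                # pairs of deviations compared
    c = sum(1 for u, v in zip(dx, dy) if u * v > 0)  # same-direction changes
    ratio = (2 * c - n) / n
    sign = 1 if ratio >= 0 else -1
    return sign * sqrt(abs(ratio))

x = [10, 12, 15, 14, 18]       # hypothetical series
y_same = [5, 7, 8, 6, 9]       # always moves with x
y_opposite = [9, 7, 6, 8, 5]   # always moves against x

rc_same = concurrent_deviation(x, y_same)
rc_opposite = concurrent_deviation(x, y_opposite)
```

With every change concurrent (c = n) the coefficient is +1, and with none concurrent (c = 0) it is −1.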
REGRESSION ANALYSIS
It reveals the average relationship between two variables, and this makes
estimation or prediction possible.
The meaning of the term regression is the act of returning or going back.
The term regression was first used by Sir Francis Galton in 1877.
The line describing the tendency to regress or go back was called by Galton a
regression line.
It is the measure of the average relationship between two or more variables
in terms of the original units of the data.
Its purpose is to study the functional relationship between the variables and thereby provide a
mechanism for prediction and forecasting.
It is a statistical device with the help of which we are in a position to estimate (or
predict) the unknown values of one variable from known values of another variable.
Y = a + bX, where Y = dependent variable which we are trying to predict and X = independent variable
which is used to predict.
The geometric mean of the two regression co-efficients gives the co-efficient of correlation:
r = ±√(bxy · byx), with r taking the common sign of the two co-efficients.
Regression co-efficients are affected by change of scale & independent of change of origin.
r² = bxy · byx
= [cov(x, y)/σy²] · [cov(x, y)/σx²]
= cov²(x, y)/(σx² · σy²)
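The identity r² = bxy · byx can be checked on a small example. In the sketch below byx = cov(x, y)/σx² and bxy = cov(x, y)/σy², with all variances taken with divisor n; the function name and the data are assumptions made for illustration.

```python
def regression_coefficients(xs, ys):
    """Return (byx, bxy, a), where the line of Y on X is Y = a + byx * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    byx = cov_xy / var_x            # regression coefficient of Y on X
    bxy = cov_xy / var_y            # regression coefficient of X on Y
    a = mean_y - byx * mean_x       # intercept of the line of Y on X
    return byx, bxy, a

xs = [1, 2, 3, 4, 5]   # hypothetical independent variable
ys = [2, 4, 5, 4, 5]   # hypothetical dependent variable

byx, bxy, a = regression_coefficients(xs, ys)
r_squared = byx * bxy
```

For this data byx = 0.6, bxy = 1.0 and the line of Y on X is Y = 2.2 + 0.6X, so r² = 0.6. Both coefficients are positive, which matches the positive sign of r.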
Difference between Correlation & Regression
1) Correlation: It simply tells the relationship between two or more variables which vary
together.
Regression: It means stepping back or returning to the average value, i.e. it simply tells the
average relationship between two variables.
2) Correlation: The correlation coefficient tells the degree of relationship between two
variables and is symmetric, rxy = ryx.
Regression: Regression analysis aims at establishing the functional relationship between two
variables and is not symmetric, bxy ≠ byx.
3) Correlation: It need not imply a cause & effect relationship between two variables.
Regression: Regression analysis clearly indicates the cause & effect relationship; the variable
taken as the cause is the independent variable and the one taken as the effect is the dependent variable.
4) Correlation: The correlation coefficient is a relative measure of the linear relationship
between x and y & is independent of units of measurement. Its value lies between -1 and 1.
Regression: The regression coefficients bxy & byx are absolute measures representing the
change in the value of one variable for a unit change in the value of the other. The individual
coefficients are not bounded, although their product bxy · byx = r² lies between 0 & 1.
5) Correlation: There may be nonsense correlation between two variables, e.g. intelligence &
weight, called spurious correlation.
Regression: There is no such thing as nonsense regression.
6) Correlation: Correlation analysis is confined to the study of linear relationships between
variables.
Regression: Regression analysis includes linear as well as non-linear relationships between
variables.
7) Correlation: It is independent of both change of scale & change of origin.
Regression: It is independent of change of origin but not of scale.
Both regression co-efficients will have the same sign, i.e. either both will be positive or
both negative. It is never possible that one of the regression co-efficients is negative & the other
positive.
Since the value of the co-efficient of correlation cannot exceed one, one of the
regression co-efficients must be less than one or, in other words, both the regression
co-efficients cannot be greater than 1.
The coefficient of correlation will have the same sign as that of the regression co-efficients,
i.e. if the regression co-efficients have a negative sign, r will also be negative and if the regression
coefficients have a positive sign, r will be positive.
Since bxy = r·σx/σy, we can find out any of the four values given the other three.
Regression coefficients are independent of change of origin but not of scale.
When the data represent a sample from a larger population, the least squares line is the
best estimate of the population regression line. Regression equation of X on Y: Xc = a + bY
The standard error of estimate measures the dispersion about an average line
called the regression line: Syx = √[∑(Y − Yc)²/n] or Syx = σy · √(1 − r²), where Yc is the value
of Y computed from the regression line.
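The two expressions for the standard error of estimate give the same value, which the following sketch checks on made-up data. The function name is hypothetical and every variance uses the divisor n, as in the formula above.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys):
    """Return Syx computed directly from residuals and via sigma_y * sqrt(1 - r^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    slope = cov_xy / var_x                  # byx
    intercept = mean_y - slope * mean_x
    # Direct form: root mean square of (Y - Yc), Yc being the value on the line.
    direct = sqrt(sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n)
    # Shortcut form using the coefficient of correlation.
    r = cov_xy / sqrt(var_x * var_y)
    shortcut = sqrt(var_y) * sqrt(1 - r * r)
    return direct, shortcut

direct, shortcut = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this data both forms give √0.48, roughly 0.69; the closer r is to ±1, the smaller the scatter about the regression line.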
PRINCIPLES OF SAMPLING
1) Law of statistical regularity- According to this law a group of objects chosen at
random from a larger group tends to possess the characteristics of that larger group.
2) Principle of inertia of large numbers- It states that as the sample size increases the
result tends to be more reliable & accurate, other things remaining constant.
3) Principle of persistence of small numbers- According to this principle, if some of the
items in a population possess markedly distinct characteristics from the remaining items,
then this tendency will be revealed in the sample values also; this tendency of
persistence will be there even if the population size is increased or even in the case of
large samples.
4) Principle of validity- A sample design is termed valid if it enables us to
obtain valid tests & estimates about the population parameters.
5) Principle of optimization- This principle stresses the need for obtaining
optimum results in terms of efficiency and cost of the sample design with the resources
available at our disposal.
Type 1 error- rejecting a null hypothesis that is true. It is also known as
producer's error; its probability is alpha (α), the level of significance.
Type 2 error- accepting a null hypothesis that is false. It is also known as
consumer's error; its probability is beta (β), and 1 − β is the power of the test (power function or power curve).
Standard error- the standard deviation of the sampling distribution of the sample mean
is known as the standard error. S.E. = σ/√n.
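The formula S.E. = σ/√n shows why larger samples give more reliable results. A small sketch with an assumed population standard deviation of 10:

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

sigma = 10  # assumed population standard deviation
se_100 = standard_error_of_mean(sigma, 100)
se_400 = standard_error_of_mean(sigma, 400)
```

Going from n = 100 to n = 400 takes the standard error from 1.0 to 0.5: the sample size has to be quadrupled to halve the standard error.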
TYPES OF SAMPLING
PROBABILITY SAMPLING/ RANDOM SAMPLING
1) Random sampling- In random sampling we select the sample randomly, i.e. every item has an
equal chance of being selected, but here the selection is without replacement. If we select 1
ball from 10 balls the chance is 1/10; if that ball is kept outside, the chance for the next ball is 1/9.
2) Simple random sampling- In simple random sampling we select the sample
randomly, every item has an equal chance of being selected and replacement occurs, so the
size of the population remains the same. E.g. if we select 1 ball from ten balls after replacement
the chance is again 1/10 (Tippett's random number tables/ lottery method).
3) Stratified sampling- In this type of sampling we divide heterogeneous data into
homogeneous groups and then select the sample randomly from each group (strata means layers,
e.g. we separate boys & girls and then select).
4) Systematic sampling– In this type of sampling we follow a system: the first unit of the
sample is chosen by us and the rest of the sample are automatically selected at equal gaps from
each other.
5) Cluster sampling- It is also known as area sampling. In this type of sampling we make
groups out of heterogeneous data and then select the groups randomly.
6) Multi stage sampling- In this type of sampling the sample is drawn in successive stages,
using the same or different methods of sampling at each stage.
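Some of these selection schemes can be sketched with Python's `random` module. This is an illustration only: the population of 100 numbered units, the split into "boys" and "girls" and all function names are assumptions, and a fixed seed is used merely to make the run repeatable.

```python
import random

def draw_without_replacement(population, size, rng):
    """Every unit has an equal chance; a selected unit cannot be drawn again."""
    return rng.sample(population, size)

def draw_with_replacement(population, size, rng):
    """Every unit has an equal chance on every draw; repeats are possible."""
    return rng.choices(population, k=size)

def systematic_sample(population, size, rng):
    """Random start, then every k-th unit at an equal gap."""
    step = len(population) // size
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(size)]

def stratified_sample(strata, per_stratum, rng):
    """Random selection carried out separately inside each homogeneous stratum."""
    return {name: rng.sample(units, per_stratum) for name, units in strata.items()}

rng = random.Random(42)
population = list(range(1, 101))                      # units numbered 1 to 100
without_repl = draw_without_replacement(population, 10, rng)
with_repl = draw_with_replacement(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"boys": list(range(1, 61)), "girls": list(range(61, 101))}
stratified = stratified_sample(strata, 5, rng)
```

In the systematic sample consecutive selected units are always 10 apart, whatever the random start, which is the "equal gap" described in point 4.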
NON RANDOM SAMPLING/ NON PROBABILITY SAMPLING
1) Purposive sampling- In this type of sampling the conclusion is
predetermined and then we select the sample accordingly.
2) Convenience sampling- It is also known as chunk sampling or incidental sampling.
In this type of sampling we get the sample in a convenient way from collections and
guidance.
3) Judgmental sampling- In this type of sampling we collect our sample on the basis of
experience, expert knowledge and our own judgement.
4) Quota sampling- In this type of sampling a quota is fixed for every enumerator and
they have to collect the sample within it; the method of selection is left to them and may be biased.
5) Snowball sampling- In this type of sampling we study rare cases such as AIDS
patients and then collect the rest of the sample according to their references.
COLLECTION OF DATA
The information collected from various sources which can be expressed in
quantitative form for a specific purpose is called data.
Types of Data–
1) Primary data – It is the original data which are collected for the first time for a
specific purpose e.g. population census data collected by government in a
country.
2) Secondary data – These are data which have already been collected by some other
agency and which have already been processed. They are available in published and
unpublished form. The source is important when collecting data. Documentary source of data
is also known as paper source.
MISCELLANEOUS POINTS
1. Maximum value of correlation is +1
2. Spearman's method is the method of calculating coefficient of
correlation by-- Charles Spearman.
3. Graph of variables having linear relation will be--Straight Line
4. The files required to maintain general ledger records include--Detail
posting file.
5. In which files, the records are organized in sequence and an index table
is used to speed up access to the records without requiring a search of
the entire file? Indexed Sequential file.
6. Correlation between income and demand is-Positive.
7. F distribution is coined by George W. Snedecor in honour of Sir
Ronald A. Fisher.
8. Chi square (non parametric test) concept given by Karl Pearson.
9. Concept of normal distribution is given by De Moivre, and other persons
involved in this are Laplace, Gauss and W. J. Youden.
10. Concept of regression is given by Sir Francis Galton in 1877.
11. Concept of t-distribution is given by W. S. Gosset.
12. Data which are collected for the first time are primary data.
13. Data is only quantitative.
14. Secondary data are second hand data which are in the form of
published & unpublished.
15. Category of secondary data are also called paper source.
16. Median and mode are positional average.
17. Arithmetic mean, geometric mean, harmonic mean & weighted average mean
are mathematical averages.
18. ∑ is known as capital sigma.
19. Properties of arithmetic mean-:
> Sum of the deviations of all observations of the given set from their
arithmetic mean is zero.
> Combined mean = (n1x̄1 + n2x̄2)/(n1 + n2)
> The sum of squares of the deviations of the given set of observations is
minimum when taken from the arithmetic mean.
> Mean is affected by both change of scale and change of origin.
> AM ≥ GM ≥ HM (AM- arithmetic mean, GM- geometric mean, HM-
harmonic mean), with equality only when all observations are equal.
> Mode = 3 median – 2 mean
> One dimensional (1D) diagrams are those which have only length.
Examples are line diagram, multiple bar diagram (compound or cluster
bar diagram), sub divided bar diagram (also called component bar
diagram), percentage bar diagram, deviation bar diagram.
20. Two dimensional (2D) diagrams are those in which both length and
breadth are present. Examples are histogram, area diagrams,
rectangles, squares, circles and pie diagram.
21. Ogive curve and frequency polygon are also 2D diagrams.
22. Ogive represents cumulative frequency, histogram represents frequency
distribution, and frequency polygon means many-angled diagram.
23. Three dimensional (3D) diagrams are those such as cubes, spheres, cylinders and
cuboids.
24. Numerical characteristics of a population are parameters and those of a sample
are sample statistics/ estimators.
25. Random sampling method is also called probability sampling
method.
26. Non random sampling method is also called non probability
sampling method.
27. Random sampling methods are-
>Simple random sampling method
>Stratified random sampling method
>Systematic sampling method
>Cluster sampling method
>Multistage sampling
28. Non random sampling methods are-
>Convenience sampling or chunk or incidental sampling
> Judgement sampling
> Purposive sampling
>Quota sampling
>Snow ball sampling
29. The larger the sample the more accurate will be the research.
30. Increasing the sample size decreases the sampling error.
31. Sample size N = 100
32. Principles of sampling-
>Law of statistical regularity
>Principle of inertia of large number
>Principles of persistence of small numbers
>Principle of validity
>Principle of optimization
33. Type 1 error is committed by rejecting a true null hypothesis; it is also known
as producer's error, and its probability is alpha (α) or the level of significance.
34. Type 2 error is committed by accepting a false null hypothesis; it is also called
consumer's error, and its probability is beta (β). 1 − β is the power of the test (power
function of the test or power curve).
35. Standard error ( SE) is standard deviation of the distribution of the
sample mean. S.E = σ/√n
36. In a normal distribution the curve is bell shaped &
symmetrical, i.e. mean = median = mode.
37. Total area under normal probability curve is 1 (0.5 + 0.5).
38. For the normal curve the coefficient of skewness is 0 (the curve is symmetrical) and the
coefficient of kurtosis is 3 (mesokurtic).
39. Range of distribution is −∞ to ∞ but practically it is 6σ.
40. Points of inflexion are at x = µ ± σ.
41. Coefficient of kurtosis: leptokurtic > 3, platykurtic < 3, mesokurtic = 3.
42. Area µ ± 1σ = 68.27%, µ ± 2σ = 95.45%, µ ± 3σ = 99.73%
43. Z (standard normal variate) = (x − µ)/σ
44. Concept of binomial distribution is given by James (Jacob) Bernoulli.
45. Concept of Poisson distribution is given by Simeon Denis Poisson. In this
distribution the mean and variance are equal (both = m).
46. Standard deviation is also known as root Mean Square Deviation.
47. Standard deviation is affected by change of scale & independent of
change of origin.
48. Positively skewed: mean > median > mode.
49. Negatively skewed: mode > median > mean.
50. Symmetrical (balanced) pattern: mean = median = mode.
51. Skewness means lack of symmetry or asymmetrical distribution.
52. Concept of co efficient of skewness given by Karl Pearson.
53. Critical values of z for confidence intervals: 95% = 1.96, 99% = 2.576 (approximately 2.58).
54. Z test is also called standard normal variable test, standard normal
deviate test, approximation test, large sample test.
55. Conditions to be applied in z test are-
n > 30.
n ≤ 30 (only if the standard deviation of the population is given).
56. One tailed test is also known as directional test; it may be right tailed or left tailed. F
test and chi square test are one tailed tests.
57. Two tailed test is also called non-directional test, as the direction is not mentioned.
58. Conditions to apply t test-
n ≤ 30 and the population standard deviation is not known (the standard deviation of the sample is given).
To check the difference in means.
59. T test is also called t distribution & Student's t test, exact test, small
sample test.
60. Conditions to accept & reject hypothesis-:
1) Table value> calculated value= accept
2) Table value< calculated value= reject
61. Chi square is a non- parametric test
62. Chi square lies from 0 to ∞
63. Conditions to apply chi square test:-
>Population mean & sample mean are not given in the question.
>Degrees of freedom (n − 1) are used, as degrees of freedom start from the t test.
64. Chi square test is also known as, or used for :-
>Goodness of fit test
>Contingency table
>Quantitative variables
>Co efficient of association
65. Degrees of freedom for a contingency table = (rows − 1)(columns − 1)
66. F test is the test in which the value of the numerator is always greater than the
denominator (the larger variance is placed in the numerator).
67. Conditions to apply F test:-
>Population mean, sample mean & standard deviation are not given in the
question.
>It will talk about two means.
>Its value lies between 0 and ∞
68. The coefficient of correlation is given by Karl Pearson.
69. Correlation is denoted by r and its value lies between -1 and 1.
70. Rank correlation is given by Charles Edward Spearman.
71. Spearman denotes rank correlation by ρ (rho).
72. ρ = 1 − 6∑d²/[n(n² − 1)]
73. If ranks are tied the formula will be ρ = 1 − 6[∑d² + m(m² − 1)/12 + ...]/[n(n² − 1)]
74. Karl Pearson correlation formula is r = cov(x, y)/(σx · σy)
75. Correlation is independent of both change of scale & origin.
76. Regression is affected by change of scale & independent of change
of origin.
77. r² is the coefficient of determination.
78. The value of the coefficient of determination lies between 0 and 1.
79. r² = bxy × byx
80. Regression shows a causal effect, i.e. cause & effect relationship.
81. Parametric test:- z test, t test, f test.
82. Non parametric tests or distribution free tests: - sign test, median test,
Mann-Whitney U test, run test, K-S (Kolmogorov-Smirnov) test, and chi square test.
83. Paired t test is used in management training, special coachings,
productivity of crop before & after.
84. Paired t test is also known as bivariate normal distribution.
85. Nominal scale=mode used, ordinal scale= mode & median, interval
& ratio scale = mean, median, mode.
86. Minimum value of correlation is −1
87. Files held on a storage device are identified by a special block of
data held as the first block on the file.
This block is called the---file label.
88. Graph of variables having non-linear relation will be--curved
89. Horizontal curve represents the value of coefficient of correlation to
be--zero.
90. In case there is no relation between two variables, value of
coefficient of correlation will be--0
91. Karl Pearson's coefficient of correlation method of measuring
correlation is--Mathematical
92. Correlation between price and demand is-Negative.
93. The collection of integrated and related master files is known as---
Database.
94. The difference between the actual value of the figure and its
approximated value is called statistical error.
95. A frequency distribution obtained by the simultaneous classification
of data according to two characteristics is known as a bivariate frequency
distribution.
96. Index Numbers are specialized averages designed to measure the
change in a group of related variables over a period of time. Index
numbers have today become one of the most widely used statistical
devices and there is hardly any field where they are not used.