
Basic Statistics Overview

Statistics is the science of collecting, analyzing, interpreting, and presenting data, divided into descriptive and inferential statistics. Key concepts include populations and samples, types of data (qualitative and quantitative), scales of measurement, and measures of central tendency and dispersion. Additional topics covered are moments, skewness, kurtosis, and the theory of attributes, which help in understanding data distributions and classifications.

Uploaded by

dhritimedigeshi


STATISTICS

Introduction to statistics
Statistics is the science concerned with developing and studying methods for collecting,
analyzing, interpreting and presenting empirical data. It is the art of learning from data.
1. Descriptive statistics: The part of statistics concerned with the description and
summarization of data is called descriptive statistics. It includes methods like
classification, tabulation, and measures of central tendency (such as averages) and
variability (such as the range and standard deviation).
2. Inferential statistics: The part of statistics concerned with the drawing of conclusions
from data is called inferential statistics.
Population: a population is the entire set of individuals, items, or data points that share
certain characteristics and are the subject of a study.
• In a study about college students' study habits, the population could be all college students.
• In a medical study on the effects of a new drug, the population could be all individuals
with a specific health condition.
Sample: A sample is a subgroup or subset of the population that is studied in detail. It is a
smaller group of members selected to represent the population, so it should ideally reflect the
characteristics of the entire population.

PYQ: Give examples of:

i. Finite population and its sample
ii. Infinite population and its sample
Ans: A finite population is a population whose members are all known and can be
counted, e.g. all the employees of a company or all the students in a school.
Possible samples: a group of 100 students drawn across all grades; a random selection of 50 employees.
An infinite population is a population that is either too large to count or cannot be measured
precisely.
Examples of theoretically infinite populations: all possible throws of a die, all potential
customers of a product.
Possible samples: rolling the die 100 times and recording the outcomes; a survey of 1000
randomly chosen individuals.

Data: data refers to individual pieces of factual information collected through observation,
measurement, or experimentation.
1. Qualitative Data: Non-numerical information that describes qualities or
characteristics. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g.,
collie, shepherd, terrier) would be examples of qualitative or categorical variables.
2. Quantitative Data: Numerical information that represents quantities or amounts, like
the population of a city or the amount of grain produced per year on a
farm. Quantitative variables can be further classified as discrete or continuous. If a
variable can take on any value between its minimum value and its maximum value, it
is called a continuous variable; otherwise, it is called a discrete variable.
Scales of Measurement
1. Nominal scale: The nominal scale is the most basic level of measurement, used for
labeling or categorizing data without implying any quantitative value or order.
Examples: Gender (male, female), eye color (blue, green, brown), types of fruits
(apple, banana, orange).
2. Ordinal scale: The ordinal scale categorizes data into ordered or ranked categories, where the order
matters but the intervals between categories are not equal or defined. It is generally
used to measure attitudes such as the level of
liking, satisfaction, preference, etc.
Examples: Class rankings (1st, 2nd, 3rd), levels of satisfaction (satisfied, neutral,
dissatisfied).
Arithmetic operations are not meaningful on the nominal and ordinal scales.
3. Interval scale: The interval scale represents ordered data with equal distances
(intervals) between values, but it lacks a true zero point. The difference between two
values is meaningful. Values can be added and subtracted, but
not meaningfully multiplied or divided.
Examples: Temperature in Celsius or Fahrenheit (where 0 doesn’t mean “no
temperature”).
4. Ratio scale: The ratio scale is the most advanced scale of measurement, including
ordered data with equal intervals and an absolute zero. All mathematical operations
(addition, subtraction, multiplication, division) are meaningful. Zero represents the
absence of the property measured. Examples: Height, weight, age, distance, income.
For instance, 0 kg means no weight, and a weight of 10 kg is twice as much as 5 kg.
Frequency: The number of times a value occurs (repeats) in a data set.
Frequency distribution: Representation of frequencies in a tabular form wherein
data is organised into categories or intervals, along with their respective frequencies.
Relative frequency: The proportion of times a particular value occurs in relation
to the total number of observations.
$$\text{Relative frequency} = \frac{\text{frequency of a value}}{\text{total number of observations}}$$
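As a small illustration of frequency and relative frequency, here is a minimal Python sketch. The eye-colour observations are made-up data, not taken from these notes.

```python
from collections import Counter

# Hypothetical data: eye colours recorded for ten people.
observations = ["blue", "brown", "brown", "green", "blue",
                "brown", "brown", "green", "brown", "blue"]

frequency = Counter(observations)   # maps each value to how often it occurs
total = len(observations)           # total number of observations

# Relative frequency = frequency of a value / total number of observations
relative_frequency = {value: count / total for value, count in frequency.items()}

for value, count in frequency.most_common():
    print(f"{value:<6} frequency={count}  relative frequency={relative_frequency[value]:.2f}")
```

The relative frequencies of all values always add up to 1.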
Class interval: A class interval is a range of values used to group continuous or large sets of
data into smaller, more manageable categories or "classes".
A class interval may be:
(a) exclusive – wherein the upper limit of the class interval is not included in the class
(b) inclusive – wherein both the upper and the lower limits are included in the class
Data that is arranged in class intervals is called grouped data.
Grouped data can be visualised by a histogram, frequency polygon or ogive curve.

Cumulative frequency: running total of frequencies up to a certain class interval in a dataset.

Ogive curve - An ogive is a type of line graph that represents cumulative frequencies. It can
be used to display either "less than" or "more than" cumulative frequency distributions.
• For a "less than" ogive, plot the cumulative frequency against the upper boundary of each
class interval and connect the points with a smooth curve.
• For a "more than" ogive, plot the cumulative frequency against the lower boundary of each
class interval and connect the points.
The x-coordinate of the point where the "less than" and "more than" ogives intersect is the
median.
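The points plotted for the two ogives can be computed as running totals. The sketch below uses made-up class boundaries and frequencies, and it only builds the points; it does not draw the curves.

```python
from itertools import accumulate

# Hypothetical grouped data: class boundaries 0-10, 10-20, ... and class frequencies.
boundaries = [0, 10, 20, 30, 40, 50]
frequencies = [5, 8, 12, 9, 6]
total = sum(frequencies)  # N = 40

# "Less than" ogive: cumulative frequency paired with each upper boundary.
less_than = list(zip(boundaries[1:], accumulate(frequencies)))

# "More than" ogive: N minus the frequency lying below each lower boundary.
below = [0] + list(accumulate(frequencies))[:-1]
more_than = [(b, total - c) for b, c in zip(boundaries[:-1], below)]

print(less_than)   # [(10, 5), (20, 13), (30, 25), (40, 34), (50, 40)]
print(more_than)   # [(0, 40), (10, 35), (20, 27), (30, 15), (40, 6)]
```

Both sets of points pass through a cumulative frequency of N/2 = 20 inside the class 20-30, which is where the two curves cross and where the median lies.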

PYQ: Write a short note on histogram. Which average can be obtained from it? Explain
the method of finding out this average from a histogram.
Ans. Histogram – A histogram is like a bar graph in which adjacent bars touch, to emphasize the
continuous nature of the data. It is a graphical representation that organizes a group of
data points into specified ranges and is used to display the frequency distribution of a dataset.
The x-axis represents the class intervals and the y-axis represents the frequency of occurrences
within each interval.
The mode can be obtained from a histogram. It represents the value or range of values that
occurs most frequently in a dataset. In a histogram, it lies in the tallest bar, which
represents the class interval with the highest frequency (the modal class). To locate it, join
the top-right corner of the tallest bar to the top-right corner of the preceding bar, and the
top-left corner of the tallest bar to the top-left corner of the following bar; the x-coordinate
of the point where these two lines cross is the mode.
Frequency polygon – It is created by joining the mid-points of the tops of the bars of a histogram.

Measures of Central Tendency

A measure of central tendency is a statistical value that represents the center or typical value
of a dataset. It provides a single value that summarizes the data by indicating where most
values in the dataset cluster. In statistics, the three most common measures of central
tendency are mean, median and mode.
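1. Mean – It is the arithmetic average. The sample mean is defined as the sum of all values in a
dataset divided by the number of values:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean is affected by extreme scores. Sometimes the mean is a value which is not present in the
series.
For continuous (grouped) data, with $x_i$ the mid-value of the ith class, $f_i$ its frequency and $N = \sum f_i$:
$$\bar{x} = \frac{1}{N}\sum f_i x_i$$
Shortcut method (Assumed mean method)
$$\bar{x} = A + \frac{1}{N}\sum f_i d_i$$
where $d_i = x_i - A$ and A is any arbitrary value (the assumed mean), usually one of the x values.

Properties of arithmetic mean

1. The algebraic sum of the deviations of a set of values from their mean is zero. The
deviations are the differences between the data values and the sample mean. The
value of the ith deviation is $x_i - \bar{x}$.
2. The sum of the squares of the deviations of a set of values from a value A is minimum when $A = \bar{x}$.
3. Mean of a composite series (combined mean): if two series of sizes $n_1$ and $n_2$ have means $\bar{x}_1$ and $\bar{x}_2$, the mean of the combined series is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$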
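The definition, the assumed mean shortcut, and the first and third properties can be checked numerically. This is a minimal sketch on a made-up sample; the assumed mean A = 15 is an arbitrary choice.

```python
from statistics import mean

# Hypothetical sample.
x = [4, 8, 15, 16, 23, 42]
x_bar = mean(x)  # direct definition: sum of values / number of values

# Assumed mean (shortcut) method: x_bar = A + (sum of deviations from A) / n.
A = 15
shortcut = A + sum(xi - A for xi in x) / len(x)

# Property 1: the deviations from the mean sum to zero.
deviation_sum = sum(xi - x_bar for xi in x)

# Property 3: combined mean of two series from their sizes and means.
first, second = x[:2], x[2:]
combined = (len(first) * mean(first) + len(second) * mean(second)) / len(x)

print(x_bar, shortcut, deviation_sum, combined)
```

All three routes give the same mean (18 for this sample), and the deviations from it sum to zero.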

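Geometric mean – It is the nth root of the product of the n values in a data set (defined for positive values):
$$G = (x_1 x_2 \cdots x_n)^{1/n}$$
Harmonic mean – It is the reciprocal of the arithmetic mean of the reciprocals of the values
in a data set:
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
The harmonic mean is a measure of central tendency for data expressed as rates
such as kilometres per hour, tonnes per day, kilometres per litre, etc.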
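To see why the harmonic mean suits rates, consider a made-up example: a car covers two equal distances, one at 40 km/h and one at 60 km/h. The sketch uses `geometric_mean` and `harmonic_mean` from Python's standard `statistics` module (Python 3.8 or later).

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical speeds (km/h) over two equal distances.
speeds = [40, 60]

am = mean(speeds)            # arithmetic mean: overstates the average speed
hm = harmonic_mean(speeds)   # harmonic mean: total distance / total time
gm = geometric_mean(speeds)  # geometric mean: square root of 40 * 60

print(am, hm, round(gm, 2))
```

For distance d on each leg the total time is d/40 + d/60 = d/24, so the true average speed is 2d / (d/24) = 48 km/h, which is the harmonic mean. For positive values the three means always satisfy H ≤ G ≤ A.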

Median - It is the value that divides the data set into two equal halves.
For discrete data: Order the data values from smallest to largest. If the number of data values
is odd, then the sample median is the middle value in the ordered list; if it is even, then the
sample median is the average of the two middle values.
For continuous data:
$$M_d = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$
where l = lower limit of the median class
h = magnitude (width) of the median class
f = frequency of the median class
C = cumulative frequency of the class preceding the median class
N = total frequency
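The grouped-data median formula, M_d = l + (h/f)(N/2 − C), can be applied step by step as below. This is a sketch that assumes equal class widths; the function name `grouped_median` and the data are illustrative, not from the notes.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of grouped data with equal class widths: l + (h / f) * (N / 2 - C)."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("total frequency must be positive")
    half = total / 2
    cumulative = 0  # C: cumulative frequency before the current class
    for l, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:  # first class whose cumulative frequency reaches N/2
            return l + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("lower_limits and frequencies must have the same length")


# Hypothetical classes 0-10, 10-20, ..., 40-50 with their frequencies.
median = grouped_median([0, 10, 20, 30, 40], 10, [5, 8, 12, 9, 6])
print(round(median, 2))
```

Here N = 40, so N/2 = 20 falls in the class 20-30 (l = 20, f = 12, C = 13), giving 20 + (10/12)(20 − 13) ≈ 25.83.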

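Mode - defined as the value that occurs most frequently in the data.
For continuous data:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l = lower limit of the modal class, h = width of the modal class, $f_1$ = frequency of the
modal class, and $f_0$, $f_2$ = frequencies of the classes preceding and following it.
For a moderately skewed distribution, the empirical relation is:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$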

Measures of Dispersion
A measure of dispersion (also called a measure of variability or spread) is a statistical value
that describes how much the data values in a dataset vary or spread out from the central value
(mean, median, or mode). There are two basic kinds of measures of dispersion: (i) absolute
measures and (ii) relative measures. Absolute measures express the variability of a given
data set in the same unit as the data, while relative measures are unit-free and are used
to compare the variability of two or more sets of observations.

PYQ: Explain the term ‘dispersion’

Ans. Dispersion refers to the degree to which data points in a dataset are spread out or
scattered around a central value. It measures the variability or diversity within the dataset. A
smaller dispersion indicates that the data points are closely clustered around the central value,
while a larger dispersion suggests they are spread out.

1. Range - The range is the simplest measure of dispersion, calculated as the difference
between the highest and lowest values in the dataset.
Merit: simplicity of calculation.
Demerit: it uses only the maximum and the minimum values of the variable in the
series and gives no importance to the other observations.
2. Quartile deviation
Quartiles are values that divide a complete set of ordered observations into
four equal parts.
The quartile deviation is also known as the semi-interquartile range and is given
by:
$$QD = \frac{Q_3 - Q_1}{2}$$
For ungrouped data:
Q1 = value of the (n+1)/4 th observation
Q3 = value of the 3(n+1)/4 th observation

For grouped data:
$$Q_1 = L_1 + \frac{N/4 - C_1}{f_1} \times h$$
$$Q_3 = L_3 + \frac{3N/4 - C_3}{f_3} \times h$$
where, for the class containing the quartile, L = lower limit, f = frequency, C = cumulative
frequency of the preceding class and h = class width.
Quartile deviation is a better measure than the range as it makes use of the middle 50% of the data.
But since it ignores the other 50% of the data, it cannot be regarded as a reliable measure.
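For ungrouped data, the positional rule (Q1 at the (n+1)/4 th observation, Q3 at the 3(n+1)/4 th) can be sketched as follows. Linear interpolation for fractional positions is an assumption here (a common convention, not stated in the notes), and the helper name `value_at_position` is illustrative.

```python
def value_at_position(sorted_values, position):
    """Value of the position-th observation (1-based), interpolating between neighbours."""
    position = min(max(position, 1), len(sorted_values))  # keep the position inside the data
    lower = int(position)             # whole part of the position
    fraction = position - lower       # fractional part, used for interpolation
    if fraction == 0:
        return sorted_values[lower - 1]
    return sorted_values[lower - 1] + fraction * (sorted_values[lower] - sorted_values[lower - 1])


data = sorted([12, 7, 3, 9, 15, 21, 18])        # hypothetical observations, n = 7
n = len(data)
q1 = value_at_position(data, (n + 1) / 4)       # 2nd observation
q3 = value_at_position(data, 3 * (n + 1) / 4)   # 6th observation
quartile_deviation = (q3 - q1) / 2

print(q1, q3, quartile_deviation)
```

With the ordered data 3, 7, 9, 12, 15, 18, 21 this gives Q1 = 7, Q3 = 18 and QD = 5.5.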

3. Mean deviation
The mean deviation is the arithmetic mean of the absolute deviations of the observations from
any chosen value A, viz. the mean, median, mode, etc.:
$$MD = \frac{1}{N}\sum f_i\,|x_i - A|$$
Since mean deviation is based on all the observations, it is a better measure of dispersion than
range or quartile deviation. But the step of ignoring the signs of the deviations ($x_i - A$) is
artificial and renders it unsuitable for further mathematical treatment.

MEAN DEVIATION                            | STANDARD DEVIATION
Algebraic signs are ignored               | Signs are taken into account
Can be computed from the mean or median   | Always computed from the arithmetic mean

4. Standard deviation
For grouped data:
$$\sigma = \sqrt{\frac{1}{N}\sum f_i X_i^2}$$
For ungrouped data:
$$\sigma = \sqrt{\frac{1}{n}\sum X_i^2}$$
* Note: $X_i = (x_i - \bar{x})$

Standard deviation is also known as root mean square deviation because it is the
square root of the mean of the squared deviations from the arithmetic mean. The standard
deviation measures the absolute dispersion or variability of a distribution.
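Mathematical properties of standard deviation

Standard deviation of combined series: for two series of sizes $n_1$, $n_2$ with means
$\bar{x}_1$, $\bar{x}_2$ and standard deviations $\sigma_1$, $\sigma_2$,
$$\sigma = \sqrt{\frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}}$$
where $d_1 = (\bar{x}_1 - \bar{x})$
and $d_2 = (\bar{x}_2 - \bar{x})$,
and $\bar{x}$ is the combined mean of the series.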
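The standard textbook formula for the combined standard deviation is σ = sqrt[(n1(σ1² + d1²) + n2(σ2² + d2²)) / (n1 + n2)], with d1 and d2 the deviations of the two series means from the combined mean. The sketch below checks it against a direct calculation on the pooled data; both series are made up, and `pstdev` is the population standard deviation (divisor n) from Python's `statistics` module.

```python
from math import sqrt
from statistics import mean, pstdev

# Two hypothetical series.
first = [2, 4, 4, 4, 5, 5, 7, 9]
second = [1, 3, 5, 7]

n1, n2 = len(first), len(second)
m1, m2 = mean(first), mean(second)
s1, s2 = pstdev(first), pstdev(second)   # standard deviations with divisor n

combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
d1, d2 = m1 - combined_mean, m2 - combined_mean

combined_sd = sqrt((n1 * (s1**2 + d1**2) + n2 * (s2**2 + d2**2)) / (n1 + n2))
direct_sd = pstdev(first + second)       # same quantity computed from the pooled data

print(round(combined_sd, 4), round(direct_sd, 4))
```

The two values agree, which is the point of the formula: the combined standard deviation can be obtained from the sizes, means and standard deviations alone, without the raw data.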

Variance
It is defined as the square of the standard deviation:
$$\text{Var} = \sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2$$
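For grouped data, the variance σ² = (1/N) Σ f_i (x_i − x̄)² and the standard deviation can be computed from class mid-values and frequencies. The frequency table below is made up for illustration.

```python
from math import sqrt

# Hypothetical grouped data: class mid-values x_i and frequencies f_i.
midpoints = [5, 15, 25, 35, 45]
frequencies = [5, 8, 12, 9, 6]

N = sum(frequencies)
x_bar = sum(f * x for f, x in zip(frequencies, midpoints)) / N

# Var = (1/N) * sum of f_i * (x_i - x_bar)^2; the standard deviation is its square root.
variance = sum(f * (x - x_bar) ** 2 for f, x in zip(frequencies, midpoints)) / N
standard_deviation = sqrt(variance)

print(x_bar, variance, round(standard_deviation, 3))
```

For this table N = 40, the mean is 25.75 and the variance is 151.9375.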

MOMENTS
The arithmetic mean of the various powers of the deviations of the items in a distribution
from the arithmetic mean of that distribution is called the moments of the distribution. The
moments about the mean are called central moments and are denoted by μ:
$$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$$
In a symmetrical distribution all odd moments, i.e. μ1, μ3 etc., would always be zero.

Moments about arbitrary origin

Moments about an arbitrary origin (A) are also called ‘raw moments’ and are denoted by μ':
$$\mu'_r = \frac{1}{N}\sum f_i (x_i - A)^r$$

For the sake of simplicity, moments are first calculated about an arbitrary origin. If we want
to obtain moments about the mean, we can do so with the help of the following relationships:
$$\mu_1 = 0$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2$$
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$
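The standard relationships between raw and central moments are μ2 = μ'2 − μ'1², μ3 = μ'3 − 3μ'2μ'1 + 2μ'1³ and μ4 = μ'4 − 4μ'3μ'1 + 6μ'2μ'1² − 3μ'1⁴. The sketch below checks them on made-up ungrouped data, with A = 5 as an arbitrary origin; the helper name `moment` is illustrative.

```python
def moment(values, origin, r):
    """r-th moment of the values about the given origin."""
    return sum((v - origin) ** r for v in values) / len(values)


data = [2, 3, 7, 8, 10]   # hypothetical observations
A = 5                     # arbitrary origin

# Raw moments about A.
m1, m2, m3, m4 = (moment(data, A, r) for r in range(1, 5))

# Central moments obtained from the raw moments.
mu2 = m2 - m1**2
mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4

# Central moments computed directly about the mean, for comparison.
x_bar = sum(data) / len(data)
direct = [moment(data, x_bar, r) for r in (2, 3, 4)]

print([mu2, mu3, mu4])
print(direct)
```

Both routes give the same central moments whatever origin A is chosen.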

SKEWNESS
It refers to lack of symmetry. When a distribution is asymmetrical, it is called a
skewed distribution.
In a symmetrical distribution: mean=median=mode – Bell shaped curve
[Figure: negatively skewed curve, normal curve, positively skewed curve]
Measures of Skewness
1. Absolute measures:
Sk = mean – median
Sk = mean – mode
Sk = Q3 + Q1 – 2Median (when skewness is based on quartiles, absolute skewness is
given by this formula)
2. Relative measures of skewness
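a. Karl Pearson’s coefficient
b. Bowley’s coefficient
c. Kelly’s coefficient
d. Measure of skewness based on moments
These measures of skewness are mainly used for making comparisons between two or more
distributions.
a. Karl Pearson’s coefficient of skewness:
$$Sk_P = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
or, when the mode is ill-defined, using Mode = 3 Median − 2 Mean,
$$Sk_P = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$

b. Bowley’s coefficient of skewness:
$$Sk_B = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$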

PYQ: Define Bowley’s coefficient of skewness. What are its limits?

Ans. Bowley’s measure of skewness is based on quartiles. In a symmetrical
distribution the first and third quartiles are equidistant from the median, i.e. the
third quartile is the same distance above the median as the
first quartile is below it. If the distribution is positively skewed, the top 25% of values
will tend to be farther from the median than the bottom 25%, i.e. Q3 will be farther
from the median than Q1 is, and the reverse holds for negative skewness.
Bowley’s coefficient is limited to values between -1 and +1.
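Bowley's coefficient is commonly written as (Q3 + Q1 − 2 Median) / (Q3 − Q1). The sketch below computes it for a made-up, right-skewed sample using `statistics.quantiles`, whose default "exclusive" method places the quartiles at the (n+1)/4, 2(n+1)/4 and 3(n+1)/4 positions.

```python
from statistics import quantiles

# Hypothetical sample with a long right tail.
data = [2, 3, 4, 5, 6, 8, 12, 20, 35]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, median, q3 = quantiles(data, n=4)

bowley = (q3 + q1 - 2 * median) / (q3 - q1)

print(q1, median, q3, round(bowley, 2))
```

Here Q3 is much farther from the median than Q1, so the coefficient is positive (0.6), and it stays within the limits −1 and +1.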

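c. Kelly’s coefficient of skewness:
$$Sk_K = \frac{D_9 + D_1 - 2\,\text{Median}}{D_9 - D_1} = \frac{P_{90} + P_{10} - 2P_{50}}{P_{90} - P_{10}}$$
(D = decile)
(P = percentile)
d. Based on moments:
$$\gamma_1 = \pm\sqrt{\beta_1}$$
where
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
and the sign of $\gamma_1$ is the sign of $\mu_3$.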

KURTOSIS
In Greek, kurtosis means ‘bulginess’. In statistics, it refers to the degree of flatness or
peakedness of a frequency curve, usually taken relative to a normal
distribution.
If a curve is more peaked than the normal curve = leptokurtic
If a curve is more flat-topped than the normal curve = platykurtic
Normal curve = mesokurtic
Measures of kurtosis
$$\beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \gamma_2 = \beta_2 - 3$$
if $\beta_2 = 3$, $\gamma_2 = 0$, the curve is normal, i.e. mesokurtic
if $\beta_2 > 3$, $\gamma_2 > 0$, the curve is leptokurtic
if $\beta_2 < 3$, $\gamma_2 < 0$, the curve is platykurtic
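The usual moment-based measures are β2 = μ4 / μ2² and γ2 = β2 − 3. The sketch below computes them for made-up data and applies the three-way classification; the helper names and the tolerance used to decide "equal to 3" are illustrative choices.

```python
def central_moment(values, r):
    """r-th moment of the values about their arithmetic mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** r for v in values) / len(values)


def classify(gamma2, tolerance=1e-9):
    """Name the shape of the curve from the sign of gamma2."""
    if abs(gamma2) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if gamma2 > 0 else "platykurtic"


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # hypothetical observations

mu2 = central_moment(data, 2)
mu4 = central_moment(data, 4)
beta2 = mu4 / mu2**2
gamma2 = beta2 - 3

print(round(beta2, 2), round(gamma2, 2), classify(gamma2))
```

For this sample β2 = 2.25, so γ2 is negative and the distribution is flatter than the normal curve (platykurtic).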

THEORY OF ATTRIBUTES
A qualitative characteristic of an individual is called an attribute. Attributes are denoted by capital letters A,
B, C and their absence by the Greek letters α, β, γ.
When the variable x is not a measurable numeric quantity, we use the theory of attributes.

Classification:
1. If only one attribute is studied, the population is divided into two classes according to
its presence or absence; such classification is termed dichotomous
classification.
2. If a class is divided into more than two classes, such classification is termed
manifold classification.

Class frequency: The number of observations within a particular class is known as the
frequency of that class. It is denoted by enclosing the class symbol in brackets: the frequency of class A is denoted
by (A), and similarly the frequencies of AB and α by (AB) and (α) respectively.

If the number of attributes is n, the total number of class frequencies will be $3^n$.

Order of class: A class specified by n attributes is called a class of the nth order, i.e. the order
is the number of attributes appearing in the class symbol. A, B, α, β etc. are first
order classes. AB,
αβ are second order classes. ABC, αβγ are third order classes.

The frequencies of the classes of highest order are called as ultimate class frequencies.
The frequency of a lower order class can always be expressed in terms of the higher order
class frequencies.
N = (A) + (α) = (B) + (β)
(A) = (AB) + (Aβ)
(α) = (αB) + (αβ)
(B) = (AB) + (αB)
(β) = (Aβ) + (αβ)
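The relations above can be followed with numbers. In this sketch the four ultimate class frequencies are made up, and the lowercase letters `a` and `b` stand in for the Greek α and β.

```python
# Hypothetical ultimate (second order) class frequencies for two attributes A and B.
# "a" and "b" stand for alpha and beta, i.e. the absence of A and of B.
ultimate = {"AB": 30, "Ab": 20, "aB": 10, "ab": 40}

# First order class frequencies expressed through the ultimate class frequencies.
A = ultimate["AB"] + ultimate["Ab"]   # (A) = (AB) + (A beta)
a = ultimate["aB"] + ultimate["ab"]   # (alpha) = (alpha B) + (alpha beta)
B = ultimate["AB"] + ultimate["aB"]   # (B) = (AB) + (alpha B)
b = ultimate["Ab"] + ultimate["ab"]   # (beta) = (A beta) + (alpha beta)

# The total N can be reached through either attribute.
N = A + a

print(N, A, a, B, b)
```

With two attributes there are 3² = 9 class frequencies in all: N, the four of first order and the four ultimate ones.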

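Consistency of attributes
To compare the consistency of two data series A and B, the coefficient of variation is used:
$$\text{Coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100$$
A - $CV_A$
B - $CV_B$
If $CV_A > CV_B$, A is less consistent
If $CV_A < CV_B$, A is more consistent
For class frequencies, consistency requires that no class frequency is negative, which means for example that
$$(AB) \not> (A)$$

Theorem: A necessary and sufficient condition for the consistency of data is that no ultimate
class frequency is negative.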
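The theorem suggests a direct check: derive the ultimate class frequencies from the given totals and confirm that none is negative. This is a sketch for two attributes; the function names are illustrative, `a` and `b` stand for α and β, and (αβ) = N − (A) − (B) + (AB) follows from the class-frequency relations for two attributes.

```python
def ultimate_frequencies(N, A, B, AB):
    """Ultimate class frequencies for two attributes from N, (A), (B) and (AB)."""
    return {
        "AB": AB,
        "Ab": A - AB,          # (A beta) = (A) - (AB)
        "aB": B - AB,          # (alpha B) = (B) - (AB)
        "ab": N - A - B + AB,  # (alpha beta) = N - (A) - (B) + (AB)
    }


def is_consistent(N, A, B, AB):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(f >= 0 for f in ultimate_frequencies(N, A, B, AB).values())


print(is_consistent(100, 50, 40, 30))   # consistent set of frequencies
print(is_consistent(100, 50, 40, 60))   # inconsistent: (AB) exceeds both (A) and (B)
```

In the second call (AB) = 60 is larger than (A) = 50, so (Aβ) would be −10 and the data cannot be consistent.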

Independence of Attributes
Two attributes are said to be independent if the presence or absence of one attribute does not
affect the presence or absence of the other.
Criterion 1: The proportion of A's is equal in B and β, and the proportion of B's is equal in A and α.
$$\frac{(AB)}{(B)} = \frac{(A\beta)}{(\beta)} \quad\text{and}\quad \frac{(AB)}{(A)} = \frac{(\alpha B)}{(\alpha)}$$
Criterion 2: If A and B are independent, the proportion of AB's in the total population N is equal
to the product of the proportions of the individual attributes.
$$\frac{(AB)}{N} = \frac{(A)}{N}\cdot\frac{(B)}{N} \;\Rightarrow\; (AB) = \frac{(A)(B)}{N}$$

Criterion 3:
$$(AB)(\alpha\beta) = (A\beta)(\alpha B)$$

Association of attributes
Two attributes A and B are said to be associated if they are not independent, i.e. they are
related in some way or the other.

If $(AB) > \frac{(A)(B)}{N}$ then they are positively associated
If $(AB) < \frac{(A)(B)}{N}$ then they are negatively associated
$$\delta = (AB) - \frac{(A)(B)}{N}$$
if $\delta > 0$, A and B are positively associated
if $\delta < 0$, A and B are negatively associated
Yule’s coefficient of association of attributes:
