Unit 1 Stats

The document provides an overview of descriptive and inferential statistics, detailing methods for summarizing and analyzing data. It covers key concepts such as mean, median, mode, percentiles, and measures of dispersion, along with examples and formulas for calculation. Additionally, it discusses data visualization techniques and the importance of understanding data distribution, including skewness and kurtosis.

Uploaded by

chandra.prakash

Unit - I
Descriptive & Inferential Statistics
Descriptive Statistics: It organizes, analyses and presents the
data in a meaningful way.
Inferential Statistics: It compares, tests and predicts the data.
Descriptive statistics summarize your current dataset, and inferential
statistics use sample data to make generalizations about a larger
population.

Mean
The mean provides a measure of central location for the data. If the
data are for a sample, the mean is denoted by $\bar{x}$; if the data are for a
population, the mean is denoted by $\mu$.
For a sample with n observations $\{x_1, x_2, \ldots, x_n\}$, the sample mean is
given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For a population with N observations, the population mean is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example: For the data set 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12,
mean = 154/11 = 14.
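The example above can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_mean` is only illustrative.

```python
# Sample mean: sum of the observations divided by the number of observations.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]


def sample_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(sum(data), len(data))  # 154 11
print(sample_mean(data))     # 14.0
```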
Weighted Mean
The weighted mean is an average computed by giving
different weights to some of the individual values. If all the
weights are equal, then the weighted mean is the same
as the arithmetic mean.
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}, \quad w_i = \text{weight for observation } i$$
Question 1: Suppose a marketing firm surveys 1,000 households to
determine the average number of TVs each household owns. The data
show many households with two or three TVs and a smaller number
with one or four. Every household in the sample has at least one TV and
no household has more than four. Find the mean number of TVs per
household.
Number of TVs per Household    Number of Households
1                              73
2                              378
3                              459
4                              90
Question 2: Consider the following purchases of a raw
material over the past three months. Calculate the
mean cost per pound of the raw material.
Purchase    Cost per Pound ($)    Number of Pounds
1           3.00                  1200
2           3.40                  500
3           2.80                  2750
4           2.90                  1000
5           3.25                  800
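Both questions apply the weighted mean formula: the number of households (or the number of pounds) is the weight on each value. The sketch below is one way to check an answer; the helper name `weighted_mean` is illustrative, not part of the original notes.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Question 1: TVs per household, weighted by the number of households.
tvs = [1, 2, 3, 4]
households = [73, 378, 459, 90]
mean_tvs = weighted_mean(tvs, households)

# Question 2: cost per pound, weighted by the number of pounds purchased.
cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]
pounds = [1200, 500, 2750, 1000, 800]
mean_cost = weighted_mean(cost_per_pound, pounds)

print(round(mean_tvs, 3))   # 2.566
print(round(mean_cost, 2))  # 2.96
```

In Question 2, the unweighted average of the five prices would be 3.07, which overstates the cost because the largest purchase was made at the lowest price.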
Although the mean is one of the most frequently used measures of central
tendency, we should be careful about making decisions based on the mean
value of the data alone.
Example (employee salaries): Mean = $95,000
At first glance, this seems like a reasonably high salary, suggesting the
employees are well compensated. However, this mean is heavily
influenced by the exceptionally high salary of Employee J ($500,000),
which skews the average.
Sorting the salaries and taking the middle value gives the median.
The median salary is $52,500, which is much lower than the mean. This
value gives a better indication of what a typical employee in this
company earns, as it is not affected by the extreme salary of Employee
J.
This example shows that the mean can be misleading when
there are outliers or the distribution is skewed.
Median
The median is the value in the middle when the data are arranged in
ascending order (smallest to largest).
• For an odd number of observations, the median is the middle value.
• For an even number of observations, the median is the average of the
two middle values.
Example: The number of deposits in a branch of a bank in a week is
given below.
Day                   1    2    3    4    5    6    7
Number of deposits    245  326  180  226  445  319  260

Sorted: 180, 226, 245, 260, 319, 326, 445. The middle (4th) value gives
Median = 260
Consider the following data:
220, 180, 235, 240, 270, 260, 250, 425, 300, 500
Sorted: 180, 220, 235, 240, 250, 260, 270, 300, 425, 500. The two middle
values are 250 and 260, so
Median = (250 + 260)/2 = 255
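The odd and even cases can be sketched in Python as follows. The function name `median` and the variable names are illustrative. The last line also prints the mean of the second data set, to show how the two large values (425 and 500) pull the mean above the median.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


deposits = [245, 326, 180, 226, 445, 319, 260]
amounts = [220, 180, 235, 240, 270, 260, 250, 425, 300, 500]

print(median(deposits))             # 260
print(median(amounts))              # 255.0
print(sum(amounts) / len(amounts))  # 288.0
```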

The median is more stable than the mean: adding a new observation,
even an extreme one, may not change the median significantly. The
drawback of the median is that, unlike the mean, it is not calculated
from every value in the dataset. It only locates the midpoint of the
ordered data instead of using the actual values of all the data.
Mode
The mode is the most frequently occurring value in the dataset.

• If the data have two modes, we say that the data are bimodal.
• If the data have more than two modes, we say that the data are
multimodal.
Question: A small local bakery wants to analyze the sales of its popular
cupcakes over the past month. They have recorded the number of
cupcakes sold each day. Here are the daily sales figures:
25,30,22,28,30,27,25,26,23,25,28,30,24,22,26,27,29,26,27,25.
Find mean, median, and mode.
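One way to check an answer to this question is with Python's standard `statistics` module. This is a sketch that assumes Python 3.8 or later, since `multimode` was added in that version.

```python
import statistics

sales = [25, 30, 22, 28, 30, 27, 25, 26, 23, 25,
         28, 30, 24, 22, 26, 27, 29, 26, 27, 25]

print(statistics.mean(sales))       # 26.25
print(statistics.median(sales))     # 26.0
print(statistics.multimode(sales))  # [25]
```

`multimode` returns a list, so bimodal or multimodal data would show every mode rather than just one.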
Percentiles
A percentile is a term that describes how a score compares to
other scores from the same set.
Difference between percentage and percentile:
The percentage score reflects how well the student did on the
exam itself; the percentile score reflects how well the student did in
comparison to other students.
• We say that a student scored 100 "percent" if and only if the student
scored 100/100.
• We say that a student is at the 100th "percentile" if all the other
students scored less than that student.
Steps to calculate percentile
• Arrange the data in ascending order.
• The percentile of a value x is the ratio of the number of values
below x to the total number of values, expressed as a percentage:
$$\text{Percentile of } x = \frac{\text{number of values below } x}{\text{total number of values}} \times 100$$

Q1): The scores of 10 students are 49, 47, 38, 58, 60, 65, 70, 80, 79, 92.
Using the percentile formula, calculate the percentile for the score 70.
Q2): The weights of 10 people were recorded in kg as 35, 41, 42, 56,
58, 62, 70, 71, 90, 77. Find the percentile for the weight 58 kg.
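The "number of values below x" rule can be sketched as a short function. The name `percentile_rank` is illustrative, and the multiplication by 100 assumes the ratio is expressed as a percentage.

```python
def percentile_rank(values, x):
    """Percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


scores = [49, 47, 38, 58, 60, 65, 70, 80, 79, 92]
weights_kg = [35, 41, 42, 56, 58, 62, 70, 71, 90, 77]

print(percentile_rank(scores, 70))      # 60.0
print(percentile_rank(weights_kg, 58))  # 40.0
```

Sorting is not needed for the count itself; arranging the data in ascending order simply makes the count easy to do by hand.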
When the percentile is given (finding the value at the p-th percentile):
• Arrange all data values in the data set in ascending order.
• Calculate the index n = (p/100) × N, where N is the number of data values.
• If n is not an integer, round up. The next integer greater than n denotes
the position of the percentile.
• If n is an integer, then the percentile is the average of the values in
positions n and n+1.

Q3): In a college, a list of scores of 10 students is announced. The
scores are 56, 45, 69, 78, 72, 94, 82, 80, 63, 59. Using the percentile
formula, find the 70th percentile.
Q4): Find the 50th and 85th percentiles for the salary data:
3850, 3950, 4050, 3880, 3755, 3710, 3890, 4130, 3940, 4325, 3920, 3880.
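The position rule for a given percentile can be sketched as below. It assumes the index is computed as (p/100) times the number of values and that p is a whole number strictly between 0 and 100. The function name `percentile_value` is illustrative. Library routines such as `statistics.quantiles` use different interpolation rules and may give slightly different answers.

```python
def percentile_value(values, p):
    """Value at the p-th percentile using the round-up / average-two-positions rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point error when testing for a whole index.
    quotient, remainder = divmod(p * len(ordered), 100)
    if remainder:
        # Index is not an integer: round up to position quotient + 1 (1-based).
        return ordered[quotient]
    # Index is an integer: average the values in positions quotient and quotient + 1.
    return (ordered[quotient - 1] + ordered[quotient]) / 2


scores = [56, 45, 69, 78, 72, 94, 82, 80, 63, 59]
salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]

print(percentile_value(scores, 70))    # 79.0
print(percentile_value(salaries, 50))  # 3905.0
print(percentile_value(salaries, 85))  # 4130
```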
Deciles
Deciles are special percentile values that divide the ordered data
into 10 equal parts. 10% of the data lie at or below the first decile,
20% at or below the second decile, and so on.
Quartiles
Quartiles divide the ordered data into 4 equal parts. 25% of the data
lie at or below the first quartile $Q_1$, 50% at or below $Q_2$ (the
median), and 75% at or below $Q_3$.

Q5): Find $Q_1$, $Q_2$, $Q_3$ for the data given in Q4.

Q6): Consider a sample with data values of 27,25,20,15,30,34,28,25.
Find 25th, 50th and 75th percentiles.
Measures of dispersion (Measures of Variability)
Measures of variability are useful in identifying how close the records
are to the mean value and in detecting outliers in the data.
Variability in the data is measured using the following measures:
• Range
• Interquartile range
• Variance
• Standard Deviation
• Co-efficient of variation
Inter-Quartile Range (IQR)
The IQR is the distance between Quartile 1 and Quartile 3 in
the dataset: IQR = Q3 - Q1. It measures the spread of the middle 50% of the data.
For the data points in Q4) (salary data), Q3 = 4000 and Q1 = 3865, so
IQR = 4000 - 3865 = 135
A smaller IQR indicates that the middle 50% of the data points are
close to each other, that is, there is less variability in the middle
50% of the data. Here, an IQR of 135 indicates a moderate spread of
salaries in the middle 50% of the dataset.
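The quartile values quoted above can be reproduced with the percentile position rule applied at p = 25 and p = 75. This is a sketch under that assumption, and the helper name `value_at_percentile` is illustrative.

```python
def value_at_percentile(ordered, p):
    """Percentile by position on already-sorted data (round up, or average two positions)."""
    quotient, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[quotient]
    return (ordered[quotient - 1] + ordered[quotient]) / 2


salaries = sorted([3850, 3950, 4050, 3880, 3755, 3710,
                   3890, 4130, 3940, 4325, 3920, 3880])

q1 = value_at_percentile(salaries, 25)
q3 = value_at_percentile(salaries, 75)
iqr = q3 - q1

print(q1, q3, iqr)  # 3865.0 4000.0 135.0
```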
Standard deviation:
A standard deviation is a measure of how dispersed the data is
in relation to the mean.
There are six steps for finding the standard deviation by hand:
• List each score and find their mean.
• Subtract the mean from each score to get the deviation from the
mean.
• Square each of these deviations.
• Add up all of the squared deviations.
• Divide the sum of the squared deviations by n – 1 (for a sample)
or N (for a population).
• Find the square root of the number you found.
Find the S.D. of the data given in Q4.
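The six steps can be followed one line at a time in Python. This sketch treats the Q4 salary data as a sample, so it divides by n - 1; the variable names are illustrative.

```python
import math

salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]

mean = sum(salaries) / len(salaries)              # step 1: mean of the scores
deviations = [x - mean for x in salaries]         # step 2: deviation from the mean
squared = [d ** 2 for d in deviations]            # step 3: square each deviation
total = sum(squared)                              # step 4: sum of squared deviations
sample_variance = total / (len(salaries) - 1)     # step 5: divide by n - 1 (sample)
sample_sd = math.sqrt(sample_variance)            # step 6: square root

print(mean)                 # 3940.0
print(total)                # 301850.0
print(round(sample_sd, 2))  # 165.65
```

For a population, step 5 would divide by N (here 12) instead of n - 1.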
Co-efficient of Variation:
The coefficient of variation (CV) is a relative measure of variability that
indicates the size of a standard deviation in relation to its mean. It is
also known as the relative standard deviation (RSD).
$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
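As a sketch, the CV of the Q4 salary data can be computed as the standard deviation divided by the mean, times 100. The example assumes the data are a sample, so `statistics.stdev` (which divides by n - 1) is used.

```python
import statistics

salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]

# CV = (standard deviation / mean) * 100, expressed as a percentage.
cv_percent = statistics.stdev(salaries) / statistics.mean(salaries) * 100

print(round(cv_percent, 1))  # 4.2
```

A CV of about 4.2% says the standard deviation is small relative to the mean salary. Because the CV has no units, it can be compared across data sets measured on different scales.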

Normal Distribution:
A normal distribution, also known as a Gaussian distribution or a bell
curve, is a common way to describe how values in a dataset are
distributed.
Skewness
• Positive skewness is when the distribution has a long tail towards the
right side of the graph. This is called a right-skewed distribution.
In this distribution, the mean is greater than the median, which is
greater than the mode. That is, mean > median > mode.
• Negative skewness is when the distribution has a long tail towards the
left side of the graph. This is called a left-skewed distribution.
Here, mode > median > mean.
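The direction of skew can be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). The notes do not give a skewness formula, so this definition is an assumption. The data are the ten values from the median example.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


data = [220, 180, 235, 240, 270, 260, 250, 425, 300, 500]

print(statistics.mean(data) > statistics.median(data))  # True
print(moment_skewness(data) > 0)                        # True
```

Both checks agree: the two large values (425 and 500) create a long right tail, so the mean (288) is above the median (255) and the skewness coefficient is positive.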
Right Skewness:
Examples:
• Income distribution: In many countries, the distribution of
income is right-skewed. While most people earn an average or
below-average income, a small number of individuals earn exceptionally high
incomes, creating a long tail on the right.
• Real estate pricing: Housing prices in a city or region often exhibit right
skewness. Most houses may be priced within an affordable range, but
there are a few luxury properties priced much higher, extending the tail
to the right.
Left Skewness:
Examples:
• Time spent on a task by experts: The time taken by experts to
complete a specific task might be left-skewed if most experts need
close to the usual time and a few finish much faster, extending the tail
to the left. (If instead a few experts take much longer than the rest,
the tail extends to the right and the distribution is right-skewed.)
• Employee retirement age: In many companies, the ages at which
employees retire may be left-skewed. Most employees retire around a
standard retirement age, but some might retire earlier due to health
issues or personal choices, leading to a long tail on the left.
Kurtosis
Kurtosis is a measure of the peakedness of a distribution: it indicates
how high the distribution is around the mean, that is, whether the
distribution is flat, normal or peaked in shape.
Kurtosis can also be described by the shape of the tails, that is,
whether the tails of the distribution are heavy or light.
Formula

A kurtosis value < 3 represents a platykurtic distribution (flatter, with lighter tails).
A kurtosis value > 3 represents a leptokurtic distribution (more peaked, with heavier tails).
A kurtosis value = 3 represents a mesokurtic distribution, such as the
normal distribution.
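The three cases can be illustrated with a short sketch. Because the thresholds are stated relative to 3, the code assumes the moment-based (non-excess) definition, kurtosis = m4 / m2², where m2 and m4 are the second and fourth central moments. That formula and the function names are assumptions, not taken from the notes.

```python
def moment_kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def classify_kurtosis(k, tolerance=1e-9):
    """Label a kurtosis value relative to the reference value 3."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"


evenly_spread = [1, 2, 3, 4, 5]
salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]

print(classify_kurtosis(moment_kurtosis(evenly_spread)))  # platykurtic
print(classify_kurtosis(moment_kurtosis(salaries)))       # leptokurtic
```

The evenly spread values have no pronounced peak or tail, while the salary data have one value (4325) far from the rest, which makes the tail heavier.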
Cross-sectional data: consists of several variables recorded at the same
point in time.
• examining the GDP of different countries in a single year
• comparing the financial statements of companies at a fixed date
Time series data: is recorded over consistent intervals of time.
• Monthly subscribers
• Weather records
• Inflation Rates: Monthly or yearly inflation rates.
• Stock Prices: Daily closing prices of a company’s stock over several years.
• GDP: Quarterly or annual Gross Domestic Product figures of a country.
• Sales Data: Weekly or monthly sales revenue of a retail store.
Data Visualization
Bar Chart: A bar chart is a frequency chart for qualitative (categorical)
data summarized in a frequency, relative frequency, or percent
frequency distribution.
Pie Chart: Used to show the relative frequency and percent frequency for
categorical data.
Dot Plot: Used to show the distribution of quantitative data over
the entire range of the data (horizontal axis).
Histogram: Used to show the frequency distribution of the
quantitative data over a set of class intervals.
Scatter Plot: A scatter diagram is a graphical display of the relationship
between two quantitative variables. Scatter plots are used to observe
relationships between variables.
Read about types of correlation.
Self study: Side by side bar chart, stacked bar chart
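A histogram starts from a frequency distribution over class intervals. The sketch below builds one for the daily cupcake sales data and prints it as a text histogram. The class width of 2 and the starting point at the smallest value are assumptions chosen only for illustration.

```python
from collections import Counter

sales = [25, 30, 22, 28, 30, 27, 25, 26, 23, 25,
         28, 30, 24, 22, 26, 27, 29, 26, 27, 25]

width = 2           # assumed class width
start = min(sales)  # first class interval begins at the smallest value

# Map each value to the index of its class interval and count the members.
counts = Counter((x - start) // width for x in sales)

for k in range(max(counts) + 1):
    low = start + k * width
    high = low + width - 1
    print(f"{low}-{high}: {'*' * counts[k]} ({counts[k]})")
```

With these settings the classes 22-23, 24-25, 26-27, 28-29 and 30-31 contain 3, 5, 6, 3 and 3 days respectively.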
