Statistics for Data Science Overview

The document discusses key concepts in statistics including measures of central tendency like mean, median and mode. It explains how to calculate and apply these measures to data sets. It also covers variance, standard deviation, covariance and correlation which are measures used to understand the spread and relationship of different variables.

Uploaded by

Sakshi

STATISTICS FOR DATA SCIENCE

STATISTICS

• Statistics is the discipline that concerns the collection, organization,
analysis, interpretation, and presentation of data. In applying statistics to a
scientific, industrial, or social problem, it is conventional to begin with a
statistical population or a statistical model to be studied.
CENTRAL TENDENCY

• Central tendency is a statistical measure that represents an entire
distribution or dataset with a single value. It aims to give an accurate
description of all the data in the distribution.
• In statistics, a measure of central tendency is a descriptive summary of a
data set: a single value that reflects the centre of the data distribution. It
summarizes the dataset as a whole and does not provide information about
individual data points.
MEAN
• The mean, median and mode are the three commonly used measures of central tendency.
• The mean is the average of the given numbers. It is calculated by dividing the sum of the
numbers by how many numbers there are.
• Mean = (Sum of all the observations) / (Total number of observations)
• Example: What is the mean of 2, 4, 6, 8 and 10?
• Solution: First, add all the numbers.
• 2 + 4 + 6 + 8 + 10 = 30
• Now divide by 5 (total number of observations).
• Mean = 30/5 = 6
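The worked example above can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
# Arithmetic mean of the numbers from the worked example above.
observations = [2, 4, 6, 8, 10]

total = sum(observations)      # 2 + 4 + 6 + 8 + 10 = 30
count = len(observations)      # 5 observations
mean_value = total / count     # 30 / 5

print(mean_value)  # 6.0
```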
• Calculate the mean from the data showing marks of students in a class in
a test: 40, 50, 55, 78, 58.
• A class consists of 50 students, out of which 30 are girls. The mean of
marks scored by girls in a test is 73 (out of 100), and that of boys is 71.
Determine the mean score of the whole class.
• A batsman has a certain average over 9 matches. In the 10th match, he
scores 100 runs and his average increases by 8 runs. What is his new
average?
• Find the mean salary of 60 workers of a factory from the following table:

Salary (in Rs.)    Number of workers
3000               16
4000               12
5000               10
6000               8
7000               6
8000               4
9000               3
10000              1
Total              60
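For a frequency table like the one above, the mean is the sum of (value × frequency) divided by the total frequency. The sketch below applies that idea to the salary table. The variable names are illustrative, and the approach is the standard weighted mean rather than a method spelled out in the notes.

```python
# Mean from a frequency table: sum(value * frequency) / sum(frequency).
salaries = [3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]  # Rs.
workers = [16, 12, 10, 8, 6, 4, 3, 1]                         # frequencies

total_workers = sum(workers)                                   # 60
total_salary = sum(s * w for s, w in zip(salaries, workers))   # 305000
mean_salary = total_salary / total_workers

print(round(mean_salary, 2))  # 5083.33
```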
MEDIAN

• The median is the middle number in a list of numbers sorted in ascending or descending order, and it can be more
descriptive of the data set than the average. It is the point above and below which half (50%) of the observed data falls,
and so represents the midpoint of the data.
• The median is sometimes used instead of the mean when there are outliers in the sequence that might skew the
average of the values.
• If there is an odd number of values, the median is the value in the middle, with the same number of values below
and above it.
• If there is an even number of values, the middle pair must be identified, added together, and divided by two
to find the median.
• In a normal distribution, the median is the same as the mean and the mode.
If the number of observations n is odd, the median is the value of x at position (n+1)/2 in the ordered
observations. That is,
Median = value of the ((n+1)/2)th observation in the data array
If the number of observations n is even, the median is the arithmetic mean of the two middle values in
the array. That is,
Median = [value of the (n/2)th observation + value of the ((n/2)+1)th observation] / 2
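The odd and even cases described above can be sketched as a small function. The function name and the two sample lists are hypothetical, chosen only to exercise both branches.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the middle pair."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th observation, which is index n // 2 (0-based).
        return ordered[mid]
    # Even n: average the (n/2)th and ((n/2)+1)th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5, 3, 9])   # sorted: 1, 3, 5, 7, 9
even_case = median([7, 1, 5, 3])     # sorted: 1, 3, 5, 7

print(odd_case, even_case)  # 5 4.0
```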
• The export of agricultural products, in million dollars, from a country during
eight quarters in 1974 and 1975 was recorded as 29.7, 16.6, 2.3, 14.1, 36.6,
18.7, 3.5, 21.3. Find the median of the given set of values.
• The numbers of rooms in seven five-star hotels in Chennai city are 71, 30,
61, 59, 31, 40 and 29. Find the median number of rooms.
Cumulative Frequency
In a grouped distribution, values are associated with frequencies.
Cumulative frequencies are calculated to find the total number of items above or below a certain limit.
They are obtained by adding the frequencies successively up to the required level.
Cumulative frequencies are useful for calculating the median, quartiles, deciles and percentiles.

Median for Discrete grouped data

We can find the median using the following steps:

i. Calculate the cumulative frequencies.
ii. Find (N+1)/2, where N = Σf = total frequency.
iii. Identify the cumulative frequency just greater than (N+1)/2.
iv. The value of x corresponding to that cumulative frequency is the median.
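The four steps above can be sketched as follows. The function name and the frequency table are hypothetical illustration data, not taken from the notes. The comparison uses "greater than or equal to" so that a cumulative frequency exactly equal to (N+1)/2 is also handled, which is an assumption beyond the wording of step iii.

```python
from itertools import accumulate


def discrete_grouped_median(values, frequencies):
    """Median of discrete grouped data via cumulative frequencies."""
    pairs = sorted(zip(values, frequencies))             # order by value x
    cumulative = list(accumulate(f for _, f in pairs))   # step i
    position = (cumulative[-1] + 1) / 2                  # step ii: (N+1)/2
    for (x, _), cf in zip(pairs, cumulative):
        if cf >= position:                               # step iii
            return x                                     # step iv


# Hypothetical frequency table (N = 60).
x_values = [10, 20, 30, 40, 50, 60, 70]
freqs = [4, 7, 12, 15, 13, 5, 4]

# Cumulative frequencies: 4, 11, 23, 38, 51, 56, 60; (N+1)/2 = 30.5
grouped_median = discrete_grouped_median(x_values, freqs)
print(grouped_median)  # 40
```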
• The following data are the weights of students in a class. Find the median weight of
the students.
[Table of weights and frequencies not recoverable from the source.]
Solution:

The cumulative frequency just greater than 30.5 is 38. The value of x corresponding to
38 is 40. The median weight of the students is 40 kg.
MODE

• The mode is one of the measures of central tendency. It tells us which
item in a data set occurs most frequently.
• For example, suppose a college offers 10 different courses. The course
with the highest number of student registrations is the mode of the
data (number of students taking each course).
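A minimal sketch of finding the mode, first for the course example above and then for a raw list of values. The course names, registration counts and sample list are hypothetical.

```python
from collections import Counter

# Hypothetical registration counts per course (illustration only).
registrations = {"Statistics": 120, "Physics": 85, "History": 60, "Biology": 95}

# The modal course is the one with the highest registration count.
modal_course = max(registrations, key=registrations.get)

# For raw observations, count occurrences and take the most frequent value.
raw_values = [4, 2, 4, 3, 2, 4, 5]
modal_value, occurrences = Counter(raw_values).most_common(1)[0]

print(modal_course, modal_value, occurrences)  # Statistics 4 3
```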
VARIANCE

• The term variance refers to a statistical measurement of the spread between
numbers in a data set. More specifically, variance measures how far each number in
the set is from the mean (average), and thus from every other number in the set.
Variance is often denoted by the symbol σ². It is used by both analysts and
traders to determine volatility and market security.

• The square root of the variance is the standard deviation (SD or σ), which helps
determine the consistency of an investment’s returns over a period of time.
• Variance is a measurement of the spread between numbers in a data set.
• In particular, it measures the degree of dispersion of data around the sample's
mean.
• Investors use variance to see how much risk an investment carries and whether it
will be profitable.
• Variance is also used in finance to compare the relative performance of each asset
in a portfolio to achieve the best asset allocation.
• The square root of the variance is the standard deviation.
Let's say the heights (in mm) are 610, 450, 160, 420, 310.
Mean and variance are interrelated. The first step is finding the mean,
which is done as follows:
Mean = (610 + 450 + 160 + 420 + 310)/5 = 1950/5 = 390
So the mean is 390 mm.
To calculate the variance, compute the difference of each value from the
mean, square it, and then find the average of the squared differences.
So for this particular case the variance is:
= (220² + 60² + (−230)² + 30² + (−80)²)/5
= (48400 + 3600 + 52900 + 900 + 6400)/5
= 112200/5
Final answer: Variance = 22440 mm²
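The heights example above can be checked step by step. This sketch divides by the number of values, as the worked example does (the population variance). The variable names are illustrative.

```python
# Population variance of the heights from the worked example above.
heights_mm = [610, 450, 160, 420, 310]

n = len(heights_mm)
mean_height = sum(heights_mm) / n                        # 1950 / 5

# Squared difference of each value from the mean.
squared_deviations = [(h - mean_height) ** 2 for h in heights_mm]

# Average of the squared differences (divide by n, as in the example).
variance = sum(squared_deviations) / n

print(mean_height, variance)  # 390.0 22440.0
```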
Q. Find the variance of the numbers 3, 8, 6, 10, 12, 9,
11, 10, 12, 7.
STANDARD DEVIATION

• Standard deviation is a statistic that measures the dispersion of a dataset
relative to its mean. It is calculated as the square root of the variance,
which in turn is obtained from each data point's deviation from the mean.

• If the data points are further from the mean, there is a higher deviation
within the data set; thus, the more spread out the data, the higher the
standard deviation.
• Standard deviation measures the dispersion of a dataset relative to its mean.
• It is calculated as the square root of the variance.
• In finance, standard deviation is often used as a measure of the relative riskiness of
an asset.
• A volatile stock has a high standard deviation, while the deviation of a stable blue-
chip stock is usually rather low.
• As a downside, the standard deviation calculates all uncertainty as risk, even when
it’s in the investor's favor—such as above-average returns.
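As a sketch of "the square root of the variance", the snippet below reuses the heights from the variance example (610, 450, 160, 420, 310 mm). It also shows the sample standard deviation, which divides by n − 1 instead of n. The notes do not say which version is intended for the exercises, so both are shown as an assumption.

```python
import math
import statistics

heights_mm = [610, 450, 160, 420, 310]

# Population version: variance divides by n, as in the variance example.
population_variance = statistics.pvariance(heights_mm)
population_sd = math.sqrt(population_variance)

# Sample version: divides by n - 1, so it is somewhat larger.
sample_sd = statistics.stdev(heights_mm)

print(round(population_sd, 2), round(sample_sd, 2))  # approximately 149.8 and 167.48
```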
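FORMULA
For N values x1, ..., xN with mean μ, dividing by N as in the variance example in these notes:
Variance: σ² = Σ(xi − μ)² / N
Standard deviation: σ = √(Σ(xi − μ)² / N)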
• Find the variance and standard deviation of the following scores on an exam: 92,
95, 85, 80, 75, 50
• Find the standard deviation of the average temperatures recorded over a five-day
period last winter: 18, 22, 19, 25, 12
• Find the variance and standard deviation for the five states with the most covered
bridges: Oregon: 106 Vermont: 121 Indiana: 152 Ohio: 234 Pennsylvania: 347
Difference between variance, covariance and
correlation

• Variance tells us how much a quantity varies with respect to its mean. It is the spread of the data
around the mean value. It gives only the magnitude, that is, how much the
data is spread.

• Covariance tells us the direction in which two quantities vary with each other.

• Correlation shows us both the direction and the magnitude (strength) of how two quantities vary
with each other.
COVARIANCE

• Covariance is a measure of the relationship between two random variables
in statistics. The covariance indicates the relation between the two variables
and helps to show whether the two variables vary together. The covariance
between two random variables X and Y is denoted Cov(X, Y).
• Covariance formula
• Covariance formula for population: Cov(X, Y) = Σ (Xi − X̄)(Yi − Ȳ) / n
• Where,
• Xi is the values of the X-variable
• Yi is the values of the Y-variable
• X̄ is the mean of the X-variable
• Ȳ is the mean of the Y-variable
• n is the number of data points
Question: The table below describes the rate of economic growth (xi) and the rate of return on the S&P 500 (yi). Using the covariance formula, determine whether
economic growth and S&P 500 returns have a positive or inverse relationship. Before you compute the covariance, calculate the mean of x and y.

Economic Growth % (xi)    S&P 500 Returns % (yi)

2.1                       8
2.5                       12
4.0                       14
3.6                       10
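One way to work through the question above is sketched below. It computes the two means first, as the question asks, and then the population covariance (dividing by n). A sample covariance would divide by n − 1 instead; the sign, and therefore the direction of the relationship, is the same either way. The variable names are illustrative.

```python
# Population covariance for the economic growth / S&P 500 table above.
growth = [2.1, 2.5, 4.0, 3.6]     # xi, economic growth %
returns = [8, 12, 14, 10]         # yi, S&P 500 returns %

n = len(growth)
mean_x = sum(growth) / n          # about 3.05
mean_y = sum(returns) / n         # 11.0

# Average product of the deviations from each mean.
covariance = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(growth, returns)
) / n

relationship = "positive" if covariance > 0 else "inverse or none"
print(round(covariance, 2), relationship)  # 1.15 positive
```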
• Find the covariance for the following data set: x = {2,5,6,8,9}, y = {4,3,7,5,6}
• Using the covariance formula, find the covariance for the following data set: x =
{5,6,8,11,4,6}, y = {1,4,3,7,9,12}.
What is a Correlation?

• Correlation refers to a process for establishing whether or not a relationship
exists between two given variables.
• Through the correlation coefficient, one can get a general idea of whether or not
two variables are related.
• A correlation coefficient very close to zero, whether positive or negative,
implies little or no linear relationship between the two variables. A correlation
coefficient close to +1 means that increases in one variable are
associated with increases in the other variable, while a coefficient close to −1
means that increases in one variable are associated with decreases in the other.
PEARSON CORRELATION COEFFICIENT

• The covariance of two variables divided by the product of their standard deviations gives
Pearson's correlation coefficient. It is usually represented by ρ (rho).
• ρ(X, Y) = cov(X, Y) / (σX · σY)
• Here cov is the covariance, σX is the standard deviation of X and σY is the standard deviation of
Y. The equation for the correlation coefficient can also be expressed in terms of means and
expectations.
• ρ(X, Y) = E[(X − μX)(Y − μY)] / (σX · σY)
• μX and μY are the mean of X and the mean of Y respectively. E is the expectation.
• Example 1: Calculate the correlation coefficient of the given data:

x     50     51     52     53     54
y     3.1    3.2    3.3    3.4    3.5
xy    155    163.2  171.6  180.2  189
x²    2500   2601   2704   2809   2916
y²    9.61   10.24  10.89  11.56  12.25

• Calculate the correlation coefficient of the given data:

x     12     15     18     21     27
y     2      4      6      8      12
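For the Example 1 data (x = 50 to 54, y = 3.1 to 3.5), the xy, x² and y² rows suggest the sum-based form of Pearson's formula, r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²)). That this is the intended method is an assumption; the sketch below simply applies it, and it is algebraically equivalent to cov(X, Y) / (σX · σY).

```python
import math

# Example 1 data from the notes.
x = [50, 51, 52, 53, 54]
y = [3.1, 3.2, 3.3, 3.4, 3.5]

n = len(x)
sum_x, sum_y = sum(x), sum(y)                  # 260 and about 16.5
sum_xy = sum(a * b for a, b in zip(x, y))      # about 859
sum_x2 = sum(a * a for a in x)                 # 13530
sum_y2 = sum(b * b for b in y)                 # about 54.55

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

# y rises by exactly 0.1 for every unit of x, so r is 1 up to rounding error.
print(round(r, 6))
```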
