Key Sampling and Statistical Terms
Key Sampling and Statistical Terms
In Statistical Inference, a sample statistic is referred to as the point estimator of the corresponding population parameter .
Since it is usually impossible to get all the numbers for a population, we have to use sample statistics to reasonably estimate
the population parameter!
Example: calculate an sample average of number of salmon in Puget Sound and use it as the estimate of the population average
of number of salmon in Puget Sound - "We can't count all the fish in the sea!!"
Measures of Location:
Location
Location locates, or positions, a data point against the full data set such as:
The Mean calculation which calculates a central value, a typical value, a value near the middle,
The Percentile calculation that can say things like "for that particular number, 75% of the other values are below that value
and 25% of the other values are below that value".
The Z-Score, which tells you the relative position of a number in units of Standard Deviation.
Geometric Mean
Used to calculate the average compounding rate per period for % change or % growth numbers. The average compounding
rate per period is the rate that can be used in multiplicative growth formulas to calculate an end amount from a begin
amount across equal size time periods. In finance it is used to calculate the "Average Compounding Rate Per Period" for an
investment. Use one of two formulas:
Geometric Mean Formula 1:
Use when you are given all the "Growth Rates" or "Rates of Change" or "% Change" amounts across all equal sized
periods
𝑛
𝐺𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐 𝑀𝑒𝑎𝑛 1 = 1 + 𝐺𝑅1 ∗ 1 + 𝐺𝑅2 ∗ ⋯ 1 + 𝐺𝑅𝑛 , then subtract 1
Growth Factor = 1 + Growth Rate = Factor you can multiply by Begin Value to get End Value. Growth Factor ALWAYS >= 0
In Excel use:
GEOMEAN(RangeOfGrowthRates+1) = 1 + Geometric Mean = 1 + Average Compounding Rate per Period
GEOMEAN(RangeOfGrowthRates+1)-1 = Geometric Mean = Average Compounding Rate per Period
Geometric Mean Formula 2:
Use when you are given the Begin Value, End Value and the number of periods
𝐸𝑛𝑑𝑉𝑎𝑙𝑢𝑒
𝐺𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐 𝑀𝑒𝑎𝑛 2 = ^(1/n)-1
𝐵𝑒𝑔𝑖𝑛𝑉𝑎𝑙𝑢𝑒
In Excel use:
RRI function. Function arguments: RRI(n,PV,FV) = RRI(NumberOfPeriods,BegValue, EndValue) = Geometric Mean =
Average Compounding Rate per Period
Formula: =(EndValue/BegValue)^(1/NumberOfPeriods)-1 = Geometric Mean = Ave. Compounding Rate per Period
MUST use Geometric Mean (not arithmetic mean) if you want the true "average" compounding rate per period
Arithmetic Mean overestimates
Arithmetic Mean is for additive processes. Geomean is for multiplicative processes
In finance, when calculating End Values, use Geometric Mean; however, arithmetic mean is used in some situations like for
Standard Deviation, Correlation, and other calculations that do not require true "average" compounding rate per period.
In Excel use: VAR.P function for population data and VAR.S for sample data.
Standard Deviation = SD
Standard Deviation uses all the data points, not just some like Range and Interquartile Range
Standard Deviation does not have squared units (like Variance) and is thus easier to interpret; the standard deviation has
the same units as the data.
Population Standard Deviation = "little sigma" = s ; Sample Standard Deviation = s
The sample standard deviation is a point estimator of the population standard deviation
Interpretation of Standard Deviation:
A numerical measure that says how much variability/dispersion there is in the data in relation to the mean
Standard Deviation is like an average of the deviations
Standard Deviation tells us how fairly the mean represents its data points
Standard Deviation tells us how clustered the data points are around the mean
For financial assets standard deviation is a measure of risk or fluctuation in asset value
Math Formulas: σ(𝑥𝑖 − 𝜇)2 σ(𝑥𝑖 − 𝑥) lj 2
𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝜎 = 𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑠 =
𝑁 𝑛−1
In Excel use: STDEV.P function for population data and STDEV.S for sample data.
Coefficient of Variation
In Excel create math formula = (Standard Deviation)/Mean = Coefficient of Variation
Coefficient of Variation converts Standard Deviation to Standard Deviation per unit of Mean so you can compare:
1) Data in different units.
or
2) Data in the same units, but the means are far apart.
Coefficient of Variation answers the question: "For every one unit of mean, what is the Standard Deviation?"
In Finance you see the inverse formula = (Average Return)/(Standard Deviation) = Return for 1 unit of risk
The UNIQUE array function can deliver unique set of items from a table, a column, or a row. A unique set of items is when you
select and list only items that occur exactly one time in the data set.
The SORT array function can sort a row, a column or a table in ascending or descending order. The default sort order is
ascending (smallest to biggest).
The SEQUENCE array function generates a sequence of numbers in a row, a column or a table, based on the formula inputs for
the number of rows in the final sequence, the number of columns in the final sequence, a start value and an increment value.
The FILTER array function allows you to filter a set of values to show only that values that meet a logical test. The array
argument contains the values that you want to filter. The include argument requires an array of TRUE and FALSE values (same
dimension as array argument values) to indicate which values to keep (TRUE) and with ones to filter out (FALSE).
Other Worksheet Functions used in Video:
The COUNTIFS function makes a conditional count calculation based on one or more logical tests. The criteria_range1
argument contains the full range with all the conditional items. The criteria1 argument contains one or more conditions for
counting items from the criteria_range1 argument. You can place one or more conditions into the criteria1 argument: when you
place one condition into the criteria1 argument, the COUNTIFS function delivers a single answer, but when you place more than
one condition into the criteria1 argument, the COUNTIFS function spills an array of answers into the worksheet, one for each
condition. If you need to use a comparative operator with the condition, you must join the comparative operator to the cell
with the condition, like: "<"&J27. You can have up to 127 pairs of criteria_rangeN criteriaN arguments that will run an AND
Logical Test to make the conditional count calculation.
Comparative Operator Note:
* When using comparative operators in functions like COUNTIFS, SUMIFS, AVERAGEIFS, MINIFS and MAXIFS, you must join the
comparative operator to the cell with the condition, like: ">"&J28.
* But when you use a comparative operator in a formula that makes a direct logical test formula calculation, you do not use
quotes or an ampersand (join operator), like: CPAScoreTable[CPA Test 1]<J27. Example of this note is in video #15.
Unpivot Power Query feature to convert an Improper Data Set to a Proper Data Set:
The Improper Data Set has a unique list of elements from a single variable column show as column headers, with data points
below each column header. The Improper Data Set is more difficult to deal with when performing data analysis. The Proper
Data Set lists only the variable fields and does not show elements from a variable field as column headers. A Proper Data Set
shows only variable fields, in the CPA example: CPATest and Score. The feature that you use to convert the Improper Data Set
to a Proper Data Set is the Unpivot feature.
σ(𝑥𝑖 − 𝜇)2 lj 2
σ(𝑥𝑖 − 𝑥)
𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝜎 2 = 𝑆𝑎𝑚𝑝𝑙𝑒 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = 𝑠 2 =
𝑁 𝑛−1
σ(𝑥𝑖 − 𝜇)2 lj 2
σ(𝑥𝑖 − 𝑥)
𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝜎 = 𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑠 =
𝑁 𝑛−1
These two not required for this class: Xbar = "Xbar" Sample Mean
Alternative for variability measure called Mean Absolute Error
µ = "Mew" Population Mean
σ 𝑥𝑖 − 𝑥ҧ s = "s" Sample Standard Deviation
𝑀𝐴𝐸 = s = "sigma" Sample Standard Deviation
𝑛
xi = "x sub i" Particular Value
Alternative to calculate Sample SD ∑ = "Sigma" Greek Letter used for "adding"
n = "n" Count of sample items
N = "n" Count of population items
σ 𝑥𝑖2 − 𝑛𝑥ҧ 2
𝑆𝑎𝑚𝑝𝑙𝑒 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = 𝑠
𝑛−1
80 85 90 95 100 55 60 65 70 75 80 85 90 95
History Test Scores Calculus Test Scores
Z-score measures the relative location of a particular x in the data set (as compared to the mean), in units of standard deviation.
The Improper Data Set has a unique list of elements from a single variable column show as column
headers, with data points below each column header. The Improper Data Set is more difficult to deal
with when performing data analysis. The Proper Data Set lists only the variable fields and does not show
elements from a variable field as column headers. A Proper Data Set shows only variable fields, in the
CPA example: CPATest and Score. The feature that you use to convert the Improper Data Set to a Proper
3 Data Set is the Unpivot feature.
4
5 Improper Data Set with a Pivoted Column (Field) Proper Data Set with only Variable Fields
6
7 CPA Test 1 CPA Test 2 CPA Test 3 CPA Test 4 CPA Test 5 CPATest Score
8 22 28 43 35 55 CPA Test 1 22
9 49 14 32 45 53 CPA Test 2 28
10 70 5 52 61 30 CPA Test 3 43
11 13 65 5 37 68 CPA Test 4 35
12 52 80 51 52 58 CPA Test 5 55
13 44 67 62 55 31 CPA Test 1 49
14 18 81 56 45 61 CPA Test 2 14
15 52 47 31 7 57 CPA Test 3 32
16 49 51 47 44 87 CPA Test 4 45
17 37 67 12 57 60 CPA Test 5 53
18 78 66 64 58 66 CPA Test 1 70
19 75 81 61 50 66 CPA Test 2 5
20 70 63 49 53 64 CPA Test 3 52
21 51 15 67 39 32 CPA Test 4 61
22 50 54 34 34 62 CPA Test 5 30
23 70 50 89 60 17 CPA Test 1 13
1919 CPA Test 2 50
1920 CPA Test 3 13
1921 CPA Test 4 50
1922 CPA Test 5 61
1923
1924