Inferential Statistics
Inferential statistics is a field that uses analytical tools to infer conclusions about a population by examining random samples.
Use permutation, combination, and probability basics to find the likelihood of an estimate.
Understand probability distributions such as the binomial distribution, normal distribution, and standard normal distribution to estimate the variability of occurrence of an event.
Understand sampling and sampling distributions to simplify statistical inference when a large number of samples is drawn from the population.
Apply the central limit theorem to justify treating the sampling distribution of the mean as approximately normal when the sample size is large enough.
Apply the central limit theorem for interval estimation to calculate an interval of probable values of an unknown population parameter.
Common Interview Questions
1. What does Inferential Statistics mean?
2. What are the different types of Distributions?
3. How can the sample data be drawn from the population?
4. What is the difference between inferential statistics and descriptive statistics?
5. Explain Confidence level, margin of error, and confidence interval
6. What does Confidence Level signify?
7. What is the use of the central limit theorem?
8. Can confidence interval be negative?
9. What are the techniques applied to gather sample data?

Why Inferential Statistics?
Little to no information about the population → Randomly drawn sample from the population set → Analysis of data set → More information about the population data set

Inferential Statistics topics:
Probability: Permutation and combination, Events, Types of events, Addition and multiplication rule, Types of probability
Probability Distributions: Expected value, Binomial distribution, Normal distribution, Standard normal distribution
Sampling: Sampling distribution, Properties of sampling distribution, Central limit theorem
Confidence Intervals: Interval estimation
Probability:
Probability refers to the chances of occurrence of a given event.
P(A) = Probability of Event A = (No. of ways an event can occur) / (No. of all possible outcomes)

Important Terminology:
Experiment: Results in well-defined outcomes, e.g., tossing a coin
Event: Any collection of outcomes of an experiment
Random experiment: Do not know the exact outcome but know the set of all possible outcomes
1. $0 \le P(E) \le 1$
2. The probabilities of all possible outcomes sum to 1, i.e., $\sum_i P(E_i) = 1$
3. Probability of an impossible event is 0.
4. Probability of an event can never be negative.

Bayes' Theorem:
Helps to calculate the probability of one event when the other has already occurred.
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$; $P(B) \ne 0$

Probability Distribution:
A mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.
[Figure: two histograms of random values (N = 1000, x-axis "Random values" from -3 to 4), one with y-axis "Frequency" (0 to 200) and one with y-axis "Relative density" (0.0 to 0.4)]

Permutation:
Arranging r objects out of n distinct objects in nPr ways. Ordering has significance.
$^{n}P_{r} = \frac{n!}{(n-r)!}$

Combination:
Selecting r objects out of n distinct objects in nCr ways. Ordering is not important.
$^{n}C_{r} = \frac{n!}{r!\,(n-r)!}$

Example: Four-letter words formed using the letters of the word UPGRAD
Step 1: Select 4 letters from UPGRAD: 6C4 = 15 ways (Combination)
Step 2: Arrange the selected 4 letters: 4P4 = 24 ways (Permutation)
Ans: Number of four-letter words is 15 × 24 = 360
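The UPGRAD counting example can be checked with a few lines of Python. This is a minimal sketch that assumes Python 3.8 or later (for `math.comb` and `math.perm`); the variable names are illustrative and not part of the original notes.

```python
import math
from itertools import permutations

letters = "UPGRAD"  # six distinct letters

# Step 1: choose 4 of the 6 letters, ignoring order (6C4).
n_choose = math.comb(len(letters), 4)

# Step 2: arrange the 4 chosen letters, order matters (4P4 = 4!).
n_arrange = math.perm(4, 4)

# Multiply the two steps to count the four-letter words.
n_words = n_choose * n_arrange

# Cross-check by brute force: list every ordered selection of 4 letters.
brute_force = len(set(permutations(letters, 4)))

print(n_choose, n_arrange, n_words, brute_force)
```

Selecting and then arranging is the same as a direct permutation, so `math.perm(6, 4)` should give the same count as `n_words`.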
Expected value:
The value of X that we would 'expect' to get after performing an experiment an infinite number of times. Also called the expectation, mean, or probability-weighted average of X.
EV(X) = x1 × P(x1) + x2 × P(x2) + x3 × P(x3) + . . . + xn × P(xn)
Where X = {x1, x2, x3, . . . , xn}

Rules of Probability:
[Venn diagram: sample space S containing events A and B, divided into the regions A-B, A∩B, and B-A]
Addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Complement: P(A^c) = 1 - P(A)
Conditional: P(B|A) = P(A ∩ B) / P(A); P(A) ≠ 0
Multiplication: P(A ∩ B) = P(B|A) × P(A)
If A and B are independent, P(A ∩ B) = P(A) × P(B)
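One way to get a feel for the addition, complement, conditional, and multiplication rules is to check them by enumeration on a small sample space. The sketch below assumes two fair six-sided dice; the events A, B, and C are hypothetical choices made only for illustration. The multiplication rule is used in the form P(A ∩ B) = P(B|A) × P(A).

```python
from fractions import Fraction
from itertools import product

# Sample space S: the 36 equally likely outcomes of two fair dice (assumed example).
S = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / all possible outcomes."""
    return Fraction(len(event), len(S))

A = {o for o in S if o[0] % 2 == 0}       # first die is even
B = {o for o in S if o[0] + o[1] >= 10}   # total is at least 10
C = {o for o in S if o[1] % 2 == 0}       # second die is even

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)
p_b_given_a = p_ab / p_a                  # conditional rule

addition_ok = prob(A | B) == p_a + p_b - p_ab
complement_ok = prob(S - A) == 1 - p_a
multiplication_ok = p_ab == p_b_given_a * p_a

# Independence holds only when P(A ∩ B) = P(A) × P(B).
a_b_independent = p_ab == p_a * p_b
a_c_independent = prob(A & C) == p_a * prob(C)

print(p_a, p_b, p_ab, p_b_given_a)
print(addition_ok, complement_ok, multiplication_ok)
print(a_b_independent, a_c_independent)
```

Using `Fraction` keeps every probability exact, so the equalities can be tested with `==` rather than a floating-point tolerance.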
Types of Events
Independent event: Probability of occurrence of two or more events is not affected by each other
Disjoint/Mutually exclusive event: Events cannot occur at the same time
S is the sample space of the experiment, P(S) = 1

Example (expected value of two projects):
Project A: probability 0.2 of $2 million, probability 0.8 of $500,000
Project B: probability 0.3 of $3 million, probability 0.7 of $200,000
EV (Project A) = [0.2 × $2,000,000] + [0.8 × $500,000] = $800,000
EV (Project B) = [0.3 × $3,000,000] + [0.7 × $200,000] = $1,040,000
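The two project expected values can be recomputed with a small helper. This is only a sketch: the function name `expected_value` and the (probability, value) pair layout are assumptions made for illustration, while the probabilities and payoffs are the ones given for Project A and Project B.

```python
def expected_value(outcomes):
    """Sum of probability × value over (probability, value) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)

# (probability, payoff in dollars) for each project
project_a = [(0.2, 2_000_000), (0.8, 500_000)]
project_b = [(0.3, 3_000_000), (0.7, 200_000)]

ev_a = expected_value(project_a)
ev_b = expected_value(project_b)

print(f"EV(Project A) = ${ev_a:,.0f}")
print(f"EV(Project B) = ${ev_b:,.0f}")
```

On expected value alone, Project B comes out ahead, although the comparison ignores how widely each project's outcomes are spread.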
Probability Distribution
Discrete Probability Distribution: Binomial distribution, Bernoulli distribution, Poisson distribution
Continuous Probability Distribution: Normal distribution, Standard normal distribution, Student's t-distribution, Chi-square distribution

Continuous Probability Distribution
Frequently used concepts:
Cumulative Distribution Function: A distribution that plots the cumulative probability of X against X
Probability Density Function: A function in which the area under the curve gives the cumulative probability
[Figure: cumulative probability and probability density plotted against X (commute time, 15 to 50); the cumulative probability marked at X = 28 is 0.28, and the density plot is annotated "Area under the curve = 0.28"]

Discrete Probability Distribution
Binomial Distribution
Determines the probability of observing a specified number of successful outcomes in a specified number of trials.
$P(X = r) = {}^{n}C_{r}\, p^{r} (1-p)^{n-r}$
Where n is the total number of trials,
p is the probability of success,
r is the number of successes
Application: Tossing a coin 20 times to see how many tails occur
Not applicable: Tossing a coin until a head appears (the number of trials is not fixed in advance)
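The binomial formula can be evaluated directly for the coin-tossing application. The sketch below assumes a fair coin (p = 0.5), which the notes do not state, and uses a hypothetical helper named `binomial_pmf`.

```python
import math

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

n, p = 20, 0.5  # 20 tosses; p = 0.5 assumes a fair coin

# Probability of each possible number of tails, r = 0..20.
pmf = [binomial_pmf(r, n, p) for r in range(n + 1)]

p_exactly_10 = pmf[10]       # exactly 10 tails
p_at_most_5 = sum(pmf[:6])   # 5 tails or fewer

print(f"P(X = 10) = {p_exactly_10:.4f}")
print(f"P(X <= 5) = {p_at_most_5:.4f}")
print(f"Total probability = {sum(pmf):.4f}")
```

Summing the probabilities over r = 0..n is a quick sanity check that they form a valid distribution.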
Continuous Probability Distribution
Cumulative probability:
Cumulative probability of X, denoted by F(x), is defined as the probability of the variable being less than or equal to x.
$F(x) = P(X \le x)$

Normal distribution:
$\text{PDF} = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$
μ = Mean of the distribution
σ = Standard deviation
About 68% of the data are within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.

Z-score (standardised normal variable):
The Z-score tells how many standard deviations away from the mean your random variable is:
$Z = \frac{x-\mu}{\sigma}$
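The Z-score and the 68/95/99.7 rule can be explored with the standard library's `statistics.NormalDist` (Python 3.8+). This is a sketch under assumed values: the mean 30, standard deviation 5, and point x = 40 are hypothetical and are not taken from the notes or their figures.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """P(-k <= Z <= k) for a standard normal variable Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Hypothetical normal variable: mean 30, standard deviation 5.
dist = NormalDist(mu=30.0, sigma=5.0)
x = 40.0
z = z_score(x, dist.mean, dist.stdev)

# F(x) on the original scale equals the standard normal CDF at z.
f_x = dist.cdf(x)
f_z = std_normal.cdf(z)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
print(f"z = {z}, F(x) = {f_x:.4f}, Phi(z) = {f_z:.4f}")
```

Standardising is what lets one table (or one `NormalDist()` object) serve every normal distribution, whatever its mean and standard deviation.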
Central Limit Theorem
If sufficiently large random samples are drawn from the population with replacement, then the distribution of the sample means will be approximately normal.

Sampling Distribution:
Sampling Distribution's Mean ($\mu_{\bar{X}}$) = Population Mean (μ)
Sampling Distribution's Standard Deviation (Standard error) = σ/√n
As a rule of thumb, for n > 30 the sampling distribution is approximately normal.
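A short simulation can illustrate the sampling-distribution properties above. The sketch assumes an exponential population with rate 1 (mean 1, standard deviation 1), chosen only because it is clearly non-normal; the sample size, number of samples, and seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mean = 1 and sd = 1.
pop_mean, pop_sd = 1.0, 1.0
n = 50              # size of each sample
num_samples = 5000  # number of samples drawn

# Independent draws from the distribution play the role of sampling with replacement.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = pop_sd / n ** 0.5

# For a normal sampling distribution, about 95% of sample means
# fall within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - pop_mean) <= 1.96 * standard_error for m in sample_means
) / num_samples

print(f"mean of sample means = {mean_of_means:.4f} (population mean {pop_mean})")
print(f"sd of sample means   = {sd_of_means:.4f} (sigma/sqrt(n) = {standard_error:.4f})")
print(f"share within 1.96 SE = {share_within:.3f}")
```

Even though individual exponential values are heavily skewed, the sample means should cluster around the population mean with a spread close to σ/√n.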
[Figure: sampling takes a sample from the population; inference uses the sample to draw conclusions about the population]
Confidence Interval
Population Mean (μ) = Sample Mean ($\bar{x}$) ± Margin of error
Consider a sample with sample size n, mean $\bar{x}$, and standard deviation S. The y% confidence interval (i.e., the confidence interval corresponding to a y% confidence level) for μ is given by the range:
$CI = \bar{x} \pm Z^{*} \frac{S}{\sqrt{n}}$, where Z* is the Z-score associated with a y% confidence level
1. The probability associated with the claim is called the confidence level.
2. The maximum error made in a sample mean is called the margin of error. [Here, it is $Z^{*} S/\sqrt{n}$]
3. The final interval of values is called the confidence interval. [Here, it is the range]
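The interval formula can be sketched in Python with `statistics.NormalDist` supplying Z*. The function name `confidence_interval` and the simulated sample (40 values drawn with a fixed seed) are assumptions for illustration only; for small samples a t-based interval is often preferred, which this sketch does not cover.

```python
import random
from statistics import NormalDist, fmean, stdev

def confidence_interval(sample, confidence=0.95):
    """Return (lower, upper, margin) using x_bar ± Z* · S / sqrt(n)."""
    n = len(sample)
    x_bar = fmean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    # Z* leaves (1 - confidence) / 2 probability in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * s / n ** 0.5
    return x_bar - margin, x_bar + margin, margin

# Simulated sample of 40 values (hypothetical data, fixed seed for repeatability).
random.seed(1)
sample = [random.gauss(30, 5) for _ in range(40)]

lower95, upper95, margin95 = confidence_interval(sample, 0.95)
lower99, upper99, margin99 = confidence_interval(sample, 0.99)

print(f"95% CI: ({lower95:.2f}, {upper95:.2f}), margin of error {margin95:.2f}")
print(f"99% CI: ({lower99:.2f}, {upper99:.2f}), margin of error {margin99:.2f}")
```

Raising the confidence level increases Z*, so the 99% interval is wider than the 95% interval for the same sample.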
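Population/Sample | Term | Notation | Formula
Population (X1, X2, X3, ....., XN) | Population Size | N | Number of items/elements in the population
Population (X1, X2, X3, ....., XN) | Population Mean | μ | $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
Population (X1, X2, X3, ....., XN) | Population Variance | σ² | $\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^{2}$
Sample (X1, X2, X3, ....., Xn) (Sample of Population) | Sample Size | n | Number of items/elements in the sample
Sample (X1, X2, X3, ....., Xn) (Sample of Population) | Sample Mean | $\bar{X}$ | $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Sample (X1, X2, X3, ....., Xn) (Sample of Population) | Sample Variance | S² | $S^{2} = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^{2}$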
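The difference between the population and sample variance formulas is only the denominator (N versus n - 1). The sketch below computes both by hand on a small hypothetical data set and compares them with `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # hypothetical values, used only for illustration

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((x - mean) ** 2 for x in data)

population_variance = squared_deviations / n       # divide by N
sample_variance = squared_deviations / (n - 1)     # divide by n - 1

print(f"mean = {mean}")
print(f"population variance = {population_variance:.4f}")
print(f"sample variance     = {sample_variance:.4f}")
print(statistics.pvariance(data), statistics.variance(data))
```

Treat the data as a full population when using the first formula and as a sample drawn from a larger population when using the second.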