STATISTICAL INFERENCE
Dr. Samuel Adjorlolo
OUTLINE
• Fundamentals of Statistical Inference
• Hypothesis Testing
INTRODUCTION
Inferential statistics allow researchers to draw
conclusions about population parameters, based on
statistics from a sample.
Inferences are based on probability, so there is always a
risk of error
Fundamentals of Probability
• There are two mutually exclusive possibilities in testing the
effectiveness of transcutaneous nerve treatment for
alleviating pain:
The experimental treatment is not effective in reducing pain
• This is called the null hypothesis (H0)
The experimental treatment is effective in reducing pain
• This is called the alternative hypothesis (H1)
Probability of a Single Event
• P(event) = number of ways the specified event can
occur/total number of possible outcomes
• The probability of obtaining heads when a coin is tossed is 1/2 =
0.5
• The probability of obtaining a 6 when a die is rolled is 1/6 ≈
0.17
Probability of Consecutive Events
• The multiplicative law of probability:
p(A and then B) = p(A)*p(B);
• A is the first independent event
• B is the second independent event
• Probability of obtaining two consecutive heads when a coin is
tossed = 0.5*0.5 = .25
• What is the probability of obtaining ten consecutive heads?
Toss No.   Possible Outcomes and Probabilities
1          H = .500
2          HH = .250
3          HHH = .125
4          HHHH = .063
5          HHHHH = .031
6          HHHHHH = .016
7          HHHHHHH = .008
8          HHHHHHHH = .004
9          HHHHHHHHH = .002
10         HHHHHHHHHH = .001
This table can be used to test the hypothesis that a coin is biased
H0: The coin is fair
H1: The coin is biased
What is the probability of obtaining 10 consecutive heads?
• Ten consecutive heads would occur only about once in 1,000
attempts with a fair coin, so we can reject H0 and conclude
that the coin is biased.
• This analogy extends to the research arena, where
researchers often base their decisions on probability
tables or values.
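The coin-tossing table can be reproduced with the multiplicative law. This is a minimal sketch: the function name prob_consecutive_heads is an illustrative assumption, and the coin is assumed to be fair, as H0 states.

```python
def prob_consecutive_heads(n_tosses, p_heads=0.5):
    """Probability of n_tosses heads in a row, by the multiplicative law."""
    return p_heads ** n_tosses

# Rebuild the table for 1 to 10 tosses of a fair coin (H0: the coin is fair).
table = {n: prob_consecutive_heads(n) for n in range(1, 11)}
for n, p in table.items():
    print(f"{n:2d}  {'H' * n:<10}  {p:.6f}")
```

The slide table shows the same values rounded to three decimals.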
Sampling Distribution and Error
• A sample statistic does not necessarily equal the
corresponding population parameter because of sampling
error
• Sampling error reflects the tendency of statistics to fluctuate
from one sample to another
• It is the difference between the obtained sample value and
population parameter
Central Limit Theorem
• Stipulates that the mean of the sampling distribution of the
mean is identical to the population mean of raw scores.
• As sample size increases, the sampling distribution of the
mean approaches a normal distribution, and sample means
cluster more closely around the population mean
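A small simulation can illustrate these points. This is only a sketch: the population of 10,000 uniformly distributed scores, the sample sizes, and the function name sample_means are all assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 raw scores (assumed for illustration only).
population = [random.uniform(0, 100) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

def sample_means(sample_size, n_samples=2_000):
    """Draw repeated random samples and return the mean of each one."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]

print(f"population mean = {pop_mean:.2f}")
for n in (5, 25, 100):
    means = sample_means(n)
    print(
        f"N={n:3d}  mean of sample means={statistics.fmean(means):.2f}  "
        f"SD of sample means={statistics.stdev(means):.2f}  "
        f"population SD/sqrt(N)={pop_sd / n ** 0.5:.2f}"
    )
```

The mean of the sample means should stay close to the population mean at every sample size, while the spread of the sample means shrinks as N grows.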
Standard Error of the Mean (SEM)
• It is referred to as the standard deviation of a sampling
distribution of the mean.
• The term "error" implies that the various sample means
drawn from a population contain some error as estimates
of the population mean.
• The term "standard" signifies that the SEM is an index of the
average amount of error across all possible sample means
Standard Error of the Mean (SEM)
• The smaller the SEM, the more accurate are the
sample means as estimates of the population value.
• Sx̅ = SD/√N
• Where
– Sx̅ = estimated SEM
– SD = standard deviation of the sample
– N = number of cases in the sample
Example:
Let SD = 5 and N = 25
SEM = 5/√25 = 1
How can we decrease the SEM?
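The SEM formula is easy to check in code. A minimal sketch, where the function name standard_error_of_mean is an illustrative assumption:

```python
import math

def standard_error_of_mean(sd, n):
    """Estimated SEM: sample standard deviation divided by sqrt(N)."""
    return sd / math.sqrt(n)

print(standard_error_of_mean(5, 25))   # slide example: SD = 5, N = 25 -> 1.0
print(standard_error_of_mean(5, 100))  # same SD, four times as many cases -> 0.5
```

With the SD held constant, increasing the sample size decreases the SEM; quadrupling N halves it.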
ESTIMATION OF PARAMETERS
• Inferential statistics address two broad goals:
• Estimate the value of population parameters
– estimates can be point estimates, interval estimates,
or both
• Test hypotheses (the most widely used)
Point Estimation
• Calculating a single value to estimate the parameter.
– E.g., a mean of 10 could be taken as representing a
population mean
• Major challenges
– It offers no context for interpreting its accuracy
– It gives no information regarding the probability that it is
correct or close to the population mean.
Interval Estimation
• Calculating a range of values that has a high probability of
containing the population value.
• This involves constructing a confidence interval (CI)
with values at the boundaries serving as confidence
limits
• E.g., CI: 59-63 (59 and 63 are lower and upper limits
respectively).
Calculating Confidence Interval
• When the population SEM is known
• In calculating CI, confidence level, sample mean and
SEM are used.
• E.g., 95% CI = x̅ ± (1.96* σx̅ )
• Let x̅ = 61 and population SEM = 1
• 95% CI = 61 ± (1.96*1)
• 95% CI = 59.04≤µ≤62.96
When 99% is used instead to reduce error, by substitution:
• 99% CI = 61 ± (2.58*1)
• 99% CI = 58.42≤µ≤63.58
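The known-SEM calculation can be sketched in Python. The function name z_confidence_interval is an illustrative assumption; the critical z values come from the standard library's NormalDist rather than a printed table.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sem, confidence=0.95):
    """CI for the population mean when the population SEM is known."""
    # Two-tailed critical z value, about 1.96 for 95% and 2.58 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return sample_mean - z * sem, sample_mean + z * sem

# Slide example: sample mean = 61, population SEM = 1.
for level in (0.95, 0.99):
    lower, upper = z_confidence_interval(61, 1, level)
    print(f"{level:.0%} CI: {lower:.2f} to {upper:.2f}")
```

The higher confidence level produces the wider interval, which is the price paid for reducing the risk of error.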
Calculating Confidence Interval
• When the population SEM is not known
• The t distribution is used to calculate the CI
• The t distribution is similar to the normal distribution, but
its shape is affected by sample size.
• There is a different t distribution for each sample size, and
a different critical t value for each confidence level.
Calculating Confidence Interval
• When the population SEM is not known
• E.g., 95% CI = x̅ ± (t* sample SEM)
• Let N= 25, SD = 5, x̅ = 61 and sample SEM = 1, t = 2.06
• 95% CI = 61 ± (2.06*1)
• 95% CI = 58.94≤µ≤63.06
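The same example can be sketched in code. The function name t_confidence_interval is an illustrative assumption, and the critical t value is passed in as a tabled value rather than computed.

```python
import math

def t_confidence_interval(sample_mean, sd, n, t_critical):
    """CI for the population mean using the sample SEM and a tabled t value."""
    sem = sd / math.sqrt(n)  # estimated SEM from the sample
    margin = t_critical * sem
    return sample_mean - margin, sample_mean + margin

# Slide example: N = 25, SD = 5, mean = 61; t = 2.06 is the tabled value
# for a 95% two-tailed interval with N - 1 = 24 degrees of freedom.
lower, upper = t_confidence_interval(61, 5, 25, t_critical=2.06)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Because t (2.06) is larger than z (1.96), this interval is slightly wider than the one obtained when the population SEM is known.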
HYPOTHESIS TESTING
• Begins with the assumption that H0 is true, and
involves making a decision to accept (retain) or reject H0
• H0 always states an absence: no difference, no effect, or
innocence
• Researchers usually state only H1 in their research work
• Researchers are interested in H1, but can support it only by
testing H0
ERRORS IN RESEARCH
                 True H0                   False H0
Accept H0        Correct decision          Type II error
                 Probability = 1 - α       Probability = β
Reject H0        Type I error              Correct decision
                 Probability = α           Probability = 1 - β
Controlling the Risk of Type I Error
• Type I errors can be controlled through level of
significance (α)
• α = .05 means accepting the risk of rejecting a true H0 in 5
out of 100 samples.
• α = .01 means accepting the risk of rejecting a true H0 in 1
out of 100 samples.
• How about α = .001?
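The meaning of α can be checked by simulation: draw many samples from a population in which H0 is true and count how often the test rejects it. This is a sketch under assumed values (a normal population with mean 50 and SD 10, samples of 25, a two-tailed z test); the function name type_i_error_rate is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha, n_trials=4_000, n=25, mu=50.0, sigma=10.0, seed=1):
    """Share of samples from a true H0 that a two-tailed z test rejects."""
    rng = random.Random(seed)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    sem = sigma / n ** 0.5
    rejections = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / sem
        if abs(z) > z_critical:
            rejections += 1  # H0 is true here, so every rejection is a Type I error
    return rejections / n_trials

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha={alpha}: observed Type I error rate = {type_i_error_rate(alpha):.4f}")
```

The observed rates should fall close to the chosen α, so α = .001 corresponds to roughly one false rejection in 1,000 samples.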
Controlling the Risk of Type II Error
• The risk of a Type II error is affected by several factors:
Sample size
Research design
Type of statistical test
Strength of relationship between variables
DECREASING TYPE I & II ERRORS
• There is a trade-off between decreasing Type I and Type
II errors
• Establishing a stricter α to decrease the risk of a
Type I error increases the risk of a Type II
error
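The trade-off can be made concrete by computing β for different α levels. This is a sketch for a two-tailed z test with a known population SD; the function name type_ii_error and the numbers used (a true difference of 5, SD of 10, N of 25) are assumptions for illustration.

```python
from statistics import NormalDist

def type_ii_error(alpha, true_diff, sigma, n):
    """Beta for a two-tailed z test when the true mean differs from H0 by true_diff."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff / (sigma / n ** 0.5)  # true difference in SEM units
    # Probability that the test statistic still falls inside the retention region.
    return std_normal.cdf(z_critical - shift) - std_normal.cdf(-z_critical - shift)

for alpha in (0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, true_diff=5, sigma=10, n=25)
    print(f"alpha={alpha}: beta={beta:.3f}, power={1 - beta:.3f}")
```

In this sketch β rises as α becomes stricter. Calling type_ii_error with a larger n lowers β at any given α, which is one way to reduce both risks at once.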
STEPS IN HYPOTHESIS TESTING
Determine the appropriate test statistic
Establish the level of significance
Decide whether to use a one-tailed or two-tailed test
Calculate the test statistic
Determine the degrees of freedom
Compare the computed test value against a tabled
value
Determine Appropriate Test Statistic
• The selection of a statistical test depends on several
factors:
The nature of the comparison
The number of groups being compared
The level of measurement of the independent variable (IV)
and dependent variable (DV)
Statistical assumptions
Establish the Level of Significance
• Determine the criterion for the decision to reject H0
before the analyses are undertaken.
• E.g., α = .05, .01, or .001
Determine Whether One-tailed or Two-Tailed Test
• A two-tailed test uses both tails of a sampling distribution to
determine the critical region for rejecting H0
– E.g., There will be a significant difference between males and
females in performance on the medical nursing exam
(non-directional hypothesis)
• A one-tailed test uses one tail of a sampling distribution to
determine the critical region for rejecting H0
– E.g., Males will perform significantly better than females on the
medical nursing exam (directional hypothesis)
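The practical difference between the two kinds of test shows up in the critical value. The sketch below assumes a z test (normal sampling distribution); the function name critical_z is hypothetical.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Critical z value that marks off the rejection region."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha between both tails; a one-tailed test does not.
    return NormalDist().inv_cdf(1 - alpha / tails)

print(f"two-tailed, alpha=.05: reject H0 if |z| > {critical_z(0.05, tails=2):.3f}")
print(f"one-tailed, alpha=.05: reject H0 if z > {critical_z(0.05, tails=1):.3f}")
```

At the same α, the one-tailed critical value is smaller, so a difference in the predicted direction is easier to declare significant, while a difference in the opposite direction cannot be declared significant at all.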
Calculate the Test Statistic
• Compute the value of the test statistic using the appropriate
formula.
• The computed value of the test statistic is called the
“obtained” statistic
– E.g., t-obtained, F-obtained
Determine the Degrees of Freedom
• Degrees of freedom refer to the number of
components that are free to vary about a parameter.
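One common case, used here as an assumed illustration, is the set of deviations around a sample mean: once N - 1 of them are known, the last is fixed, giving N - 1 degrees of freedom. The scores below are hypothetical.

```python
scores = [4, 8, 6, 10]  # hypothetical sample, N = 4
mean = sum(scores) / len(scores)

# Deviations from the sample mean must sum to zero, so once the first
# N - 1 deviations are known, the last one is no longer free to vary.
deviations = [x - mean for x in scores]
last_from_others = -sum(deviations[:-1])

degrees_of_freedom = len(scores) - 1
print(deviations, last_from_others, degrees_of_freedom)
```

Other tests define degrees of freedom differently, so the N - 1 rule applies only to cases like this one.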
Compare Computed test value against
Critical Value
• If the absolute value of computed statistic is greater
than the tabled value, the result is statistically
significant at the specified probability level; if the
computed statistic is smaller, then the results are
nonsignificant.
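The comparison rule can be sketched for a one-sample t test. The data and the function names one_sample_t and is_significant are hypothetical; the critical value is taken from a t table rather than computed.

```python
import math

def one_sample_t(sample_mean, hypothesized_mean, sd, n):
    """t-obtained for a one-sample test of a mean."""
    return (sample_mean - hypothesized_mean) / (sd / math.sqrt(n))

def is_significant(statistic, critical_value):
    """Two-tailed rule: significant when |statistic| exceeds the tabled value."""
    return abs(statistic) > critical_value

# Hypothetical data: mean = 61, SD = 5, N = 25, H0: population mean = 58.
# 2.06 is the tabled t for alpha = .05 (two-tailed) with 24 degrees of freedom.
t_obtained = one_sample_t(61, 58, 5, 25)
print(t_obtained, is_significant(t_obtained, critical_value=2.06))
```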
Hypothesis Testing With Statistical
Software
• Rather than comparing the computed statistic with
critical values to reject or retain H0, statistical software
provides the exact probability level (p value) for each
statistical analysis.
• The decision to reject or retain H0 is made by comparing
the probability level obtained to a predetermined
statistical significance level (α): reject H0 if p is less
than α; otherwise retain H0
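The probability-based decision can be sketched as follows. The example assumes a two-tailed z test, and the z values and function names two_tailed_p_value and decide are hypothetical; real statistical packages report the probability level directly.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Two-tailed probability of a z statistic at least this extreme under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Reject H0 when the obtained probability is below the preset alpha."""
    return "reject H0" if p_value < alpha else "retain H0"

# Hypothetical z statistics, for illustration only.
for z in (1.50, 2.20):
    p = two_tailed_p_value(z)
    print(f"z={z:.2f}  p={p:.4f}  ->  {decide(p)}")
```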