PSY 240:
Statistics in Psychology
One Sample
Statistics:
Calculating
Significance
for Paired
Means
Exam Review
Exam 2 statistics:
M = 75.90, SD = 15.72, Skew = -.43 (moderate)
Scores were lower than Exam 1 but had more
variability – basically there were fewer B’s
Paired Samples Test
Paired Differences, Pair 1 (Exam1 - Exam2)
Mean      Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t       df   Sig. (2-tailed)
7.93103   12.76415         2.37024           3.07581        12.78626       3.346   28   .002
(95% CI = 95% Confidence Interval of the Difference)
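As a rough check on the output above, the standard error and t statistic can be recomputed from the reported mean and standard deviation. This is only a sketch: the sample size is not printed in the table, so N = 29 is an assumption inferred from df = 28, and the variable names are illustrative.

```python
import math

# Values reported in the Paired Samples Test output.
mean_diff = 7.93103   # mean of (Exam1 - Exam2)
sd_diff = 12.76415    # standard deviation of the difference scores
df = 28               # degrees of freedom reported in the output
n = df + 1            # assumption: paired design, so N = df + 1

se_diff = sd_diff / math.sqrt(n)   # standard error of the mean difference
t_stat = mean_diff / se_diff       # tests H0: mean difference = 0

print(f"SE = {se_diff:.5f}, t({df}) = {t_stat:.3f}")
```

The recomputed values should agree with the Std. Error Mean and t columns to rounding.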
The Possibility of Being
Wrong
Remember that when a finding has a very
low probability (lower than alpha) we
reject the null
But since this is based on probability, it's
possible that we could be wrong
With an alpha = .05, an outcome this extreme
would occur simply by chance 5% of the
time when the null is true
Thus, when the null is true, we will be wrong 5 times out of 100
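A small simulation can make the 5% figure concrete. The sketch below is not part of the original slides: it assumes a one-sample z test with a known population SD of 1, a hypothetical sample size of 25, and a fixed random seed, and it counts how often a true null hypothesis is rejected.

```python
import random
from statistics import NormalDist

def type_i_error_rate(n_studies=20000, n=25, alpha=0.05, seed=1):
    """Share of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    rejections = 0
    for _ in range(n_studies):
        # The true population mean is 0, so the null hypothesis is correct.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)   # sigma = 1 is treated as known
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_studies

rate = type_i_error_rate()
print(f"Proportion of false positives: {rate:.3f}")
```

The proportion should land close to alpha, which is the sense in which we are "wrong 5 times out of 100" when there is no true effect.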
Possibilities and
Probabilities
                         Actual Situation (Often Unknown)
Researcher’s Decision    No True Difference    True Difference
Not Significant          Correct Decision      Type II Error
                         (p = 1 – α)           (p = 1 – power)
Significant              Type I Error          Correct Decision
                         (p = α)               (p = power)
The Possibility of Being Right
Just as we are wrong 5% of the time for
Type I errors (rejecting the null when we
should not), there is some incidence of error
when we should reject the null but do not
This is called Type II error, and is dependent
on several things
Sample Size (N)
α (usually .05 or .01)
Effect size (e.g., Cohen’s d)
The Possibility of Being Right
Power is the probability of rejecting the null
hypothesis given that we should
In other words, if there is a difference in the population,
what is the chance that we will find it in our study?
These things together contribute to “power”:
Sample Size (N) Larger N means more power
α (either .05 or .01) Larger α means more power
Effect size (Cohen’s d) Larger effect means more power
We like power and want the probability to be high
High power = Greater chance of getting significant result
Low power = The study might be a waste of time
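The three influences on power listed above can be illustrated with a simplified calculation. This sketch assumes a two-tailed one-sample z test (a normal approximation, not the t-based power tables used in class), and the function name and the specific values of d, N, and α are chosen only for illustration.

```python
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate two-tailed power of a one-sample z test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5   # expected value of the test statistic if d is true
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

baseline = approx_power(d=0.5, n=20, alpha=0.05)
more_people = approx_power(d=0.5, n=50, alpha=0.05)     # larger N
smaller_alpha = approx_power(d=0.5, n=20, alpha=0.01)   # smaller alpha
larger_effect = approx_power(d=0.8, n=20, alpha=0.05)   # larger effect

for label, value in [("baseline", baseline), ("larger N", more_people),
                     ("alpha = .01", smaller_alpha), ("larger d", larger_effect)]:
    print(f"{label:12s} power = {value:.3f}")
```

Changing one input at a time shows the direction of each effect: power rises with N, α, and d, and falls when α is made stricter.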
The Concept of Power
We want Statistical Power
We want to find significance when true differences
exist (otherwise why run the study?)
Specifically, our goal is to correctly identify treatment
effects
Often, we need to balance power against the
likelihood of making an error
This can be calculated or estimated
Psychological studies tend to have low power
Factors Influencing Power
Maximizing Chances for
Significance
The primary ways of increasing
power
Larger mean differences
Decreasing variability within groups
Increasing sample size
Can you see how these affect the
formula for the significance test?
Maximizing Chances for
Significance: Effect size
With a larger difference between
means, we will have a larger number
in the numerator of our test statistic
It is easier to see the difference
between two distributions if they
overlap less
Maximizing Chances for
Significance: N
N affects the SE by increasing our confidence
that our sample is like the population (i.e., the
law of large numbers)
Think of the SE formula: SD/√N
As N increases, the SE decreases
Since the SE is the denominator of t and z, our test
statistic will get larger with more people
N has NOTHING to do with effect size – only our
judgment of statistical significance
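The SE formula can be tried with a few sample sizes. In this sketch the SD of 15.72 is borrowed from the Exam 2 statistics, while the raw mean difference of 5 points and the sample sizes are hypothetical values used only to show the pattern.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)

sd = 15.72        # Exam 2 SD from the review slide
mean_diff = 5.0   # hypothetical raw difference, held constant

for n in (10, 40, 160):
    se = standard_error(sd, n)
    # The raw effect stays the same; only the SE (and so the statistic) changes.
    print(f"N = {n:3d}  SE = {se:.3f}  test statistic = {mean_diff / se:.3f}")
```

Quadrupling N halves the SE, so the test statistic doubles even though the difference between means has not changed.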
Maximizing Chances for
Significance: Error variance
Our SD really represents variability in scores
that we cannot understand
We know that scores vary, but why?
Sometimes this is due to group differences, but
not everyone within a group is the same
Thus, the SD is “statistical error” and is why we
account for it in the SE
A large SD will increase our SE, making our test
statistic smaller
Making Errors
Type I Error: Finding statistical significance
when there is no true difference
Also known as a False Positive
The frequency of making this error = α
Type II Error: Failing to find statistical
significance when there is a true
difference
Also known as a False Negative
Using Power Tables
Power tables provide the probabilities of finding
significance (i.e., power) for various combinations
of effect size and sample size
For the example: With d = 1.0 and n = 3, Power = .157
(for a 2 sample test)
This means that we only gave ourselves a 16% chance of
even finding statistical significance with this effect
size!
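The tabled value can be approximated by simulation rather than looked up. The sketch below assumes normally distributed scores with equal variances in both groups, a two-tailed test at α = .05 (critical t of 2.776 for df = 4), and a fixed seed; the function names are illustrative and the result is an estimate, not an exact power value.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulated_power(d=1.0, n=3, t_crit=2.776, n_studies=20000, seed=240):
    """Share of simulated two-group studies that reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2 = [rng.gauss(d, 1.0) for _ in range(n)]   # true difference of d SDs
        # With equal n, the pooled variance is the average of the two variances.
        ms_within = (sample_variance(g1) + sample_variance(g2)) / 2
        se_diff = (ms_within / n + ms_within / n) ** 0.5
        t = (sum(g1) / n - sum(g2) / n) / se_diff
        if abs(t) > t_crit:
            hits += 1
    return hits / n_studies

power = simulated_power()
print(f"Estimated power: {power:.3f}")
```

The estimate should fall near the tabled .157, with some simulation noise.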
Effect and Sample Sizes
The more people we have in our sample,
the less sampling error we have and the
more accurate our parameter estimates
will be
Large effect sizes are relatively easy to
detect (meaning we don't need big sample
sizes to declare statistical significance)
Small effect sizes are difficult to detect
(meaning we need large sample sizes to
declare statistical significance)
Estimating Power
If N = 3 for a dependent samples design, what
effect size would be needed to achieve power =
.80?
If d = 1.0, what sample size would be needed in
order to achieve power = .80?
What is the power associated with a medium
effect size and n = 50 for both a dependent
and independent samples design?
Moral of the story…
Within-subjects designs are usually more
powerful than between-subjects designs
So, when possible, have your subjects
complete as many measures as possible
However, consider the consequences of long
experiments, such as participant fatigue, and
perhaps even frustration!
Repeated/Paired Measures
When we have data on two similar
measurements for each individual, we
can ask two different questions
Is there a mean difference between the
measures (and is it significant)?
Is there a correlation between the measures
(and is it significant)?
Both questions can be answered by
using a variation of the t test procedure
Example Data
(with Scatterplot)
Testing Mean Differences
For each person, create a difference
score (a number that reflects the
difference in measurement between
the two time points)
Run a one-sample t test
(conceptually speaking) on the
difference scores
Computing the Difference
Scores
Participant Q1 Q2 Q1 – Q2
1 5 7 -2
2 6 7 -1
3 6 8 -2
4 7 8 -1
5 8 9 -1
M 6.4 7.8 -1.4
SD 1.14 .837 .548
Computing the paired-samples
t test
The difference scores had a M = -1.4 and
SD = .548 and N = 5
So…
$$ t(N - 1) = \frac{M_{\text{Difference}} - \mu_{\text{Difference}}}{SD_{\text{Difference}} / \sqrt{N}} $$
Calculation of the paired t
$$ t(4) = \frac{-1.4 - 0}{0.548 / \sqrt{5}} = \frac{-1.4}{.245} = -5.71 $$
For an alpha (α) of .05: CVt = 2.776
The approximate p value is less than .01 because the
absolute value of our t statistic (5.71) is between the
critical values for .01 and .001
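The same calculation can be run from the raw Q1 and Q2 scores in the example table. This is a minimal sketch using Python's standard library; the critical value 2.776 is taken from the slide, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

q1 = [5, 6, 6, 7, 8]
q2 = [7, 7, 8, 8, 9]

# One difference score per participant (Q1 - Q2).
diffs = [a - b for a, b in zip(q1, q2)]

m_diff = mean(diffs)                        # mean difference
sd_diff = stdev(diffs)                      # SD of the difference scores
se_diff = sd_diff / math.sqrt(len(diffs))   # standard error
t_stat = (m_diff - 0) / se_diff             # null hypothesis: mean difference = 0

t_crit = 2.776   # two-tailed critical value for df = 4, alpha = .05
print(f"M = {m_diff:.1f}, SD = {sd_diff:.3f}, SE = {se_diff:.3f}")
print(f"t({len(diffs) - 1}) = {t_stat:.3f}, significant: {abs(t_stat) > t_crit}")
```

The sign of t is negative because Q2 scores are higher than Q1 scores; significance is judged on the absolute value.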
Cohen’s d for paired samples
Conceptually, there is nothing new about
calculating effect size
$$ d = \frac{M_{\text{Difference}}}{SD_{\text{Difference}}} $$
$$ d = \frac{-1.4}{.548} = -2.55 $$
This is a very large effect size (a magnitude of 2.55),
which would make sense because we had a
significant result with a very small sample
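For the example data, the effect size follows directly from the difference scores. This sketch uses the unrounded SD, so the result can differ from the hand calculation in the third decimal place.

```python
from statistics import mean, stdev

# Difference scores (Q1 - Q2) from the example table.
diffs = [-2, -1, -2, -1, -1]

# Cohen's d for paired samples: mean difference over SD of the differences.
d = mean(diffs) / stdev(diffs)
print(f"d = {d:.2f}")
```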
Steps in Creating the CI for the
Mean Difference (Raw Effect)
1. Establish the center of the interval.
2. Obtain the t score appropriate for the
level of confidence.
3. Estimate the standard error.
4. Calculate the confidence limits in raw
score values by using the following
equation:
$$ CI_D = M_D \pm (CV_t)(SE_D) $$
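The four steps can be sketched for the Q1/Q2 example. The critical value 2.776 (df = 4, 95% confidence) comes from the earlier slide; everything else is computed from the difference scores, and the variable names are illustrative.

```python
import math
from statistics import mean, stdev

diffs = [-2, -1, -2, -1, -1]   # Q1 - Q2 difference scores

center = mean(diffs)                            # step 1: center of the interval
t_crit = 2.776                                  # step 2: t for 95% confidence, df = 4
se_diff = stdev(diffs) / math.sqrt(len(diffs))  # step 3: standard error
lower = center - t_crit * se_diff               # step 4: confidence limits
upper = center + t_crit * se_diff

print(f"95% CI for the mean difference: [{lower:.3f}, {upper:.3f}]")
```

Because the interval does not contain 0, it agrees with the significant t test for these data.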
The Significance Test and
Confidence Interval
Two Sample
Statistics:
Understanding
the Sampling
Distribution
Our Basic Sampling
Distribution
The Sampling Distribution of the
Difference between Means
Theoretical probability distribution of all
possible mean differences for samples of a
given size from the population
Characterized by a mean and the standard
error of the mean
But what is the shape of this sampling
distribution? Is it reasonable to assume
that it is normal?
The Sampling Distribution
The basic characteristics of the sampling distribution of
each mean difference are essentially the same: there is
some location and some spread (SE)
How is a distribution of mean differences created?
How does the average mean difference relate to the
population mean difference?
The standard deviation of mean differences is NOT the
same as the standard deviation of scores or the same as
the standard error for either mean
Estimating the Two
Population Means
The central limit theorem suggested that in the one
sample case, our sample mean was the best estimate
of the population mean
In the two sample case, the difference between our
sample means is the best estimate of the difference
between the two population means
Thus, the difference between the means is the location
(mean) of our sampling distribution
If we assume the null hypothesis is true, then the center will
be equal to 0, and if standardized, have a SE = 1
Standard Error of the
Difference between Means
But, in order to standardize, we need the raw SE
…
The standard deviation of the sampling
distribution of mean differences is a function of
the standard errors of each mean
If sample sizes are equal:
$$ SE_{\text{Difference}} = \sqrt{SE_{M_1}^2 + SE_{M_2}^2} $$
If sample sizes are unequal, the process is more
complex
Standard Error of the
Difference between Means
If sample sizes are unequal, it is necessary
to pool our estimates of population
variance first:
$$ MS_{WITHIN} = \frac{SS_{WITHIN}}{df_{WITHIN}} = \frac{SS_1 + SS_2}{df_1 + df_2} $$
Then the standard error of the difference is:
$$ SE_{\text{Difference}} = \sqrt{\frac{MS_{WITHIN}}{n_1} + \frac{MS_{WITHIN}}{n_2}} $$
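The pooling procedure can be written as a short function. This sketch works from each group's n and SD, recovering the sums of squares as SS = (n - 1) × SD². The function name is illustrative; the first call uses the control and experimental groups from the two-group example later in these slides, and the second uses hypothetical unequal groups.

```python
import math

def pooled_se(n1, sd1, n2, sd2):
    """Standard error of the difference between means, using pooled variance."""
    ss1 = (n1 - 1) * sd1 ** 2   # sum of squares, group 1
    ss2 = (n2 - 1) * sd2 ** 2   # sum of squares, group 2
    ms_within = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))   # pooled variance
    return math.sqrt(ms_within / n1 + ms_within / n2)

# Two-group example from the slides: n = 3 and SD = 2.0 in each group.
print(f"Equal n:   SE = {pooled_se(3, 2.0, 3, 2.0):.5f}")

# Hypothetical unequal groups, for illustration only.
print(f"Unequal n: SE = {pooled_se(10, 2.0, 30, 3.0):.5f}")
```

With equal sample sizes the pooled result matches the simpler formula that combines the two squared standard errors.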
Two Sample
Statistics:
Calculating
Two Sample
Statistics
Independent Groups t Test
Since the population variance is
estimated, we use a t formula:
$$ t(N - 2) = \frac{M_1 - M_2}{SE_{\text{Difference}}} $$
Note the similarity to the one sample
t formula
Steps in Creating the CI for the
Mean Difference (Raw Effect)
1. Establish the center of the interval.
2. Obtain the t score appropriate for the
level of confidence.
3. Estimate the standard error.
4. Calculate the confidence limits in raw
score values by using the following
equation:
$$ CI_D = (M_1 - M_2) \pm (CV_t)(SE_{\text{Diff}}) $$
Standardized Effect Size
Cohen’s d Statistic
$$ d = \frac{\mu_1 - \mu_2}{\sigma} \cong \frac{M_1 - M_2}{\sqrt{MS_{WITHIN}}} $$
The denominator is like a “pooled” or average SD
Important Points
Interpreted similarly to a standardized score
Rule of thumb interpretations for size
(.2 = small, .5 = medium, .8 = large)
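A short sketch of the two-group effect size, using the square root of the pooled variance as the denominator. The function names are illustrative, the labels simply apply the rule-of-thumb cutoffs listed above to the absolute value of d, and the example values come from the two-group example in these slides.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled within-group SD."""
    ms_within = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(ms_within)

def size_label(d):
    """Rule-of-thumb label based on the magnitude of d."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below the 'small' benchmark"

# Control (M = 4, SD = 2, n = 3) versus experimental (M = 6, SD = 2, n = 3).
d = cohens_d(4.0, 2.0, 3, 6.0, 2.0, 3)
print(f"d = {d:.2f} ({size_label(d)})")
```

The sign only reflects which mean is subtracted from which; the size interpretation depends on the magnitude.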
A Two Group Example
Suppose we compare a
control group to an
experimental group.
Using an α = .05, are the
two groups significantly
different?
What is the confidence
interval for the mean
difference?
What is the standardized
effect size?
Working the Example
Group Statistics (Score)
Group          N   Mean     Std. Deviation   Std. Error Mean
Control        3   4.0000   2.00000          1.15470
Experimental   3   6.0000   2.00000          1.15470
Independent Samples Test
t-test for Equality of Means (Score)
t        df   Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
-1.225   4    .288              -2.0000           1.63299                 -6.53392       2.53392
(95% CI = 95% Confidence Interval of the Difference)
Steps for hand calculation
$$ SE_D = \sqrt{1.1547^2 + 1.1547^2} = \sqrt{1.333 + 1.333} = \sqrt{2.667} = 1.633 $$
$$ t(N - 2) = \frac{M_1 - M_2}{SE_D} $$
$$ t(6 - 2) = \frac{4 - 6}{1.633} = \frac{-2}{1.633} = -1.225 $$
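The hand calculation can be checked in a few lines, and the same standard error gives the confidence interval. This sketch uses the group means and standard errors from the Group Statistics output; the critical value 2.776 (df = 4, α = .05) is taken from the earlier paired-samples slide, and the variable names are illustrative.

```python
import math

m1, m2 = 4.0, 6.0     # control and experimental means
se1 = se2 = 1.1547    # standard error of each mean
df = 6 - 2            # N - 2

se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # equal-n formula
t_stat = (m1 - m2) / se_diff

t_crit = 2.776   # two-tailed critical t for df = 4, alpha = .05
lower = (m1 - m2) - t_crit * se_diff
upper = (m1 - m2) + t_crit * se_diff

print(f"SE = {se_diff:.3f}, t({df}) = {t_stat:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
print(f"Significant at .05: {abs(t_stat) > t_crit}")
```

The interval contains 0 and the absolute t is below the critical value, so both approaches lead to the same non-significant conclusion.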
APA Method of Presentation
An independent samples t test indicated that
the difference between the control group
(M = 4.00, SD = 2.00) and the experimental
group (M = 6.00, SD = 2.00) was not statistically
significant, t(4) = -1.225, p = .288, d = 1.0. The
95% confidence interval for the difference
between means ranged from -6.534 to 2.534.