
SPSS Data Analysis in Psychology Research

SPSS is a widely used software for quantitative data analysis in psychology, which employs statistical methods to understand human behavior. Research methods include quantitative and qualitative approaches, with hypotheses guiding data analysis through various statistical tests. Key concepts include levels of measurement, operationalization, confidence intervals, and effect sizes, all essential for interpreting research findings.

Uploaded by

mk6944143
Copyright
© All Rights Reserved

SPSS is mainly used for data analysis, and this analysis involves quantitative data. There are many software programs for data analysis, such as SPSS, Python, and AMOS, but SPSS is among the most commonly used in psychology.

Crash Course in Statistics and the Use of SPSS:


We study statistics because it helps us perform data analysis, which is a key part of research. In data analysis, we work with numbers and need to understand their range and meaning to judge what the results actually tell us.

Many studies in psychology employ the scientific method, just like those
in the natural sciences. However, studying people is not easy because
humans differ from one another and also change over time. For
example, one person might react differently from another in the same
situation, and even the same person might respond differently at
different stages of life. A person’s reaction during adolescence may not
be the same in adulthood.

Because of this variability, we cannot expect near-perfect accuracy (such as 99.9%), which makes psychology different from subjects such as physics or chemistry, where things behave in more predictable and consistent ways.

In recent years, the statistical methods used in psychology have become more advanced, thanks to powerful software like SPSS, Python, AMOS, and Mplus.

The main goal of psychological research is to understand human behavior and experience more deeply. To achieve this, data collection is essential: in general, the more good-quality data we collect, the more precise our results become.
We collect data using two main methods:

1. Quantitative methods, which involve numerical data and are analyzed using tools like SPSS or AMOS.
2. Qualitative methods, which focus on non-numerical data and are analyzed using software such as NVivo.

The research method we use depends on what kind of question we are asking and what type of data we need to collect.

• If we want to collect numbers or measurable data (for example, “How many students feel stressed before exams?”), we use quantitative methods like surveys or experiments.
• If we want to understand people’s thoughts, feelings, or experiences (for example, “How do students describe their experience of exam stress?”), we use qualitative methods like interviews or observations.

Levels of Significance (p < 0.05, 0.01, 0.001): these thresholds help us interpret the results of our research. We compare our results against these levels to decide whether they are statistically significant, that is, unlikely to have occurred by chance alone.

In quantitative research, we use variables to study relationships, impacts, or predictions. When we want to see the relationship between variables, we use correlation analysis. When we want to see whether one variable predicts (or has an impact on) another, we use regression analysis.

There is a difference between groups and variables. Variables are what we study, for example, levels of depression or anxiety. Groups, on the other hand, are categories from which we collect information, such as demographic characteristics. Groups can include factors like sex, siblings, social status, city, country, province, school, university, or salary.

For example, if we want to see the difference between males and females in levels of depression, we use a t-test, because it compares two groups. If we have more than two groups, we use ANOVA (Analysis of Variance).
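The following is a minimal sketch of what a t-test and a one-way ANOVA compute, using only the Python standard library. The depression scores are hypothetical.

The t-test shown is Student's pooled-variance version, which assumes equal variances. SPSS also reports a row for "equal variances not assumed." Both functions stop at the test statistic and its degrees of freedom, whereas SPSS also converts these into a p-value.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Student's t for two independent groups (pooled variance, equal variances assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))   # standard error of the mean difference
    t = (mean(group_a) - mean(group_b)) / se
    return t, n_a + n_b - 2                        # t statistic and degrees of freedom

def one_way_anova_f(*groups):
    """F ratio for a one-way ANOVA: between-group variance / within-group variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical depression scores
males, females = [10, 12, 14], [14, 16, 18]
t, df = independent_t(males, females)              # two groups -> t-test

students, workers, retirees = [10, 12, 14], [14, 16, 18], [18, 20, 22]
f, df_b, df_w = one_way_anova_f(students, workers, retirees)   # three groups -> ANOVA
```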

We also use factor-analytic methods to reduce data, for example, to summarize many questionnaire items with a few underlying factors. After data collection, we use descriptive statistics to summarize demographic information.

In quantitative research, we identify independent variables (IVs) and dependent variables (DVs) and form hypotheses based on a review of existing literature.

For example, if we study three variables (shame toward menstruation, self-objectification, and attitude toward menstruation), we can form the following hypotheses:

1. There will be a positive relationship between self-objectification, attitude toward menstruation, and shame among female university students. (This is a directional hypothesis, tested with correlation analysis.)
2. There will be a significant relationship between self-objectification, attitude, and shame toward menstruation among female university students. (This is a non-directional hypothesis, also tested with correlation analysis.)
3. Self-objectification and attitude will predict shame toward menstruation among female university students. (This is tested with regression analysis.)

Thus, the type of hypothesis we make determines which data analysis we should use.

Levels of measurement tell us how we collect and classify our data. There are four levels of measurement: nominal, ordinal, interval, and ratio.
1. Nominal: In the nominal level, we categorize the data but do not rank
it. For example, ethnicity, city, gender, or marital status — these are
categories, but they have no particular order.

2. Ordinal: The ordinal level includes both categories and ranking. For
example, in university, students may be in the first, second, or third
semester — these are categories that also show rank or order. Another
example is English proficiency levels: beginner, intermediate, and
advanced. These levels show both categorization and ranking.

3. Interval: The interval level includes categorization, ranking, and numerical values with equal distances between them. The main feature of this level is that there is no true zero: zero does not mean “nothing.” For example, temperature in degrees Celsius and IQ test scores are measured on an interval scale, because zero does not mean the complete absence of temperature or intelligence.

4. Ratio: The ratio level includes categorization, ranking, and numerical values, and it also has a true zero, which means zero represents the complete absence of what is being measured. Examples include weight, height, and age.

• Arbitrary zero means the zero is not true: it does not show the real absence of something.
✅ Example: Temperature – 0°C does not mean “no temperature”; it is simply the point at which water freezes. So, this is an arbitrary zero.

• True zero means the zero is real: it shows the complete absence of what is being measured.
✅ Example: Weight – 0 kg means “no weight at all.” So, this is a true zero.
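A practical consequence of an arbitrary zero is that differences are meaningful but ratios are not. The short Python sketch below shows this. It uses the Kelvin scale, which is not mentioned in the text. Kelvin starts at absolute zero, so it has a true zero.

```python
def celsius_to_kelvin(c):
    """Kelvin has a true zero (absolute zero); Celsius does not."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Interval scale (Celsius): differences are meaningful, ratios are not
difference = warm_c - cool_c      # "10 degrees warmer" is a valid statement
naive_ratio = warm_c / cool_c     # 2.0, but 20°C is NOT "twice as hot" as 10°C

# On a scale with a true zero (Kelvin), the ratio is only about 1.035
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Weight is a ratio scale, so 60 kg really is twice as heavy as 30 kg
weight_ratio = 60.0 / 30.0
```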
HYPOTHESIS

A hypothesis is an educated guess or a testable statement about what you think will happen in your research.

It is something you can test through data or experiments to see whether the evidence supports it.

🧠 Example (simple):

You think students who sleep more before an exam perform better.
So your hypothesis could be:

“Students who sleep at least 8 hours before an exam will get higher test
scores than those who sleep less.”

Then, you collect data (sleep hours and test scores) to test if this is true.

✳️Two main types of hypotheses:

1. Null Hypothesis (H₀):
o Says there is no difference or no relationship.
o Example: “There is no difference in test scores between
students who sleep more and those who sleep less.”
2. Alternative Hypothesis (H₁ or Ha):
o Says there is a difference or relationship.
o Example: “Students who sleep more score higher than those
who sleep less.”
Conceptual Definition:

A conceptual definition explains what a concept means in general, using ideas or theory rather than numbers.
It tells what the concept is about according to psychological theory or the research literature.

✅ Example (Depression):
Depression is a psychological condition that includes both physical and
psychological symptoms.

• Physical symptoms may include hypersomnia (sleeping too much), insomnia (trouble sleeping), and weight gain or loss.
• Psychological symptoms may include hopelessness and suicidal thoughts.

So, this is the conceptual definition: it explains what depression means as a concept.

Operational Definition:

An operational definition explains how the concept will be measured in your study.
It tells what tool, scale, or method you will use to define or measure it.

✅ Example (Depression in your study):
In this study, depression will be measured using the Beck Depression Inventory (BDI).
A higher score on the scale indicates a higher level of depression.

✅ Example (Anxiety in your study):
In this study, anxiety will be defined as the score obtained on the State-Trait Anxiety Inventory (STAI).
A higher score represents a higher level of anxiety.
In short:

• Conceptual definition = meaning of the concept (theory-based).
• Operational definition = how you measure it in your study (scale- or method-based).

Operationalization means turning an idea (something abstract) into something you can actually measure or test.

For example, if you want to study happiness, you can’t directly see or
measure happiness — it’s a concept.
So you must decide how to measure it — maybe by using a happiness
questionnaire, or by asking people to rate their happiness on a scale
from 1 to 10.

That process (turning the idea “happiness” into a measurable thing) is called operationalization.
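Once a concept has been operationalized, each participant's answers become a score. The Python sketch below is a minimal illustration under assumptions. The 5-item happiness questionnaire, its 1 to 5 rating range, and the function name are all hypothetical. Summing item ratings is one common scoring rule; a real scale's manual defines its own rule.

```python
def happiness_score(item_ratings, low=1, high=5):
    """Total score on a hypothetical happiness questionnaire.

    Each item is rated from `low` to `high`; a higher total is taken to
    indicate greater happiness (this is the operational definition).
    """
    for rating in item_ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
    return sum(item_ratings)

participant = [4, 5, 3, 4, 5]          # hypothetical answers to five items
print(happiness_score(participant))    # 21
```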
Between-Subjects Study Design versus Within-Subjects Design
In a between-subjects design, each participant experiences only one condition of the independent variable. In contrast, in a within-subjects design, each participant experiences all conditions of the independent variable.

Between-group differences examine how two or more groups differ from one another, while within-group differences explore how the same group varies under different conditions.

For example, comparing test scores between two different classes would represent a between-group difference, whereas comparing the same class’s scores under different conditions (for example, before and after a revision session) would represent a within-group difference.

Correlational Design
A correlational design is used to find out if there is a relationship
between two or more variables — without changing or controlling
them.
In this design, the researcher only observes and measures variables
as they naturally occur.

Quasi-Experimental Design

A quasi-experimental design is similar to an experimental design, but it lacks random assignment or does not have a control group.
The researcher still manipulates an independent variable but cannot randomly assign participants, often because doing so would not be ethical or practical.

Experimental Design

Experimental research is a type of study where the researcher changes one or more variables (called the independent variables) to see how they affect another variable (called the dependent variable).
Participants are randomly assigned to different groups, for example, a treatment group that receives the intervention and a control group that does not.
By comparing the results of these groups, researchers can find out whether one thing actually causes another.
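Random assignment can be sketched in a few lines of Python. The function name and participant IDs below are hypothetical. The approach is to shuffle the list and then split it in half. With an odd number of participants, the control group gets the extra person. Passing a seed makes the assignment reproducible, which helps when documenting a study.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle participants, then split them into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

groups = randomly_assign(["P01", "P02", "P03", "P04", "P05", "P06"], seed=42)
print(groups)
```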

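P-Value

P-values are used in research to judge whether the results from a sample are likely to reflect a real effect or could easily have happened by chance. Formally, a p-value is the probability of getting results at least as extreme as those observed if the null hypothesis were true.
Usually, a p-value below a chosen threshold (0.05, or the stricter 0.01 or 0.001) is considered statistically significant, meaning the result is unlikely to be due to chance alone. Even though some researchers suggest using a smaller threshold, 0.05 is still the most common.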
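The formal definition can be illustrated with a permutation test. This technique is not mentioned in the text and differs from the t-test SPSS runs, but it follows the same logic.

If the null hypothesis is true, the group labels do not matter. So we shuffle the labels many times and count how often a mean difference at least as large as the observed one appears. The sleep-and-exam scores and the function name are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of label shuffles that give a mean difference at
    least as extreme as the observed one (with a +1 correction so the
    p-value is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)   # pretend group membership is random (H0)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

sleep_more = [78, 85, 88, 90, 92]   # hypothetical exam scores
sleep_less = [65, 70, 72, 75, 80]
p = permutation_p_value(sleep_more, sleep_less, n_permutations=2000)
print(f"p = {p:.3f}")
```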

What is a Confidence Interval (CI)?

A confidence interval is a range of numbers that estimates the true value (like the average) for an entire population.
It also tells us how confident we can be that the true value lies inside that range.

🎓 Example:

Let’s say a university wants to know the average height of all its
students.

But there are thousands of students — too many to measure!

So, the researcher measures only 100 students (a sample).

They find that the average height of those 100 students is 165 cm.

Then, they calculate a 95% confidence interval of 160–170 cm.

This means:

“We are 95% sure that the true average height of all students (the whole
population) is between 160 cm and 170 cm.”

So, even though they didn’t measure everyone, the CI gives a range that
likely contains the true value.
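The following is a minimal Python sketch of how such an interval can be computed: sample mean ± critical value × standard error. It uses the normal (z) critical value, which is a reasonable approximation for a sample of 100. SPSS and most textbooks use the t distribution instead, which gives a slightly wider interval for small samples. The function name and the heights are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate CI for a population mean using a normal (z) critical value."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))          # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
    return m - z * se, m + z * se

# Hypothetical heights (cm) for 100 students: mean 165 cm
heights = [150.0, 160.0, 170.0, 180.0] * 25
lower, upper = mean_confidence_interval(heights)
print(f"95% CI: {lower:.1f} to {upper:.1f} cm")
```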

📊 Why do we use Confidence Intervals?

Because researchers can’t test every person in a population, they test only a sample.
So, their results are estimates, not exact values.

A confidence interval tells us how accurate that estimate probably is.

🧠 Confidence Level (like 95%)

The confidence level (such as 95%) tells you how sure you are that the true value lies inside the interval.

• 95% CI → We are 95% confident that the real value lies in that range.
• 99% CI → We are 99% confident, but the range will be wider (to be extra sure).
• 90% CI → We are 90% confident, but the range will be narrower.
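The trade-off between confidence and width comes from the critical value that multiplies the standard error. The Python sketch below uses the normal approximation and a hypothetical helper name. A higher confidence level needs a larger multiplier, so the interval gets wider.

```python
from statistics import NormalDist

def critical_z(level):
    """Two-sided critical z value for a given confidence level (normal approximation)."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: mean ± {critical_z(level):.3f} × standard error")
```

This prints multipliers of about 1.645, 1.960, and 2.576 respectively.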

Population vs. Sample: the basic idea

• A population means everyone or everything you want to study.
• A sample means a smaller group taken from that population.
• If we collect data from every person in the population, the average (mean) we calculate describes that population and is called a parameter.
• But usually, it’s too hard to study everyone, so we only collect data from a smaller group (a sample). When we calculate the average or standard deviation from that smaller group, it’s called a statistic.
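The Python sketch below shows the distinction with hypothetical numbers. The standard library has separate functions for a population standard deviation, pstdev, which divides by N. It also has stdev for a sample standard deviation, which divides by n − 1 so that it better estimates the population value.

```python
from statistics import mean, pstdev, stdev

population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical: every member measured

# Parameters describe the whole population
mu = mean(population)          # population mean
sigma = pstdev(population)     # population SD (divides by N)

# Statistics are computed from a sample and estimate the parameters
sample = [4, 5, 7, 9]          # hypothetical sample drawn from the population
x_bar = mean(sample)           # sample mean
s = stdev(sample)              # sample SD (divides by n - 1)
```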

Descriptive Statistics

Descriptive statistics help to summarize and simplify large amounts of data.
Their purpose is to give a clear picture of what the data look like by showing a few representative or typical values.
This may include a measure of central tendency (like the mean) and a measure of spread (like the standard deviation). The appropriate measures depend on the type of data.
POINT ESTIMATE

A point estimate gives us our best guess.

For example, if the average height of a group (sample) is 1.73 meters, we can use this as our guess for the average height of everyone in the population.
But it’s unlikely that the real average is exactly 1.73 meters; it’s just our best estimate.
The limitation of the point estimate is that it doesn’t tell us how close or accurate our guess might be.

As an alternative, we use a confidence interval.

A confidence interval gives us two numbers, a lower and an upper limit, that show the range where the true value is likely to fall.
It also tells us how confident we can be about our estimate.
A wide interval means the estimate is less precise; a narrow interval means it is more precise.
For example, if our 95% confidence interval is 1.6 m to 1.8 m, we are 95% confident that the true average height is between 1.6 and 1.8 meters.

BOOTSTRAPPING

Bootstrapping is a resampling method, available in SPSS, used to estimate parameters and how precise those estimates are.

For example, if we are interested in the average height of a population and have collected data from a sample of 100 individuals, we can use bootstrapping to estimate the mean height of the entire population.

In this method, we create many new samples, each containing 100 observations, and calculate the mean for each of these samples.
But where do these new samples come from? SPSS does not collect additional data; instead, it uses your existing sample and applies a technique called random resampling with replacement, meaning that data points are randomly selected from your original sample, and each value can be chosen more than once.
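The resampling idea can be sketched in Python. This is a minimal illustration under assumptions, not SPSS's implementation. The function names and heights are hypothetical, and the 95% interval uses the simple percentile method. random.choices draws with replacement, so a value can appear more than once in a resample.

```python
import random
from statistics import mean

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Means of bootstrap resamples drawn WITH replacement from `sample`."""
    rng = random.Random(seed)
    n = len(sample)   # each resample has the same size as the original sample
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]

def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap estimates."""
    ordered = sorted(values)
    lower_idx = int((1 - level) / 2 * len(ordered))
    upper_idx = int((1 + level) / 2 * len(ordered)) - 1
    return ordered[lower_idx], ordered[upper_idx]

heights = [160, 162, 165, 168, 170, 158, 172, 166, 164, 175]   # hypothetical (cm)
boot = bootstrap_means(heights, n_resamples=1000, seed=1)
print(percentile_interval(boot))
```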

INFERENTIAL STATISTICS

Inferential statistics are used to draw conclusions and make inferences about a population after analyzing data collected from a sample (for example, through surveys or experiments).
They involve hypothesis testing and estimation to make comparisons, predictions, and generalizations about a population based on sample data.
This approach also involves making certain assumptions about the data and population.

CONFIDENCE INTERVAL AND STATISTICAL INFERENCES

The confidence interval approach discussed earlier can also be used to make statistical inferences.
SPSS can calculate the mean difference (for example, between two groups or two conditions), which serves as an estimate of the true difference between the two population means.
This mean difference can be expressed as either a point estimate or a confidence interval.

For instance, if the lower and upper limits of the 95% confidence interval for the mean difference are 1.3 and 2.2, we can be 95% confident that the true mean difference lies somewhere between 1.3 and 2.2. Because this interval does not include zero, the difference would also be statistically significant at the 0.05 level (two-tailed).
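The Python sketch below computes an approximate confidence interval for the difference between two independent means. It then checks whether the interval excludes zero. It assumes reasonably large groups. It uses a normal critical value and an unpooled standard error, so its numbers can differ slightly from the t-based interval SPSS reports. The function names and scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def mean_difference_ci(group_a, group_b, level=0.95):
    """Approximate CI for the difference between two independent means."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se

def excludes_zero(interval):
    """True if zero lies outside the interval (difference significant at that level)."""
    lower, upper = interval
    return lower > 0 or upper < 0

condition_a = [5, 6, 7] * 20   # hypothetical scores, mean 6
condition_b = [3, 4, 5] * 20   # hypothetical scores, mean 4
ci = mean_difference_ci(condition_a, condition_b)
print(ci, excludes_zero(ci))
```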

EFFECT SIZE

Effect size shows how strong the relationship between variables is, or how large the difference between groups is.
It reflects the practical significance of a research finding, not just whether the result is statistically significant.

A large effect size suggests the finding may have real-world or practical importance, while a small effect size suggests that the result may have limited practical value.
Effect size tells us the practical implications of our data and helps us understand how much impact an intervention, treatment, or manipulation of an independent variable has.

According to Cohen (1988), for the standardized mean difference (Cohen’s d):

• An effect size of 0.2 is considered small,
• An effect size of 0.5 is considered medium, and
• An effect size of 0.8 is considered large.
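The benchmarks above refer to Cohen's d, the difference between two means divided by a pooled standard deviation. The following is a minimal Python sketch with hypothetical function names and scores. The "negligible" label for values below 0.2 is an added convention, not part of the text.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = sqrt(((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b))
                     / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd

def label_effect(d):
    """Cohen's (1988) rough benchmarks, applied to the absolute value of d."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"   # added convention for values below 0.2

d = cohens_d([10, 12, 14], [14, 16, 18])   # hypothetical scores
print(d, label_effect(d))
```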
PARAMETRIC TESTS

SPSS does not indicate which descriptive statistics you should use to
summarize your data or which inferential statistics are appropriate for
testing your hypothesis.

Most of the inferential statistical tests discussed in the recommended book are parametric tests, as these tests generally have greater statistical power. However, their use is limited to situations where the data meet certain conditions, including the following:

1. The data are measured on an interval or ratio scale.
2. The data are drawn from a normally distributed population.
3. The samples being compared come from populations with equal variances.

In some cases, SPSS provides information that helps you determine whether your data violate any of these assumptions. Additionally, certain statistical tests may have different or additional assumptions beyond those listed.

NON-PARAMETRIC TESTS

When the data do not meet all these requirements, researchers use nonparametric tests.
These are inferential tests that make fewer assumptions about the data and their distribution.
However, they are generally less powerful than their parametric counterparts.
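Many nonparametric tests work on ranks instead of raw scores, which is why they need fewer assumptions. One common example is the Mann-Whitney U test, an alternative to the independent-samples t-test that the text does not name. The Python sketch below computes only the U statistic, not its p-value. The function names are hypothetical.

```python
def rank_with_ties(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j) / 2 + 1   # average of 1-based positions i..j
        i = j + 1
    return [ranks[v] for v in values]

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a, computed from ranks rather than raw scores."""
    ranks = rank_with_ties(list(group_a) + list(group_b))
    n_a = len(group_a)
    rank_sum_a = sum(ranks[:n_a])
    return rank_sum_a - n_a * (n_a + 1) / 2

# An extreme score (1000) changes nothing, because only its rank matters
print(mann_whitney_u([1, 2, 3], [4, 5, 6]), mann_whitney_u([1, 2, 3], [4, 5, 1000]))
```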
Central Tendency

Measure of Central Tendency


Central Tendency refers to a central or middle value that represents an entire set of numbers.

The measure of central tendency is a way to describe a set of data by identifying a single value that represents the whole set of data.

There are three main measures of central tendency: mean, median, and mode.

The Mean (Average)

The most commonly used measure of central tendency is the average, also called the mean. It is found by adding all the numbers and then dividing the total by how many numbers there are.

The mean is a single value that represents the whole group.

Suppose five students scored 70, 80, 90, 85, and 75 in a test.
Add them up: 70 + 80 + 90 + 85 + 75 = 400.
Divide by 5 students → 400 ÷ 5 = 80.

This means that, on average, the students scored 80 on the test.
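The same calculation in Python, using the five test scores from the example:

```python
from statistics import mean

scores = [70, 80, 90, 85, 75]
total = sum(scores)               # 400
average = total / len(scores)     # 400 / 5 = 80.0
assert average == mean(scores)    # the standard library gives the same result
```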


The Mode
The mode is another method to find the central tendency.

The mode is the most repeated value in a group of numbers.

In our dream example, the mode is 8, because three students had 8 dreams, and no other number was repeated that many times.

Another way to understand the mode is that it is the number with the
highest frequency.
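Finding the mode means counting frequencies, as in the Python sketch below. The hours-of-sleep values are hypothetical, not the dream data from the example. statistics.multimode shows that a dataset can have more than one mode.

```python
from collections import Counter
from statistics import mode, multimode

hours_of_sleep = [7, 8, 6, 8, 7, 8, 5]    # hypothetical data

counts = Counter(hours_of_sleep)          # frequency of each value
most_common_value = mode(hours_of_sleep)  # 8, because it appears three times
all_modes = multimode([1, 1, 2, 2, 3])    # [1, 2]: two values tie for most frequent
```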

The Median

Another method to find the middle value is the median.

To find it, arrange all the numbers from smallest to largest; the number that falls in the middle is the median. If there is an even number of values, the median is the average of the two middle values.
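The sort-then-take-the-middle procedure can be written by hand in Python. The helper name and the numbers are hypothetical. The result is checked against statistics.median for both an odd and an even number of values.

```python
from statistics import median

def median_by_hand(values):
    """Sort, then take the middle value (or the mean of the two middle values)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_count = [3, 9, 1, 7, 5]    # sorted: 1, 3, 5, 7, 9 -> middle value is 5
even_count = [4, 1, 3, 2]      # sorted: 1, 2, 3, 4 -> (2 + 3) / 2 = 2.5
```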

Measures of Dispersion

Dispersion means scatter, variation, or spread.

Measures of dispersion are numeric values that show how spread out or close together the data are around the middle value.

The five most common measures of dispersion are:

• Range
• Variance
• Standard deviation
• Mean deviation
• Quartile deviation

When the data vary more, the measure of dispersion becomes larger.

Range

It is the simplest and quickest way to measure how much the data vary.

The range is the difference between the largest and smallest values in a data set (largest minus smallest).
This difference gives a quick idea of the spread of the data.
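The Python sketch below uses the five test scores from the mean example. It also shows a limitation of the range: because it uses only the two most extreme values, a single unusual score can change it a lot. The added score of 30 is hypothetical.

```python
scores = [70, 80, 90, 85, 75]                     # test scores from the mean example
data_range = max(scores) - min(scores)            # 90 - 70 = 20

with_low_score = scores + [30]                    # one hypothetical extreme score
range_with_low_score = max(with_low_score) - min(with_low_score)   # 90 - 30 = 60
```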

Quartile Deviation
Quartile deviation measures how spread out the middle part of the data is.

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). Quartile deviation is half of this range.

Quartiles divide a dataset into four equal parts, so there are three quartiles:

• Q1 (First Quartile) – the lower quartile
• Q2 (Second Quartile) – the middle value (also the median)
• Q3 (Third Quartile) – the upper quartile

The interquartile range is calculated as:

IQR = Q3 − Q1
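In Python, statistics.quantiles with n=4 returns the three quartile cut points, as sketched below with hypothetical scores. Programs use slightly different conventions for computing quartiles. Python's default here is the "exclusive" method, so results may differ a little from other software for the same data.

```python
from statistics import quantiles

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 15]   # hypothetical scores

q1, q2, q3 = quantiles(data, n=4)   # three cut points -> four equal parts
iqr = q3 - q1                       # interquartile range: 10 - 4 = 6
quartile_deviation = iqr / 2        # half the IQR: 3
```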
Measures of Shape

Measures of shape describe the pattern or distribution of data within a dataset.

• For quantitative data (numbers), the distribution can be described because the values have a logical order, and the lowest and highest values can be identified on a histogram.
• For qualitative data (non-numeric), the shape cannot be described in this way.

Shapes of a Dataset
A dataset can be symmetrical or asymmetrical. Two common
examples are:

1. Symmetrical Distribution
o The two sides of the distribution are mirror images of each
other.
o A normal distribution is a perfectly symmetrical
distribution.
o On a histogram, it forms a bell-shaped curve, also called a
normal curve or bell curve.
2. Asymmetrical Distribution (Skewed Distribution)
o The two sides are not mirror images.
o Skewness shows whether values are more frequent at the low or high end.
o Positive skew: The right tail is longer than the left tail.
o Negative skew: The left tail is longer than the right tail.

Interpreting Skewness

• A skewness value between -1 and +1 is considered excellent.
• A value between -2 and +2 is generally acceptable.
• Values beyond -2 or +2 indicate a substantial deviation from normality.
This helps us understand how data are spread and whether they follow a normal pattern.
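The Python sketch below computes skewness and applies the rules of thumb above. It uses the adjusted Fisher-Pearson formula, which is intended to match the skewness statistic SPSS commonly reports. Compare it with your own SPSS output before relying on it. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean

def skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    m = mean(values)
    s = sqrt(sum((x - m) ** 2 for x in values) / (n - 1))   # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def interpret_skewness(value):
    """Rules of thumb from the text."""
    if -1 <= value <= 1:
        return "excellent"
    if -2 <= value <= 2:
        return "generally acceptable"
    return "substantial deviation from normality"

symmetric = [1, 2, 3, 4, 5]               # mirror-image data: skewness 0
right_skewed = [1, 1, 1, 2, 2, 3, 10]     # long right tail: positive skew
print(interpret_skewness(skewness(right_skewed)))
```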

Variance and Standard Deviation

• Variance shows how spread out the numbers in a data set are. It is the average squared distance of each value from the mean, so it is expressed in squared units, which makes it difficult to interpret.

• Standard deviation also shows how spread out the numbers are, but in the same units as the original data, making it easier to understand. It is the square root of the variance.
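Here are the steps in Python for the five test scores from the mean example. This sketch uses the sample formula, which divides by n − 1. That is the usual choice when the data are a sample, as in SPSS's descriptive statistics.

```python
from math import sqrt
from statistics import mean, stdev, variance

scores = [70, 80, 90, 85, 75]
m = mean(scores)                                        # 80
squared_distances = [(x - m) ** 2 for x in scores]      # 100, 0, 100, 25, 25
sample_variance = sum(squared_distances) / (len(scores) - 1)   # 250 / 4 = 62.5 (points squared)
sample_sd = sqrt(sample_variance)                       # about 7.91 points, same units as the scores
```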

The Empirical Rule (the 68-95-99.7 rule) tells us how values are spread in a normal distribution:

• About 68% of values are within 1 standard deviation of the mean.
• About 95% are within 2 standard deviations.
• About 99.7% are within 3 standard deviations.
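These percentages can be checked with the normal distribution in Python's standard library. The sketch below uses a hypothetical helper name. The last lines show a hypothetical application to IQ-style scores with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

def share_within(k, dist=NormalDist()):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")   # about 68.3%, 95.4%, 99.7%

# Hypothetical IQ-style scores: mean 100, SD 15
iq = NormalDist(mu=100, sigma=15)
share_85_to_115 = iq.cdf(115) - iq.cdf(85)           # about 0.68 (within 1 SD)
```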
Outliers are values in a dataset that are much higher or much lower
than most of the other values. They stand out because they are far
away from the rest of the data.

• Example 1 – Test Scores:
If most students scored between 70 and 90 on a test, but one student scored 30, that 30 is an outlier.
• Example 2 – Age:
In a class of students aged 18–22, a student aged 40 would be an outlier.
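The text does not give a rule for deciding what counts as "much higher or much lower." One common rule, not taken from the text, flags values more than 1.5 × IQR below Q1 or above Q3 (Tukey's fences). The Python sketch below applies it to data similar to the two examples. The function name and data values are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3 (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

test_scores = [72, 75, 78, 80, 82, 85, 88, 90, 30]   # hypothetical, like Example 1
ages = [18, 19, 20, 21, 22, 20, 19, 40]             # hypothetical, like Example 2
print(iqr_outliers(test_scores), iqr_outliers(ages))
```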
