Quantitative
Data Analysis
Using SPSS
Elva Susanti
Coding the data for SPSS,
setting up an SPSS database
and entering the data
• The dataset
The data from the 30 questionnaires
are provided in the table. Each row
provides data for one customer: their sex and age,
the counsellor they saw, how many
sessions they attended, and their
satisfaction rating for the
counselling.
SPSS variable name    Coding instruction
Customer              The number assigned to each customer/questionnaire. It is important to
                      assign a number to each customer for three reasons:
                      1. In case you need to refer to the actual questionnaire to check
                         the data for that customer.
                      2. In case you need to add more data for that customer from further
                         questionnaires at a later date.
                      3. In SPSS it is sometimes useful to re-order or 'sort' the data, for example,
                         by putting all the males together at the top of the file.
Sex                   1 = Male
                      2 = Female
Counsellor            1 = Ani
                      2 = Andy
                      3 = Adi
Sessions              Enter number of sessions
Satisfaction          Enter satisfaction rating 1-5
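The coding scheme above can also be written down as a small lookup table. The following is a minimal Python sketch, not part of SPSS: the names CODEBOOK and decode are hypothetical, and the three rows are made-up examples rather than the real questionnaire data.

```python
# Hypothetical codebook mirroring the coding instructions in the table above.
# Variables without value labels (customer, sessions, satisfaction) are left out.
CODEBOOK = {
    "sex": {1: "Male", 2: "Female"},
    "counsellor": {1: "Ani", 2: "Andy", 3: "Adi"},
}


def decode(variable, code):
    """Return the label for a numeric code, or the code itself if no label is defined."""
    return CODEBOOK.get(variable, {}).get(code, code)


# Made-up rows for illustration only (not the real dataset).
rows = [
    {"customer": 1, "sex": 1, "counsellor": 1, "sessions": 8, "satisfaction": 4},
    {"customer": 2, "sex": 2, "counsellor": 3, "sessions": 12, "satisfaction": 5},
    {"customer": 3, "sex": 1, "counsellor": 2, "sessions": 6, "satisfaction": 3},
]

for row in rows:
    print(row["customer"], decode("sex", row["sex"]), decode("counsellor", row["counsellor"]))
```

Storing numbers and decoding them only for display is the same idea as entering codes in Data View and attaching value labels in Variable View.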
Setting up
an SPSS
database
When you open SPSS you
should be faced with the
following screen:
The screen below is known as the Data View
• Each row will contain the data for one
customer. So, in the example on the next
page, we have data for three customers:
• Customer 1 is a male (coded 1), aged 21.
• Customer 2 is a female (coded 2), aged 21.
• Customer 3 is male, aged 40.
Rename variable
• Click on Variable View. This will produce the following screen where you will type in
information about the variables:
• Name
Enter the first SPSS variable name listed in your codebook, i.e., Customer, Sex, Age.
Then press the right arrow on your keyboard to go to the next column, Type.
• Type: what type of data is it?
• Once you have entered a variable name the default value for Type will appear
automatically as numeric. All of our variables will be numeric because we will be
coding any words – such as male and female – as numbers (i.e., 1 for male and 2 for
female). So you can move on to the next column – Width
• In SPSS words are known as string
variables. So, if a variable contained words,
we would click on String and then OK, and
this cell would say String rather than
Numeric. Notice also that SPSS
provides options to enter dates or
currency data.
• Width: how many numbers will you be entering?
SPSS defaults to eight characters. The most we will need is two – for our age and session data (e.g., 24 years, 12 sessions). There is no
need to change this. You would increase it if you were entering very large numbers, for example, 184,333,333.24 (i.e., 11
numbers/characters).
• Decimals
SPSS defaults to two decimal places. Since our data does not require decimal places we can simply click in the Decimals cell and click
the up or down arrows (which appear to the right of the cell) to adjust decimal places needed for that particular variable.
For datasets with many variables that do not have decimal places it may be worthwhile changing the default setting to 0 decimal
places. You can do this by clicking on Edit from the menu at the top of the screen and then choosing Options. Next, click the Data
tab and change the decimal place value to 0 in Display format for new numeric variables.
• Label
The Label column allows you to provide a longer description of your variable,
which will be shown in the output produced by SPSS. You do not need to put anything here for ‘customer’, or the other variables,
since the names are self-explanatory.
• Values
Values are numbers assigned to categories for nominal variables, for
example, where male = 1 and female = 2. Since your first variable
(customer) has no ‘values’, you do not need to put anything here.
• Missing
Sometimes it is useful to assign specific values to indicate different
reasons for missing data. However, SPSS recognizes any blank cell as
missing data and excludes it from any calculations, so if you intend
to leave the cell blank there is no need to enter values for missing
data. And we have no missing data anyway.
• Columns
You can change the column width to reduce the space it takes on
the screen. But you need to allow enough space for variable names,
so the default of eight is usually OK.
• Align
This is usually set at Right, which is OK.
Measure definition
• Measure
The default measure is Scale, but you can change this to Ordinal or
Nominal by clicking in the cell and then on the down arrow on the
right of the cell. We have scale data and nominal data.
Adding value labels
The next variable is ‘sex’. Proceed as you did for ‘customer’ until you get to
Values. Whereas ‘customer’ had no values (it was simply customer numbers), sex
has two values: we will be entering the number 1 for male and the number 2 for
female. So you need to tell SPSS that this is what you are doing.
1 Click in the Values cell and then on the button with three dots on the right side
of the cell. This opens the Value Label box.
2 Click in the box marked Value. Type in 1.
3 Click in the box marked Value Label. Type in male.
4 Click on Add. You will then see in the summary box: 1=male.
5 Repeat this procedure for females (Value = 2; Value Label: female; Add).
6 Then click OK. This information is now stored in your SPSS
database. The only other thing to change is Measure (to
Nominal) and your database should look like this:
• You should now be able to complete the information for the
remaining variables, ensuring that you enter the value labels for
‘counsellor’ (1 = Ani; 2 = Andy; 3 = Adi).
• When you have done this your database should look like this:
Sorting the data
Output
Descriptive statistics
• Frequencies
Running frequencies in SPSS
From the menu at the top of
the screen click Analyze, then
Descriptive
Statistics then Frequencies.
• Choose the variable sex by clicking with your mouse.
• Once sex is highlighted move it across into the variables box by
clicking the arrow. Alternatively you could just double-click it once it
is highlighted.
• Click OK to produce the output:
• When we have missing data, the Percent
and Valid Percent values are different.
This is because the Percent column
calculates the percentage over all cases,
including missing data, whereas Valid
Percent ignores missing data. In the output
above, we have eighteen males, twelve
females and zero missing data: 60 percent
men and 40 percent women, respectively.
Because no data are missing here, the valid
percentages are also based on a total of
30 cases and match the Percent column.
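The difference between the two columns can be checked by hand. The following is a minimal Python sketch under stated assumptions: the function name frequency_table is hypothetical, None stands in for a blank (missing) cell, and the ten missing cases in the second call are invented purely to show the two columns diverging.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (count, percent, valid_percent)}; None marks a missing value."""
    total = len(values)                      # all cases, including missing
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    return {
        category: (n, 100 * n / total, 100 * n / len(valid))
        for category, n in counts.items()
    }


# Counts from the output described above: 18 males, 12 females, no missing data.
sex = ["male"] * 18 + ["female"] * 12
print(frequency_table(sex))

# Hypothetical variant with ten blank cells added, to show the columns diverging.
print(frequency_table(sex + [None] * 10))
```

With no missing data the two percentages are identical; once blanks are added, only the Percent column shrinks.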
• Now try running frequencies for the other categorical variable,
counsellor, to see how many customers each counsellor treated.
• Go to the menu at the top of the screen and click Analyze then
Descriptive Statistics then Frequencies. You should then remove the
variable sex from the frequencies analysis box by double-clicking it with
the left mouse button. You can now run frequencies for counsellor,
which should result in the following
Output: counsellor
Running frequencies for
measures of central tendency
• 1 From the menu at the top of the screen click Analyze, then
Descriptive
Statistics, then Frequencies (remove any existing variables from
previous analyses by double-clicking them to return them to the list).
• 2 Double-click (left mouse button) the variables age, sessions and
satisfaction to move them into the Variables box.
• 3 Click Statistics and click in the boxes next to mean, median, mode,
and minimum and maximum. Then click Continue.
• 16.7 percent said the service was not
very good, and 3.3 percent said the
service was very good
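The statistics ticked in the Frequencies dialogue can be reproduced outside SPSS as a cross-check. The following is a minimal Python sketch using only the standard library: the function name summarize and the sample session counts are hypothetical. When several modes exist, it reports the smallest one, which is how SPSS reports multiple modes.

```python
import statistics


def summarize(values):
    """Return the mean, median, mode, minimum and maximum of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every mode; take the smallest when there are ties.
        "mode": min(statistics.multimode(values)),
        "minimum": min(values),
        "maximum": max(values),
    }


# Made-up session counts for illustration only (not the real dataset).
sessions = [4, 6, 6, 8, 12]
for name, value in summarize(sessions).items():
    print(name, value)
```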
Using graphs
to visually
illustrate the
data
•Producing a bar chart comparing the mean number
of sessions provided by each
counsellor
1 From the menu at the top of the screen click
Graphs, then Chart Builder. A dialogue box may
appear asking if you have set the correct
measurement levels for each variable and included
value labels for categorical variables. Since you
have done both (you have, haven’t you . . . ), put a
tick next to Do not show this dialogue again and
click OK.
2 You should then be faced with the following
screen:
line
Histograms
• Histograms are similar to bar charts but are designed to
represent data along a continuum. The age of customers is
a good example.
Using SPSS to produce a histogram for age
1 From the menu at the top of the screen click Graphs,
then Chart Builder.
2 Select Histogram from the Gallery of Charts to reveal
the range of charts available.
3 Drag the first histogram (top left) into the preview area.
4 From the variables list drag age into the X-axis.
Cross-tabulation and the
chi-square statistic
Running cross-tabulations in SPSS
• 1 From the menu at the top of the screen click Analyze then Descriptive
Statistics then Crosstabs.
• 2 Move the variable sex into the rows box and counsellor into the columns box.
• 3 Click the Cells button. In this box the observed counts should already be ticked (this
simply provides the observed frequencies). You should also click Row percentages –
since we want to know the percentages of males and females who were treated by each
counsellor.
• Cross-tabulation of sex and
counsellor
From the table above, Ani served a smaller proportion
of the customers than Andy or Adi.
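Row percentages of the kind requested in the Cells box can be computed by hand to check a crosstab. The following is a minimal Python sketch: the function name row_percentages is hypothetical, and the counts are invented for illustration, so they do not reproduce the table above.

```python
from collections import Counter


def row_percentages(pairs):
    """Cross-tabulate (row, column) pairs and return {row: {column: percent of that row}}."""
    pairs = list(pairs)
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    table = {}
    for (row, col), n in cell_counts.items():
        table.setdefault(row, {})[col] = 100 * n / row_totals[row]
    return table


# Made-up (sex, counsellor) pairs for illustration only.
pairs = (
    [("male", "Ani")] * 2 + [("male", "Andy")] * 3 + [("male", "Adi")] * 5
    + [("female", "Ani")] * 5 + [("female", "Andy")] * 3 + [("female", "Adi")] * 2
)

for sex, cells in row_percentages(pairs).items():
    print(sex, {counsellor: round(pct, 1) for counsellor, pct in cells.items()})
```

Each row sums to 100 percent, which is what distinguishes row percentages from column or total percentages.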
Running chi-square in SPSS
1 From the menu at the top of the screen click Analyze then
Descriptive Statistics then Crosstabs.
2 Move the variable sex into the rows box and counsellor into the
columns box.
3 Click the Cells button. Ensure that observed and expected counts
are ticked, and row percentages (since we want to know the
percentages of males and females who saw each counsellor).
4 Click Continue to close this box.
5 Click on the Statistics button and put a tick in the chi-square box.
Chi square
Provisional hypothesis formulation
H0: There is no relationship between sex and counsellor.
Ha: There is a relationship between sex and counsellor.
If the asymptotic significance is < 0.05, then H0 is rejected and Ha
is accepted. From the results of the SPSS output, it can be seen
that the asymptotic significance value of 0.667 is greater than 0.05, so H0
is not rejected. It can be concluded that there is no evidence of a
relationship between sex and counsellor.
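The decision rule above can be traced through a hand calculation. The following is a minimal Python sketch of the Pearson chi-square statistic for a 2 x 3 table such as sex by counsellor: the function name chi_square_2x3 is hypothetical and the observed counts are invented, so the result does not reproduce the 0.667 reported in the output. It relies on the fact that, with 2 degrees of freedom, the chi-square p-value is exactly exp(-chi2 / 2), and it refuses tables of any other shape.

```python
import math


def chi_square_2x3(observed):
    """Return (chi2, p) for a contingency table with exactly 2 degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df != 2:
        raise ValueError("this sketch only handles tables with 2 degrees of freedom")
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under H0
            chi2 += (o - expected) ** 2 / expected
    p = math.exp(-chi2 / 2)   # survival function of chi-square with df = 2
    return chi2, p


# Made-up counts: rows are male/female, columns are Ani/Andy/Adi.
observed = [[4, 6, 10], [10, 6, 4]]
chi2, p = chi_square_2x3(observed)
print(round(chi2, 3), round(p, 3))
print("reject H0" if p < 0.05 else "do not reject H0")
```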
Producing a test of normality
Normality data
1 From the menu at the top of the screen click on Analyze then Descriptive
Statistics then Explore.
2 Move satisfaction into the Dependent list and counsellor into the Factor
list.
3 Under Display ensure that there is only a tick next to Plots.
4 Click on the Plots button to open the plots dialogue box.
5 Under Boxplots click None, and remove any ticks under Descriptive. Place a
tick in Normality plots with tests. Under Spread vs Level tick None.
6 Click Continue, then OK.
• The significance values of 0.031 and 0.008 are smaller than 0.05, so the
data for Adi and Andy are not normally distributed, while the
Kolmogorov-Smirnov significance value for Ani is 0.2, which is greater than
0.05, so the data for Ani can be treated as normally distributed.
Producing histograms to check
for normal distributions
• If the points on the normality plot follow the fit line, the variable is
normally distributed.
• If most of the bars of the histogram lie below the
normal curve, the variable is normally distributed.
• If the box of the boxplot is in the middle with both legs of the same length, the
horizontal line is in the middle of the box and there are no points
above or below the legs, the variable is normally
distributed.
• The significance value of 0.023 is smaller than
0.05, so the data are not
normally distributed.
Independent T Test Steps
1 From the menu at the top of the screen click Analyze,
then Nonparametric Tests, then Independent Samples.
2 On the Fields tab, move satisfaction into Test Fields
and counsellor into Groups, then click Run.
Regression
Test
• After the average value is
obtained for each variable,
regression can be done by
clicking on Analyze on the
main menu, then Regression,
then Linear, as follows:
• Click Statistics.
• Then click the Plots option. The Linear Regression: Plots dialog box will appear. In
the Y box enter *ZPRED and in the X box enter *SRESID, then check
Histogram and Normal Probability Plot in the Standardized Residual
Plots box. Then click Continue, as in the following picture:
• As the next step, click the
Save option. The Linear
Regression: Save dialog box will
appear. Check
Unstandardized in the
Residuals column, then click
Continue as follows:
Then we will return to the Linear Regression dialog box. Now that all
the steps have been carried out correctly, the last step
is to click OK. The result of the regression on the
questionnaire data will look like this:
R square is 0.475, so variable X accounts for 47.5% of the
variation in variable Y, which is interpreted here
as a very strong effect.
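To see where an R square value comes from, it can be computed by hand for a single predictor. The following is a minimal Python sketch of ordinary least squares with one X variable: the function name simple_regression and the data are hypothetical, so the result does not reproduce the 0.475 above.

```python
def simple_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R square = 1 - (unexplained variation / total variation)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
slope, intercept, r2 = simple_regression(x, y)
print(round(slope, 2), round(intercept, 2), round(r2, 2))
```

Here R square is the share of the variation in Y that the fitted line accounts for, which is the sense in which 0.475 is read as 47.5%.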
THANK
YOU