SPSS Data Preparation Steps

The document discusses various steps involved in data preparation and analysis using SPSS. It covers questionnaire checking, editing, coding, creating a codebook, data cleaning, and selecting appropriate univariate and multivariate statistical techniques based on the characteristics and properties of the data.

Uploaded by

Vardaan Bhaik

Data Preparation & Analysis with SPSS
Data Preparation Process
1. Check Questionnaire
2. Edit
3. Code
4. Transcribe
5. Clean Data
6. Data Analysis
Questionnaire Checking
A questionnaire returned from the field may be
unacceptable for several reasons.
Editing
Editing the questionnaires involves identifying illegible, incomplete, inconsistent or ambiguous responses.

Treatment of Unsatisfactory Results
– Returning to the Field
– Assigning Missing Values
– Discarding Unsatisfactory Respondents

Coding
Coding means assigning a code, usually a number, to each possible response to each question. The code includes an indication of
– the column position (field), which holds a single item of data, e.g. the sex of a respondent
– the data record, which groups related fields such as sex, marital status, age, income etc.

Coding Questions

• Fixed field codes, which mean that the number of records for each respondent is the same and the same data appear in the same column(s) for all respondents, are highly desirable.

• If possible, standard codes should be used for missing data. Coding of structured questions is relatively simple, since the response options are predetermined.

• In questions that permit a large number of responses, each possible response option should be assigned a separate column.
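The coding rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code values, the missing-data code 9, the "media" question and all function names are hypothetical and do not come from the slides.

```python
# Hypothetical code scheme for illustration; none of these codes are from the slides.
MISSING = 9  # one standard code for missing data

SEX_CODES = {"male": 1, "female": 2}                # structured question
MEDIA_OPTIONS = ["tv", "radio", "print", "online"]  # multiple-response question


def code_structured(answer, codes):
    """Return the numeric code for a structured answer, or MISSING."""
    if answer is None:
        return MISSING
    return codes.get(answer.strip().lower(), MISSING)


def code_multiple_response(selected, options):
    """Give each response option its own column: 1 = chosen, 0 = not chosen."""
    chosen = {s.strip().lower() for s in selected}
    return [1 if option in chosen else 0 for option in options]


def code_respondent(raw):
    """Build one fixed-field record: same columns, same order, for everyone."""
    record = [code_structured(raw.get("sex"), SEX_CODES)]
    record += code_multiple_response(raw.get("media", []), MEDIA_OPTIONS)
    return record


print(code_respondent({"sex": "Female", "media": ["TV", "online"]}))  # [2, 1, 0, 0, 1]
print(code_respondent({"media": []}))                                  # [9, 0, 0, 0, 0]
```

Every respondent ends up with the same five columns, which is the fixed-field property the first bullet asks for.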
Coding
Guidelines for coding unstructured questions:

• Category codes should be mutually exclusive and collectively exhaustive.

• Only a few (10% or less) of the responses should fall into the “other” category.

• Category codes should be assigned for critical issues even if no one has mentioned them.

• Data should be coded to retain as much detail as possible.

Coding: Codebook
A codebook contains coding instructions and the necessary
information about variables in the data set. A codebook generally
contains the following information:

• column number
• record number
• variable number
• variable name
• question number
• instructions for coding
Data Cleaning
Consistency Checks

Consistency checks identify data that are out of range, logically inconsistent, or have extreme values.
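A minimal sketch of two such checks, out-of-range values and one logical inconsistency, is given below. The variables, their legal codes and the car-ownership rule are assumptions made up for the example. Extreme-value screening is not shown.

```python
# Hypothetical codebook excerpt: variable name -> legal codes (9 = missing).
LEGAL_VALUES = {
    "sex": {1, 2, 9},
    "age": range(18, 100),       # legal ages 18 to 99
    "owns_car": {1, 2, 9},       # 1 = yes, 2 = no
    "cars_owned": range(0, 10),
}


def out_of_range(record):
    """Return the variables whose value is not a legal code."""
    return [name for name, legal in LEGAL_VALUES.items()
            if record.get(name) not in legal]


def logically_inconsistent(record):
    """Example rule: a respondent without a car cannot own one or more cars."""
    return record.get("owns_car") == 2 and record.get("cars_owned", 0) > 0


record = {"sex": 3, "age": 34, "owns_car": 2, "cars_owned": 1}
print(out_of_range(record))            # ['sex']
print(logically_inconsistent(record))  # True
```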
Selecting a Data Analysis Strategy
The data analysis strategy follows from four inputs:
• Earlier Steps of the Research Process
• Known Characteristics of the Data
• Properties of Statistical Techniques
• Background and Philosophy of the Researcher

• Metric Data: Data measured on an interval or ratio scale
• Non-metric Data: Data measured on a nominal or ordinal scale
• Univariate Techniques: Statistical techniques appropriate for analysing data when there is a single measurement of each element in the sample.
• Multivariate Techniques: Statistical techniques appropriate for analysing data when there are two or more measurements on each element in the sample. They examine simultaneous relationships among two or more phenomena.
– Dependence Techniques: Used when one or more of the variables can be identified as dependent variables and the remaining as independent variables.
– Interdependence Techniques: Techniques that attempt to group data based on underlying similarity. No distinction is made as to which variables are dependent or independent.
A Classification of Univariate Techniques
• Metric Data
– One Sample: t test, Z test
– Two or More Independent Samples: two-group t test, Z test, one-way ANOVA
– Two or More Related Samples: paired t test
• Non-metric Data
– One Sample: frequency, chi-square, K-S, runs, binomial
– Two or More Independent Samples: chi-square, Mann-Whitney, median, K-S, K-W ANOVA
– Two or More Related Samples: sign, Wilcoxon, McNemar, chi-square
A Classification of Multivariate Techniques
• Dependence Techniques
– One Dependent Variable: cross-tabulation, analysis of variance and covariance, multiple regression, conjoint analysis
– More Than One Dependent Variable: multivariate analysis of variance and covariance, canonical correlation, multiple discriminant analysis
• Interdependence Techniques
– Variable Interdependence: factor analysis
– Interobject Similarity: cluster analysis, multidimensional scaling
Type I Error & Type II Error
• A Type I error (α) is the mistake of rejecting the null hypothesis when it is true.
• A Type II error (β) is the mistake of failing to reject the null hypothesis when it is false.

                 Ho (True)                   Ho (False)
Accept Ho        Correct Decision (1 - α)    Type II Error (β)
Reject Ho        Type I Error (α)            Correct Decision (1 - β)

Example (filling machine; Ho: the machine is working accurately):
• Type II error: The machine is working erroneously but is assumed to be working accurately, so it keeps filling wrongly, causing loss to the company & customers.
• Type I error: The machine is working accurately but is assumed to be working erroneously, so filling is stopped & a mechanic is called.

23 April 2018
Hypotheses Testing
• Level of Significance (α): The risk a researcher is willing to take of rejecting the null hypothesis when it happens to be true, i.e. the probability of making a Type I error. The higher the significance level, the higher the probability of rejecting a null hypothesis when it is true.

• Critical Region: The rejection region. If the value of the test statistic (e.g. the sample mean) falls within this region, the null hypothesis is rejected.

• Critical Value: The value of a test statistic beyond which the null hypothesis can be rejected.

• Power of Test (1 - β): The ability of a test to reject a false null hypothesis, i.e. the probability of supporting an alternative hypothesis that is true. A value of 1 - β near 1 means the test is working well: it rejects the null hypothesis when it is false.

• One-Tailed Test: The null hypothesis is rejected only for values of the test statistic falling into one specified tail of its sampling distribution.

• Two-Tailed Test: The null hypothesis is rejected for values of the test statistic falling into either tail of its sampling distribution, so a sufficiently large deviation in either direction rejects it. Normally α is divided into α/2 on one side and α/2 on the other.
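To make the link between α, tails and critical values concrete, the sketch below uses Python's standard-library `statistics.NormalDist`. It assumes a Z test whose statistic follows the standard normal distribution. The function names and the example statistic of 1.8 are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def critical_values(alpha, tails):
    """Return the critical z value(s) that bound the rejection region."""
    if tails == 2:
        # Two-tailed: alpha is split into alpha/2 in each tail.
        upper = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    # One-tailed (upper tail shown; the lower-tail value is its negative).
    return (STANDARD_NORMAL.inv_cdf(1 - alpha),)


def reject_null(z_stat, alpha, tails):
    """True when the test statistic falls in the critical region."""
    if tails == 2:
        lower, upper = critical_values(alpha, 2)
        return z_stat < lower or z_stat > upper
    (upper,) = critical_values(alpha, 1)
    return z_stat > upper


print([round(c, 3) for c in critical_values(0.05, 2)])       # [-1.96, 1.96]
print(round(critical_values(0.05, 1)[0], 3))                 # 1.645
print(reject_null(1.8, 0.05, 1), reject_null(1.8, 0.05, 2))  # True False
```

The last line shows why the choice of tails matters: the same statistic of 1.8 falls in the critical region of the one-tailed test but not of the two-tailed test at α = 0.05.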
One Tailed & Two Tailed Test
• A light bulb manufacturer wants to produce bulbs with a mean life of 1000 hours. If the lifetime is shorter, he will lose customers to competitors; if it is longer, production costs will be very high because the filaments will be very thick. Determine the type of test.

• A wholesaler buys bulbs in large lots and does not want to accept them unless their mean life is at least 1000 hours. Determine the type of test.
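One way to write down the hypotheses for the two cases, with μ denoting the mean bulb life in hours (the symbol is an assumption; the slides do not state the hypotheses explicitly): the manufacturer is hurt by deviations in either direction, while the wholesaler only cares about a shortfall.

```latex
\begin{aligned}
\text{Manufacturer (two-tailed):}\quad & H_0: \mu = 1000 \quad\text{vs.}\quad H_1: \mu \neq 1000 \\
\text{Wholesaler (one-tailed, lower tail):}\quad & H_0: \mu \geq 1000 \quad\text{vs.}\quad H_1: \mu < 1000
\end{aligned}
```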
One Tailed Test
Two Tailed Test
Univariate Data Analysis
t-tests (Cases)
One-sample t-test: To test whether the mean of a distribution differs significantly from some preset value

For the given [Link] file, find whether the final marks scored by students differ significantly from the Professor’s goal of a class average of 60. Formulate the hypotheses & test them.
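The statistic behind this case can be sketched in plain Python. The five marks below are made up for illustration and are not the course data. SPSS would also report a p-value, which is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean = mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1


final_marks = [56, 60, 64, 68, 72]  # hypothetical marks, not the course data
t_stat, df = one_sample_t(final_marks, mu0=60)
print(round(t_stat, 3), df)  # 1.414 4
```

With 4 degrees of freedom the two-tailed 5% critical value of t is 2.776, so these made-up marks would not lead to rejecting Ho: mean = 60.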
t-tests (Cases)
Independent-samples t-test: To test whether the means of two independent samples differ significantly from each other

Suppose 15 customers of our brand in each of Mumbai & Delhi are asked to rate our brand on a 7-point scale (1 = most disliked, 7 = most liked).
The ratings by these 30 customers from the two cities are mentioned next. Develop a hypothesis to test whether the ratings from the two cities are different. Also test the hypothesis.
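A minimal sketch of the pooled-variance (equal variances assumed) form of this test follows. The ratings are hypothetical and use five customers per city instead of fifteen to keep the arithmetic visible.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


mumbai = [3, 4, 5, 6, 7]  # hypothetical ratings on the 7-point scale
delhi = [2, 3, 4, 5, 6]   # hypothetical ratings on the 7-point scale
t_stat, df = independent_t(mumbai, delhi)
print(round(t_stat, 3), df)  # 1.0 8
```

Here Ho is that the mean ratings in the two cities are equal. The pooled form shown assumes the two populations have equal variances.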
t-tests (Cases)
Paired-sample t-test: To test whether two measurements on the same sample differ significantly

Suppose there are 18 customers of the Passion brand of garments. This set of customers is to be monitored for their attitude towards the Passion brand before and after the release of an advertising campaign. The attitude is to be measured on a 10-point scale (1 = highly disliked, 10 = highly liked).

The ratings by these 18 customers before and after the advertising campaign are mentioned next. Develop a hypothesis to test whether these ratings by customers are different. Also test the hypothesis.
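Because both ratings come from the same customers, the test works on the per-customer differences. The sketch below illustrates this with five hypothetical before/after pairs rather than the 18 in the case.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


before = [4, 5, 6, 5, 7]  # hypothetical ratings before the campaign
after = [6, 6, 7, 7, 9]   # hypothetical ratings after the campaign
t_stat, df = paired_t(before, after)
print(round(t_stat, 3), df)  # 6.532 4
```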
ANOVA
• Whereas t-tests compare only two distributions, analysis of variance can compare many. E.g. with the MARKS file, if we want to see whether Quiz1 scores of men and women differ, i.e. whether men or women score higher in the quiz, a t-test is appropriate.

If, however, we wish to see whether the scores of any of the five ethnic groups differ significantly from each other on the same quiz, one-way analysis of variance is required.

One-way ANOVA means:

Exactly one dependent variable (continuous), here e.g. quiz1 scores
Exactly one independent variable (categorical), here e.g. ethnicity, with 5 levels

Two- (three-) way ANOVA means: exactly one dependent variable & exactly two (three) independent variables

MANOVA: Multiple dependent variables & one or more independent variables

One-Way ANOVA
• File # [Link]
Dependent variable – Quiz 1 scores
Independent variable – Ethnicity (with 5 levels)

– Ho: There is no difference among students of different ethnicities as far as the quiz1 marks scored by them are concerned.

– H1: There is a significant difference among students of different ethnicities as far as the quiz1 marks scored by them are concerned.
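The F ratio that one-way ANOVA reports compares variation between group means with variation within groups. A small sketch follows, using three hypothetical groups of scores instead of the five ethnic groups in the file.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


quiz1_by_group = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical scores
print(one_way_anova(quiz1_by_group))  # (21.0, 2, 6)
```

A large F relative to the critical value for (2, 6) degrees of freedom would lead to rejecting Ho that all group means are equal.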
Chi-Square Test
Graduation background of MBA students & their performance in terms of grade is given below:

Education Background:
• [Link] (1)
• B.E. (2)
• [Link]. (3)
• B.B.A. (4)
• B.A. (5)

Grade Codes:
• A (1)
• B (2)
• C (3)

Ho: Graduation background of MBA students does not influence their performance in terms of grade.
Ha: Graduation background of MBA students influences their performance in terms of grade.
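A sketch of the chi-square test of independence for a background-by-grade cross-tabulation is given below. The counts are invented for illustration, since the slide data are not reproduced here. Only the statistic and its degrees of freedom are computed.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if background and grade were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows = education background codes 1-5, columns = grades A, B, C.
observed = [
    [8, 10, 4],
    [12, 9, 3],
    [6, 11, 7],
    [5, 8, 9],
    [4, 7, 10],
]
statistic, df = chi_square(observed)
print(round(statistic, 2), df)
```

For a table with 5 backgrounds and 3 grades the degrees of freedom are (5 - 1) × (3 - 1) = 8.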
Correlation (r)
• Degree of association between two sets of quantitative data, e.g. how is crop production correlated with rainfall?
• r varies from -1 to +1; r = 0 (no linear correlation); r = ±1 (perfect correlation)

Bivariate Correlation: Correlation between two variables

• File # [Link]
• To produce the correlation matrix of gender, gpa & final

Partial Correlation: Process of finding the correlation between two variables after the influence of other variables has been controlled for.
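Both quantities can be sketched with the standard formulas. The data values below are hypothetical, and the partial correlation shown is the first-order case with a single control variable z.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rainfall = [60, 80, 100]   # hypothetical values
crop_yield = [20, 30, 40]  # hypothetical values, perfectly linear in rainfall
print(pearson_r(rainfall, crop_yield))  # 1.0
print(partial_r(0.5, 0.0, 0.0))         # 0.5
```

When z is uncorrelated with both variables, as in the last line, controlling for it leaves the correlation unchanged.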
Regression
• Regression explains variation in one variable (the dependent variable) based on the variation in one or more other variables (the independent variables)
• Simple regression: one dependent & one independent variable
• Multiple regression: one dependent & more than one independent variable

• File # [Link]
It is desired to study the effect that six different conditions (independent variables) have on the yield per hectare for a crop of wheat. The research was conducted by accumulating data from fifteen major states in India.
The six independent variables are:
X1 = Rainfall (in cms)
X2 = Soil type (1, low quality to 5, high quality)
X3 = Quantity of fertilizer (in quintal/sq. km of land)
X4 = Land percentage being irrigated by State Agri. Deptt.
X5 = Seed quality (1, low quality to 5, high quality)
X6 = Percentage of automation in cultivation process
The dependent variable is Y = yield per hectare in quintals
Regression
We need to determine:

1. Is the model a good fit? From the ANOVA table (F-value)

2. What % of variation in the dependent variable is explained by the independent variables? From Model Summary (Adjusted R square)

3. Which independent variables are good explanatory variables of the dependent variable? From Coefficients (t-values)

4. Regression Equation
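To show where these quantities come from, the sketch below fits a simple regression with one independent variable, a deliberate simplification of the six-variable case. The data are hypothetical, and the per-coefficient t-values are left out.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b*x with basic fit statistics."""
    n, k = len(x), 1  # k = number of independent variables
    mean_x, mean_y = mean(x), mean(y)
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_error / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    f_value = ((ss_total - ss_error) / k) / (ss_error / (n - k - 1))
    return {"intercept": intercept, "slope": slope, "r_squared": r_squared,
            "adj_r_squared": adj_r_squared, "f_value": f_value}


rainfall = [1, 2, 3, 4, 5]  # hypothetical X1 values
yield_q = [2, 4, 5, 4, 5]   # hypothetical Y values
fit = simple_regression(rainfall, yield_q)
print({name: round(value, 3) for name, value in fit.items()})
# {'intercept': 2.2, 'slope': 0.6, 'r_squared': 0.6, 'adj_r_squared': 0.467, 'f_value': 4.5}
```

For these made-up numbers the regression equation is Y = 2.2 + 0.6 X1. The F-value answers question 1 and the adjusted R square answers question 2.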
