Appendix I
A Refresher on some
Statistical Terms and Tests
Chapter Objectives
• Provide a ‘refresher’ of some statistical terms and tests
• Explain what types of analysis are appropriate, under
what conditions and for what objectives
• Give examples of SPSS computer output
• Explain descriptive statistics, including frequencies,
means, standard deviations, and variance
• Present a process for statistical hypothesis testing using
a computer package
• Demonstrate how inferential statistics can be used to test
hypotheses
Statistics
• Descriptive Statistics
– Help to describe the phenomena of interest
• Inferential Statistics
– Help to draw conclusions from the analysis of
data
• Parametric
– Assumes sample drawn from normal population
• Non-parametric
– Makes no assumption that the sample is drawn from a normally distributed population
Properties of the Four Measurement Scales
Highlights
Scale     Difference  Order  Distance  Unique  Measures of       Measures of                    Some tests of
                                       origin  central tendency  dispersion                     significance
Nominal   Yes         No     No        No      Mode              -                              χ²
Ordinal   Yes         Yes    No        No      Median            Semi-interquartile range       Rank-order correlations
Interval  Yes         Yes    Yes       No      Arithmetic mean   Standard deviation, variance,  t, F
                                                                 coefficient of variation
Ratio     Yes         Yes    Yes       Yes     Arithmetic or     Standard deviation, variance   t, F
                                               geometric mean    or coefficient of variation
Note: The interval scale has 1 as an arbitrary starting point. The ratio scale has the natural origin 0, which is meaningful.
Sample Questionnaire for Data Analysis
Business Research Class Questionnaire
The purpose of this short questionnaire is to collect some nominal, ordinal, interval
and ratio data that can be used to demonstrate some of the basic statistical methods
for analysing quantitative data. The individual responses will be anonymous and
the data collected will be used only for class exercises.
Please tick the appropriate box, provide the data requested or circle a number,
where appropriate.
1. What is your gender?
Female Male
2. Please indicate your height to the nearest centimetre (cm)
3. Please indicate your weight to the nearest kilogram (kg)
4. What is the colour of your eyes? (just tick one box please!)
Blue Brown Other
5. Please indicate the extent to which you disagree or agree with the following
statements:
Strongly Strongly
disagree Disagree Neutral Agree agree
Statistics is interesting. 1 2 3 4 5
Statistics knowledge is
useful in business. 1 2 3 4 5
Many thanks for your time and assistance with completing this questionnaire.
We will now proceed to analyse the data collected!
Variable Names, Labels and Values for
Sample Data Set
Variable name  Label (variable definition)     Values for variable
id             Identifier for each respondent
gender         Female or male                  0 = Female, 1 = Male
height         Height in centimetres
weight         Weight in kilograms
eye_col        Eye colour                      1 = Blue, 2 = Brown, 3 = Other
stat_int       Statistics is interesting       1 = Strongly disagree, 2 = Disagree, 3 = Neutral,
                                               4 = Agree, 5 = Strongly agree
stat_use       Statistics is useful            Same 1 to 5 scale as stat_int
Example of SPSS Data Editor Input Data –
Data View
Example of SPSS Data Editor Input Data –
Variable View
Descriptive Statistics
• Frequencies
• Measures of central tendency
– mean, median, mode
• Measures of dispersion
– range, variance, standard deviation
– other measures
– standard error of the mean
Example – Responses to the statement
‘Statistics is interesting’
The Mean
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Range
Represents the difference between the
highest and lowest values of a variable of
interest.
• Eg, max = 50, min = 30, range = 20
Variance Formula
Note: This formula is correct. The formula in the book is incorrect
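The formula itself is not reproduced in this text. As a minimal sketch, assuming the intended formula is the usual sample variance with an n - 1 denominator (the form SPSS reports), the mean, range, variance and standard deviation can be computed with Python's standard library. The scores are hypothetical; only the minimum of 30 and maximum of 50 come from the range example.

```python
import statistics

# Hypothetical scores; min = 30 and max = 50 match the range example.
scores = [30, 35, 40, 45, 50]

mean = statistics.mean(scores)            # arithmetic mean
value_range = max(scores) - min(scores)   # highest value minus lowest value
variance = statistics.variance(scores)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)        # square root of the sample variance

print(mean, value_range, variance, round(std_dev, 2))
```

Using `statistics.pvariance` instead would divide by n, which is appropriate only when the data are the whole population.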
Area under the Normal Curve
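The figure for this slide is not reproduced here. As a rough illustration, the proportion of a normal population lying within 1, 2 and 3 standard deviations of the mean can be computed with Python's standard library. The function name `area_within` is an assumption made for this sketch.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def area_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The three areas are roughly 68%, 95% and 99.7% of the total area under the curve.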
Box and Whisker Plots
Normal, Skewed and Sampling Distributions
Source: Adapted from Zikmund (2000:381).
Standard Error of the Mean
• When a number of samples are taken from
the population, the sample means form a
distribution
• The standard deviation of these sample
means is called the standard error of the
mean
• As the sample size increases, the standard
error gets smaller
Standard Error of the Mean - Formula
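The formula is not reproduced in this text. A minimal sketch, assuming the usual estimate (sample standard deviation divided by the square root of the sample size), reproduces the standard errors shown in the SPSS descriptive statistics output in this appendix. The function name `standard_error` is an assumption.

```python
import math

def standard_error(std_dev, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Standard deviations and N taken from the SPSS descriptive statistics output.
se_height = standard_error(7.59, 34)    # SPSS reports 1.30
se_weight = standard_error(10.72, 34)   # SPSS reports 1.84

print(round(se_height, 2), round(se_weight, 2))
```

Because n is in the denominator, quadrupling the sample size halves the standard error.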
Example of SPSS output of
Descriptive Statistics
                           N   Min.  Max.  Mean    Std. Error  Std. Dev.  Skewness  Std. Error   Kurtosis  Std. Error
                                                   of Mean                          of Skewness            of Kurtosis
Height in centimetres      34  160   188   173.74  1.30        7.59       -.088     .403         -.831     .788
Weight in kilograms        34  54    95    74.15   1.84        10.72      -.133     .403         -.318     .788
Statistics is interesting  34  1     5     3.03    .21         1.24       .243      .403         -.901     .788
Statistics is useful       34  1     5     2.15    .15         .89        1.329     .403         2.579     .788
Valid N (listwise)         34
Inferential Statistics
Helps to draw inferences or conclusions
from the analysis of the data, such as:
• The relationships between two variables
• Differences in variables among different
subgroups
• How several independent variables might
explain the variance in a dependent
variable
Inferential Statistics
• Statistical hypothesis testing
– The null and alternate hypotheses
– Choosing a statistical test
– Significance level
• Correlations
Process for Statistical Hypothesis Testing
using a Computer
The null and alternate hypotheses
• Null hypothesis
– the conjecture that postulates no differences or
no relationship between or among variables
• Alternate hypothesis
– an educated conjecture that sets the parameters
one expects to find
Choosing a Statistical Test
• Parametric tests can be applied to interval
and ratio data (and also ordinal data where
they are expressed in numeric form and
‘interval’ features are present).
• Non-parametric tests are applied to
categorical data — ie, nominal and most
ordinal data
Classification of Statistical Tests
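Tests are classified by the number and scale of the independent variable(s) and the scale of the dependent variable.
One independent variable
• Nominal independent variable
– Nominal dependent variable: Chi-square test for independence, Cochran Q test, Fisher exact probability
– Ordinal dependent variable: Sign test, Median test, Mann-Whitney U test, Kruskal-Wallis one-way analysis of variance
– Interval/ratio dependent variable: Analysis of variance (ANOVA), t-test
• Ordinal independent variable
– Ordinal dependent variable: Spearman’s rank correlation, Kendall’s rank correlation
– Interval/ratio dependent variable: Analysis of variance with trend analysis
• Interval/ratio independent variable
– Nominal dependent variable: Analysis of variance (ANOVA)
– Interval/ratio dependent variable: Simple regression analysis, Pearson correlation
Two or more independent variables
• Nominal independent variables
– Ordinal dependent variable: Friedman two-way analysis of variance
– Interval/ratio dependent variable: Analysis of variance (factorial design)
• Ordinal independent variables
– (no tests listed)
• Interval/ratio independent variables
– Nominal dependent variable: Multiple discriminant analysis
– Interval/ratio dependent variable: Multiple regression analysis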
Source: Adapted from R. L. Baker and R. E. Schultz (eds) 1972, Instructional Product Research.
New York: Van Nostrand Co.
Significance level
• the probability of rejecting the null
hypothesis when it is true (a Type I error)
• not the same as the critical value, which is the
cut-off value of the test statistic that corresponds
to the chosen significance level
• this probability is denoted α (alpha)
• Significance level = 1 – confidence level
• Eg significance level = 0.05 indicates
confidence level = 0.95 (or 95%)
Hypothesis Testing and
Statistical Decision Making
                              Statistical decision
True state of the situation   Accept H0                Reject H0
H0 is true                    Correct                  Type I error
                              (Probability = 1 – α)    (Probability = α)
H0 is false                   Type II error            Correct
                              (Probability = β)        (Probability = 1 – β)
Relationship between Type I and II Errors
Source: D. A. Aaker, V. Kumar
and G. S. Day 1995, Marketing
Research, 5th edition. New York:
John Wiley & Sons, p. 473
Pearson Correlation
• indicates the direction, strength and
significance of the bivariate relationships
between interval or ratio variables, eg:
H0: Role overload and performance are not related
to each other. [r = 0]
HA: The two are significantly negatively correlated.
[r < 0]
r = -0.1735 p = 0.083
r = -0.29 p = 0.054
r = -0.33 p = 0.049
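The decision rule behind these three results can be sketched as follows, assuming a significance level of 0.05 and taking the reported p-values as given. The name `decision` is an assumption made for this sketch.

```python
ALPHA = 0.05  # assumed significance level

# (r, p) pairs as reported above.
results = [(-0.1735, 0.083), (-0.29, 0.054), (-0.33, 0.049)]

def decision(p_value, alpha=ALPHA):
    """Reject H0 only when the p-value falls below the significance level."""
    return "reject H0" if p_value < alpha else "do not reject H0"

for r, p in results:
    print(f"r = {r}, p = {p}: {decision(p)}")
```

Under this assumption only the third correlation leads to rejecting H0. The second misses the 0.05 cut-off narrowly even though its coefficient is of similar size.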
Scatter Diagrams of two Variables with
different Correlation Coefficients
Procedure for Chi-square Test with SPSS
Step 1: Formulating the hypotheses
Step 2: Decision criterion
Step 3: Analyse data with computer package
Step 4: Make a statistical decision
Step 5: Interpret the decision
Example of SPSS Output for Crosstabs and
Chi-square tests
Female or male * eye colour cross-tabulation
                                        Eye colour
                                        Blue   Brown  Other  Total
Female or male  Female  Count           3      3      2      8
                        Expected Count  3.1    3.1    1.9    8.0
                Male    Count           10     10     6      26
                        Expected Count  9.9    9.9    6.1    26.0
Total                   Count           13     13     8      34
                        Expected Count  13.0   13.0   8.0    34.0
Example of SPSS Output for Crosstabs and
Chi-square tests (cont)
Chi-square tests
                     Value     df  Asymp. Sig. (2-sided)
Pearson chi-square   .013 (a)  2   .994
Likelihood ratio     .012      2   .994
No. of valid cases   34
(a) 3 cells (50.0%) have expected count less than 5. The minimum expected count is 1.88.
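As a check on the output above, the expected counts and the Pearson chi-square statistic can be recomputed from the observed counts in the cross-tabulation. This is a sketch using only the standard library. The p-value line relies on the fact that, with exactly 2 degrees of freedom, the chi-square upper-tail probability is exp(-x/2); other degrees of freedom would need a statistics library.

```python
import math

# Observed counts from the cross-tabulation (columns: blue, brown, other).
observed = [[3, 3, 2],     # female
            [10, 10, 6]]   # male

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed form valid only for df == 2.
p_value = math.exp(-chi_square / 2) if df == 2 else None

print(round(chi_square, 3), df, round(p_value, 3))
```

With a p-value this close to 1, the null hypothesis that gender and eye colour are independent is not rejected. The footnote about small expected counts also signals that the chi-square approximation should be read with caution here.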
t distribution
• is suitable for analysing the means of small
samples
• drawn from a population that is normally
distributed
• shape of the t distribution depends on the
degrees of freedom (df )
t distribution formula
Comparison of t distribution & normal curve
Example of SPSS output for single
sample tests
One-sample statistics
N Mean Std. Deviation Std. Error Mean
Statistics is interesting 34 3.03 1.24 .21
Statistics is useful 34 2.15 .89 .15
One-sample test (Test Value = 3)
                                                                          95% Confidence Interval
                                                                          of the Difference
                           t       df  Sig. (2-tailed)  Mean Difference   Lower   Upper
Statistics is interesting  0.138   33  0.891            2.94E-02          -0.40   0.46
Statistics is useful       -5.575  33  0.000            -0.85             -1.16   -0.54
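The t values in this output can be approximated from the summary statistics, assuming the usual one-sample formula t = (sample mean - test value) / (s / sqrt(n)). The inputs below are the rounded figures from the SPSS tables, so the results agree with SPSS only to about two decimal places. The function name `one_sample_t` is an assumption.

```python
import math

def one_sample_t(mean, std_dev, n, test_value):
    """One-sample t statistic computed from summary statistics."""
    standard_error = std_dev / math.sqrt(n)
    return (mean - test_value) / standard_error

# 'Statistics is interesting': mean = 3 + 2.94E-02, as given by the mean difference.
t_interesting = one_sample_t(3.0294, 1.24, 34, 3)
# 'Statistics is useful'
t_useful = one_sample_t(2.15, 0.89, 34, 3)

print(round(t_interesting, 2), round(t_useful, 2))
```

The first mean is not significantly different from the neutral value of 3 (Sig. = 0.891), whereas the second is significantly below it.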
Example of SPSS output for two
independent samples t-tests
Group statistics
                       Female or Male  N   Mean    Std. Deviation  Std. Error Mean
Height in centimetres  Male            26  176.38  5.97            1.17
                       Female          8   165.13  5.74            2.03
Independent samples test
Height in centimetres
Levene’s Test for Equality of Variances: F = .747, Sig. = .394
t-Test for Equality of Means
                                                                                   90% Confidence Interval
                                                                                   of the Difference
                             t      df      Sig. (2-tailed)  Mean Diff.  Std. Error Diff.  Lower  Upper
Equal variances assumed      4.701  32      .000             11.26       2.40              7.20   15.32
Equal variances not assumed  4.803  12.062  .000             11.26       2.34              7.08   15.44
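Both rows of the independent samples test can be approximated from the group statistics alone. The sketch below assumes the standard pooled-variance formula for the "equal variances assumed" row and the Welch-Satterthwaite formula for the "equal variances not assumed" row. Inputs are the rounded SPSS figures, so small rounding differences are expected.

```python
import math

# Group statistics from the SPSS output (height in centimetres).
n1, mean1, sd1 = 26, 176.38, 5.97   # male
n2, mean2, sd2 = 8, 165.13, 5.74    # female
diff = mean1 - mean2

# Equal variances assumed: pool the two sample variances.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: Welch's t with Satterthwaite degrees of freedom.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(t_pooled, 2), df_pooled, round(t_welch, 2), round(df_welch, 2))
```

Because Levene's test is not significant (Sig. = .394), the "equal variances assumed" row is the one normally read. Here both rows lead to the same conclusion.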
Example of SPSS output for one-way
between groups ANOVA
ANOVA
                                           Sum of Squares  df  Mean Square  F      Sig.
Weight in kilograms        Between groups  2350.495        5   470.099      9.117  .000
                           Within groups   1443.770        28  51.563
                           Total           3794.265        33
Statistics is interesting  Between groups  11.313          5   2.263        1.598  .193
                           Within groups   39.657          28  1.416
                           Total           50.971          33
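The mean squares and F ratios in the ANOVA table follow directly from the sums of squares and degrees of freedom. A short sketch (the function name `f_ratio` is an assumption):

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares and df taken from the ANOVA table.
weight = f_ratio(2350.495, 5, 1443.770, 28)
interesting = f_ratio(11.313, 5, 39.657, 28)

print([round(x, 3) for x in weight])
print([round(x, 3) for x in interesting])
```

A large F, as for weight, means the variation between group means is large relative to the variation within groups.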
Regression Analysis
• Explains the variance in the dependent variable by
a set of predictors
• R-square (R²) is the proportion of variance in the
dependent variable explained by the predictors
• Step-wise regression will indicate the order of
importance of the significant predictors in the
regression model
• The Beta weight of each predictor and its
significance indicate the weight that predictor
(independent variable) exerts in explaining the
variance in the dependent variable.
A Simple Regression Model
General Form of Simple Regression Line
Y = a + bX
where:
Y is the dependent variable
X is the independent variable
a is the intercept of the regression line on the
Y (vertical) axis
b is the slope of the regression line
Assumptions of Regression Analysis
Regression analysis — assumptions
1 Errors are normally distributed.
2 Errors have a zero mean (or expected value).
3 Errors have a constant variance over all values of the dependent variable
and in each time period.
4 Errors are not correlated (that is, errors in one time period are not related
to errors in another).
Assumptions (1) to (4) are required to obtain unbiased estimates and to use
probability theory to test the reliability of estimates.
5 Other assumptions for multiple regression analysis include: The number of
independent or explanatory variables in the regression must be smaller than
the number of observations.
6 There must not be perfect linear correlation among the independent
variables.
Example of SPSS output for simple regression
analysis
Variables entered/removed (b)
Model  Variables Entered          Variables Removed  Method
1      Height in centimetres (a)  .                  Enter
(a) All requested variables entered.
(b) Dependent variable: Weight in kilograms
Model summary
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .702 (a)  .492      .476               7.76
(a) Predictors: (constant), Height in centimetres
ANOVA (b)
Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  1868.153        1   1868.153     31.037  .000 (a)
   Residual    1926.112        32  60.191
   Total       3794.265        33
(a) Predictors: (constant), Height in centimetres
(b) Dependent variable: Weight in kilograms
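The model summary figures can be recovered from the regression ANOVA table. The sketch below assumes the standard definitions: R² is the regression sum of squares divided by the total sum of squares, adjusted R² corrects for the number of predictors, and the standard error of the estimate is the square root of the residual mean square.

```python
import math

# Sums of squares from the regression ANOVA table.
ss_regression, ss_residual = 1868.153, 1926.112
n, k = 34, 1   # number of observations, number of predictors

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
ms_residual = ss_residual / (n - k - 1)
f_stat = (ss_regression / k) / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(round(r_square, 3), round(adj_r_square, 3),
      round(f_stat, 3), round(std_error_estimate, 2))
```

An R² of about .49 means that height accounts for roughly half of the variance in weight in this sample.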
Example of SPSS output for simple regression
analysis (cont.)
Coefficients (a)
                             Unstandardised       Standardised
                             Coefficients         Coefficients
Model                        B         Std. Error  Beta          t       Sig.
1  (Constant)                -98.189   30.963                    -3.171  .003
   Height in centimetres     .992      .178        .702          5.571   .000
(a) Dependent variable: Weight in kilograms
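Reading the coefficients table as the regression line Y = a + bX gives weight = -98.189 + 0.992 × height. A minimal sketch of using it for prediction follows. The height of 175 cm is a hypothetical input chosen from within the observed range of 160 to 188 cm, and the function name `predict_weight` is an assumption.

```python
INTERCEPT = -98.189   # a: the (Constant) row, unstandardised B
SLOPE = 0.992         # b: the Height in centimetres row, unstandardised B
SLOPE_STD_ERROR = 0.178

def predict_weight(height_cm):
    """Predicted weight in kilograms from the fitted line Y = a + bX."""
    return INTERCEPT + SLOPE * height_cm

# The t value for the slope is the coefficient divided by its standard error.
t_slope = SLOPE / SLOPE_STD_ERROR

print(round(predict_weight(175), 1), round(t_slope, 2))
```

The negative intercept has no practical meaning, since a height of 0 cm lies far outside the data. The line should only be used within the observed range of heights.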
Factor Analysis
helps to reduce a vast number of variables
(for example all the questions tapping
several variables of interest in a
questionnaire) to a meaningful,
interpretable and manageable set of factors
Output of a Factor Analysis for the Evaluation
Questionnaire
Item Factor A Factor B Factor C
01 0.22346 -0.07300 0.56741
02 0.22173 0.11086 0.72553
03 -0.11021 -0.13279 0.81159
04 0.61291 -0.18332 -0.11947
05 0.66958 0.02347 0.20299
06 0.07750 -0.61141 0.20597
07 0.72003 -0.00131 0.00424
08 0.77667 0.19312 0.12914
09 0.01284 -0.24331 0.58230
10 0.62759 -0.26720 -0.10879
11 0.43661 -0.15479 0.39083
12 0.48110 -0.44660 0.03393
13 0.03974 -0.70698 0.17139
14 -0.00829 -0.81637 0.12066
15 -0.06771 -0.93732 -0.08814
Items under each Factor for the Evaluation
Questionnaire
FACTOR A
4. The trainer has provided adequate feedback on student
performance.
5. The trainer has provided an adequate flow of ideas in this course.
7. The trainer seems to know when trainees did not understand the
material.
8. The trainer made a major contribution to the value of this course.
10. The trainer has been creative in developing materials for this
course.
FACTOR B
6. This course has been adapted to trainees’ needs.
13. I learned a lot in this course.
14. This course generally fulfilled my goals.
15. This course stimulated me to want to take more work in the same
or related area.
Items under each Factor for the Evaluation
Questionnaire (cont.)
FACTOR C
1. Class time was well spent.
2. The course was well organised.
3. There was considerable agreement between announced objectives
and what was taught.
9. The goals and objectives for this course have been clearly stated.
THE LEAKING ITEMS
11. The teaching methods used in this course have been suitable.
12. This course stimulated interest in the subject content.
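The grouping above can be reproduced from the loadings table with a simple rule. The sketch assumes an item belongs to a factor when its absolute loading on exactly one factor is at least 0.5, and counts as "leaking" otherwise. The 0.5 cut-off is a hypothetical choice that happens to reproduce the grouping shown; it is not stated in the source.

```python
# Loadings (Factor A, Factor B, Factor C) from the factor analysis output.
loadings = {
    1: (0.22346, -0.07300, 0.56741),   2: (0.22173, 0.11086, 0.72553),
    3: (-0.11021, -0.13279, 0.81159),  4: (0.61291, -0.18332, -0.11947),
    5: (0.66958, 0.02347, 0.20299),    6: (0.07750, -0.61141, 0.20597),
    7: (0.72003, -0.00131, 0.00424),   8: (0.77667, 0.19312, 0.12914),
    9: (0.01284, -0.24331, 0.58230),   10: (0.62759, -0.26720, -0.10879),
    11: (0.43661, -0.15479, 0.39083),  12: (0.48110, -0.44660, 0.03393),
    13: (0.03974, -0.70698, 0.17139),  14: (-0.00829, -0.81637, 0.12066),
    15: (-0.06771, -0.93732, -0.08814),
}

THRESHOLD = 0.5  # hypothetical cut-off for a "strong" loading

def assign(values):
    """Return the single factor with a strong loading, or 'leaking' if none or several."""
    strong = [name for name, value in zip("ABC", values) if abs(value) >= THRESHOLD]
    return strong[0] if len(strong) == 1 else "leaking"

groups = {"A": [], "B": [], "C": [], "leaking": []}
for item, values in sorted(loadings.items()):
    groups[assign(values)].append(item)

print(groups)
```

The sign of a loading is ignored when grouping: the Factor B items all load negatively, which affects the direction of the factor but not which items define it.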
Multivariate Analysis
• examines several variables and their relationships
simultaneously
• Multivariate techniques include:
– MANOVA
– Discriminant analysis
– Canonical correlation
– Factor analysis
– Cluster analysis
– Multidimensional analysis