BASIC MEASUREMENT AND SCALING CONCEPTS
(Taken from Foxcroft & Roodt, Chapter 3)
Learning Outcomes
• Describe 3 distinguishing properties of the different levels of measurement
• Describe & give an example of 4 types of measurement levels
• Describe and give examples for each of the different measurement scales
• Describe the type of data each scaling method generates
• Define & describe 3 basic categories of statistical measures of location, variability & association
• Name and describe the different types of test norms.
Consider This…
We understand that scaling is the process of assigning quantitative values to qualitative data
Why is scaling (developing measuring tools) important for psychology?
What could be considered a core limitation of measuring tools?
Levels of measurement
What is measurement?
Measurement is the transformation of psychological attributes into numbers
Properties of scales
• Identity: e.g. Male/Female, Yes/No
• Magnitude: ‘moreness’ (<, >, =)
• Equal Intervals: the difference between all points is the same
• Absolute Zero: a true zero, where absolutely nothing can be measured
Categories of measurement levels: Categorical vs. Continuous
Measurement Levels
Categorical
• Nominal: data sorted into categories
• Ordinal: data is ranked (spaces between points are not equal)
Continuous
• Interval: equal spaces between points
• Ratio: there are an infinite number of points between each data point (more accurate measurement)
Measurement scaling options
• Nominal: Property: Identity. Qualitative. E.g. Gender
• Ordinal: Properties: Identity, Magnitude. E.g. Horse Race
• Interval: Properties: Identity, Magnitude, Equal Intervals. Quantitative & Qualitative. E.g. 12hr clock
• Ratio: Properties: all; intervals can be divided infinitely. Quantitative & Qualitative. E.g. Kelvin Temp, 24hr clock, weight
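One way to keep the four levels apart is to record, for each level, which of the four scale properties it has. The sketch below is only an illustration: `LEVEL_PROPERTIES` and `has_property` are names invented here, not terms from the textbook.

```python
# Illustrative only: the names below are hypothetical, chosen for this sketch.
# Each level of measurement has the properties of the level before it, plus one more.
LEVEL_PROPERTIES = {
    "nominal": ("identity",),
    "ordinal": ("identity", "magnitude"),
    "interval": ("identity", "magnitude", "equal intervals"),
    "ratio": ("identity", "magnitude", "equal intervals", "absolute zero"),
}

def has_property(level: str, prop: str) -> bool:
    """Return True if the given level of measurement has the given property."""
    return prop in LEVEL_PROPERTIES[level]

print(has_property("ordinal", "equal intervals"))  # False
print(has_property("ratio", "absolute zero"))      # True
```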
PREFERENCE FOR SCALES
Preference for Interval / Ratio
• This is more statistically rigorous, so it gives more ‘accurate measurements’. But as psychological data is not mathematical in nature, our measurements can’t be this exact.
Psych data is usually ordinal / nominal
• We often treat ordinal data as if it is interval data: we assume equal differences between intervals/measurements. The issue with this can be seen most clearly in cross-cultural contexts of assessment, where the differences are more marked.
Measurement Errors
Random sampling error
• Source of error & impact of error due to chance factors
Systematic Error
• Built into the test
• Consistently influences scores of specific groups
RANDOM SAMPLING ERROR
Randomly affects measurement across the sample
Effect is not consistent across the sample
Effects tend to cancel each other out
Adds variability but does not affect the average
Random error is considered ‘noise’
Sample size is important – greater size = less error
SYSTEMATIC ERROR
Any factor(s) that systematically affect(s) measurement of the variable across the sample
Consistently positive or consistently negative across the sample
Assessment results tend to be a function of group membership & not a true measure of the construct being measured.
Result of effect = BIAS
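The contrast between the two kinds of error can be illustrated with a small simulation. This is a rough sketch under invented assumptions: every person has a true score of 50, random error is drawn from a normal distribution with a standard deviation of 5, and systematic error is modelled as a constant bias of +3. None of these numbers come from the source.

```python
import random
import statistics

# Simulated data: the true scores, error sizes and bias are invented for this sketch.
rng = random.Random(42)            # fixed seed so the run is repeatable
true_scores = [50.0] * 10_000      # assume everyone has a true score of 50

# Random error: chance fluctuations, positive for some people and negative for others.
with_random_error = [t + rng.gauss(0, 5) for t in true_scores]

# Systematic error: chance fluctuations plus a constant bias of +3 for everyone.
with_systematic_error = [t + rng.gauss(0, 5) + 3 for t in true_scores]

mean_random = statistics.mean(with_random_error)
mean_systematic = statistics.mean(with_systematic_error)
sd_random = statistics.pstdev(with_random_error)

print(f"random error:     mean = {mean_random:.2f}, SD = {sd_random:.2f}")
print(f"systematic error: mean = {mean_systematic:.2f}")
```

In a run like this the random-error mean should stay close to 50 while the spread grows (added variability, same average), whereas the systematic-error mean shifts by roughly the size of the bias.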
REDUCING ERRORS
• Pilot test instruments
• Double check data
• Statistical procedures
• Multiple measures of the same construct
Application: Random vs. Systematic Error
• Differentiate between systematic and random error in psychological assessment tools.
• Use this figure to substantiate your answers
Random vs. Systematic Errors
Systematic error is an error inherent in the tool itself. It results in a consistent error that is measurable and can be anticipated if known. This means that the tool would consistently treat groups differently, advantaging one and disadvantaging the other. The main idea here is that the results depend on group membership, not on what is being tested. Another form of systematic error could be validity issues with the test, which would also impact negatively on the learner’s results.
Random error refers to errors that cannot be controlled, as they are not consistent from one administration of the tool to another. The young boy in the figure is clearly distressed; he may doubt his ability or may be struggling with performance anxiety or a sense of learned helplessness. This would affect his ability to concentrate, which in turn would affect his performance.
Types of Scales
Different scaling options
• Category: responses are defined categories
• Likert: indicate the degree to which you agree with a statement
• Semantic Differential: opposite poles anchored; indicate where you fall between the poles
• Intensity: similar to semantic differential; two opposite anchor points
• Constant Sum: ipsative; ranking through weighting
Different scaling options
• Paired Comparison: ipsative; ranking by weight; 2 options only
• Forced Choice: must choose an option; can only choose one option
• Graphic Rating: ranking using a visual display, e.g. emoticons for a service or pain scale
• Guttman: used for attitude or prejudice research; score depends on cumulative score of sub-questions
Application: Identifying different types of scales
Use one or more of the links to access a free test online. Have a look at the types of questions and see if you can identify some different scale options.
What did you use to help you make up your mind?
Links to free tests
• Career Test
• Work Values Test
• Team Roles Test
Considerations when choosing a scale format
• Single vs. composite dimensions
• Questions vs. statements
• Labelled/Unlabelled responses
• Single attribute vs. comparative rating
• Even vs. odd numbered ratings
• Ipsative vs. normative scales
These are important choices to make. Can you think of contextual factors that would guide your thinking here?
BASIC STATISTICAL CONCEPTS
What is DATA?
Observations you collect from your sample but more specifically the observed values of the
variable you are looking at
We make sense of data using statistics
Statistics can be descriptive: give us information about the data
Statistics can be inferential: we use the statistics to draw conclusions about the sample
which we then apply to the population
Raw data is not very useful. What would you say is the benefit of visual data?
Visual data allows us to instantly make sense of the data just by looking at it. When presented
visually, it facilitates the interpretation of the data. We can quickly see groupings and patterns and
can start to ask some questions about what we are seeing
We ‘measure’ data to be more rigorous in our analysis
Just by looking at the data we get a feeling for it, but a ‘feeling’ is subjective
Instead of just looking at the data we ‘measure’ it
After presenting data visually we then analyse it.
Two main types of measures exist:
• Measures of Centre
• Measures of Variation
Measures of Centre / Central Tendency
• Purpose: Look at where data values lie
• There are 3 measures of centre
• Mean = mathematical average (most rigorous as it is sensitive to all the data)
• Median = once ranked, this is the value that is in the middle
• Mode = most frequently occurring value
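The three measures of centre can be computed with Python's standard `statistics` module. The scores below are hypothetical values invented for this sketch.

```python
import statistics

# Hypothetical test scores, invented for this sketch.
scores = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(scores)      # sum of all values divided by how many there are
median = statistics.median(scores)  # middle value once the scores are ranked
mode = statistics.mode(scores)      # most frequently occurring value

print(f"mean = {mean:.1f}, median = {median}, mode = {mode}")
```

Here the single high score of 14 pulls the mean (6) above the median (5), which is one sign of a skewed rather than a normal distribution.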
Normal vs. Skewed Distributions
Measures of Variation
• Variation tells us how spread out data values are
• The more spread out values are, the less consistent the data is, so fewer inferences can be made reliably, i.e. data varies more from sample to sample
• Range = crudest measurement of variation = maximum value - minimum value
Measures of Variance Cont.
More complex measures of variation are variance and standard deviation.
These are more complex because they take into account all the values within the data set
Variance = average squared distance from each point (data value) to the middle (mean)
Standard Deviation is the most frequently used measure of variation.
Calculating variance and standard deviation:
Variance
• Work out the mean
• For each number subtract the mean & square the result
• Work out the average of the squared differences
Standard Deviation
• Square root of variance
Example
• You and your friends have just measured the heights of your dogs (in millimeters)
• The heights at the shoulders are: 600mm, 470mm, 170mm, 430mm and 300mm.
• Find out the Mean, the Variance and the Standard Deviation
Answer: The Mean
• (600 + 470 + 170 + 430 + 300) / 5 = 1970/5 = 394
• This means the average height is 394mm.
• It is plotted on the graph (green line)
Answer: Variance
• To calculate the Variance, take each difference from the mean, square it and then average the result
• [206² + 76² + (-224)² + 36² + (-94)²] / 5 = 108,520 / 5 = 21,704
Answer: Standard Deviation
• The Standard Deviation is the square root of the Variance
• Standard Deviation = √21,704 ≈ 147.32, or 147 to the nearest mm
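The worked answer can be checked with a few lines of Python. This is a minimal sketch that follows the same steps as the example (mean, squared differences, their average, then the square root); the variable names are chosen here for illustration.

```python
import math

# Heights of the dogs at the shoulder, in millimetres (from the example above).
heights = [600, 470, 170, 430, 300]

mean = sum(heights) / len(heights)                   # 394.0
squared_diffs = [(h - mean) ** 2 for h in heights]   # each difference from the mean, squared
variance = sum(squared_diffs) / len(heights)         # average of the squared differences
std_dev = math.sqrt(variance)                        # square root of the variance

print(mean, variance, round(std_dev))  # 394.0 21704.0 147
```

Dividing by the number of values, as the example does, gives the population variance (the same result as `statistics.pvariance`). When the data are a sample from a larger population, the sum is usually divided by N - 1 instead.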
Why is the Standard Deviation important/useful?
• We can use this to identify which of the dogs are within one standard deviation of the
mean.
• Identify which dogs are ‘normal’, ‘extra large’ or ‘extra small’
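As a rough sketch of that idea, the snippet below labels each dog from the example according to whether its height falls within one standard deviation of the mean. The function name `classify` and the exact cut-offs (strictly more than one standard deviation from the mean) are assumptions made for this illustration.

```python
import math

heights = [600, 470, 170, 430, 300]  # dog heights in mm, from the example above
mean = sum(heights) / len(heights)
std_dev = math.sqrt(sum((h - mean) ** 2 for h in heights) / len(heights))

def classify(height: float) -> str:
    """Label a height relative to one standard deviation either side of the mean."""
    if height > mean + std_dev:
        return "extra large"
    if height < mean - std_dev:
        return "extra small"
    return "normal"

for h in heights:
    print(h, classify(h))
```

With a mean of 394mm and a standard deviation of about 147mm, heights between roughly 247mm and 541mm count as ‘normal’, so only the 600mm and 170mm dogs fall outside the band.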
Measures of Association
Correlation Coefficient: relationship between 2 variables
Ranges from -1 to +1, where zero shows no correlation
Formula = Pearson Product-Moment Correlation
[Figure: scatterplots showing strong positive correlation, moderate positive correlation, no correlation, moderate negative correlation, strong negative correlation and a curvilinear relationship]
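A Pearson product-moment correlation can be computed by hand as the sum of the products of paired deviations from the means, divided by the square root of the product of the two sums of squared deviations. The sketch below assumes two equally long lists of numbers; the function name `pearson_r` and the data are invented for illustration.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data, invented for this sketch.
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours_studied, test_score), 2))  # 0.99
```

Note that a curvilinear relationship can produce a coefficient near zero even when the two variables are clearly related, because the coefficient only captures straight-line association.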
What is regression analysis?
Regression analysis finds the equation that gives the best prediction of the dependent variable from the independent variable(s).
E.g. aptitude tests being used to predict the job performance of machine operators
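For a single predictor, the best-fitting straight line can be found with ordinary least squares. The sketch below is one simple way to do this, not necessarily the procedure the textbook has in mind; the function name `fit_line` and the aptitude and performance figures are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: aptitude test scores and later job performance ratings.
aptitude = [10, 20, 30, 40, 50]
performance = [3.0, 4.5, 5.5, 7.5, 8.0]

slope, intercept = fit_line(aptitude, performance)
predicted = intercept + slope * 35  # predicted performance for an aptitude score of 35
print(f"performance = {intercept:.2f} + {slope:.3f} x aptitude; prediction at 35: {predicted:.2f}")
```

Here aptitude is the independent variable and job performance the dependent variable, matching the machine operator example.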
Norms
Why are norms important?
Standards are important.
A standard normal distribution has a mean of zero and an SD of 1
Comparison with peers and/or the greater population.
The norm group is a representative sample of the population.
Establishing norms for a test is the final stage of
standardisation
Establishing Norms
Identify Population (applicant pool)
Take a Random Representative Sample (Incumbent population)
Test/Research the sample
Calculate norms
Co-Norming – use one sample group to establish normative data
on 2 or more related measures
Types of norms usually found in psychological tests
• Developmental Norms: Mental Age, Grade Equivalent
• Percentiles
• Standard Scores: Z-Scores, T-Scores, Sten, Stanine
• Deviation IQ
Standard Scores (Z Scores):
Used to identify the probability of a score occurring within our normal distribution
Enables comparison of two scores from different normal distributions.
Z Score is derived by subtracting the mean from the raw score and dividing by the standard deviation: Z = (raw score - mean) / standard deviation
The arithmetic mean of a set of standard scores is zero and the standard deviation is one
Standard Scores are dimensionless
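A z-score calculation is short enough to sketch directly. The function name `z_score` and the two tests below are hypothetical; the point is that scores from scales with different means and standard deviations become comparable once converted.

```python
def z_score(raw_score: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations a raw score lies above (+) or below (-) the mean."""
    return (raw_score - mean) / std_dev

# Hypothetical example: the same learner on two tests with different scales.
z_test_a = z_score(65, mean=50, std_dev=10)    # 15 points above a mean of 50
z_test_b = z_score(130, mean=100, std_dev=15)  # 30 points above a mean of 100
print(z_test_a, z_test_b)  # 1.5 2.0
```

Although the raw scores are on different scales, the z-scores show that the learner performed relatively better on the second test.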
T scores
T Scores eliminate negative values from Z Scores
T Score = (Z Score × 10) + 50
Z-scores and T-scores both express distance from the mean in standard deviation units. T-scores use a mean of 50 and a standard deviation of 10, whereas z-scores use a mean of 0 and a standard deviation of 1.
A T-score over 50 is above average; a T-score below 50 is below average.
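The T-score formula translates directly into code. The function name `t_score` is chosen here for illustration.

```python
def t_score(z: float) -> float:
    """Convert a z-score to a T-score using T = (Z x 10) + 50."""
    return z * 10 + 50

print(t_score(-1.5), t_score(0.0), t_score(2.0))  # 35.0 50.0 70.0
```

A z-score of -1.5 becomes a T-score of 35, so the negative sign disappears while the position relative to the mean is preserved.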
Stanines
Indicates nine statistical units on a scale of 1 to 9
Stanines are used to indicate performance level as a single-digit score
Scores are ranked lowest to highest and then assigned to the corresponding stanine
Two scores in the same stanine can be further apart than two scores in adjacent stanines, which reduces the value of the scale
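The notes do not say how many scores go into each stanine. The sketch below assumes the conventional split of 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of the ranked scores, and takes a percentile rank as input; the names `STANINE_BOUNDARIES` and `stanine` are invented for this illustration.

```python
from bisect import bisect_right

# Upper percentile-rank boundaries of stanines 1 to 8 under the conventional
# 4-7-12-17-20-17-12-7-4 percent split (an assumption; not given in the notes).
STANINE_BOUNDARIES = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank: float) -> int:
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_BOUNDARIES, percentile_rank) + 1

print(stanine(3), stanine(98))                 # 1 9
print(stanine(41), stanine(59), stanine(61))   # 5 5 6
```

The second line shows the limitation mentioned above: percentile ranks 41 and 59 share stanine 5, while 59 and 61 are much closer together yet land in different stanines.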
Stens
The Sten (standard ten) is a standard score system commonly used with personality questionnaires.
Stens divide the score scale into ten units.
Each unit has a band width of half a standard deviation, except the lowest and highest units (Sten 1 and Sten 10), which are open-ended.
Sten scores can be calculated from Z-scores using the formula: Sten = (Z × 2) + 5.5.
Stens have the advantage that they enable results to be thought of in terms of bands of scores, rather than absolute scores.
These bands are narrow enough to distinguish statistically significant differences between candidates, but wide enough not to over-emphasize minor differences between candidates.
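The sten formula can be sketched as follows. The formula itself is the one given above; rounding to the nearest whole sten and clamping the result to the range 1 to 10 are assumptions added here so that the function always returns a valid band.

```python
import math

def sten(z: float) -> int:
    """Convert a z-score to a sten using Sten = (Z x 2) + 5.5."""
    raw = z * 2 + 5.5
    rounded = math.floor(raw + 0.5)   # assumption: round half up to a whole sten
    return max(1, min(10, rounded))   # assumption: clamp to the 1-10 range

print(sten(-3.0), sten(-0.25), sten(0.25), sten(3.0))  # 1 5 6 10
```

Z-scores just below and just above the mean fall into stens 5 and 6 respectively, which is why the sten scale has no single middle band.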
Percentiles versus Percentages
Percentage
• A percentage is the number out of every hundred with a particular
attribute. For example, if 120 of 150 candidates pass an assessment,
the percentage pass rate is 80% (80 out of every hundred).
Percentile
• A percentile is a position in a rank ordering expressed as the
percentage who are lower in the rank ordering. For example, a student
at the 70th percentile performed better than 70% of other candidates.
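The difference between the two ideas is easy to see in code. In this sketch the function names and the list of marks are hypothetical, and the percentile is computed as defined above: the percentage of the group scoring lower.

```python
def percentage(count: int, total: int) -> float:
    """Number out of every hundred with a particular attribute."""
    return 100 * count / total

def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percentage of scores in the group that are lower than the given score."""
    lower = sum(1 for s in all_scores if s < score)
    return 100 * lower / len(all_scores)

print(percentage(120, 150))  # 80.0

# Hypothetical marks for ten candidates.
marks = [35, 42, 48, 51, 55, 60, 64, 70, 78, 90]
print(percentile_rank(70, marks))  # 70.0
```

A mark of 70 out of 100 is a percentage of 70%; it sits at the 70th percentile here only because seven of the ten candidates happened to score lower.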
Deviation IQ Scales
• Normal Standard Score with a mean of 100 and a standard deviation of 15
• Not easily comparable to other norm scores because of the different standard deviation.
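Because a deviation IQ is a standard score with a mean of 100 and a standard deviation of 15, it can be obtained from a z-score in the same way as a T-score. The sketch below uses hypothetical function names and places the two conversions side by side.

```python
def deviation_iq(z: float) -> float:
    """Convert a z-score to a deviation IQ (mean 100, standard deviation 15)."""
    return 100 + 15 * z

def t_score(z: float) -> float:
    """Convert a z-score to a T-score (mean 50, standard deviation 10)."""
    return 50 + 10 * z

z = 1.0  # one standard deviation above the mean
print(deviation_iq(z), t_score(z))  # 115.0 60.0
```

The same relative position yields 115 on one scale and 60 on the other, which is why the two cannot be compared directly without converting back to z-scores.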
Norms versus Standards
• Norms describe what a given population can do
• Standards represent what a given population should be able to do
• Tests can be norm-referenced or criterion-referenced
Norm-referenced assessment
• shows how students are achieving compared with a statistical sample of others of an equivalent group at a given point in time. Such tests often provide results in percentiles or stanines.
Criterion-referenced assessment
• shows what students can or can’t do in relation to a specific list of tasks or skills. Teachers’ judgments are about whether the student has achieved each individual skill or task. When writing, for example, a student may be able to succeed at each task or skill but still not be able to write a compelling piece which meets the needs of an audience.
Application: Thinking about fair and ethical assessment
Numeracy & Literacy assessments are often done with learners. These assessments have both age norms and grade norms. Grade norms (when aligned to the curriculum) are an example of a criterion-referenced assessment. Why do you think using these norms would be fairer and/or more ethical within the South African context, where we have a history of unequal access to education?
Using Grade Norms
We use grade norms as these are aligned with what each learner needs to achieve in order to move to the next grade. Understanding performance relative to these criteria is meaningful as it informs where support is needed.
The age range in any grade can be quite broad. Not all learners have had the same level of stimulation and facilitation, so results based on age norms would vary greatly and would not necessarily inform meaningful intervention strategies.