CHAPTER 14
Tests of Measurement and Quality
In Adams, J., Khan, H.T.A., Raeside, R. & White, D. (2007) Research Methods for Graduate Business and Social Science Students. London: SAGE.
Doc 2.5
14.1 INTRODUCTION
Three criteria are generally used for testing and evaluating measurements of variables and ensuring the quality
of data, research design methods and the overall accuracy of study results. These criteria are known as reliability,
validity and generalisability. They are important in both qualitative and quantitative research.
In quantitative research, they are applied fairly readily in order to understand the actual reality and to generalise
the findings. Qualitative study has different benefits, although generalisation can be more difficult; it also
requires theoretical sophistication and methodological rigour.
14.2 RELIABILITY
Reliability estimates the consistency of the measurement or more simply, the degree to which an instrument
measures the same way each time it is used under the same conditions with the same subjects. Reliability is
essentially about consistency. That is, if we measure something many times and the result is always the same,
then we can say that our measurement instrument is reliable. In other words, when the outcome of the
measuring process is reproducible, the measuring instrument is reliable. This does not mean that it is valid!
It simply means that the measurement instrument does not produce erratic and unpredictable results. It may
be measuring a variable wrongly all the time, but as long as it does so consistently, it will be
considered reliable! This may seem odd, but what it basically means is that reliability is a necessary condition for
validity but not a sufficient condition on its own.
A very important aspect of reliability lies in the definitions of variables which are being measured. If we
construct a variable such as ‘sensitivity to prices’ and ask respondents a series of questions in order to measure
their price sensitivity, we need to be absolutely certain that we are measuring what we think we are measuring.
Our very own definition of ‘price sensitivity’ may not be shared by all respondents, or may even be at odds with current
theoretical understanding of the concept. For reliability in measurement, especially in survey research, we must
have a clear and unambiguous definition of all the concepts and artificial constructs being used in the
research design. If this is not the case, we will find it very difficult to make any kind of sensible and useful
generalisations from our research findings.
There are two ways by which reliability is usually assessed: first, by checking the stability of measure-
ment using the test-retest method (repeatability) and second, by examining internal consistency or applying
the split-half method.
Test-retest Method
Assessing the repeatability of a measure is the first aspect of reliability. The test-retest method is a conservative
method to estimate reliability. The idea behind it is that one should get the same score on a given test on
repeated testing. There are three main components to this method.
(i) Administering the measurement instrument to each subject at two separate times to test for stability.
The measure is said to be reliable if it is stable over time. For example, a researcher measures job
satisfaction and finds that 60 per cent of the population is satisfied with their jobs. If the study is
repeated a few weeks later under similar conditions and he/she finds the same result, it appears that
the measure is reliable.
(ii) Checking the correlation between the two measures at time-1 and time-2: a high correlation indicates
a high degree of reliability.
(iii) Assuming there is no change in the underlying conditions (or the trait you are trying to measure) between
test-1 and test-2. For example, at the individual level, we assume that a person does not change his or her
attitude about the job.
When a measuring instrument produces unpredictable results from one test to the next, the results are said to
be unreliable because of error in measurement.
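The stability check described above can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming that stability is summarised by a Pearson correlation between the two sets of scores; the function name `pearson_r` and the job-satisfaction scores are hypothetical and are not taken from any real study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical job-satisfaction scores (1-10) for the same six subjects,
# measured a few weeks apart under similar conditions (illustrative data only)
time_1 = [7, 5, 8, 4, 6, 9]
time_2 = [7, 6, 8, 4, 5, 9]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A coefficient close to one suggests that the measure is stable over time, provided the underlying trait has not itself changed between the two tests.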
Split-half Method/Equivalent-form Method
The second dimension of reliability concerns the homogeneity of the measure. The technique of splitting
halves is the most basic method for checking internal consistency when a measure contains a large number of
items. In the split-half method, one may calculate results from one-half of the scale items (e.g., odd-numbered
items) and check them against the results from the other half of the items (e.g., even-numbered items).
In the equivalent-form method, by contrast, two alternative instruments are designed to be as equivalent as
possible. Internal consistency estimates reliability by grouping questions in a questionnaire that measure
the same concept. For example, you could write two sets of three questions that measure the same concept
(say, class participation) and after collecting the responses, run a correlation between those two groups of
three questions to determine if your instrument is reliably measuring that concept. The closer the correlation
coefficient is to one, the higher the reliability estimate of the instrument. Both the split-half and equivalent-
form methods measure homogeneity or internal consistency rather than stability over time.
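The split-half procedure can be sketched in the same way. The example below is a minimal illustration in Python, assuming a six-item scale answered by five respondents; the response matrix and the helper names are hypothetical. It totals the odd-numbered and even-numbered items for each respondent and correlates the two sets of half-scores.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical responses: one row per respondent, one column per scale item
# (items 1 to 6, each scored 1-5); illustrative data only
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
]

# Odd-numbered items (1, 3, 5) sit at list positions 0, 2, 4;
# even-numbered items (2, 4, 6) sit at positions 1, 3, 5
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

split_half_r = pearson_r(odd_totals, even_totals)
print(f"split-half correlation: {split_half_r:.2f}")
```

The closer this correlation is to one, the more consistently the two halves of the scale appear to measure the same concept.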
14.3 VALIDITY
Validity is the strength of our conclusions, inferences or propositions. It involves the degree to which you
are measuring what you are supposed to or, more simply, the accuracy of your measurement. For instance, suppose we
are studying the effect of strict attendance policies on class participation and we find that class
participation did increase after the policy was established. Each type of validity would highlight a different
aspect of the relationship between our treatment (strict attendance policy) and our observed outcome (increased
class participation). There are four types of validity commonly examined in research methods.
(i) Internal Validity asks whether, if there is a relationship between the programme and the outcome we saw, it is a
causal relationship. For example, did the attendance policy cause class participation to increase?
(ii) External Validity refers to our ability to generalise the results of our study to other settings. In our
example, could we generalise our results to other classrooms?
(iii) Construct Validity is the hardest to understand. It asks whether there is a relationship between how the
researcher operationalised concepts in the study and the actual causal relationship that he/she is trying
to study. In the example, did our treatment (attendance policy) reflect the construct of attendance,
and did our measured outcome (increased class participation) reflect the construct of participation?
Overall, we are trying to generalise our conceptualised treatment and outcomes to broader constructs
of the same concepts.
(iv) Conclusion Validity asks whether there is a relationship between the programme and the observed outcome.
Or, in the above example, is there a connection between the attendance policy and the increased
participation?
Validity is generally regarded as more important than reliability because if an instrument does not accurately
measure what it is supposed to, there is no reason to use it even if it measures consistently (reliably).
Threats to Internal Validity
Internal validity concerns the likelihood that changes in the dependent variable (the subject of the research)
can only be attributed to manipulation of the independent variable and not to some other variable. When this
is the case, a study is said to have high internal validity. If it is possible to provide an alternative explanation
for the results of the study, the study has low internal validity. The following factors are important threats to
the internal validity of a study.
History
External events may affect the outcome of a study. The passage of time becomes a concern because what
one observes may be an outcome of events which occurred in the past, and this may be jumbled up with
contemporary effects. For example, if a study was carried out on the risks associated with various financial
institutions, a public crisis of confidence in the banking system during the period of the study would adversely
affect the experimental outcomes. The longer a study lasts, the more likely it is that history will become a
problem.
Maturation
Maturation refers to changes that can occur in the subjects of the study over a period of time. This includes
ageing, fatigue and acquisition of skills or experience over time. For example, bank employees might become
accustomed to a commission incentive after two weeks and stop trying to sell more.
Testing
Testing effects can be attributed to changes in subjects that arise from the influence of the testing process
itself. For example, it is possible that a pre-test can sensitise or bias a subject’s behaviour and result in an
improved performance on the post-test. One way to overcome this threat is to use a different post-test to the
pre-test.
Instrumentation
Instrumentation refers to the inconsistency or unreliability in the measuring instruments or observation
procedures during a study. For example, observers may be inconsistent in what they record during a study, or
a post-test may be much more difficult than a pre-test.
Selection
Selection problems arise from one group in an experiment being different from another group. For example,
one group might be brighter, more experienced or more receptive to change than another group. In other
words, ‘people factors’ can cause a bias in the study. This threat is overcome by the random assignment of
subjects to groups. It remains a problem where existing groups are used for the treatment and control groups.
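Random assignment, mentioned above as the remedy for selection bias, can be sketched as follows. This is a minimal illustration in Python, assuming two groups of equal size; the function name `assign_groups`, the subject labels and the seed value are hypothetical choices made only for the example.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    shuffled = list(subjects)   # copy, so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Ten hypothetical subjects labelled S01 to S10
subjects = [f"S{i:02d}" for i in range(1, 11)]
treatment, control = assign_groups(subjects, seed=42)

print("treatment:", treatment)
print("control:  ", control)
```

Because chance alone decides who falls into which group, systematic ‘people factors’ are unlikely to line up with the treatment. No such protection is available when pre-existing groups are used.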
Mortality
Mortality refers to the attrition of subjects from a study. The longer a study proceeds, the more likely it is that
mortality will become a problem, especially if the subjects who drop out share a common characteristic that
distinguishes them from those who remain.
Threats to External Validity
External validity refers to the degree to which the findings of a research study can be generalised to other
settings and situations. When conducting an experiment, a researcher hopes that the findings can be applied
at a later time to other groups of people in other geographical locations. The following factors may threaten
the external validity of the study.
Reactive Effects of Testing
The artificial effects of pre-testing may sensitise the subjects to the treatment. Because no pre-test is given in
practice, the outcomes of the experiment may differ from those that would occur in practice. For example, if a study of attitude
change was to begin with a pre-testing of attitudes, participants might become sensitised to the attitudes in
question and therefore show more attitude change as the result of the experimental treatment.
Reactive Effects of Selection
If the samples drawn for a study are dissimilar to the general population, it becomes difficult to generalise
findings from the sample to the larger population. For example, the findings of a study involving only urban
dwellers may not be applicable in rural settings. It is desirable to use samples which are representative of
the broadest population possible. One widespread selection practice in much of management research is to
use university students as subjects. This practice poses a significant external validity threat to many studies
if results are extended to the commercial world.
Reactive Effects of Experiment Setting
The arrangements for an experiment or the experience of participating in the experiment may limit the
generalisability of the findings to other settings. Reactive effects can also occur when subjects know that
they are participating in an experiment. This effect was demonstrated many years ago at the Hawthorne
plant of the Western Electric Company in the USA. A part of this study investigated the relationship between
productivity and the brightness of lighting in the factory. As expected, productivity increased as illumination
was increased. However, as brightness was decreased, productivity also rose. It was concluded that it was
the attention the workers were receiving, rather than the lighting, that was affecting production. This type of
experimental participatory effect has become known as the Hawthorne Effect.
14.4 GENERALISABILITY
Finally, in any research, it is important that we can conclude something about a particular phenomenon
beyond our own (narrow) research study. Why is this important? Because unless we can make some
generalisations, we are not really pushing knowledge forward and that is the whole point of research. Of
course, in many examples of business research, it is the case that a manager may not be interested in
generalising research findings and is only interested in ‘solving’ his or her particular research problem. This
is fine in terms of immediate problem solving but in terms of gaining a deeper understanding of the very
nature of a particular problem, it does not go very far. Most businesses will seek an ‘off the peg’ solution to
a business problem because it is often cheaper than undertaking their own research. But such solutions only
exist if the research which produced them was capable of generalising its findings.
Sensitivity analysis, which varies some of the inputs in a ‘what if’ investigation, also helps to ascertain
the efficiency of estimates and models built from a proportion of the data. This helps to determine the stability of
the solution and how safe it is to generalise from that model. In essence, the ability of any research design to
produce findings which are (mostly) applicable to other situations, organisations, countries and other people
is dependent on the quality of the underlying theory which allows us to interpret the ‘world’ in the context of
a given research problem. Without theory no amount of empirical data will enable us to better understand the
world we live in. That is the crux of generalisability—the ability to explain the same (or similar) phenomena
at all times and in all places without necessarily having to study it directly at all times and in all places.
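The sensitivity analysis mentioned earlier in this section can be illustrated with a simple ‘what if’ exercise. The following is a rough sketch in Python, not a prescribed procedure. It assumes a straight-line model fitted by least squares, refits that model on repeated random 70 per cent subsamples of the data, and reports how much the estimated slope moves. The data, the function names and the choice of 70 per cent and 200 repetitions are all hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x for a straight-line model."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def subsample_slopes(x, y, proportion=0.7, repeats=200, seed=0):
    """Refit the slope on repeated random subsamples of the observations."""
    rng = random.Random(seed)
    n = len(x)
    k = max(2, round(n * proportion))  # observations kept in each refit
    slopes = []
    for _ in range(repeats):
        idx = rng.sample(range(n), k)  # sample k cases without replacement
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    return slopes

# Hypothetical data: twelve observations lying close to the line y = 2x
x = list(range(1, 13))
y = [2.1, 4.3, 5.8, 8.4, 9.9, 12.2, 13.7, 16.1, 18.3, 19.8, 22.4, 23.9]

full_slope = ols_slope(x, y)
slopes = subsample_slopes(x, y)
print(f"slope from all data: {full_slope:.3f}")
print(f"slope range over subsamples: {min(slopes):.3f} to {max(slopes):.3f}")
```

If the estimates from the subsamples stay close to the estimate from the full data, the solution appears stable. A wide spread would be a warning against generalising from the model.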