
Goodness of Measures in Research Methodology

This document discusses ways to evaluate the quality of measurement instruments or scales. It describes two important concepts: reliability and validity. Reliability refers to the consistency and stability of a measurement instrument and can be assessed through test-retest reliability or internal consistency. Validity determines whether an instrument actually measures the intended concept; there are several types including face validity, content validity, criterion validity, and construct validity. The document provides examples to illustrate different reliability methods and aspects of validity.

Uploaded by

Muhammad Haroon
Copyright
© Attribution Non-Commercial (BY-NC)

Research Methodology

Lecture No. 11 (Goodness of Measures)

Recap
Measurement is the process of assigning numbers or labels to objects, persons, states of nature, or events. Scales are a set of symbols or numbers, assigned by rule to individuals, their behaviors, or attributes associated with them.

Using these scales we complete the development of our instrument. It remains to be seen whether these instruments measure the concept accurately.

Sources of Measurement Differences


Why do scores vary? Besides legitimate differences, scores can differ because of error (systematic or random). Among the reasons:
1. There is a true difference in what is being measured.
2. There are differences in stable characteristics of individual respondents. Example: on satisfaction measures, there are systematic differences in response based on the age of the respondent.
11/22/2013 5

3. Differences due to short-term personal factors: mood swings, fatigue, time constraints, or other transitory factors. Example: in a telephone survey of the same person, these factors (tired versus refreshed) may cause differences in measurement.
4. Differences due to situational factors: calling when someone is distracted by something versus when they can give full attention.

5. Differences resulting from variations in administering the survey: voice inflection, nonverbal communication, etc.
6. Differences due to the sampling of items included in the questionnaire.
7. Differences due to a lack of clarity in the measurement instrument (measurement instrument error). Example: unclear or ambiguous questions.
8. Differences due to mechanical or instrument factors: blurred questionnaires, bad phone connections.


Goodness of Measure
Once we have operationalized the concept and assigned scales, we want to make sure that the instruments developed measure the concept accurately and appropriately. They should measure what is supposed to be measured, and measure it as well as possible.


Validity: checks how well a developed instrument measures the concept.
Reliability: checks how consistently an instrument measures.


Ways to Check for Reliability


How do we check the reliability of a measurement instrument, that is, the stability and the internal consistency of its measures? Two methods are discussed for checking stability.
(1) Stability
(a) Test-Retest Reliability
Administer the same instrument to the same participants a second time, shortly after the first, under conditions as close to the original as possible.

If there are few differences in scores between the two tests, then the instrument is stable: it has shown test-retest reliability.
Problems with this approach: it is difficult to get cooperation a second time; respondents may have learned from the first test, so their responses are altered; and other factors (environment, etc.) may be present and alter the results.
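
As a rough illustration of the test-retest check, the sketch below correlates the scores from two administrations of the same instrument. The lecture does not name a statistic, so the Pearson correlation used here is an assumption, and the scores, the variable names and the helper pearson_r are all hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of the same six respondents on two administrations.
first_test = [12, 15, 9, 20, 17, 11]
second_test = [13, 14, 10, 19, 18, 11]

test_retest_r = pearson_r(first_test, second_test)
print(f"test-retest correlation: {test_retest_r:.2f}")
```

A correlation close to 1 would suggest a stable instrument. How high is high enough is a judgment call that the lecture does not settle.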


(b) Equivalent Form Reliability
This approach attempts to overcome some of the problems associated with the test-retest measurement of reliability. Two questionnaires, designed to measure the same thing, are administered to the same group on two separate occasions (the recommended interval is two weeks).

If the scores obtained from the two questionnaires are highly correlated, then the instruments have equivalent form reliability. It is tough, however, to create two distinct forms that are equivalent. Like test-retest, this is an impractical method and is not often used in applied research.


(2) Internal Consistency Reliability
This is a test of the consistency of respondents' answers to all the items in a measure. The items should hang together as a set; that is, if the items are independent measures of the same concept, they will be correlated with one another.
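
One widely used statistic for how well items hang together is Cronbach's alpha. The lecture text does not name it, so treating it as the internal consistency check is an assumption, and the item scores and the function name cronbach_alpha below are hypothetical. The sketch uses the standard formula: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, all in the same respondent order.
    """
    k = len(items)
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    # Total score of each respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three items of one measure.
item_scores = [
    [4, 5, 2, 3, 1],
    [4, 4, 2, 3, 2],
    [5, 4, 1, 3, 2],
]

alpha = cronbach_alpha(item_scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values nearer 1 indicate that the items vary together, which is what the lecture means by items that hang together as a set.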


Developing questions on the Concept Enriched Job

Validity
Definition: whether what was intended to be measured was actually measured.


Face Validity
The weakest form of validity. The researcher simply looks at the measurement instrument and concludes that it will measure what is intended. It is thus, by definition, subjective.


Content Validity
The degree to which the instrument items represent the universe of the concepts under study. In plain terms: did the measurement instrument cover all aspects of the topic at hand?


Criterion-Related Validity
The degree to which the measurement instrument can predict a variable known as the criterion variable.


Two subcategories of criterion-related validity
Predictive Validity
The ability of the test or measure to differentiate among individuals with reference to a future criterion. E.g. scores on an instrument that is supposed to measure the aptitude of individuals can be compared with the future job performance of the same individuals. Those who actually perform well should also have scored high on the aptitude test, and vice versa.

Concurrent Validity
Established when the scale discriminates between individuals who are known to be different; that is, they should score differently on the test. E.g. individuals who are happy to live on welfare and individuals who prefer to work should score differently on a scale/instrument that measures work ethic.
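
A known-groups comparison like the welfare example can be sketched by checking whether the two groups' mean scores differ by more than their spread would explain. The lecture does not prescribe a test, so Welch's t statistic is used here as an assumed choice. The work-ethic scores, the group names and the helper welch_t are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    # Standard error of the difference, without assuming equal variances.
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical work-ethic scores for two groups known to differ.
prefers_work = [8, 9, 7, 9, 8]
prefers_welfare = [4, 5, 3, 5, 4]

t_stat = welch_t(prefers_work, prefers_welfare)
print(f"mean difference: {mean(prefers_work) - mean(prefers_welfare):.1f}")
print(f"Welch t statistic: {t_stat:.2f}")
```

A large absolute t value suggests that the scale separates the two groups. A full analysis would also compute degrees of freedom and a p-value, which this sketch omits.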

Construct Validity
Does the measurement conform to the underlying theoretical expectations? If so, the measure has construct validity. E.g. if we are measuring consumer attitudes about product purchases, does the measure adhere to the constructs of consumer behavior theory? This is the territory of academic researchers.


Two approaches are used to assess construct validity
Convergent Validity
A high degree of correlation between two different measures intended to measure the same construct.
Discriminant Validity
A low degree of correlation among variables that are assumed to be different.
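
The two checks can be sketched with plain correlations: two measures of the same construct should correlate highly (convergent), while a measure of a different construct should correlate weakly with them (discriminant). All scores and names below are hypothetical, and the Pearson helper is an assumed choice of statistic.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores of six respondents.
satisfaction_scale_a = [1, 2, 3, 4, 5, 6]  # measure of construct X
satisfaction_scale_b = [2, 1, 3, 5, 4, 6]  # second measure of construct X
unrelated_scale = [3, 5, 1, 2, 5, 3]       # measure of a different construct

convergent_r = pearson_r(satisfaction_scale_a, satisfaction_scale_b)
discriminant_r = pearson_r(satisfaction_scale_a, unrelated_scale)

print(f"convergent (same construct): {convergent_r:.2f}")
print(f"discriminant (different construct): {discriminant_r:.2f}")
```

A high first value together with a near-zero second value is the pattern that supports construct validity.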


Validity can be checked through correlation analysis, factor analysis, the multitrait-multimethod matrix of correlations, etc.


Reflective vs Formative measure scales
In some multi-item measures, the items measure different dimensions of a concept and do not hang together. Such is the case with the Job Descriptive Index, which measures job satisfaction along 5 different dimensions, with items such as Regular Promotions, Fairly Good Chance for Promotion, Income Adequate, Highly Paid, and Good Opportunity for Accomplishment.

In this case, the items Income Adequate and Highly Paid would be expected to correlate, but the items Opportunity for Advancement and Highly Paid might not be correlated. In this measure not all the items relate to each other, because its dimensions address different aspects of job satisfaction. Such a measure/scale is termed a Formative scale.


In other cases the dimensions and items of the measure do correlate. In this kind of measure/scale the different dimensions share a common basis (a common interest). An example is the Attitude towards the Offer scale: since its items are all focused on the price of an item, all the items are related, and hence the scale is termed a Reflective scale.

Recap

