Research Methods PDF
In Collaboration With
SCHOOL OF FSET
Prepared by
Page 1 of 121
INTRODUCTION
This course is designed to equip students with the knowledge and techniques of
undertaking research. The course introduces students to the importance of research and
the research process. Issues related to research problem formulation, research design,
methods of data collection, data analysis and reporting on research findings will be
covered.
COURSE OBJECTIVES
By the end of the course the learner will be able to conduct research. The learner will
submit a research proposal for the research work to be accomplished in the unit EAE 400:
Research Project.
NOTE
Assessment tests will be given according to schedules set by the instructor. All students
are expected to demonstrate high integrity and ensure full participation during such
exercises.
Course grades are determined by cumulative performance over all assessments and the
final exam. The proposal development will account for the coursework grade (30%), while
the remaining 70% will be accounted for by the end-of-semester examination.
REFERENCES
TABLE OF CONTENTS
4.4 Summary
4.5 Self-Test Questions
4.6 Further Reading
TOPIC FIVE: CONCEPTUAL FRAMEWORK AND HYPOTHESIS FORMULATION
5.1 Topic Objectives
5.2 Conceptual Framework
5.3 Hypotheses
5.4 Summary
5.5 Self-Test Question
5.6 Further Reading
TOPIC SIX: RESEARCH DESIGNS
6.1 Topic Objectives
6.2 Meaning Of Research Design
6.3 Types Of Research Designs
6.4 Concepts In Research Design
6.5 Summary
6.6 Self-Test Questions
6.7 Further Reading
TOPIC SEVEN: SAMPLING DESIGN
7.1 Topic Objectives
7.2 Census Vs. Sample Surveys
7.3 Sampling Design
7.4 Costs In Sampling
7.5 The Characteristics Of A Good Sampling Design
7.6 Types Of Sampling Methods
7.7 Probability Sampling Methods
7.8 Summary
7.9 Self-Test Questions
7.10 Further Reading
TOPIC EIGHT: METHODS OF DATA COLLECTION
8.1 Topic Objectives
8.2 Types Of Data In Research
8.3 Collection Of Primary Data
8.4 Collection Of Secondary Data
8.5 Selection Of Data Collection Method
8.6 Summary
8.7 Self-Test Questions
8.8 Further Reading
TOPIC NINE: DATA ANALYSIS AND INTERPRETATION
9.1 Topic Objectives
9.2 Getting Data Ready for Analysis
9.3 Data Analysis and Interpretation
9.4 The Chi-Square Test
9.5 Regression Analysis
9.6 Case Study
9.7 Summary
9.8 Self-Test Questions
9.9 Further Reading
APPENDIX 1: GUIDELINES FOR WRITING A RESEARCH PROPOSAL
TOPIC ONE
INTRODUCTION TO RESEARCH METHODS
Research is conducted with a problem and policy goal in mind and is aimed at providing
a scientific explanation to a phenomenon.
f) Enable prediction
g) Enable control
h) Develop a theory
Research can be classified based on the methods of data collection, methods of data
analysis and the purpose of the research. Consequently, research may be classified as:
Descriptive research includes surveys and fact-finding enquiries of different kinds. The
major purpose of descriptive research is description of the state of affairs as it
exists at present. It is ex post facto research, and its main characteristic is that the
researcher has no control over the variables; he can only report what has
happened or what is happening. Most ex post facto research projects are used for
descriptive studies in which the researcher seeks to measure such items as, for
example, frequency of shopping, preferences of people, or similar data, and to
discover causes even when the variables cannot be controlled. The methods of
research utilized in descriptive research are survey methods of all kinds, including
comparative and correlational methods.
In analytical research, the researcher has to use facts or information already available and
analyze these to make a critical evaluation of the material.
Conceptual research is that related to some abstract idea(s) or theory. Philosophers and
thinkers generally use it to develop new concepts or to reinterpret existing ones.
Empirical/experimental research relies on experience or observation alone, often without
due regard for system and theory. It is data-based research, coming up with
conclusions which are capable of being verified by observation or experiment.
In such research it is necessary to get at facts firsthand, at their source, and
actively to go about doing certain things to stimulate the production of desired
information. The researcher must first provide himself with a
working hypothesis or guess as to the probable results. He then works to get
enough facts (data) to prove or disprove his hypothesis. He then sets up
experimental designs which he thinks will manipulate the persons or the materials
concerned so as to bring forth the desired information. Such research is thus
characterized by the experimenter's control over the variables under study and his
deliberate manipulation of one of them to study its effects. Empirical research is
appropriate when proof is sought that certain variables affect other variables in
some way.
1.5 RESEARCH APPROACHES
Once the relationships are established, predictions are made about the relationship
between the dependent and independent variable. Since it is difficult to control the
independent variable in human behaviour, the normative research method is
applied to study problems in the social sciences. An example of normative research
would be a study of the effect of religion on the voting behaviour of a particular
society.
A historical research method uses data or information from the past to enable a
researcher to learn about causes, effects and trends of past events that may help
explain current events or phenomena as well as guide in predicting future events.
This method involves studying, recording and analyzing past data in historical
perspectives. Both qualitative and quantitative variables can be used in the
collection of historical information to individual cases. Researchers do not collect
new data but simply study data that is already available.
Research methods refer to all those methods that are used for conducting research i.e. the
methods used by the researcher during the course of studying the research problem.
Research methods can be put into the following three groups:
▪ Those methods that are concerned with the collection of data. These methods
will be used where the data already available are not sufficient to arrive at the
required solution.
▪ Those statistical techniques that are used for establishing relationships between
the data and the unknowns.
▪ Those methods that are used to evaluate the accuracy of the results obtained.
Research methods therefore refer to the behaviour and instruments used in selecting and
constructing research techniques. Research techniques refer to the behaviour and
instruments a researcher uses in performing research operations such as making
observations, recording data, techniques of processing data and the like. Methods are
therefore more general and they generate techniques.
Research methodology is a system of explicit rules and procedures upon which research
is based and against which claims for knowledge are evaluated. In it we study the various
steps that are generally adopted by a researcher in studying his research problem along
with the logic behind them.
It is necessary for the researcher to design the methodology for a problem as the same
may differ from problem to problem. For example, an architect, who designs a building,
has to consciously evaluate the basis of his decision i.e. he has to evaluate why and on
what basis he selects particular size, number and location of doors, windows and
ventilators, uses particular materials and not others and the like. Similarly, in research
the scientist has to expose the research decisions to evaluation before they are
implemented. He has to specify very clearly and precisely what decisions he makes and
why he makes them so that others can evaluate them also.
Research methods constitute a part of the research methodology. Thus, when we talk
of research methodology we not only talk of the research methods but also consider the
logic behind the methods we use in the context of our research study, and explain why we
are using a particular method or technique and why we are not using others, so that
research results are capable of being evaluated either by the researcher or by others.
1. Formulation of the research problem: Here the researcher will ask a question to give
an answer to a previously unanswered or unsolved problem. E.g. what causes
inflation?
These propositions are tentatively accepted as basis for investigation whose aim is
to accept or reject them.
3. Research Design: The overall research activity should be well planned and logically
designed. The researcher should state the title of the research, its purpose and problem
statement, and articulate the method of conducting the research to achieve the
specified objectives.
4. Determination of the type of data to be collected: This involves identifying what kind
of data is needed to provide the information required for a particular study. Depending
on the available human and material resources, a researcher may opt to use either
primary or secondary data. Primary data refers to data collected through field surveys
using questionnaires, interviews, etc., while secondary data refers to data that has
already been collected by others and is available in published and unpublished materials.
Data that is collected and used for a research purpose may also be quantitative or
qualitative. Quantitative data is information expressed in numerical form and
measures, thus facilitating computation and accurate comparisons (e.g. kilograms,
tons, etc.). Qualitative data is information that describes the quality of a
phenomenon under investigation with terms such as good, bad, yes or no, etc.
5. Collection of data: Once the researcher has decided on the type of data required for
the research, he/she will embark on the process of collecting the actual data needed
for the research. Using various data-collection techniques and instruments, the
researcher will collect the necessary data either from secondary sources (published
materials) or primary sources (field surveys using questionnaires, interviews, fact
lists, etc.).
6. Analyzing and processing the collected data: The researcher begins analyzing and
interpreting the data collected using different statistical methods of analysis. At this
stage, the data collected and analyzed will show whether the hypotheses proposed in
step two are to be accepted or rejected. In other words, at this stage the researcher
draws the conclusion and determines whether or not there is a cause-and-effect relationship
between the dependent and independent variables.
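The distinction between quantitative and qualitative data drawn in step 4 affects how the data can be summarised at the analysis stage. The following is a minimal sketch in Python, not a prescribed procedure: the weights and ratings are made-up values used purely for illustration, and the function names are hypothetical. It shows that numerical data support arithmetic such as the mean, while qualitative responses are summarised by counting how often each category occurs.

```python
from collections import Counter
from statistics import mean

# Hypothetical quantitative data: weights measured in kilograms.
weights_kg = [52.0, 61.5, 70.0, 58.5]

# Hypothetical qualitative data: respondents' ratings of a service.
ratings = ["good", "bad", "good", "good"]


def summarise_quantitative(values):
    """Return the arithmetic mean, which is meaningful for numerical data."""
    return mean(values)


def summarise_qualitative(responses):
    """Return how often each category occurs; categories cannot be averaged."""
    return dict(Counter(responses))


average_weight = summarise_quantitative(weights_kg)
rating_counts = summarise_qualitative(ratings)
print(average_weight)
print(rating_counts)
```

A frequency count of this kind is only a first description of qualitative data; which statistical method is appropriate still depends on the hypotheses being tested.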
[Figure: diagram of the main stages of the research process. Labels: Problem, Hypothesis, Research Design, Measurement, Data Collection, Data Analysis, Generalization, Theory.]
Adapted from Nachmias and Nachmias. (2004). Research Methods in the Social Sciences.
New York: St. Martin’s Press.
1.9 QUALITIES OF A GOOD RESEARCH
a) Systematic: It should be structured with specified steps to be taken in a specified
sequence in accordance with a well-defined set of rules. It involves critical thinking
but avoids guesswork and intuition in arriving at conclusions.
b) Logical: It should be guided by the rules of logical reasoning; the logical
processes of induction and deduction are of great value in carrying out research.
Induction is the process of reasoning from a part to the whole. A conclusion is drawn
from one or more particular facts or pieces of evidence. The conclusion explains the
facts and the facts support the conclusion.
For example, suppose you spend Shs. 2m on a promotional campaign and sales do not
increase during and after the campaign. Why did the sales not
increase? One conclusion is that the promotional campaign was poorly executed.
This conclusion is only a hypothesis, since there are other possible explanations for the
sales not increasing, such as a strike.
d) Replicable: The study should be replicable in order to verify its research results,
thereby building a sound basis for decisions. The research procedure used should be
described in sufficient detail to permit another researcher to repeat the research for
further advancement, keeping the continuity of what has already been attained.
1.10 SUMMARY
The research process entails formulation of the problem, formulation of hypotheses, determination
of the type of data and development of instruments for data collection, collection of relevant
data, data analysis and making of generalisations.
Research is conducted with a problem and policy goal in mind and is aimed at providing a
scientific explanation to a phenomenon.
Research may be conducted for purposes of gaining familiarity with a phenomenon, accurately
describing a particular individual, situation or group, determining the frequency with which
something occurs or with which it is associated with something else, or testing a hypothesis of a
causal relationship between variables to enable prediction, control or development of a theory.
A researcher can use a qualitative or quantitative approach to study a problem. The methods
employed could be experimental, normative or historical.
Research methodology refers to a system of explicit rules and procedures upon which research is
based and against which claims for knowledge are evaluated, while research methods refer to all
those methods that are used for conducting research. Research methods constitute a part of
the research methodology.
1.11 SELF-TEST QUESTIONS
TOPIC TWO
RESEARCH PROBLEM FORMULATION
African Universities, GTZ, The World Bank, The IMF, USAID, among others.
(h) Researcher’s interest: Carefully observing existing practices in one’s area of interest
at work, at home, at workshops, seminars among others.
(i) Declarations (such as world declaration of the Rights of children), conferences (such
as NEPAD, AU), workshops and seminars
(j) Advanced graduate courses taught in Universities.
(i) Interest
Interest should be the most important consideration in selecting a research problem. A
research endeavour is usually time-consuming, and involves hard work and possibly
unforeseen problems. If you select a topic which does not greatly interest you, it could
become extremely difficult to sustain the required motivation, and hence its completion
as well as the amount of time taken could be affected.
(ii) Magnitude
One should have sufficient knowledge about the research problem to be able to visualize
the work involved in completing the proposed study. Narrow the topic down to
something manageable, specific and clear. It is extremely important to select a topic that
you can manage within the time and resources at your disposal. Even if you are
undertaking a descriptive study, you need to carefully consider its magnitude.
(v) Relevance
Select a topic that is of relevance to you as a professional. Ensure that your study adds to
the existing body of knowledge, bridges current gaps or is useful in policy formulation.
This will help you to sustain interest in the study.
(vi) Availability of data
If your topic entails collection of information from secondary sources (office records,
client records, census or other already-published reports, etc.), make sure, before
finalizing your topic, that these data are available and in the format you want.
The following seven steps are followed when formulating a research problem.
The following should also be noted when selecting the broad topic/area:
1. One should avoid a topic calling for a background of knowledge that he/she does
not have.
2. The topic must be one that addresses a felt need.
3. It should be in an area where it is possible to get enough material to ensure
thorough coverage.
4. The selection of the topic should be justified on social, scientific or policy grounds.
Breaking down the topic into its subareas helps to clarify the problem so that the
researcher can focus on a few questions.
Example
Suppose the broad area selected is domestic violence (DV). The various aspects/subareas
would include:
(i) Extent and types of DV
(ii) Impact of DV on the family
(iii) Impact of DV on children
(iv) Services available to the victims of DV
(v) Effectiveness of the services provided to the victims of DV
(vi) Extent of DV in a community
(vii) Profile of families in which DV occurs
(viii) Profile of the victims of DV
(ix) Profile of the perpetrators
(x) Reasons for DV
Similarly, you can select any subject from other fields such as community health or
consumer research and go through this dissection process.
It is neither advisable nor feasible to study all subareas. Out of this list, a researcher will
select the issues or subareas about which he/she is passionate, because the individual's
interest is the most important determinant for selection.
NOTE
One way to decide what interests you most is to start with a process of elimination,
deleting all those subareas in which you are not very interested. Towards the end of the
process, it will become very difficult to delete anything further. The elimination
procedure therefore continues until you are left with something that is manageable considering the
time available, level of expertise and other resources needed to undertake the study.
The subarea selected forms the basis of developing a topic for study. The selection of the
topic should be justified on social, scientific or policy grounds. Example: “The
determinants of demand for milk in Coast Province, Kenya”.
There are two types of problems: that whose aim is to increase our knowledge and that
whose aim is to make our life better (Selamat 2008).
The research topic when stated should reflect three components: the independent
variable, the dependent variable and the population under study.
The dependent variable is the problem variable while the independent variable is that
whose influence or relationship to the problem variable is to be established, and which if
effectively manipulated is highly likely to bring about desirable changes in the problem
variable.
At this step the researcher asks: ‘What is it that I want to find out about in this subarea
(topic)?’ Within the chosen subarea, one therefore lists all the questions for which
answers are being sought. If the questions thought of are too many to be manageable,
the researcher once again goes through a process of elimination, as in step 3.
The research questions should be identified from the research gaps and issues emanating
from the research problem. The questions should be in line with the objectives of the
study so that each of the questions asked will form an objective. Such question(s) may be
descriptive, relationship questions, or difference questions.
Objectives are the goals to attain in the study. They inform a reader of what the
researcher wants to achieve through the study, and should therefore be worded clearly
and specifically.
Objectives should be stated under two headings:
(i) Main objectives
(ii) Sub objectives or specific objectives
The main objective is an overall statement of the thrust of the study. It is also a statement
of the main associations and relationships that one seeks to discover or establish.
The sub-objectives are the specific aspects of the topic that the researcher wants to
investigate within the main framework of the study. They should be worded clearly and
unambiguously. Each sub-objective should contain only one aspect of the study. They should be
numbered using Roman numerals.
NOTE
The way the main and sub-objectives are worded determines how your research is
classified (e.g. descriptive, correlational or experimental). Thus the wording of your
objectives determines the type of research design you need to adopt to achieve them.
Irrespective of the type of research, the objectives should be expressed in such a way that
the wording clearly, completely and specifically communicates to your readers your
intention. There is no place for ambiguity, non-specificity or incompleteness, either in
the wording of your objectives or in the ideas they communicate.
If the study is primarily descriptive, the main objective should clearly describe the major
focus of the study, even mentioning the organisation and its location unless these are to
be kept confidential. For example:
a) The main objective of this study is to describe the types of treatment programmes
provided by Neema Clinic to alcoholics in Kawangware division, OR
b) The main objective of this study is to find the opinion of the community about the
health services provided by Neema Clinic in Kawangware division.
In the two examples, identification of the organisation and its location is important as the
services may be peculiar to the place and the organisation and may not represent the
services provided by others to similar populations.
If the study is correlational in nature, then in addition to mentioning the focus of the study, the
organisation and the place, the wording of the main objective should include the main
variables being correlated. For example:
a) The main objective of the study is to ascertain the impact of migration on
family roles, OR
b) The main objective of the study is to compare the effectiveness of different
teaching methods on the comprehension of students.
If the overall thrust of the study is to test a hypothesis, the wording of the main objective, in
addition to the above, should indicate the direction of the relationship being tested. For
example, “to ascertain if an increase in youth unemployment will increase the incidence
of street crime”, OR “to demonstrate that the provision of maternal and child health
services to the Maasai people in Mara will reduce infant mortality”.
In this step, the researcher examines the objectives to ascertain the feasibility of
achieving them through the research endeavour. The objectives are assessed by
considering the time, resources (financial and human) and technical expertise at one's
disposal.
Step 7: Double-check.
One should go back and give final consideration to whether or not he/she is sufficiently
interested in the study and has adequate resources to undertake it. The questions to ask
are: ‘Am I really enthusiastic about this study?’ and ‘Do I really have enough resources to
undertake it?’ If the answer to either of them is ‘no’, then re-assessment of the objectives
is required.
NOTE
Every study has a second element, the study population, from whom the information
required to answer the research questions is to be obtained. As the research
problem is narrowed down, the researcher needs to decide very specifically who constitutes
the study population, in order to select the appropriate respondents.
- Be able to advance knowledge or be of interest to society.
- It should indicate the scope of the study and be objectively researchable.
- Give the purpose of the research
(a) Format I
To define a research problem, the topic is broken down into logical statements by
suggesting or enumerating reasons why a particular topic is chosen.
Illustration I
Suppose the problem identified is “youth unemployment in Kenya”.
To guide writing the problem statement, the reasons for choosing the topic can be
enumerated.
Note: These statements are then put together to provide a clear perspective of the
problem to be investigated and to ensure that the knowledge gap is clearly defined.
Illustration II
Suppose the problem identified is insect defoliation of forest trees
(i) Reasons for choosing this topic:
(a) Commercial value
(b) Tree damage
(c) Percent of species y in commercial forests of the region
(d) Recreation impact; and
(e) Need for control measures
(b) Format II
Another technique, used especially in policy-oriented research, is to state a problem
using an “A BUT B” statement, where A represents a goal or current situation, BUT indicates
that the goal has not been met or that there is some limitation on the current situation, and
B indicates the obstacles that are in the way.
Illustration
The government of Kenya has over the years stressed the importance of reducing youth
unemployment, BUT the rate of youth unemployment still remains high at 35% ---
There is need to prevent defoliation of large areas of tree species y by insect x in region
b, BUT, there are no effective and economical control measures for this insect.
NOTE
The statement of research purpose should come at the end of the problem statement and
should convey the focus of the study. It should be stated in a clear, direct and unambiguous
manner. It is usually expressed in a declarative manner, starting with the phrase “The
purpose of the study is …”, followed by neutral verbs such as to investigate, to examine,
to compare, to explore, to find out, to inquire, to determine, among others.
The statement of purpose also suggests the design of the study by indicating the target
population, the variables, and possible relationships among the variables.
2.6.3 Examples of Statements of Problem
[Excerpts 1, 2, 3 and 4: images not recoverable from the source]
2.6.4 Examples of research objectives from the statements of Problem
The following excerpts 5, 6, 7 and 8 are examples of stated objectives from the problem
statements in excerpts 1, 2, 3 and 4 above
[Excerpts 5 and 6: images not recoverable from the source]
2.7 SUMMARY
Problems in research are unresolved questions that call for an investigation.
A research problem can be identified from personal experiences, gaps from previous
research, current social and political issues, literature from textbooks and articles on an area
of interest, deductions from theory, research themes from funding agencies, declarations and
topics in advanced graduate courses taught in Universities.
When choosing a research problem to study, one must consider one's own interest, the magnitude of
work involved, measurement of concepts, level of expertise, relevance, availability of data
and ethical issues.
The steps in formulation of a research problem start from identification of a broad area and
then breaking the broad area into subareas, one of which provides the aspect to be
studied. From the selected subarea, research questions and research objectives are
formulated.
The research questions are the specific issues in the selected subareas for which answers are
to be provided. Objectives are the goals to be achieved by the study.
Each question and objective in research must address only one aspect of the study and must be
clearly and completely stated.
The research problem should be clearly defined by describing the problem context, revealing
the knowledge gap and stating the purpose for which the study is to be conducted.
2.8 SELF-TEST QUESTIONS
1. State the important points an individual should consider when choosing a topic for
research
2. Explain the steps in research problem formulation
3. Explain the four important aspects to be covered in the content of statement of the
problem
4. Distinguish between main research objective and the specific objectives
2.10 WORK TO DO
TOPIC THREE
LITERATURE REVIEW
Therefore, literature review allows presentation of one’s case more convincingly and
demonstrates how one’s findings would enlarge, modify, or even reject existing
knowledge.
(iv) Research papers written by other scholars including unpublished Graduate
research papers, theses
This involves reviewing the existing theories in the chosen area of study.
(i) Theory defines the orientation or perspective of the discipline. It defines the kind of
data that are to be abstracted. That is, it narrows down the kind (range) of facts to be
studied in that discipline. This is because each discipline looks at a phenomenon
differently.
(ii) It helps in conceptualization and classification: If knowledge is to be organized,
there must be a system imposed on the facts to be studied. A major task of each
discipline is the development of a system of classification and structuring of ideas, and
of precise sets of definitions for its terms. Theory thus offers a conceptual scheme
by which the relevant phenomena are systematized, classified and interrelated.
(iii) It serves as a summary of facts: Theory summarises facts into empirical
generalizations and systems of relationships. It thus provides what is already known
about the object of study and facilitates facts being seen in a framework rather than
in isolation. In this respect, theory serves as a very crucial platform when we wish to
communicate accurately or to explain complex ideas.
(iv) Theory summarizes facts and states a general uniformity beyond the immediate
observations. Therefore it helps to predict facts. It facilitates extrapolation from the
known to the unknown. It also identifies what facts are to be expected and therefore
what data to observe.
(v) It points to knowledge gaps. Since it summarizes the known facts and predicts those
yet to be observed, it may also point to areas that have not been explored. This is
because a theory is developed based on assumptions and premises which delineate
some conditions. For example, in Economics we assume that consumer preferences
are exogenous (taken as given), but preferences can be influenced by education,
propaganda, advertisement and other factors.
3.3.2 Empirical Literature
This refers to literature on studies that have been conducted by other scholars based on
real life observations (facts). Only studies that have examined a problem similar to the one
being studied should be reviewed.
Empirical literature often provides information that is useful in assessing the relevance of
theory to real life observations, and therefore enables acceptance, modification and
extension of existing theories. In areas where there is no existing theory, it provides
information that could lead to generalisations and therefore build up to theory.
Reviewing empirical literature keeps the researcher abreast of what
others have investigated in the given topic (area of study), the methodologies used in
studying the problem and the findings of the studies. This will enable the
researcher to refine the problem (isolate the knowledge gap) as well as the methodology
of study.
(i) Review literature in the past tense since these are studies already done and are in the
public domain.
(ii) Review literature chronologically, starting from the earliest studies and building the case to
the most recent ones.
(iii) When reviewing literature, highlight the weaknesses and strengths of the past studies
and how your study deviates from these studies.
(iv) Avoid personalising the studies by using words such as “we”, “us”, “our”, “I”, “my”,
“her” and “his”, among others.
(v) Cite or quote the surname of the author and the year only in the main text of the
proposal/thesis. Use the APA or the Harvard style of referencing, but be consistent
in the style chosen. Examples of direct quotes (citations) in the text:
a. According to Varian (1978), demand is -------- OR Republic of Kenya (2002)
asserted that -------
OR
OR
NOTE
All the works cited in the text must be listed in the references at the end of the proposal
or thesis
- There are inconsistencies in previous research
- There are omissions or bias in published studies
- Further testing of research findings is needed
- Evidence is unavailable, inconclusive, limited or contradictory
3.6 SUMMARY
Literature review involves a study of the existing literature on the topic and relates it to the
research problem. An extensive literature review allows presentation of one’s
case more convincingly and demonstrates how one’s findings would enlarge, modify, or even
reject existing knowledge.
The literature reviewed could be from textbooks, journal articles, conference papers, and
research papers written by other scholars, including unpublished graduate research papers and
theses.
The researcher should review relevant theories and empirical studies in the subject area.
Review of theory offers a conceptual scheme by which the relevant phenomena under study
are systematized, classified and interrelated.
Empirical literature review enables the researcher to explore the extent to which theory has
explained phenomena in past studies, and the various methodologies used in studying the
problem. Thus empirical literature enables the researcher to know what others have
investigated in the topic area, and their findings, which is useful for refining the problem as
well as the methodology of study.
All reviewed work must be cited in the documents written and also be fully described in the
list of references.
3.7 SELF-TEST QUESTIONS
1. What are the benefits of doing an extensive literature review before undertaking research?
2. What aspects of literature should you focus on when conducting the reviews?
3. How would you cite materials reviewed in the study report?
3.8 WORK TO DO
1. Identify the difference in presentation of text books, journal articles, conference papers,
unpublished research papers in the listing of references
2. Consider the topic you have selected for your research project
(i) Identify relevant theories in the analysis of the problem to be studied
(ii) Identify relevant empirical studies in the selected topic
(iii) Conduct a thorough theoretical and empirical literature review using the guidelines
provided in this topic
(iv) Write the chapter on literature review
TOPIC FOUR
Concepts are mental images or perceptions. Concepts are subjective impressions and
therefore their understanding may differ from person to person. If they were measured directly,
there would be problems in comparing responses. A variable, on the other hand, is an
image, perception or concept that is capable of being measured, hence capable of taking
on different values. That is, variables are measurable with varying degrees of accuracy.
Table 4.1 presents examples of concepts and variables.
Concept              Variable
• Effectiveness      • Sex (male/female)
• Satisfaction       • Attitude
• Impact             • Age (x years, y months)
• Excellent          • Income (Ksh - per year)
• High achiever      • Weight ( - Kg)
• Self-esteem        • Height ( - cm)
• Rich               • Religion (Catholic, Protestant, Jew, Muslim)
NB: In research work it is important for the concepts to be converted into variables. To
do this, one starts by identifying the concept’s indicators (the set of criteria reflective of
the concept). These can then be converted into variables as illustrated in Table 4.2.
Table 4.2 Converting concepts into variables - Examples
4.3: TYPES OF VARIABLES
[Figure 4.1: Types of variable]
The classification developed and presented in figure 4.1 results from looking at variables in
three different ways:
(i) The causal relationship;
(ii) The design of the study; and
(iii) The unit of measurement.
In a study that attempts to investigate a causal relationship or association, four sets of variables
may operate:
1. Change (independent) variables, which are responsible for bringing about change in a
phenomenon;
2. Outcome (dependent) variables, which are the effects of a change variable
3. Variables which affect the link between cause-and-effect variables (extraneous variables)
4. Connecting or linking variables, which in certain situations are necessary to complete
the relationship between cause-and-effect variables.
[Figure: Cause and Effect, joined by Connecting or linking variables (4)]
Independent variable – the cause supposed to be responsible for bringing about change(s) in a
phenomenon or situation
Extraneous variable – several other factors operating in a real-life situation may affect
changes in the dependent variable. These factors, not measured in the study, may increase or
decrease the magnitude or strength of the relationship between independent and dependent
variables.
Intervening variable – sometimes called the confounding variable, links the independent and
dependent variables. In certain situations the relationship between an independent and a
dependent variable cannot be established without the intervention of another variable. The
cause variable will have the assumed effect only in the presence of an intervening variable.
Illustration 4.1
Suppose you want to study the relationship between smoking and cancer. You assume that
smoking is a cause of cancer. Studies have shown that there are many factors affecting the
extent to which smoking might cause cancer. They include
▪ The number of cigarettes or the amount of tobacco smoked every day
▪ The duration of smoking
▪ The age of the smoker
▪ Dietary habits
▪ The amount of exercise undertaken by the individual
These variables may either increase or decrease the magnitude of the relationship.
In the above example smoking is the independent variable, cancer is the dependent variable
and all the variables that might affect this relationship, either positively or negatively, are
extraneous variables.
[Figure: Smoking (independent variable) → Cancer (dependent variable)]
Illustration 4.2
Suppose you want to study the relationship between fertility and mortality. There is no direct
relationship between fertility and mortality. With the reduction in mortality, fertility will
decline only if people attempt to limit their family size. It is thus the intervention of
contraceptive methods that completes the relationship: the greater the use of contraceptives,
the greater the decline in the fertility level. The extent of the use of contraceptives is also
affected by a number of other factors, for example, attitudes towards contraception, level of
education, socioeconomic status and age, religion, and provision and quality of health
services. These are classified as extraneous variables.
[Figure 4.4: Mortality (independent variable) → The extent of the use of contraceptives (intervening variable) → Fertility (dependent variable)]
In figure 4.4 mortality level is the independent variable and fertility is the dependent variable.
But this relationship will be completed only if another variable intervenes – that is, the use of
contraceptives. A reduction in mortality (especially child mortality) increases family size, and
an increase in family size creates a number of social, economic and psychological pressures
on families, which in turn create attitudes favorable to a smaller family size. This change in
attitudes is eventually operationalised in behavior through the adoption of contraceptives. If
people do not adopt methods of contraception, a change in mortality levels will not be
reflected in fertility.
In this causal model, the fertility level is the dependent variable, the extent of contraceptive
use is the intervening variable, the mortality level is the independent variable, and the
unmeasured variables such as attitudes, education, age, religion, the quality of services, and
so on are all extraneous variables. Without the intervening variable the relationship between
the independent and dependent variables will not be complete.
On the basis of the design of the study, variables are classified as active and attribute variables:
(a) Active variables – those variables that can be manipulated, changed or controlled.
(b) Attribute variables – those variables that cannot be manipulated, changed or controlled,
and that reflect the characteristics of the study population; for example, age, gender,
education and income.
Illustration 4.3
Suppose a study is designed to measure the relative effectiveness of three teaching models
(Model A, Model B and Model C). The structure and contents of these models could vary
and any model might be tested on any population group. The contents, structure and
testability of a model on a population group may also vary from researcher to researcher.
On the other hand, a researcher does not have any control over characteristics of the student
population such as their age, gender, or motivation to study. These characteristics of the
study population are called attribute variables. However, a researcher does have the ability to
control and/or change the teaching models. She/he can decide what constitutes a teaching
model and on which group of the student population it should be tested.
(ii) The ordinal or ranking scale
Besides categorizing individuals, objects, responses or a property into subgroups on the basis
of a common characteristic (nominal scale), the ordinal scale ranks the subgroups in a certain order. They
are arranged in either ascending or descending order according to the extent to which a subcategory reflects
the magnitude of variation in the variable. For example, income can be measured as
‘above average’, ‘average’ or ‘below average’. (These categories can also be developed on
the basis of quantitative measures, for example below $10,000 = below average, $10,000 -
$25,000 = average and above $25,000 = above average). The subcategory ‘above average’
indicates that people so grouped have more income than people in the ‘average’ category, and
people in the ‘average’ category have more income than those in the ‘below average’
category. These subcategories of income are related to one another in terms of the magnitude
of people’s income, but the magnitude itself is not quantifiable, and hence the difference
between ‘above average’ and ‘average’ or between ‘average’ and ‘below average’
subcategories cannot be ascertained.
Subcategories are arranged in the order of the magnitude of the property/characteristic. Also,
the ‘distance’ between the subcategories is not equal as there is no quantitative unit of
measurement.
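As a minimal sketch of the income example above, the following Python snippet assigns ordinal categories from a quantitative measure. The cut-off values are the ones quoted in the text; the function names, the rank positions and the sample incomes are illustrative assumptions, not part of the course material.

```python
# Ordinal subcategories listed in ascending order of magnitude.
# Their positions (0, 1, 2) express order only, not distance.
INCOME_CATEGORIES = ["below average", "average", "above average"]


def income_category(income):
    """Place an annual income (in dollars) into an ordinal subcategory."""
    if income < 10_000:
        return "below average"
    if income <= 25_000:
        return "average"
    return "above average"


def rank(category):
    """Return the position of a subcategory in the ordering."""
    return INCOME_CATEGORIES.index(category)


incomes = [8_500, 10_000, 24_000, 40_000]  # hypothetical sample
categories = [income_category(x) for x in incomes]

# Ranking is meaningful on an ordinal scale.
print(rank("above average") > rank("average"))
```

Comparing ranks tells us who has more income, but subtracting ranks says nothing about how much more, which is the limitation of the ordinal scale described above.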
Celsius and Fahrenheit scales are examples of the interval scale. In the Celsius system the
starting point (considered as the freezing point of water) is 0°C and the terminating point
(considered as the boiling point) is 100°C. The gap between freezing and boiling points is divided into 100
equally spaced intervals, known as degrees. In the Fahrenheit system the freezing point is
32°F and the boiling point is 212°F, and the gap between the two points is divided into 180
equally spaced intervals. Each degree or interval is a measurement of temperature – the
higher the degree, the higher the temperature. As the starting and terminating points are
arbitrary, they are not absolute; that is, you cannot say that 60°C is twice as hot as 30°C or
30°F is three times hotter than 10°F. This means that while ratios of the readings themselves
are not meaningful, ratios of the differences between readings are. For
example, if the difference in temperature between two objects, A and B, is 15°C and the
difference in temperature between two other objects, C and D, is 45°C, you can say that the
difference in temperature between C and D is three times greater than between A and B. An
attitude towards an issue measured on the Thurstone scale is similar. However, the Likert scale
does not measure the absolute intensity of the attitude but simply measures it in relation to
another person.
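The temperature example above can be checked numerically. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the four object temperatures are hypothetical values chosen only so that the differences are 15°C and 45°C as in the text.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Ratio of readings: depends on where the arbitrary zero point sits.
ratio_of_readings_c = 60 / 30
ratio_of_readings_f = celsius_to_fahrenheit(60) / celsius_to_fahrenheit(30)

# Hypothetical readings for objects A, B, C and D in degrees Celsius.
a, b, c, d = 10, 25, 20, 65

# Ratio of differences: the same in both systems.
ratio_of_differences_c = (d - c) / (b - a)
ratio_of_differences_f = (
    (celsius_to_fahrenheit(d) - celsius_to_fahrenheit(c))
    / (celsius_to_fahrenheit(b) - celsius_to_fahrenheit(a))
)

print(ratio_of_readings_c, ratio_of_readings_f)
print(ratio_of_differences_c, ratio_of_differences_f)
```

The ratio of the two readings changes when the unit changes, so "twice as hot" has no fixed meaning, whereas the ratio of the two differences is 3 in both systems.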
The interval scale is relative; that is, it plots the position of individuals or responses in
relation to one another with respect to the magnitude of the measurement variable. Hence, an
interval scale has all the properties of an ordinal scale, plus it has a unit of measurement with
an arbitrary starting and terminating point. As it is a relative scale, ratios of its readings are
not meaningful, although the differences between readings can be compared.
(b) Classifications of Variables
Variables in research can be measured using the different scales and from the viewpoint of
the unit of measurement, there are two ways of categorizing variables:
(i) Whether the unit of measurement is categorical (as in nominal and ordinal scales) or
continuous in nature (as in interval and ratio scales)
(ii) Whether it is qualitative (as in nominal and ordinal scales) or quantitative in nature (as in
interval and ratio scales)
The variables thus classified are called categorical and continuous, and qualitative and
quantitative. Categorical variables are measured on nominal or ordinal measurement scales,
whereas for continuous variables the measurements are made either on an interval or a ratio
scale.
Constant variable: a variable that has only one value or category, for example taxi, tree or
water.
Dichotomous variable: a variable that has only two categories as in yes/no, good/bad and
rich/poor.
Polytomous variable: a variable that can be divided into more than two categories, for
example: religion (Christian, Muslim, Hindu); political parties (Labor, Liberal,
Democrat); and attitudes (strongly favourable, uncertain, unfavourable, strongly
unfavourable).
Continuous variable: a variable that has continuity in its measurement, for example age, income and
attitude score. Such a variable can take on any value on the scale on which it is measured.
Age can be measured in years, months and days. Income can be measured in shillings
and cents.
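The distinction between constant, dichotomous and polytomous variables depends only on the number of categories. A minimal sketch follows; the function name is hypothetical, and it counts the categories actually observed in the data, which is an assumption since a category may exist without appearing in a given sample.

```python
def categorical_type(values):
    """Classify a categorical variable by its number of distinct categories."""
    distinct = set(values)
    if not distinct:
        raise ValueError("at least one observation is required")
    if len(distinct) == 1:
        return "constant"
    if len(distinct) == 2:
        return "dichotomous"
    return "polytomous"


print(categorical_type(["water", "water", "water"]))
print(categorical_type(["yes", "no", "yes"]))
print(categorical_type(["Christian", "Muslim", "Hindu"]))
```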
Qualitative variables are similar to categorical variables as both use either nominal or ordinal
measurement scales. However, there are some differences. For example, it is possible to
develop categories on the basis of measurements made on a continuous scale, such as
measuring the income of a population in shillings and cents and then developing
categories such as ‘low’, ‘middle’ and ‘high’ income. The measurement of income in
shillings and cents is classified as the measurement on a continuous variable, whereas its
subjective measurement in categories such as ‘low’, ‘middle’ and ‘high’ groups is a
qualitative variable.
Table 4.4: Examples of Categorical/continuous and quantitative/qualitative variables
NOTE
The way a variable is measured determines the type of analysis that can be performed, the
statistical procedures that can be applied to the data, the way the data can be interpreted and
the findings that can be communicated. The way you measure the variables in the study also
determines whether a study is ‘qualitative’ or ‘quantitative’ in nature.
4.4 SUMMARY
In research, concepts need to be converted into variables by identifying the indicators of the
concept, which are then converted into variables.
Variables are classified on the basis of the causal relationship, the design of the study or the
unit of measurement. According to the causal relationship variables are independent,
dependent, extraneous or linking variables. On the basis of the design of study, variables are
either active or attribute variables, while on the basis of unit of measurement variables are
categorical or continuous, quantitative or qualitative.
The way a variable is measured determines the type of analysis that can be performed, the
statistical procedures that can be applied to the data, the way the data can be interpreted and
the findings communicated. It also determines whether a study is ‘qualitative’ or
‘quantitative’ in nature.
4.6: FURTHER READING
TOPIC FIVE
CONCEPTUAL FRAMEWORK AND HYPOTHESIS FORMULATION
For use within the confines of economics and economic research, we can define a concept as a
logical, mental construction of one or more relationships. Several attributes of this definition
are important. It is purely mental, is logical, and can be described; it has been reasoned
through sufficiently and presented with clarity. As such, a concept is inherently abstract
(takes some things as given or assumed). In economics, concepts typically focus on
relationships; they may be basic relationships among variables or more complex systems of
relationships. Also, the relationships are causal in nature. That is, the relationships intend to
explain how or why some things result in (cause) other things. Thus in economics research,
the conceptual framework is a conceptual analysis through the problem to all hypotheses
relevant to the problem (Williams, 1984).
NOTE
The conceptualization is directed toward the problem, not the objectives or the methods. It is
purely conceptual, that is, without regard for empirical evidence or data. Its primary function
is to lead to and justify meaningful hypotheses that are, in turn, subject to testing (verification
or rejection).
Most of us conceptualize as part of our everyday living, although we may not be aware of it,
and therefore the process is probably not analytically sophisticated.
Illustration 5.2
Suppose you entered your car in the morning only to find that it will not start. With
presentation of the problem, the conceptualizing process begins. As a driver, you will begin
to reason concerning causes and possible solutions. Since the starter works and the engine
turns over, the problem is neither the battery nor the starter. You might continue reasoning in
this manner and, in the process, narrow both the possible causes and alternative solutions.
The conceptual framework may be viewed as an analysis of the research problem(s) using
theory. The theory will probably include economic theory if the research problem is defined
by an economist, but it may not necessarily exclude other theory. Theoretical considerations
from other disciplines may be appropriate to the conceptual economic analysis. It is most
often a process of identifying the appropriate economic and other theory or concepts that are
germane to the analysis of the problem, then applying them in a conceptual analysis of that
specific problem. Thus the central focus of the conceptual analysis is the main problem issue
identified in the specific problem statement. If the theory is not already developed, the task
may involve formulating theory or refining or modifying existing theory. Theoretical
formulations or refinements may be more likely to be required when the research is more
oriented to disciplinary matters, as opposed to a heavier emphasis on problem solving or
subject-matter research.
Although problem-oriented, the conceptual framework often results in other benefits. One is
that it may provide a theoretical link between the objectives and the methods and procedures.
The conceptual framework is related to the objectives because the problem leads directly to
the objectives. The conceptual analysis thus may help identify relationships, or types of
relationships, that are needed to achieve the objectives. It is common for the conceptual
framework to point to relevant variables within relationships as well.
Williams (1984) characterizes the conceptual framework as an organized “think piece” that,
in its analysis of the problem, may include the logic of:
(i) Sources of the problem. This may address conditions, circumstances, policies, and
practices etc. that cause the problem.
(ii) Alternative solutions to the problem
(iii)Identification of variables relevant to the analysis of the problem
(iv) Conceptualized relationships in a system to analyze the problem
(v) Hypotheses to be tested about results of analysis on the problem.
limited but replenishable natural resources is relevant to the conceptual analysis of the
problem. The theory would be adapted to a particular natural resource (water and
specifically the water of the Nile) in a particular place (Egypt or its subdivisions) in its
various uses (agriculture, domestic, industrial). That is, we are not interested in restating
the general concepts of resource use efficiency but in understanding the dimensions of
the problem of allocation and use efficiency in Egypt. Application and adaptation of the
general theory to that situation can enhance our grasp of the problem. The conceptual
analysis of the researchable problem may give insight into how to achieve the objectives
of the study as well.
In the same example, it is also likely that Egyptian policies will need to be considered in
conceptualizing the nature of the problem. That is, one may need to superimpose water
policy conditions or constraints on the theoretical construction of water use efficiency to
see how one would expect those conditions to affect efficiency. Egypt’s policy of free
water for agricultural use affects water use efficiency and may need to be examined or
explained conceptually. This also would be part of the conceptual framework.
Thus each research problem will employ a different set of theories but each adapts well
understood and documented concepts to its research problem, giving clarity to the
understanding of the problem.
When theoretical refinements are introduced, they serve primarily as marginal improvements to
the existing theory or as adaptations of concepts from other subject matter areas to a new one.
NOTE:
The conceptual framework is not a test of the researcher’s knowledge of existing economic
theory, however. It is not the mathematical derivation of a research method that is planned to
be used in a study unless the intent of the research is to develop a new method. It is not the
explanation of the logic of an econometric model. It is not the presentation of the standard
economic theory that may be applicable to a given study. Yet the conceptual framework may
be, and often is, related to these other activities. The relatedness to methods and objectives is,
however, through the side effects of the conceptual framework, not as its main purpose or its
focus of attention. Diversion of attention to the side effects rather than its functions can lead
one to overlook or misunderstand important dimensions of the research problem.
5.3 HYPOTHESES
Not all research problems have a formally stated hypothesis. For example, in studies
that are essentially exploratory or in an area of knowledge that has very little previous
research, it might not be possible to formulate any reasonable hypothesis. In such situations,
the researcher may omit the hypothesis and let the study be guided by the stated objectives.
A study can have one or several hypotheses. Considerable thought should go into formulating
research hypotheses, as they provide direction for the study and guide the collection and
analysis of data. Furthermore, testing the hypotheses forms the basis for drawing conclusions
from the study. A researcher who understands the facts that are related to a problem is more
likely to suggest a good hypothesis.
Hypotheses should be consistent with common sense, or should be based on a sound rationale
derived from a theory, previous research, or professional experience. They should be testable
within a reasonable time.
Problem statements and hypotheses are similar in substance, except that hypotheses are
declarative statements, clearly testable, and indicative of the expected results. Most
hypotheses can be put into if-then logic to indicate the relationship between variables.
An example of a research hypothesis is: “Low reading students in a remedial reading course will
achieve higher reading comprehension than comparable students in a regular English
course.” Thus, “if a remedial course is taken, then higher reading comprehension can be
achieved.”
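To illustrate what "clearly testable" can mean for the remedial reading example, here is a minimal sketch of one possible test. It is an assumption of this sketch, not a method prescribed by the course: it uses a one-sided permutation test on the difference in mean scores, the scores are hypothetical values invented for illustration, and the 0.05 significance level is a common convention rather than a requirement.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_resamples=2000, seed=0):
    """One-sided permutation test for a difference in means.

    Null hypothesis: group labels are unrelated to the scores.
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (at_least_as_large + 1) / (n_resamples + 1)


# Hypothetical reading comprehension scores (illustration only).
remedial_scores = [62, 70, 68, 75, 66, 71, 73, 69]
regular_scores = [58, 64, 61, 67, 60, 63, 65, 59]

observed_diff, p_value = permutation_p_value(remedial_scores, regular_scores)
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
print(observed_diff, p_value, decision)
```

The if-then form of the hypothesis maps directly onto the code: the "if" is membership of the remedial group and the "then" is a higher mean score, which the test either supports or fails to support.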
is called a null hypothesis.
(c) A good hypothesis should be capable of being answered with a “yes”, “no” or
“maybe”. In formal terms, it is capable of being accepted, rejected or not rejected.
In summary, the null hypotheses should be derived from the specific objectives of the study
or from research questions to be answered. They should be stated in behavioral,
measurable and testable terms. Briefly highlight the reason(s) why you think the relationship
between the dependent variable and the independent variable is stated as either positive or
negative.
A hypothesis is important in terms of bringing clarity to the research problem. It serves the
following functions in research
(i) The formulation of a hypothesis provides a study with focus. It tells you what
specific aspects of a research problem to investigate.
(ii) A hypothesis tells you what data to collect and what not to collect, thereby providing
focus to the study.
(iii)The construction of a hypothesis enhances objectivity in a study.
(iv) A hypothesis may enable a researcher to add to the formulation of theory. It enables a
researcher to specifically conclude what is true or what is false.
5.3.4 Errors in hypothesis testing
Incorrect conclusions about the validity of a hypothesis may be drawn if:
(i) The study design selected is faulty
(ii) The sampling procedure adopted is faulty
(iii)The method of data collection is inaccurate
(iv) The analysis is wrong
(v) The statistical procedures applied are inappropriate; or
(vi) The conclusions drawn are incorrect.
The figure shows the types of errors that can result in the testing of a hypothesis.
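                            When the null hypothesis is actually:
When your decision is to:   True                False
Accept                      Correct decision    Type II error
Reject                      Type I error        Correct decision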
Hence in drawing conclusions about a hypothesis, two types of error can occur:
• Rejection of a null hypothesis when it is true. This is known as a Type I error.
• Acceptance of a null hypothesis when it is false. This is known as a Type II error.
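The two error types above can be expressed as a small decision function. This is only an illustrative sketch; the function and parameter names are hypothetical.

```python
def classify_outcome(null_is_true, reject_null):
    """Name the outcome of a test decision about a null hypothesis."""
    if null_is_true and reject_null:
        return "Type I error"  # a true null hypothesis was rejected
    if not null_is_true and not reject_null:
        return "Type II error"  # a false null hypothesis was accepted
    return "correct decision"


# Enumerate all four combinations of truth and decision.
for null_is_true in (True, False):
    for reject_null in (True, False):
        print(null_is_true, reject_null, classify_outcome(null_is_true, reject_null))
```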
5.4 SUMMARY
The conceptual framework is an integral component of framing the research problem and a
clear statement of the problem does not occur until the conceptual framework is developed. It
thus complements research problem description, statement of objectives, and literature
review.
The conceptual framework is therefore a conceptual analysis through the problem to all
hypotheses relevant to the problem. Although problem-oriented, the conceptual framework
often provides a theoretical link between the objectives and the methods and procedures. The
conceptual analysis thus may help identify relationships, or types of relationships, that are
needed to achieve the objectives. It will also point to relevant variables within
relationships.
To develop the conceptual framework in a given study, one starts by looking to the relevant
economic and other theories that could provide insight into understanding the problem of
study. The researcher will then apply that theory to the specific problem. This involves
finding the linkages between the “ideal types” in theory and the “real types” that exist in
actuality.
Hypotheses bring clarity to the research problem. In research, a hypothesis provides the study
with a focus, telling what specific aspects of a research problem one is to investigate, what
data to collect and what not to collect. Thus the construction of a hypothesis enhances
objectivity in a study.
A well stated hypothesis should be capable of being expressed as a question and should be
capable of being answered with a “yes”, “no” or “maybe”. That is, it is capable of being
accepted, rejected or not rejected.
The hypotheses are derived from the specific objectives of the study or from research
questions to be answered and stated in behavioral / measurable/ testable terms. Testing the
hypotheses forms the basis of making conclusions from the study.
When drawing conclusions about a hypothesis, two types of error can occur: the rejection of
a null hypothesis when it is true (Type I error) or the failure to reject a null hypothesis when it
is false (Type II error). Such incorrect conclusions may be due to faulty study design, faulty
sampling procedure, inaccurate method of data collection, wrong analysis due to
inappropriate statistical procedures or drawing of wrong conclusions from results.
5.5 SELF-TEST QUESTIONS
1. Distinguish between
(i) Research question and Hypothesis
(ii) Null hypothesis and alternative hypothesis
(iii) Type I error and Type II error
(iv) Conceptual framework and an empirical model
2. Explain the procedure of conducting hypothesis-testing research.
3. How can a researcher avoid wrong conclusions in hypothesis testing?
4. Consider the problem of investigation you have selected in this course. Develop a
conceptual framework and construct the hypotheses that will be tested.
TOPIC SIX
RESEARCH DESIGNS
The research design is an umbrella word and can be split into the following:
(i) Sampling design: deals with the method of selecting items to be observed in the study.
(ii) Observational design: relates to conditions under which the observations are made.
(iii) Statistical design: is concerned with the question of how many items are to be observed
and how the information and data gathered are to be analyzed.
(iv) Operational design: deals with the techniques by which the procedures specified in the
three categories above can be carried out.
The following questions guide the selection of a research design for a study
- What is the study about?
- Why is the study being made?
- Where will the study be carried out?
- What type of data is required?
- Where can the required data be found?
- What periods of time will the study include?
- What will be the sample design?
- What techniques of data collection will be used?
- How will data be analyzed?
- In what style will the report be prepared?
6.3 TYPES OF RESEARCH DESIGNS
There are different research designs depending on the type of research, as follows:
A. Exploratory / formulative research studies: emphasize discovery of ideas and insights.
They therefore use a survey of relevant literature to build upon the work of others; an experience
survey of people who have had practical experience with the problem; and analysis of
insight-stimulating examples, usually where there is little experience to serve as a
guide. They use existing records and unstructured interviewing, among other methods,
and are flexible in design.
B. Descriptive research studies: are concerned with describing the characteristics of a
particular individual or a group. These are studies concerned with specific predictions and with
narration of facts and characteristics concerning an individual, group or situation.
C. Diagnostic research studies: determine the frequency with which something occurs or
its association with something else, for example studies concerning whether certain
variables are associated.
D. Hypothesis testing studies: These studies are known as experimental studies. The
researcher tests the hypothesis of causal relationships between variables. They require
procedures that will not only reduce bias and increase reliability, but will permit
drawing of inferences about causality.
In both descriptive and diagnostic research studies the researcher clearly defines what she
wants to measure and finds adequate methods of measuring it, along with a clear-cut definition
of the population she wants to study. The research design must make enough provision for
protection against bias and must maximize reliability with due concern for the economical
completion of the study. The design should be rigid and focus on:
(i) Formulating the objectives
(ii) Designing the methods of data collection
(iii)Selecting the sample
(iv) Collecting data
(v) Processing and analyzing data
(vi) Reporting the findings
Table 6.1 summarizes the differences in research design between exploratory studies
and descriptive/diagnostic studies.
Table 6.1: Differences in research design between exploratory and descriptive or
diagnostic studies
6.5 SUMMARY
Research design refers to the way a study is planned and conducted, the procedures, and
techniques employed to address the research problem or question. The main objective of a
research design is to enhance validity of research findings by controlling potential sources of
bias that may distort findings.
Research design is an umbrella word and can be split into Sampling design, Observational
design, Statistical design and the Operational design. Description of the overall research
design to be adopted by a study must specifically describe the various components.
The choice of the research design in a particular study will be guided by what the study is
about, why the study is being made, where it will be carried out, the type of data required
and where it can be found, the periods of time the study will include, the sampling design,
and the techniques of data collection and data analysis that will be used.
The chosen research design will depend on whether the research is exploratory, descriptive,
diagnostic, or hypothesis testing research. The design will be more rigid in descriptive and
diagnostic research studies than in exploratory studies. The sampling, observational,
statistical and operational designs will also vary.
6.6 SELF-TEST QUESTIONS
1. Explain the following
(i) Control group
(ii) experimental group
(iii)confounding relationship
2. Describe the various components of the overall research design
3. Consider the topic of your research in this course, describe the research design that will
be used in studying the identified problem
6.7 FURTHER READING
SAMPLING DESIGN
7.2.1 Census
All items in any field of inquiry constitute a ‘Universe’ or ‘Population’. A complete
enumeration of all items in the ‘population’ is known as a census inquiry. It can be presumed
that in such an inquiry, when all items are covered, no element of chance is left and highest
accuracy is obtained.
Demerits of census:
(i) There is no way of checking the element of bias or its extent except through a
resurvey or use of sample checks.
(ii) Involves a great deal of time, money and energy.
(iii)At times, this method is practically beyond the reach of ordinary researchers.
(iv) Sometimes it is not possible to examine every item in the population
(v) Sometimes it is possible to obtain sufficiently accurate results by studying only a part
of total population. In such cases there is no utility of census surveys.
(vi) If the study involves destruction of the elementary units, then studying the entire
population will mean destroying the elementary units
However, it needs to be emphasized that when the universe is a small one, it is no use
resorting to a sample survey.
7.2.2 Sampling
When field studies are undertaken in practical life, considerations of time and cost almost
invariably lead to a selection of respondents i.e. selection of only a few items. The
informants selected should be as representative of the total population as possible in order to
produce a miniature cross-section. The selected respondents constitute what is called a
sample and the selection process is known as a sampling technique. The survey so conducted
is known as sample survey.
Algebraically, let the population size be N. If a part of size n (where n < N) of this
population is selected according to some rule for studying some characteristic of the
population, then the group consisting of these n units is known as a sample.
The researcher must pay attention to the following steps when sampling:
(i) Type of universe: Define the set of objects, i.e. the universe, to be studied. The
universe can be finite or infinite. In a finite universe, the number of items is certain,
for example the number of workers in a business organization. In an infinite universe,
the number of items is infinite, i.e. we have no idea of the total number of items,
for example the number of listeners of a specific radio programme.
(ii) Sampling unit: A decision has to be taken concerning a sampling unit before
selecting sample. Sampling unit/unit of analysis may be a geographical one such as
state, district, village, or a construction unit such as a house, flat, or it may be a social
unit such as family, club, school, or it may be an individual. The researcher will have
to decide one or more of such units that she has to select for her study.
(iii) Source list: Sampling frame from which sample is to be drawn. It contains the names
of all items of a universe (in case of finite universe). If source list is not available,
researcher has to prepare it. Such a list should be comprehensive, correct, reliable and
appropriate. It is extremely important for the source list to be as representative of the
population as possible.
(iv) Size of sample: Refers to the number of items to be selected from the universe to
constitute a sample. The size of sample should neither be excessively large, nor too
small. It should be optimum (one which fulfills the requirements of efficiency,
representativeness, reliability and flexibility). While deciding the size of a sample,
the researcher should determine the desired precision at an acceptable confidence
level for the estimate. The size of population variance needs to be considered as in
case of larger variance usually a bigger sample is needed. The size of population
should be kept in view for this also limits the sample size. The parameters of interest
in a research study should be kept in view, while deciding the size of the sample.
Costs too dictate the size of sample that can be drawn. As such, budgetary constraint
must be taken into consideration when the sample size is decided.
(v) Parameters of Interest: In determining the sample design, one must consider the
question of the specific population parameters, which are of interest. For instance,
you may be interested in knowing some average or the other measure concerning the
population. There may also be important sub-groups in the population about whom
you would like to make estimates.
(vi) Budgetary constraint: Cost considerations have a major impact upon decisions
relating to not only the size of the sample but also to the type of sample. This fact can
even lead to the use of a non-probability sample.
(vii) Sampling procedure: Finally, the researcher should decide the type of sample she will
use i.e., she must decide about the technique to be used in selecting the items for the
sample. In fact, this technique or procedure stands for the sample design itself. She
should select that sampling design which, for a given sample size and for a given cost,
has a smaller sampling error.
The researcher must consider the costs involved in a sampling analysis. The costs include:
2. The cost of collecting the data (it must be possible to obtain information from sample
selected given the available resources)
3. The cost of an incorrect inference resulting from the data. There are two causes of
incorrect inferences.
▪ Systematic bias and
▪ Sampling error
Systematic bias results from errors in the sampling procedures. It cannot be reduced or
eliminated by increasing the sample size. At best, the causes responsible for these errors
can be detected and corrected.
(v) Defective measuring device: In survey work, systematic bias can result if the
questionnaire or the interview is biased. Similarly, if the physical measuring device is
defective, there will be systematic bias in the data collected through such a measuring
device.
(vi) Non-respondents: Systematic bias may arise if we are unable to sample all the individuals
initially included in the sample. The reason is that in such a situation, the likelihood of
establishing contact or receiving a response from an individual is often correlated with the
measure of what is to be estimated.
(vii) Indeterminacy principle: Individuals act differently when kept under observation than
what they do when kept in non-observed situations. For instance, if workers are aware
that somebody is observing them in course of a work study on the basis of which the
average length of time to complete a task will be determined and accordingly the quota
will be set for piece work, they generally tend to work slowly in comparison to the
speed with which they work if kept unobserved.
(viii) Natural bias in the reporting of data: For example, people in general understate their
incomes if asked about it for tax purposes, but they overstate the same if asked for social
status of their affluence. Generally in psychological surveys, people tend to give what
they think is the ‘correct’ answer rather than revealing their true feelings.
Sampling errors are the random variations in the sample estimates around the true population
parameters. Since they occur randomly and are equally likely to be in either direction, they
tend to compensate for one another and their expected value is zero. Sampling error decreases
as the size of the sample increases, and it is of a smaller magnitude in the case of a
homogeneous population.
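The way sampling error shrinks as the sample grows can be illustrated with a small simulation. The sketch below is an illustration only: the population of 10,000 normally distributed values, the sample sizes 25 and 400, and the 500 repeated draws are assumptions chosen for the example and do not come from the course text.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical finite population (values are invented for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=500):
    """Draw `repeats` simple random samples of size n (without replacement)
    and return the standard deviation of their means, a rough measure of
    sampling error for that sample size."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(25)
large = spread_of_sample_means(400)
print(f"spread of sample means, n=25:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

The spread for the larger sample should come out clearly smaller, which is the behaviour the paragraph describes. The simulation says nothing about systematic bias, which a larger sample does not remove.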
Sampling error can be measured for a given sample design and size. The measurement of
sampling error is usually called the precision of the sampling plan. If the sample size is
increased, the precision is improved. However, increasing the size of the sample has its own
limitations: it increases the cost of collecting data and enhances the systematic bias.
Thus the effective way to increase precision is usually to select a better sampling design
which has a smaller sampling error for a given sample size at a given cost. In practice,
however, people prefer a less precise design because:
- it is easier to adopt
- Systematic bias can be controlled in a better way in such a design.
7.6 TYPES OF SAMPLING METHODS
- On the representation basis, the sample may be probability sampling (based on the
concept of random selection) or it may be non-probability sampling (non-random
sampling).
- On element selection basis, the sample may be either unrestricted (each sample element
is drawn individually from the population at large) or restricted (all other forms of
sampling).
The following are the situations in which non-probability sampling can be used:
(i) Where it satisfactorily meets sampling objectives. For example, if it is not required
that the sample needs to meet a cross-section of the population, the non-probability
sampling is suitable.
(ii) Where it cuts on cost and time requirements as compared to probability sampling.
(iii)Where probability sampling breaks down in its application, which may happen due to
the carelessness of the people applying it.
(ii) Quota Sampling: the interviewers are simply given quotas to be filled from different
strata, with some restrictions on how they are to be filled. In other words, the actual
selection of the items for the sample is left to the interviewer’s discretion. This
method is very convenient and is relatively inexpensive but introduces researcher
bias.
(iii)Purposive or Judgmental Samples: Sometimes a researcher selects a sub-group, which
can be judged to be representative of the population. Choosing the first three days of
the month as typical days for auditing, or picking a typical village to represent a
national rural population are examples of a purposive sample.
(iv) Convenience Sampling: the researcher selects those respondents who are close at
hand. This saves time, money and effort. What is lost in accuracy is gained in
efficiency. Volunteer subjects such as those used by archaeologists or historians are
an example of convenience or accidental samples
Random sampling from a finite population refers to that method of sample selection which
gives each possible sample combination an equal probability of being picked and each
item in the entire population an equal chance of being included in the sample. This
implies sampling without replacement, i.e. once an item is selected for the sample, it cannot
appear in the sample again. We can therefore define a simple random sample from a finite
population as a sample chosen in such a way that each of the NCn possible samples
has the same probability, 1/NCn, of being selected.
Illustration 7.1
Consider a certain finite population consisting of six elements (a, b, c, d, e, f) i.e. N = 6.
Suppose that you want to take a sample size n = 3 from it. Then there are 6C3 = 20 possible
distinct samples of the required size, and they consist of the elements:
{abc}; {abd}; {abe}; {abf}; {acd}; {ace}; {acf}; {ade}; {adf}; {aef}; {bcd}; {bce}; {bcf};
{bde}; {bdf}; {bef}; {cde}; {cdf}; {cef}; and {def}.
If one randomly chooses one of these samples the probability of choosing any of the 20
samples is 0.05 (1/20)
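The count in Illustration 7.1 can be checked by enumerating the samples directly. This is a minimal sketch using Python's standard library; the variable names are chosen for the example only.

```python
from itertools import combinations
from math import comb

population = ["a", "b", "c", "d", "e", "f"]  # N = 6
n = 3                                         # required sample size

# Every distinct sample of size n, written in the same style as the illustration.
samples = ["".join(s) for s in combinations(population, n)]

num_samples = comb(len(population), n)  # 6C3
prob_each = 1 / num_samples             # equal probability for each sample

print(samples)
print(num_samples, prob_each)
```

Each element appears in exactly half of the 20 samples, so every element also has the same chance (3/6) of being included.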
(ii) By writing the name of each element of a finite population on a slip of paper, putting
them into a box or bag and mixing them thoroughly and then drawing the required
number of slips for the sample one after the other without replacement. In doing so
one must make sure that in the successive drawings, each of the remaining elements
of the population has the same chance of being selected. This procedure will also
result in the same probability for each possible sample.
(iii) By using random number tables to select a random sample. Tippett gave 10400 four-
figure numbers. He selected 41600 digits from the census reports and combined them
into fours to give his random numbers, which may be used to obtain a random sample.
Illustration 7.2
The first thirty sets of Tippett’s numbers are:
2952 6641 3992 9792 7979 5911 3170 5624 4167 9525 1545 1396
7203 5356 1300 2693 2370 7483 3408 2769 3563 6107 6913 7691
0560 5246 1112 9025 6008 8126
Suppose you are interested in taking a sample of 10 units from a population of 5000 units,
bearing numbers from 3001 to 8000. You will select 10 such figures from the above random
numbers which are not less than 3001 and not greater than 8000. If you randomly decide to
read the table numbers from left to right, starting from the first row itself, you obtain the
following numbers: 6641, 3992, 7979, 5911, 3170, 5624, 4167, 7203, 5356 and 7483. The
units bearing the above serial numbers would then constitute your required random sample.
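The reading rule in Illustration 7.2 can be sketched in a few lines of Python. The function name and its parameters are hypothetical, introduced only for this example; the table entry 0560 is written as 560 because Python integers cannot carry a leading zero.

```python
# The thirty four-figure numbers from Illustration 7.2, read left to right.
tippett = [
    2952, 6641, 3992, 9792, 7979, 5911, 3170, 5624, 4167, 9525, 1545, 1396,
    7203, 5356, 1300, 2693, 2370, 7483, 3408, 2769, 3563, 6107, 6913, 7691,
    560, 5246, 1112, 9025, 6008, 8126,
]

def select_from_table(table, low, high, size):
    """Walk through the table in order and keep the first `size` distinct
    numbers that fall within the serial-number range [low, high]."""
    chosen = []
    for number in table:
        if low <= number <= high and number not in chosen:
            chosen.append(number)
        if len(chosen) == size:
            break
    return chosen

sample = select_from_table(tippett, 3001, 8000, 10)
print(sample)
```

Skipping repeated numbers keeps the selection without replacement, in line with the definition of a simple random sample from a finite population.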
NOTE
It is easy to draw random samples from finite populations with the aid of random number
tables only when lists are available and items are numbered. But in some situations, it is
often impossible to proceed in this way. For example, if you want to estimate the mean
height of trees in a forest, it would not be possible to number the trees, and choose random
numbers to select a random sample. In such a situation what you should do is to select some
trees for the sample haphazardly without aim or purpose, and should treat the sample as a
random sample for study purposes. The selection of each item in a random sample from an
infinite population is controlled by the same probabilities, and successive selections are
independent of one another.
One should resort to simple random sampling because under it bias is generally eliminated
and the sampling error can be estimated. Purposive sampling is considered more appropriate
when the universe happens to be small and a known characteristic of it is to be studied
intensively. At times, several methods of sampling may well be used in the same study.
7.7 PROBABILITY SAMPLING METHODS
Simple random sampling is the basic probability sampling design. A simple random
sample is one in which every member of the population has an equal and independent chance of
being selected. Randomness as a sample selection process can be accomplished with either
a lottery or a table of random numbers. Both methods require a listing of the population units
(the sampling frame).
For example, if a 4 per cent sample is desired, the first item would be selected randomly from
the first twenty-five and thereafter every 25th item would automatically be included in the
sample. Thus, in systematic sampling only the first unit is selected randomly and the
remaining units of the sample are selected at fixed intervals.
Merits:
(i) It can be taken as an improvement over a simple random sample in as much as the
systematic sample is spread more evenly over the entire population.
(ii) It is an easier and less costly method of sampling and can be conveniently used
even in case of large populations.
Demerits:
(i) If there is a hidden periodicity in the population, systematic sampling will prove to
be an inefficient method of sampling.
For instance, every 25th item produced by a certain production process is
defective. If you were to select a 4% sample of the items of this process in a
systematic manner, you would either get all defective items or all good items in
the sample depending upon the random starting position.
(ii) If the population list is not in random order, the results of such sampling may, at
times, not be very reliable.
In practice, systematic sampling is used when lists of population are available and they
are of considerable length.
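The systematic selection rule, and the hidden-periodicity problem described under the demerits, can be sketched as follows. The production run of 1000 items is a hypothetical example built to match the "every 25th item is defective" scenario; the function name is also an assumption.

```python
import random

def systematic_sample(items, interval, rng):
    """Pick a random start among the first `interval` items (0-based index),
    then take every `interval`-th item after it."""
    start = rng.randrange(interval)
    return items[start::interval]

# Hypothetical run of 1000 items in which every 25th item is defective.
items = ["defective" if (i + 1) % 25 == 0 else "good" for i in range(1000)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = systematic_sample(items, 25, rng)  # a 4 per cent sample
print(len(sample), set(sample))

# Which of the 25 possible starting positions yield an all-defective sample?
all_defective_starts = [s for s in range(25) if set(items[s::25]) == {"defective"}]
print(all_defective_starts)
```

Whatever the random start, the sample contains only one kind of item, because the sampling interval coincides with the period of the defect. One start in 25 gives an all-defective sample and the other 24 give an all-good sample.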
In stratified sampling, the population is divided into several sub-populations (strata) that
are individually more homogeneous than the total population, and then you select items from
each stratum to constitute a sample. Since each stratum is more homogeneous than the total
population, you are able to get more precise estimates for each stratum and, by estimating
each of the component parts more accurately, you get a better estimate of the whole.
Stratification results in more reliable and detailed information.
Three questions are relevant in this context:
(i) How to form strata?
The strata should be formed on the basis of common characteristic(s) of the items to
be put in each stratum. The various strata should be formed in such a way as to ensure
that elements are most homogeneous within each stratum and most heterogeneous
between different strata. Strata are purposively formed and are based on the past
experience and personal judgment of the researcher. Careful consideration of the
relationship between the characteristics of the population and the characteristics to be
estimated is used to define the strata. At times, a pilot study may be conducted to
determine a more appropriate and efficient stratification plan. You can do so by
taking small samples of equal size from each of the proposed strata and then
examining the variances within and among the possible stratifications.
(iii) How many items to be selected from each stratum or how to allocate the sample size of
each stratum?
The method of proportional allocation is usually followed, under which the sizes of the
samples from the different strata are kept proportional to the sizes of the strata. That is,
if Pi represents the proportion of the population included in stratum i, and n represents
the total sample size, the number of elements selected from stratum i is n · Pi.
Illustration 7.3
Suppose we want a sample of size n = 30 to be drawn from a population of size
N = 8000 which is divided into three strata of size N1 = 4000, N2 = 2400 and N3 = 1600.
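Adopting proportional allocation, the sample sizes from each stratum are obtained as follows:
For the stratum with N1 = 4000: P1 = 4000/8000, hence n1 = 30 (4000/8000) = 15
For the stratum with N2 = 2400: P2 = 2400/8000, hence n2 = 30 (2400/8000) = 9
For the stratum with N3 = 1600: P3 = 1600/8000, hence n3 = 30 (1600/8000) = 6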
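The proportional allocation for Illustration 7.3 can also be computed with a short Python sketch. The function name is hypothetical. In this illustration the results are whole numbers; in general they would need rounding, which is not handled here.

```python
def proportional_allocation(n, stratum_sizes):
    """Allocate a total sample of size n across strata in proportion to
    stratum size: n_i = n * N_i / N."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]

# Illustration 7.3: n = 30, N1 = 4000, N2 = 2400, N3 = 1600 (N = 8000)
allocation = proportional_allocation(30, [4000, 2400, 1600])
print(allocation)
```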
NOTE
Proportional allocation is considered the most efficient and an optimal design when the cost
of selecting an item is equal for each stratum, there is no difference in within-stratum
variances, and the purpose of sampling happens to be to estimate the population value of
some characteristic.
1. In the case where the purpose happens to be to compare the differences among the strata,
equal sample selection from each stratum would be more efficient even if the strata differ
in sizes.
2. In cases where strata differ not only in size but also in variability, it is considered
reasonable to take larger samples from the more variable strata and smaller samples from
the less variable strata. The researcher can then account for both (differences in stratum
size and differences in stratum variability) by using disproportionate sampling design by
requiring that:
\[ \frac{n_1}{N_1\sigma_1} = \frac{n_2}{N_2\sigma_2} = \dots = \frac{n_k}{N_k\sigma_k} \]
where \sigma_1, \sigma_2, \dots, \sigma_k denote the standard deviations of the k strata,
N_1, N_2, \dots, N_k denote the sizes of the k strata and n_1, n_2, \dots, n_k denote the
sample sizes of the k strata. This is called ‘optimum allocation’ in the context of
disproportionate sampling. The allocation in such a situation results in the following
formula for determining the sample sizes for the different strata:
\[ n_i = \frac{n \cdot N_i\sigma_i}{N_1\sigma_1 + N_2\sigma_2 + \dots + N_k\sigma_k}, \quad i = 1, 2, \dots, k. \]
Illustration 7.4
A population is divided into three strata so that N1 = 5000, N2 = 2000 and N3 = 3000.
Respective standard deviations are:
σ1 = 15, σ2 = 18 and σ3 = 5.
How should a sample of size n = 84 be allocated to the three strata, if you want optimum
allocation using disproportionate sampling design?
Using the disproportionate sampling design for optimum allocation, the sample sizes for
different strata will be determined as under:
\[ n_1 = \frac{84(5000)(15)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{6300000}{126000} = 50 \]
\[ n_2 = \frac{84(2000)(18)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{3024000}{126000} = 24 \]
\[ n_3 = \frac{84(3000)(5)}{(5000)(15) + (2000)(18) + (3000)(5)} = \frac{1260000}{126000} = 10 \]
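The optimum allocation in Illustration 7.4 can be reproduced with a short Python sketch. The function name is hypothetical; it simply weights each stratum by its size multiplied by its standard deviation, as in the formula for optimum allocation.

```python
def optimum_allocation(n, sizes, sds):
    """Allocate a total sample of size n so that each stratum's share is
    proportional to (stratum size) x (stratum standard deviation)."""
    weights = [N_i * sd_i for N_i, sd_i in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Illustration 7.4: N1 = 5000, N2 = 2000, N3 = 3000; sd1 = 15, sd2 = 18, sd3 = 5
allocation = optimum_allocation(84, [5000, 2000, 3000], [15, 18, 5])
print(allocation)
```

The second stratum is smaller than the third yet receives a larger sample, because its standard deviation is much higher.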
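3. In addition to differences in stratum size and differences in stratum variability, you may
have differences in stratum sampling cost, and then you can have a cost-optimal
disproportionate sampling design by requiring
\[ \frac{n_1}{N_1\sigma_1/\sqrt{C_1}} = \frac{n_2}{N_2\sigma_2/\sqrt{C_2}} = \dots = \frac{n_k}{N_k\sigma_k/\sqrt{C_k}} \]
where;
C1 = Cost of sampling in stratum 1
C2 = Cost of sampling in stratum 2
CK = Cost of sampling in stratum k
And all other terms remain the same as explained earlier.
The allocation in such a situation results in the following formula for determining the
sample sizes for different strata:
\[ n_i = \frac{n \cdot N_i\sigma_i/\sqrt{C_i}}{N_1\sigma_1/\sqrt{C_1} + N_2\sigma_2/\sqrt{C_2} + \dots + N_k\sigma_k/\sqrt{C_k}}, \quad i = 1, 2, \dots, k. \]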
NOTE
In cluster sampling, the population is divided into a number of relatively small subdivisions
which are themselves clusters of still smaller units and then some of these clusters are
randomly selected for inclusion in the overall sample.
Suppose you want to estimate the proportion of machine-parts in an inventory, which are
defective. Also assume that there are 20000 machine parts in the inventory at a given point
of time, stored in 400 cases of 50 each. Now using a cluster sampling, you would consider
the 400 cases as clusters and randomly select ‘n’ cases and examine all the machine parts in
each randomly selected case.
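The machine-parts example can be sketched in Python as follows. The inventory here is simulated: the 3 per cent defect rate, the choice of n = 20 cases and the random seed are assumptions made purely for illustration and do not come from the text.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical inventory: 400 cases of 50 parts each (20000 parts in total).
# Each part is True if defective; the 3% rate is invented for the example.
cases = [[rng.random() < 0.03 for _ in range(50)] for _ in range(400)]

n_cases = 20  # assumed number of clusters (cases) to select
chosen = rng.sample(range(400), n_cases)  # random selection of whole cases

# Examine every part in each selected case.
inspected = [part for c in chosen for part in cases[c]]
estimate = sum(inspected) / len(inspected)

print(f"cases examined: {len(chosen)}, parts examined: {len(inspected)}")
print(f"estimated proportion defective: {estimate:.3f}")
```

Only the 20 selected cases have to be opened, which is where the cost saving of cluster sampling comes from.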
It requires grouping of the population. The units of the population are grouped by cluster
rather than by strata, for example, workers in the quality control division. Cluster sampling is
used only because it reduces cost by concentrating surveys in selected clusters. Hence
estimates based on cluster samples are usually more reliable per unit cost.
Demerits:
(i) Cluster sampling can lead to large sampling errors if it is not properly done, hence
less precise than random sampling.
(ii) There is not as much information in ‘n’ observations within a cluster as there
happens to be in ‘n’ randomly drawn observations.
Any of the other methods of sampling may be used in each of these stages. If you select
randomly at all stages, you will have what is known as multi-stage random sampling design.
This method of sampling is applied in big inquiries extending to a considerably large
geographical area, such as the entire country.
Suppose you want to investigate the working efficiency of nationalized banks in Kenya and
you want to take a sample of a few banks for this purpose. The first stage is to select large
primary sampling units such as provinces.
▪ If you then select certain districts and interview all banks in the chosen districts, this
would represent a two-stage sampling design, with the ultimate sampling units being
clusters of districts.
▪ If instead of taking a census of all banks within the selected districts, you select certain
towns and interview all banks in the chosen towns, this would represent a three-stage
sampling design.
▪ If instead of taking a census of all banks within the selected towns, you randomly sample
banks from each selected town, then it is a case of using a four-stage sampling plan.
Merits:
(i) It is easier to administer than most single stage designs mainly because of the fact
that sampling frame is developed in partial units.
(ii) A large number of units can be sampled for a given cost because of sequential
clustering, whereas this is not possible in most of the simple designs.
(iii) It is most useful in sampling a large number of units, especially when cost saving is
an important consideration.
Demerits:
Sampling errors are likely to be larger than those of other probability samples.
Merits:
(i) The results of this type of sampling are equivalent to those of a simple random
sample i.e. not so biased
(ii) The method is less cumbersome
(iii) It is relatively less expensive.
Illustration 7.5
The number of departmental stores in 15 towns are 35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37,
44, 33, 29 and 28 as shown in table 7.2. If you want to select a sample of 10 stores, using
towns as clusters and selecting within clusters proportional to size, how many stores from
each town should be chosen?
Since there are 500 departmental stores from which you have to select a sample of 10 stores,
the appropriate sampling interval is 50. The starting point is 10 and then you successively
add increments of 50 until 10 numbers have been selected. The numbers thus obtained are: 10,
60, 110, 160, 210, 260, 310, 360, 410 and 460. From this list, two stores should be selected
randomly from town number 5 and one each from town numbers 1, 3, 7, 9, 10, 11, 12 and
14. This sample of 10 stores is the sample with probability proportional to size.
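The selection in Illustration 7.5 can be traced with cumulative totals. The sketch below takes the starting point of 10 from the illustration (in practice it would be chosen at random) and looks up the town whose cumulative range contains each selection number. The variable names are chosen for the example only.

```python
from bisect import bisect_left
from itertools import accumulate

# Number of departmental stores in towns 1 to 15 (Illustration 7.5).
stores = [35, 17, 10, 32, 70, 28, 26, 19, 26, 66, 37, 44, 33, 29, 28]

cumulative = list(accumulate(stores))  # running totals: 35, 52, 62, ..., 500
total = cumulative[-1]

sample_size = 10
interval = total // sample_size  # sampling interval
start = 10                       # starting point used in the illustration

selection_numbers = [start + k * interval for k in range(sample_size)]

# A selection number x belongs to the first town whose running total is >= x.
towns = [bisect_left(cumulative, x) + 1 for x in selection_numbers]

print(selection_numbers)
print(towns)
```

Town 5, the largest with 70 stores, spans more than one sampling interval and is therefore picked twice, which is exactly how selection with probability proportional to size favours larger clusters.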
When a particular lot is to be accepted or rejected on the basis of single sample, it is known
as single sampling; when the decision is to be taken on the basis of two samples, it is known
as double sampling and in case the decision rests on the basis of more than two samples but
the number of samples is certain and decided in advance, the sampling is known as multiple
sampling. But when the number of samples is more than two but it is neither certain nor
decided in advance, this type of system is often referred to as sequential sampling.
7.8 SUMMARY
Census inquiry involves a complete enumeration of all items in the ‘population’ while a sample
survey involves selection of a representative group of the population from whom the required
information is obtained.
A sampling design is a definite plan, i.e. the technique or procedure the researcher would
adopt in selecting items for the sample.
When sampling, the researcher should pay attention not only to the cost of collecting the data
but also to the cost of making an incorrect inference from the data (through systematic bias
or sampling error).
Systematic bias may result from an inappropriate sampling frame, selection using improper
random methods, Substitution of items in a sample, failure to cover all the items in a sample,
defective measuring device, non-respondents, indeterminacy principle, or due to the natural
bias in the reporting of data.
Sampling designs can be classified on the basis of representation or the element selection
technique. On the representation basis, the sample may be probability sampling or non-
probability sampling while on the basis of element selection the sample may be either
unrestricted or restricted
Non probability sampling does not afford any basis for estimating the probability that each
item in the population has of being included in the sample e.g. in Snowball sampling, Quota
Sampling, Purposive or Judgmental Sampling and Convenience Sampling
In probability sampling, also known as random sampling or chance sampling, every item of the
universe has an equal chance of inclusion in the sample. Random sampling ensures the law of
statistical regularity, which states that if on average the sample chosen is a random one, the
sample will have the same composition and characteristics as the universe. Examples of
random sampling designs include simple random sampling, systematic sampling, stratified
sampling, cluster sampling, multi-stage sampling, area sampling and sequential sampling.
7.9 SELF-TEST QUESTIONS
1. Distinguish between
(i) Systematic bias and sampling error
(ii) Probability and non-probability sampling
(iii)Stratified and multi-stage sampling
(iv) Cluster and area sampling
(v) Proportionate allocation and disproportionate allocation
2. Compare and contrast the merits and demerits of census survey and sample surveys
5. Consider your chosen research topic and the problem under investigation. Describe the
sampling design to be used in undertaking the study
TOPIC EIGHT
METHODS OF DATA COLLECTION
While deciding about the method of data collection to be used in the study, the researcher
should keep in mind two types of data viz., primary and secondary.
(i) Primary data are those, which are collected afresh and for the first time, and thus happen
to be original in character.
(ii) Secondary data, on the other hand, are those which have already been collected by
someone else and which have already been passed through the statistical process.
A decision has to be made on which sort of data to use for the study and accordingly select
one or the other method of data collection. The methods of collecting primary and secondary
data differ since primary data are to be originally collected, while in case of secondary data
the nature of data collection work is merely compilation.
Under this method, the information is sought by way of investigator’s own direct observation
without asking from the respondent. For instance, in a study relating to consumer behaviour,
the investigator instead of asking the brand of a product used by the respondent, may himself
look at the product.
Advantages:
(i) Subjective bias is eliminated if observation is done accurately.
(ii) The information obtained under this method relates to what is currently happening; it
is not complicated by either the past behaviour or future intentions or attitudes.
(iii)This method is independent of respondents’ willingness to respond and as such is
relatively less demanding of active cooperation on the part of respondents as happens
to be the case in the interview or the questionnaire method.
(iv) This method is particularly suitable in studies, which deal with respondents who are
not capable of giving verbal reports of their feelings for one reason or the other.
Disadvantages:
(i) It is an expensive method.
(ii) The information provided by this method is very limited.
(iii)Sometimes unforeseen factors may interfere with the observational task.
(iv) At times, the fact that some people are rarely accessible to direct observation creates
obstacles for this method to collect data effectively.
While using observation method in data collection, the researcher should keep in mind the
following:
(i) What should be observed?
(ii) How should the observations be recorded?
(iii) How can the accuracy of observation be ensured?
Advantages:
1. The researcher is enabled to record the natural behaviour of the group
2. The researcher can even gather information, which could not easily be
obtained if he observes in a disinterested fashion.
3. The researcher can even verify the truth of statements made by informants in
the context of a questionnaire or a schedule.
Disadvantages:
1. The observer may lose objectivity to the extent that he participates emotionally
2. The problem of observation-control is not solved
3. It may narrow down the researcher’s range of experience
(iii) Disguised observation: When the observer is observing in such a manner that his
presence may be unknown to the people he is observing
(iv) Uncontrolled observation: If the observation takes place in the natural setting. No
attempt is made to use precision instruments. The major aim of this type of
observation is to get a spontaneous picture of life and persons. It has a tendency to
supply naturalness and completeness of behaviour, allowing sufficient time for
observing it. Uncontrolled observation is resorted to in case of exploratory researches.
The main pitfalls of uncontrolled observation are:
- Subjective interpretation
- The danger of having the feeling that we know more about the observed
phenomena than we actually do.
(v) Controlled observation: When observation takes place according to definite pre-
arranged plans, involving experimental procedure. We use mechanical or precision
instruments as aids to accuracy and standardization. Such observation has a tendency
to supply formalized data upon which generalizations can be built with some degree
of assurance. Controlled observation takes place in various experiments that are
carried out in a laboratory or under controlled conditions
In certain cases it may not be possible or worthwhile to contact the persons concerned
directly, for instance on account of the extensive scope of the enquiry. In such cases, the
direct personal investigation technique may not be used. Instead, an indirect oral examination
can be conducted, whereby the interviewer cross-examines other persons who are supposed to
have knowledge about the problem under investigation, and the information obtained is recorded.
Most of the commissions and committees appointed by government to carry out an investigation
make use of this method.
interviewer has relatively greater freedom while recording the responses to include some
aspects and exclude others. The flexibility in unstructured interviews results in lack of
comparability of one interview with another which makes analysis of unstructured responses
more difficult and time-consuming than that from structured interviews. Unstructured
interviews also demand deep knowledge and greater skill on the part of the interviewer.
Personal interviews may also be focused or clinical interviews. A focused interview is meant to
focus attention on the given experience of the interviewee and its effects. The interviewer
has the freedom to decide the manner and sequence in which the questions would be asked
and has also the freedom to explore reasons and motives. The main task of the interviewer is
to confine the respondent to a discussion of issues with which she seeks conversance. Such
interviews are used generally in the development of hypotheses and constitute a major type of
unstructured interviews.
The clinical interview is concerned with broad underlying feelings or motivations or with the
course of individual’s life experience. The method of eliciting information under it is
generally left to the interviewer’s discretion. In case of non-directive interview, the
interviewer’s function is simply to encourage the respondent to talk about the given topic
with a bare minimum of direct questioning. The interviewer often acts as a catalyst to a
comprehensive expression of the respondents’ feelings and beliefs and of the frame of
reference within which such feelings and beliefs take on personal significance.
10. The interviewer can collect supplementary information about the respondent’s
personal characteristics and environment, which is often of great value in interpreting
results.
Disadvantages:
1. It is a very expensive method, especially when a large and geographically widespread
sample is sought.
2. There remains the possibility of the bias of interviewer as well as that of the
respondent; there also remains the headache of supervision and control of
interviewers.
3. Certain types of respondents such as important officials or executives or people in
high income groups may not be easily approachable under this method and to that
extent the data may prove inadequate.
4. This method is relatively more time consuming, especially when the sample is large
and recalls upon the respondents are necessary.
5. The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to make the
interview interesting.
6. Under the interview method the organization required for selecting, training and
supervising the field-staff is more complex with formidable problems.
7. Interviewing at times may also introduce systematic errors.
8. Effective interview presupposes proper rapport with respondents that would facilitate
free and frank responses. This is often a very difficult requirement.
Advantages:
1. It is more flexible in comparison to mailing method
2. It is faster than other methods i.e. a quick way of obtaining information
3. It is cheaper than personal interviewing method since the cost per response is
relatively low
4. Recall is easy; callbacks are simple and economical
5. There is a higher rate of response than what we have in mailing method; the non-
response is generally very low
6. Replies can be recorded without causing embarrassment to the interviewee
7. Interviewer can explain requirements more easily
8. At times, access can be gained to interviewees who otherwise cannot be contacted for
one reason or the other
9. No field staff is required
10. Representative and wider distribution of sample is possible.
Disadvantages:
1. Little time is given to interviewees for considered answers; interview period is not
likely to exceed five minutes in most cases
2. Surveys are restricted to interviewees who have telephone facilities
3. Extensive geographical coverage may get restricted by cost considerations
4. It is not suitable for intensive surveys where comprehensive answers are required to
various questions
5. Possibility of the bias of the interviewer is relatively more
6. Questions have to be short and to the point; probes are difficult to handle.
Advantages:
1. There is low cost even when the population is large and is widely spread
geographically.
2. It is free from the bias of the interviewer; answers are in the interviewees’ own words
3. Interviewees have adequate time to give well thought out answers
4. Interviewees who are not easily approachable can also be reached conveniently
5. Large samples can be made use of and thus the results can be made more dependable
and reliable.
Disadvantages:
1. Low rate of return of the duly filled in questionnaires; bias due to non-response is
often indeterminate
2. It can be used only when interviewees are educated and cooperative
3. The control over questionnaire may be lost once it is sent.
4. There is inbuilt inflexibility because of the difficulty of amending the approach once
questionnaires have been dispatched
5. There is the possibility of ambiguous replies or omission of replies altogether to
certain questions; interpretation of omissions is difficult.
6. It is difficult to know whether willing interviewees are truly representative
7. This method is likely to be the slowest of all.
Before using this method, conduct a “pilot study” to test the questionnaires. A pilot survey is
a replica and rehearsal of the main survey. Such a survey brings to light the weaknesses
(if any) of the questionnaires and also of the survey techniques. From the experience gained
in this way, improvements can be made.
The three main aspects of a questionnaire are: the general form, question sequence, and
question formulation and wording.
In an unstructured questionnaire, the interviewer is provided with a general guide on the type
of information to be obtained, but the exact question formulation is largely her own
responsibility and the replies are to be taken down in the respondent’s own words to the
extent possible; in some situations tape recorders may be used to achieve this objective.
the relation of one question to another should be readily apparent to the respondent, with
questions that are easiest to answer being put in the beginning. The first few questions are
particularly important because they are likely to influence the attitude of the respondent and
in seeking her desired cooperation. The opening questions should be such as to arouse
human interest. The following type of questions should be avoided as opening questions in a
questionnaire:
1. Those that put too great a strain on the memory or intellect of the respondent
2. Those of a personal character
3. Those that relate to personal wealth, income, age, among others
Following the opening questions, you should have questions that are vital to the research
problem and a connecting thread should run through successive questions. The question-
sequence should conform to the respondent’s way of thinking. Knowing what information is
desired, you can rearrange the order of the questions (in case of unstructured questionnaire) to
fit the discussion in each particular case. But in a structured questionnaire the best that can be
done is to determine the question-sequence with the help of a pilot survey, which is likely to
produce good rapport with most interviewees. Relatively difficult questions should be
relegated towards the end so that even if the respondent decides not to answer such questions,
considerable information would have already been obtained. Thus, question-sequence should
usually go from the general to the more specific and you should always remember that the
answer to a given question is a function not only of the question itself, but of all previous
questions as well. For instance, if one question deals with the price usually paid for a product
and the next with the reason for preferring that particular brand, the answer to this latter question
may be couched largely in terms of price-differences.
There are two principal forms of questions: multiple-choice questions (where the respondent
selects one of the alternative possible answers put to her) and open-ended ones.
Disadvantages:
1. “Puts answers in people’s mouths”, i.e. they may force a statement of opinion on an
issue about which the respondent does not in fact have any opinion.
2. Not appropriate when the issue under consideration happens to be a complex one and
also when the interest of the researcher is in the exploration of a process.
The open-end question refers to where an informant supplies the answer in her own words.
The question with only two possible answers (usually ‘Yes’ or ‘No’) can be taken as a special
case of the multiple-choice question, or can be named as a ‘closed question’. Open-ended
questions which are designed to permit a free response from the respondent rather than one
limited to certain stated alternatives are considered appropriate. Such questions give the
respondent considerable latitude in phrasing a reply. Getting the replies in respondent’s own
words is, thus, the major advantage of open-ended questions. However, from an analytical
point of view, open-ended questions are more difficult to handle, raising problems of
interpretation, comparability and interviewer bias.
In practice, the various forms complement each other. As such questions of different forms
are included in one single questionnaire. For instance, multiple-choice questions constitute
the basis of a structured questionnaire, particularly in mail survey. But even there, various
open-ended questions are generally inserted to provide a more complete picture of the
respondent’s feelings and attitudes.
Pay proper attention to the wordings of questions since reliable and meaningful returns
depend on it to a large extent. Since words are likely to affect responses, they should be
properly chosen. Simple words, which are familiar to all interviewees, should be employed.
Words with ambiguous meanings should be avoided. Similarly, danger words, catchwords or
words with emotional connotations should be avoided. Caution should also be exercised in
the use of phrases, which reflect upon the prestige of the respondent. Question wording, in no
case, should bias the answer.
10. Brief directions with regard to filling up the questionnaire should be given in the
questionnaire itself.
11. The physical appearance of the questionnaire affects the cooperation the researcher
receives from the recipients and as such an attractive looking questionnaire,
particularly in mail surveys, is a plus point for enlisting cooperation.
12. The quality of the paper, along with the colour, should be good so that it may attract
the attention of recipients.
8.3.4 Schedules
Schedules are proformas containing a set of questions that are filled in by the enumerators.
These enumerators along with schedules go to interviewees, put to them the questions from
the proforma in the order the questions are listed and record the replies in the space meant for
the same in the proforma. In certain situations, schedules may be handed over to interviewees
and enumerators may help them in recording their answers to various questions in the said
schedules. Enumerators explain the aims and objects of the investigation and also remove the
difficulties, which any respondent may feel in understanding the implications of a particular
question or the definition or concept of difficult terms.
The enumerators should be trained to perform their job well and the nature and scope of the
investigation should be explained to them thoroughly so that they may well understand the
implications of different questions put in the schedule. Enumerators should be intelligent and
should possess the capacity of cross-examination in order to find out the truth. They should
be honest, sincere, and hardworking and should have patience and perseverance.
This method of data collection is very useful in extensive enquiries and can lead to fairly
reliable results. It is, however, very expensive and is usually adopted in investigations
conducted by governmental agencies or by some big organizations. Population census all
over the world is conducted through this method.
5. The questionnaire method is likely to be very slow since many interviewees do not
return the questionnaire in time despite several reminders, but in case of schedules the
information is collected well in time as enumerators fill them in.
6. Personal contact is generally not possible in case of the questionnaire method as
questionnaires are sent to interviewees by post, who in turn return them by post.
But in case of schedules direct personal contact is established with interviewees.
7. Questionnaire method can be used only when interviewees are literate and cooperative,
but in case of schedules the information can be gathered even when the interviewees
happen to be illiterate.
8. Wider and more representative distribution of sample is possible under the
questionnaire method, but in respect of schedules there usually remains the difficulty
in sending enumerators over a relatively wider area.
9. Risk of collecting incomplete and wrong information is relatively more under the
questionnaire method, particularly when people are unable to understand questions
properly. But in case of schedules, the information collected is generally complete and
accurate as enumerators can remove the difficulties, if any, faced by interviewees in
correctly understanding the questions. As a result, the information collected through
schedules is relatively more accurate than that obtained through questionnaires.
10. The success of questionnaire method lies more on the quality of the questionnaire
itself, but in the case of schedules much depends upon the honesty and competency of
enumerators.
11. In order to attract the attention of interviewees, the physical appearance of
questionnaire must be quite attractive, but this may not be so in case of schedules as
they are to be filled in by enumerators and not by interviewees.
12. Along with schedules, observation method can be used but such a thing is not possible
while collecting data through questionnaires.
Since it is not safe to take published statistics at their face value without knowing their
meaning and limitations, the researcher must confirm, before using secondary data, that they
possess the following three characteristics:
1. Reliability: Test reliability by finding out: Who collected the data? What were the
sources of data? Were they collected using proper methods? At what time were they
collected? Was there any bias of the compiler? What level of accuracy was desired?
Was it achieved?
2. Suitability: Carefully scrutinize the definition of various terms and units of collection
used at the time of collecting the data from the primary source. Similarly, the object,
scope and nature of the original enquiry must also be taken into account. If the
researcher finds differences in these, the data will be unsuitable for the current
enquiry and should not be used.
3. Adequacy: If the level of accuracy achieved in data is found inadequate for the
purpose of the current enquiry, they will be considered as inadequate and should not
be used. The data will also be considered inadequate, if they are related to an area,
which may be either narrower or wider than the area of the present enquiry.
It is very risky to use already available data. They should be used by the researcher only
when she finds them reliable, suitable and adequate; in that case they save the time and energy
that field surveys to collect the information would require.
(d) Precision required: Precision required is yet another important factor to be considered at
the time of selecting the method of collection of data.
Conclusion
• Telephone interview method may be considered appropriate if funds are restricted, time
is also restricted and the data is to be collected in respect of few items with or without a
certain degree of precision.
• In case funds permit and more information is desired, personal interview method may be
relatively better.
• In case time is ample, funds are limited and much information is to be gathered with no
precision, then mail-questionnaire method can be regarded as more reasonable.
• When funds are ample, time is also ample and much information with no precision is to
be collected, then either personal interview or the mail-questionnaire or the joint use of
these two methods may be taken as an appropriate method of collecting data.
• Where a wide geographical area is to be covered, the use of mail-questionnaires
supplemented by personal interviews will yield more reliable results per shilling spent
than either method alone.
• The secondary data may be used in case the researcher finds them reliable, adequate and
appropriate for her research.
• While studying motivating influences in market researches or studying people’s attitudes
in psychological/social surveys, the researcher may resort to the use of one or more of the
projective techniques stated earlier. Such techniques are of immense value in case the
reason is obtainable from the respondent who knows the reason but does not want to
admit it or the reason relates to some underlying psychological attitude and the
respondent is not aware of it. When the respondent knows the reason and can tell the
same if asked, then a non-projective questionnaire, using direct questions, may yield
satisfactory results even in case of attitude surveys.
• The most desirable approach with regard to the selection of the method depends on the
nature of the particular problem and on the time and resources (money and personnel)
available along with the desired degree of accuracy.
• Much depends upon the ability and experience of the researcher.
8.6 SUMMARY
A researcher can use primary or secondary data to study a research problem.
Primary data are those which are collected afresh and for the first time, and are thus original
in character while secondary data are those which have already been collected by someone
else and which have already been passed through some statistical process.
While secondary data collection will simply consist of compilation, a lot of activity goes into
the collection of primary data. The researcher can use the observation method, the interview
method, questionnaires, schedules or a combination of different methods to collect primary data.
Interview method involves presentation of oral-verbal stimuli by the researcher for oral-
verbal responses. This can be through face to face or telephone interaction.
In the questionnaire method a questionnaire is sent to the persons concerned with a request to
them to answer the questions and return the questionnaire. It is extensively employed in
various economic and business surveys where information is to be obtained from very many
respondents.
A pilot study for testing the questionnaires should be conducted before the main research is
undertaken. This helps to bring to light the weaknesses (if any) of the questionnaires
and also of the survey techniques, enabling improvements. The question-sequence in the
questionnaire should be clear and smoothly moving. The first few questions should be those
seeking responses that are easy to remember and those that arouse human interest,
followed by questions that are vital to the research problem. The questions should be very
clear and complete.
Schedules are proformas containing a set of questions that are filled in by the enumerators.
The enumerators put the questions from the proforma, in the order listed, to the interviewees
and record the replies in the space meant for the same in the proforma. The method is commonly
employed in extensive enquiries such as conducting a country’s population census.
8.7 SELF-TEST QUESTIONS
3. Compare and contrast the merits and demerits of interviewing and use of questionnaires
in a social research survey
4. Explain the principles that govern interviews in research
5. Explain the essentials of a good questionnaire
6. Explain the main differences in the use of questionnaires and use of schedules in social
research
7. What factors should a researcher consider in determining the method to use in collecting
data?
TOPIC NINE
DATA ANALYSIS AND INTERPRETATION
[Figure: flow diagram with the stages Data analysis, Interpretation of results, Discussion, Research question answered]
9.2.1 Editing data
Data have to be edited, especially when they relate to responses to open-ended questions of
interviews and questionnaires, or unstructured observations.
All the information that may have been noted down by the interviewer, observer, or
researcher in a hurry must be clearly interpreted so that it may be coded systematically. It is
recommended that such editing should be done preferably the very same day the data are
collected so that the respondents may be contacted for any further information or
clarification, if need be. The edited data should be identifiable through the use of a different
color pencil or ink so that the original information is still available in case of further doubts
later.
Not all respondents answer every item in the questionnaire. Answers may have been left
blank because the respondent did not understand the question, did not know the answer, was
not willing to answer, or was simply indifferent to the need to respond to the entire
questionnaire.
If a substantial number of questions – say, 25% of the items in the questionnaire – have been
left unanswered, it may be a good idea to throw out the questionnaire and not include it in the
data set for analysis. In such a case it is important to mention the number of returned but
unused responses due to excessive missing data in the final report submitted. If, however,
only two or three items are left blank in a questionnaire with, say, 30 or more items, there is
need to decide how these blank responses are to be handled. The following are possible
alternatives
(i) A blank response to an interval-scaled item with a mid-point would be assigned the
midpoint in the scale as the response to that particular item.
(ii) Allow the computer to ignore the blank responses when the analyses are done. This,
of course, will reduce the sample size whenever that variable is involved in the
analyses.
(iii) Assign to the item the mean value of the responses of all those who have responded
to that particular item.
(iv) Give the item the mean of the responses of this particular respondent to all other
questions measuring this variable.
(v) Give the missing response a random number within the range for that scale.
There are several ways of handling blank responses, but the common approach is either to
give the midpoint in the scale as the value or to ignore the particular item during the analysis.
The best way to handle missing data to enhance the validity of the study, especially if the
sample size is big, is to omit the case where the datum relating to a particular analysis is
missing.
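The alternatives for blank responses described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: a blank response is assumed to be stored as None, items are assumed to use a 1 to 7 scale, the 25% cut-off follows the "say, 25%" guideline in the text, and all function names are hypothetical.

```python
import random
from statistics import mean

MISSING = None  # assumed marker for a blank response


def is_usable(responses, max_blank_share=0.25):
    """Return False when the share of blank items reaches the cut-off."""
    blanks = sum(1 for r in responses if r is MISSING)
    return blanks / len(responses) < max_blank_share


def fill_midpoint(responses, low=1, high=7):
    """Alternative (i): replace blanks with the midpoint of the scale."""
    midpoint = (low + high) / 2
    return [midpoint if r is MISSING else r for r in responses]


def drop_blank_cases(cases, item_index):
    """Alternative (ii): leave out cases whose answer to the analysed item is blank."""
    return [case for case in cases if case[item_index] is not MISSING]


def fill_item_mean(column):
    """Alternative (iii): replace blanks in one item with the mean of all who answered it."""
    item_mean = mean(r for r in column if r is not MISSING)
    return [item_mean if r is MISSING else r for r in column]


def fill_respondent_mean(responses):
    """Alternative (iv): replace blanks with this respondent's mean on the other
    items measuring the same variable."""
    own_mean = mean(r for r in responses if r is not MISSING)
    return [own_mean if r is MISSING else r for r in responses]


def fill_random(responses, low=1, high=7, rng=None):
    """Alternative (v): replace blanks with a random whole number within the scale range."""
    rng = rng or random.Random()
    return [rng.randint(low, high) if r is MISSING else r for r in responses]
```

Whichever alternative is chosen, it should be applied consistently and reported, since each one changes either the sample size or the distribution of the item.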
9.2.3 Coding
The easiest way to illustrate a coding scheme is through an example.
Consider a questionnaire designed to test the job involvement – job satisfaction hypothesis in
the organization (Serakan Co.). In the questionnaire, there are 5 demographic variables and 16
items measuring involvement and satisfaction as shown below.
Here are some questions that ask the respondent to tell how he/she experiences work life in
general. The respondent is asked to circle the appropriate number on the scales below. The
question posed is: “To what extent would you agree with the following statements, on a
scale of 1 to 7, 1 denoting very low agreement, and 7 denoting very high agreement?”
(i) The responses to the demographic variables can be coded from 1 to 5 for age, and 1 to
6 for the variables of education and job level, depending on which box in the columns
was checked by the respondent. Sex can be coded as 1 or 2 depending on whether the
response was from a male or female. Work shift can be coded 1 to 3, and
employment status as either 1 or 2.
(ii) It is easy to see that when some thought is given to coding at the time of designing the
questionnaire, coding can become simple. For example, since numbers were given
within boxes for all the above items (instead of simply putting a box for marking the
appropriate one), it would be easy to transfer them to the code sheet, or directly key in
the data.
(iii) Items numbered 6 to 21 on the questionnaire can be coded by using the actual number
circled by the respondents. If, for instance, 3 had been circled for the first question
then the response will be coded as 3; if 4 was circled, we would code it as 4, and so
on.
(iv) It is possible to key in the data directly from the questionnaires, but that would need
flipping through several questionnaires, page by page, resulting in possible errors and
omissions of items. Transfer of the data first onto a code sheet would thus help.
Human errors can occur while coding. At least 10% of the coded questionnaires
should therefore be checked for coding accuracy. Their selection may follow a
systematic sampling procedure. That is, every nth form coded could be verified for
accuracy. If many errors are found in the sample, all items may have to be checked.
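The 10% accuracy check by systematic sampling described in (iv) can be illustrated as follows. This is only a sketch: the function names are hypothetical, forms are assumed to be numbered from 0 in the order they were coded, and the 5% error tolerance is an illustrative assumption, since the text says only "many errors".

```python
import math


def forms_to_verify(n_forms, min_share=0.10):
    """Pick every nth coded form so that at least `min_share` of the forms are checked.

    For a 10% check the interval n is 10, so forms 0, 10, 20, ... are selected.
    """
    if n_forms <= 0:
        return []
    interval = max(1, math.floor(1 / min_share))
    return list(range(0, n_forms, interval))


def needs_full_check(errors_found, forms_checked, tolerance=0.05):
    """Flag a full re-check when the share of faulty forms in the sample exceeds `tolerance`.

    The tolerance value is an assumption made for illustration only.
    """
    return forms_checked > 0 and errors_found / forms_checked > tolerance
```

For 200 coded questionnaires, forms_to_verify(200) selects 20 forms; if, say, 3 of those 20 contain coding errors, needs_full_check(3, 20) signals that all items may have to be checked.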
9.2.4 Categorization
It is often useful to set up a scheme for categorizing the variables such that the several items
measuring a concept are all grouped together.
Responses to some of the negatively worded questions have also to be reversed so that all
answers are in the same direction. For example, a response of 7 on a 7-point scale, with 7
denoting “strongly agree” for a negatively worded statement really means “strongly
disagree”, which actually is a 1 on the 7-point scale. Thus the item has to be reversed so as to
be in the same direction as the positively worded questions. In the Serakan Co. data, items 16
to 21 will have to be recoded such that scores of 7 are read as 1; 6 as 2; 5 as 3; 3 as 5; 2 as 6;
and 1 as 7.
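The recoding of negatively worded items amounts to subtracting the score from the sum of the scale end points (8 on a 1 to 7 scale). A minimal sketch, assuming each respondent's answers are held in a dictionary keyed by item number; the function names are hypothetical, while the item range 16 to 21 is taken from the Serakan Co. example above.

```python
REVERSED_ITEMS = range(16, 22)  # items 16 to 21 are negatively worded


def reverse_score(score, low=1, high=7):
    """Map 7 to 1, 6 to 2, 5 to 3, 4 to 4, 3 to 5, 2 to 6 and 1 to 7."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


def recode_case(responses, reversed_items=REVERSED_ITEMS):
    """Return a copy of one respondent's answers with the reversed items recoded."""
    return {
        item: reverse_score(score) if item in reversed_items else score
        for item, score in responses.items()
    }
```

Note that the midpoint 4 is unchanged by the recoding, which is why it does not appear in the list of recoded scores.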
If the questions measuring a concept are not contiguous but scattered over various parts of the
questionnaire, care has to be taken to include all the items without any omission or wrong
inclusion.
If questionnaire data are not collected on scanner answer sheets, which can be directly
entered into the computer as a data file, the raw data will have to be manually keyed into the
computer. Raw data can be entered through any software program. For instance, the SPSS
Data Editor, which looks like a spreadsheet, can be used to enter, edit, and view the contents of the data
file. Each row of the editor represents a case, and each column represents a variable. All
missing values will appear with a period (dot) in the cell. It is possible to add, change, or
delete values easily after the data have been entered.
9.3 DATA ANALYSIS AND INTERPRETATION
If the response to each individual item in a scale does not have a good spread (range) and
shows very little variability, then the researcher would suspect that the particular question
was probably not properly worded and respondents did not quite understand the intent of the
question. Biases, if any, could also be detected if the respondents have tended to respond
similarly to all items – that is, stuck to only certain points on the scale. The maximum and
minimum scores, mean, standard deviation, variance, and other statistics can be easily
obtained, and these will indicate whether the responses range satisfactorily over the scale.
Remember that if there is no variability in the data, then no variance can be explained!
It is always prudent to obtain (1) the frequency distributions for the demographic variables,
(2) the mean, standard deviation, range, and variance on the other dependent and independent
variables, and (3) an intercorrelation matrix of the variables, irrespective of whether or not
the hypotheses are directly related to these analyses. These statistics give a feel for the data.
In other words, examination of the measure of central tendency, and how clustered or
dispersed the variables are, gives a good idea of how well the questions were framed for
tapping the concept. The intercorrelation matrix will give an indication of how closely
related or unrelated the variables under investigation are. If the correlation between two
variables happens to be high – say, over .75 – we would start to wonder whether they are
really two different concepts, or whether they are measuring the same concept. If two
variables that are theoretically stated to be related do not seem to be significantly correlated
to each other in our sample, we would begin to wonder if we have measured the concepts
validly and reliably.
Establishing the goodness of data lends credibility to all subsequent analyses and findings.
Hence, getting a feel for the data becomes the necessary first step in all data analysis. Based
on this initial feel, further detailed analyses may be done to test the goodness of the data.
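As a minimal sketch of "getting a feel for the data", the checks described above can be run with
Python's standard library. The variable names and the six responses below are hypothetical and
serve only to illustrate the three checks: a frequency distribution, descriptive statistics, and
a correlation between two variables.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical responses: one list per variable, one entry per respondent.
gender = ["M", "F", "M", "M", "F", "M"]
job_satisfaction = [3.2, 2.8, 3.5, 3.0, 2.6, 3.4]
intention_to_leave = [2.9, 3.4, 2.5, 3.0, 3.8, 2.6]


def describe(values):
    """Return the basic statistics used to get a feel for one variable."""
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": mean(values),
        "std_dev": stdev(values),      # sample standard deviation (n - 1)
        "variance": variance(values),  # sample variance (n - 1)
    }


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


frequencies = Counter(gender)          # frequency distribution
stats_js = describe(job_satisfaction)  # descriptive statistics
r = pearson_r(job_satisfaction, intention_to_leave)

print(frequencies)
print(stats_js)
print(round(r, 2))
```

Repeating `pearson_r` for every pair of variables gives the intercorrelation matrix. A variable
whose range and variance are close to zero is a candidate for the wording problems discussed
above.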
(a) Reliability
The reliability of a measure is established by testing for both consistency and stability.
Consistency indicates how well the items measuring a concept hang together as a set.
Cronbach’s alpha is a reliability coefficient that indicates how well the items in a set are
positively correlated to one another. Cronbach’s alpha is computed in terms of the average
intercorrelations among the items measuring the concept. The closer Cronbach’s alpha is to
1, the higher the internal consistency reliability.
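The following is a sketch of how Cronbach's alpha can be computed by hand. It assumes the common
variance-based formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total
scores); the standardized item alpha reported by packages such as SPSS is based on the average
inter-item correlation instead. The item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    Each inner list holds one item's scores for the same respondents,
    in the same order.
    """
    k = len(items)
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 5-point responses of six respondents to three items.
items = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 3, 4, 3],
    [5, 3, 4, 2, 5, 3],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```

Items that move together produce a total-score variance much larger than the sum of the item
variances, which pushes alpha towards 1.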
The stability of a measure can be assessed through parallel form reliability and test-retest
reliability. When a high correlation between two similar forms of a measure is obtained,
parallel form reliability is established. Test-retest reliability can be established by computing
the correlation between the same tests administered at two different time periods.
(b) Validity
Factorial validity can be established by submitting the data for factor analysis. The results of
factor analysis (a multivariate technique) will confirm whether or not the theorized
dimensions emerge. Recall that measures are developed by first delineating the dimensions
so as to operationalize the concept. Factor analysis would reveal whether the dimensions are
indeed tapped by the items in the measure, as theorized. Criterion-related validity can be
established by testing for the power of the measure to differentiate individuals who are
known to be different. Convergent validity can be established when there is a high degree of
correlation between two different sources responding to the same measure (e.g. both
supervisors and subordinates respond similarly to a perceived reward system administered to
them). Discriminant validity can be established when two distinctly different concepts are
not correlated to each other (as, for example, courage and honesty; leadership and motivation;
attitudes and behavior).
(i) It is used to establish the relationship between two variables, both of which are
categorical in nature, i.e. nominal data, and it can also be used for higher scales. Examples
include cases where persons, events, or objects are grouped in two or more nominal categories
such as “yes-no”, “favour-undecided-against” or class “A, B, C or D”. For example, you
can test the hypothesis that there is a relationship between educational achievement
and income level. The variable “educational achievement” is categorized as “primary”,
“secondary” and “college”. The variable “income level” is categorized as “low
income”, “middle income”, and “high income”. The technique compares the
proportion observed in each category with what would be expected under the
assumption of independence between the two variables. If the observed frequency
greatly departs from what is expected, then you reject the null hypothesis that the two
variables are independent of each other. You then conclude that one variable is
related to the other.
(ii) Observations recorded and used are collected on a random basis.
(iii) All the items in the sample must be independent.
(iv) No group should contain very few items, say fewer than 10. Where the
frequencies are less than 10, re-grouping is done by combining the frequencies of
adjoining groups so that the new frequencies become greater than 10.
(v) The overall number of items must also be reasonably large. It should normally be at
least 50.
(vi) The constraints must be linear. Constraints that involve linear equations in the cell
frequencies of a contingency table (i.e. equations containing no squares or higher
powers of the frequencies) are known as linear constraints.
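The comparison of observed and expected frequencies described in (i) can be written compactly as
follows. The notation is an assumption introduced here for illustration: O_ij and E_ij are the
observed and expected frequencies in row i and column j of the contingency table, R_i and C_j are
the row and column totals, N is the overall number of items, and r and c are the numbers of rows
and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

The null hypothesis of independence is rejected when the computed value exceeds the tabulated
critical value for the chosen significance level and these degrees of freedom.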
Illustration
You have collected the following data on ownership of Small Micro enterprises
(SME).
Women 50 450
Is owning an SME dependent on sex? Test the hypothesis at the 5% significance
level.
9.6 Case Study: Data Analysis and Interpretation in a Business Research Project
[Source: Sekaran, 2003]
Consider the business research project presented below. The case study gives a brief
description of the background of the company in which the research was carried out and how
the sample was obtained, followed by a discussion of the analysis done to test each
hypothesis and of the results.
Since access to those who had left the company would be difficult, the research team
suggested to the president that they would talk to the current employees, and based on their
inputs and a literature survey, try to get at the factors influencing employees’ intentions to
stay with, or leave, the company. Since past research has shown that intention to leave (ITL)
is an excellent predictor of actual turnover, the president concurred.
The team first conducted an unstructured interview with about 50 employees at various levels
and from different departments. Their broad statement was: “We are here to find out how
you experience your work life. Tell us whatever you consider is important for you in your
job, as issues relate to your work, environment, the organization, supervision, and whatever
else you think is relevant. If we get a good handle on the issues involved, we may be able to
make appropriate recommendations to management to enhance the quality of your work life.
We would just like to talk to you now, and administer a questionnaire later”.
Each interview typically lasted about 45 minutes, and notes on the responses were written
down by the team members. When the responses were tabulated, it became clear that the
issues most frequently brought up by the respondents in one form or another, related to three
main areas: the job (employees said the jobs were dull or too complex; there was lack of
freedom to do the job as one wanted to, etc.); perceived inequities (remarks such as “other
companies pay more for the kind of jobs we do”; “compared to the work we do, we are not
adequately paid”; etc.); and burnout (comments such as “there is so much work to be done
that by the end of the day we are physically and emotionally exhausted”; “we feel the
frequent need to take time off because of exhaustion”; etc.).
A literature survey confirmed that these variables were good predictors of intention to leave
and subsequent turnover. In addition, job satisfaction was also found to be a useful predictor.
A theoretical framework was developed based on the interviews and the literature survey, and
five hypotheses (stated later) were developed.
Next, a questionnaire was designed incorporating well-validated and reliable measures for the
four independent variables of job characteristics, perceived inequity, burnout, and job
satisfaction, and the dependent variable of intention to leave. Demographic variables such as
age, education, gender, tenure, job title, department, and work shift were also included in the
questionnaire. The questionnaire was administered personally to 174 employees who were
chosen on a disproportionate stratified random sampling basis. The responses were entered
into the computer. Thereafter, the data were submitted for analysis to test the following
hypotheses, which were formulated by the researchers.
1. Men will perceive less equity than women (or women will perceive more equity than
men)
The researcher submitted the data for computer analysis using the SPSS Version 11.0 for
Windows software program.
We will now proceed to discuss the results of these analyses and their interpretation. In
particular, we will examine the following:
a) The establishment of Cronbach’s alpha for the measures
b) The frequency distribution of the variables
c) Descriptive statistics such as the mean and standard deviation
d) The Pearson correlation matrix
e) The results of hypotheses testing
The result indicates that the Cronbach’s alpha for the six-item Intention to Leave measure is
.82. The closer the reliability coefficient gets to 1.0, the better. In general, reliabilities less
It is important to note that all the negatively worded items in the questionnaire should first be
reversed before the items are submitted for reliability tests. Unless all the items measuring a
variable are in the same direction, the reliabilities obtained will be incorrect.
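Reversing a negatively worded item can be sketched as below. The 5-point scale is an assumption
made for the illustration; the same rule (scale maximum plus scale minimum, minus the score)
applies to any numeric scale.

```python
def reverse_score(score, scale_min=1, scale_max=5):
    """Reverse one response on a scale running from scale_min to scale_max."""
    return scale_max + scale_min - score


# Hypothetical answers to a negatively worded item on a 5-point scale.
raw = [1, 2, 3, 4, 5]
reversed_scores = [reverse_score(s) for s in raw]
print(reversed_scores)  # [5, 4, 3, 2, 1]
```

After this step a high score means the same thing on every item, so the items can be submitted
together for the reliability test.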
Reliability Output
Reliability Coefficients 6 items
Alpha = .8172 Standardized item alpha = .8168
The variance for burnout, job satisfaction, and the job characteristics is not high. The
variance for ITL and perceived equity (distributive justice) is only slightly more, indicating
that most respondents are very close to the mean on all the variables.
In sum, the perceived equity is rather low, not much burnout is experienced, the job is perceived
to be fairly enriched, there is average job satisfaction, and there is neither a strong intention to
stay with the organization nor to leave it.
The Pearson correlation coefficient is appropriate for interval- and ratio-scaled variables, and
the Spearman Rank or the Kendall’s Tau coefficients are appropriate when variables are
measured on an ordinal scale. Any bivariate correlation can be obtained by clicking the
relevant menu, identifying the variables, and seeking the appropriate parametric or
nonparametric statistics.
It is important to note that no correlation exceeded .59 for this sample. If correlations were
higher (say, .75 and above), we might have had to suspect whether or not the correlated
Hypothesis 1: Use of t-test. Hypothesis 1 can be stated in the null and alternate as follows:
H10: There will be no difference between men and women in their perceived inequities.
Statistically expressed: H10 is: $\mu_W = \mu_M$
where $\mu_W$ is the mean equity perceived by women and $\mu_M$ is the mean equity perceived by men.
H1A: Women will perceive more equity than men (or men will perceive less equity than
women).
Statistically expressed: H1A is: $\mu_W > \mu_M$
A t-test will indicate whether perceived equity differs significantly between women and
men. The results of the t-test are shown in Output 3. As may be seen, the difference
between the means of 2.43 and 2.34, with standard deviations of .75 and .76, for the two gender
groups on perceived equity (or distributive justice) is not significant (see table showing t-test
for Equality of Means). Thus hypothesis 1 is not substantiated.
t Test Output
Group Statistics
                                    N     Mean   Std. Deviation   Std. Error Mean
Dist Justice   Treatment   Male     149   2.43   .75              .052
                           Female    25   2.34   .76              .154
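As a rough check on this conclusion, the t statistic can be recomputed from the group statistics
alone. This is a minimal sketch that assumes the equal-variance (pooled) form of the
independent-samples t-test; the function name is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic and df, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Group statistics as reported in the t Test Output above.
t, df = pooled_t(2.43, 0.75, 149, 2.34, 0.76, 25)
print(round(t, 2), df)
```

The resulting t is well below 1.96, the approximate two-tailed critical value at the 5% level
for large samples, which agrees with the finding that the difference is not significant.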
Hypothesis 2: Use of ANOVA. The second hypothesis can be stated in the null and
alternate as follows:
H20: The job satisfaction of individuals will be the same irrespective of the shift
Statistically expressed, H20 is: $\mu_1 = \mu_2 = \mu_3$
where $\mu_1$, $\mu_2$ and $\mu_3$ signify the means on the job satisfaction of employees working
in shifts 1, 2, and 3, respectively.
H2A: The job satisfaction of individuals will not be the same (will vary) depending on which
shift they work.
Statistically expressed, H2A is: $\mu_1 \neq \mu_2 \neq \mu_3$
Output 4: ANOVA
Choose:
Analyze
Compare Means
One-way ANOVA…
(Select the dependent variable/s and one independent factor variable)
Oneway ANOVA Output
ANOVA
Since there are more than two groups (three different shifts) and job satisfaction is measured
on an interval scale, ANOVA is appropriate to test this hypothesis. The results of ANOVA,
testing this hypothesis, are shown in output 12.6.
The df in the third column refers to the degrees of freedom, and each source of variation has
associated degrees of freedom. For the between-groups variance, df = (K – 1), where K is the
total number of groups or levels. Because there were three shifts, we have (3 – 1) = 2 df.
The df for the within-groups sum of squares equals (N – K), where N is the total number of
respondents and K is the total number of groups. If there were no missing responses, (N - K)
should be (174-3) = 171. However, in this case, there were 12 missing responses, and hence
the associated df is (162 – 3) = 159.
The mean square for each source of variation (column 5 of the results) is derived by dividing
the sum of squares by its associated df. Finally, the F value itself equals the explained mean
square divided by the residual mean square.
$$F = \frac{MS_{\text{explained}}}{MS_{\text{residual}}}$$
In this case, F = 3.327 (.831/.249). This F value is significant at the .04 level. This implies
that hypothesis 2 is substantiated. That is, there are significant differences in the mean
satisfaction levels of workers in the three shifts, and the null hypothesis can be rejected.
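The degrees of freedom and the F ratio described above can be sketched as follows. The scores
are hypothetical (the raw case-study data are not reproduced in this text), and the function name
is an assumption made for the illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (df_between, df_within, F) for a list of groups of scores."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n = len(all_scores)
    grand_mean = mean(all_scores)

    # Explained (between-groups) and residual (within-groups) sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between / ms_within


# Hypothetical job-satisfaction scores for three shifts.
shifts = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
df_between, df_within, f_value = one_way_anova(shifts)
print(df_between, df_within, round(f_value, 2))
```

With three groups and 162 valid responses, the same rules give df = 3 - 1 = 2 and
162 - 3 = 159, as in the case study.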
The F test used here is called the overall or omnibus F test. To determine among which
groups the true differences lie, other tests need to be done, as discussed in chapter 9. The
Duncan Multiple Range Test was performed for the purpose (Output not shown). The
results showed that the mean job satisfaction for the three groups was 3.15 for the first shift,
2.91 for the second shift, and 3.23 for the third shift. The second shift, with the lowest job
satisfaction, is the one that is significantly different from groups 1 and 3 at the .05 level.
Hypothesis 3: Use of ANOVA. Hypothesis 3 can be stated in the null and the alternate as
follows:
H30: There will be no difference in the intention to leave of employees at the five different
job levels.
Hypothesis 4: Use of Chi-Square Test. Hypothesis 4 can be stated in the null and alternate
as follows:
Since both variables are nominal, a chi-square ($\chi^2$) test was done, the results of which are
shown in Output 12.8. The cross-tabulation count indicates that, of the full-time employees,
103 work the first shift, 25 work the second and 18 the third shift. Of the part-time
employees, 16 work the first shift, 8 the second shift, and 4 the third shift.
It may be seen that the $\chi^2$ value of 2.31, with two degrees of freedom, is not
significant. In other words, the part-time/full-time status and the shifts worked are not
related. Hence hypothesis 4 has not been substantiated.
Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             2.312    2    .314
Likelihood ratio               2.163    2    .339
Linear-by-linear Association   1.103    1    .294
N of valid cases               174
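The Pearson chi-square value can be reproduced from the cross-tabulation counts quoted in the
text. This is a minimal sketch in plain Python; the function name is hypothetical, and the
p-value line uses a closed form that holds only for two degrees of freedom.

```python
from math import exp


def chi_square_independence(table):
    """Pearson chi-square statistic and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Observed counts from the case study: rows are full-time and part-time
# employees, columns are shifts 1, 2 and 3.
observed = [[103, 25, 18], [16, 8, 4]]
chi2, df = chi_square_independence(observed)

# For df = 2 only, the upper-tail probability is exactly exp(-chi2 / 2).
p_value = exp(-chi2 / 2)
print(round(chi2, 3), df, round(p_value, 3))
```

The statistic and degrees of freedom agree with the Pearson Chi-Square row of the output, and
the p-value is far above .05, so the null hypothesis of independence is retained.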
To test this hypothesis, multiple regression analysis was done. The results of regressing the
four independent variables against intention to Leave can be seen in Output 12.9.
The first table in the Output lists the four independent variables that are entered into the
regression model and R (.548) is the correlation of the four independent variables with the
dependent variable, after all the intercorrelations among the four independent variables are
taken into account.
In the Model Summary table, the R Square (.30), which is the explained variance, is
the square of the multiple R, $(.548)^2$. The ANOVA table shows that the F value of
16.72 is significant at the .0001 level. In the df (degrees of freedom) column of the same
table, the first number represents the number of independent variables (4); the second number
(156) is the total number of complete responses for all the variables in the equation (N),
minus the number of independent variables (K), minus 1: (N – K – 1) = (161 – 4 – 1) = 156.
What the results mean is that 30% of the variance (R-square) in Intention to Leave has been
significantly explained by the four independent variables. Thus, hypothesis 5 is
substantiated.
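The figures in the Model Summary and ANOVA tables are linked by simple arithmetic, which can be
checked as follows. The sketch uses only the sums of squares, N and K reported in the Regression
Output of this case study; the variable names are chosen here for illustration.

```python
from math import sqrt

# Values as reported in the Regression Output of the case study.
ss_regression = 22.366
ss_residual = 52.180
n = 161  # complete responses (N)
k = 4    # independent variables (K)

ss_total = ss_regression + ss_residual
df_regression = k
df_residual = n - k - 1

r_square = ss_regression / ss_total
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
f_value = (ss_regression / df_regression) / (ss_residual / df_residual)
std_error_estimate = sqrt(ss_residual / df_residual)

print(df_residual, round(r_square, 3), round(adjusted_r_square, 3),
      round(f_value, 3), round(std_error_estimate, 3))
# 156 0.3 0.282 16.717 0.578
```

Each printed value matches the corresponding entry in the output tables, which is a useful
consistency check when copying results into a research report.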
The next table titled Coefficients helps us to see which among the four independent variables
influences most the variance in ITL (i.e. is the most important). If we look at the column
Beta under Standardized Coefficients, we see that the highest number in the beta is -.37 for
job satisfaction, which is significant at the .0001 level. It may also be seen that this is the
only independent variable that is significant. The negative beta weight indicates that if ITL is
to be reduced, it is necessary to enhance the job satisfaction of employees.
Regression Output
Model Summary(3,4)
Model   Variables Entered        Variables Removed   R      R Square   Adjusted R Square   Std. Error of the Estimate
1       Job Char, Dist Just,                         .548   .300       .282                .578
        Burnout, Job Sat.(1,2)
(1) Indep. vars: (constant) Job Char, Dist Just, Burnout, Job Sat
(2) All requested variables entered
(3) Dependent variable: ITL
(4) Method: Enter
ANOVA(2)
Model 1      Sum of Squares   df    Mean Square   F        Significance
Regression   22.366           4     5.591         16.717   .0001
Residual     52.180           156   .335
Total        74.546           160
(1) Indep. vars: (constant) Job Char, Dist Justice, Burnout, Job Sat
(2) Dependent variable: ITL
Unstandardized Standardized
Coefficients Coefficients
Model B Std. Error Beta T Sig
It is informative to find that the perceived equity, though not significantly different for men
and women as originally hypothesized, is nevertheless rather low for all (see Output 12.3).
The Pearson correlation matrix (Output 12.4) indicates that perceived equity (or distributive
justice) is positively correlated to job satisfaction and negatively correlated to ITL. The
president will therefore be well advised to rectify inequities in the system, if they really
exist, or to clear up misperceptions of inequity, if that is actually the case.
Increasing job satisfaction will no doubt help to reduce employees’ intention to quit, but the
fact that only 30% of the variance in Intention to Leave was significantly explained by the
four independent variables considered in this study still leaves 70% unexplained. In other
We have now seen how different hypotheses can be tested by applying the appropriate
statistical tests in data analysis. Based on the interpretation of the results, the research report
is then written, making necessary recommendations and discussing the pros and cons of each,
together with cost/benefit analysis. Limitations to the study are also specifically stated so
that the reader is made aware of the biases that might have crept into the study. This also
gives a professional touch to the study, attesting to its scientific orientation.
9.7 SUMMARY
After data are obtained through questionnaires, interviews, observation, or secondary
sources, they need to be edited. The blank responses, if any, have to be handled in some
way, the data coded, and a categorization scheme set up. The data will then have to
be keyed in, and some software program used to analyze them.
Editing involves clearly interpreting information that may have been noted down in a hurry
by the interviewer, observer, or researcher, so that it can be coded systematically. This
matters especially when the information gathered relates to responses to open-ended questions
of interviews and questionnaires, or to unstructured observations. Such editing should be done
the same day the data are collected.
It is often useful to set up a scheme for categorizing the variables such that the several items
measuring a concept are all grouped together. To do this, responses to the negatively worded
questions have to be reversed so that all answers are in the same direction.
After coding, the data are entered through a software program to enable analysis.
Appropriate statistical procedures are then applied to analyze the data, addressing each
objective of the study. The results must be interpreted, and conclusions drawn and reported
appropriately.
1. Sekaran, U. (2003)
2. Kothari, C.R. (2004) – Chapters 7, 9 - 14
3. Nachmias and Nachmias (2004) – Chapters 14 - 19
4. Cooper and Schindler (2011) – Chapters 15 - 19
The proposal will comprise some preliminary pages and three chapters, and the layout of
the paper should be as follows:
Title Page
Declaration Page
Dedication Page
Acknowledgements
Table of Contents
List of Tables
List of Figures
Acronyms/ Abbreviation
Operational definition of terms
Abstract
REFERENCES
(i) Cover Page: The cover page should include the following information:
▪ Title of the proposal
▪ The names of the author, Department and university
▪ The address of the author (optional)
▪ Name of supervisor(s) to receive the proposal
▪ Purpose of presentation of the proposal
▪ Date of presentation
(ii) Declaration Page
Statements by the student / candidate and supervisors declaring that the work
presented in the document is original
(iii) Dedication Page:
The candidate dedicates the work to a chosen individual, whose name is mentioned together
with the relationship, if need be.
(iv) Acknowledgements
Involves acknowledging those who aided in the process of writing the proposal by
supplying important information or tools of analysis.
(v) Table of Contents
List of all the administrative subtitles (such as topic, declaration, dedication,
acknowledgement, acronyms, operational definition of terms, and abstract),
chapters/topics and sub-topics and the pages on which they appear in the document
(vi) List of Tables
List of all tables and the pages on which they appear in the document
(vii) List of Figures
List of all figures and the pages on which they appear in the document
(viii) Acronyms/ Abbreviation
This should list and give the full meaning of the acronyms or abbreviations used in
the study. For example, WHO: World Health Organization.
(ix) Operational definition of terms
List the terms that may be unfamiliar to the reader in the way you have used them in the
study. For example, Gross Domestic Product: refers to the value of goods and services
1.7 Organization of the Study- show arrangement of different sections of the study
The literature should entail an examination of what others have said/done in the field covered
by the study. The idea is to study the existing literature on the topic and relate it to the
research problem.
3.1 Introduction
Highlight what the chapter entails
REFERENCE
This section presents a list of the references made in the text/paper. Within the text, only
the author’s surname and the year are indicated. In the reference list, however, the surname,
year, title, edition, town and publisher should be indicated using the APA, Harvard
or Chicago style of referencing. Whichever style is chosen, it must be applied
consistently.
Example:
However, journal articles should appear in the references as: Young, K. H (1986) “Estimating
consumer Demand in Korea” Journal of Development Economics. Vol. 20 No. 2. North
Holland: Sciences publishers, Pp. 19 – 25.
Government documents are cited in the text as “Republic of Kenya (2002)”, while in the
reference list they will appear as: Republic of Kenya (2002) National Development Plan: 2002 –
2008. Nairobi: Government printer.
Where an Author has two cited articles or documents which may also have been published in
one year, these should appear in the reference list, for example, as:
- Republic of Kenya (2002a) National Development Plan: 2002 - 2008. Nairobi:
Government printer.
- Republic of Kenya, (2002b) Economic Survey. Nairobi: Government printer.
NOTE
1. Only documents that are cited within the text should appear in the bibliography.
2. Items in the bibliography must be arranged alphabetically.
APPENDICES
Additional information that is not directly part of the proposal, but which is considered to be
relevant for the understanding of the proposal, should be attached to the research proposal as
an annex. This includes the study’s work plan, the study’s budget and any other
information deemed important, e.g. graphs, data, computations, derivations, etc., which, if
included in the main text, would distract the reader from the general flow of ideas. Each
particular item should appear as an appendix on its own. Appendices should be labeled I, II,
III and given titles.
NB:
The total number of pages of a research proposal should not normally exceed twenty-five
double-spaced typed pages.