RESEARCH METHODOLOGY
Complete Study Notes — All Modules
BSc Finance Semester IV | SVKM's Narsee Monjee Institute of Management Studies
SYLLABUS COVERAGE — ALL TOPICS INCLUDED
Module I: Introduction to Research (Meaning, Types, Characteristics, Limitations)
Module II: Research Process (8 Steps across 3 Phases)
Module III: Research Problem (Meaning, Sources — 4Ps, Criteria, Formulation Steps)
Module IV: Research Design (Exploratory, Descriptive, Causal; Cross-sectional & Longitudinal;
Pre/Quasi/True/Statistical Experimental Designs; Internal & External Validity)
Module V: Types of Data (Primary & Secondary; Internal & External Secondary Data;
Evaluation of Secondary Data; Qualitative & Quantitative; All Collection Methods)
Module VI: Measurement & Scaling (4 Scales; Comparative & Non-Comparative Techniques)
Module VII: Questionnaire Design (10 Steps; All Question Types; Wording Criteria)
Module VIII: Sampling (Meaning, Advantages, Disadvantages; All Probability &
Non-Probability Methods; Sample Size Calculation Formulas)
Module IX: Data Processing & Analysis (Editing, Coding, Classification, Tabulation,
Graphical Presentation; Central Tendency, Dispersion, OLS Regression)
Module X & XI: Hypothesis Testing (All Tests — Z, t, Paired-t, Two-sample, F, Chi-Square;
Confidence Intervals; Types of Errors)
Module XII: Report Writing (Types, Format, Steps)
MODULE I: INTRODUCTION TO RESEARCH
1.1 Meaning of Research
Research is a way of thinking — examining critically the various aspects of your day-to-day professional
work, understanding and formulating guiding principles that govern a particular procedure, and
developing and testing new theories that contribute to the advancement of your practice and profession.
In essence, research is a systematic, controlled, empirical, and critical investigation of hypothetical
propositions about the presumed relations among natural phenomena.
1.2 Types of Research
Research can be classified from three perspectives:
1. Application of the findings of the research study
2. Objectives of the study
3. Mode of enquiry used in conducting the study
Note: These types are NOT mutually exclusive — a study can belong to multiple categories
simultaneously.
A. Based on Application
Type | Description
Pure Research | Develops and tests theories/hypotheses that may not have immediate practical application. Concerned with developing research methodology itself. Examples: developing sampling techniques, validity assessment procedures, stress measurement instruments.
Applied Research | Solves specific practical problems of individuals or groups. Most social science research is applied. Example: studying consumer preferences to improve a product.
B. Based on Objectives
Type | Purpose & Example
Descriptive | Describes a situation, problem, or phenomenon systematically. Ex: Determining market share, demographic profile of customers.
Correlational | Establishes the existence of a relationship/association between two or more variables. Ex: Impact of advertising on sales; relationship between stress and heart attacks.
Explanatory | Clarifies WHY and HOW a relationship exists between variables. Ex: Why does stressful living result in heart attacks?
Exploratory | Explores areas where little is known; often a precursor to a larger study. Ex: New product idea exploration; small-scale feasibility study.
C. Based on Mode of Enquiry
Approach | Description
Structured / Quantitative | Used to determine the EXTENT of a problem. Fixed questions, measurable data. Appropriate for finding 'how many' people have a particular view.
Unstructured / Qualitative | Used to EXPLORE the nature of a problem. Open-ended, flexible. Appropriate for understanding different perspectives and reasons behind phenomena.
1.3 Characteristics of Good Research (CRSVEC)
SIX CHARACTERISTICS — Remember: C-R-S-V-E-C
1. CONTROLLED: Minimises the effects of other factors when studying causality between
variables.
2. RIGOROUS: Procedures are relevant, appropriate and justified.
3. SYSTEMATIC: Procedures follow a logical sequence; steps cannot be taken haphazardly.
4. VALID & VERIFIABLE: Conclusions can be verified by you and others; based on correct
interpretation.
5. EMPIRICAL: Conclusions based on hard evidence gathered from real-life
experiences/observations.
6. CRITICAL: Process is foolproof; must withstand critical scrutiny.
1.4 Problems / Limitations of Research
• Time consuming and expensive — requires expert involvement and modern techniques
• Not an exact science — involves human subjects susceptible to error; cannot predict exactly
• Does not offer solutions — provides indicative information, not decisions
• Lack of management enthusiasm — some managers prefer intuition, find research costly
• Complex business environment — dynamic changes may make findings irrelevant
• Shortage of qualified staff — lack of skilled manpower affects research quality
• Sampling error — sample may not truly represent the population
• Limited applications — managers may view research as purely academic
• Researcher or respondent bias — findings influenced by inference bias; respondents may not be
frank
MODULE II: THE RESEARCH PROCESS
2.1 Three Phases of the Research Process
Phase | Steps Involved
Phase I: Deciding What to Research | Step 1: Formulating a research problem
Phase II: Planning a Research Study | Step 2: Conceptualising a research design; Step 3: Constructing an instrument for data collection; Step 4: Selecting a sample; Step 5: Writing a research proposal
Phase III: Conducting a Research Study | Step 6: Collecting data; Step 7: Processing and displaying data; Step 8: Writing a research report
2.2 The 8 Steps in Detail
Step 1: Formulating a Research Problem
The most important step — all subsequent steps are influenced by it. Key questions to ask: What do I
want to find out? Do I have sufficient funds, time, knowledge and skills?
Step 2: Conceptualising a Research Design
'What you find depends on how it was found.' Select appropriate research design (quantitative,
qualitative, or mixed methods). Design must be valid, workable, and manageable.
Step 3: Constructing an Instrument for Data Collection
Construct research tools — interview schedules, questionnaires, observation notes, diaries. Alternatively,
use secondary data. Conduct a pilot study (pre-testing).
Step 4: Selecting a Sample
Select appropriate sample to represent the study population. Avoid bias. Choose between
random/probability or non-random/non-probability samples.
Step 5: Writing a Research Proposal
A detailed plan covering: what you propose to do, how you plan to proceed, and why you selected this
strategy.
Step 6: Collecting Data
Data gathering through: interviews, mailing questionnaires, focus group discussions, observation. Be
aware of ethical issues.
Step 7: Processing and Displaying Data
Analyse data using descriptive, quantitative (statistical), or qualitative approaches depending on the type
of information collected.
Step 8: Writing a Research Report
Document what you have done and what conclusions you have drawn. Format differs for quantitative vs
qualitative research.
MODULE III: RESEARCH PROBLEM
3.1 Meaning of Research Problem
A research problem is a situation that requires research and investigation. To solve a problem, one must
know what the problem is — like identifying a destination before beginning a journey. A large part of the
solution lies in knowing what one is trying to do.
3.2 Importance of Formulating a Research Problem
• First and most important step in research
• Quality and relevance of the entire research relies on problem formulation
• Determines the methodology and design of the project
• The clearer the research question, the easier all the next steps become
3.3 Sources of Research Problems — The 4 Ps
THE 4 Ps FRAMEWORK
PEOPLE: Select a group of individuals or community to examine issues relating to their lives
PROBLEMS: Ascertain attitudes towards an issue or examine problems people face
PROGRAMMES: Evaluate the effectiveness of an intervention or programme
PHENOMENA: Establish the existence of a regularity or occurrence
Most research studies are based on at least a combination of TWO Ps.
3.4 Criteria for Selecting a Research Problem
Criterion | Explanation
Interest | Research is time consuming and involves hard work; you must be genuinely interested.
Magnitude | Select a topic manageable within your time and resource constraints.
Measurement of Concepts | Be clear about the indicators and measurement of concepts used in your study.
Level of Expertise | Ensure you have adequate expertise for the proposed task.
Relevance | Study should add to existing body of knowledge, bridge current gaps, and be useful in policy formulation.
Availability of Data | Ensure that the data needed are available.
Ethical Issues | Examine how ethical issues affect the study population and how they can be overcome at the problem formulation stage.
3.5 Formulation of Research Problem — 7 Steps
1. Identify the main subject area (e.g., Alcoholism)
2. Dissect into sub-areas (e.g., Profile of alcoholics, causes, effects on family, community attitudes)
3. Select the sub-area of most interest (e.g., 'Effect of alcoholism on the family')
4. Raise research questions (e.g., What is impact on marital relations? How does it affect children?
What are financial effects?)
5. Formulate objectives: one main objective and specific sub-objectives
6. Assess objectives against work involved, time available, financial resources, technical expertise
7. Double-check: you are interested; you agree with objectives; you have resources and expertise
MODULE IV: RESEARCH DESIGN
4.1 Meaning of Research Design
Research design is the specification of methods and procedures for acquiring the information needed. It
is the overall operational framework of the project that stipulates what information is to be collected, from
which sources, by what procedures.
Three Basic Tenets:
1. Convert research questions and hypotheses into operational variables that can be measured
2. Specify the process to complete the above task efficiently and economically
3. Specify 'control mechanisms' to ensure the effect of other (extraneous) variables is controlled
4.2 (i) Exploratory Research Design
• Flexible approach — mostly qualitative investigation
• Sample is not strictly representative of the population
• Uses unstructured interviews
• Carried out at the start of a less-explored topic to identify possible relations or assess potential
Purposes of an exploratory study:
○ Check whether a topic has enough potential for more in-depth structured research
○ Develop a comprehensive and focused research question
○ Decide whether conclusive research is required at all
○ Sharpen hypotheses and objectives for the follow-up study
○ Improve methodology and framework best suited to the objectives
4.3 (ii) Descriptive Research Design
Structured and formal in nature. Intended for:
• Providing a detailed sketch/profile of the population being studied
• Introducing a temporal component — how things change over time
• Exploring the simultaneous occurrence of phenomena and variables
Cross-Sectional Studies
• Carried out at a SINGLE moment in time — applicable for a specific period only
• Sample is investigated only for the time coordinate of the study
• Lack information on causality
• Multiple cross-sectional: Two or more samples collected at same or different time points
Longitudinal Studies
• Temporal in nature — follows the SAME respondents over a period of time
• Involves selection of a representative panel with repeated measurements at fixed intervals
• Must account for attrition — dropouts must be replaced with members of similar characteristics
• Also called time-series design or panel design
Challenge: Structured surveys may cause artificial behaviour. Agencies take precautions — ensure
members act normally; replace dropouts with similar members; rotate panel members after a period.
4.4 (iii) Causal Research — Experimental Designs
Necessary Conditions for Causal Inference:
1. Concomitant Variation: Cause X and effect Y must occur/vary together (strong association
required, but association alone does not prove causality)
2. Time Order: Cause X must occur BEFORE or simultaneously with effect Y
3. Absence of Other Causal Factors: All external (extraneous) factors must be controlled through
experimental design
Key Variables in Experiments
Variable | Definition
Independent Variable (IV) | Explanatory variable; researcher assesses its effect on the outcome.
Dependent Variable (DV) | Outcome variable; expressed as a function of independent variables.
Test Units | Entities on which treatment is applied (individuals, organisations, geographic areas).
Extraneous Variables | Variables besides the IV that may affect the DV; must be controlled.
Moderating Variable (MV) | Affects the direction or strength of the IV-DV relationship. Ex: Age moderates Drug-Recovery relationship.
Intervening Variable | Temporal occurrence between IV and DV; a conceptual mechanism. Ex: Job Satisfaction intervenes between Flexi-time Schedule and Productivity.
Control Variable | Affects the DV but is held constant so its effect can be ignored.
Symbols Used in Experimental Designs
• O or Ot = Observation at time t
• X = Treatment applied
• R = Random assignment of test units
• Oc = Control group observation
A. Pre-Experimental Designs
Design | Description
One-Shot Case Study | Single observation AFTER treatment only. No randomization. Very weak internal validity.
One-Group Pre-test Post-test | Observations BEFORE and AFTER treatment in one group. Not randomized.
Static Group | Experimental group (X then O1) vs Control group (just O2). No pre-test, no randomization.
B. Quasi-Experimental Designs
Design | Description
Time Series | O1 O2 O3 O4 X O5 O6 O7 O8: multiple observations before and after treatment. No randomization. Timing of treatment not within researcher's control.
Multiple Time Series | Same as time series but includes a parallel control group with no treatment.
C. True Experimental Designs
Design | Description
Pre-test Post-test Control Group | Experimental: R O1 X O2; Control: R O3 O4. Treatment effect = (O2-O1)-(O4-O3). RANDOMIZED. Controls for most threats to internal validity.
Post-test Only Control Group | Experimental: R X O1; Control: R O2. No pre-test needed. Used for evaluating programme effectiveness. Treatment effect = O1-O2.
Solomon Four Group | 4 groups (2 experimental, 2 control). Offers the strongest control over internal validity of the three designs. Separates pure treatment effects (T) from interaction effects (I) and extraneous effects (E).
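The two treatment-effect formulas for the true experimental designs can be checked with simple arithmetic. The sketch below is only an illustration: the function names and the sales figures are hypothetical and are not taken from any study.

```python
def pretest_posttest_effect(o1, o2, o3, o4):
    """Pre-test post-test control group design: (O2 - O1) - (O4 - O3)."""
    return (o2 - o1) - (o4 - o3)


def posttest_only_effect(o1, o2):
    """Post-test only control group design: O1 - O2."""
    return o1 - o2


# Hypothetical mean weekly sales (units); not real data.
# Experimental group: O1 = 100 before, O2 = 130 after treatment.
# Control group:      O3 = 100 before, O4 = 110 after (no treatment).
effect_a = pretest_posttest_effect(o1=100, o2=130, o3=100, o4=110)

# Post-test only: experimental O1 = 130, control O2 = 110.
effect_b = posttest_only_effect(o1=130, o2=110)

print(effect_a)  # 20
print(effect_b)  # 20
```

In the first design the control group's change (10 units) is subtracted from the experimental group's change (30 units), so only the part of the change not shared with the control group is credited to the treatment.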
D. Statistical Designs
Design | Application
Completely Randomized Design (CRD) | Tests effect of ONE IV on DV. Assumes no extraneous variables differ across groups. Uses One-way ANOVA.
Randomized Block Design (RBD) | Separates influence of ONE extraneous variable. Groups test units into homogeneous blocks (e.g., store sizes) before applying treatments.
Latin Square Design (LSD) | Separates influence of TWO extraneous variables. Requires equal number of categories for both extraneous variables and treatment variable.
Factorial Design (FD) | Measures effect of TWO or MORE IVs at various levels simultaneously. Can measure interaction effects, i.e. when the combined effect differs from the sum of individual effects.
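Since a Completely Randomized Design is analysed with one-way ANOVA, it may help to see how the F statistic is built from between-group and within-group variation. This is a minimal sketch using only the standard library; the function name and the sales figures for three treatments are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean (treatment effect).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean (error).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical sales under three treatments (e.g., three price levels).
sales_by_treatment = [
    [10, 12, 14],
    [20, 22, 24],
    [30, 32, 34],
]
f_stat = one_way_anova_f(sales_by_treatment)
print(f_stat)  # 75.0
```

Here the sum of squares between groups is 600 (2 degrees of freedom) and within groups is 24 (6 degrees of freedom), giving F = 300 / 4 = 75. The F value would then be compared with the critical value from an F table at the chosen significance level.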
4.5 (c) Internal and External Validity Issues
Type | Definition | Threats to Validity
Internal Validity | Changes in DV can be correctly attributed to the IV, not to extraneous factors. | History, Maturation, Testing effect, Instrumentation, Statistical regression, Selection bias, Attrition/mortality, Interactive effects
External Validity | Results can be generalised to the wider population or other settings. | Population validity (sample representativeness), Ecological validity (artificial settings), Interaction effects with testing
Methods to control extraneous variables: Randomisation (most powerful), Matching, Statistical control,
Design control.
MODULE V: TYPES OF DATA
5.1 Primary vs Secondary Data
Type | Definition
Primary Data | Collected by the individual researcher/team specifically for their own research use. Original, first-hand data. More expensive and time-consuming to collect.
Secondary Data | Available through regular/periodic data collection by other sources. Not topical or research-specific. Cheaper and faster to access.
5.2 (b) Advantages and Disadvantages of Secondary & Primary Data
Advantages of Secondary Data
• Resource Advantage: Saves time, energy and cost of data collection
• Accessibility: Structured and compiled data is easier to use for research purposes
• Accuracy and Stability: Collected by experienced organisations — usually accurate and comply
with validity/reliability checks
• Assessment: Supports primary data; allows comparison over two time points; helps estimate
applicability to larger population
Disadvantages of Secondary Data
• Applicability: Time lag between data collection and use; different units of measurement may
cause issues
• Accuracy: Data quality checks vital — discrepancies, reporting errors, systematic bias, omission
and displacement issues may exist
Advantages of Primary Data
• Very specific to the research problem and therefore directly useful
• Quality and accuracy of data is not in doubt — collected by the researcher
• May lead to discovery of additional data and information during collection
Disadvantages of Primary Data
• Numerous decisions required — how, when, what and why to collect
• Cost of collection is very high
• Time-consuming process
5.3 (c) Types of Secondary Data — Internal & External
Type | Sub-types & Examples
Internal Secondary Data | Data generated within the organisation: company records; employee records; sales data; financial records; other internal publications and reports.
External Secondary Data | Data generated outside the organisation: (1) Published data from Government/Non-Government sources (e.g., census reports, RBI reports, industry publications); (2) Electronic databases (online databases, internet sources); (3) Syndicate data from agencies that collect organisation/product-category-specific data for sale to multiple clients.
5.4 Evaluation of Secondary Data (4 Checks)
FOUR CHECKS WHEN EVALUATING SECONDARY DATA
1. METHODOLOGY CHECK: How was the data collected?
• Sampling method used • Research methodology employed
• Analytical tools used • Interpretation and reporting approach
2. ACCURACY CHECK: How reliable is the source?
• Source of data — how reliable? • Any reporting issues?
• Is there any misrepresentation of data?
3. TOPICAL CHECK: Is the data current enough?
• What is the time period of the data?
• What is the time lag between the research study and the data collected?
4. COST-BENEFIT ANALYSIS: Is it worth using?
• Measure the benefits of using the data minus the costs associated with
procuring it from organisations that sell data for research purposes.
5.5 (d) Types of Primary Data — Qualitative & Quantitative
Dimension | Qualitative Research | Quantitative Research
Objective | Explore, describe or understand phenomena | Quantify, generalise, predict phenomena
Design | Exploratory/descriptive; loosely structured | Structured; measurable variables; tests hypotheses
Sampling | Small sample; flexible lengthy procedure | Large representative samples; measurable error
Data Collection | In-depth; interactive; verbal and non-verbal | Formatted; structured; stimulus-response type
Data Analysis | Textual; usually non-statistical | Various levels of statistical testing
Deliverables | Explain findings, understand reasons | Conclusive; clear indications for action
5.6 (e) Qualitative Data Collection Methods
i. Observation Method
Involves viewing and recording individuals, groups, organisations or events in a scientific manner.
• Standardised and structured vs Non-standardised and unstructured
• By respondent consciousness: Disguised vs Undisguised observation
• By setting: Natural environment vs Simulated environment
• By observer: Human vs Mechanical (store scanners, cameras, psychogalvanometers,
oculometers, pupilometers, voice pitch meters, people meters)
• By participation: Participating vs Non-participating
ii. Depth Interview (Personal Interview Method)
One-to-one interaction between interviewer and interviewee. Used for problem definition, exploratory
research, and primary data collection.
Format | Description
Unstructured | No defined guidelines; begins casually; very high subjectivity; difficult to generalise
Semi-structured | Broad areas defined; questions/sequence left to interviewer; probing is critical
Structured | Highest reliability and validity; prescribed sequence; can be primary data collection instrument
Interview categories: At-home interviews, Mall-intercept interviews (20-30 min, cost-effective), CAPI
(Computer-Assisted Personal Interviewing), Telephone interviews, CATI (Computer-Assisted Telephone
Interviewing).
iii. Delphi Technique
An iterative process for gathering expert opinion. Experts respond to questionnaires in multiple rounds; a
summary is fed back after each round until consensus is reached. Used for forecasting and complex
decision-making where direct interaction is not feasible.
iv. Focus Group Discussion (FGD)
Collects information from a representative group in a neutral setting, guided by a moderator. Essentially a
sociological technique — group dynamics influence individual responses.
• Size: 8-12 members (ideal)
• Nature: Homogenous in demographic/psychographic traits and product knowledge
• Acquaintance: Members should be strangers to each other
• Setting: Neutral, informal and comfortable environment
• Moderator Key Skills: Listening, observation, flexibility, empathy with objectivity, summary and
closure
Types: Two-way, Dual moderator, Fencing moderator, Friendship group, Mini-groups, Creativity groups,
Brand-obsessive groups, Online focus groups.
Advantages | Disadvantages
Idea generation and synergy | Group dynamics can create conformity/pressure
Faster process advantage | Less scientifically rigorous
Good reliability & validity | Statistical analysis is difficult
Rich qualitative data | Moderator/investigator bias possible
v. Projective Techniques
Involve indirect questioning. Respondents are given ambiguous stimuli and project their underlying
needs, emotions, beliefs and attitudes onto the object.
Technique | Description
Association Techniques | Present a stimulus; respondent responds with first thing that comes to mind. Ex: Word Association, Rorschach Inkblot Test.
Completion Techniques | Incomplete sentence/object to be completed. Ex: 'Old age is ___'
Construction Techniques | Respondent creates a story, picture, or dialogue. Ex: Story Construction, Cartoon Test.
Choice/Ordering Techniques | Respondents sort pictures or statements into categories.
Expressive Techniques | Respondent expresses feelings of a protagonist, not their own. Ex: Psychodrama, Role Playing, Object Personification.
5.7 (f) Quantitative Data Collection — Survey Method
Surveys use structured questionnaires to collect data from large samples. Key methods: personal
schedules, mail, fax, email, and web-based surveys. Computer-assisted methods (CAPI, CATI) allow
complex skip patterns and question randomisation for eliminating order bias.
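Question randomisation in computer-assisted surveys can be pictured as shuffling the question order separately for each respondent, so that no single question always benefits from appearing first. The sketch below is an assumption about how such a tool might work, not a description of any actual CAPI/CATI package; the function name, question labels and respondent IDs are hypothetical.

```python
import random


def randomised_order(questions, respondent_id):
    """Return a shuffled copy of the questions for one respondent."""
    # Seeding with the respondent ID makes each order reproducible,
    # which helps when the answers are later matched back to questions.
    rng = random.Random(respondent_id)
    order = list(questions)  # copy, so the master list is untouched
    rng.shuffle(order)
    return order


questions = ["Q1: price", "Q2: quality", "Q3: service", "Q4: location"]
for respondent_id in (101, 102, 103):
    print(respondent_id, randomised_order(questions, respondent_id))
```

Every respondent still answers the same set of questions; only the sequence differs, so order effects tend to average out across the sample.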
MODULE VI: MEASUREMENT AND SCALING
6.1 Meanings
• Measurement: Assigning numbers or symbols to the characteristics of certain objects according
to rules that provide accurate descriptions.
• Scaling: An extension of measurement; involves creating a continuum on which measurements
on objects are located.
6.2 (b) Properties of Scales — Four Types of Measurement Scales
FOUR LEVELS — LOWEST TO HIGHEST (Remember: NOIR)
1. NOMINAL — Classification/labelling only. No order, no distance.
2. ORDINAL — Classification + Order/Ranking. No equal intervals.
3. INTERVAL — Classification + Order + Equal Intervals. No true zero.
4. RATIO — Classification + Order + Equal Intervals + True Zero (most powerful).
1. Nominal Scale
• Lowest level of measurement — numbers assigned purely for IDENTIFICATION
• Higher number does NOT imply superiority over lower number
• Example: Gender (Male=1, Female=2); Marital status (Married=1, Single=2)
• Mathematical operations: Only counting (frequency distribution)
• Statistical measures: Mode only; Chi-square test
2. Ordinal Scale
• Tells whether an object has MORE or LESS of a characteristic than another
• Example: Ranking restaurant attributes from 1 (most important) to 5 (least important)
• Assigned ranks CANNOT be added, subtracted, multiplied or divided
• Statistical measures: Median, percentiles, quartiles; Rank-order correlation, sign test
3. Interval Scale
• Differences between scores have MEANINGFUL interpretation — equal intervals
• NO true zero point — zero is arbitrary (e.g., 0°C does not mean 'no temperature')
• Mathematical form: Y = a + bX where a ≠ 0
• Example: Likert scale (Strongly Disagree=1 to Strongly Agree=5); Temperature in
Celsius/Fahrenheit
• All arithmetic operations can be performed on the intervals
• Statistical measures: Arithmetic mean, standard deviation, correlation coefficient, t-test, Z-test,
regression, factor analysis
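The form Y = a + bX with a ≠ 0 explains why differences are meaningful on an interval scale but ratios are not. The sketch below uses the Celsius to Fahrenheit conversion (a = 32, b = 9/5) as the example; the variable names are illustrative only.

```python
def celsius_to_fahrenheit(c):
    """Linear rescaling Y = a + bX with a = 32 and b = 9/5."""
    return c * 9 / 5 + 32


c_low, c_mid, c_high = 10.0, 20.0, 30.0
f_low = celsius_to_fahrenheit(c_low)    # 50.0
f_mid = celsius_to_fahrenheit(c_mid)    # 68.0
f_high = celsius_to_fahrenheit(c_high)  # 86.0

# Ratios of scores change when the scale is re-expressed,
# so "20 degrees is twice as hot as 10 degrees" has no fixed meaning.
ratio_celsius = c_mid / c_low
ratio_fahrenheit = f_mid / f_low

# Ratios of differences stay the same under Y = a + bX.
gap_ratio_celsius = (c_high - c_mid) / (c_mid - c_low)
gap_ratio_fahrenheit = (f_high - f_mid) / (f_mid - f_low)

print(ratio_celsius, round(ratio_fahrenheit, 2))
print(gap_ratio_celsius, gap_ratio_fahrenheit)
```

The score ratio is 2 in Celsius but 68/50 = 1.36 in Fahrenheit, while the ratio of the two equal gaps is 1 on both scales.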
4. Ratio Scale
• Highest level — has a TRUE ZERO point (zero = complete absence of the attribute)
• The RATIO of measurements has meaningful interpretation (e.g., 20kg is twice 10kg)
• Example: Number of shops, weight, height, income, age, temperature in Kelvin
• ALL mathematical and statistical operations can be performed
Comparison Table of All Four Scales
Property | Nominal | Ordinal | Interval | Ratio
Labeled/Categories | Yes | Yes | Yes | Yes
Meaningful Order | No | Yes | Yes | Yes
Measurable Difference | No | No | Yes | Yes
True Zero Point | No | No | No | Yes
Statistics Available | Mode, frequencies | Median, percentiles | Mean, SD, t-test | All statistics
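The "Statistics Available" row can be read as a rule for picking a measure of central tendency from the scale type. The sketch below is one hedged way of writing that rule down; the function name and all data values are hypothetical.

```python
from statistics import mean, median_low, mode


def central_tendency(values, scale):
    """Pick the measure of central tendency appropriate to the scale."""
    if scale == "nominal":
        return mode(values)        # only counting is meaningful
    if scale == "ordinal":
        return median_low(values)  # order is meaningful, distances are not
    if scale in ("interval", "ratio"):
        return mean(values)        # equal intervals allow averaging
    raise ValueError(f"unknown scale: {scale}")


gender_codes = [1, 2, 2, 1, 2]      # nominal: Male=1, Female=2
importance_ranks = [1, 2, 2, 3, 5]  # ordinal: 1 = most important
monthly_income = [10, 20, 30]       # ratio: hypothetical units

print(central_tendency(gender_codes, "nominal"))      # 2
print(central_tendency(importance_ranks, "ordinal"))  # 2
print(central_tendency(monthly_income, "ratio"))      # 20
```

`median_low` is used for the ordinal case so that the result is always one of the observed ranks rather than an average of two ranks.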
Common Variable Examples
Variable | Scale Type | Reason
Age, height, weight, income, share price | Ratio | Has true zero; ratios meaningful
Temperature in Fahrenheit/Celsius | Interval | Equal intervals; no true zero
Gender, country of birth, zip code | Nominal | Classification only
Shirt size, exam grade, perceived speed | Ordinal | Ordered but unequal intervals
6.3 Classification of Scales
Single vs Multiple Item Scales
Type | Description | Example
Single Item Scale | One item to measure a construct | Do you have a television? Yes/No
Multiple Item Scale | Many items together form the construct being measured | Which brand of TV do you own? Sony/Samsung/LG/Others
6.4 (c) Scaling Techniques
A. COMPARATIVE SCALING TECHNIQUES
In comparative scales, respondents use a standard frame of reference and compare one object with
another.
1. Paired Comparison
Respondent is presented with TWO objects at a time and selects one. All possible pairs are presented.
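Process: Create a proportion matrix in which each cell gives the proportion of respondents preferring
the column brand over the row brand → Assign 1 where the column brand is preferred (proportion > 0.5),
otherwise 0 → Sum columns → Rank by totals to get ordinal order. With n objects there are n(n-1)/2 pairs.
Example: If the proportion preferring A over B = 0.60 (> 0.5), assign 1 in the B-row, A-column cell, so
that it counts towards A's column total.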
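The paired-comparison process can be traced step by step in a few lines. The sketch below assumes the convention that each cell holds the proportion of respondents preferring the column brand over the row brand; the three brands and all proportions are hypothetical.

```python
from itertools import combinations

brands = ["A", "B", "C"]

# prefer[(row, col)] = proportion preferring the COLUMN brand over the ROW brand.
# Hypothetical survey results: A beats B (0.60), A beats C (0.70), B beats C (0.55).
prefer = {
    ("B", "A"): 0.60, ("A", "B"): 0.40,
    ("C", "A"): 0.70, ("A", "C"): 0.30,
    ("C", "B"): 0.55, ("B", "C"): 0.45,
}

# Number of pairs shown to each respondent: n(n-1)/2.
n_pairs = len(list(combinations(brands, 2)))

# Assign 1 where the column brand is preferred (proportion > 0.5), then sum columns.
column_totals = {
    col: sum(1 for row in brands if row != col and prefer[(row, col)] > 0.5)
    for col in brands
}

# Rank brands by column total, highest first, to get the ordinal order.
ranking = sorted(brands, key=lambda b: column_totals[b], reverse=True)

print(n_pairs)        # 3
print(column_totals)  # {'A': 2, 'B': 1, 'C': 0}
print(ranking)        # ['A', 'B', 'C']
```

The result is only ordinal: it says A is preferred to B and B to C, not by how much.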
2. Rank Order Scaling
Respondents presented with several objects simultaneously and asked to rank them. No two objects can
have the same rank. Example: Rank 6 food joints from 1 (most preferred) to 6 (least preferred).
3. Constant Sum Rating Scale
Respondents allocate a TOTAL of 100 points among various objects/brands. More points = more
preferred. All points must sum to 100. Example: Distribute 100 points among 6 restaurants based on
preference.
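A constant sum response is only usable if the points add up to exactly 100, so a simple check before analysis is worthwhile. The sketch below is illustrative; the function name, restaurant labels and point allocation are hypothetical.

```python
def rank_constant_sum(allocation, total=100):
    """Validate a constant sum response and return objects by preference."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    if sum(allocation.values()) != total:
        raise ValueError(f"points must sum to {total}")
    # More points = more preferred.
    return sorted(allocation, key=allocation.get, reverse=True)


# One respondent's hypothetical allocation across 6 restaurants.
response = {"R1": 30, "R2": 25, "R3": 20, "R4": 15, "R5": 10, "R6": 0}
order = rank_constant_sum(response)
print(order)  # ['R1', 'R2', 'R3', 'R4', 'R5', 'R6']
```

Unlike rank order scaling, the points also show the size of the gaps: here R1 receives three times the points of R5.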
B. NON-COMPARATIVE SCALING TECHNIQUES
Respondents evaluate only one object at a time; no frame of reference is used.
1. Graphic Rating Scale (Continuous)
Respondent places a tick on a continuous graphical line from 'Least Preferred' (1) to 'Most Preferred' (7).
Can also use smiley face scales. Score = measured distance from the left end.
2. Itemized Rating Scales
Respondents provided with a scale having brief descriptions for each response category. Design
decisions include: number of categories (typically 5 or 7), odd/even number, balanced vs unbalanced,
forced vs non-forced, physical form.
3. Likert Scale
• Also called a summated scale — individual item scores can be summed for a total score
• Respondents indicate degree of agreement/disagreement: Strongly Disagree(1) to Strongly
Agree(5)
• Assumption: All items measure some aspect of a single common underlying factor
• Typically 25-30 items in a research study
• Example: 'The company makes quality products' — rate from Strongly Disagree to Strongly Agree
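Because the Likert scale is a summated scale, one respondent's total score is simply the sum of the item scores. The sketch below assumes a 5-point coding in which the three middle labels (Disagree, Neutral, Agree) are filled in for illustration; only the two end labels come from these notes, and the responses are hypothetical.

```python
# 5-point coding; the middle labels are an assumption for illustration.
LIKERT_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def summated_score(responses):
    """Sum the coded item scores for one respondent."""
    return sum(LIKERT_CODES[answer] for answer in responses)


# One respondent's hypothetical answers to four statements about a company.
answers = ["Agree", "Strongly Agree", "Neutral", "Agree"]
total = summated_score(answers)
print(total)  # 16
```

With four items the total can range from 4 to 20. Summing is only sensible under the assumption stated above, that all items measure the same underlying factor.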
4. Semantic Differential Scale
• Widely used to COMPARE IMAGES of competing brands, companies or services
• Respondent rates each attribute on a 5 or 7-point scale bounded by BIPOLAR
ADJECTIVES/PHRASES
• Example: Makes quality products [_ _ _ _ _ _ _] Does not make quality products
• KEY DIFFERENCE from Likert: Uses bipolar adjectives instead of degree of agreement
statements
• Results can be plotted as a 'snake diagram' to visually compare two companies
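The points joined up in a snake diagram are the mean ratings of each company on each attribute. The sketch below computes those means for two companies on a 7-point scale; the company names, the second attribute and all ratings are hypothetical.

```python
from statistics import mean

# ratings[company][attribute] = scores from three respondents (7-point scale).
ratings = {
    "Company X": {
        "Makes quality products": [6, 7, 5],
        "Good service": [4, 5, 6],
    },
    "Company Y": {
        "Makes quality products": [3, 4, 2],
        "Good service": [5, 5, 5],
    },
}

# Mean rating per attribute gives each company's profile.
profile = {
    company: {attr: mean(scores) for attr, scores in attrs.items()}
    for company, attrs in ratings.items()
}

for company, means in profile.items():
    print(company, means)
```

Plotting each company's means attribute by attribute and joining them gives one line per company; here the two lines would be far apart on product quality (6 vs 3) and would meet on service (5 vs 5).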
5. Stapel Scale
• Measures DIRECTION and INTENSITY of an attitude simultaneously
• Single adjective/phrase is placed in the centre
• Scale ranges from +5 (most positive) above to -5 (most negative) below
• Example: Rate 'Quality of Food' from +5 to -5 for a restaurant
• No bipolar adjectives needed — only one descriptor per attribute
MODULE VII: QUESTIONNAIRE DESIGN
7.1 Meaning of Questionnaire
A questionnaire is a data collection instrument with a pre-designed set of questions following a particular
structure. It can collect information from a large sample in a short time period.
When NOT to use a Questionnaire: At the exploratory stage when still identifying information areas;
when the number of respondents is small and data required is mostly subjective/open-ended.
7.2 Types of Questionnaires
Based on two parameters: (1) Degree of Structure/Formalization and (2) Degree of Concealment.
Type | Description & Use
Formalized & Unconcealed | Most frequently used. Structured questions AND respondent knows the purpose. Ex: Investment behaviour survey with fixed response options.
Formalized & Concealed | Structured questions but purpose is hidden. Used to reveal latent causes of behaviour. Can be quantified statistically.
Non-formalized & Unconcealed | Unstructured open-ended questions; respondent knows purpose. High validity but difficult to quantify beyond frequencies.
Non-formalized & Concealed | Unstructured + hidden purpose. Used for projective techniques. Reveals subconscious motivations. High skill required for interpretation.
7.3 Steps in Designing a Questionnaire (10 Steps)
Step 1: Determine What Information is Needed (Convert Research Objectives into Information
Areas)
• Identify specific research questions the study will address
• Convert to objectives; clearly define all variables under study
• Formulate hypotheses; specify the information needed for the study
Step 2: Method of Administration (Type of Questionnaire)
• Personal schedule (face-to-face) — best for complex/sensitive issues; highest sampling control
• Self-administered — mail, fax, email, or web-based; lowest sampling control for web
• Computer-assisted allows complex skip patterns (branching) and question randomisation
Step 3: Content of the Questionnaire
• Include only questions that contribute to answering the research problem
• Include neutral questions at the beginning to build rapport
• Use disguised questions to hide purpose or sponsorship if needed
• Determine whether single or multiple questions are needed for each construct
Step 4: Motivating the Respondent to Answer
• Design questions to involve and motivate respondents
• Use filter/qualifying questions to weed out uninformed respondents
• Provide response categories (e.g., semantic differential adjectives) to ease response
Step 5: Determining the Type of Questions
Open-Ended Questions:
• Respondent answers in own words; researcher suggests no alternatives
• Used at beginning (warm-up), as probing questions, or at the end for suggestions
• Advantage: High validity; respondents free to express any views
• Disadvantage: Expensive to code; dependent on articulation; prone to misinterpretation in self-
administered surveys
Closed-Ended Questions:
• (i) Dichotomous: Only two alternatives (Yes/No). Easiest to code. Nominal level. Problem: forced
choice.
• (ii) Multiple Choice: Multiple response alternatives (checklist). Reduces researcher bias; fast to
administer. Difficult to design exhaustively.
• (iii) Scaled Questions: Attitudinal scales such as Likert, semantic differential, Stapel. Easy to
administer and code.
Step 6: Wording of the Questions (Criteria of Questionnaire Design)
Criterion | Explanation & Example
Avoid Leading Questions | Don't hint at desired answer. Bad: 'Don't you agree X is good?' Good: 'What is your opinion about X?'
Avoid Loaded Questions | Don't ask sensitive questions directly. Bad: 'Have you ever cheated?' Good: 'Do most people cheat?'
Avoid Double-Barrelled Questions | Don't ask two things in one question. Bad: 'Do Nokia AND Samsung have wide variety?' Good: Ask separately for each brand.
Use Clear, Simple Language | Avoid jargon; questions must be self-explanatory and have only one interpretation.
Avoid Ambiguity | Each question must have exactly one clear interpretation by all respondents.
Step 7: Sequence & Layout of Questions
1. Instructions: Greet respondent; introduce researcher; explain purpose of the questionnaire
2. Opening questions: Non-threatening; get respondent into the right frame of mind
3. Study questions: Main questions using funnel approach (general to specific questions)
4. Classification information: Socio-economic and demographic data (name, address, income, etc.)
5. Acknowledgement: Thank respondent for cooperation
Funnel approach: Like a funnel — initial questions are broad; as you proceed, questions become more
specific and restrictive.
Branching questions: Separate sets of questions for different possible answers to a filter question. Must
cover all possibilities.
Step 8: Physical Characteristics of the Questionnaire
• Good quality paper; booklet format if questions are many
• Uniform font style and spacing; each question and its options on the same page
• Don't crowd questions; maintain adequate line spacing
• Include response instructions per question where needed
Step 9: Pre-Test (Pilot Testing) of the Questionnaire
• Administer instrument on a small group from the study population
• Record all experiences including time taken to complete
• If a question gets no answers — rephrase it
• Always done face-to-face to observe verbal and non-verbal responses
• Can also be vetted by academic or industry experts
Step 10: Revise and Prepare Final Questionnaire
• Critically evaluate the rough draft for comprehensibility, lucidness, and organisation
• Incorporate feedback from pilot test — rephrase unclear questions, remove irrelevant ones
• Ensure all information needs are adequately addressed by the revised questionnaire
• Finalise the questionnaire and administer according to the sampling plan
MODULE VIII: SAMPLING
8.1 (a) Meaning of Sampling
Sampling is a tool which enables us to draw conclusions about the characteristics of the population after
studying only a small part of it. It is the process of selecting a sample or proportion of elements from a
population using a specific method.
Key Terms:
Term | Definition
Population/Universe | The aggregate of all units; the complete group about which the researcher wants to make inferences.
Sample | A finite subset of the population, selected to investigate its properties.
Sampling Frame | A convenient, complete and up-to-date list of all units in the population from which the sample is drawn.
Parameter | A summary measure (mean, SD, proportion) calculated from population values. Denoted by Greek or capital letters (μ, σ, P).
Statistic | A summary measure calculated from sample values. Denoted by English letters (x̄, s, p).
Sampling Distribution | All possible values of a statistic and their respective probabilities for a given sample size.
Census | Examination of each and every element of the population.
8.1 Advantages of Sampling over Census
ADVANTAGES OF SAMPLING
1. SAVES TIME AND COST: Reduces cost in monetary terms and staffing; reduces time to
collect and process data.
2. FEASIBILITY: Essential when testing is destructive (e.g., testing life span of light bulbs —
cannot test all bulbs).
3. FASTER DECISIONS: Decision-makers may not have time to wait for complete
enumeration; a sample can be done quickly.
4. MORE RELIABLE RESULTS: By studying a sample, fatigue is reduced and fewer errors
occur during data collection,
especially when a large number of elements are involved.
5. GREATER ACCURACY: Smaller scale of operation allows for more careful attention to
each unit.
8.1 Disadvantages of Sampling
DISADVANTAGES OF SAMPLING
1. UNRELIABLE SUB-POPULATION DATA: Data on sub-populations (e.g., a particular
ethnic group) may be too unreliable to be useful.
2. GEOGRAPHIC LIMITATIONS: Data for small geographical areas may also be too
unreliable; detailed cross-tabulation may not be practical.
3. SAMPLING ERROR: Estimates are subject to sampling error which arises as estimates
are calculated from a part (sample) of the population.
4. COMMUNICATION DIFFICULTY: May have difficulty communicating the accuracy of the
estimates to users who are not statistically trained.
When to use Census vs Sampling:
• Census appropriate: Small population OR lot of heterogeneity in variables of interest (e.g., top
management of a bank)
• Sampling appropriate: Large population or population difficult to access; field of investigation is
large
8.2 (b) Sampling Methods / Techniques
TWO BROAD CATEGORIES
PROBABILITY SAMPLING: Every element has a KNOWN, NON-ZERO chance of selection.
Used in CONCLUSIVE research.
→ Simple Random (SRSWR/SRSWOR), Systematic, Stratified Random, Cluster, Multi-
stage
NON-PROBABILITY SAMPLING: Elements do NOT have a known chance of selection. Used
in EXPLORATORY research.
→ Convenience, Judgmental/Purposive, Quota, Snowball
PROBABILITY SAMPLING METHODS
1. Simple Random Sampling (SRS)
Every element has an EQUAL and KNOWN probability of being selected. Two types:
• SRSWR (With Replacement): Chosen slip is PUT BACK. Probability of selecting any given unit = 1/N
at every draw. Same element can be selected more than once.
• SRSWOR (Without Replacement): Chosen slip is NOT put back. Probability of selecting any given
remaining unit = 1/N at the 1st draw; 1/(N-1) at the 2nd draw; 1/(N-(n-1)) at the nth draw.
Selection methods: Lottery method (physical draw), Random number tables, Computer-generated
random numbers.
Limitation: May not give a representative sample for large heterogeneous populations (e.g., if population
is 10,000 with 5,000 low-income, 3,500 middle, 1,500 high — SRS may select only low-income
households).
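The two SRS variants can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical population of units labelled 1 to 100; the helper names srswr and srswor are made up for this example and do not come from the notes.

```python
import random

def srswr(population, n, seed=None):
    """Simple random sample WITH replacement: a unit may be drawn again."""
    rng = random.Random(seed)
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, seed=None):
    """Simple random sample WITHOUT replacement: every drawn unit is distinct."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = list(range(1, 101))          # hypothetical N = 100 units
with_repl = srswr(population, 10, seed=1)
without_repl = srswor(population, 10, seed=1)
print(with_repl)
print(without_repl)
```

The seed argument only makes the draw reproducible; it plays the role of fixing the page of a random number table.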
2. Systematic Sampling
Used when a complete ordered list of the population is available. Also called mixed sampling — only the
first element is randomly selected; the rest are systematically selected.
• Step 1: Calculate sampling interval K = N/n (N = population size, n = sample size). Round to
nearest integer.
• Step 2: Select a random start C from 1 to K.
• Step 3: Select elements at positions C, C+K, C+2K, C+3K, ... until sample of size n is obtained.
Example: N=100 shops, n=5. K=100/5=20. If C=8 randomly, select shops: 8, 28, 48, 68, 88.
Advantage: Very popular — requires only ONE random number. Ensures spread across population.
Takes care of SRS limitation.
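The three steps can be sketched in Python as below. This is an illustrative helper (the name systematic_sample is an assumption, not part of the notes) that reproduces the shop example when the random start is fixed at C = 8.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the 1-based positions selected by systematic sampling."""
    k = round(N / n)                      # Step 1: sampling interval K = N/n
    if start is None:
        # Step 2: random start C between 1 and K (inclusive)
        start = random.Random(seed).randint(1, k)
    # Step 3: positions C, C+K, C+2K, ... until n elements are selected
    # (if N/n is not a whole number, late positions can exceed N and need checking)
    return [start + i * k for i in range(n)]

shops = systematic_sample(N=100, n=5, start=8)
print(shops)  # [8, 28, 48, 68, 88]
```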
3. Stratified Random Sampling
Most appropriate when population is HETEROGENEOUS. Divide population into strata such that: (i)
units WITHIN each stratum are HOMOGENEOUS; (ii) units BETWEEN strata are HETEROGENEOUS;
(iii) strata do not overlap.
After strata formation, a simple random sample is drawn from each stratum. Regarded as the MOST
EFFICIENT sampling system; ensures greater accuracy.
• PROPORTIONATE ALLOCATION: n1 = n × (N1/N). Sample from each stratum proportional to
stratum size in population.
• DISPROPORTIONATE ALLOCATION: More important/variable strata get relatively larger
samples. Ex: Bank oversamples big account holders (45 from stratum 1, 40 from stratum 2, 15 from
stratum 3 out of 10,000 customers).
Stratification usually done on demographic variables: age, income, education, gender.
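Proportionate allocation can be sketched as below, reusing the income strata from the SRS limitation example (5,000 low, 3,500 middle, 1,500 high). The total sample size n = 200 and the function name are assumptions made only for illustration.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate n across strata using n_i = n * (N_i / N)."""
    N = sum(stratum_sizes.values())
    # Rounding each share can make the total differ slightly from n
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

strata = {"low": 5000, "middle": 3500, "high": 1500}
allocation = proportionate_allocation(strata, n=200)   # n = 200 is hypothetical
print(allocation)  # {'low': 100, 'middle': 70, 'high': 30}
```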
4. Cluster Sampling
Population divided into CLUSTERS (naturally occurring groups). Key features:
• Within each cluster: units are HETEROGENEOUS (opposite of stratified)
• Between clusters: units are HOMOGENEOUS
• Each element belongs to one and only one cluster
Random sample of 2-3 clusters chosen; ALL elements within selected clusters are studied. Useful when
population is widely geographically dispersed and SRS is impractical. Cost-effective for large dispersed
populations.
Example: Estimate money spent on entertainment — divide city into blocks (clusters), randomly select 2-
3 blocks, enumerate all households within selected blocks.
5. Multi-Stage Sampling
Sampling in multiple stages using different methods at each stage. Ex: Stage 1 = select states randomly;
Stage 2 = select districts within chosen states; Stage 3 = select households within chosen districts.
NON-PROBABILITY SAMPLING METHODS
i. Convenience Sampling
Also called accidental sampling. Units selected based on ease of availability — whoever happens to be
available at the survey spot. Ex: neighbours, friends, family members, colleagues, passers-by.
• Used to obtain information quickly and inexpensively
• Used in PILOT STUDIES and pre-testing of questionnaires
• Used in EXPLORATORY RESEARCH only — NOT suitable for conclusive research
• Sampling error cannot be estimated — results not generalisable
ii. Judgmental / Purposive Sampling
Experts choose what they believe to be the best sample. Choice of units depends exclusively on the
investigator's judgment. Success depends on excellence in judgment and knowledge about the
population.
• Most suitable when a small sample of a few specialised units is needed
• Most common application: Business-to-Business marketing research
• Disadvantage: Results affected by personal prejudices and bias of investigator; results not
comparable to other samples
Example: Studying performance of sales staff — VP Sales identifies best representatives of each
performance grade category.
iii. Quota Sampling
Similar to stratified sampling but non-random. Population divided into subgroups; a quota is set for each
subgroup. Interviewers fill quotas by convenience. The selection within quotas is left to the interviewer's
discretion — introducing potential bias.
iv. Snowball Sampling
Used when it is difficult to identify members of the desired population. Each respondent after being
interviewed acts as a reference to other respondents having the same characteristics.
• Used when population is unknown and rare (e.g., deep-sea divers, families with triplets, walking
stick users, doctors specialising in a rare ailment)
• Popular business study method for rare/hidden populations
• Problem: Difficult to make the initial contact; may be difficult to get a representative sample
Example: Studying satisfaction levels of members of an elite country club — members refer other
members.
8.3 (c) Sample Size Calculation
The size of a sample depends upon: characteristics of the population, type of information required, cost
involved.
Key factors determining sample size:
• (a) Variability of population: Higher variability (larger σ) → larger sample needed
• (b) Confidence attached to estimate: Higher confidence (higher Z) → larger sample
• (c) Allowable error/margin of error: Smaller acceptable error (smaller e) → larger sample
Important: The size of the POPULATION does not influence the size of the sample (for large
populations).
SAMPLE SIZE FORMULAS
FOR ESTIMATING POPULATION MEAN:
n = Z²σ² / e²
Where: Z = standard normal value (1.96 for 95% confidence; 2.576 for 99%)
σ = population standard deviation
e = acceptable margin of error, i.e. the largest tolerated |x̄ - μ|
FOR ESTIMATING POPULATION PROPORTION (p known):
n = Z²pq / e²
Where: p = population proportion; q = 1 - p; e = margin of error
FOR ESTIMATING POPULATION PROPORTION (p unknown):
Use maximum value of pq = 1/4 (when p = q = 1/2)
n = Z² / 4e²
Interpretation: Sample size rises with the SQUARE of variability (σ) and of the confidence
coefficient (Z), and falls with the SQUARE of the allowable error (e). Halving e quadruples n.
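A short Python sketch of the formulas above. The input values (σ = 10, e = 2, e = 0.05) are hypothetical, and rounding up to the next whole respondent is a common convention assumed here rather than something stated in the notes.

```python
import math

def n_for_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / e^2, rounded up."""
    return math.ceil((z * sigma / e) ** 2)

def n_for_proportion(z, e, p=0.5):
    """n = Z^2 * p * q / e^2; p = 0.5 gives the 'p unknown' case Z^2 / (4 e^2)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

print(n_for_mean(z=1.96, sigma=10, e=2))     # 97
print(n_for_proportion(z=1.96, e=0.05))      # 385
```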
MODULE IX: DATA PROCESSING AND ANALYSIS
9.1 (a) Data Processing: Meaning
Data processing refers to the systematic sequence of operations performed on collected data to convert
it into a meaningful, analysable form. It involves organising and preparing the raw data for statistical
analysis.
9.2 (b) Steps in Processing Data
5-STEP DATA PREPARATION PROCESS
1. DATA EDITING → 2. DATA CODING → 3. DATA CLASSIFICATION → 4. DATA
TABULATION → 5. GRAPHICAL PRESENTATION
Step 1: Data Editing
Purpose: Ensure data is complete, accurate, legible, and consistently formatted.
• Field Editing: Done by field investigators at end of every field day. Checks for inconsistencies,
non-response, illegible responses, incomplete questionnaires.
• Centralized In-House Editing: Done at the researcher's end. Methods: Backtracking (returning
to respondents for missing data), Allocating missing values (mean substitution, etc.), Plug value
(assigning a predetermined value), Discarding unsatisfactory responses.
Conditions leading to editing: Qualifying/skip logic was overlooked; respondent used same response for
all questions; incomplete form; form filled by non-representative person; forms not proportional to
sampling plan.
Step 2: Data Coding
The process of identifying and denoting a numeral to the responses given by the respondent.
• Pre-Coding (Closed-ended): Dichotomous: Yes=1, No=0. Ranking: numbers 1 to n. Checklists:
separate column per item (Yes=1, No=0). Scaled questions: SA=5, A=4, N=3, D=2, SD=1.
• Post-Coding (Open-ended): After data collection, responses are read and categories created.
Each category treated as a separate variable with Yes/No coding.
Code Book must be: Appropriate to the research objective, Comprehensive, Mutually exclusive, Single
variable entry.
Key terms: Field (space reserved for one variable), Record (all data from one respondent), File
(collection of all records), Data Matrix (spreadsheet format).
Step 3: Data Classification
• Classification by attributes: Mostly categorical (e.g., gender, occupation, marital status)
• Classification by class intervals: Exclusive (10-20, 20-30) or Inclusive (10-19, 20-29)
Step 4: Data Tabulation
Arrangement of data into an orderly layout of rows and columns so that it can be subjected to statistical analysis.
• Simple tabulation: One variable at a time (frequency distribution table showing count and
percentage)
• Cross-tabulation: Two or more variables simultaneously — reveals relationships between
variables
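Both kinds of tabulation can be sketched with Python's standard library. The five coded records below are hypothetical and exist only to show the mechanics.

```python
from collections import Counter

# Hypothetical coded records: (gender, owns_credit_card)
records = [("F", "Yes"), ("M", "No"), ("F", "Yes"), ("M", "Yes"), ("F", "No")]

# Simple tabulation: one variable at a time, with count and percentage
gender_counts = Counter(gender for gender, _ in records)
gender_pct = {g: 100 * c / len(records) for g, c in gender_counts.items()}

# Cross-tabulation: two variables at once, one cell per (gender, answer) pair
crosstab = Counter(records)

print(gender_counts)
print(gender_pct)
print(crosstab)
```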
Step 5: Graphical Presentation
Visual representation of data to reveal patterns, trends and distributions at a glance.
Tool | Description & When to Use
Frequency Distribution Table | Shows count and percentage for each category. First step in data exploration.
Pie Charts | Circular chart showing proportional share of each category. Best for nominal data with few categories.
Bar Charts | Vertical or horizontal bars representing frequency/count. Best for comparing categories.
Histograms | Bars for continuous data showing frequency distribution; often includes normal curve overlay. Shows shape of distribution.
Stem and Leaf Diagrams | Shows distribution of continuous data while retaining actual values. Useful for small datasets.
9.3 (c) Revision — Central Tendency, Dispersion & Regression
Measures of Central Tendency
Measure | Formula/Description | When to Use
Mean (Arithmetic Average) | Sum of all values / n. Sensitive to outliers. | Interval/ratio data with normal distribution.
Median | Middle value when data is arranged in order. Not affected by outliers. | Skewed data or ordinal scale.
Mode | Most frequently occurring value. | Nominal data; can be bimodal or multimodal.
Measures of Dispersion
Measure | Formula/Description
Range | Maximum value − Minimum value. Simple but sensitive to outliers.
Standard Deviation (S or σ) | S = √[Σ(xi - x̄)² / (n-1)]. Root-mean-square distance of the values from the mean. Most commonly used measure of dispersion.
Variance | S² = Σ(xi - x̄)² / (n-1). Square of standard deviation. Used in many statistical tests.
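These measures can be checked quickly with Python's statistics module. The seven data values are hypothetical; note that statistics.variance and statistics.stdev use the (n-1) divisor, matching the sample formulas above.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 9, 11]        # hypothetical observations

mean = st.mean(data)                  # sum of values / n
median = st.median(data)              # middle value of the ordered data
mode = st.mode(data)                  # most frequent value
data_range = max(data) - min(data)    # maximum minus minimum
variance = st.variance(data)          # sample variance, divisor n-1
sd = st.stdev(data)                   # sample standard deviation

print(mean, median, mode, data_range, variance, round(sd, 4))
```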
OLS Regression (Ordinary Least Squares)
Finds the best-fitting straight line through data points: Y = a + bX
• Y = Dependent variable (outcome being predicted)
• X = Independent variable (predictor)
• a = Y-intercept (value of Y when X = 0)
• b = Slope (change in Y for each one-unit change in X)
The line minimises the sum of squared residuals (differences between observed and predicted Y values).
Used to understand and predict the relationship between two continuous variables.
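A minimal sketch of fitting Y = a + bX by least squares. It uses the standard closed-form estimates b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄, which the notes do not state explicitly; the data are hypothetical and lie exactly on Y = 1 + 2X so the result is easy to check by hand.

```python
def ols_fit(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = y_bar - b * x_bar      # intercept: value of Y when X = 0
    return a, b

x = [1, 2, 3, 4, 5]            # hypothetical predictor values
y = [3, 5, 7, 9, 11]           # hypothetical outcomes, exactly 1 + 2x
a, b = ols_fit(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b)
```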
9.4 Statistical Software Packages
• MS Excel: basic statistical analysis and charts
• MINITAB: statistical quality control and analysis
• SAS (Statistical Analysis System): comprehensive statistical software
• SPSS (Statistical Package for the Social Sciences): most widely used in social science research
MODULE X: HYPOTHESIS TESTING
10.1 (a) Hypothesis: Meaning
A hypothesis is a specific, testable prediction about the relationship between two or more variables.
Hypothesis testing is a statistical process of either rejecting or retaining a claim or belief related to a
business context, product, service, or process. It is a scientific method to check whether a
claim/conclusion is true or false.
The objective is to either REJECT or RETAIN a null hypothesis. Hypothesis testing is an integral part of
predictive analytics techniques such as multiple linear regression and logistic regression.
10.2 (b) Types of Hypothesis
Type | Description | Language Used
Null Hypothesis (H₀) | States there is NO relationship or NO difference between variables. The equality sign is ALWAYS part of H₀. We START by assuming H₀ is true. | Uses words like 'no relationship', 'no difference', 'equal to'.
Alternative Hypothesis (H₁ or Hₐ) | Complement of H₀. States that a relationship EXISTS or a difference EXISTS. What the researcher typically believes. | Uses words like 'relationship exists', 'greater than', 'less than', 'different from'.
10.3 (c) Types of Errors
Actual State | Decision: Reject H₀ | Decision: Fail to Reject H₀
H₀ is TRUE (person is innocent) | TYPE I ERROR (α): False Positive. Rejecting H₀ when it is actually true. | CORRECT DECISION
H₀ is FALSE (person is guilty) | CORRECT DECISION (Power = 1-β) | TYPE II ERROR (β): False Negative. Failing to reject H₀ when it is actually false.
• Type I Error (α): Conditional probability of rejecting H₀ when it is true. Set by choosing
significance level. Also called 'False Positive'.
• Type II Error (β): Conditional probability of retaining H₀ when it is false. Also called 'False
Negative'.
• Power of Test (1-β): Probability of correctly rejecting H₀ when it is false. We want HIGH power.
10.4 (d) Steps Involved in Hypothesis Testing
1. Describe the hypothesis in words using a population parameter (mean, proportion, SD)
2. Define null and alternative hypotheses (H₀ and H₁) based on the claim
3. Identify the appropriate test statistic (based on sampling distribution: Z, t, F, χ²)
4. Decide criteria for rejection: choose significance level α (typically 0.01, 0.05, or 0.10). Level of
significance is the probability of Type I error.
5. Find the calculated value: compute the test statistic from the data using the formula
6. Find the tabulated value: get the critical value from the statistical table using α and degrees of
freedom
7. Make decision: If |calculated value| > tabulated (critical) value → REJECT H₀ (for a one-tailed test the
calculated value must also fall in the tail named by H₁). Otherwise → FAIL TO REJECT (retain) H₀.
10.5 (e) Types of Hypothesis Tests
One-Tailed vs Two-Tailed Tests
Test Type | When to Use | H₁ Form | Rejection Region
Right-Tailed | Claim is 'greater than' or 'more than' | H₁: μ > k | Right tail only
Left-Tailed | Claim is 'less than' or 'fewer than' | H₁: μ < k | Left tail only
Two-Tailed | Claim is 'different from' or 'not equal' | H₁: μ ≠ k | Both tails (α/2 each)
Summary: Which Test to Use
Situation | Test
Population mean, σ KNOWN, n ≥ 30 | One-sample Z-test
Population mean, σ UNKNOWN or n < 30 | One-sample t-test
Two population means, σ known | Two-sample Z-test
Two population means, σ unknown, equal variance | Two-sample t-test (pooled)
Two population means, σ unknown, unequal variance | Welch's t-test
Same subjects measured before and after (paired) | Paired t-test
Population proportion (large sample) | Z-test for proportion
Two proportions compared (large samples) | Two-sample Z-test for proportions
Equality of two population variances | F-test
Observed distribution vs expected (categorical) | Chi-square Goodness of Fit
Independence of two categorical variables (Association) | Chi-square Test of Independence
1. One-Sample Z-Test (Parametric — Continuous Variable)
Z-TEST: Used when σ is KNOWN and/or n ≥ 30
Formula: Z = (x̄ - μ) / (σ/√n)
Where: x̄ = sample mean, μ = hypothesised mean, σ = population SD, n = sample size
Critical Values: α=0.05 two-tailed → ±1.96 | right-tailed → +1.645 | left-tailed → -1.645
α=0.01 two-tailed → ±2.576 | right-tailed → +2.326 | left-tailed → -2.326
Example: Company claims avg weight = 500g. Sample of 100: x̄=498g, σ=5g. H₀: μ=500, H₁: μ≠500.
Z=(498-500)/(5/√100)=-4. |Z|=4 > 1.96 → REJECT H₀.
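The worked example can be reproduced with a few lines of Python. This is a sketch only: the function name is illustrative, and the critical value is taken from statistics.NormalDist (Python 3.8+) instead of a printed Z table.

```python
import math
from statistics import NormalDist

def one_sample_z(x_bar, mu0, sigma, n):
    """Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(x_bar=498, mu0=500, sigma=5, n=100)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96
reject_h0 = abs(z) > z_crit

print(z, round(z_crit, 3), reject_h0)
```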
2. One-Sample t-Test (Parametric — Continuous Variable)
t-TEST: Used when σ is UNKNOWN (estimate S from sample)
Formula: t = (x̄ - μ) / (S/√n)
Where: S = sample SD = √[Σ(xi - x̄)² / (n-1)]
Degrees of freedom: df = n - 1
Critical t-value from t-distribution table using df and α
DECISION: If |t-calculated| > t-critical → REJECT H₀
Key Note: Use Z-test if population σ is known; use t-test if σ is unknown (must estimate from sample
data).
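A hedged sketch of the one-sample t-test on hypothetical data (ten observations, hypothesised mean 52). The critical value 2.262 is the standard two-tailed table value for df = 9 at α = 0.05; it is hard-coded because the Python standard library has no t-distribution.

```python
import math
import statistics as st

def one_sample_t(sample, mu0):
    """Return (t, df) with t = (x_bar - mu0) / (S / sqrt(n))."""
    n = len(sample)
    x_bar = st.mean(sample)
    s = st.stdev(sample)               # sample SD with divisor n-1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

sample = [48, 52, 50, 49, 51, 47, 53, 50, 49, 51]   # hypothetical data
t, df = one_sample_t(sample, mu0=52)
t_crit = 2.262                         # t table: df = 9, alpha = 0.05, two-tailed
reject_h0 = abs(t) > t_crit

print(round(t, 4), df, reject_h0)
```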
3. Two-Sample Z-Test (Independent Samples — σ Known)
Two-Sample Z-Test Formula
Z = [(x̄₁ - x̄₂) - (μ₁ - μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)
Use when comparing two independent population means and BOTH σ values are known
4. Two-Sample t-Test (Independent Samples — σ Unknown)
Case 1: Equal Variances (Pooled t-Test)
Pooled variance: Sp² = [(n₁-1)S₁² + (n₂-1)S₂²] / (n₁+n₂-2)
t = [(x̄₁ - x̄₂) - (μ₁ - μ₂)] / [Sp × √(1/n₁ + 1/n₂)]
df = n₁ + n₂ - 2
Case 2: Unequal Variances (Welch's t-Test)
t = [(x̄₁ - x̄₂) - (μ₁ - μ₂)] / √(S₁²/n₁ + S₂²/n₂)
df = (S₁²/n₁ + S₂²/n₂)² / [(S₁²/n₁)²/(n₁-1) + (S₂²/n₂)²/(n₂-1)] (Welch-Satterthwaite formula; round down to nearest integer)
Use when population SDs are unknown and NOT assumed equal
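Both cases can be sketched from summary statistics as below. The function names and the input numbers are hypothetical, the hypothesised difference μ₁ - μ₂ is taken as 0, and the Welch degrees of freedom use the Welch-Satterthwaite approximation rounded down.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Equal-variance two-sample t statistic and df (H0: mu1 - mu2 = 0)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def welch_t(x1, s1, n1, x2, s2, n2):
    """Unequal-variance t statistic and Welch-Satterthwaite df (rounded down)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, math.floor(df)

# Hypothetical summary statistics (mean, SD, sample size) for two groups
t_pooled, df_pooled = pooled_t(10, 2, 10, 8, 2, 10)
t_welch, df_welch = welch_t(10, 2, 10, 8, 4, 20)
print(round(t_pooled, 4), df_pooled)
print(round(t_welch, 4), df_welch)
```

Each t value is then compared with the t-table critical value for its own df, exactly as in the one-sample case.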
5. Paired Sample t-Test
Used when the SAME subjects are measured TWICE (before and after an intervention). Examples:
weight before/after yoga; cholesterol before/after medication; alcohol consumption before/after a breakup.
Paired t-Test Formula
t = (d̄ - D) / (Sd/√n)
Where: d̄ = mean of differences (x₂ - x₁)
D = hypothesised mean difference (usually 0)
Sd = standard deviation of differences
df = n - 1
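A minimal sketch of the paired test on hypothetical before/after weights for five subjects. Differences are taken as after minus before (x₂ - x₁), and D defaults to 0.

```python
import math
import statistics as st

def paired_t(before, after, D=0):
    """Return (t, df) with t = (d_bar - D) / (S_d / sqrt(n))."""
    d = [a - b for b, a in zip(before, after)]   # differences x2 - x1
    n = len(d)
    d_bar = st.mean(d)
    s_d = st.stdev(d)                            # SD of the differences, divisor n-1
    t = (d_bar - D) / (s_d / math.sqrt(n))
    return t, n - 1

before = [80, 75, 90, 85, 70]    # hypothetical weights before the programme
after = [78, 74, 86, 83, 69]     # hypothetical weights after the programme
t, df = paired_t(before, after)
print(round(t, 4), df)
```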
6. Z-Test for Proportion
Z-Test for Proportion
One sample: Z = (p̂ - p) / √(p(1-p)/n)
Two sample: Z = (p̂₁ - p̂₂) / √[p̄(1-p̄)(1/n₁ + 1/n₂)]
Where p̄ = pooled proportion = (x₁ + x₂) / (n₁ + n₂)
Used for large samples where sampling distribution of proportion follows normal distribution
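The two proportion formulas can be sketched as follows. All counts and proportions are hypothetical and the function names are illustrative.

```python
import math

def one_prop_z(p_hat, p0, n):
    """Z = (p_hat - p0) / sqrt(p0 (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def two_prop_z(x1, n1, x2, n2):
    """Two-sample Z using the pooled proportion (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

z_one = one_prop_z(p_hat=0.56, p0=0.50, n=100)     # hypothetical survey result
z_two = two_prop_z(x1=60, n1=100, x2=40, n2=100)   # hypothetical success counts
print(round(z_one, 4), round(z_two, 4))
```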
7. F-Test — Testing Equality of Population Variances
F-Test Formula (Right-Tailed)
F = S₁² / S₂² (Always place LARGER variance in numerator)
H₀: σ₁² = σ₂² (variances are equal)
H₁: σ₁² > σ₂² (one variance is larger)
df₁ = n₁ - 1 (numerator degrees of freedom)
df₂ = n₂ - 1 (denominator degrees of freedom)
Decision: If F-calculated < F-critical → RETAIN H₀ (no evidence that the variances differ)
Example: S₁²=0.28 (n₁=10), S₂²=0.14 (n₂=11). F=0.28/0.14=2; df₁=9, df₂=10, α=0.01 → F-critical=4.94.
Since 2 < 4.94 → Retain H₀: no evidence that the variances differ.
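The F statistic and its degrees of freedom for the example can be computed as below. This sketch (function name assumed) only automates the "larger variance on top" rule; the critical value still has to be read from an F table for the resulting (df₁, df₂).

```python
def f_statistic(var1, n1, var2, n2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

F, df1, df2 = f_statistic(0.28, 10, 0.14, 11)
print(round(F, 4), df1, df2)
```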
8. Chi-Square Goodness of Fit Test (Non-Parametric — Categorical)
Tests whether observed distribution MATCHES an expected/theoretical distribution. Used when total is
NOT directly given.
Chi-Square Goodness of Fit
χ² = Σ (O - E)² / E
Where: O = Observed frequency, E = Expected frequency
E = Total / Number of categories (for uniform/equal distribution assumption)
df = n - 1 (n = number of categories)
H₀: Given distribution is correct (fits the expected pattern)
Decision: If χ²-calculated < χ²-critical → RETAIN H₀
Example: A die is tossed 120 times. Face 1=20, 2=22, 3=17, 4=18, 5=19, 6=24. E=20 each. χ²=1.7. df=5,
χ²-critical=11.07 (α=0.05). Since 1.7 < 11.07 → Retain H₀: no evidence that the die is biased.
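The example's arithmetic can be verified with a short Python sketch. The function name is illustrative; when no expected frequencies are supplied it assumes equal expected counts, E = Total / Number of categories.

```python
def chi_square_gof(observed, expected=None):
    """Return (chi2, df) for a goodness-of-fit test."""
    if expected is None:
        # Uniform assumption: E = total / number of categories
        expected = [sum(observed) / len(observed)] * len(observed)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

observed = [20, 22, 17, 18, 19, 24]      # face counts from the example
chi2, df = chi_square_gof(observed)
retain_h0 = chi2 < 11.07                 # critical value for df = 5, alpha = 0.05
print(round(chi2, 2), df, retain_h0)     # 1.7 5 True
```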
9. Chi-Square Test of Association / Independence (Non-Parametric — Categorical)
Tests whether two categorical variables are INDEPENDENT of each other. Used when totals ARE given
(contingency table).
Chi-Square Test of Independence
Expected frequency: E_ij = (Row Total × Column Total) / Grand Total
χ² = Σ (O_ij - E_ij)² / E_ij
df = (r-1)(c-1) where r = rows, c = columns
H₀: Variables are INDEPENDENT (not related)
H₁: Variables are DEPENDENT (related)
Decision: If χ²-calculated > χ²-critical → REJECT H₀ (variables are dependent/related)
Example: Testing if HT is independent of smoking. χ²=14.43 (calculated); df=(2-1)(3-1)=2; χ²-
critical=5.991 (α=0.05). Since 14.43 > 5.991 → Reject H₀ — HT and smoking are NOT independent.
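The example above does not list its observed counts, so the sketch below uses a hypothetical 2 × 3 contingency table purely to show how the expected frequencies and χ² are computed.

```python
def chi_square_independence(table):
    """Return (chi2, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E_ij = (row total x column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

table = [[10, 20, 30],       # hypothetical observed counts, row 1
         [20, 20, 20]]       # hypothetical observed counts, row 2
chi2, df = chi_square_independence(table)
reject_h0 = chi2 > 5.991     # critical value for df = 2, alpha = 0.05
print(round(chi2, 4), df, reject_h0)
```

With these made-up counts χ² falls just below 5.991, so H₀ would be retained, unlike the example in the notes.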
10.6 (f) Confidence Intervals for Tests
CONFIDENCE INTERVAL FORMULAS
For mean (σ known): CI = x̄ ± Z(α/2) × (σ/√n)
For mean (σ unknown): CI = x̄ ± t(α/2, df) × (S/√n)
For proportion: CI = p̂ ± Z(α/2) × √(p̂(1-p̂)/n)
Common Z-values: 90% CI → Z = 1.645 | 95% CI → Z = 1.96 | 99% CI → Z = 2.576
Interpretation: A 95% CI comes from a procedure that captures the true population parameter in
about 95% of repeated samples; informally, we are 95% confident the parameter lies within this interval.
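The three interval formulas can be sketched as below. Z values come from statistics.NormalDist; the t critical value must be supplied from a t table because the standard library has no t-distribution. The mean example reuses x̄ = 498, σ = 5, n = 100 from the Z-test example; the proportion inputs are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided Z for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_mean_sigma_known(x_bar, sigma, n, confidence=0.95):
    margin = z_value(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def ci_mean_sigma_unknown(x_bar, s, n, t_crit):
    margin = t_crit * s / math.sqrt(n)      # t_crit read from a t table
    return x_bar - margin, x_bar + margin

def ci_proportion(p_hat, n, confidence=0.95):
    margin = z_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = ci_mean_sigma_known(498, 5, 100)
p_low, p_high = ci_proportion(0.56, 100)     # hypothetical sample proportion
print(round(low, 2), round(high, 2))
print(round(p_low, 4), round(p_high, 4))
```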
MODULE XI: REPORT WRITING
11.1 Meaning & Importance of a Research Report
A report is a formal document written for a variety of purposes. Reports communicate research findings
and are considered legal documents in the workplace — they must be objective, precise, accurate and
unambiguous.
Importance:
1. Tangible product of the research project: serves as a historical record and documentary
evidence
2. Basis for management decisions: value influenced by quality of written report
3. Many decision makers evaluate the entire research project based solely on the report quality
4. Management's decision to commission future research is influenced by perceived usefulness of
the report
11.2 Functions of Research Reports
• Dissemination of information — adds to knowledge of people interested in the research area
• Facilitate decision making — basis for rules, regulations, and management decisions
• Validate findings — allows readers to understand, validate, and replicate generalisations
• Provide guidance for future research — suggests scope for future research based on limitations
of current study
11.3 Essentials of a Good Report
Quality | Description
Takes Readers into Account | Written for specific readers; considers technical sophistication; avoids jargon; defines necessary technical terms in appendix.
Easy to Follow | Logical structure; headings and subheadings; short, clear sentences; well-constructed paragraphs.
Presentable & Professional | Quality paper and binding; good typography; appropriate use of white space and variation in type size.
Objective | Accurately presents design, results and conclusions without aligning to management expectations.
Highlights with Tables/Graphs | Visual aids (tables, graphs, pictures, maps) for key information.
Brief and to the Point | Succinct and concise; omit unnecessary information without sacrificing completeness.
11.4 (b) Report Format — Content of Report
A. Prefatory Items
1. Letter of Transmittal: Covering letter with purpose, scope, authorisation, and limitations of the
study
2. Title Page: Name of report; names of authors; for whom prepared; nature; date. Title must
include: variables studied, type of relationship examined, population studied.
3. Authorisation Letter: Required especially for government and NGO reports; shows who
sponsored the research
4. Abstract/Synopsis/Executive Summary: Abbreviated form of entire report (~2 pages); includes
aim, procedures, main findings, conclusions/recommendations; written AFTER rest of report is
finished; contains NO new information
5. Table of Contents: Page numbers of each section and appendices; does not include title page or
abstract
B. Main Body of the Report
1. Introduction
• Problem Statement — title/topic of research
• Objectives — purpose of the report
• Scope — what is included and excluded
• Background to study — history, major players, previous research that led to the study
2. Research Methodology
• Sampling Design — target population and sampling method used
• Research Design — rationale for design choice; strengths and weaknesses
• Data Collection Methods — how data was collected; why methods were chosen; merits and
limitations
3. Analysis of Data
• Graphs, charts, tables, maps — each with title, figure/table number, and thorough labels
• Each dataset systematically displayed and analysed with explanatory paragraphs
4. Conclusions
• Drawn directly from the analysis section. At least one conclusion per analytical section.
5. Recommendations
• Suggestions for further action based on conclusions. Numbered sequentially.
6. References / Bibliography
• Publication details of all sources consulted; placed after Conclusions/Recommendations section
7. Appendices
• Questionnaires, original documents, mathematical formulations — items that support the report
but cannot be included in the main body
8. Glossary (if required)
• Definitions of technical terms or jargon used in the report; placed after the table of contents
11.5 Steps in Report Writing
1. Arrangement of Subject Matter: Logical development (mental connections) or chronological
development (time sequence)
2. Creating an Outline: Framework for the written work; reminder of key points to cover and stress
3. Writing the Rough Draft: Add content to sections; use word processing for spell/grammar checks
4. Preparation of the Final Bibliography: List of all relevant and consulted works
5. Writing the Final Proof: Critical evaluation of rough draft for comprehensibility, lucidness, and
organisation; incorporate improvements
11.6 (a) Types of Reports
Technical Report
• Written for an audience of RESEARCHERS
• Includes full documentation: data sources, research procedures, sampling design, instruments,
analysis methods
• Major emphasis: research methodology, assumptions, detailed presentation with supporting data
Outline: Executive Summary → Nature of Study → Methods/Techniques → Data → Analysis &
Findings → Conclusions → Bibliography → Technical Appendices → Charts/Graphs
Management Report
• Written for NON-TECHNICAL clients — more interested in results than methodology
• Sections in INVERTED ORDER — conclusions and recommendations come FIRST after
introduction
• Makes generous use of visual aids (graphics, charts, pictures)
• Emphasises practical aspects; uses simple, lucid language free of jargon
Outline: Summary of Results → Nature of Study → Method Employed (brief) → Results (with
visuals) → Conclusions & Findings (in simple language)
QUICK REFERENCE — ALL FORMULAS & MEMORY AIDS
All Key Formulas
Formula/Concept Expression
Z-test (one sample) Z = (x̄ - μ) / (σ/√n)
t-test (one sample) t = (x̄ - μ) / (S/√n), df = n-1
Sample SD S = √[Σ(xi - x̄)² / (n-1)]
Paired t-test t = (d̄ - D) / (Sd/√n), df = n-1 (D usually 0)
Two-sample Z-test Z = [(x̄₁-x̄₂)-(μ₁-μ₂)] / √(σ₁²/n₁+σ₂²/n₂)
Pooled variance Sp² = [(n₁-1)S₁² + (n₂-1)S₂²] / (n₁+n₂-2)
Z-test for proportion Z = (p̂ - p) / √(p(1-p)/n)
Two-sample proportion Z = (p̂₁-p̂₂) / √[p̄(1-p̄)(1/n₁+1/n₂)]
F-test F = S₁²/S₂² (S₁² ≥ S₂²)
Chi-square χ² = Σ(O-E)²/E
Expected freq. (independence) E_ij = (Row Total × Col Total) / Grand Total
Sample size (mean) n = Z²σ² / e²
Sample size (proportion) n = Z²pq / e² OR n = Z²/4e² (p unknown)
Systematic sampling interval K = N/n (round to integer)
Stratified proportionate n₁ = n × (N₁/N)
95% Confidence Interval (mean, σ known) x̄ ± 1.96 × (σ/√n)
Memory Aids
SCALES: Remember NOIR
NOMINAL: Names/categories only. (Gender, Country, Zip code) → Mode only
ORDINAL: Order/Rank. (Rankings, grades, shirt size) → Median, percentiles
INTERVAL: Equal gaps, NO true zero. (Temp °C, Likert scale) → Mean, SD, t-test
RATIO: Equal gaps + TRUE zero. (Age, height, income, weight) → All statistics
GOOD RESEARCH: Remember CR-SVEC
C — Controlled | R — Rigorous | S — Systematic
V — Valid & Verifiable | E — Empirical | C — Critical
RESEARCH PROBLEM SOURCES: 4 Ps
People | Problems | Programmes | Phenomena
TEST SELECTION SHORTCUT
σ KNOWN → Z-test | σ UNKNOWN → t-test | VARIANCES → F-test | CATEGORICAL →
Chi-square
SAME SUBJECTS BEFORE/AFTER → Paired t-test
REJECT H₀ if: |calculated| > critical (or p-value < α)
— END OF NOTES — Best of Luck in Your Examination! —