(PGDT 423): Educational Assessment and
Evaluation
Kashasho, Addisalem
Oldisha (MSc. & MA)
addimen2014@[Link]
Department of
psychology
Arba Minch University
August 17, 2024
Chapter-1
INTRODUCTION
Definition of Basic Terms
Quiz: It is a short and informal test.
It is given during class time, at the
beginning, in the middle, or at the end of
the lesson.
Test: It is a series of questions with
different item types.
It is given formally while a course is in
progress.
The purpose is to:
Assess learning progress
Identify whether learning has taken place
Examination: It is more formal and organized.
The number of items included is relatively
large.
It covers a large area of content.
It is given at the end of a course or semester.
Its main purpose is to assign grades.
Measurement: refers to any quantitative
description of learners’ performance.
It is usually expressed as a numerical value.
For example: a pupil solving 30 out of 45
Physics problems correctly.
Assessment: refers to the process of
collecting, analyzing, interpreting
and synthesizing any information about
students to support decision-making.
Evaluation: is a systematic process of
determining the extent to which instructional
objectives are achieved by students.
Moreover, it is the process of using
information to judge the goodness,
worthiness or quality of students’
achievement, teaching and educational
programs.
In a broader sense, evaluation involves
quantitative (measurement), qualitative
(non-measurement) or both, and value
judgment.
The following simple mathematical arrangement
shows the relationship between measurement
and evaluation.
Evaluation = Quantitative description of
students’ behavior (measurement) +
qualitative description of students’
behavior (non-measurement) + value
judgment
Practice and purpose of
assessment
I. Assessment of Learning
This kind of assessment is usually
summative in nature, which is done at
the end of a learning task.
It provides evidence for teachers to make
decisions/judgments about students’
achievement against set goals and
standards.
It provides evidence of students’
achievement to parents, administrators,
educators and students themselves.
II. Assessment for Learning
Occurs while teaching and learning are in
progress, rather than at the end.
Teachers use assessment evidence to:
Monitor students’ learning progress; and
Inform their teaching.
Provide diagnostic information to teachers
about students’ prior knowledge and
formative information about the effects of
their instruction on student learning.
III. Assessment as learning
Makes assessment part of, not separate from,
the instructional process.
This involves students in their own
continuous self-assessment and is designed to help
students become more self-directed learners.
Assessment as learning also takes the form of peer
assessment, with peer interaction and feedback.
For teaching, assessment provides information
about:
The attainment of objectives
The effectiveness of teaching methods
and
Learning materials
Purposes of Assessment
1. To inform and guide teaching and
learning.
It provides teachers with information about
what students know and can do.
In addition to helping teachers formulate the
next teaching steps, a good classroom
assessment plan provides a road map for
students.
Evaluation procedures aid the teacher by:
Providing knowledge concerning the
students' entry behaviors;
Helping to set, refine, and clarify realistic
goals for each student;
Helping to evaluate the degree to which
the objectives have been achieved; and
Helping to determine, evaluate, and refine
the instructional techniques
2. To help students set learning goals.
Assessment and evaluation aid the student by:
Communicating the teacher's goals,
Increasing motivation,
Encouraging good study habits, and
Providing feedback that identifies strengths
and weaknesses.
3. To assign report card grades.
Grade reports provide parents,
employers, schools, and other
stakeholders, including the government and
post-secondary institutions, with
summary information about student
learning.
4. To motivate students.
Research has shown that students will be
confident and motivated when they experience
progress and achievement, rather than the
failure and defeat associated with being
compared to more successful peers.
General Principles of Assessment
and Evaluation
Clearly specifying what is to be assessed has
priority in the assessment process.
An assessment procedure should be selected
because of its relevance to the characteristics
or performance to be measured.
Comprehensive assessment requires a variety of
procedures.
Proper use of assessment procedures
requires an awareness of their limitations.
Assessment is a means to an end, not an end in
itself.
Assumptions of Assessment and
Evaluation
It improves the quality of the
subject (students).
Effective assessment begins with clear
goals.
It requires appropriate feedback and
self-assessment.
It reduces the gap between assessment
and learning.
It is considered a source of
motivation.
Various sources of errors are always
Types and Approaches to Assessment
1. Formal vs. Informal Assessment
Formal Assessment: In formal assessments,
students are aware that the
task they are doing is for assessment
purposes.
They are frequently used in summative
assessments.
This usually implies a written document,
such as a test, quiz, or paper.
Informal Assessment: refers to
techniques that can easily be
incorporated into classroom routines and
learning activities.
Informal assessment techniques can be
used at any time without interfering with
instructional time.
2. Formative vs. Summative Assessments
Formative Assessment: ongoing
assessments, reviews, and observations
in a classroom.
Assists the learning process by providing
feedback to both students and
teachers.
Also known as ‘assessment for
learning’ or continuous
assessment.
Teachers can modify their teaching
approaches to provide enrichment or
remedial activities to more effectively guide
learners.
Summative Assessment
Summative assessment typically comes at the
end of a course (or unit) of instruction.
It evaluates the quality of students’
learning and assigns a mark to each student’s
work based on how effectively the learner has
addressed the performance standards and
criteria.
Assessment tasks conducted during the
progress of a semester may be regarded
as summative in nature if they only
contribute to the final grades of the
students.
The techniques used are determined by
the instructional goals.
The difference between formative
and summative assessment
Formative Assessment
Timing: Conducted throughout the
teaching-learning process
Method: Paper & pencil tests, observation,
quizzes, exercises, practical sessions
administered to the group and individually
Aim: To assess progress and recommend
remedial action for non-achievement of
objectives
Remediation, enrichment, or re-teaching
of the topic
Example: Quizzes, essays, diagnostic tests,
lab reports and anecdotal records
Summative Assessment
Timing: Conducted at the end of a
teaching-learning phase (e.g. end of
semester or year)
Method: Paper & pencil tests, oral tests
administered to the group
Aim: Grading to determine if the program
was successful, to certify students,
and to improve the curriculum
Example: Final exams, national
examinations, qualifying tests.
3. Criterion vs. Norm-referenced
Assessments
Criterion-referenced Assessment: It is
carried out against previously specified
criteria and performance standards.
Where a grade is assigned, it is assigned on
the basis of the standard the student has
achieved on each of the criteria:
Advantages
Quickly assessing what students have learned
Informing all students of the expected
standard
Helps to eliminate competition and may
improve cooperation.
Applied if a teacher’s main interest is to
pinpoint how well students have
mastered a particular skill.
Norm-referenced Assessment
This type of assessment has as its end point
the determination of student performance
based on a position within a cohort of
students – the norm group.
Advantage
Makes it possible to compare large numbers of
students or to make important decisions regarding
student placement and advancement.
Summary
Criterion-referenced assessment
emphasizes description of student’s
performance.
Norm-referenced assessment emphasizes
discrimination among individual students
in terms of relative level of learning.
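The two interpretations can be sketched on the same raw score. This is an illustration only: the 80% mastery cutoff, the function names, and the norm-group scores are assumptions made for the example, not values from the course material.

```python
from bisect import bisect_left


def criterion_referenced(score, max_score, cutoff_pct=80.0):
    """Judge one student against a fixed standard (cutoff is an assumed value)."""
    pct = 100.0 * score / max_score
    return pct >= cutoff_pct


def percentile_rank(score, group_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Hypothetical norm group: scores out of 45 for ten students.
norm_group = [12, 18, 20, 25, 25, 28, 30, 33, 36, 40]
student_score = 30

# Criterion-referenced: has the student met the standard?
print(criterion_referenced(student_score, 45))      # False (66.7% < 80%)
# Norm-referenced: where does the student stand in the group?
print(percentile_rank(student_score, norm_group))   # 60.0
```

The same score of 30 out of 45 fails the assumed criterion yet sits above most of the group, which is why the two approaches answer different questions.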
Criterion vs. Norm-referenced
Assessments
Criterion-referenced Assessment
Measures individual behavior in reference to a particular criterion/performance standard.
Behavior is measured against some objective performance standard.
All fail or all meet the criteria because of an absolute measurement.
Looks for variability within the individual's environment. Example: Instruction/materials.
Used to assess whether or not a student has a particular skill.
Should be used when there is no limit to the number of people possessing the skill.
Essential question: Can this individual accomplish this task?
Norm-referenced Assessment
Measures individual behavior in reference to the performance of others; not in terms of specific behavior.
Individual performance gains meaning through comparison with others.
Constructed to spread out individuals along the scoring scale.
Looks for variability among individuals. Example: Differences in intelligence, perception etc.
Not designed to assess whether a student has a particular skill, since the score is a relative comparison.
Should be used when we want a normative test to compare individuals and choose the best.
Essential question: How well is the individual doing compared to how others are doing?
4. Process versus Product Assessment
Process assessment: focuses on the
steps or procedures underlying a
particular ability or task, e.g., the
cognitive steps in performing a
mathematical operation or the procedure
involved in analyzing a blood sample.
Product assessment focuses on
evaluating the result or outcome of a
process.
5. Maximum vs. Typical
Performance Assessment
Maximum performance test: an
assessment of how well
an individual performs when he/she is
motivated to put forth his/her best effort.
Typical performance test: an
assessment of how
an individual usually behaves in
normal or routine situations.
6. Divergent vs. Convergent Assessment
Divergent assessments are those for which a
range of answers or solutions might be
considered correct.
An assessment has more than one answer.
For example, a Civics teacher might ask
his/her students to compare presidential and
parliamentary forms of government as
preferable forms of government for a country.
A student might favor a presidential form of
government by providing sound arguments and
valid examples.
Another student might come up with
equally convincing ideas favoring the
parliamentary form of government.
In both cases the answers are different but
can both be considered correct. So in divergent
assessments there might not be one
single answer.
Divergent assessment tools include essay
tests, and workout problems.
Convergent assessments are those
which have only one correct response that
the student is trying to reach.
They are generally easier to mark,
quicker to deliver, and give more specific
and directed feedback to individuals.
They can also provide wide curriculum
coverage.
Objective test items are the best
Functions of evaluation
1. Placement
Concerned with students’ entry behavior
Focuses on whether the student has the
prerequisite knowledge to begin the
planned instruction
Usually made based on summative
assessment
2. Feedback
Feedback is the process of informing
students, parents, teachers and
administrators regarding students.
The function of feedback:
Guide students to the most effective
means of improving learning
3. Diagnosis and Remediation
Diagnostic tests are intended to:
Identify students’ deficiencies
Locate the sources of difficulty.
Done by use of formative
assessment
4. Motivation and guidance of learning
Evaluation should provide specific and
informative comments that give
students guidance in developing their
skills.
Encourage students to evaluate their own
work rather than relying on external
evaluation by peers or teachers.
CHAPTER-TWO
II: Educational Objectives
WHAT IS AN EDUCATIONAL
OBJECTIVE?
Developing educational Objectives
A. Educational goal
Stated as a broad, long-range outcome to
work toward (not measurable)
Serve primarily in policy making and
general program planning
Example: Developing proficiency in the
basic skills of reading, writing and
arithmetic
B. General objectives
Stated in general (not measurable) terms to
encompass a set of specific learning
outcomes
Example: comprehend the literal meaning
of written materials.
C. Specific objectives:
Stated in terms of definite, measurable
and observable performance of
students
Serve as performance indicators that
students have to demonstrate when they
have achieved the objective
Example: Identify details that are explicitly
stated in a passage
II. Methods of stating instructional objectives
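Objectives listed for a subject or a unit of
study should:
Be detailed enough
Communicate the intent of the
instruction
Serve as an overall guide in planning for teaching
and evaluation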
Detailed enough
Communicate the intent of the
instruction
Overall guide in planning for teaching
and evaluation
This can be done by defining
objectives in two steps:
1. Stating the general instructional
objectives as intended learning
outcomes
2. Listing under each general
instructional objective a sample of the
specific types of performance that
students are to demonstrate
Stating Instructional Objective
Why state objectives? As we have already
suggested, objectives give direction to
education:
Instructional objectives are stated in
two forms:
General objectives: are
not directly measurable. They use
words like know, understand, recognize,
realize, etc.
Specific objectives: use action words for behaviors
performed by the students and observed by
the teacher. They use words like list,
state, explain, identify, write, etc.
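As an illustration only, the two verb lists can be turned into a rough check of whether an objective is stated in general or specific terms. The function name and the "look at the first word" rule are assumptions of this sketch; judging real objectives still needs a teacher's judgment.

```python
# Verb lists taken from the examples given for each form of objective.
GENERAL_VERBS = {"know", "understand", "recognize", "realize"}
SPECIFIC_VERBS = {"list", "state", "explain", "identify", "write"}


def classify_objective(objective):
    """Classify an objective by its leading verb (hypothetical helper)."""
    words = objective.split()
    if not words:
        return "unknown"
    first = words[0].lower().rstrip(",.:;")
    if first in SPECIFIC_VERBS:
        return "specific"
    if first in GENERAL_VERBS:
        return "general"
    return "unknown"  # verb not in either sample list


print(classify_objective("Identify details that are explicitly stated in a passage"))
print(classify_objective("Understand the literal meaning of written materials"))
```

Verbs outside the two short sample lists come back as "unknown", so the lists would need to be extended before such a check is useful in practice.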
Sources of Instructional
Objectives:
Instructional materials
Curriculum guides
Books on teaching methods
Different manuals
Large collection of objectives
Components of Instructional Objectives:
A = Audience: the students or individuals for
whom the objective is prepared
B = Behavior: the action performed by the
students (e.g., list, state, explain, identify)
C = Condition: the circumstances under
which the student performs the behavior
and the teacher observes it
D = Degree: the expected level of
performance (accuracy level of students).
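A small sketch of how the four components combine into one written objective. The class name, the sentence template, and the sample values (grade level, 80% accuracy) are assumptions made for illustration; they are not prescribed by the course.

```python
from dataclasses import dataclass


@dataclass
class Objective:
    """One instructional objective split into its A-B-C-D components."""
    audience: str   # A: who performs
    behavior: str   # B: observable action
    condition: str  # C: circumstances of performance
    degree: str     # D: expected level of performance

    def statement(self):
        # Assumed template: condition first, then audience, behavior, degree.
        return f"{self.condition}, {self.audience} will {self.behavior} {self.degree}."


objective = Objective(
    audience="grade 9 students",
    behavior="identify the explicitly stated details",
    condition="Given a short reading passage",
    degree="with at least 80% accuracy",
)
print(objective.statement())
```

Keeping the components separate makes it easy to see when one of them, most often the degree, is missing from an objective.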
The objectives should be:
Written with action or doing verbs
Unitary, i.e., imply a single action
Stated in terms of measurable and
observable change in behavior
Realistic in a given context, etc.
In order to ensure that the instructional
objectives are stated in specific terms,
check that they are SMART.
Classification of educational
objectives
BLOOM’S TAXONOMY
In this taxonomy Bloom et al (1956) divided
educational objectives into three domains.
These are cognitive domain, affective
domain and psychomotor domain.
Each domain is further categorized into
hierarchical levels.
Achievement of a higher level of skill
assumes the achievement of the
previous levels.
A. Cognitive domain
This involves those objectives that deal
with the development of intellectual
abilities and skills.
Levels of the Cognitive Domain
1. Knowledge: recognition or recall of
previously learned information.
2. Comprehension: internalization of
knowledge (interpreting,
translating or summarizing given information).
3. Application: use of abstractions in a
concrete situation
4. Analysis: the breaking down of
learnt material into parts, ideas and devices for
clearer understanding
5. Synthesis: combining components to form
a new whole
6. Evaluation: making a quantitative or
qualitative judgment about a piece of
communication, a procedure, a method, a
proposal, a plan, etc.
Knowledge: define, identify, indicate, label, list, memorize, name, recall, record, relate, repeat, select, underline
Comprehension: classify, describe, discuss, explain, express, identify, locate, paraphrase, recognize, report, restate, review, suggest, summarize, tell, translate
Application: apply, compute, construct, demonstrate, dramatize, employ, give examples, illustrate, interpret, investigate, operate, organize, practice, predict, schedule, shop, sketch, translate, use
Analysis: analyze, appraise, calculate, categorize, compare, contrast, criticize, debate, determine, diagram, differentiate, distinguish, examine, experiment, inspect, inventory, question
Synthesis: arrange, assemble, collect, compose, construct, create, design, formulate, manage, organize, perform, plan, prepare, produce, propose, set up
Evaluation: appraise, assess, choose, compare, contrast, decide, estimate, evaluate, grade, judge, measure, rate, revise, score, select, value
B. Affective domain
Affective domain has to do with feelings and
emotions.
It is concerned with interests, attitudes,
appreciation, emotional biases and
values.
Levels of the Affective Domain
1. Receiving: Freely attending to stimuli
2. Responding: Voluntarily reacting to stimuli
3. Valuing: Forming an attitude toward a
stimulus
4. Organization: Bringing together different
values and building a consistent value system
by resolving any possible conflicts between
them.
5. Characterization: Behaving consistently
with an internally developed, stable value
system
Receiving: observe, be conscious, realize, be sensitive, attend, listen, discriminate, be alert
Responding: willing, comply, obey, look, engage, display, cooperate, contribute, enrich, explore, extend, exhibit, consider, examine, participate, practice
Valuing: continuing desire, grow, feel, participate, assume responsibility, enable, initiate
Organization: crystallize, form judgment, relate, weigh, is realistic, judge, regulate
Characterization: ready, revise, view, approach, plan, arrive, rely, examine, judge, is consistent
Levels of psychomotor domain
1. Imitation: Observing and patterning
behavior after someone else
2. Manipulation: Being able to perform
certain actions by following written/oral
instructions and practicing
3. Precision: Refining, becoming more exact.
Few errors are apparent
4. Articulation: Coordinating a series of
actions, achieving harmony and internal
consistency
5. Naturalization: Having high-level
performance become natural, without
needing to think much about it.
Imitation: attempt, copy, duplicate, imitate
Manipulation: complete, follow, perform, play, produce
Precision: achieve automatically, perform masterfully
Articulation: adapt, alter, organize, customize
Naturalization: naturally, perfectly
Checklist for Selecting or
Developing Objectives
Are the objectives relevant?
Are the objectives feasible given student and
teacher characteristics and school facilities?
Are all relevant objectives included?
Are the objectives divided into minimal and
developmental levels?
Are the objectives stated in terms of
student behavior (the product or outcome of
instruction) rather than the teacher's learning
or teaching activities?
Making Objectives Amenable to
Assessment and Evaluation
Objectives should begin with an action
verb
Objectives should be stated in terms of
observable changes in behavior
Objectives should be stated in
unambiguous terms
Objectives should be unitary; that is, each
statement should relate to only a single
process.
Use of Specific Learning
Objectives
Help teachers and curriculum designers
make their educational goals explicit.
Communicate the intent of instruction to
students, parents, other teachers, school
administrators, and the public.
Provide a basis for teachers to analyze what
they teach and to construct learning
activities.
Describe specific performances against
which teachers can evaluate the success of
instruction.
Communicate to students what they are
expected to learn; this may empower
them to direct their own learning.
Make it easier to individualize
instruction.
Help teachers allocate and improve
instructional procedures and learning
objectives.
CHAPTER-THREE
III. Planning Classroom Tests
HOW TO PLAN A CLASSROOM
TEST?
Planning Classroom Tests
Guidelines in planning a classroom test:
Determine the purpose of the test;
Describe the instructional objectives and
content to be measured.
Determine the relative emphasis to be
given to each learning outcome
Select the most appropriate item formats
(essay or objective);
Develop the test blueprint to guide the test
construction
Prepare test items that are relevant to the
learning outcomes specified in the test
plan
Decide on the pattern of scoring and
the interpretation of result
Decide on the length and duration of
the test, and
Assemble the items into a test, prepare
direction and administer the test.
PROCEDURES IN TEST CONSTRUCTION
1. The purpose of the test should be
determined, for example:
o Assessing pupils’ mastery of certain essential skills
& knowledge
o Measuring growth over time
o Diagnosing pupils’ difficulties &
motivating
2. Preparing Table of specifications/ test blueprint
a) Content topics to assess
b) Types of thinking skills to assess
c) Specific learning targets to assess
d) Emphasis (number of tasks/points) for each
learning target to be assessed
Example in developing table of specification
Instructional objectives (number of items per level)
Contents                           Knowledge  Understanding  Application  Analysis  Synthesis/Evaluation  Total
Definition of terms                2          2              1            -         -                     25%
Characteristics of quality test    2          2              1            2         1                     40%
Domains of educational objectives  -          1              -            1         1                     15%
Table of specification             1          1              1            1         -                     20%
Total                              5          6              3            4         2                     100%
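The link between the percentage column and the item counts can be sketched as follows. The function name is hypothetical, and the total of 20 items is taken from the bottom row of the example (5 + 6 + 3 + 4 + 2); rounding is an assumption that may need manual adjustment when the weights do not divide evenly.

```python
def items_per_topic(weights_pct, total_items):
    """Allocate test items to content topics in proportion to their weights."""
    if sum(weights_pct.values()) != 100:
        raise ValueError("topic weights must add up to 100%")
    # Rounded counts may not always sum to total_items; adjust by hand if so.
    return {
        topic: round(total_items * pct / 100)
        for topic, pct in weights_pct.items()
    }


# Weights from the Total column of the example table.
weights = {
    "Definition of terms": 25,
    "Characteristics of quality test": 40,
    "Domains of educational objectives": 15,
    "Table of specification": 20,
}

allocation = items_per_topic(weights, total_items=20)
for topic, count in allocation.items():
    print(f"{topic}: {count} items")
```

For a 20-item test this reproduces the row totals of the example: 5, 8, 3 and 4 items.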
Purpose of table of
specification
Identify clearly the scope emphasized by
the test
Relate the objectives to the content
Used to balance the construction of the test
Prevent testing only from content areas and
taxonomic levels for which it is easy to
develop test items.
3. Select Appropriate Item Format
The decision about which type of item
to use will depend on:
The cognitive process to be measured
Content to be measured
The way the assessment tool will be
used and scored.
4. Prepare Relevant Test Item
I. Match items to the intended
learning outcomes
II. Obtain representative sample
The length of a test is an important factor
in obtaining a representative sample.
Length of the test depends on:
Purpose of testing
Type of item format used
Age and educational level of the pupils
The ability level of the students
Complexity of the items
The type of process objective being tested
The amount of computation or quantitative
thinking required by the item
III. Selecting proper
item difficulty
The difficulty level of the items to be included
in a classroom test depends largely on
whether the test is being designed to describe
the specific learning tasks pupils can perform
(criterion referenced test) or to rank the
pupils in order of their achievement (norm
referenced test).
IV. Eliminating irrelevant
barriers to the answer
Some common barriers are:
Ambiguous statement
Excessive wording
Difficult vocabulary
Complex sentence structure
Unclear instruction
V. Preventing clues to the answer
Some common clues are:
Grammatical inconsistency
Verbal association
Specific determiners
Phrasing of correct responses
Length of correct responses
Location of correct responses
CHAPTER-FOUR
IV: Test Construction
HOW TO CONSTRUCT A
TEST?
Test Construction
Styles & formats for writing test items:
1. Paper-and-pencil tests: are the traditional
assessment techniques where students are
required to respond to a set of questions in
writing.
2. Performance assessment tasks (also
called authentic assessments)
3. Oral test: refers to verbal communication
between examiner and testee.
General Principles of Test Construction
These principles state that a test should:
1. Measure clearly defined learning outcomes.
2. Measure a representative sample of learning
tasks.
3. Include the most appropriate items that measure
the desired learning outcomes.
4. Be reliable for accurate interpretation.
5. Be appropriate to the ability level and improve
learning.
6. Be free from extraneous factors (reading difficulty,
ambiguity, long and complex sentences, difficult
vocabulary, etc.). In short, use simple and clear
language.
7. Have items that are independent of each other: the answer to one item
should not require the answer to another.
8. Have one correct or best answer per item on which agreement can be
reached.
9. Not be quoted directly from the text: this makes students
memorize rather than requiring high-level thinking.
10. Avoid tricky questions.
Classification of test items
Objective test items:
Supply items: completion or short
answer items
Selection items: true/false, multiple
choice, and matching
Subjective items:
Extended Response
Restricted Response
Constructing Objective Test Items
A. Those that require the student to supply the
answer (supply type items).
1. Supply type items:
1.1. A short answer item uses a direct question.
E.g. In which century did psychology become a
scientific study?
1.2. A completion test item consists of an
incomplete statement requiring the student to
complete it. E.g. Psychoanalytic theory was formulated
by _____.
B. Those that require the student to select the
answer from a given set of alternatives (selection
type items).
True/False, multiple choices, and matching.
CONSTRUCTING SUPPLY TEST ITEMS
A supply item is a free-response type of item in which
students give their response in words,
phrases, symbols, or numbers.
Supply items are categorized into completion items and
short answer items.
They are essentially the same, differing only in
the method of presenting the problem.
In a short answer item a direct question is used,
whereas a completion item consists of an
incomplete statement.
Both are suitable for measuring a wide variety
of relatively simple learning outcomes such
as:
A. Knowledge of terminology. E.g. A line on a
weather map which joins points of the same
barometric pressure is called _________
B. Knowledge of methods or procedures. E.g.
What device is used to detect whether a
solution is an acid or a base?
C. Knowledge of principles. E.g. If the
temperature of a gas is held constant while
the pressure applied to it is increased, what will
happen to its volume?
D. Simple interpretation of data. E.g. How many
vowels are there in the word “Evaluation”?
Advantages
Easy to construct
Reduce guessing.
Better for lower grade students
Disadvantages
Encourage students to spend their time on
rote memorization of trivial details rather
than seeking important understanding.
In other words, they are not suitable for measuring
higher order learning outcomes.
Scoring is not always completely objective.
Suggestions to construct supply test items
Omit key words or phrases and replace them with a
blank space so that the required answer is short, definite,
clear, and explicit.
Avoid indefinite statements that may be
logically answered in several ways.
Avoid excessive blanks in a single item:
when too many blanks are left, an
incomplete statement loses its meaning or
becomes ambiguous.
Specify and announce in advance whether
scoring will take into account spelling.
Do not take statements directly from the
text book/handout as basis for short answer
item because it promotes memorization.
Be sure that the question or statement
poses a problem to the examinee.
A direct question is often more desirable
than an incomplete statement.
E.g. Poor item: The first Ethiopian president
is________.
Better item: Who was the first Ethiopian
president?
The blank space should be near or at the
end of the sentence, so that the response
logically follows the stimulus.
Blank spaces should be
equal in length so that they give no clue to the
students. However, be certain to include
sufficient space for the longest response.
Do not include any specific determiners
(clues) such as an/a.
If the problem requires a numerical answer,
indicate the units in which it is to be
expressed, e.g. kg, m, km etc.
2. Selection type items
o True/False Test Items
o Matching Items
o Multiple-Choice Items
True/False Test Items
True/false items are constructed in
the form of declarative statements.
Advantages of true/false items:
Do not require much time from the student for
answering.
Allows a teacher to cover a wide range of
content by using a large number of such items.
It can be scored quickly, reliably, and
objectively by anybody using an answer key.
Disadvantages of true/false items
Encourage students to guess.
Can often lead a teacher to write
ambiguous statements.
Do not discriminate between students of varying
ability.
Can often include more irrelevant clues.
Can often lead a teacher to favor testing of
insignificant knowledge.
Suggestion to construct good true/false item
1. Avoid negative statements, and never use
double negatives.
2. Avoid the use of specific determiners like
always, never, seldom, generally,
sometimes, usually, all, only, none, etc.
E.g. (a) All mosquitoes cause malaria (poor)
(b) Malaria is caused by female
mosquitoes (better)
3. Avoid long, complex sentences
4. Avoid including two ideas in one
statement
E.g. (a) Dinsho national park is found in Bale,
while Awash park is found in Wolaita (poor)
(b) Dinsho national park is found in
Bale (better)
5. Attribute opinions to a source or authority
E.g. (a) Human cognitive development is divided into
four stages (poor)
(b) According to J. Piaget, human cognitive
development is divided into four stages (better).
6. Avoid directly copying from a text or note
book because it encourages rote
memorization rather than higher
order learning
7. The statements should not be all true or
all false
8. Avoid presenting answers in a manner
forming a pattern,
Example: TFTF (poor), TTFF (poor), TTFTFFT
(better)
Matching Items
Used to assess homogeneous concepts
The left side presents the premises (A), while the
right side presents the responses (B).
NB: They are a good choice if you’re interested
in finding out if your students have memorized
factual information.
Relates two things which have some logical
basis for association
Useful for testing student’s knowledge of
terms, definitions, dates, events, names,
principles and so on.
Advantages
Easy to construct and score
Make it possible to measure wide areas in a short
period of time.
Require short period of reading and
response time
Allowing you to cover more content
Efficiency in time and space
Suitable to measure student’s ability to
identify association between two things
Limitations of matching test items
Do not measure complex learning outcomes
Difficulty of finding homogeneous materials
Highly susceptible to guessing
Suggestions for constructing matching tests
1. Use only homogeneous materials
2. Give clear and informative directions.
o For example, whether the responses could be
used twice or not.
3. Use short lists of premises and responses
4. Use relatively longer statements in the
premises and shorter ones in the responses
list
5. Arrange responses in some systematic
order. For example, dates should be in
chronological order
6. Arrange all items to be on the same page
7. Vary the position of answers
8. Avoid including many questions
9. Better if the number of responses exceeds that
of premises
Multiple-Choice Items
Most commonly used type of objective test
Can measure different levels of learning outcomes, ranging
from simple to complex
It consists of four things:
1. Item’s stem: either a question or a
partially complete statement.
2. Alternatives, choices or options
3. Correct response: is called the key
answer;
4. Distracters: the remaining alternatives.
o The stem presents the problem or
question; it can be stated as an incomplete statement or in direct
question form
o Alternatives present the options.
o Wrong alternatives are distracters or foils
o The function of distracters is to distract
examinees who do not know the correct
answer
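The four parts of a multiple-choice item can be pictured as a small data structure. This is a sketch for illustration: the class and field names are assumptions, and the sample item is a hypothetical one built from the cognitive-domain levels covered earlier.

```python
from dataclasses import dataclass


@dataclass
class MultipleChoiceItem:
    stem: str            # the question or partially complete statement
    alternatives: list   # all options shown to the examinee
    key_index: int       # position of the correct response in `alternatives`

    @property
    def key(self):
        """The correct response (key answer)."""
        return self.alternatives[self.key_index]

    @property
    def distracters(self):
        """The remaining, incorrect alternatives."""
        return [a for i, a in enumerate(self.alternatives) if i != self.key_index]


item = MultipleChoiceItem(
    stem="Which level of the cognitive domain involves combining "
         "components to form a new whole?",
    alternatives=["Analysis", "Application", "Synthesis", "Evaluation"],
    key_index=2,
)
print(item.key)          # Synthesis
print(item.distracters)  # ['Analysis', 'Application', 'Evaluation']
```

Storing only the key's position keeps the key and the distracters consistent: every alternative is either the one or the other.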
Advantages
o Can measure variety of learning out comes
(simple to complex)
o Provide a wide variety of course or subject
content
Disadvantages
Take more time to construct (finding
plausible distracters takes time)
Do not measure the ability to organize ideas
Do not adequately measure
problem-solving skills
Susceptible to guessing
Suggestions for constructing
multiple-choice items:
1. The stem should present only a single
problem in a direct and clear way
2. Avoid difficult vocabulary and unnecessarily
complex sentence structure
3. As much as possible, avoid the use of
negatives. If used, negatives
must be underlined, boldfaced or
italicized
4. Do not make the answer to one question depend on another question
5. Avoid asking for personal opinions
6. Try to use three to five alternatives
7. Avoid using “All of the above”, because it is
usually the correct answer and makes the item too
easy for students with partial information.
8. Avoid repetition of words in the alternatives
9. Avoid verbal association between the stem
and the correct answer.
Example (poor): An instrument used to determine the direction
of wind is:
A. Rain gauge
B. Barometer
C. Wind vane
(The word “wind” in both the stem and option C gives the answer away.)
10. Avoid grammatical clues
Example (poor): The animal used for agricultural
purposes is an:
A. Goat B. Ox C. Cow
(Only “Ox” fits grammatically after “an”.)
11. The correct answer and the distracters
should be of about equal length
12. Avoid overlapping responses
Example (poor): What is the estimated population of
Ethiopia?
A. Over 65 million
B. Less than 90 million
C. Millions
Example (better): As of 1993 E.C., the estimated
population of Ethiopia was
about:
A. 65 million
B. 61 million
C. 69 million
13. Avoid specific quantifiers (such as none,
only, always, etc.)
Essay (subjective) test items
o Require the student to provide a response to a
question for which no single simple response can be
cited
o Are judged by the person correcting them
o Well-constructed essay items test
complex learning outcomes.
o Permit freedom of response to the
candidates
o According to the degree of freedom allowed, essay tests can
be divided into two categories:
A. extended response
B. restricted response
A) Extended response test items
o The candidates have freedom to
determine the length and
the complexity level of their responses
o Most useful at the analysis, synthesis or
evaluation levels of cognitive complexity
o Take more time, and are sometimes better
asked as term papers, assignments and
take-home tests.
o Useful in assessing communication ability
B) Restricted response items
o The candidates have limited freedom in the way of
responding.
o The length of the responses and other
directions are given by the examiner.
o The candidates are required to:
- recall proper information, organize it in
a suitable manner, derive conclusions,
etc.
o Example: Write the major differences between
matching and multiple-choice items, in about
one page.
When to use essay questions?
1. When the instructional objectives specify
high-level learning outcomes that
cannot be assessed by objective tests.
2. When the number of candidates is small (for
small class sizes)
3. When a teacher has relatively more time
to correct the papers than to write the test
4. When test security is a consideration (to
minimize cheating and guessing)
Advantages
o Preparation takes relatively less time
o Assess higher-order learning outcomes
o Discourage rote memorization and direct students
to further understanding of the subject matter
o Assess a variety of skills
Limitations
o Correction takes more time
o Correction is subjective
o Limit the number of questions (cannot cover
much content)
Suggestions for writing essay items
1. Have clearly in mind what learning
outcome or mental process is to be
assessed
E.g., refer to the taxonomies of learning outcomes
2. State the question clearly and define the task
unambiguously for the student
3. Begin with such words or phrases as
compare and contrast, give
reasons for, predict what would
happen if, and so on.
4. Do not begin with such words as what,
who, when and list, because these
words ask only for recall of
information
5. Avoid presenting optional questions, e.g.
“Answer 3 of 5 questions” (poor),
because all students must be evaluated
on the same questions
6. Set reasonable time and page
limits for each item.
CHAPTER FIVE
Assembling, Administering, Scoring and Analyzing
Classroom Test Items
5.1. Assembling
Preparing the test early and constructing extra
test items is important because it makes it easier
to:
- Review the items
- Select and arrange the items for final
use.
Extra items make it easy to eliminate those
items found to be defective (imperfect).
Assembling a test involves:
- Recording, reviewing, arranging, writing
directions, and reproducing
5.1.1. Recording of Test Items
o Write each item on a separate index card, or
use a computer, so that individual items can be
stored and retrieved.
The card should contain information on:
o The instructional objective
o The specific learning outcome
o The content to be measured.
Placing each item on a separate card
makes it possible to:
- Review, edit, eliminate, add, or revise items
with very little difficulty (by sorting the cards).
5.1.2. Reviewing test items
Ask fellow teachers to review and
criticize the items.
The review should focus on:
The appropriateness of the format for the
learning outcome being measured, such as
true/false, matching, multiple-choice,
…
Whether the thinking skill matches the specific
learning outcome and content of the
subject matter.
Whether the point of the item is clear
The expected difficulty level of the item.
Whether there is one correct/best answer on which experts
agree.
Whether the item is free from irrelevant
clues and from racial (color), ethnic and
gender bias.
5.1.3. Arranging test items
Items should be arranged before they are
administered.
The arrangement is made by systematic consideration of:
The type of item used
The learning outcomes
The difficulty of the items
The subject matter
The arrangement should be from simple
to complex:
items that measure simple recall should precede
those that measure understanding and
application.
According to Gronlund (1985), item formats can be
arranged as:
True/False or alternative response test items,
Matching test items,
Short answer test items,
Multiple-choice test items, and
Essay test items.
Test items should be arranged from
simple to complex.
Beginning with easy items assists pupils
by raising their:
- motivation
- confidence
- readiness to work on the remaining items that
follow.
o For example, the items in the multiple-choice
section might be arranged in the following order:
knowledge of terms
knowledge of specific facts
knowledge of principles
application of principles
N.B.: items dealing with the same
instructional objective should be
arranged together.
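The arrangement rule above (by item format, then from easy to difficult) can be sketched in a few lines of Python. This is only an illustration: the item records, the key names 'id', 'format' and 'difficulty', and the teacher-estimated difficulty scale are assumptions, not part of the course material.

```python
# Order of item formats attributed to Gronlund (1985) in the text above.
FORMAT_ORDER = ["true_false", "matching", "short_answer", "multiple_choice", "essay"]

def arrange_items(items):
    """Sort items by format, then from easy to difficult within each format.

    Each item is a dict with hypothetical keys: 'id', 'format' and
    'difficulty' (a teacher's estimate, where 1 = easiest).
    """
    return sorted(
        items,
        key=lambda it: (FORMAT_ORDER.index(it["format"]), it["difficulty"]),
    )

items = [
    {"id": "Q1", "format": "essay", "difficulty": 3},
    {"id": "Q2", "format": "multiple_choice", "difficulty": 2},
    {"id": "Q3", "format": "true_false", "difficulty": 1},
    {"id": "Q4", "format": "multiple_choice", "difficulty": 1},
]
print([it["id"] for it in arrange_items(items)])  # ['Q3', 'Q4', 'Q2', 'Q1']
```

Sorting on a (format, difficulty) pair keeps items of the same format together, which matches the advice to group similar items.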
5.1.4. Writing test directions
According to Gronlund, as cited in Linn and
Miller (2005), written or oral test
directions (or both) should include:
Purpose of the test
Time allowed to complete the test
Value of each test item
Directions for responding
How to record the answers
What to do about cheating or misbehavior, etc.
The amount of detail devoted to each of these
points mainly depends on:
Age level of learners
Comprehensiveness of the test
Complexity of the test items and
the experience of the learners with the testing
procedures used.
A. Purpose of the test
The purpose of the test should be expressed in the test
directions.
The purpose of an exam might be:
- diagnostic
- formative
- summative evaluation, etc.
B. Time allotted for completing the test
This enables students to distribute and use
their time effectively and to avoid spending too much time
on a few test items.
In determining the amount of time required to complete a test,
one has to take into consideration:
- the type of test items used
- the age and ability of the pupils
- the complexity of the learning outcomes to be
measured
As a rough guide:
The average high school pupil should be
able to answer two true-false items, one
multiple-choice item or one
short-answer item per minute of testing time.
However, elementary school pupils
generally require much more time per
item than high school pupils.
This is due to the reading skill level
of the pupils.
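The rough guide above can be turned into a quick time estimate. The following Python sketch assumes an average high school class and uses only the per-minute rates stated in the guide; the function name and the item-type labels are illustrative assumptions.

```python
from math import ceil

# Items an average high school pupil answers per minute (from the rough guide above).
ITEMS_PER_MINUTE = {"true_false": 2, "multiple_choice": 1, "short_answer": 1}

def estimate_minutes(item_counts):
    """Return the estimated testing time in whole minutes, rounded up."""
    total = 0.0
    for item_type, count in item_counts.items():
        total += count / ITEMS_PER_MINUTE[item_type]
    return ceil(total)

# 20 true-false + 25 multiple-choice + 5 short-answer items
print(estimate_minutes({"true_false": 20, "multiple_choice": 25, "short_answer": 5}))  # 40
```

For elementary pupils the estimate would need to be increased, since the guide notes that they require much more time per item.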
C. Directions for responding
The test writer should clearly indicate how
pupils are to respond to the items.
Directions for true-false or multiple-choice items
can be relatively simple compared to essay
questions, which frequently require special
directions concerning the expected answer.
D. Procedure for recording the answers
Whether to use the test form or separate answer sheets
for recording the answers depends on the:
- Length of the test
- Number of examinees
- Age of the pupils
The answers are generally recorded
directly on the test paper if the test is:
- short and given to a small group of pupils.
Separate answer sheets are preferred
because they:
- reduce the time needed for
scoring
- make it possible to use the test
papers over again.
5.1.5. Reproducing the test
Items should be spaced and arranged so that they:
- can be read clearly
- can be scored with the least amount of difficulty
- have generous borders.
When an interpretive exercise is used, the introductory
material (be it a graph, chart or diagram) and the
items based on it should be on the same page.
Moreover,
The space for answers should be placed on the left
side of the page.
Items should be numbered consecutively
throughout the test
The entire test should be proofread
before it is distributed. Charts, graphs and
other pictorial material must be checked especially
carefully to be certain that the reproduction is
accurate and the details are clear.
5.2. Administering the test
The most important thing in administering
any classroom test or assessment is that:
- all students must be given a fair
chance to demonstrate their
achievement.
This means the physical and
psychological environment should be
conducive to their best efforts and should not
interfere with students’ performance
5.2.1. Physical Conditions
Physical conditions are the physical setting
of the environment, such as:
- Adequate space and seating in the room
- Appropriate light
- Fresh air (comfortable temperature)
- Freedom from noise (lunch period, noise of
band practice and others) and the like
- Clearly printed test papers and
an adequate number of papers.
5.2.2. Psychological conditions
Psychological conditions are concerned with
the learners’ psychological state, such as:
The examinees should be as relaxed as
possible
Students should be informed in advance that there will be a
test
Adequate time should be provided to
complete the test
Do not give a test during pleasant activities of
the examinees such as lunch time, sport
season and the like.
Do not give a test immediately before or
after a long vacation or a big public
holiday.
Try to establish a positive mental attitude,
because tests
induce anxiety, and test anxiety
affects optimum performance.
Do not communicate at all once the
examination has started, and avoid
unnecessary interruptions
Do not compromise on cheating.
5.2.3. Practices that do not
match with test administration
i) Avoid giving hints to pupils who ask about
individual items
ii) Avoid doing other activities during test
administration
5.3. Scoring and recording test items
Scoring procedures depend on the types of items used.
Scoring involves comparing each student’s response to the
key.
If spelling, grammar and the like need to be
considered, they should be given a separate score.
For objective items, all the items should have
equal weight.
5.3.1. Scoring essay/workout type
test items
Use an appropriate method to minimize bias
Pay attention to the significant and relevant
aspects of the answer.
Prepare an outline of the expected answer in
advance.
Apply a uniform standard to all papers in order
not to be affected by unique individual
responses
Grade items one at a time.
Evaluate the answer without looking at the
pupil’s name
Two methods of scoring essay tests:
(a) Point (Analytical) Scoring Method
Here, a teacher prepares key answers
before starting to score the test papers.
Students’ answers are compared with the
teacher’s key answers, and
numerical points are given.
It is most appropriate for correcting
restricted response essay questions.
(b) Rating (Global or Holistic)
Scoring Method
In this case, the teacher uses his or her general
impression of students’
answers to judge their quality,
without referring to any key answers.
When global scoring is used, the answer
should be rated for the:
- organization of ideas
- comprehensiveness of the answers
- relevance of the ideas, coherence of
ideas and the like.
[Link]
Student, parent, administrators, counselor,
teachers, and college admission officers.
Analyzing and Evaluating Test
Item Analysis
1. Rank the test papers in order from the
highest to the lowest score
2. Divide the test papers into two, an upper
group and a lower group, on the basis of
their results or scores.
Then take 27% of the test papers for each of
the upper group (high achievers) and the lower
group (low achievers).
3. For each test item, count or tabulate the
number of pupils in the upper and lower
groups who selected each alternative
Chapter Six:
Item Analysis
HOW AND WHY DO WE ANALYZE TEST ITEMS?
Item Analysis
It is the process of examining or
analyzing testees’ responses to each item
on a test, with the basic aim of
judging the quality of the item.
It helps to determine the adequacy of
the items within a test as well as the
appropriateness of the test itself.
Reasons for analyzing
Identify content that has not been
adequately covered and should be re-taught
Provide feedback to students,
Determine if any items need to be
revised in the event they are to be used
again or become part of an item file or bank,
Identify items that may not have
functioned as they were intended,
Direct the teacher's attention to
individual student weaknesses.
Item analysis procedures
Rank the papers in order from the highest
to the lowest score
Select the one-third of the papers with the
highest total scores and the one-third
of the papers with the lowest total scores
For each test item, tabulate the number of
students in the upper and lower groups who
selected each option
Compute the difficulty of each item (% of
students who got the item right)
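The first two steps of the procedure (ranking the papers and selecting the upper and lower groups) can be sketched in Python as follows. The student labels and scores are made-up illustration data, and the function name is an assumption; the group fraction is left as a parameter because both 27% and one-third are mentioned in this course.

```python
def split_groups(scores, fraction=0.27):
    """Rank students by total score and return (upper, lower) groups.

    scores: dict mapping a student label to a total test score.
    fraction: share of papers placed in each group (0.27 or 1/3).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
    n = max(1, round(len(ranked) * fraction))              # papers per group
    return ranked[:n], ranked[-n:]

# Hypothetical class of ten students
scores = {"A": 48, "B": 45, "C": 41, "D": 39, "E": 35,
          "F": 30, "G": 28, "H": 22, "I": 19, "J": 12}
upper, lower = split_groups(scores)
print(upper)  # ['A', 'B', 'C']
print(lower)  # ['H', 'I', 'J']
```

The responses of the students in these two groups are then tabulated item by item.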
Item difficulty level index
It is a measure of the proportion of
examinees who answered the item
correctly; for this reason it is frequently
called the p-value.
If scores from all students in a group are
included the difficulty index is simply the
total percent correct.
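Item difficulty index can be
calculated using the following
formula
$$P = \frac{R}{T} \times 100 \quad \text{or} \quad P = \frac{U + L}{T} \times 100$$
where R is the number of students who answered the item correctly, T is the total number of students considered, and U and L are the numbers of students in the upper and lower groups who answered the item correctly.
o Expressed as a percentage
o P < 25% = relatively difficult
o P > 70% = relatively easy
o P between 20% and 80% = average (ideally about 50%)
o Ranges from 0 to 100% (or from 0 to 1 as a proportion).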
As a rule of thumb, in terms of the difficulty
index “P” expressed as a proportion:
o If P is greater than 0.70, the item is very easy
o If P is between 0.40 and 0.70, the item is of
moderate difficulty and desirable
o If P is below 0.40, the item is very difficult.
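A minimal Python sketch of the difficulty index and the rule of thumb above is given below. The function names and the example counts (8 correct in the upper group, 4 in the lower group, 20 students in the two groups) are assumptions made for illustration.

```python
def difficulty_index(upper_correct, lower_correct, total_in_groups):
    """Return P as a percentage: P = (U + L) / T x 100."""
    return 100 * (upper_correct + lower_correct) / total_in_groups

def describe_difficulty(p_percent):
    """Label an item using the rule of thumb above (0.40 to 0.70 is desirable)."""
    p = p_percent / 100
    if p > 0.70:
        return "very easy"
    if p >= 0.40:
        return "moderate difficulty (desirable)"
    return "very difficult"

p = difficulty_index(upper_correct=8, lower_correct=4, total_in_groups=20)
print(p, describe_difficulty(p))  # 60.0 moderate difficulty (desirable)
```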
An item with a P value of zero (0) or a P value of one
(1) does not contribute to measuring
individual differences.
When the test items are extremely difficult, the
majority of test scores will be very low
When the test items are extremely easy, the
majority of test scores will be very high.
Hence, in both cases the test is less reliable.
Thus, extreme “P” values directly restrict the
variability of test scores.
Item discrimination index
It is a numerical indicator that enables us to
determine whether the question
discriminates appropriately between
lower scoring and higher scoring
students.
When students who earn high scores are
compared with those who earn low
scores, we would expect to find more
students in the high scoring group answering
a question correctly than students from the
low scoring group.
Item discrimination index can be
calculated using the following
formula
$$D = \frac{U - L}{\frac{1}{2}T}$$
where U and L are the numbers of students in the upper and lower groups who answered the item correctly, and T is the total number of students in the two groups.
The interpretation of an item in relation to
item discrimination power can be seen as
follows:
The discrimination index ranges from negative one (-1) to
positive one (+1)
An item has maximum positive discriminating
power if all pupils from the upper group got the
item right and all from the lower group missed it.
An item has zero discriminating power if all pupils
from both the lower and the upper group got the
item right or all missed it.
The item has negative discriminating power if more
pupils from the lower group than from the upper group
got the item right.
Generally,
The higher the discrimination index, the better the item
is.
An item is considered to have average
discriminating power if its index is close to 0.5.
An item with maximum positive discriminating power
has an index of one (1): all pupils in the upper group got
the item right and all the pupils in the lower group got
the item wrong.
An item is considered acceptable if its discrimination
index is 0.30 or higher.
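The discrimination index can be computed with a short Python function. This is a sketch under the assumption that the upper and lower groups are of equal size, so that half of the total T is the size of one group; the function name and the example counts are illustrative.

```python
def discrimination_index(upper_correct, lower_correct, total_in_groups):
    """Return D = (U - L) / (T / 2), assuming two groups of equal size."""
    return (upper_correct - lower_correct) / (total_in_groups / 2)

# Two groups of 10 pupils each (T = 20)
print(discrimination_index(10, 0, 20))  # 1.0  (maximum positive discrimination)
print(discrimination_index(8, 4, 20))   # 0.4  (acceptable: 0.30 or higher)
print(discrimination_index(5, 5, 20))   # 0.0  (no discrimination)
print(discrimination_index(3, 5, 20))   # -0.2 (negative discrimination)
```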
Effectiveness of Distracters in
multiple-choice items
A distracter analysis evaluates the
effectiveness of the distracters in each item
by comparing the number of students
in the upper and lower groups who
selected each incorrect alternative (a
good distracter will attract more students
from the lower group than from the upper
group).
Group   Alternatives (* = correct answer)
        *A   B   C   D
Upper    5   4   0   1
Lower    3   2   0   5
The following comments can be made:
The item has a positive discrimination index, since 5 pupils in the
upper group and 3 in the lower group got the item right.
Option “B” is a poor distracter because it attracts more
pupils from the upper group than from the lower group. This
may be due to some ambiguity in the statement of
the item.
Option “C” is completely ineffective as a distracter
because it attracted no one.
Option “D” is functioning as intended:
it attracted a larger proportion of pupils from the lower group.
One may improve the low discriminating power of the item by
removing any ambiguity in the statement of the item and
revising or replacing alternatives “B” and “C”.
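The same comments can be reproduced with a small Python sketch applied to the counts in the table. The function name and the wording of the labels are assumptions; the decision rule is simply the one stated in the text (a good distracter attracts more pupils from the lower group than from the upper group).

```python
def analyze_distracters(upper, lower, key):
    """Classify each incorrect alternative from upper/lower group counts."""
    comments = {}
    for option in upper:
        if option == key:
            continue  # the correct answer is not a distracter
        u, l = upper[option], lower[option]
        if u == 0 and l == 0:
            comments[option] = "ineffective (attracted no one)"
        elif l > u:
            comments[option] = "functioning as intended"
        else:
            comments[option] = "poor (does not attract more lower-group pupils)"
    return comments

# Counts from the table above; "A" is the correct answer
upper = {"A": 5, "B": 4, "C": 0, "D": 1}
lower = {"A": 3, "B": 2, "C": 0, "D": 5}
result = analyze_distracters(upper, lower, key="A")
for option, comment in result.items():
    print(option, "->", comment)
```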
Uses of item analysis
Helps to judge the worth or quality of a test
Used in revising tests
Used to build a test file for future tests
Can lead to increased skill in test construction
Provides diagnostic value and helps in planning
future learning activities
Provides a basis for discussing test results.
Interpreting item analysis data
In interpreting item analysis data we have to
consider the following points.
Item analysis data are not analogous to item
validity. When we want to determine the validity
of a test, we have to use some external criterion.
The discrimination index is not always a
measure of item quality. We cannot
automatically conclude that an item with a low
discrimination index is poor and should be
discarded.
Items with a low but positive discrimination index
could be kept for a mastery test.
An item may have a low discrimination index because:
1. The more difficult or easy an item is, the lower its
discrimination power.
2. The purpose of the item in relation to the total test will
influence the magnitude of its discrimination power.
Item analysis data are tentative. The data are influenced
by
The nature of the group being tested
The number of students tested
The instructional procedure employed
Chance (error) and
The position of the item in the test.
Avoid selecting test items purely on the basis
of their statistical properties. Rather, choose
on the basis of difficulty and discrimination
power.
Statistical features can be affected by several
things such as:
1. Guessing: item difficulty can be affected
by guessing
2. The position of the correct answer among the alternatives
and the serial location of the item in the test.
Individual Assignment 2
By formulating the results of one item for the upper and lower
scoring students, calculate:
1. The item difficulty index.
2. The item discrimination index.
3. The effectiveness of the distracters.
Definition of Reliability
Reliability can be defined as the degree of consistency
between two measures of the same thing.
Approaches that can be used to estimate
reliability
Measures of stability
Measures of equivalence
Measures of equivalence and stability
Measures of internal consistency
Scorer (judge) reliability
Factors Influencing Reliability
Equivalent Forms,
Test-retest,
Test Length,
Speed,
Group Homogeneity,
Difficulty of Items, and
Objectivity.
Validity
It is the extent to which certain inferences can be made
accurately from, and certain actions should be based on,
test scores or other measurements.
It is the appropriateness, meaningfulness, and usefulness
of the specific inferences made from test scores.
Test validation is the process of accumulating evidence to
support such inferences
Kinds of Validity
Content validity
Criterion-related validity
Construct validity
Factors that influence validity
Unclear directions
Inappropriate level of difficulty
Poorly constructed items (clues to items)
Test items inappropriate for the objectives being measured
Improper arrangement of items
Identifiable pattern of answers
Cheating in exams, emotional disturbance of examinees
The end!