U.S.

Department of Education
November 2016

Early Implementation Findings

From a Study of Teacher and
Principal Performance Measurement
and Feedback: Year 1 Report
Executive Summary

Andrew J. Wayne
Michael S. Garet
Seth Brown
Jordan Rickles
Mengli Song
David Manzeske
American Institutes for Research

Melanie Ali
Project Officer
Institute of Education Sciences

NCEE 2017-4003
U.S. DEPARTMENT OF EDUCATION
U.S. Department of Education
John King
Secretary

Institute of Education Sciences

Ruth Curran Neild
Deputy Director for Policy and Research
Delegated Duties of the Director

National Center for Education Evaluation and Regional Assistance

Joy Lesnick
Acting Commissioner

November 2016

This report was prepared for the Institute of Education Sciences under Contract ED-IES-11-C-0066.
The project officer is Melanie Ali in the National Center for Education Evaluation and
Regional Assistance.

IES evaluation reports present objective information on the conditions of implementation and
impacts of the programs being evaluated. IES evaluation reports do not include conclusions or
recommendations or views with regard to actions policymakers or practitioners should take in
light of the findings in the report.

This report is in the public domain. Authorization to reproduce it in whole or in part is granted.
While permission to reprint this publication is not necessary, the citation should be:

Wayne, A. J., Garet, M. S., Brown, S., Rickles, J., Song, M., & Manzeske, D. (2016). Early
Implementation Findings From a Study of Teacher and Principal Performance Measurement
and Feedback: Year 1 Report, Executive Summary (NCEE 2017-4003). Washington, DC:
National Center for Education Evaluation and Regional Assistance, Institute of Education
Sciences, U.S. Department of Education.

This report is available on the IES website at [Link]

Alternate Formats: Upon request, this report is available in alternate formats, such as Braille,
large print, audiotape, or computer diskette. For more information, please contact the
Department’s Alternate Format Center at 202-260-9895 or 202-205-8113.
Acknowledgments
This study was a collaborative effort and involved a diverse group of partners. We were fortunate
to have had the advice of our expert technical working group. Members included Thomas Cook,
Northwestern University; Thomas Dee, Stanford University; Laura Goe, Educational Testing
Service; Laura Hamilton, RAND; Daniel McCaffrey, Educational Testing Service; Catherine
McClellan, Clowder Consulting; Jonah Rockoff, Columbia University; Carla Stevens, Houston
Independent School District; John Tyler, Brown University; and Judy Wurtzel, Charles and Lynn
Schusterman Foundation.

We would also like to thank all those who provided the teacher and principal performance
feedback systems and training, including the organizations that supported the implementation of
the Classroom Assessment Scoring System (the University of Virginia and Teachstone), the
Framework for Teaching (the Danielson Group and Teachscape), and the VAL-ED (Discovery
Education). We appreciate the willingness and commitment of the school district leaders,
schools, principals, study-hired observers, and teachers to implement the intervention and data
collection activities, which involved a significant amount of time and energy.

We are also grateful to the AIR staff who worked diligently to coordinate the study’s
performance feedback activities in participating districts: Rebecca Herman, Fran Stancavage,
Matthew Clifford, Mariann Lemke, Susan Ward, Carmen Martinez, Muna Shami, Ben Kalina,
Marlene Darwin, Carla Hulce, Nicole Christie, Debbie Davidson-Gibbs, Mark Garibaldi, Jessica
Milton, Elaine Liebesman, Amy Potemski, Roshni Menon, Marian Eaton, Gur Hoshen, Zhongjie
Sun, and Michele Cadigan. Additional AIR staff worked tirelessly on data collection: Dorothy
Seidel, Lauren Staley, Cheryl Pruce, Sara Bardack, Makeda Amelga, and Lindsey Mitchell. For
their efforts to identify the partner districts, we thank the recruitment leaders Anja Kurki and
Rebecca Herman and the team of senior recruiters: Kirk Walters, James Taylor, Marlene Darwin,
Nicholas Sorensen, Mark Garibaldi, Carmen Martinez, Nicol Christie, Kathleen Perez-Lopez,
and Emily Rosenthal. The study authors are also grateful to Rachel Garrett, Jenifer Harr-Robins,
Luke Keele, and Paul Bailey for their help with data analyses. Finally, we are grateful to Connie
Conroy, who provided administrative assistance throughout the project.

Disclosure of Potential Conflicts of Interest
The research team consisted of staff from American Institutes for Research (AIR). None of
the research team members has financial interests that could be affected by findings from the
Early Implementation Findings From a Study of Teacher and Principal Performance
Measurement and Feedback. No one on the 10-member technical working group, convened by
the research team three times to provide advice and guidance, has financial interests that could be
affected by findings from the evaluation.

Executive Summary
Educator performance evaluation systems are a potential tool for improving student
achievement. 1 By removing ineffective teachers and principals and/or increasing the
effectiveness of the existing workforce, such systems may result in higher student achievement. 2

Emerging research suggests some promising features of performance evaluation measures. For
example, research suggests that observing the same teacher more than once yields a more
reliable measure of the teacher’s typical classroom practice, especially when more than one
observer is used. 3 There is also some evidence from recent
research that giving more frequent, specific feedback on classroom practice may lead to
improvements in teacher performance and student achievement. 4

The U.S. Department of Education’s Institute of Education Sciences is conducting a study on the
implementation and impacts of teacher and principal performance measures that are consistent
with emerging research. As part of the study, eight districts were provided resources and support
to implement the following three performance measures in a selected sample of schools in
2012-13 and 2013-14:
• a measure of teacher classroom practice with subsequent feedback sessions conducted
four times per year, based on a classroom observation rubric;
• a measure of teacher contributions to student achievement growth (i.e., value-added
scores), provided to teachers and their principals once per year; and
• a measure of principal leadership with subsequent feedback sessions conducted twice per
year, using a leadership survey administered to the principal, the principal’s supervisor,
and the principal’s teachers. 5

No formal “stakes” were attached to the measures – for example, they were not used by the study
districts for staffing decisions such as tenure or continued employment. 6 Instead, the measures
were used to provide educators and their supervisors with information regarding performance.
Such information might affect perceptions about performance, motivate improvement, and/or
lead to improved knowledge and skills among educators.

1 See Stecher et al. (2016); Weisburg, Daniel, Sexton, Mulhern, and Keeling (2009).
2 Researchers studying a range of educator workforce interventions theorize that there are two key mechanisms
leading to improved student achievement: changes in the composition of the workforce and changes in the skills of
continuing workers. See, for example, Taylor and Tyler (2012) regarding teacher evaluation, Chiang et al. (2015)
regarding performance-based compensation, and Glazerman et al. (2010) regarding comprehensive teacher
induction.
3 See Bill & Melinda Gates Foundation (2012); Whitehurst, Chingos, and Lindquist (2014).
4 See Steinberg and Sartain (In Press); Taylor and Tyler (2012).
5 These measures are the kinds of measures emphasized through the Elementary and Secondary Education Act
flexibility waivers, as well as federal programs such as Race to the Top and the Teacher Incentive Fund. They also
are the kinds of measures that states are allowed to develop and support with Title II, Part A funds under the Every
Student Succeeds Act of 2015.
6 There were exceptions in three districts. In these districts, the observations conducted by principals as part of this
study counted in their official rating system if the teacher was due to be observed that year under the district’s
existing evaluation system.

This report focuses on the first year of the two years of implementation, describing the
characteristics of the educator performance measures and teachers’ and principals’ experiences
with performance feedback. It is one of the few detailed descriptions of educator performance
measures implemented on a large scale in districts. 7 The final report will primarily focus on
impacts on outcomes including principal leadership, teacher classroom practice, and student
achievement but will also include descriptive information on the second year of implementation.

The main findings in this report are as follows:

• Educator performance measures were implemented generally as planned,
except that fewer than the intended number of educators accessed the
student growth reports. Most teachers received the intended four rounds of
observations and feedback sessions (mean = 3.8 observations and 3.7 feedback sessions
per teacher) by trained and certified observers. Only 40 percent of principals and 39
percent of teachers with value-added scores accessed their student growth reports. All
principals in the implementing schools received two reports about their leadership
practices based on their survey responses and those of their teachers and their supervisor.
They also met with their supervisors to discuss their reports, in both fall and spring.
• Both classroom observation and student growth measures differentiated
teacher performance, but observation scores were skewed toward the
upper end of the scale. In each classroom observation window, a large majority of
the teachers observed had classroom observation overall scores in the top two
performance levels (more than 85 percent), and very few teachers had overall scores in
the lower two levels (less than 15 percent). However, there was some variation in scores
across teachers, and both value-added scores and average classroom observation scores
over the year had sufficient reliability to capture performance differences among some
teachers. About one quarter of the teachers with value-added scores in reading/English
language arts and one half of the teachers with value-added scores in mathematics had
student growth reports indicating that their score in that subject was measurably below or
above the district average. 8
• The principal leadership measure differentiated performance, but there was
limited consistency in scores across survey respondent groups. Principals’
overall leadership scores were spread across the four performance levels. Half or more of
principals in the schools selected to implement the intervention received overall scores on
the principal leadership measure in the lower two categories labeled basic or below basic
(70 percent in the fall and 51 percent in the spring administration). However, the
respondent groups (principals, teachers, and principal supervisors) often scored principals
differently.
• Both teachers and principals in schools selected to implement the
intervention reported receiving more feedback than those in schools in the
same districts selected to continue with business-as-usual. For example,
teachers in intervention schools reported more feedback sessions that were accompanied
by a rating and written narrative (3.0 versus 0.7 sessions). 9 They also reported spending
more total time in feedback sessions (80 minutes versus 18 minutes). Principals in
intervention schools reported more instances of receiving feedback with ratings (1.0
versus 0.4) and spending more total time in feedback sessions (60 minutes versus 41
minutes).

7 For an example, see Lipscomb, Terziev, and Chaplin (2015).
8 To test whether teachers’ value-added scores differed from the district average, we used an 80 percent confidence
interval. There were two reasons for the use of an 80 percent rather than a 95 percent confidence interval more
typical of statistical tests. First, the student growth report available to principals and teachers in the intervention
schools included the score with an 80 percent confidence interval. As part of the student growth report training,
educators were told to use this confidence interval to determine if their score differed from average performance in
the district. Second, the value-added scores were used for informational purposes and not decisions with
consequences for employment, for example. Given this, the 80 percent confidence interval was intended to balance
the risks associated with mistakenly classifying average performers as above or below average with the risks
associated with mistakenly classifying above or below average performers as average.
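
The 80 percent confidence interval rule described in footnote 8 can be illustrated with a minimal sketch. It assumes a normally distributed estimate and a symmetric interval of the form score plus or minus z times the standard error; the function name, the example score, and the standard error are hypothetical and are not taken from the study's student growth reports.

```python
from statistics import NormalDist


def classify(score, std_error, district_average=0.0, level=0.80):
    """Classify a value-added score relative to the district average.

    Assumption: the estimate is approximately normal, so the interval is
    score +/- z * std_error, with z taken from the standard normal.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.28 when level is 0.80
    lower = score - z * std_error
    upper = score + z * std_error
    if lower > district_average:
        return "above average"
    if upper < district_average:
        return "below average"
    return "not measurably different"


# Hypothetical teacher: score 0.15 with standard error 0.10
print(classify(0.15, 0.10, level=0.80))
print(classify(0.15, 0.10, level=0.95))
```

With these hypothetical inputs, the same score is flagged as above average under the 80 percent interval but not under a 95 percent interval, which is the trade-off between the two kinds of misclassification that footnote 8 describes.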

Study Overview
The purpose of this study is to describe teachers’ and principals’ experiences with the study’s
performance measures and feedback over two years, and to examine whether the information
provided by the measures and feedback affected educator and student outcomes. The study has
five research questions:
1. To what extent were the performance measures and feedback implemented as planned?
2. To what extent did the performance measures distinguish educator performance?
3. To what extent did educators’ experiences with performance feedback differ for
treatment and control schools?
4. Did the intervention have an impact on teacher classroom practice and principal
leadership?
5. Did the intervention have an impact on student achievement?
This report addresses the first three questions, focusing on the first year of implementation. A
subsequent report will address the first three questions focusing on the second year of
implementation. That report will also address the last two questions.

Study Design
The study examines the implementation and impacts of an intervention consisting of three
performance measures with feedback for teachers and principals through an experimental design
conducted in eight purposefully selected districts. We recruited districts that met the following
criteria: (1) had at least 20 elementary and middle schools, (2) had data systems that were
sufficient to support value-added analysis, and (3) had current performance measures and
feedback that were less systematic and intensive than those implemented as part of the study.
Recruited districts required fewer than four observations of teachers per year. In addition, the
districts’ evaluation systems did not require the inclusion of student achievement information in
teacher ratings. None of the recruited districts used a leadership measure similar to that used by
the study.

9 Throughout this executive summary, all treatment-control differences noted are statistically significant using
two-tailed tests and applying a 5 percent critical value, unless noted otherwise.

The study used two different observation measures to make the study findings more broadly
relevant than would be the case if only one measure were used. Four of the eight study districts
used the Classroom Assessment Scoring System (CLASS) and the other four study districts
used Charlotte Danielson’s Framework for Teaching (FFT). The observation rubrics were not
randomly assigned but rather assigned based on district preference. Thus, differences in the study
results in the CLASS and FFT districts cannot necessarily be attributed to the CLASS and FFT
observation systems; differences could occur due to other district characteristics.

Each study district identified a set of regular elementary and middle schools that were willing to
participate in the study. In these schools, the study focuses on the teachers of mathematics and
reading/English language arts in grades 4-8, as well as the principals. 10 The schools were
assigned by lottery to implement the three measures with feedback (the treatment group) or not
(the control group). Both groups continued to implement their district’s existing performance
evaluations and measures, and the treatment group additionally implemented the study’s
performance measures with feedback. In total, 63 treatment schools and 64 control schools
participated in the study.
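
The lottery assignment described above can be sketched as follows. The report says only that schools were assigned by lottery; running the lottery separately within each district, splitting each district roughly in half, and the seed, district names, and school names are all assumptions made for illustration.

```python
import random


def assign_by_lottery(schools_by_district, seed=0):
    """Randomly assign schools to 'treatment' or 'control' within each district.

    Assumptions (not from the report): the lottery is run district by
    district, about half of each district's schools go to treatment, and any
    odd school out goes to control.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    assignment = {}
    for district in sorted(schools_by_district):
        schools = list(schools_by_district[district])
        rng.shuffle(schools)
        half = len(schools) // 2
        for position, school in enumerate(schools):
            assignment[school] = "treatment" if position < half else "control"
    return assignment


# Hypothetical districts and schools
example = {
    "District A": ["A1", "A2", "A3", "A4"],
    "District B": ["B1", "B2", "B3"],
}
result = assign_by_lottery(example, seed=1)
```

Because assignment depends only on the shuffle, school characteristics cannot influence which group a school lands in, which is what allows later treatment-control comparisons to be read as effects of the intervention.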

Consistent with the recruitment criteria, the study districts are larger and more likely to be urban
than the average U.S. district. The study schools were similar to schools in the national
population in terms of enrollment and Title I status, but on average had a higher percentage of
students who were minorities.

Data for this report came from multiple sources as described next.

Data on the implementation of the intervention. We documented attendance at
orientation and training events related to the study’s performance measures. Online system
records maintained by the vendors of the measures were used for information on observer
certification test pass rates, the frequency and timing of teacher observations and feedback
sessions, and teachers’ and principals’ accessing of student growth reports. Surveys of observers
hired by the study and interviews with district officials provided further information regarding
the implementation of the observations and the district context, respectively.

Data on measures of educator performance. Data on measures of teacher classroom
practice, student growth, and principal leadership were collected through the vendors’ online
systems.

Data on educators’ experiences with performance feedback. In spring 2013, we
surveyed both the principals and teachers in all treatment and control schools. These surveys
collected information on the nature and frequency of performance information educators
received and their perceptions of that information.

10 Teachers of Kindergarten through grade 3 also participated in the study. This was done mainly to promote
schoolwide engagement in the implementation of the classroom practice and principal leadership performance
measures. These teachers are not included in the main study analyses, however, because student assessment data are
not available in Kindergarten through grade 3.

Data on the characteristics of study participants. To compare the characteristics of
participants in the treatment and control groups, we collected data on school characteristics from
the 2011–12 Common Core of Data and collected data on principals’, teachers’, and students’
characteristics from district administrative records.

Analyses
To examine the implementation of the performance measures, we describe the extent to which
study participants received the training on the measures, carried out the performance
measurement activities, and received performance information and feedback as planned. We also
examined the characteristics of the ratings teachers and principals received, including whether
they distinguished between lower and higher performers. These analyses yielded the average rating
scores, the percentage of ratings in each performance level, and the variation in the ratings across
teachers and across principals. To assess whether the study’s intervention led to differences in
educators’ experiences with performance measurement and feedback, we compared survey
responses of teachers and principals in the treatment and control groups.

Detailed Summary of Findings

The following section provides additional information about the extent to which each of the
study measures was carried out as intended and whether the information from the measures
distinguished between lower- and higher-performing educators and thus could be used to identify
educators in need of support. These analyses pertain only to teachers and principals in the
treatment schools. This section also highlights the extent to which educators’ experiences with
performance information differed between the treatment and control groups.

The Classroom Practice Measure and Feedback

The teacher classroom practice measure was based on four classroom observations during the
school year. For each teacher, one observation was to be conducted by a school administrator
and the other three by observers hired by the study. After each observation, the observer was
expected to prepare a standard report with both ratings and narrative justification and to discuss
the report with the teacher during a feedback session. Both teachers and their principal had
access to the standard report.

How Many Observations Were Conducted and What Were Observers’ Qualifications?
• Observers were trained and certified as planned. Nearly all observers (92
percent for CLASS and 97 percent for FFT) completed all of the required training, which
lasted three days for CLASS and four days for FFT. All observers passed the certification
test, although it took multiple attempts to pass the test for half of the CLASS trainees and
17 percent of the FFT trainees.

• The majority of teachers were observed the intended four times and
received feedback. The majority of teachers (73 percent for CLASS and 95 percent
for FFT) received all four observations, and the majority of teachers (57 percent for
CLASS and 94 percent for FFT) also received all four feedback sessions, as intended. On
average, teachers received 3.8 observations (3.7 for CLASS and 3.9 for FFT) and 3.7
feedback sessions (3.5 for CLASS and 3.9 for FFT) during the first year of the study.

What Were the Characteristics of the Classroom Practice Performance Information Provided?
• CLASS reports provided separate scores for individual dimensions as well
as the teacher’s overall score and a sense of how their performance
compared with others; FFT reports provided only separate scores for
individual dimensions. The CLASS reports included scores for 12 dimensions of
teaching grouped into four teaching domains, as well as an overall score for the
observation and a score for each domain (emotional support, classroom organization,
instructional support, and student engagement). In addition, the CLASS reports included
comparisons with the district average scores and the teacher’s prior scores. The FFT
reports provided scores for up to 10 dimensions of teaching grouped into two teaching
domains (classroom environment and instruction). The FFT reports did not include an
overall score, domain scores, scores from past observations, or district average scores.
• Most of the CLASS observation reports identified at least one dimension of
classroom practice to improve and illustrated it with an example from the
observation, but less than a quarter of FFT reports did so. The observers were
required to write narrative text identifying at least one dimension of practice as a strength
and one dimension for improvement. The majority of the observation reports (76 percent
of CLASS reports and 71 percent of FFT reports) did so. In addition, three quarters of the
CLASS reports supported the identified dimension(s) for improvement with at least one
example from the observation, but less than a quarter (23 percent) of the FFT reports did
so. 11
• For both CLASS and FFT, observation scores were concentrated at the
upper end of the scale, limiting the degree of differentiation between lower-
and higher-performing teachers. Nearly all teachers had CLASS or FFT overall
scores for a given observation window in the top two performance levels (more than 95
percent of the CLASS scores and more than 85 percent of the FFT scores). Only a small
percentage of the teachers had scores consistent with the lowest two performance levels
(under 5 percent for CLASS and under 15 percent for FFT depending on the observation
window). 12 (See exhibits ES.1 and ES.2.) While most teachers had overall scores in the
top two performance levels, many teachers had dimension-level scores at different
performance levels (e.g., in the first window 61 percent of CLASS teachers and 69
percent of FFT teachers received scores at multiple performance levels).

11 The findings reported here are based on an analysis of 160 randomly selected reports.
12 Teachers observed using the FFT instrument did not receive an overall score or overall performance level for each
observation window. For analytic purposes, the study’s evaluation team calculated each teacher’s average score in
each observation window based on the 1 to 4 rating for each dimension of practice.

• Teachers’ overall classroom observation scores, averaged across all four
windows, contained measurement error, but provided some reliable
information to distinguish between lower- and higher-performing teachers
and were positively correlated with teacher value-added scores. Classroom
observation scores averaged across the four observation windows had some reliability to
help distinguish average teacher performance (reliability estimated between .42 and .50
for CLASS and .69 and .75 for FFT). These estimates, while lower than conventional
thresholds for measures used in research, are consistent with findings from other studies
of classroom observation reliability. 13 In addition, the CLASS and FFT four-window
average scores were positively, although weakly, associated with teachers’ prior-year
value-added scores (correlations of .09 and .17, respectively). 14
• Differences in a teacher's ratings across observations limited how much
one could learn about persistent performance from a single observation.
Less than half of the variation in teacher scores from a given observation window
reflected stable classroom practice over the year. The reliability estimate for a single
observation was .24 for CLASS scores and .49 for FFT scores, which indicates that 24
percent of the variation in CLASS scores and 49 percent of the variation in FFT scores
reflected stable practice over the year.
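
To illustrate why averaging several observations is more reliable than a single one, here is a minimal sketch using the Spearman-Brown formula, a standard psychometric projection. It is an idealization that treats the four observations as parallel measurements; it is not the method the report used, and the function name is hypothetical.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the average of k parallel measurements."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Single-observation reliabilities reported above: .24 (CLASS) and .49 (FFT)
class_projection = spearman_brown(0.24, 4)
fft_projection = spearman_brown(0.49, 4)
print(round(class_projection, 2), round(fft_projection, 2))
```

The report's own four-window estimates (.42 to .50 for CLASS and .69 to .75 for FFT) are lower than these idealized projections, which suggests that its estimation did not treat the four observations as strictly parallel.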

13 See Casabianca et al. (2013); Ho and Kane (2013); Kane and Staiger (2012).
14 Although the correlations between classroom observation overall scores and value-added scores were modest in
magnitude, these correlations are consistent with the magnitudes found by other studies (Chaplin et al. 2014; Kane
and Staiger 2012; Kane et al. 2011) and likely underestimate the strength of the true association because of
measurement error in both the observation scores and the value-added scores.

Exhibit ES.1. Distribution of treatment teachers across performance levels based on
CLASS overall scores, by observation window

Exhibit Reads: Of treatment teachers in CLASS districts observed in window 1, 74 percent had a CLASS overall score at the highly
effective performance level, 24 percent at the effective performance level, and 2 percent at the developing effectiveness
performance level. Less than 1 percent of teachers had an overall score at the ineffective performance level.
NOTE: Performance level distributions are based on teachers’ overall CLASS ratings in each window. Sample size = 262 teachers
in window 1, 307 teachers in window 2, 309 teachers in window 3, and 272 teachers in window 4. Reported percentages may not
sum to 100 percent because of rounding.
a Within a window, less than 1 percent of teachers had an overall score at the ineffective performance level.
SOURCE: Teachstone Online System.

Exhibit ES.2. Distribution of treatment teachers across study-defined performance levels
based on FFT overall scores, by observation window

Exhibit Reads: Of treatment teachers in FFT districts observed in window 1, 4 percent had an FFT overall score between 3.50 and
4.00, 84 percent had a score between 2.50 and 3.49, and 12 percent had a score between 1.50 and 2.49. Less than one percent of
teachers had an overall score below 1.50.
NOTE: The distribution in each window is based on teachers’ FFT overall scores categorized into study-defined performance levels.
To create the overall scores and performance levels, the study’s evaluation team first calculated an overall score by averaging the
teacher’s ten FFT dimension scores, each of which was rated on a 1 to 4 scale. The overall scores were then categorized into
study-defined performance levels by rounding them to the nearest whole number. This created four performance levels aligned with
the FFT dimension scores. An FFT dimension score of 1 corresponds to unsatisfactory, 2 corresponds to basic, 3 corresponds to
proficient, and 4 corresponds to distinguished. Average FFT scores and overall performance levels were not provided in the FFT
reports teachers received. Sample size = 216 teachers in window 1, 219 teachers in window 2, 220 teachers in window 3, and 217
teachers in window 4. Reported percentages may not sum to 100 percent because of rounding.
a Within a window, less than 1 percent of teachers had an overall score below 1.50.
SOURCE: Teachscape Online System.
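
The calculation described in the note to Exhibit ES.2 can be sketched as follows: average the dimension scores, then round to the nearest whole number to obtain a study-defined performance level. Rounding halves upward is inferred from the score bands in the exhibit (for example, 2.50 to 3.49 maps to the third level); the function name and the example ratings are hypothetical.

```python
LEVELS = {1: "unsatisfactory", 2: "basic", 3: "proficient", 4: "distinguished"}


def fft_overall(dimension_scores):
    """Return (overall score, study-defined performance level).

    Each dimension score must be an integer rating from 1 to 4.
    """
    if not dimension_scores or any(s not in (1, 2, 3, 4) for s in dimension_scores):
        raise ValueError("each dimension score must be 1, 2, 3, or 4")
    total, n = sum(dimension_scores), len(dimension_scores)
    overall = total / n
    # Round half up with integer arithmetic: floor(total / n + 1/2).
    level = (2 * total + n) // (2 * n)
    return overall, LEVELS[level]


# Hypothetical teacher rated 3 on five dimensions and 2 on the other five
print(fft_overall([3, 3, 3, 3, 3, 2, 2, 2, 2, 2]))
```

Integer arithmetic is used for the rounding step because Python's built-in `round` rounds halves to the nearest even number, which would place an average of 2.50 in the second level rather than the third.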

The Student Growth Measure

The measure of student growth was designed to provide teachers with information on their
contribution to student achievement, using value-added methods. Value-added methods involve
predicting the test score each student would have received, accounting for prior achievement and
other characteristics, if the student had been taught by the average teacher in the district. A
teacher’s value-added score is obtained by comparing the average actual performance of the
teacher’s students to the average of the students’ predicted scores.
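
The logic of a value-added score can be sketched in a few lines. This is a deliberately simplified illustration, not the study's model: it predicts each student's score from a single prior test score with ordinary least squares, whereas the study also accounted for other student characteristics. All names and data below are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def value_added(records):
    """records: (teacher_id, prior_score, actual_score), one tuple per student.

    Returns each teacher's average of (actual score - predicted score).
    """
    prior = [r[1] for r in records]
    actual = [r[2] for r in records]
    intercept, slope = fit_line(prior, actual)
    gaps = {}
    for teacher, p, a in records:
        gaps.setdefault(teacher, []).append(a - (intercept + slope * p))
    return {teacher: mean(values) for teacher, values in gaps.items()}


# Hypothetical district with two teachers and three students each
records = [
    ("T1", 40, 47), ("T1", 50, 56), ("T1", 60, 67),
    ("T2", 40, 43), ("T2", 50, 54), ("T2", 60, 63),
]
scores = value_added(records)
```

In this toy data, the students of teacher T1 score above their predicted values on average and the students of T2 score below, so T1 receives a positive value-added score and T2 a negative one.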

Teacher value-added scores were generated for all teachers of students in grades 4–8
reading/English language arts and mathematics in each district using the achievement data for
the students that each teacher taught in the previous two years. 15 Individual teachers in the
treatment schools in these grades were given access to a report on their scores during the first
year of implementation. Treatment principals were also given access to a report that included
their teachers’ student growth reports as well as school average value-added scores, overall and
by subject and grade.

Who Received the Student Growth Performance Information?

• A large majority of teachers had sufficient data to produce student growth
reports. Overall, student achievement data were sufficient to compute value-added
scores and produce student growth reports for 80 percent of the teachers, who were in
grades 4–8.
• Although most teachers and principals participated in the student growth
report training, fewer than half of the teachers and principals accessed their
reports. Overall, 85 percent of teachers and 81 percent of principals participated in a
webinar prior to the release of the student growth reports. The webinar oriented the
participants to the value-added scores, the content of the student growth reports, and how
to access them. The online reporting system showed that 40 percent of the teachers with
value-added scores and 38 percent of the principals accessed their student growth reports.

What Were the Characteristics of the Student Growth Performance Information Provided?

• Student growth reports included school and individual teachers’ value-
added scores. The teacher report included a teacher’s overall and subject-specific
value-added scores (both reading/English language arts and mathematics for those who
taught both subjects) with an indication of their percentile ranking relative to other
teachers in the district, and the average teacher score in the district and school. All scores
included confidence intervals/standard error information to indicate the precision of the
estimated scores. Each teacher could also access a roster that included the number and
names of students used to calculate their score. For each teacher in his or her school, the
principal could view an overall value-added score, scores by subject and grade, and
scores across time. Principals could also view school average scores overall and by
subject and grade.
• Many teachers with a student growth report had a value-added score that
measurably differed from the district average, particularly in mathematics.
The student growth reports available to teachers and principals included teachers’ value-
added scores along with an 80 percent confidence interval, which could be used to
determine whether the scores were “measurably” different from the district’s average
teacher. 16 For example, in mathematics, 25 percent of the teachers had a value-added
score that was considered measurably below the district average, and 28 percent had a
score that was considered measurably above average. See exhibit ES.3.

15 A value-added score for a given subject was produced for a teacher only if the teacher had at least 10 students who
had the necessary achievement data.
16 The student growth reports used an 80 percent confidence interval (i.e., the range of scores that have an 80 percent
chance of including the teacher’s “true” score) to identify scores that were “measurably” below or above average.
This benchmark was selected in order to appropriately balance the risk of misclassifying a teacher who is actually
average as above or below average, against the risk of misclassifying a teacher who is actually above or below
average as average. One consideration in striking this balance was that the study districts agreed that the value-
added scores would not be used for decisions with consequences for employment. This reduced the potential
downside associated with misidentifying an average teacher as below average.

Exhibit ES.3. Distribution of treatment teachers based on whether their value-added
score was considered measurably above or below the district average, by subject

Exhibit Reads: For treatment teachers with mathematics value-added scores, 28 percent had scores considered measurably above
the district average.
NOTE: Distributions of teachers are based on whether the 80 percent confidence interval for a teacher’s value-added score was
above or below the district average. To indicate the amount of uncertainty around each teacher’s score, the student growth reports
included 80 percent confidence intervals, which showed the range of scores that have an 80 percent chance of including the
teacher’s “true” score. This benchmark was selected in order to appropriately balance two types of risks within the context of an
intervention designed to provide feedback on performance without explicit consequences such as promotion or dismissal: (1) the
risk of misidentifying truly average teachers as below- or above-average, and (2) the risk of misidentifying teachers who were truly
below- or above-average as average teachers. Sample size = 338 teachers with mathematics value-added scores and 321 teachers
with reading/English language arts value-added scores. Reported percentages may not sum to 100 percent because of rounding.
SOURCE: AIR value-added system.

The Principal Leadership Measure and Feedback

Feedback on principal leadership was based on the Vanderbilt Assessment of Leadership in
Education (VAL-ED), a 360-degree survey assessment administered twice a year to principals,
principal supervisors, and teachers. The VAL-ED includes six “core components” of principal
performance: high standards for student learning, rigorous curriculum, quality instruction,
culture of learning and professional behavior, connections to external communities, and
performance accountability. Principals are also rated on six “key processes”: planning,
implementing, supporting, advocating, communicating, and monitoring. A report for each
principal was generated after each administration of the VAL-ED, and the principal’s supervisor
was expected to discuss the report with the principal in a feedback session.

How Was the Principal Leadership Measure Implemented?

• All principals and their supervisors received training on using VAL-ED. All
principals and their supervisors participated in a two-hour VAL-ED training in summer
2012. During the school year, all principals’ supervisors also received a one-hour training
to prepare them to conduct the feedback sessions. In addition, teachers were offered a
one-hour introduction to VAL-ED at the beginning of the school year, as well as an
orientation webinar during the school year.
• All VAL-ED reports incorporated input from the principal, the principal’s
supervisor, and most teachers. All principals and their supervisors completed the
VAL-ED rating form, and a high percentage of teachers in each treatment school (80
percent in fall and 90 percent in spring on average) also completed the form.
• All VAL-ED feedback sessions occurred as planned. In both fall and spring, all
principals met with their supervisors to discuss their VAL-ED reports. Principal
supervisors reported feedback sessions lasting on average 52 minutes in the fall and 46
minutes in the spring.

What Were the Characteristics of the Principal Leadership Performance Information Provided?

• The VAL-ED reports present scores and performance levels, as well as
percentile ranks, for each dimension of leadership. VAL-ED reports present an
overall score, a score for each core component, and a score for each key process. For
each of these 13 scores, the report additionally presents a performance label and a
percentile rank, relative to the principals included in a national VAL-ED field test. Each
score (i.e., overall score, core component scores, and key process scores) is an average
across the three respondent groups (i.e., principal, supervisor, and teachers), with each
group weighted equally. The report additionally shows the scores received from each
respondent group separately.
• The VAL-ED ratings classified some principals as lower-performing and
some as higher-performing. In the fall, principals’ overall scores were distributed
across the four performance levels (8 percent of principals were labeled distinguished, 22
percent proficient, 43 percent basic, and 27 percent below basic). In the spring
administration, half the principals received an overall score associated with a
performance level of proficient or distinguished and half received a score at the basic or
below basic level. 17 (See exhibit ES.4.)
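
The equal weighting of respondent groups described in the list above can be illustrated with a short sketch. The function names and the ratings used in the usage note are hypothetical, and the real VAL-ED scoring may involve steps not described in this summary. The sketch shows only what "each group weighted equally" implies: teacher ratings are averaged first, and that single group average then counts as one third of the overall score.

```python
from statistics import mean


def val_ed_overall(self_rating, supervisor_rating, teacher_ratings):
    """Average the three respondent groups, giving each group equal weight."""
    teacher_group = mean(teacher_ratings)  # collapse all teachers into one group score
    return mean([self_rating, supervisor_rating, teacher_group])


def pooled_average(self_rating, supervisor_rating, teacher_ratings):
    """Contrast case (not the method described in the report): every respondent counts equally."""
    return mean([self_rating, supervisor_rating, *teacher_ratings])
```

For a hypothetical principal with a self-rating of 4.6, a supervisor rating of 3.4, and 20 teachers who each give 3.0, `val_ed_overall` returns about 3.67, while `pooled_average` returns about 3.09. Under equal group weighting, the principal's own rating carries as much weight as all teachers combined, which helps explain why a rise in self-ratings can move the overall score.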

17 The increase in average VAL-ED overall scores from the fall to spring is primarily a product of an increase in the
principal self-ratings. Average ratings of principal leadership based on the three respondent groups were similar in
the fall; however, in the spring, principal self-ratings were higher on average (3.76) than the ratings from their
supervisors (3.50, p-value of the difference < .05) and teachers (3.57, p-value of the difference < .05).

Exhibit ES.4. Distribution of treatment principals across performance levels based on
VAL-ED overall scores, by assessment window

Exhibit Reads: In fall 2012, 8 percent of treatment principals had a VAL-ED overall score at the distinguished performance level, 22
percent at the proficient level, 43 percent at the basic level, and 27 percent at the below basic level.
NOTE: Performance level distributions are based on principals’ VAL-ED overall scores at each assessment window. The overall
score is an average of the scores from the principal’s supervisor, teachers, and the principal’s own self-rated score, with each group
weighted equally. Sample size = 63 principals for both fall 2012 and spring 2013. Reported percentages may not sum to 100
percent because of rounding.
SOURCE: Fall 2012 and Spring 2013 VAL-ED Surveys.

• VAL-ED ratings provided by principals, supervisors, and teachers in the fall
were often too different to form a reliable measure, but the spring ratings
were consistent enough to distinguish between some lower- and higher-
performing principals. To provide information about a principal’s overall
effectiveness, the VAL-ED scores should communicate a consistent (i.e., reliable) message
about the principal’s effectiveness across the three respondent groups (the principal, the
principal’s supervisor, the principal’s teachers). Based on the literature on 360-degree
surveys, we would expect correlations between respondent group scores of between .25
and .35. 18 In the fall, however, agreement among the three respondent groups’ overall
scores was low, with correlations ranging from .06 to .27. In the spring, correlations were
higher (between .26 and .38), and thus the reports provided a more consistent message
about a principal’s effectiveness. Viewing and discussing the fall reports may have led
principals and their supervisors to better align their ratings in the spring.

18 For the VAL-ED correlations, see Porter et al. (2010). For the literature on 360-degree surveys, see Conway and
Huffcutt (1997).
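
The agreement check described above can be sketched as a set of pairwise correlations between respondent groups, computed across principals. This is a minimal illustration that assumes Pearson correlations on overall scores; the summary does not state which correlation statistic was used, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_agreement(scores_by_group):
    """Correlate every pair of respondent groups.

    `scores_by_group` maps a group name to a list of overall scores,
    one per principal, in the same principal order for every group.
    """
    return {
        (g1, g2): pearson(scores_by_group[g1], scores_by_group[g2])
        for g1, g2 in combinations(sorted(scores_by_group), 2)
    }
```

With three respondent groups this yields three correlations, which could then be compared with the .25 to .35 range the report cites from the literature on 360-degree surveys.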

Educators’ Performance Evaluation Experiences

The study’s performance measures were intended to provide educators with performance
information that was more frequent, systematic, and useful as a guide for professional growth
than the information that they normally receive. To assess whether this occurred, we compared
the treatment and control groups’ responses on surveys administered in the spring. Teacher
surveys were usually completed at the beginning of the last of the four observation windows.
Principal surveys were completed prior to the spring VAL-ED feedback session, which usually
occurred at the end of the school year.

What Were Teachers’ Experiences?

• Treatment teachers reported receiving more feedback on both their
classroom practice and their students’ achievement growth than control
teachers. Treatment teachers reported receiving more feedback sessions with ratings
and a written narrative than control teachers (3.0 versus 0.7 instances). The average
treatment teacher also received a larger amount of oral feedback than the average control
teacher (80 minutes versus 18 minutes). Furthermore, relative to control teachers,
treatment teachers were more likely to report receiving value-added scores (45 percent
versus 24 percent) and less likely to report receiving test scores for individual students or
classroom average scores. 19
• Among those who reported receiving feedback, treatment teachers
indicated somewhat more positive perceptions than control teachers about
the information they received on their classroom practice but not about the
information on their students’ achievement. Although most teachers in both
treatment and control groups reported agreeing or strongly agreeing that the feedback on
their classroom practice provided specific ideas about how to improve, treatment teachers
were more likely to do so (87 percent versus 79 percent). Almost all teachers
(approximately 92 percent) in both groups indicated that the feedback on classroom
practice was a fair assessment of their performance. Control teachers were more likely
than treatment teachers to report that the student achievement information they received
was easy to understand (89 percent versus 78 percent). However, fewer than half of the
teachers in both groups agreed or strongly agreed that the achievement information was a
fair assessment of their performance (49 percent for treatment teachers and 43 percent for
control teachers, not a statistically significant difference) or a fair indicator of teacher
effectiveness for all teachers (40 percent for treatment teachers versus 29 percent for
control teachers, a statistically significant difference).

19 This finding should be interpreted with caution because some teachers may not have had a correct understanding
of the term “value-added scores.” As a validity check, we compared treatment teachers’ responses with electronic
records indicating which teachers had accessed their own value-added scores in the online system, and we found that
34 percent of the treatment teachers who reported receiving value-added scores did not access their student growth
reports in the online system, and 17 percent of treatment teachers who reported not receiving value-added scores
actually accessed their online student growth reports.

What Were Principals’ Experiences?
• Treatment principals reported receiving more feedback than control
principals. Treatment principals reported receiving feedback more often than control
principals (2.0 versus 1.4 instances) and more instances of oral feedback with ratings (1.0
versus 0.4 instances). The average treatment principal also received more oral feedback
than the average control principal (60 minutes versus 41 minutes). However, treatment
principals were no more likely than control principals to report that their supervisors’
feedback focused on specific topics related to VAL-ED.
• Among those who reported receiving feedback, most principals in both
treatment and control schools had positive perceptions about the feedback
they received. The majority (more than 70 percent) of the principals in both treatment
and control schools agreed that the feedback they received was a fair assessment of their
performance, and approximately two-thirds or more of the principals agreed that the
feedback they received contained specific ideas for improving their performance. Among
those who received feedback, there was no statistically significant difference between
treatment and control principals in their perceptions of the feedback.

Future Report
This report focuses on findings from the first year of implementation of the study’s three
performance measures with feedback. Findings about the second year of implementation will be
presented in the second-year study report. The second-year report also will present findings on
the impact of the study’s performance measures and feedback on teacher classroom practice,
principal leadership, and student achievement.
