Summary
Manual rating updates can leave prompt aggregate metrics out of sync with the stored result rows. The rating route updates the row's success, score, and gradingResult, but the prompt-level aggregate updates are currently applied only inside the pass/fail transition branches, so an edit that changes the row without taking one of those branches does not update the aggregates.
Why this matters
Prompt metrics are derived elsewhere by summing the stored result rows. The rating route should preserve that invariant after manual edits.
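Because each aggregate is a sum of per-row contributions, swapping one row changes that aggregate by exactly the row's own before/after difference. As a short derivation, write $m(r)$ for any one contribution (the row's score, a 0/1 indicator for the pass, fail, or error category, or its number of passing or failing component results), $r_k$ for the stored row, and $r_k'$ for the submitted row:

$$
M = \sum_{i} m(r_i), \qquad
M' = \sum_{i \neq k} m(r_i) + m(r_k') = M + \bigl( m(r_k') - m(r_k) \bigr)
$$

For a same-pass score change, the category indicator terms cancel and only the score term moves, which is the kind of update a transition-only branch never applies.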
Concrete cases that can drift:
- a passing result keeps pass=true but changes score
- a failing result keeps pass=false but changes score
- a score-only or comment/highlight-only update reuses the rating route without adding a human assertion component
- clearing an existing human rating changes componentResults and should decrement assertion counts
Proposed change
Update the rating route to apply the before/after delta between the stored row and the submitted row:
- update aggregate score by nextScore - previousScore
- update pass/fail/error test counts from previous result category to next result category
- update assertion pass/fail counts from previous componentResults counts to next componentResults counts
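A minimal TypeScript sketch of the delta approach, for illustration only. All names here (`StoredResult`, `PromptMetrics`, `categorize`, `countAssertions`, `sumMetrics`, `applyRatingDelta`, and the `isError` flag) are hypothetical stand-ins, not the route's actual types or helpers. It assumes each row falls into exactly one of the pass, fail, or error categories and that each component result counts as one passing or failing assertion. `sumMetrics` recomputes the aggregates from the rows, so the incremental update can be checked against the summed invariant.

```typescript
// Hypothetical shapes for illustration only; the real result row and
// prompt metrics types may be named and nested differently.
interface ComponentResult {
  pass: boolean;
}

interface StoredResult {
  success: boolean;
  score: number;
  isError?: boolean; // assumption: how an errored result is flagged
  componentResults?: ComponentResult[]; // assumption: flattened from gradingResult
}

interface PromptMetrics {
  score: number;
  testPassCount: number;
  testFailCount: number;
  testErrorCount: number;
  assertPassCount: number;
  assertFailCount: number;
}

type Category = "pass" | "fail" | "error";
type TestCountKey = "testPassCount" | "testFailCount" | "testErrorCount";

const countKey: Record<Category, TestCountKey> = {
  pass: "testPassCount",
  fail: "testFailCount",
  error: "testErrorCount",
};

// Classify a row into exactly one test-count bucket.
function categorize(r: StoredResult): Category {
  if (r.isError) return "error";
  return r.success ? "pass" : "fail";
}

// Count passing and failing assertion components on a row.
function countAssertions(r: StoredResult): { pass: number; fail: number } {
  let pass = 0;
  let fail = 0;
  for (const c of r.componentResults ?? []) {
    if (c.pass) pass++;
    else fail++;
  }
  return { pass, fail };
}

function emptyMetrics(): PromptMetrics {
  return { score: 0, testPassCount: 0, testFailCount: 0, testErrorCount: 0, assertPassCount: 0, assertFailCount: 0 };
}

// Reference: aggregates recomputed by summing every stored row.
function sumMetrics(rows: StoredResult[]): PromptMetrics {
  const m = emptyMetrics();
  for (const r of rows) {
    m.score += r.score;
    m[countKey[categorize(r)]] += 1;
    const a = countAssertions(r);
    m.assertPassCount += a.pass;
    m.assertFailCount += a.fail;
  }
  return m;
}

// Remove prev's contribution and add next's, whether or not pass/fail changed.
// Returns a new object; the input metrics are not mutated.
function applyRatingDelta(metrics: PromptMetrics, prev: StoredResult, next: StoredResult): PromptMetrics {
  const out: PromptMetrics = { ...metrics };
  out.score += next.score - prev.score;
  out[countKey[categorize(prev)]] -= 1;
  out[countKey[categorize(next)]] += 1;
  const before = countAssertions(prev);
  const after = countAssertions(next);
  out.assertPassCount += after.pass - before.pass;
  out.assertFailCount += after.fail - before.fail;
  return out;
}
```

In this sketch, comparing `applyRatingDelta(sumMetrics(rows), prev, next)` with `sumMetrics` over the rows after swapping `prev` for `next` exercises the validation cases directly: same-pass and same-fail score changes move only the score, and clearing a manual rating removes that component's assertion count. Applying deltas rather than recomputing keeps the route's update cost independent of how many rows the prompt has.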
Validation
Focused coverage should include same-pass score changes, same-fail score changes, existing manual rating score changes, and clearing a manual rating.