Keep manual rating aggregate metrics in sync

## Summary

Manual rating updates can leave prompt aggregate metrics out of sync with the stored result rows. The rating route updates the row's `success`, `score`, and `gradingResult`, but it currently adjusts the prompt-level aggregates only inside the branches that handle a pass/fail transition, so an edit that leaves the pass/fail state unchanged never reaches the aggregates.

## Why this matters

Prompt metrics are derived elsewhere by summing the stored result rows. The rating route should preserve that invariant after manual edits.

Concrete cases that can drift:

- a passing result keeps `pass=true` but changes `score`
- a failing result keeps `pass=false` but changes `score`
- a score-only or comment/highlight-only update reuses the rating route without adding a human assertion component
- clearing an existing human rating changes `componentResults` and should decrement assertion counts
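The first case can be illustrated with a minimal sketch. The `StoredRow` and `Totals` shapes and the `sumRows` helper below are hypothetical stand-ins for illustration, not the project's actual types; the sketch only assumes that aggregates equal the sum over stored rows, as described above.

```typescript
// Hypothetical, simplified row and aggregate shapes (not the project's real types).
interface StoredRow {
  success: boolean;
  score: number;
}

interface Totals {
  score: number;
  passCount: number;
  failCount: number;
}

// Recompute aggregates from scratch by summing the stored rows.
function sumRows(rows: StoredRow[]): Totals {
  const totals: Totals = { score: 0, passCount: 0, failCount: 0 };
  for (const row of rows) {
    totals.score += row.score;
    if (row.success) {
      totals.passCount += 1;
    } else {
      totals.failCount += 1;
    }
  }
  return totals;
}

// A passing row is re-rated from 1 to 0.5 and stays passing.
const before = sumRows([
  { success: true, score: 1 },
  { success: false, score: 0 },
]);
const after = sumRows([
  { success: true, score: 0.5 },
  { success: false, score: 0 },
]);

// Pass/fail counts are unchanged, but the summed score has moved.
const scoreDrift = after.score - before.score;
```

Because no pass/fail transition occurs here, an update path that only runs on transitions would leave the aggregate score at its old value while the row-derived sum has changed.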

## Proposed change

Update the rating route to compute deltas between the stored row (before) and the submitted row (after), and apply those deltas to the prompt aggregates on every update:

- update aggregate score by `nextScore - previousScore`
- update pass/fail/error test counts from previous result category to next result category
- update assertion pass/fail counts from previous `componentResults` counts to next `componentResults` counts
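The three deltas above could be sketched roughly as follows. All type names, field names, and the `applyRatingDelta` helper are hypothetical, and the rule used to classify a row as an error (a non-empty `error` field on a non-passing row) is an assumption made for illustration rather than the route's actual logic.

```typescript
// Hypothetical shapes; field names are illustrative, not the project's actual schema.
interface ComponentResult {
  pass: boolean;
}

interface ResultRow {
  success: boolean;
  score: number;
  error?: string | null;
  componentResults: ComponentResult[];
}

interface PromptMetrics {
  score: number;
  testPassCount: number;
  testFailCount: number;
  testErrorCount: number;
  assertPassCount: number;
  assertFailCount: number;
}

type Category = 'pass' | 'fail' | 'error';

const COUNT_KEY = {
  pass: 'testPassCount',
  fail: 'testFailCount',
  error: 'testErrorCount',
} as const;

// Assumption: a non-passing row with an error message counts as an error.
function categorize(row: ResultRow): Category {
  if (row.success) return 'pass';
  return row.error ? 'error' : 'fail';
}

function countAssertions(row: ResultRow): { pass: number; fail: number } {
  let pass = 0;
  let fail = 0;
  for (const component of row.componentResults) {
    if (component.pass) pass += 1;
    else fail += 1;
  }
  return { pass, fail };
}

// Returns a new metrics object with before/after deltas applied.
function applyRatingDelta(
  metrics: PromptMetrics,
  previous: ResultRow,
  next: ResultRow,
): PromptMetrics {
  const updated: PromptMetrics = { ...metrics };

  // 1. Score delta, applied whether or not pass/fail changed.
  updated.score += next.score - previous.score;

  // 2. Move one test from the previous category to the next category.
  const previousCategory = categorize(previous);
  const nextCategory = categorize(next);
  if (previousCategory !== nextCategory) {
    updated[COUNT_KEY[previousCategory]] -= 1;
    updated[COUNT_KEY[nextCategory]] += 1;
  }

  // 3. Assertion count deltas from componentResults.
  const previousCounts = countAssertions(previous);
  const nextCounts = countAssertions(next);
  updated.assertPassCount += nextCounts.pass - previousCounts.pass;
  updated.assertFailCount += nextCounts.fail - previousCounts.fail;

  return updated;
}
```

Computing every delta unconditionally means the same code path covers score-only edits, added human assertions, and cleared ratings, with no special case per transition.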

## Validation

Focused coverage should include same-pass score changes, same-fail score changes, existing manual rating score changes, and clearing a manual rating.

Keep manual rating aggregate metrics in sync #9346
