Keep manual rating aggregate metrics in sync #9346

@fallintoplace

Summary

Manual rating updates can leave prompt aggregate metrics out of sync with the stored result rows. The rating route updates the row's success, score, and gradingResult, but the prompt-level aggregates are only updated inside the branches that handle a pass/fail transition, so an edit that does not flip pass/fail leaves them unchanged.

Why this matters

Prompt metrics are derived elsewhere by summing the stored result rows. The rating route should preserve that invariant after manual edits.
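To make the invariant concrete, here is a minimal sketch of deriving totals by summing rows. The `StoredRow` and `Totals` shapes and the `totalsFromRows` helper are hypothetical and only illustrate the idea; the real types and the real aggregation code may differ.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
interface StoredRow {
  success: boolean;
  score: number;
}

interface Totals {
  score: number;
  testPassCount: number;
  testFailCount: number;
}

// Derive prompt-level totals by summing the stored rows.
function totalsFromRows(rows: StoredRow[]): Totals {
  const totals: Totals = { score: 0, testPassCount: 0, testFailCount: 0 };
  for (const row of rows) {
    totals.score += row.score;
    if (row.success) {
      totals.testPassCount += 1;
    } else {
      totals.testFailCount += 1;
    }
  }
  return totals;
}

const rows: StoredRow[] = [
  { success: true, score: 1 },
  { success: false, score: 0 },
];
const before = totalsFromRows(rows);

// A manual rating lowers the score but keeps pass=true.
rows[0] = { success: true, score: 0.5 };
const after = totalsFromRows(rows);

// The pass/fail counts are identical, yet the summed score has changed.
// An aggregate that is only touched on pass/fail transitions would miss this.
console.log(before, after);
```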

Concrete cases that can drift:

  • a passing result keeps pass=true but changes score
  • a failing result keeps pass=false but changes score
  • a score-only or comment/highlight-only update reuses the rating route without adding a human assertion component
  • clearing an existing human rating changes componentResults and should decrement assertion counts

Proposed change

Update the rating route so that it applies before/after deltas, computed from the stored row (before) and the submitted row (after):

  • update aggregate score by nextScore - previousScore
  • update pass/fail/error test counts from previous result category to next result category
  • update assertion pass/fail counts from previous componentResults counts to next componentResults counts
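
The three deltas above could be sketched roughly as follows. This is an illustration under stated assumptions, not the route's actual code: the `ResultRow` and `PromptMetrics` shapes, the `isError` flag, and the helpers `categorize`, `countAssertions`, and `applyRatingDelta` are all hypothetical names introduced here.

```typescript
// Hypothetical shapes; the real row and metrics types may differ.
interface ComponentResult {
  pass: boolean;
}

interface ResultRow {
  success: boolean;
  isError: boolean; // assumption: how an errored result is flagged
  score: number;
  componentResults: ComponentResult[];
}

interface PromptMetrics {
  score: number;
  testPassCount: number;
  testFailCount: number;
  testErrorCount: number;
  assertPassCount: number;
  assertFailCount: number;
}

type Category = 'pass' | 'fail' | 'error';

// Map each result category to the test counter it contributes to.
const COUNT_KEY = {
  pass: 'testPassCount',
  fail: 'testFailCount',
  error: 'testErrorCount',
} as const;

function categorize(row: ResultRow): Category {
  if (row.isError) {
    return 'error';
  }
  return row.success ? 'pass' : 'fail';
}

function countAssertions(row: ResultRow): { pass: number; fail: number } {
  let pass = 0;
  let fail = 0;
  for (const component of row.componentResults) {
    if (component.pass) {
      pass += 1;
    } else {
      fail += 1;
    }
  }
  return { pass, fail };
}

// Return new metrics with the (next - previous) contribution of one row applied.
function applyRatingDelta(
  metrics: PromptMetrics,
  previous: ResultRow,
  next: ResultRow,
): PromptMetrics {
  const updated: PromptMetrics = { ...metrics };

  // 1. Aggregate score moves by nextScore - previousScore.
  updated.score += next.score - previous.score;

  // 2. Move one test from the previous category to the next one.
  //    When the category is unchanged, the two steps cancel out.
  updated[COUNT_KEY[categorize(previous)]] -= 1;
  updated[COUNT_KEY[categorize(next)]] += 1;

  // 3. Assertion counts move by the change in componentResults counts.
  const beforeCounts = countAssertions(previous);
  const afterCounts = countAssertions(next);
  updated.assertPassCount += afterCounts.pass - beforeCounts.pass;
  updated.assertFailCount += afterCounts.fail - beforeCounts.fail;

  return updated;
}

// Usage: a passing row stays passing, its score drops from 1 to 0.5,
// and a human rating component is cleared (two components become one).
const current: PromptMetrics = {
  score: 3,
  testPassCount: 3,
  testFailCount: 0,
  testErrorCount: 0,
  assertPassCount: 4,
  assertFailCount: 0,
};
const storedRow: ResultRow = {
  success: true,
  isError: false,
  score: 1,
  componentResults: [{ pass: true }, { pass: true }],
};
const submittedRow: ResultRow = {
  success: true,
  isError: false,
  score: 0.5,
  componentResults: [{ pass: true }],
};
const updatedMetrics = applyRatingDelta(current, storedRow, submittedRow);
console.log(updatedMetrics);
```

Because every update is expressed as "next minus previous", the same code path covers pass/fail flips, same-category score changes, and cleared ratings without a separate branch for each case.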

Validation

Focused coverage should include same-pass score changes, same-fail score changes, existing manual rating score changes, and clearing a manual rating.
