fix: gate the token encoding with a dynamic config for mixed brain scenarios (#10685) #10691

Open: lilydoar wants to merge 1 commit into cloud/v1.32.0-156 from spk/dc-gate-cb-token

@lilydoar

Contributor

What changed?

Backport of #10685 onto the cloud/v1.32.0-156 release branch.

Gates the CHASM Nexus completion callback token encoding behind a new
per-namespace dynamic config callback.encodeInternalTokenWithEnvelope
(EncodeInternalTokenWithEnvelope, default false). When enabled the token
is the NexusOperationCompletion envelope (carrying the request ID); when
disabled it is the legacy bare base64-encoded ChasmComponentRef. Either form
is always decodable by UnpackNexusCallbackToken.
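
The gate and the "either form is always decodable" property can be illustrated with a minimal, self-contained sketch. Everything here is an assumption for illustration: the `envelope` struct, its JSON wire form, and the `encodeToken` / `decodeToken` helpers are hypothetical stand-ins, not the real `NexusOperationCompletion` message or `UnpackNexusCallbackToken`.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// envelope is a hypothetical stand-in for the completion envelope.
// The real message and its wire format are not shown in this PR description.
type envelope struct {
	ComponentRef []byte `json:"component_ref"`
	RequestID    string `json:"request_id"`
}

// encodeToken emits the envelope form only when the gate is on;
// otherwise it emits the legacy bare base64-encoded ref.
func encodeToken(ref []byte, requestID string, withEnvelope bool) (string, error) {
	if !withEnvelope {
		return base64.StdEncoding.EncodeToString(ref), nil
	}
	b, err := json.Marshal(envelope{ComponentRef: ref, RequestID: requestID})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// decodeToken accepts both forms regardless of the gate: it tries the
// envelope first and falls back to the legacy base64 ref.
func decodeToken(token string) (ref []byte, requestID string, err error) {
	var env envelope
	if json.Unmarshal([]byte(token), &env) == nil && len(env.ComponentRef) > 0 {
		return env.ComponentRef, env.RequestID, nil
	}
	ref, err = base64.StdEncoding.DecodeString(token)
	if err != nil {
		return nil, "", errors.New("token is neither an envelope nor a legacy ref")
	}
	return ref, "", nil
}

func main() {
	for _, gate := range []bool{false, true} {
		tok, err := encodeToken([]byte("ref-1"), "req-1", gate)
		if err != nil {
			panic(err)
		}
		ref, reqID, err := decodeToken(tok)
		fmt.Println(gate, string(ref), reqID, err)
	}
}
```

The point of the shape is that only the writer consults the gate; the reader never does, so turning the config on or off cannot strand tokens that are already in flight.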

Why?

#10605 (present on this release branch) added a request ID to the encoded
callback token. That change is not backwards compatible and can cause
callbacks to fail to deliver in a split-brain / mixed-version fleet. Gating the
envelope encoding keeps the new format off the wire until every server in
the fleet can read it; after that, it can be enabled per namespace.

Scope of this backport — production code only

This is a production-source-only cherry-pick of 09e706e10. The test
changes from #10685 were intentionally omitted:

  • This release branch is ~201 commits behind main and uses an older test
    architecture: scheduleCommonOpts() takes no *testing.T, CallbacksSuite
    uses plain-testify TestXxx() methods (not the parametrized
    TestXxx(opts []testcore.TestOption) form), and the chasmContextFactory
    helper does not exist here.
  • #10685's tests depend on that main-only infrastructure and do not apply
    mechanically. Backporting the infrastructure would substantially widen the
    patch and raise risk.
  • Test coverage for this change is validated on main.

Prerequisite #10605 (bc93fd9c5) is present on this branch, so the gate is
meaningful.

Files

  • chasm/nexus_completion.go: GenerateNexusCallback gains an encodeToken bool
  • chasm/lib/callback/config.go — new EncodeInternalTokenWithEnvelope dynamic config
  • chasm/lib/callback/invocable_internal.go — whitespace
  • chasm/lib/scheduler/config.go — wire the gate into Config
  • chasm/lib/scheduler/{invoker_tasks,scheduler_tasks}.go — pass the gate per namespace
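
As a rough sketch of how a per-namespace gate might be carried in a config struct and resolved at the call site, consider the following. The `boolByNamespace` type, the `config` struct, and the `newGate` and `tokenFormat` helpers are hypothetical and only mirror the shape described in the list above; they are not the actual scheduler `Config` or the dynamic config API.

```go
package main

import "fmt"

// boolByNamespace is a hypothetical per-namespace boolean setting lookup.
type boolByNamespace func(namespace string) bool

// config is a hypothetical stand-in for a component config that carries the gate.
type config struct {
	EncodeInternalTokenWithEnvelope boolByNamespace
}

// newGate builds a lookup from explicit overrides. A namespace without an
// override resolves to false, matching the default-off behavior.
func newGate(overrides map[string]bool) boolByNamespace {
	return func(namespace string) bool {
		return overrides[namespace]
	}
}

// tokenFormat resolves the gate for one namespace at the point where a
// callback would be generated, and reports which encoding would be used.
func tokenFormat(cfg config, namespace string) string {
	if cfg.EncodeInternalTokenWithEnvelope(namespace) {
		return "envelope"
	}
	return "legacy"
}

func main() {
	cfg := config{
		EncodeInternalTokenWithEnvelope: newGate(map[string]bool{"ns-a": true}),
	}
	for _, ns := range []string{"ns-a", "ns-b"} {
		fmt.Println(ns, tokenFormat(cfg, ns))
	}
}
```

Resolving the setting per call, rather than caching it at startup, is what would let an operator flip the gate for one namespace without a restart.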

How did you test it?

  • go build ./chasm/...
  • go vet ./chasm/...
  • tests package compiles
  • gofmt clean

Default (false) preserves the existing legacy behavior on this release branch.

Potential risks

Low. Default-off dynamic config; no behavior change until explicitly enabled
per-namespace. Both token formats remain decodable.

Patch process

Self-service patch. Requires Manager+ approval covering patch justification,
customer impact, risk, and test plan, plus recording in the Cloud Releases
patch table before merge.

@lilydoar lilydoar requested review from a team June 13, 2026 00:22
@lilydoar lilydoar requested review from a team as code owners June 13, 2026 00:22
…enarios (#10685)

In #10605 a request ID was added to the encoded callback token to fix a
bug with CaN callbacks not being findable by schedules. This change
puts the new encoding behind a dynamic config because it is not
backwards compatible.

scenario

- [ ] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [X] added new functional test(s)

Adding the dynamic config gate should reduce the risk. A functional test
validates that the callback still triggers as expected when the gate is
turned on.
@lilydoar lilydoar force-pushed the spk/dc-gate-cb-token branch from 07dc72e to 4399d0d Compare June 13, 2026 00:39