kafka: add a reserved Redpanda-specific Kafka API-key range #30731

Open
nguyen-andrew wants to merge 4 commits into redpanda-data:dev from nguyen-andrew:kafka-redpanda-api-range

nguyen-andrew commented Jun 6, 2026

Member

Shadow linking needs to read the roles on its source cluster. Those roles are
normally read over the Admin API, but some deployment modes do not expose it,
leaving the Kafka API as the only available path to the source cluster. Reading
roles over that path requires a Redpanda-specific Kafka API, which did not exist
before.

Apache assigns Kafka API keys sequentially, and the highest assigned key is still
under 100 today, so placing Redpanda's range well above that keeps it clear of
future standard assignments and of keys other Kafka implementations may already
use. This PR
reserves a Redpanda range starting at key 15000 and makes request dispatch,
flex-version (tagged-field) parsing, and per-key metrics aware of it, without
bloating the standard-range data structures. The plumbing is kept separate from
the standard tables so max_api_key() continues to derive only from standard
Apache keys.

DescribeRedpandaRoles (key 15000) is the first API in the range and the one
shadow linking needs; here it serves as the proving example and simply returns
an empty role list. The API is recognized and dispatched internally but is
intentionally not advertised in ApiVersions yet, so it is not externally
discoverable until it returns real data. That, along with the role_store wiring
and enumeration, lands in the stacked follow-up PR.

This is the base of a two-PR stack; the follow-up (describe-redpanda-roles-api)
wires the cluster role_store into the handler, returns real role data with
name filters, and advertises the API in ApiVersions as its final step.
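
One way to picture "dispatched but not advertised" is a per-API flag that the ApiVersions listing consults and the dispatcher ignores. The sketch below is illustrative only: `api_entry`, `known_apis`, `is_dispatchable`, and `advertised_keys` are hypothetical names, the version bounds are placeholders, and the PR may well implement this differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical registry entry; not a type from the Redpanda code base.
struct api_entry {
    int16_t key;
    int16_t min_version;
    int16_t max_version;
    bool advertised; // assumption: controls visibility in ApiVersions only
};

// Two entries for illustration. Version bounds are placeholders.
inline const std::vector<api_entry>& known_apis() {
    static const std::vector<api_entry> apis{
      {18, 0, 0, true},     // ApiVersions (standard key 18)
      {15000, 0, 0, false}, // DescribeRedpandaRoles: handled, but hidden
    };
    return apis;
}

// Dispatch looks at every registered key, advertised or not.
inline bool is_dispatchable(int16_t key) {
    const auto& apis = known_apis();
    return std::any_of(apis.begin(), apis.end(), [key](const api_entry& e) {
        return e.key == key;
    });
}

// The ApiVersions listing only includes keys flagged as advertised.
inline std::vector<int16_t> advertised_keys() {
    std::vector<int16_t> out;
    for (const auto& e : known_apis()) {
        if (e.advertised) {
            out.push_back(e.key);
        }
    }
    return out;
}
```

With this shape, a client that already knows key 15000 can send the request and get a response, while a client that discovers APIs through ApiVersions does not see it until the flag is flipped.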

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Copilot AI review requested due to automatic review settings June 6, 2026 03:35

Copilot AI left a comment

Contributor

Pull request overview

Introduces a reserved Redpanda-specific Kafka API key range (base key 15000) and the supporting plumbing so these keys can be dispatched, parsed (flex/tagged fields), and measured (per-key metrics/probes) without inflating the standard “dense” key-indexed tables. Adds DescribeRedpandaRoles (15000) as the first custom API, with wire schema + round-trip tests and a stub handler that enforces cluster DESCRIBE authorization and returns an empty role list.

Changes:

  • Add DescribeRedpandaRoles protocol schemata, generated message types, and encode/decode round-trip tests.
  • Extend handler dispatch, flex-version lookup, and handler probe/metrics plumbing to support a rebased custom-key table for the reserved range.
  • Fix throughput-controlled API-key bitmap indexing so out-of-range keys don’t throw and disconnect clients.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/v/kafka/server/tests/handler_probe_test.cc | Adds a regression test ensuring reserved-range keys map to custom probe storage without OOB access. |
| src/v/kafka/server/tests/handler_interface_test.cc | Adds tests for reserved-range dispatch and flex-version/schema recognition. |
| src/v/kafka/server/tests/BUILD | Registers new server-side unit tests and adds needed deps. |
| src/v/kafka/server/handlers/handlers.h | Adds a separate redpanda_request_types handler type-list for reserved-range APIs. |
| src/v/kafka/server/handlers/handler_probe.h | Adds _custom_probes storage for reserved-range per-handler probes. |
| src/v/kafka/server/handlers/handler_probe.cc | Initializes and routes reserved-range probes via rebased offsets. |
| src/v/kafka/server/handlers/handler_interface.cc | Adds reserved-range dispatch LUT (make_custom_lut) and lookup path in handler_for_key. |
| src/v/kafka/server/handlers/describe_redpanda_roles.h | Declares the new DescribeRedpandaRoles handler type. |
| src/v/kafka/server/handlers/describe_redpanda_roles.cc | Implements the handler with authorization + audit failure handling and empty response. |
| src/v/kafka/server/connection_context.cc | Bounds-checks throughput-controlled API-key bitmap accesses to avoid std::out_of_range. |
| src/v/kafka/server/BUILD | Adds the new handler sources/headers and protocol deps to the server library. |
| src/v/kafka/protocol/types.h | Introduces redpanda_api_key_base constant (15000). |
| src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc | Adds round-trip encode/decode tests for new request/response data types. |
| src/v/kafka/protocol/tests/BUILD | Registers the new protocol unit test. |
| src/v/kafka/protocol/schemata/generator.py | Adds new struct type names to the schema generator’s struct-type list. |
| src/v/kafka/protocol/schemata/generator.bzl | Adds describe_redpanda_roles to the list of generated messages. |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_response.json | Defines the wire schema for the new response (flexible v0+). |
| src/v/kafka/protocol/schemata/describe_redpanda_roles_request.json | Defines the wire schema for the new request (nullable filter list, flexible v0+). |
| src/v/kafka/protocol/messages.h | Adds new request schema include and defines protocol-side redpanda_request_types. |
| src/v/kafka/protocol/flex_versions.cc | Adds a rebased flex-version table for reserved-range API keys and routes lookups accordingly. |
| src/v/kafka/protocol/describe_redpanda_roles.h | Adds protocol request/response wrapper types for DescribeRedpandaRoles. |

Comment thread src/v/kafka/server/handlers/handler_probe.cc
Comment thread src/v/kafka/server/tests/handler_interface_test.cc Outdated
Comment thread src/v/kafka/protocol/tests/describe_redpanda_roles_test.cc Outdated
@nguyen-andrew nguyen-andrew self-assigned this Jun 6, 2026
@nguyen-andrew nguyen-andrew marked this pull request as draft June 6, 2026 03:40

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.

The throughput-controlled API key bitmap is sized to max_api_key()+1
(~69 entries, the standard Kafka range), but the three sites that consult
it indexed with .at(request_key). A request carrying an API key beyond the
standard range therefore threw std::out_of_range in the throttle path,
before the request reached the handler router. The error path treats that
as a short-read disconnect, so the connection is dropped instead of the
key being rejected as an unsupported API.

Guard the three sites with an explicit size check so out-of-range keys
fall through as not throughput-controlled, mirroring the bounds check
handler_for_key already applies to its dispatch table.
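
A minimal sketch of the kind of guard described above, assuming the bitmap behaves like a `std::vector<bool>` indexed by API key. `api_key_bitmap` and `is_throughput_controlled` are hypothetical names for illustration, not the identifiers used in connection_context.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the throughput-controlled API-key bitmap,
// which the commit message says is sized to max_api_key() + 1.
using api_key_bitmap = std::vector<bool>;

// Returns true only when `key` falls inside the bitmap and its bit is set.
// Keys outside the bitmap (negative, or beyond the standard range such as
// 15000) are treated as not throughput-controlled instead of throwing, which
// is what bitmap.at(key) would do.
inline bool
is_throughput_controlled(const api_key_bitmap& bitmap, int16_t key) {
    if (key < 0) {
        return false;
    }
    const auto idx = static_cast<std::size_t>(key);
    return idx < bitmap.size() && bitmap[idx];
}
```

The explicit size check replaces the exception path entirely, so an unknown key continues on to the handler router, where it can be rejected as an unsupported API rather than dropping the connection.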

This also makes the reserved Redpanda API-key range (15000+) safe to
dispatch.

Kafka API keys are assigned sequentially by Apache and are still under
100 today. To leave room for future standard assignments and avoid
colliding with keys other Kafka implementations may already use, custom
Redpanda APIs start at 15000, well above the standard range;
DescribeRedpandaRoles is the first and takes that base key.

A key that high can't simply be added to the standard dispatch
structures. Kafka request routing uses three tables indexed directly by
API key and sized to the largest key present: the handler lookup table
(handler_interface), the flex-version map (flex_versions), and the
per-shard handler_probe vector. Registering key 15000 in them would grow
each to ~15000 mostly-empty entries, and because the probe vector is
per-shard, that waste is multiplied across cores.

So the Redpanda reserved range is kept separate. max_api_key() still
derives only from the standard request_types, leaving those three
structures sized to the standard range, and each site gains a second
table for the reserved range. That table is rebased, indexed by
key - redpanda_api_key_base, so its size tracks the span of the custom
range (one entry today). Standard keys take the existing path unchanged;
only keys at or above the base fall through to this secondary lookup.
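
The two-table lookup described above can be sketched roughly as follows. Only `redpanda_api_key_base` (15000) comes from the PR; the table contents, the three-entry standard range, and this simplified `handler_for_key` signature are assumptions made to keep the example small.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Base of the reserved Redpanda range, as stated in the PR.
constexpr int16_t redpanda_api_key_base = 15000;

// Illustrative dense table for the standard range, indexed directly by key.
// A real table covers every standard key; three entries keep this short.
constexpr std::array<std::string_view, 3> standard_lut{
  "produce", "fetch", "list_offsets"};

// Rebased table for the reserved range, indexed by key - base.
// Its size tracks the span of custom keys (one entry here), not 15000.
constexpr std::array<std::string_view, 1> redpanda_lut{
  "describe_redpanda_roles"};

// Simplified lookup: standard keys take the dense path, keys at or above
// the base are rebased into the secondary table, and anything else is
// reported as unknown.
inline std::optional<std::string_view> handler_for_key(int16_t key) {
    if (key < 0) {
        return std::nullopt;
    }
    if (key >= redpanda_api_key_base) {
        const auto idx = static_cast<std::size_t>(key - redpanda_api_key_base);
        if (idx < redpanda_lut.size()) {
            return redpanda_lut[idx];
        }
        return std::nullopt;
    }
    const auto idx = static_cast<std::size_t>(key);
    if (idx < standard_lut.size()) {
        return standard_lut[idx];
    }
    return std::nullopt;
}
```

Keys that fall in the gap between the two ranges, or past the end of the reserved table, miss both bounds checks and come back empty, so neither table needs to grow to cover them.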
@nguyen-andrew nguyen-andrew force-pushed the kafka-redpanda-api-range branch from fc9aa50 to 335a5bd Compare June 6, 2026 04:44
@nguyen-andrew

Member Author

Force-pushed to address the Copilot review comments and to pull authorization out of describe_redpanda_roles_handler (it will be added in the follow-up stacked PR).

@nguyen-andrew

Member Author

/ci-repeat 1

vbotbuildovich commented Jun 6, 2026

Collaborator

CI test results

test results on build#85479
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_with_restart | {"storage_mode": "tiered"} | integration | https://buildkite.com/redpanda/redpanda/builds/85479#019e9b52-f2b3-416b-a8da-c1b55f67fff1 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0460, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1317, p1=0.2436, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart |
test results on build#85564
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FLAKY(PASS) | ShadowLinkTopicFailoverTests | test_link_failover | {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "tiered_cloud", "with_failures": false} | integration | https://buildkite.com/redpanda/redpanda/builds/85564#019eae81-1655-41c3-b35b-cea4bb5a3673 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkTopicFailoverTests&test_method=test_link_failover |
| FLAKY(PASS) | ShadowLinkingReplicationTests | test_auto_prefix_trimming | {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": true} | integration | https://buildkite.com/redpanda/redpanda/builds/85564#019eae81-1651-4092-a0f7-3ac192c498af | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0044, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming |
| FLAKY(PASS) | WriteCachingFailureInjectionE2ETest | test_crash_all | {"use_transactions": false} | integration | https://buildkite.com/redpanda/redpanda/builds/85564#019eae85-b4ac-40f2-ab7d-d3768feca5b8 | 10/11 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0954, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.2598, p1=0.0494, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all |

@nguyen-andrew nguyen-andrew marked this pull request as ready for review June 9, 2026 22:07
@nguyen-andrew nguyen-andrew requested review from a team, dotnwat, nvartolomei and pgellert and removed request for a team June 9, 2026 22:08
3 participants