Skip to content

[SQL] compiler support for grouping nodes into regions #6305

Merged
mihaibudiu merged 4 commits into
feldera:mainfrom
mihaibudiu:issue6285
May 29, 2026
Merged

[SQL] compiler support for grouping nodes into regions #6305
mihaibudiu merged 4 commits into
feldera:mainfrom
mihaibudiu:issue6285

Conversation

@mihaibudiu

Copy link
Copy Markdown
Contributor

The profiler uses a notion of "regions" to group related nodes together. So far, regions were used only for nodes which look like primitive nodes for the SQL compiler, but are actually expanded into graphs by DBSP. With this PR we add the ability of the compiler itself to group multiple operators into a region. The compiler promises that regions are always connected subgraphs of the circuit graph.

There are 4 commits, each with a separate change. (Two commits fix small issues found elsewhere.)

The compiler also groups operators used for producing the "zero" of aggregate functions without group-by using a new region "agg_zero". We will try to add more regions in the future, now that the infrastructure exists. These regions are introduced very early in compilation, and may not survive until the end, because the optimizer may rearrange these operators.

Fixes #6285

Describe Manual Test Plan

I ran all Java tests and some python tests manually

@mihaibudiu mihaibudiu requested a review from ryzhyk May 23, 2026 02:49
@mihaibudiu

Copy link
Copy Markdown
Contributor Author

@ryzhyk your review is mostly necessary for the new region APIs. These are initially written by Claude, but then massaged manually.

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting findings without pre-empting @ryzhyk on the core region API. Infrastructure looks sound; regions are intentionally excluded from OperatorHash so this doesn't touch checkpoint compat. Tests cover both the dbsp open/close semantics and the SQL TagRegions conflict path. A few real issues plus some cleanup below.

Findings (in order of seriousness):

  1. ExpandAggregateZeros.crdId is a static int (process-global, never reset). Two compilations in the same JVM share state, so the IDs leak from one program into the next — the first program's regions get IDs 0..N-1, the next program starts at N. In the Java test suite (MultiCrateTests.testAggregates and any other test that builds multiple circuits with global aggregates) the generated Rust differs between an isolated run and a run inside the full suite. Move it to an instance field of ExpandAggregateZeros, or reset it in preorder(DBSPCircuit). Inline.

  2. Doc/terminology drift: Circuit::create_region rustdoc says "Returns a [RegionId] that can be passed to [Circuit::open_region]", but the type is RegionName and there is no public RegionId. open_region/close_region parameter is named id: RegionName but the doc text repeatedly calls it id. Either rename the type to RegionId (matches the docs and the id() accessor) or rewrite the docs to consistently say RegionName. Inline.

  3. CloseRegion { name: _ } in monitor.rs discards the name and just pops the stack. The contract is "caller is responsible for pairing every open_region with a matching close_region" — fine for the generated code path, but close_named_region() could debug_assert_eq!(saved_name, name) (or at least the monitor could compare names) so caller mistakes don't silently land operators in the wrong region. Cheap, catches a real class of bug. Inline.

  4. TagRegions.DetectConflicts: when an operator has multiple RegionAnnotations, you sort by asVarName() (a string like agg_zero_<int>) and keep the first "alphabetically". That's lex sort, so agg_zero_10 < agg_zero_2 < agg_zero_3. Determinism is preserved but the tiebreaker isn't "smallest id" — if that's intentional, fine; if not, sort by (getTag(), getId()). Currently moot because GlobalAggregate is the only RegionAnnotation and a single operator only gets one of them, but the code is written generically and the test testAnnotationTwoAnnotations doesn't exercise this branch. Inline.

  5. Five trailing ;; in ExpandAggregateZero.postorder (lines 103, 109, 112, 116, 120). Compiles, but trips strict lints and is just noise. Inline.

Soft / non-blocking:

  • Circuit::open_region/close_region take RegionName by value, which forces every caller (including the two new unit tests) to .clone(). Switching to &RegionName would let CircuitEvent::open_region clone once at log time and free callers from the boilerplate.
  • PR description says "I ran all Java tests and some python tests manually." Code-generation changes that thread a new parameter through SingleOperatorWriter/NestedOperatorWriter/CircuitWriter deserve at least the standard Python integration suite — worth confirming green CI before merge.
  • Regression3Tests.issue6397 (LAG over -Infinity/0/Infinity) is a real fix but landing it in a PR titled "compiler support for grouping nodes into regions" makes archaeology harder later. Already a separate commit, so a quick git rebase --interactive to split off the doc + LAG + small fixes from the main region work would be ideal.

None are hard blockers on top of @ryzhyk's pending API review; #1 (static crdId) is the only one with real teeth.

Comment thread crates/dbsp/src/circuit/circuit_builder.rs Outdated
Comment thread crates/dbsp/src/circuit/trace.rs

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after force-push (ea37a584c8ce54). Same 4 commits, rebased onto fresh main.

Addressed from my prior review (4349733344):

  • ExpandAggregateZeros.crdId is now an instance field and reset in startVisit — finding #1 resolved.
  • ✅ Five trailing ;; typos in ExpandAggregateZero.postorder are gone — finding #5 resolved.
  • ⚠️ The rustdoc on create_region now correctly says RegionName, but the doc on open_region/close_region still reads "the circuit region identified by id" while the parameter is id: RegionName. Mostly cosmetic now — partial fix of finding #2.

Still open from the prior review:

  • Finding #3 (CloseRegion { name: _ } in monitor.rs discards the name; a debug_assert_eq! in close_named_region would catch open/close mismatches cheaply).
  • Finding #4 (TagRegions.DetectConflicts sorts duplicate RegionAnnotations by asVarName(), which is lex order on "<tag>_<id>"; sort by (getTag, getId) if the intent is "smallest id").

None are hard blockers; still deferring to @ryzhyk on the core region API. No new inline comments — the threads from review 4349733344 are still the canonical anchors.

@mihaibudiu

Copy link
Copy Markdown
Contributor Author
image

@mihaibudiu

Copy link
Copy Markdown
Contributor Author

Example generated code:
main crate:

        let s1 = create_operator_41b26629cf07e97f(&circuit, Some("782a1cb2522c307099fcede79d287ead5d4d5fc84514c112c24868852f8ba6e5"), &None, &sourceMap, &mut catalog, &s0.clone(), );
        let s2 = create_operator_841c73a1ef786f76(&circuit, Some("edf8fbb0722a230ae72214f5ab4022e86f28b3b2afd3c6ec02aa063fc2697c5b"), &None, &sourceMap, &mut catalog, &s1.clone(), );
        let s3 = create_operator_cc244695669b6ce5(&circuit, Some("b0c07c7b1aaea5ed9488033a09776e1f49cbbbf47b043f056bfa824f93af2d1f"), &None, &sourceMap, &mut catalog, &s2.clone(), );
        let s4 = create_operator_5caf05bf0d1f80a6(&circuit, Some("9d4f1c617f0dd3e6330d6c4ea5e06dd2c5b2c5ea5d53405a23e8f3c25f49195d"), &None, &sourceMap, &mut catalog, &s3.clone(), );
        let s5 = create_operator_16a7c2ef3578ce7f(&circuit, Some("a8230e23a98879e424ab1ab5d1b4e316656e573a702d118975894817c588ca77"), &None, &sourceMap, &mut catalog, &s3.clone(), );
        let agg_zero_0: Option<RegionName> = Some(circuit.create_region("agg_zero", 0));
        let s6 = create_operator_ddccfd92c10d2516(&circuit, Some("288f1196c5edb6fe88b9afac2146ed926901ad317305c8dddf0f10bceba61a40"), &agg_zero_0, &sourceMap, &mut catalog, &s5.clone(), );
        let s7 = create_operator_37a5c2a96b848a69(&circuit, Some("37a5c2a96b848a696ad8159429ae509d204887e6c87e82f5ee6802d7ae9bf2e7"), &agg_zero_0, &sourceMap, &mut catalog, );
        let s8 = create_operator_b90e84372eca4fa9(&circuit, Some("ed36497745ac7818f495e4ed9af9d74c650386e0a966c33f7f341dba088be915"), &agg_zero_0, &sourceMap, &mut catalog, &s7.clone(), );
        let s9 = create_operator_76c0442d61e62219(&circuit, Some("83bab8b7fee4523b9876255343c0b4ce359ec229d5882621f270fe98cb25a891"), &agg_zero_0, &sourceMap, &mut catalog, &s8.clone(), &s6.clon

operator crate:

pub fn create_operator_ddccfd92c10d2516(circuit: &RootCircuit, hash: Option<&'static str>, region: &Option<RegionName>, sourceMap: &'static SourceMap, catalog: &mut Catalog,
    i0: &Stream<RootCircuit, WSet<Tup8<Option<Array<Option<i8>>>, Array<i8>, Option<Array<Option<i16>>>, Array<i16>, Option<Array<Option<i32>>>, Array<i32>, Option<Array<Option<i64>>>, Array<i64>>>>,
) -> Stream<RootCircuit, WSet<Tup8<Option<Array<Option<i8>>>, Array<i8>, Option<Array<Option<i16>>>, Array<i16>, Option<Array<Option<i32>>>, Array<i32>, Option<Array<Option<i64>>>, Array<i64>>>>{
    if let Some(region) = region { circuit.open_region(region.clone()) };
    let operator_ddccfd92c10d2516: Stream<RootCircuit, WSet<Tup8<Option<Array<Option<i8>>>, Array<i8>, Option<Array<Option<i16>>>, Array<i16>, Option<Array<Option<i32>>>, Array<i32>, Option<Array<Option<i64>>>, Array<i64>>>> = i0.neg();
    operator_ddccfd92c10d2516.set_persistent_id(hash);
    if let Some(region) = region { circuit.close_region(region.clone()) };

    return operator_ddccfd92c10d2516;
}

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third pass after the 4-commit re-push (4c8ce54037d6ae). Still deferring to @ryzhyk on the core region API in circuit_builder.rs; flagging the SQL-side / monitor-side rough edges.

Resolved from prior reviews:

  • ✅ Finding #2: rustdoc on open_region/close_region now says "identified by name" — fully fixed.
  • (#1 and #5 stayed fixed.)

Still open from prior reviews:

  • Finding #3: CircuitEvent::CloseRegion { name: _ } in monitor.rs still discards the name; close_named_region would benefit from a debug_assert_eq! against the saved name to catch mismatched open/close pairs. (Re-anchored inline.)
  • Finding #4: TagRegions.DetectConflicts still sorts by asVarName(), which produces agg_zero_10 < agg_zero_2 (lex on "<tag>_<id>"). Sort by (getTag, getId) if "smallest id" is intended. Moot today (only GlobalAggregate exists, ids are dense), but the generic code is wrong. (Re-anchored inline.)

New observations on this re-push:

  1. NestedOperatorWriter.write: the new region: &Option<RegionName> parameter is never used in the body. Compare to SingleOperatorWriter, which wraps the generated body with if let Some(region) = region { circuit.open_region(region.clone()) }; ... close_region. In NestedOperatorWriter the function takes the parameter but goes straight into circuit.recursive(...) and then processChilds without ever opening the passed-in region. Result: nested operators that should belong to an outer region won't be enrolled in it, and Rust will emit an unused-variable warning for the parameter. Either open/close the region around the recursive block, or rename it _region and document that nested operators never inherit an outer region.

  2. IncrementalizeVisitor.postorder(DBSPConstantOperator): only differentiate gets .copyAnnotations(operator); integral does not. The chain is constant → differentiate → integral, and this.map(operator, integral) registers integral as the externally-visible replacement. If a DBSPConstantOperator carries a GlobalAggregate annotation (which is exactly what ExpandAggregateZero produces), the integrate node — the one downstream operators actually consume — loses the region tag. Likely you want .copyAnnotations(operator) on integral as well.

  3. CircuitPostfix.recordRegion deduplicates by asVarName() (a string), but TagRegions.DetectConflicts deduplicates by .equals(). Two different keys for the same logical "is this the same region" question. Today the only RegionAnnotation subclass has equals keyed on id only and asVarName keyed on tag + id, so they happen to agree. Add a subclass with a different getTag() and they diverge. Either pick one canonical key (I'd suggest (getTag, getId) everywhere) or document the contract on RegionAnnotation.

  4. RegionAnnotation.asVarName() is emitted directly as a Rust identifier with no validation. getTag() is a free-form string from each subclass; today "agg_zero" is fine, but anything containing a hyphen, space, or leading digit would silently generate non-compiling Rust. A Utilities.enforce on tag.matches("[A-Za-z_][A-Za-z0-9_]*") in the RegionAnnotation constructor (or as the contract on getTag()) would prevent a future foot-gun.

  5. TagRegions.RemoveRegions.postorder log message is missing spaces. .append("Removed").append(count).append("annotations from ") renders as "Removed3annotations from …". Trivial.

  6. DBSPAggregateZeroOperator javadoc: unclosed parenthesis"this is done by {@link ExpandAggregateZero}." is missing its ).

  7. Commit hygiene (all 4 new commits checked): subjects all follow [TAG] Imperative summary, all signed off, no trailing-;; typos this time. The [SQL] commit body is empty — would benefit from a one-line on why (region annotations route through CircuitPostfix.recordRegion so each region is declared exactly once across crate boundaries), but not a blocker.

  8. Test coverage: TestTagRegions (3 cases incl. conflict + duplicate-annotation) is solid; MultiCrateTests.testAggregates exercises the multi-crate codegen path for SUM(C0). open_close_region and separate_create_region_same_name cover the dbsp side. Good.

None are hard blockers; (1) and (2) are the only ones that look like real bugs rather than nits.

Ok(())
}

CircuitEvent::CloseRegion { name: _ } => self.close_named_region(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CloseRegion { name: _ } still discards the name — close_named_region just pops the saved region. A debug_assert_eq! between the saved-region path and the name the caller is closing would catch mismatched open/close pairs cheaply in debug builds. To do that, you'd need to also store the RegionName (not just the saved RegionId) on region_open_stack, or change close_named_region to take the name and verify against named_regions. (Re-flagging from prior review.)

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4th pass on force-pushed 1eb76c0. Substantive rework: Region replaced by abstract RegionAnnotation base class; GlobalAggregate now extends it; ExpandAggregateZero tags every synthesized node (map1/neg/delta/integral/constant/sum) with GlobalAggregate(crdId) so each global aggregation rewrite becomes a region; new TestTagRegions (162 LOC) covers overlapping-regions removal + parent inheritance + alphabetical tiebreaker. Nullable emitOpenRegion/emitCloseRegion helpers in ToRustVisitor are clean.

Resolved from prior passes:

  • TagRegions alphabetical tiebreaker (prior #4) is now exercised by testMultipleAnnotations/testInheritedAnnotation. Acceptable.
  • copyAnnotations on the differentiate for incremental constants — addresses the intent of my prior #b, but propagates onto the wrong operator. See inline.

Two new items below, otherwise good. Still deferring to @ryzhyk on the DBSP-side region API.

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5th-pass re-review on the 4-commit restructure (split into DBSP region API + profiler labels + docs + SQL compiler).

Both findings from review 4352004334 are addressed:

  • NestedOperatorWriter now wraps the generated body in if let Some(region) = region { circuit.open_region(region.clone()) } and the matching close_region before the return, so the region: &Option<RegionName> parameter is no longer dead and nested operators correctly inherit their outer region.
  • IncrementalizeVisitor.postorder(DBSPConstantOperator) now applies .copyAnnotations(operator) to the integral as well as the differentiate, so the region tag survives onto the user-visible this.map(operator, integral) replacement.

Also noting: TagRegions log spaces from review 4351698531 are fixed ("Removed " ... " annotations from "), and DetectConflicts/RemoveRegions now do a real connected-component pass with per-input ancestor tracking. Test coverage in TestTagRegions (162 LOC) exercises overlapping-regions + parent inheritance + tiebreaker.

Remaining nits (none blocking, leaving as-is — feel free to ignore):

  • Comparator.comparing(RegionAnnotation::asVarName) is still lex sort on "<tag>_<id>", so agg_zero_10 sorts before agg_zero_2. Numeric tiebreaker on getId() would be more intuitive if a region ever sees double-digit ids.
  • crates/dbsp/src/monitor.rs:473 CloseRegion { name: _ } still discards the name — a debug_assert_eq! against the saved region name would catch event-stream corruption cheaply.
  • RegionAnnotation.asVarName() is interpolated as a Rust ident with no validation; future tags containing hyphens or leading digits would break codegen. Easy to enforce in the abstract base if you want belt-and-suspenders.

No issue with the new DBSP API itself — deferring to ryzhyk on the public circuit_builder.rs surface (open_region/close_region/RegionName), it lines up with the trace event structure cleanly.

Approving.

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New commits reviewed (bd4451e, 748e036, 4c080f4). DBSP region API is well-tested — the two unit tests cover both single-region-reopen and distinct-region-same-name cases nicely. Profiler label fix and doc updates are clean. One doc-comment bug inline.

Comment thread crates/dbsp/src/circuit/circuit_builder.rs Outdated
/// Returns a [`RegionName`] that can be passed to [`Circuit::open_region`]
/// and [`Circuit::close_region`] to associate operators with the region
/// from multiple call sites.
fn create_region(&self, name: &str, id: u64) -> RegionName {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is misleading: this doesn't actually create a region

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename this to "create_region_name"

///
/// Emits an `OpenRegion` event. Every operator or subcircuit created after
/// this call (in the same thread) will belong to the region until
/// [`Circuit::close_region`] is called with the same `name`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe document what happens if the region exists/doesn't exist.

@mihaibudiu mihaibudiu enabled auto-merge May 26, 2026 21:06
@mihaibudiu mihaibudiu added this pull request to the merge queue May 26, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 27, 2026
@mihaibudiu mihaibudiu force-pushed the issue6285 branch 2 times, most recently from 85d23df to 4e76d52 Compare May 27, 2026 03:38
@mihaibudiu mihaibudiu enabled auto-merge May 27, 2026 03:38
@mihaibudiu mihaibudiu added this pull request to the merge queue May 27, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 27, 2026
@mihaibudiu mihaibudiu enabled auto-merge May 27, 2026 16:59
@mihaibudiu mihaibudiu added this pull request to the merge queue May 27, 2026
@gz gz removed this pull request from the merge queue due to a manual request May 27, 2026
@mihaibudiu mihaibudiu added this pull request to the merge queue May 28, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge May 29, 2026 05:48
@mihaibudiu mihaibudiu added this pull request to the merge queue May 29, 2026
Merged via the queue into feldera:main with commit 62e9c73 May 29, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the issue6285 branch May 29, 2026 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[SQL] Compiler should create regions for some operators for making the profile UI more ergonomic

3 participants