[SQL] compiler support for grouping nodes into regions #6305
Conversation
|
@ryzhyk your review is mostly necessary for the new region APIs. These are initially written by Claude, but then massaged manually. |
mythical-fred
left a comment
There was a problem hiding this comment.
Posting findings without pre-empting @ryzhyk on the core region API. Infrastructure looks sound; regions are intentionally excluded from OperatorHash so this doesn't touch checkpoint compat. Tests cover both the dbsp open/close semantics and the SQL TagRegions conflict path. A few real issues plus some cleanup below.
Findings (in order of seriousness):
-
ExpandAggregateZeros.crdIdis astatic int(process-global, never reset). Two compilations in the same JVM share state, so the IDs leak from one program into the next — the first program's regions get IDs 0..N-1, the next program starts at N. In the Java test suite (MultiCrateTests.testAggregatesand any other test that builds multiple circuits with global aggregates) the generated Rust differs between an isolated run and a run inside the full suite. Move it to an instance field ofExpandAggregateZeros, or reset it inpreorder(DBSPCircuit). Inline. -
Doc/terminology drift:
Circuit::create_regionrustdoc says "Returns a [RegionId] that can be passed to [Circuit::open_region]", but the type isRegionNameand there is no publicRegionId.open_region/close_regionparameter is namedid: RegionNamebut the doc text repeatedly calls itid. Either rename the type toRegionId(matches the docs and theid()accessor) or rewrite the docs to consistently sayRegionName. Inline. -
CloseRegion { name: _ }inmonitor.rsdiscards the name and just pops the stack. The contract is "caller is responsible for pairing everyopen_regionwith a matchingclose_region" — fine for the generated code path, butclose_named_region()coulddebug_assert_eq!(saved_name, name)(or at least the monitor could compare names) so caller mistakes don't silently land operators in the wrong region. Cheap, catches a real class of bug. Inline. -
TagRegions.DetectConflicts: when an operator has multipleRegionAnnotations, you sort byasVarName()(a string likeagg_zero_<int>) and keep the first "alphabetically". That's lex sort, soagg_zero_10 < agg_zero_2 < agg_zero_3. Determinism is preserved but the tiebreaker isn't "smallest id" — if that's intentional, fine; if not, sort by(getTag(), getId()). Currently moot becauseGlobalAggregateis the onlyRegionAnnotationand a single operator only gets one of them, but the code is written generically and the testtestAnnotationTwoAnnotationsdoesn't exercise this branch. Inline. -
Five trailing
;;inExpandAggregateZero.postorder(lines 103, 109, 112, 116, 120). Compiles, but trips strict lints and is just noise. Inline.
Soft / non-blocking:
Circuit::open_region/close_regiontakeRegionNameby value, which forces every caller (including the two new unit tests) to.clone(). Switching to&RegionNamewould letCircuitEvent::open_regionclone once at log time and free callers from the boilerplate.- PR description says "I ran all Java tests and some python tests manually." Code-generation changes that thread a new parameter through
SingleOperatorWriter/NestedOperatorWriter/CircuitWriterdeserve at least the standard Python integration suite — worth confirming green CI before merge. Regression3Tests.issue6397(LAG over-Infinity/0/Infinity) is a real fix but landing it in a PR titled "compiler support for grouping nodes into regions" makes archaeology harder later. Already a separate commit, so a quickgit rebase --interactiveto split off the doc + LAG + small fixes from the main region work would be ideal.
None are hard blockers on top of @ryzhyk's pending API review; #1 (static crdId) is the only one with real teeth.
mythical-fred
left a comment
There was a problem hiding this comment.
Re-review after force-push (ea37a58 → 4c8ce54). Same 4 commits, rebased onto fresh main.
Addressed from my prior review (4349733344):
- ✅
ExpandAggregateZeros.crdIdis now an instance field and reset instartVisit— finding #1 resolved. - ✅ Five trailing
;;typos inExpandAggregateZero.postorderare gone — finding #5 resolved. ⚠️ The rustdoc oncreate_regionnow correctly saysRegionName, but the doc onopen_region/close_regionstill reads "the circuit region identified byid" while the parameter isid: RegionName. Mostly cosmetic now — partial fix of finding #2.
Still open from the prior review:
- Finding #3 (
CloseRegion { name: _ }inmonitor.rsdiscards the name; adebug_assert_eq!inclose_named_regionwould catch open/close mismatches cheaply). - Finding #4 (
TagRegions.DetectConflictssorts duplicateRegionAnnotations byasVarName(), which is lex order on"<tag>_<id>"; sort by(getTag, getId)if the intent is "smallest id").
None are hard blockers; still deferring to @ryzhyk on the core region API. No new inline comments — the threads from review 4349733344 are still the canonical anchors.
|
Example generated code: operator crate: |
mythical-fred
left a comment
There was a problem hiding this comment.
Third pass after the 4-commit re-push (4c8ce54 → 037d6ae). Still deferring to @ryzhyk on the core region API in circuit_builder.rs; flagging the SQL-side / monitor-side rough edges.
Resolved from prior reviews:
- ✅ Finding #2: rustdoc on
open_region/close_regionnow says "identified byname" — fully fixed. - (#1 and #5 stayed fixed.)
Still open from prior reviews:
- Finding #3:
CircuitEvent::CloseRegion { name: _ }inmonitor.rsstill discards the name;close_named_regionwould benefit from adebug_assert_eq!against the saved name to catch mismatched open/close pairs. (Re-anchored inline.) - Finding #4:
TagRegions.DetectConflictsstill sorts byasVarName(), which producesagg_zero_10 < agg_zero_2(lex on"<tag>_<id>"). Sort by(getTag, getId)if "smallest id" is intended. Moot today (onlyGlobalAggregateexists, ids are dense), but the generic code is wrong. (Re-anchored inline.)
New observations on this re-push:
-
NestedOperatorWriter.write: the newregion: &Option<RegionName>parameter is never used in the body. Compare toSingleOperatorWriter, which wraps the generated body withif let Some(region) = region { circuit.open_region(region.clone()) };...close_region. InNestedOperatorWriterthe function takes the parameter but goes straight intocircuit.recursive(...)and thenprocessChilds without ever opening the passed-in region. Result: nested operators that should belong to an outer region won't be enrolled in it, and Rust will emit an unused-variable warning for the parameter. Either open/close the region around the recursive block, or rename it_regionand document that nested operators never inherit an outer region. -
IncrementalizeVisitor.postorder(DBSPConstantOperator): onlydifferentiategets.copyAnnotations(operator);integraldoes not. The chain isconstant → differentiate → integral, andthis.map(operator, integral)registersintegralas the externally-visible replacement. If aDBSPConstantOperatorcarries aGlobalAggregateannotation (which is exactly whatExpandAggregateZeroproduces), the integrate node — the one downstream operators actually consume — loses the region tag. Likely you want.copyAnnotations(operator)onintegralas well. -
CircuitPostfix.recordRegiondeduplicates byasVarName()(a string), butTagRegions.DetectConflictsdeduplicates by.equals(). Two different keys for the same logical "is this the same region" question. Today the onlyRegionAnnotationsubclass hasequalskeyed onidonly andasVarNamekeyed ontag + id, so they happen to agree. Add a subclass with a differentgetTag()and they diverge. Either pick one canonical key (I'd suggest(getTag, getId)everywhere) or document the contract onRegionAnnotation. -
RegionAnnotation.asVarName()is emitted directly as a Rust identifier with no validation.getTag()is a free-form string from each subclass; today "agg_zero" is fine, but anything containing a hyphen, space, or leading digit would silently generate non-compiling Rust. AUtilities.enforceontag.matches("[A-Za-z_][A-Za-z0-9_]*")in theRegionAnnotationconstructor (or as the contract ongetTag()) would prevent a future foot-gun. -
TagRegions.RemoveRegions.postorderlog message is missing spaces..append("Removed").append(count).append("annotations from ")renders as"Removed3annotations from …". Trivial. -
DBSPAggregateZeroOperatorjavadoc: unclosed parenthesis —"this is done by {@link ExpandAggregateZero}."is missing its). -
Commit hygiene (all 4 new commits checked): subjects all follow
[TAG] Imperative summary, all signed off, no trailing-;;typos this time. The[SQL]commit body is empty — would benefit from a one-line on why (region annotations route throughCircuitPostfix.recordRegionso each region is declared exactly once across crate boundaries), but not a blocker. -
Test coverage:
TestTagRegions(3 cases incl. conflict + duplicate-annotation) is solid;MultiCrateTests.testAggregatesexercises the multi-crate codegen path forSUM(C0).open_close_regionandseparate_create_region_same_namecover the dbsp side. Good.
None are hard blockers; (1) and (2) are the only ones that look like real bugs rather than nits.
| Ok(()) | ||
| } | ||
|
|
||
| CircuitEvent::CloseRegion { name: _ } => self.close_named_region(), |
There was a problem hiding this comment.
CloseRegion { name: _ } still discards the name — close_named_region just pops the saved region. A debug_assert_eq! between the saved-region path and the name the caller is closing would catch mismatched open/close pairs cheaply in debug builds. To do that, you'd need to also store the RegionName (not just the saved RegionId) on region_open_stack, or change close_named_region to take the name and verify against named_regions. (Re-flagging from prior review.)
mythical-fred
left a comment
There was a problem hiding this comment.
4th pass on force-pushed 1eb76c0. Substantive rework: Region replaced by abstract RegionAnnotation base class; GlobalAggregate now extends it; ExpandAggregateZero tags every synthesized node (map1/neg/delta/integral/constant/sum) with GlobalAggregate(crdId) so each global aggregation rewrite becomes a region; new TestTagRegions (162 LOC) covers overlapping-regions removal + parent inheritance + alphabetical tiebreaker. Nullable emitOpenRegion/emitCloseRegion helpers in ToRustVisitor are clean.
Resolved from prior passes:
- TagRegions alphabetical tiebreaker (prior #4) is now exercised by
testMultipleAnnotations/testInheritedAnnotation. Acceptable. - copyAnnotations on the differentiate for incremental constants — addresses the intent of my prior #b, but propagates onto the wrong operator. See inline.
Two new items below, otherwise good. Still deferring to @ryzhyk on the DBSP-side region API.
mythical-fred
left a comment
There was a problem hiding this comment.
5th-pass re-review on the 4-commit restructure (split into DBSP region API + profiler labels + docs + SQL compiler).
Both findings from review 4352004334 are addressed:
- NestedOperatorWriter now wraps the generated body in
if let Some(region) = region { circuit.open_region(region.clone()) }and the matchingclose_regionbefore the return, so theregion: &Option<RegionName>parameter is no longer dead and nested operators correctly inherit their outer region. - IncrementalizeVisitor.postorder(DBSPConstantOperator) now applies
.copyAnnotations(operator)to the integral as well as the differentiate, so the region tag survives onto the user-visiblethis.map(operator, integral)replacement.
Also noting: TagRegions log spaces from review 4351698531 are fixed ("Removed " ... " annotations from "), and DetectConflicts/RemoveRegions now do a real connected-component pass with per-input ancestor tracking. Test coverage in TestTagRegions (162 LOC) exercises overlapping-regions + parent inheritance + tiebreaker.
Remaining nits (none blocking, leaving as-is — feel free to ignore):
Comparator.comparing(RegionAnnotation::asVarName)is still lex sort on"<tag>_<id>", soagg_zero_10sorts beforeagg_zero_2. Numeric tiebreaker ongetId()would be more intuitive if a region ever sees double-digit ids.crates/dbsp/src/monitor.rs:473CloseRegion { name: _ }still discards the name — adebug_assert_eq!against the saved region name would catch event-stream corruption cheaply.RegionAnnotation.asVarName()is interpolated as a Rust ident with no validation; future tags containing hyphens or leading digits would break codegen. Easy to enforce in the abstract base if you want belt-and-suspenders.
No issue with the new DBSP API itself — deferring to ryzhyk on the public circuit_builder.rs surface (open_region/close_region/RegionName), it lines up with the trace event structure cleanly.
Approving.
| /// Returns a [`RegionName`] that can be passed to [`Circuit::open_region`] | ||
| /// and [`Circuit::close_region`] to associate operators with the region | ||
| /// from multiple call sites. | ||
| fn create_region(&self, name: &str, id: u64) -> RegionName { |
There was a problem hiding this comment.
This name is misleading: this doesn't actually create a region
There was a problem hiding this comment.
I will rename this to "create_region_name"
| /// | ||
| /// Emits an `OpenRegion` event. Every operator or subcircuit created after | ||
| /// this call (in the same thread) will belong to the region until | ||
| /// [`Circuit::close_region`] is called with the same `name`. |
There was a problem hiding this comment.
maybe document what happens if the region exists/doesn't exist.
85d23df to
4e76d52
Compare
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>

The profiler uses a notion of "regions" to group related nodes together. So far, regions were used only for nodes which look like primitive nodes for the SQL compiler, but are actually expanded into graphs by DBSP. With this PR we add the ability of the compiler itself to group multiple operators into a region. The compiler promises that regions are always connected subgraphs of the circuit graph.
There are 4 commits, each with a separate change. (Two commits fix small issues found elsewhere.)
The compiler also groups operators used for producing the "zero" of aggregate functions without group-by using a new region "agg_zero". We will try to add more regions in the future, now that the infrastructure exists. These regions are introduced very early in compilation, and may not survive until the end, because the optimizer may rearrange these operators.
Fixes #6285
Describe Manual Test Plan
I ran all Java tests and some python tests manually