Skip to content

Fix cdc_order_by with multiple fields.#6346

Merged
gz merged 1 commit into
mainfrom
cdc_order_by
May 29, 2026
Merged

Fix cdc_order_by with multiple fields.#6346
gz merged 1 commit into
mainfrom
cdc_order_by

Conversation

@gz

@gz gz commented May 29, 2026

Copy link
Copy Markdown
Contributor

PR #6245 caused a regression for cdc_order_by which prevented parsing of order by clauses with multiple fields.
That was because we didn't use the right DF function to parse it.

This corrects the mistakes and adds a series of tests, we didn't have any that run cdc_order_by with more than one field.

Describe Manual Test Plan

Checklist

  • Unit tests added/updated
  • Integration tests added/updated

Breaking Changes?

Not anymore.

@gz gz requested a review from ryzhyk May 29, 2026 02:54
PR #6245 caused a regression for `cdc_order_by` which prevented
parsing of order by clauses with multiple fields.
That was because we didn't use the right DF function to parse it.

This corrects the mistakes and adds a series of tests.

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
@gz gz force-pushed the cdc_order_by branch from e50f7fb to 3112223 Compare May 29, 2026 02:59
@gz gz added this pull request to the merge queue May 29, 2026

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean fix for the #6245 regression. The added validator (parse_comma_separated(parse_order_by_expr) + expect_token(EOF)) is the right shape, the new parse_cdc_order_by honors DataFusion's nulls_max default per direction, and the test ladder — validator shape table, parser defaults table, Rust multi-key integration with a same-__feldera_ts tiebreak, Python upsert with lsn tiebreak — covers exactly the original failure mode plus the directional/null-placement edges. A couple of small nits below, none blocking.


order_exprs
.into_iter()
.map(|order_expr| {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

df.parse_sql_expr(&order_expr.expr.to_string()) round-trips the key expression through a string render of sqlparser's AST and reparses it via DataFusion. That works for everything in the test ladder, but it is a fragile contract — anything that formats ambiguously (quoting rules across dialects, function-call serialization, etc.) can silently change semantics between the two parses. Since you already have an order_expr.expr: sqlparser::ast::Expr, the more direct path is to use df.parse_sql_expr only once on the whole clause (e.g. via a tiny helper that hands the sub-Expr to DataFusion's SQL-to-logical-expr translator), or to push the per-key reparse far enough to fail loudly if the round trip drops anything. Not a blocker, but worth a TODO so future-you doesn't have to discover this the hard way.

/// against df's schema.
///
/// `cdc_order_by` is the body of an ORDER BY clause: a comma-separated list
/// of keys, each optionally annotated with ASC / DESC and NULLS

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_sql_order_by lives in adapterlib::utils::datafusion and parse_cdc_order_by re-implements the same prefix here (Parser::new(GenericDialect).try_with_sql + parse_comma_separated + EOF check), just with deltalake's sqlparser re-export. They have to stay in lock-step for validate_cdc_order_by to actually pre-validate what build_cdc_dataframe will accept. Worth extracting the &str -> Vec<OrderByExpr> step into a shared helper in adapterlib so the two sites can never drift.

let expect = |id: u32, label: &str| TestStruct {
id,
b: false,
i: None,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soft: the test expect closure hardcodes b: false and i: None, and only varies s. Fine here, but if you ever extend TestStruct you'll thank yourself for a TestStruct::with_s(id, s) constructor in the test helper module instead.

Merged via the queue into main with commit 9bbbea7 May 29, 2026
2 checks passed
@gz gz deleted the cdc_order_by branch May 29, 2026 04:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants