Skip to content

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116

Merged
mihaibudiu merged 2 commits into
feldera:mainfrom
mihaibudiu:issue4146
Jun 5, 2026
Merged

[SQL] Support for TIMESTAMP WITH TIME ZONE#6116
mihaibudiu merged 2 commits into
feldera:mainfrom
mihaibudiu:issue4146

Conversation

@mihaibudiu

@mihaibudiu mihaibudiu commented Apr 24, 2026

Copy link
Copy Markdown
Contributor

Fixes #4146

This PR finally adds support for the last important missing SQL type TIMESTAMP WITH TIME ZONE
There are multiple ways to implement this; I have chosen a version which provides similar capabilities with most databases: timestamps with time zones are stored at runtime as simply UTC timestamps. Thus they are comparable to each other. No time zone information is preserved at runtime, though.

I will mark this PR as a draft for two reasons:

The Rust serialization code will need to be audited for compatibility with other standard formats (e.g., Debezium).

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several blockers; please mark this as draft as the description suggests, or address before merge.

Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread crates/sqllib/src/casts.rs
Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread crates/sqllib/src/timestamp.rs Outdated
Comment thread docs.feldera.com/docs/sql/datetime.md Outdated
Comment thread docs.feldera.com/docs/sql/types.md Outdated
JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray // Why is this missing fractions of second? comment with a question to nobody. Either answer it (Debezium MySQL really does emit second-precision timestamps in this layout) or open an issue and reference it. Don't ship a TODO disguised as a question.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from 48d602b to 4e74e03 Compare April 25, 2026 05:46
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
@mihaibudiu mihaibudiu marked this pull request as draft April 28, 2026 05:28
@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from add429a to 7edb1df Compare April 29, 2026 04:24

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review on tip 7edb1df (force-pushed since the prior review on 8b53059). Several of my prior blockers were silently fixed without a reply (good); a few are not yet addressed; and the new force-push introduced regressions in cast_to_TimestampTz_s and a Python test file that won't even parse. PR is now draft, which is the right call.

Prior blockers — status

  • timestamp.rs SerializeWithContext using timestamp_format instead of timestamp_tz_format — fixed.
  • serialize_table_record!(TestStruct[3] {...}) field count — fixed (now [4]).
  • DBSPTypeTimestampTz getMaxValue 8 nines — fixed (6 nines).
  • CastTests Americas/New_York — fixed.
  • TypeCompiler TZ branch warning text + precision constant — fixed (note: timestampPrecisionsWarned is still shared between the two branches; minor).
  • TimestampTz // Is this right for negative timestamps? TODO + Rustdoc copy-paste — gone (the methods were removed; if/when they come back, please don't reintroduce the bug).
  • TZ section docs typos / heading — fixed.
  • Non-actionable cast error — only partially: type name corrected to TIMESTAMP WITH TIME ZONE, but still no format hint as PR_REVIEW_RULES.md asks (see inline).
  • Magic-number Objects.hash(..., 14) — number changed to 16, still a bare magic constant (see inline).

Still open / not addressed

  • Test coverage. Same two integration tests as before (issue4146, issue4146a). Your own description still says "I need to implement many more tests - at least one for each function". This is the headline blocker for the PR.
  • Plain-TIMESTAMP doc typos data time / asssumed (datetime.md L196-197) — flagged previously to fix in the same PR; still present.
  • Stray rhetorical // Why is this missing fractions of second? comment in serde_config.rs:204.

New issues introduced by this force-push

  • cast_to_TimestampTz_s now uses only DateTime::parse_from_rfc3339. That's a regression: the previous version at least accepted YYYY-MM-DD HH:MM:SS[.f] and YYYY-MM-DD. RFC3339 also doesn't accept IANA zone names, even though both the docs and DBSPTimestampTzLiteral advertise IANA support. Net effect: literals accept '2020-01-01 10:10:10 America/New_York' but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) will now fail at runtime.
  • python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py — the new TZ test data is broken in three ways (syntax error + two malformed strings). See inline; this file won't even import.
  • DBSPTimestampTzLiteral.getMicros has a negative-modulo footgun (see inline) — pre-1970 TZ literals will throw from LocalDateTime.ofEpochSecond because nanoOfSecond must be non-negative.

Keeping verdict at REQUEST_CHANGES until the test coverage gap and the regressed cast/python paths are addressed.

Comment thread crates/sqllib/src/casts.rs Outdated

#[doc(hidden)]
pub fn cast_to_TimestampTz_s(value: SqlString) -> SqlResult<TimestampTz> {
if let Ok(v) = DateTime::parse_from_rfc3339(value.str()) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two regressions vs. the previous implementation, both visible to users:

  1. No IANA zone names. DateTime::parse_from_rfc3339 only accepts numeric offsets. But DBSPTimestampTzLiteral and docs/sql/datetime.md both promise IANA names (America/New_York, etc.). So TIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York' works as a literal but CAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE) fails. Please use chrono_tz (or equivalent) to handle IANA names, then fall back to parse_from_rfc3339 for numeric offsets.
  2. Date-only and space-separated forms dropped. The previous version of this function accepted YYYY-MM-DD HH:MM:SS[.f] (no offset, treated as UTC) and YYYY-MM-DD. RFC3339 requires the T separator and an offset, so both of those casts now fail. At minimum, please keep the %Y-%m-%d %H:%M:%S%.f and %Y-%m-%d fallbacks (UTC-assumed, matching the literal grammar's "assumed to be in UTC").

A test that exercises each accepted format end-to-end (numeric offset, IANA name, date-only, no-offset) belongs with this fix.

Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
Comment thread python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py Outdated
JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still here from prior review. Either answer the question (does Debezium MySQL really emit second-precision timestamps in this layout? — if so, say so) or open an issue and reference it. Don't ship a TODO disguised as a question.

@mihaibudiu mihaibudiu marked this pull request as ready for review May 13, 2026 22:16
@mihaibudiu

Copy link
Copy Markdown
Contributor Author

Most of the Python tests are validated because they have the same expected shape and results as tests for TIMESTAMP.

@mihaibudiu mihaibudiu requested a review from snkas May 13, 2026 22:43
@mihaibudiu

Copy link
Copy Markdown
Contributor Author

@snkas this is a big PR, but I think most scrutiny is needed in the Rust code dealing with connectors and pipeline manager, which is probably < 100 lines of code. Most of the code is in tests; see the above comment why the Python tests are probably correct.

@mihaibudiu mihaibudiu force-pushed the issue4146 branch 6 times, most recently from dbdd9a6 to 8091d80 Compare May 14, 2026 22:30

@snkas snkas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From pipeline-manager perspective, it's a new type added to the OpenAPI schema, that part looks good.

For the connectors: the input and output format of this new data type is still MicrosSinceEpoch, so an integer. When would one use TIMESTAMP instead of TIMESTAMP WITH TIME ZONE, and vice versa? It seems they are effectively the same in terms of the queries you can do on them (e.g., it's no possible to query "give me all the tuples with the timestamp in time zone +01:00" or so)?

@mihaibudiu

Copy link
Copy Markdown
Contributor Author

Our implementation with TIMESTAMP WITH TIME ZONE aligns with most major databases. For one, some datasets already include timestamps with time zones, so this will enable us to load them. Then there are functions to convert timezones. CONVERT_TIMEZONE is an example we already have, but there is a function (not yet implemented) AT TIME ZONE, which takes a timestamp at a given time zone and converts it to UDT. Indeed, we do not carry the actual time zone; time zones are relatively complex objects, which do not make sense without a TIMESTAMP, and whose definition evolves every year, they are not simply numeric offsets.

@mihaibudiu

Copy link
Copy Markdown
Contributor Author

@snkas any actionable feedback? Can we merge this and refine?
There's documentation about the semantics of the new type in the PR.

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big blockers from my prior two reviews are addressed: IANA zones now parse, the error message is actionable with the expected format, getMaxValue() is back to 6 fractional digits, DBSPTimestampTzLiteral.getMicros uses Math.floorDiv/Math.floorMod (no more negative-modulo footgun), the broken test_timestamp_tbl.py row is now valid Python with a clean +05:00 offset, and test coverage has expanded substantially (issue4146a with ten in-line cases, testCaseTz, testCompareTz, plus the new test_interval_*, test_cmp_operators.py, test_date_time_fn.py and TZ rows in test_timestamp_tbl.py). Approving. Two tiny leftovers inline.

JsonFlavor::DebeziumMySql => Self {
time_format: TimeFormat::Micros,
date_format: DateFormat::DaysSinceEpoch,
// Why is this missing fractions of second?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rhetorical TODO comment is still here from my first review. Either answer it (does Debezium MySQL truly emit second-precision timestamps in this flavor? — confirm and replace with a single-sentence explanation citing the upstream behaviour) or delete it. Floating questions to nobody are noise. Not a blocker.

@snkas snkas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so this PR has to benefits:

(1) It allows us to ingest data from sources that provide a column that has timestamp+timezone as its type. Would this not already be possible by adding an extra mapping to Timestamp? Given we don't store the timezone in any case.
(2) It opens up the possibility for SQL timezone-related functions. But if we don't store the timezone in this new type, wouldn't CHAR just still be the output type for both AT TIMESTAMP and CONVERT_TIMESTAMP? Documentation: https://docs.feldera.com/sql/datetime/#convert_timezone

As the PR description is not "Add type TIMESTAMP WITH TIMEZONE as an alias for TIMESTAMP", which would be my expectation so far based on our back-and-forth. How does the type differ?

// hardwire that for now. We can make it configurable in the future.
// FIXME: Also, the timezone should be `None`, but that gets compiled into `timezone_ntz`
// in the JSON schema, which Databricks doesn't fully support yet.
SqlType::Timestamp => DataType::Timestamp(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is this special case for Timestamp, wouldn't this also apply to TimestampTz? @ryzhyk added the comment here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope that this will work - the special case was None not being supported

@mihaibudiu

Copy link
Copy Markdown
Contributor Author

In other databases an important difference between TIMESTAMP and TIMESTAMP WITH TIME ZONE is that the latter is usually adjusted for I/O based on the local timezone of the database, whereas the former is absolute. Now, Feldera databases always run in the UTC time zone, which has the nice benefit of making things deterministic. So these two do look very similar.

Think of the former as a NaiveDateTime and the latter as DateTime<Utc>. Both these types exist in Rust, and although they both have the same representation in Rust, they typecheck differently, and they do have different methods. The set of functions in Rust overlaps to a significant degree, but we will add more functions which are specific to only one of the two types.

@mihaibudiu mihaibudiu added this pull request to the merge queue May 19, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 19, 2026
@mihaibudiu

Copy link
Copy Markdown
Contributor Author

Merging this is blocked on #6133 - without this it requires a platform upgrade

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewing only the two new commits since the 2026-05-16 approval (70d1371 + 0a3ea0a).

Calcite bump commit is trivial. The TIMESTAMP WITH TIME ZONE commit is large but mostly mechanical delegation from Timestamp -> TimestampTz, which is the right approach.

A handful of blocking issues I want fixed before this lands:

  1. Operator-precedence bug in the new TIMESTAMP_TZ guard in ExpressionCompiler.makeBinaryExpression. The || binds outside the opcode.isComparison() guard, so non-comparison ops between a non-TZ left and a TZ right (or the reverse) also hit this branch with the misleading 'comparison-only' message. testCompareTz happens not to exercise this path because Calcite already rejects the subtraction at type validation.

  2. New Variant::TimestampTz is added to the variant enum, but golden_row_variant is still a Tup25 covering exactly the pre-existing 25 variants. The whole point of bumping storage-test-compat is to catch rkyv-layout changes; without TimestampTz in the golden tuple, the new variant arm is not exercised by the storage compat test at all.

  3. The pre-existing aggregate test illarg_extract_abrv1_legal was renamed to date_part_abrv1_legal, and the new TZ counterpart is date_part_abrv1_legal_tz. discover_tests only registers classes whose name starts with illarg_, so both classes silently drop out of the runtime aggregate test suite. The DATE_PART abbreviation coverage (existing) and its TZ extension (new) are no longer running.

  4. cast_to_TimestampTz_u64 returns an error message that says TIMESTAMP instead of TIMESTAMP WITH TIME ZONE. The macro-generated wrappers below it got the message right; this one is the odd hand-written outlier.

Also a few non-blocking nits inline (section banner, datetime.md prose, lingering magic number from the prior round, two behaviour-change footnotes worth calling out in the commit body).

Connector coverage (Avro/JSON/Parquet/Delta/datagen/adhoc) and the new visitors all look complete; adding the new Variant enum arm at the end of the enum (not the middle) is the right call for forward-compat.

Comment thread crates/storage-test-compat/src/lib.rs Outdated
Comment thread python/tests/runtime_aggtest/illarg_tests2/test_date_time_fn.py Outdated
Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread crates/sqllib/src/casts.rs Outdated
Comment thread docs.feldera.com/docs/sql/datetime.md Outdated
Comment thread crates/sqllib/src/timestamp.rs
Comment thread crates/sqllib/src/timestamp.rs

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocker still open (storage compat golden row); the other three are fixed. See inline.

Comment thread crates/storage-test-compat/src/lib.rs Outdated
Comment thread crates/sqllib/src/casts.rs
Comment thread python/tests/runtime_aggtest/illarg_tests2/test_date_time_fn.py
@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from d603a63 to b7e7cbc Compare June 2, 2026 18:48
@mihaibudiu mihaibudiu enabled auto-merge June 2, 2026 18:48
@mihaibudiu mihaibudiu force-pushed the issue4146 branch 2 times, most recently from 9937e73 to 9289004 Compare June 2, 2026 20:01
@mihaibudiu mihaibudiu added this pull request to the merge queue Jun 2, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 3, 2026
@mihaibudiu mihaibudiu added this pull request to the merge queue Jun 3, 2026
Comment thread crates/sqllib/src/timestamp.rs
Comment thread crates/sqllib/src/timestamp.rs
@mihaibudiu mihaibudiu removed this pull request from the merge queue due to a manual request Jun 3, 2026

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storage-compat blocker resolved. Tup26 + GoldenRowVariant properly covers all 26 variant arms including TimestampTz, new golden files added alongside existing ones (no removals), README updated. All four blockers from the 5/22 review are now fixed. Two small nits inline.

Comment thread docs.feldera.com/docs/sql/datetime.md
@mihaibudiu mihaibudiu added this pull request to the merge queue Jun 5, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 5, 2026
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
@mihaibudiu mihaibudiu enabled auto-merge June 5, 2026 04:18
@mihaibudiu mihaibudiu added this pull request to the merge queue Jun 5, 2026
Merged via the queue into feldera:main with commit 2277c07 Jun 5, 2026
1 check passed
@mihaibudiu mihaibudiu deleted the issue4146 branch June 5, 2026 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for 'timestamp with time zone'

4 participants