[SQL] Support for TIMESTAMP WITH TIME ZONE#6116
Conversation
mythical-fred
left a comment
There was a problem hiding this comment.
Several blockers; please mark this as draft as the description suggests, or address before merge.
| JsonFlavor::DebeziumMySql => Self { | ||
| time_format: TimeFormat::Micros, | ||
| date_format: DateFormat::DaysSinceEpoch, | ||
| // Why is this missing fractions of second? |
There was a problem hiding this comment.
Stray // Why is this missing fractions of second? comment with a question to nobody. Either answer it (Debezium MySQL really does emit second-precision timestamps in this layout) or open an issue and reference it. Don't ship a TODO disguised as a question.
48d602b to
4e74e03
Compare
add429a to
7edb1df
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Re-review on tip 7edb1df (force-pushed since the prior review on 8b53059). Several of my prior blockers were silently fixed without a reply (good); a few are not yet addressed; and the new force-push introduced regressions in cast_to_TimestampTz_s and a Python test file that won't even parse. PR is now draft, which is the right call.
Prior blockers — status
- timestamp.rs SerializeWithContext using
timestamp_formatinstead oftimestamp_tz_format— fixed. serialize_table_record!(TestStruct[3] {...})field count — fixed (now[4]).- DBSPTypeTimestampTz
getMaxValue8 nines — fixed (6 nines). - CastTests
Americas/New_York— fixed. - TypeCompiler TZ branch warning text + precision constant — fixed (note:
timestampPrecisionsWarnedis still shared between the two branches; minor). - TimestampTz
// Is this right for negative timestamps?TODO + Rustdoc copy-paste — gone (the methods were removed; if/when they come back, please don't reintroduce the bug). - TZ section docs typos / heading — fixed.
- Non-actionable cast error — only partially: type name corrected to TIMESTAMP WITH TIME ZONE, but still no format hint as PR_REVIEW_RULES.md asks (see inline).
- Magic-number
Objects.hash(..., 14)— number changed to16, still a bare magic constant (see inline).
Still open / not addressed
- Test coverage. Same two integration tests as before (
issue4146,issue4146a). Your own description still says "I need to implement many more tests - at least one for each function". This is the headline blocker for the PR. - Plain-TIMESTAMP doc typos
data time/asssumed(datetime.md L196-197) — flagged previously to fix in the same PR; still present. - Stray rhetorical
// Why is this missing fractions of second?comment inserde_config.rs:204.
New issues introduced by this force-push
cast_to_TimestampTz_snow uses onlyDateTime::parse_from_rfc3339. That's a regression: the previous version at least acceptedYYYY-MM-DD HH:MM:SS[.f]andYYYY-MM-DD. RFC3339 also doesn't accept IANA zone names, even though both the docs andDBSPTimestampTzLiteraladvertise IANA support. Net effect: literals accept'2020-01-01 10:10:10 America/New_York'butCAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE)will now fail at runtime.python/tests/runtime_aggtest/aggregate_tests2/test_timestamp_tbl.py— the new TZ test data is broken in three ways (syntax error + two malformed strings). See inline; this file won't even import.DBSPTimestampTzLiteral.getMicroshas a negative-modulo footgun (see inline) — pre-1970 TZ literals will throw fromLocalDateTime.ofEpochSecondbecausenanoOfSecondmust be non-negative.
Keeping verdict at REQUEST_CHANGES until the test coverage gap and the regressed cast/python paths are addressed.
|
|
||
| #[doc(hidden)] | ||
| pub fn cast_to_TimestampTz_s(value: SqlString) -> SqlResult<TimestampTz> { | ||
| if let Ok(v) = DateTime::parse_from_rfc3339(value.str()) { |
There was a problem hiding this comment.
Two regressions vs. the previous implementation, both visible to users:
- No IANA zone names.
DateTime::parse_from_rfc3339only accepts numeric offsets. ButDBSPTimestampTzLiteralanddocs/sql/datetime.mdboth promise IANA names (America/New_York, etc.). SoTIMESTAMP WITH TIME ZONE '2020-01-01 10:10:10 America/New_York'works as a literal butCAST('2020-01-01 10:10:10 America/New_York' AS TIMESTAMP WITH TIME ZONE)fails. Please usechrono_tz(or equivalent) to handle IANA names, then fall back toparse_from_rfc3339for numeric offsets. - Date-only and space-separated forms dropped. The previous version of this function accepted
YYYY-MM-DD HH:MM:SS[.f](no offset, treated as UTC) andYYYY-MM-DD. RFC3339 requires theTseparator and an offset, so both of those casts now fail. At minimum, please keep the%Y-%m-%d %H:%M:%S%.fand%Y-%m-%dfallbacks (UTC-assumed, matching the literal grammar's "assumed to be in UTC").
A test that exercises each accepted format end-to-end (numeric offset, IANA name, date-only, no-offset) belongs with this fix.
| JsonFlavor::DebeziumMySql => Self { | ||
| time_format: TimeFormat::Micros, | ||
| date_format: DateFormat::DaysSinceEpoch, | ||
| // Why is this missing fractions of second? |
There was a problem hiding this comment.
Still here from prior review. Either answer the question (does Debezium MySQL really emit second-precision timestamps in this layout? — if so, say so) or open an issue and reference it. Don't ship a TODO disguised as a question.
|
Most of the Python tests are validated because they have the same expected shape and results as tests for TIMESTAMP. |
|
@snkas this is a big PR, but I think most scrutiny is needed in the Rust code dealing with connectors and pipeline manager, which is probably < 100 lines of code. Most of the code is in tests; see the above comment why the Python tests are probably correct. |
dbdd9a6 to
8091d80
Compare
snkas
left a comment
There was a problem hiding this comment.
From pipeline-manager perspective, it's a new type added to the OpenAPI schema, that part looks good.
For the connectors: the input and output format of this new data type is still MicrosSinceEpoch, so an integer. When would one use TIMESTAMP instead of TIMESTAMP WITH TIME ZONE, and vice versa? It seems they are effectively the same in terms of the queries you can do on them (e.g., it's no possible to query "give me all the tuples with the timestamp in time zone +01:00" or so)?
|
Our implementation with TIMESTAMP WITH TIME ZONE aligns with most major databases. For one, some datasets already include timestamps with time zones, so this will enable us to load them. Then there are functions to convert timezones. CONVERT_TIMEZONE is an example we already have, but there is a function (not yet implemented) AT TIME ZONE, which takes a timestamp at a given time zone and converts it to UDT. Indeed, we do not carry the actual time zone; time zones are relatively complex objects, which do not make sense without a TIMESTAMP, and whose definition evolves every year, they are not simply numeric offsets. |
|
@snkas any actionable feedback? Can we merge this and refine? |
mythical-fred
left a comment
There was a problem hiding this comment.
Big blockers from my prior two reviews are addressed: IANA zones now parse, the error message is actionable with the expected format, getMaxValue() is back to 6 fractional digits, DBSPTimestampTzLiteral.getMicros uses Math.floorDiv/Math.floorMod (no more negative-modulo footgun), the broken test_timestamp_tbl.py row is now valid Python with a clean +05:00 offset, and test coverage has expanded substantially (issue4146a with ten in-line cases, testCaseTz, testCompareTz, plus the new test_interval_*, test_cmp_operators.py, test_date_time_fn.py and TZ rows in test_timestamp_tbl.py). Approving. Two tiny leftovers inline.
| JsonFlavor::DebeziumMySql => Self { | ||
| time_format: TimeFormat::Micros, | ||
| date_format: DateFormat::DaysSinceEpoch, | ||
| // Why is this missing fractions of second? |
There was a problem hiding this comment.
This rhetorical TODO comment is still here from my first review. Either answer it (does Debezium MySQL truly emit second-precision timestamps in this flavor? — confirm and replace with a single-sentence explanation citing the upstream behaviour) or delete it. Floating questions to nobody are noise. Not a blocker.
There was a problem hiding this comment.
I see, so this PR has to benefits:
(1) It allows us to ingest data from sources that provide a column that has timestamp+timezone as its type. Would this not already be possible by adding an extra mapping to Timestamp? Given we don't store the timezone in any case.
(2) It opens up the possibility for SQL timezone-related functions. But if we don't store the timezone in this new type, wouldn't CHAR just still be the output type for both AT TIMESTAMP and CONVERT_TIMESTAMP? Documentation: https://docs.feldera.com/sql/datetime/#convert_timezone
As the PR description is not "Add type TIMESTAMP WITH TIMEZONE as an alias for TIMESTAMP", which would be my expectation so far based on our back-and-forth. How does the type differ?
| // hardwire that for now. We can make it configurable in the future. | ||
| // FIXME: Also, the timezone should be `None`, but that gets compiled into `timezone_ntz` | ||
| // in the JSON schema, which Databricks doesn't fully support yet. | ||
| SqlType::Timestamp => DataType::Timestamp( |
There was a problem hiding this comment.
There is this special case for Timestamp, wouldn't this also apply to TimestampTz? @ryzhyk added the comment here.
There was a problem hiding this comment.
I hope that this will work - the special case was None not being supported
|
In other databases an important difference between TIMESTAMP and TIMESTAMP WITH TIME ZONE is that the latter is usually adjusted for I/O based on the local timezone of the database, whereas the former is absolute. Now, Feldera databases always run in the UTC time zone, which has the nice benefit of making things deterministic. So these two do look very similar. Think of the former as a |
|
Merging this is blocked on #6133 - without this it requires a platform upgrade |
mythical-fred
left a comment
There was a problem hiding this comment.
Re-reviewing only the two new commits since the 2026-05-16 approval (70d1371 + 0a3ea0a).
Calcite bump commit is trivial. The TIMESTAMP WITH TIME ZONE commit is large but mostly mechanical delegation from Timestamp -> TimestampTz, which is the right approach.
A handful of blocking issues I want fixed before this lands:
-
Operator-precedence bug in the new TIMESTAMP_TZ guard in ExpressionCompiler.makeBinaryExpression. The
||binds outside theopcode.isComparison()guard, so non-comparison ops between a non-TZ left and a TZ right (or the reverse) also hit this branch with the misleading 'comparison-only' message. testCompareTz happens not to exercise this path because Calcite already rejects the subtraction at type validation. -
New
Variant::TimestampTzis added to the variant enum, butgolden_row_variantis still a Tup25 covering exactly the pre-existing 25 variants. The whole point of bumping storage-test-compat is to catch rkyv-layout changes; without TimestampTz in the golden tuple, the new variant arm is not exercised by the storage compat test at all. -
The pre-existing aggregate test
illarg_extract_abrv1_legalwas renamed todate_part_abrv1_legal, and the new TZ counterpart isdate_part_abrv1_legal_tz.discover_testsonly registers classes whose name starts withillarg_, so both classes silently drop out of the runtime aggregate test suite. The DATE_PART abbreviation coverage (existing) and its TZ extension (new) are no longer running. -
cast_to_TimestampTz_u64 returns an error message that says
TIMESTAMPinstead ofTIMESTAMP WITH TIME ZONE. The macro-generated wrappers below it got the message right; this one is the odd hand-written outlier.
Also a few non-blocking nits inline (section banner, datetime.md prose, lingering magic number from the prior round, two behaviour-change footnotes worth calling out in the commit body).
Connector coverage (Avro/JSON/Parquet/Delta/datagen/adhoc) and the new visitors all look complete; adding the new Variant enum arm at the end of the enum (not the middle) is the right call for forward-compat.
c9d718d to
f6ae7fe
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
One blocker still open (storage compat golden row); the other three are fixed. See inline.
d603a63 to
b7e7cbc
Compare
9937e73 to
9289004
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Storage-compat blocker resolved. Tup26 + GoldenRowVariant properly covers all 26 variant arms including TimestampTz, new golden files added alongside existing ones (no removals), README updated. All four blockers from the 5/22 review are now fixed. Two small nits inline.
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Signed-off-by: Mihai Budiu <mbudiu@feldera.com>
Fixes #4146
This PR finally adds support for the last important missing SQL type
TIMESTAMP WITH TIME ZONEThere are multiple ways to implement this; I have chosen a version which provides similar capabilities with most databases: timestamps with time zones are stored at runtime as simply UTC timestamps. Thus they are comparable to each other. No time zone information is preserved at runtime, though.
I will mark this PR as a draft for two reasons:
The Rust serialization code will need to be audited for compatibility with other standard formats (e.g., Debezium).