feat(schema): represent, serialize and validate v3 column default values (1/4) #746
huan233usc wants to merge 1 commit into
445f7ae to 4aade00
std::shared_ptr<const Literal> initial_default_;
std::shared_ptr<const Literal> write_default_;
ReassignField constructs a new SchemaField via the 5-argument constructor which initializes initial_default_ and write_default_ to nullptr. When schema IDs are reassigned (e.g., copying a schema with fresh IDs via the Schema(get_id) path), all default values on fields are silently lost. We should copy all field properties including initialDefault and writeDefault.
Good catch, confirmed. Defaults are now constructor args, and ReassignField passes the source field's initial_default_ptr()/write_default_ptr() through, so they're shared with the reassigned field, not lost. Added ReassignIdsPreservesDefaultValues.
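The fix described in this thread can be sketched roughly as follows. This is a minimal illustration, not the actual iceberg-cpp code: `Literal` and `SchemaField` here are simplified stand-ins, and the two-argument `ReassignField` signature is an assumption made for the example. Only the accessor names `initial_default_ptr()` / `write_default_ptr()` come from the reply above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for the real Literal type (assumption: one int64 payload).
struct Literal {
  int64_t value;
};

// Simplified stand-in for SchemaField; the real class carries more members.
class SchemaField {
 public:
  // Defaults are constructor arguments, so no field can be built without
  // explicitly deciding what happens to them.
  SchemaField(int32_t id, std::string name,
              std::shared_ptr<const Literal> initial_default = nullptr,
              std::shared_ptr<const Literal> write_default = nullptr)
      : id_(id),
        name_(std::move(name)),
        initial_default_(std::move(initial_default)),
        write_default_(std::move(write_default)) {}

  int32_t id() const { return id_; }
  const std::string& name() const { return name_; }
  const std::shared_ptr<const Literal>& initial_default_ptr() const {
    return initial_default_;
  }
  const std::shared_ptr<const Literal>& write_default_ptr() const {
    return write_default_;
  }

 private:
  int32_t id_;
  std::string name_;
  std::shared_ptr<const Literal> initial_default_;
  std::shared_ptr<const Literal> write_default_;
};

// Hypothetical reassignment helper: fresh ID, same name, and the immutable
// default payloads are shared with the source field instead of being dropped.
SchemaField ReassignField(const SchemaField& source, int32_t new_id) {
  return SchemaField(new_id, source.name(), source.initial_default_ptr(),
                     source.write_default_ptr());
}
```

Because the payload is a `shared_ptr<const Literal>`, passing it through costs one reference-count increment and the reassigned field observes exactly the same literal as the source.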
if (initial_default_json.has_value()) {
  ICEBERG_ASSIGN_OR_RAISE(Literal literal,
                          LiteralFromJson(*initial_default_json, field.type().get()));
  field = field.WithInitialDefault(std::move(literal));
}
if (write_default_json.has_value()) {
  ICEBERG_ASSIGN_OR_RAISE(Literal literal,
                          LiteralFromJson(*write_default_json, field.type().get()));
  field = field.WithWriteDefault(std::move(literal));
}
The deserialization first constructs a bare SchemaField, then conditionally calls WithInitialDefault/WithWriteDefault, each of which copies the entire field (including the shared_ptr<Type>). This is an unnecessary intermediate copy.
Agreed — FieldFromJson now parses the defaults first and builds the field in one construction. Intermediate copy gone.
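For illustration, the "parse first, construct once" shape might look like the sketch below. It is not the real `FieldFromJson`: the JSON layer is replaced by `std::optional<int64_t>` inputs, and `ParseDefault`, `FieldFromParsedValues`, and the aggregate `SchemaField` are hypothetical names used only in this example.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Simplified stand-in for the real Literal type.
struct Literal {
  int64_t value;
};

// Simplified stand-in for SchemaField, kept as a plain aggregate.
struct SchemaField {
  int32_t id;
  std::string name;
  std::shared_ptr<const Literal> initial_default;
  std::shared_ptr<const Literal> write_default;
};

// Hypothetical helper: turns an optional raw value into a shared immutable
// literal, or nullptr when the key was absent.
std::shared_ptr<const Literal> ParseDefault(const std::optional<int64_t>& raw) {
  if (!raw.has_value()) {
    return nullptr;
  }
  return std::make_shared<const Literal>(Literal{*raw});
}

// Both defaults are resolved before the field exists, so the field is built
// exactly once and never copied to attach a default afterwards.
SchemaField FieldFromParsedValues(int32_t id, std::string name,
                                  const std::optional<int64_t>& initial_raw,
                                  const std::optional<int64_t>& write_raw) {
  std::shared_ptr<const Literal> initial_default = ParseDefault(initial_raw);
  std::shared_ptr<const Literal> write_default = ParseDefault(write_raw);
  return SchemaField{id, std::move(name), std::move(initial_default),
                     std::move(write_default)};
}
```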
SchemaField SchemaField::WithInitialDefault(Literal initial_default) const {
  SchemaField copy = *this;
  copy.initial_default_ = std::make_shared<const Literal>(std::move(initial_default));
  return copy;
}
I don't think we need to copy the whole SchemaField here. Can we just set the initial_default_ field and return *this?
The same applies to the other With methods that follow.
Agreed — moved defaults into the constructor and removed both With... methods, so construction no longer copies the field.
fe4bb8f to 1ee5b32
First of a multi-part split of column default value support (apache#730): the schema foundation the read and evolution paths build on. Purely additive; no read/write behavior change on its own.

- SchemaField carries `initial-default` / `write-default` (immutable std::shared_ptr<const Literal>) with copy-preserving WithInitialDefault / WithWriteDefault modifiers; getters return optional<reference_wrapper>.
- JSON serde reads/writes `initial-default` / `write-default` via the existing single-value serialization.
- Schema::Validate rejects default values below format v3 and validates they are non-null primitive literals matching the field type.
- Generic schema projection maps a column missing from a data file with an initial-default to FieldProjection::Kind::kDefault.

Read-path application (Parquet/Avro) and schema evolution follow in separate PRs. See apache#731 for the full end-to-end proof-of-concept.
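A rough sketch of the validation rule in the commit message, under stated assumptions: the types below are simplified stand-ins, `ValidateDefault` and `ValidateField` are hypothetical helpers rather than the real `Schema::Validate`, and the value 3 for `kMinFormatVersionDefaultValues` is inferred from "below format v3". Errors are returned as an optional message purely to keep the example self-contained.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Simplified type tags; kStruct stands in for any non-primitive type.
enum class TypeId { kInt, kLong, kString, kStruct };

// Simplified literal: its type tag plus a null flag.
struct Literal {
  TypeId type;
  bool is_null;
};

struct Field {
  std::string name;
  TypeId type;
  std::optional<Literal> initial_default;
  std::optional<Literal> write_default;
};

// Assumed value: defaults are a format v3 feature.
constexpr int kMinFormatVersionDefaultValues = 3;

bool IsPrimitive(TypeId type) { return type != TypeId::kStruct; }

// Returns an error message, or std::nullopt when the default is acceptable.
std::optional<std::string> ValidateDefault(const Field& field,
                                           const std::optional<Literal>& value,
                                           int format_version,
                                           const std::string& key) {
  if (!value.has_value()) {
    return std::nullopt;  // No default set: nothing to check at any version.
  }
  if (format_version < kMinFormatVersionDefaultValues) {
    return key + " on '" + field.name + "' requires format v3 or later";
  }
  if (value->is_null) {
    return key + " on '" + field.name + "' must be non-null";
  }
  if (!IsPrimitive(field.type) || value->type != field.type) {
    return key + " on '" + field.name + "' must be a primitive literal of the field type";
  }
  return std::nullopt;
}

// Checks both defaults and reports the first problem found.
std::optional<std::string> ValidateField(const Field& field, int format_version) {
  if (auto error = ValidateDefault(field, field.initial_default, format_version,
                                   "initial-default")) {
    return error;
  }
  return ValidateDefault(field, field.write_default, format_version, "write-default");
}
```

Note that a field without defaults passes at any format version, which is consistent with the change being purely additive.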
1ee5b32 to 34470af
Part 1 of a multi-part split of #730 (column default values, item 2 of #637). The full
end-to-end implementation is in #731, kept open as the proof-of-concept; this series
lands it in reviewable pieces.
This PR is the schema foundation — representing, serializing and validating v3
column default values. It is purely additive and changes no read or write behavior on
its own.
What's in this PR
- SchemaField carries initial-default / write-default, stored as std::shared_ptr<const Literal> (immutable payload shared across copies, like the adjacent type_; the C++ analog of Java's final Literal<?>). Getters return std::optional<std::reference_wrapper<const Literal>> (the Schema::FindFieldByName idiom). Copy-preserving WithInitialDefault / WithWriteDefault modifiers set them.
- JSON serde: reads/writes initial-default / write-default using the existing single-value serialization (all primitive types).
- Schema::Validate: rejects default values below format v3 (kMinFormatVersionDefaultValues), and validates that a default is a non-null primitive literal matching the field type.
- Generic schema projection: a column missing from a data file with an initial-default maps to FieldProjection::Kind::kDefault carrying the literal (the per-format readers consume this in the follow-up PRs).
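To make the getter idiom and the projection rule concrete, here is a minimal sketch. Everything in it is a simplified assumption: `SchemaField`, `FieldProjection`, and `ProjectField` are stand-ins for the real classes, the set of file field IDs replaces the actual data-file schema, and returning `std::nullopt` for a required column with no default is just one way to signal the unresolvable case.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <optional>
#include <unordered_set>
#include <utility>

// Simplified stand-in for the real Literal type.
struct Literal {
  int64_t value;
};

class SchemaField {
 public:
  SchemaField(int32_t id, bool optional,
              std::shared_ptr<const Literal> initial_default = nullptr)
      : id_(id), optional_(optional), initial_default_(std::move(initial_default)) {}

  int32_t id() const { return id_; }
  bool optional() const { return optional_; }

  // Getter idiom from the description: an empty optional means "no default",
  // otherwise a non-owning reference to the shared immutable literal.
  std::optional<std::reference_wrapper<const Literal>> initial_default() const {
    if (initial_default_ == nullptr) {
      return std::nullopt;
    }
    return std::cref(*initial_default_);
  }

 private:
  int32_t id_;
  bool optional_;
  std::shared_ptr<const Literal> initial_default_;
};

// Simplified projection result; the pointer is non-owning and only valid
// while the SchemaField it came from is alive.
enum class Kind { kProjected, kDefault, kNull };

struct FieldProjection {
  Kind kind;
  const Literal* default_value = nullptr;
};

// Decides how one expected field is read from a file that contains the
// given field IDs. Returns std::nullopt when the field cannot be resolved.
std::optional<FieldProjection> ProjectField(
    const SchemaField& expected, const std::unordered_set<int32_t>& file_field_ids) {
  if (file_field_ids.count(expected.id()) > 0) {
    return FieldProjection{Kind::kProjected};
  }
  if (auto initial = expected.initial_default()) {
    return FieldProjection{Kind::kDefault, &initial->get()};
  }
  if (expected.optional()) {
    return FieldProjection{Kind::kNull};
  }
  return std::nullopt;  // Required, missing from the file, and no default.
}
```

The default is checked before the optional/null fallback, so a column added later with an initial-default reads as that value in older files rather than as null.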
Follow-ups (stacked on this PR)
literal_util + parquet projection/materialization)
UpdateSchema add/update column defaults)
Testing
Added tests