feat(utils): add tool to recover unapplied WAL data as Parquet files #7161
javier wants to merge 8 commits into
A forensic recovery tool that exports unapplied WAL data to Parquet files. Designed for the case where a table is suspended or its base storage is corrupt: the operator wants to extract whatever survives in the WAL before scrubbing the table. Lives in the utils/ module alongside RebuildIndex/RecoverVarIndex.

Strict read-only - no CairoEngine boot, no writes, no exclusive locks. Safe to run against a live instance under snapshot semantics: anything visible in _txnlog at the time of the walk is in scope, anything appended after is ignored. The _txnlog header is read through a transient RO file handle, then TableTransactionLogV1/V2 is constructed without calling open() (which would mmap _txnlog read-write); the cursor opens its own RO handles internally for the record walk.

Discovery scans the db root, picks up every directory that has a txn_seq/_txnlog file, filters out hidden entries, and by default skips sys.* tables (override with --include-system). Tables whose WAL is fully purged are skipped unless --include-empty is set. Single-table mode via --table-dir is still available for targeted runs.

Per (walId, segmentId) the tool emits one Parquet file named <tableName>__wal<walId>__seg<segId>__seqTxn<lo>-<hi>.parquet under --output-dir. The committed row count comes from the segment's _event file (canonical source, works on both V1 and V2 sequencer logs). Segments referenced by _txnlog but missing on disk - the WAL-purge scenario - degrade gracefully with a one-line manifest entry and the walk continues.

For each column the PartitionDescriptor is built directly from WalReader's mmap'd memory: VARCHAR/BINARY/ARRAY and fixed-width types are zero-copy. Two special cases:

- The designated timestamp column on WAL disk is 16 bytes per row (timestamp followed by rowID for O3 handling), not 8. The encoder expects 8 bytes per row, so we allocate a compact buffer and stride-copy only the timestamp halves. The detection key is column == reader.getTimestampIndex(), which fires for both TIMESTAMP and TIMESTAMP_NS designated timestamps.
- SYMBOL columns are trickier: the WAL's local <column>.c/.o/.k files at wal<N>/ only hold the cleanSymbolCount snapshot from the base table. New symbols added during this WAL's commits live in _event and aren't on disk in the same format. WalReader.getSymbolValue already merges both sources into an in-memory map. The tool walks each column's .d file for the maximum referenced key, then for every key from 0..maxKey resolves the value via WalReader.getSymbolValue and synthesises a (values, offsets) buffer pair in native memory in the layout PartitionEncoder's native code expects.

Validated by round-tripping the live demo_trades_today table's wal4/seg0 (16 rows) and wal5/seg0 (10,015 rows) via parquet_scan.
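The stride-copy for the designated timestamp column can be illustrated with a small heap-buffer sketch. This is not the tool's code: `TimestampCompactor` is a hypothetical name, the real tool works on mmap'd native memory rather than `ByteBuffer`, and little-endian byte order is an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not the tool's actual code: compacts a WAL designated
// timestamp column (16 bytes per row) into an 8-bytes-per-row buffer.
final class TimestampCompactor {
    // Per the description above: timestamp (8 bytes) followed by rowID (8 bytes).
    static final int WAL_TS_STRIDE = 2 * Long.BYTES;

    static ByteBuffer compact(ByteBuffer walColumn, int rowCount) {
        // duplicate() resets the byte order to big-endian, so set it explicitly.
        // Little-endian is an assumption of this sketch.
        ByteBuffer src = walColumn.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer dst = ByteBuffer
                .allocate(Math.multiplyExact(rowCount, Long.BYTES))
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int row = 0; row < rowCount; row++) {
            // Absolute read of the timestamp half; the rowID half is skipped.
            dst.putLong(src.getLong(row * WAL_TS_STRIDE));
        }
        dst.flip();
        return dst;
    }

    public static void main(String[] args) {
        ByteBuffer wal = ByteBuffer.allocate(2 * WAL_TS_STRIDE).order(ByteOrder.LITTLE_ENDIAN);
        wal.putLong(1_000L).putLong(0L); // row 0: timestamp, rowID
        wal.putLong(2_000L).putLong(1L); // row 1: timestamp, rowID
        ByteBuffer compacted = compact(wal, 2);
        System.out.println(compacted.getLong(0) + ", " + compacted.getLong(8));
    }
}
```

The output buffer holds exactly `rowCount * 8` bytes, which is the layout the commit message says the encoder expects.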
The 15-line WalDirectoryPolicy stub doesn't warrant its own file. Move it to a private static nested class inside WalToParquet so the utility lives in a single source file, matching the spirit of the other single-file utilities in the cliutil package.
Builds on the initial single-file utility with the items the original forensic-recovery plan called for:

Manifest: per-table JSON manifest sits next to the Parquet files and records every segment the tool saw (written, skipped, or partial), every structural-change transaction in seqTxn order, the txnlog format/maxTxn header, and per-segment reasons. The operator now has a machine-readable trail of what was recovered vs lost.

Per-row shoulder columns: _wal_id, _segment_id, _segment_txn, _seq_txn, _commit_ts are emitted by default so downstream consumers can dedupe recovered rows against whatever survives in the base table. The _segment_txn is derived per-row by replaying the segment's _event; _seq_txn and _commit_ts are mapped from the txnlog records. Opt out with --no-shoulder.

Tier 2 (txnlog corrupt or missing): falls back to a filesystem scan of wal*/N/ directories. Cross-segment seqTxn ordering is unknown so files are written as <table>__wal<N>__seg<N>__tier2.parquet and the manifest reflects it.

Tier 3 (segment _event corrupt or missing): row count is derived from the timestamp .d file size (16 B/row in WAL), and a direct-mmap emission path bypasses WalReader (which itself requires _event). SYMBOL columns are resolved from the WAL's on-disk symbol files - the base table snapshot at WAL open time - so codes referencing new-in-WAL symbols (which lived only in _event) are clamped to NULL. The manifest notes how many rows were affected per column. Files are suffixed __tier3.parquet.

Tier 4 (segment _meta corrupt): peer-segment schema substitution. The tool scans other wal*/N/ directories for the same table, finds the first segment with a readable _meta, and uses that as a schema source. The schema may not match exactly if columns were added or dropped between segments; the manifest flags this. Composes with tier 3 so a segment missing both _meta and _event still recovers.

Partial column file loss: pre-checks every column's .d (and .i for var-size) before opening WalReader and records each missing file in the manifest's skippedColumns list. Even if WalReader subsequently fails to construct because of the loss, the manifest still tells the operator exactly which column files are gone.

Tests: 10 unit tests for the Args parser, the parquet filename builder across all tier combinations, and TableInfo.fromDirName parsing. Integration tests with programmatic WAL construction would need CairoEngine setup and are left as follow-up.

README: build instructions and a Running section with the Java 17+ module-access flags moved to the top of utils/README.md. A new section documents WalToParquet with options, output layout, tier suffix matrix and worked examples.
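The naming scheme across tiers can be sketched as a small builder. This is an illustration only: `ParquetNames` and its signature are hypothetical, and two details are assumptions rather than confirmed behaviour: that tier-3 files keep the seqTxn range, and that a combined tier-2/tier-3 name concatenates both suffixes.

```java
// Hypothetical sketch of the filename scheme described above; the real
// builder's name and signature are not shown in this PR text.
final class ParquetNames {
    static String build(String table, int walId, int segId,
                        long loSeqTxn, long hiSeqTxn,
                        boolean tier2, boolean tier3) {
        StringBuilder sb = new StringBuilder();
        sb.append(table).append("__wal").append(walId).append("__seg").append(segId);
        if (tier2) {
            // Tier 2: cross-segment seqTxn ordering is unknown, so no range.
            sb.append("__tier2");
        } else {
            sb.append("__seqTxn").append(loSeqTxn).append('-').append(hiSeqTxn);
        }
        if (tier3) {
            // Assumption: the tier-3 suffix is appended after everything else.
            sb.append("__tier3");
        }
        return sb.append(".parquet").toString();
    }

    public static void main(String[] args) {
        System.out.println(build("trades", 4, 0, 10, 12, false, false));
        System.out.println(build("trades", 4, 0, 10, 12, true, false));
        System.out.println(build("trades", 4, 0, 10, 12, false, true));
    }
}
```

Keeping the tier markers in the filename means an operator can tell how much to trust a file without opening the manifest first.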
Per-txn SYMBOL resolution: QuestDB's WAL writer can reassign the same numeric symbol code to different strings across transactions inside one segment (most visibly after a suspend/resume cycle, where the WAL writer's local symbol space resets and code 0 in batch 2 means a different string than code 0 in batch 1). The previous synthesis used WalReader.getSymbolValue, which exposes a single merged map and silently lets later diffs overwrite earlier ones. The new synthesizeSymbolBuffersPerTxn walks _event, snapshots each segmentTxn's effective dictionary for the column, resolves every row through its own transaction's snapshot, then deduplicates strings into a single global dictionary with newly assigned dense codes. The remapped .d buffer plus the new chars/offsets buffers are passed to PartitionEncoder. Validated against a live table where the bug was reproducible: batch 1 rows now correctly show BTC/ETH/XRP/SOL and batch 2 rows correctly show NEW-AAA/BBB/CCC, with seven distinct symbols across the recovered Parquet instead of four.

SQL log sidecar: a new <tableName>__sql_log.json captures every non-DATA transaction observed in any of the table's WAL segments (UPDATE, TRUNCATE, view definitions, mat-view invalidation). For each it records (walId, segmentId, segmentTxn, seqTxn, commitTimestamp, type, sql). These transactions do not materialise as rows in the Parquet output - their effect would require replay against a live table - so the sidecar is the only record of what work the WAL contained beyond raw inserts. Operators reconcile the sidecar against the data files via seqTxn.

Per-row shoulder column rename: _seq_txn was renamed to _txnSeq_ for naming consistency with the sidecar's seqTxn field. The column carries the global QuestDB sequencer transaction id that originally wrote each row, so an UPDATE statement recorded in the sidecar at seqTxn=N can be located inside the Parquet files via _txnSeq_=N.
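The per-txn SYMBOL resolution described in this commit can be sketched with plain collections. Everything here is hypothetical scaffolding (the class name, the use of `Map` snapshots, the `NULL_CODE` sentinel); the actual synthesizeSymbolBuffersPerTxn works on native buffers and its internals are not shown in this PR text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: resolve each row through its own transaction's
// dictionary snapshot, then assign dense codes in a single global dictionary.
final class SymbolRemapSketch {
    static final int NULL_CODE = -1; // assumed sentinel for NULL symbols

    static int[] remap(int[] rowTxn, int[] rowCode,
                       List<Map<Integer, String>> perTxnDict,
                       List<String> globalDictOut) {
        Map<String, Integer> globalCodes = new HashMap<>();
        int[] out = new int[rowCode.length];
        for (int i = 0; i < rowCode.length; i++) {
            // The same local code may mean different strings in different txns.
            String value = rowCode[i] < 0 ? null : perTxnDict.get(rowTxn[i]).get(rowCode[i]);
            if (value == null) {
                out[i] = NULL_CODE;
                continue;
            }
            Integer code = globalCodes.get(value);
            if (code == null) {
                code = globalDictOut.size(); // next dense code
                globalCodes.put(value, code);
                globalDictOut.add(value);
            }
            out[i] = code;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<Integer, String>> dicts = List.of(
                Map.of(0, "BTC", 1, "ETH"),  // segmentTxn 0
                Map.of(0, "NEW-AAA"));       // segmentTxn 1 reuses code 0
        List<String> global = new ArrayList<>();
        int[] codes = remap(new int[]{0, 0, 1, 0}, new int[]{0, 1, 0, 0}, dicts, global);
        System.out.println(java.util.Arrays.toString(codes) + " " + global);
    }
}
```

With a single merged map, the third row above would have been resolved to whichever string last claimed code 0; resolving per transaction keeps the two meanings apart.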
Integration tests under utils/src/test: a real WAL is built via CairoEngine and SqlExecutionContextImpl in a temp data root, ApplyWal2TableJob is deliberately not started so rows stay unapplied, WalToParquet.main is invoked against the temp root, and the recovered Parquet is read back through parquet_scan to assert content and schema.

Coverage:
- testHappyPathRecovery exercises ts/long/symbol/double with shoulder columns enabled.
- testAllDataTypesRoundtrip covers the full type surface: TIMESTAMP, TIMESTAMP_NS, SYMBOL, DECIMAL(20,4), DOUBLE, DOUBLE[][], UUID, FLOAT, LONG, BINARY, VARCHAR; asserts aggregates plus a row-level length check on BINARY.
- testSqlLogCapturesUpdates runs an INSERT followed by an UPDATE, both stay in WAL, and verifies __sql_log.json captures the UPDATE statement with type=SQL and the original SQL text.
- testNoShoulderFlag confirms --no-shoulder omits the provenance columns.

README: WalToParquet section moved to the last entry under utils/ (it is the longest section, ends with worked examples). Updated to document the new _txnSeq_ column name, the __sql_log.json sidecar, and the per-txn SYMBOL handling note. Build and Running instructions were already at the top of the file.
Recovery status column: a sixth shoulder column _recovery_status_ flags each row as "unapplied", "applied_unpurged" or "unknown" by comparing its _txnSeq_ against the table's appliedSeqTxn watermark read from _txn at the table root. The watermark is read with TxReader semantics - pick the active record via version parity, then read seqTxn plus absolute lag txn count from that base, matching TableWriter.getAppliedSeqTxn(). Manifest also surfaces appliedSeqTxn directly. Operators can filter "applied_unpurged" rows after recovery if they trust the committed partitions, or keep them all when the committed file is the thing being scrubbed.

Schemas sidecar: <tableName>__schemas.json captures the column list at each distinct structureVersion observed across the table's WAL segments. Each entry records name, type, writerIndex, and isDesignatedTimestamp. Versions are emitted in ascending numeric order so the file is deterministic across runs regardless of File.listFiles() order. The original ALTER SQL statements that produced each transition are not extracted - that would need CairoEngine or a full reimplementation of AlterOperation's binary format - so this sidecar is the authoritative schema-evolution record.

ManifestSegment.structureVersion: each segment in the manifest now carries the structureVersion it was written under. Set during the existing recordMissingColumnFiles pass for happy-path segments, and again right after tier-3 selects either own or peer-fallback _meta. This closes the operator workflow chain: Parquet file -> manifest segment entry -> structureVersion -> schemas.json[version] -> column list.

Native memory accounting: happy-path symbol synthesis now tracks each offsets buffer's size so Unsafe.free can deduct the real amount from MemoryTag.NATIVE_DEFAULT counters. Tier-3 path was leaking all but the last SYMBOL column's clamped-codes buffer because tracking used scalar fields rather than a LongList. Replaced with parallel LongLists so every allocation is freed.

JSON readability: disable Gson's HTML-escaping so sidecar files print literal '<', '>', '=' and quote characters instead of \u003c-style escapes. Operators read sql_log.json by hand.

Tests: three new integration tests bring the count to 17. testRecoveryStatusColumn verifies unapplied rows get tagged when ApplyWal2TableJob never runs. testSchemasSidecar covers the basic schemas file emission. testSchemaEvolutionMapping inserts rows at structureVersion 0, runs ALTER TABLE ADD COLUMN, inserts more rows at version 1, then asserts both versions appear in __schemas.json with correct column counts AND each written manifest segment references the correct structureVersion. testHappyPathRecovery's column-count assertion bumped from 10 to 11 to account for the new _recovery_status_ shoulder column.

README: documents _recovery_status_, the appliedSeqTxn watermark, the __schemas.json sidecar with deterministic version ordering, and the operator workflow for mapping a Parquet file back to its schema via manifest -> structureVersion -> schemas.
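The _recovery_status_ classification can be sketched as a single comparison against the watermark. The class name, the negative sentinel for "not known", and the exact comparison direction (a row counts as applied when its _txnSeq_ is at or below appliedSeqTxn) are assumptions of this sketch, not confirmed details of the tool.

```java
// Hypothetical sketch of the per-row status decision described above.
final class RecoveryStatusSketch {
    static String classify(long txnSeq, long appliedSeqTxn) {
        // Assumption: a negative value means the seqTxn or watermark is unknown,
        // e.g. tier-2 mode or an unreadable _txn file.
        if (txnSeq < 0 || appliedSeqTxn < 0) {
            return "unknown";
        }
        // Assumption: rows at or below the watermark were already applied to
        // the table but their WAL segment has not been purged yet.
        return txnSeq > appliedSeqTxn ? "unapplied" : "applied_unpurged";
    }

    public static void main(String[] args) {
        long watermark = 5;
        for (long seqTxn : new long[]{4, 5, 6, -1}) {
            System.out.println(seqTxn + " -> " + classify(seqTxn, watermark));
        }
    }
}
```

An operator who trusts the committed partitions would then drop the "applied_unpurged" rows before loading, as the commit message suggests.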
Tier-3 fallback used to derive a segment's rowCount from the timestamp column's .d file size on disk. WAL column files are mmap-preallocated, so the file length reports capacity not committed appends - in the unreadable-event test setup that meant a 1-row table was being "recovered" as 65,536 rows of fabricated data. The new behaviour:
- Each txnlog cursor record's getTxnRowCount() is captured into the per-segment txnRowCounts list during enumerateSegments. V1 sequencer format throws UnsupportedOperationException for that call, so V1 slots stay at -1.
- The tier-3 fallback now sums those per-txn counts via sumTxnRowCounts(). If any referenced segmentTxn lacks a trustworthy row count (i.e., V1 format, or tier-2 mode where the txnlog itself is unreadable) the helper returns -1 and the segment is marked unrecoverable with a new manifest status skipped_event_unreadable_no_row_count. No Parquet is written for that segment - silent fabrication of capacity-byte rows is worse than refusing to emit.
- A zero-row sum gets its own status skipped_event_unreadable_zero_rows so operators can tell "no data to recover" from "couldn't determine row count".

Event-walk hardening (per-event try/catch). collectNonDataEvents now uses three nested catches:
- _event file unopenable (er.of() throws): one UNKNOWN_EVENT_UNREADABLE entry scoped to the segment is emitted and we move on. The WalEventReader is also closed via try-with-resources whether construction or of() fails (previously er.of() could leak the reader).
- hasNext()/getType()/getTxn() throws: WalEventCursor reads the full record - including dispatch on the type byte - inside hasNext(), so an unknown OSS-invisible type byte surfaces here as an exception with no segmentTxn known. One UNKNOWN_EVENT_UNREADABLE entry is recorded with the (walId, segmentId) and error string, then we stop - cursor state is undefined past this point.
- SQL body parse failure with valid header: the entry preserves type/walId/segmentId/segmentTxn/seqTxn/commitTimestamp and carries the parse error in a new "error" field; iteration continues.

New "error" field on ManifestSqlStatement keeps error context separate from the "sql" text so downstream consumers can tell "we have SQL" from "we couldn't read it".

Schemas mapping per segment. ManifestSegment.structureVersion was populated only in the early recordMissingColumnFiles() pass. When that pass failed but tier-3 later recovered the metadata via peer-segment fallback, the manifest still carried -1 for the segment's structureVersion, making __schemas.json unable to map back to the file. Now writeSegmentToParquetTier3 sets entry.structureVersion = meta.getMetadataVersion() right after the metadata it will actually use is selected, covering the peer-fallback path.

Schemas sorted by numeric structureVersion. The byVersion map was populated by File.listFiles() which has no guaranteed order, so the schemas file's emission order was filesystem-dependent. Now keys are collected, sorted with Collections.sort, and re-inserted into the output LinkedHashMap so the file is deterministic across runs.

Gson HTML-escaping disabled across all three sidecars. SQL log previously printed \u003e\u003d for ">=" and \u0027 for "'", making the file barely readable by hand. New output writes the literal characters.

Tests bring the count to 18 (10 unit + 8 integration). The unreadable-event test now asserts no Parquet is emitted, the manifest's segment carries the unrecoverable status, rowsWritten=0 and outputFile=null - the silent 65k fabrication is gone.

README updated to: clarify the tier-3 row in the suffix matrix (V2-only recovery via txnlog row counts, V1 marked unrecoverable), spell out the UNKNOWN_EVENT_UNREADABLE failure shapes accurately, describe the schemas-version-to-segment join, and list _recovery_status_ as the sixth shoulder column.
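The row-count rule from the tier-3 fix in this commit can be sketched as follows. Only the helper name sumTxnRowCounts and the two skipped_* status strings come from the commit message; the array-based signature, the treatment of an empty list, and the "written" label are assumptions made for illustration.

```java
// Hypothetical sketch: refuse to report a row count unless every referenced
// transaction has a trustworthy per-txn count.
final class RowCountSketch {
    static long sumTxnRowCounts(long[] txnRowCounts) {
        // Assumption: no counts at all (e.g. unreadable txnlog) is "unknown".
        if (txnRowCounts == null || txnRowCounts.length == 0) {
            return -1;
        }
        long total = 0;
        for (long count : txnRowCounts) {
            if (count < 0) {
                return -1; // V1 slots stay at -1, so the whole sum is untrusted
            }
            total = Math.addExact(total, count);
        }
        return total;
    }

    static String statusFor(long sum) {
        if (sum < 0) {
            return "skipped_event_unreadable_no_row_count";
        }
        if (sum == 0) {
            return "skipped_event_unreadable_zero_rows";
        }
        return "written"; // hypothetical label for the recoverable case
    }

    public static void main(String[] args) {
        System.out.println(statusFor(sumTxnRowCounts(new long[]{3, 7})));
        System.out.println(statusFor(sumTxnRowCounts(new long[]{3, -1})));
        System.out.println(statusFor(sumTxnRowCounts(new long[]{0, 0})));
    }
}
```

Returning -1 as soon as one slot is untrusted is what prevents the capacity-sized fabrication: a partial sum is never mistaken for a committed row count.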
Closes the partial-malloc-leak windows around the new trackNativeAllocation helper by wrapping the three-call registration sequences with flag-tracked orphan cleanup so a LongList resize-OOM mid-sequence cannot orphan the tail buffers.

Tightens tier-3 fixes from prior rounds:
- int clampedSize widened to long to avoid overflow at rowCount near Integer.MAX_VALUE
- SymbolMapReaderImpl is registered into the cleanup pool before sr.of() runs so a corrupt-symbol-files throw cannot leak the reader
- columnMemories.add for mem and aux moved inside the inner try so a resize-OOM during registration cannot leak the mmap
- per-row recovery status reconstructed from txnlog row counts (previously a single segment-wide constant) so multi-txn segments get correct per-row applied/unapplied attribution
- reconciliation note in the manifest if the txnlog row counts cover fewer rows than rowCount
- Path try/finally around Vm.getCMRInstance calls
- maxTxn watermark refreshed from observed seqTxn after the walk
- discoverTables filter loosened so user tables starting with underscore are not silently dropped
- recordMissingColumnFiles deferred to the WalReader-failure path, with a cheap recordSegmentStructureVersion called on the happy path
- RECOVERY_STATUS dictionary pinned to its index constants by a runtime check that fires even without -ea

Adds defensive infrastructure:
- TestUtils.assertMemoryLeak harness with named 8 KB slack for pool warmup; 8 of 14 main integration tests wrapped
- trackNativeAllocation helper that frees its pair's buffer on a resize-OOM and rolls back the addr list

Adds coverage for paths previously uncovered:
- testV2Tier3PerRowRecoveryStatusAcrossTxns regression for the multi-txn per-row recovery status
- testV2Tier3RecoversFromCorruptEvent for the V2 tier-3 happy path
- testTier2CorruptTxnlog in its own class to isolate engine.clear() blast radius
- testTier4PeerMetaFallback for the corrupt-_meta peer fallback via V2 tier-3
- testPartialColumnFileLoss in its own class for missing .d files
- testNullValuePreservation across LONG, DOUBLE, SYMBOL, VARCHAR
- testDropColumnSchemaEvolution and testRenameColumnSchemaEvolution verifying recovered Parquet content per structureVersion
- testMultiSegmentRecovery exercising ALTER-driven segment rolls
- testUnderscorePrefixedUserTable regression for discoverTables

Fixes a pre-existing UUID/LONG128 comparison bug in TestUtils.assertColumnValues (lr.getLong128Hi swapped to getLong128Lo for the low-half comparison).

Adjusts utils/pom.xml to keep Java compiler target at 17 matching the rest of the project.

README documents the manifest's tier-2 sqlStatements/structuralChanges gap and the Gson rationale.

All 30 tests pass.
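The clampedSize widening in this commit guards against a standard Java pitfall: multiplying two int values wraps silently before the result is assigned. The sketch below assumes 4 bytes per clamped symbol code; the class and method names are hypothetical.

```java
// Hypothetical sketch of why a buffer size computed from an int rowCount
// should be widened to long before multiplying.
final class ClampedSizeSketch {
    // Assumption: one 32-bit symbol code per row.
    static int sizeAsInt(int rowCount) {
        return rowCount * Integer.BYTES; // wraps around for large rowCount
    }

    static long sizeAsLong(int rowCount) {
        return (long) rowCount * Integer.BYTES; // widened first, no wrap
    }

    public static void main(String[] args) {
        int rowCount = Integer.MAX_VALUE;
        System.out.println("int  size: " + sizeAsInt(rowCount));
        System.out.println("long size: " + sizeAsLong(rowCount));
    }
}
```

A wrapped size would be negative or far too small, so the allocation would either fail outright or be undersized for the rows written into it.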
Level-3 review result (pass 5)

Summary

Verdict: approve. No Critical. One real Moderate (M1: class-init ordering fragility, which only triggers if a future contributor reorders the static fields alphabetically). Seven Minor items, all cosmetic or coverage gaps.

Findings tally: 0 Critical, 1 Moderate, 7 Minor verified. ~6 raw agent claims downgraded after source verification (Agent 10's In-diff vs out-of-diff: all 8 findings are in-diff. Zero out-of-diff regressions. Agent 9 reconfirmed zero production callers and clean upstream contracts.

Comparative trajectory across five passes:

The PR has reached the asymptote of what level-3 will find. Remaining items are cosmetic or coverage gaps that exist in many comparable PRs across QuestDB. Ready for human approval.
Summary

Adds io.questdb.cliutil.WalToParquet, a strict read-only forensic utility that walks every WAL table under a QuestDB data root and exports each un-purged segment as a Parquet file. Designed for the case where a table is suspended or its committed partitions are corrupt and the operator wants to extract whatever still lives in the WAL before scrubbing.

No CairoEngine is booted, no writes touch the source tree, no exclusive locks. Safe to run against a live, running QuestDB instance under snapshot semantics: anything visible in _txnlog at the start of the walk is in scope, anything appended after is ignored.

Output
Each WAL segment becomes one Parquet file alongside three JSON sidecars per table:

- <tableName>__manifest.json -- every segment the tool considered (written, skipped, partial), every structural-change transaction in seqTxn order, the txnlog format version, maxTxn and appliedSeqTxn watermarks, per-segment reasons, and each written segment's structureVersion.
- <tableName>__sql_log.json -- every non-DATA transaction (UPDATE, ALTER TABLE, TRUNCATE, view/mat-view events) with (walId, segmentId, segmentTxn, seqTxn, commitTimestamp, type, sql, error). Cross-reference seqTxn against the _txnSeq_ shoulder column in the Parquet files.
- <tableName>__schemas.json -- column list at each distinct structureVersion observed across the table's WAL segments, sorted ascending. Operators recreate the table at the right version via Parquet -> manifest segment -> structureVersion -> schemas[version] before loading rows.

Per-row provenance (shoulder columns)
Default-on (--no-shoulder to opt out): _wal_id, _segment_id, _segment_txn, _txnSeq_ (= seqTxn), _commit_ts, _recovery_status_.

_recovery_status_ is unapplied / applied_unpurged / unknown, derived from each row's _txnSeq_ against the table's appliedSeqTxn watermark read directly from _txn at the table root (TxReader semantics: version-parity record selection + seqTxn + abs(lagTxnCount)).

Tiered fallback
- No tier suffix: _txnlog, _event, _meta, all column files intact.
- __tier3.parquet: _event missing/corrupt. Row count comes from the txnlog's per-txn row counts (V2 sequencer format only). New-in-WAL symbols are clamped to NULL (only the base symbol-table snapshot recovers).
- __tier2.parquet: _txnlog missing/corrupt. Filesystem scan of wal*/N/; cross-segment seqTxn ordering is unknown; manifest.structuralChanges and manifest.sqlStatements are empty in tier-2 because both lists come from the txnlog walk.
- __tier2__tier3.parquet

If _event is gone AND the txnlog is V1 (QuestDB default), the segment is marked skipped_event_unreadable_no_row_count and no Parquet is emitted: the tool refuses to fabricate row counts from mmap-preallocated column file lengths.

Tier 4 (per-segment _meta corrupt) borrows schema from a peer segment; the manifest flags substitution.

Per-txn SYMBOL resolution
QuestDB's WAL writer can reassign the same numeric symbol code to different strings across transactions inside one segment (especially after suspend/resume). The tool walks _event to build per-txn snapshots of each SYMBOL column's dictionary, resolves every row through its own transaction's snapshot, then deduplicates strings into a single global dictionary with newly assigned dense codes for the Parquet output. Rows are correctly attributed regardless of code reuse.

Event-walk failure handling
Per-event, not per-segment. Three failure shapes:

- SQL body parse failure with a valid header: the entry keeps its header fields and carries the parse error in the error string; iteration continues.
- hasNext()/getType()/getTxn() failure (typically an unknown OSS-invisible type byte, or framing corruption): one UNKNOWN_EVENT_UNREADABLE entry recorded with (walId, segmentId) and segmentTxn=-1; event collection stops (cursor state undefined past this point).
- _event file unopenable: one UNKNOWN_EVENT_UNREADABLE scoped to the segment.

Known limitations
- Structural changes appear in manifest.structuralChanges as {seqTxn, commitTimestamp} markers only. Extracting the original ALTER SQL text would require booting CairoEngine or reimplementing AlterOperation's binary serialisation; the resulting schemas in __schemas.json are the authoritative source instead.
- Where statements such as GRANT/CREATE USER are recorded as ordinary WalTxnType.SQL events with the standard OSS framing, the SQL text is captured verbatim. Genuinely new event type bytes throw inside WalEventCursor and become UNKNOWN_EVENT_UNREADABLE (no per-txn detail).

Test plan
- mvn -pl utils -am test -Dtest='WalToParquet*' -Dsurefire.failIfNoSpecifiedTests=false -- 30 tests, all pass (10 unit + 14 integration + 3 V2 integration + 1 tier-2 integration + 1 tier-4 integration + 1 partial-file integration).
- Integration tests build a real WAL via CairoEngine in a temp data root, deliberately never run ApplyWal2TableJob so data stays unapplied (except where applied-watermark behavior is being tested), invoke WalToParquet.main(), then read the recovered Parquet back through parquet_scan.
- testHappyPathRecovery -- tier-1, shoulder columns enabled.
- testAllDataTypesRoundtrip -- full type matrix: TIMESTAMP, TIMESTAMP_NS, SYMBOL, DECIMAL(20,4), DOUBLE, DOUBLE[][], UUID, FLOAT, LONG, BINARY, VARCHAR.
- testSqlLogCapturesUpdates -- UPDATE statement captured with full SQL text.
- testRecoveryStatusColumn -- never-applied rows tagged unapplied.
- testAppliedSeqTxnReadFromTxn -- after ApplyWal2TableJob.drain(0), the _txn watermark is read correctly and surviving rows tag as applied_unpurged.
- testNullValuePreservation -- explicit NULL values in LONG, DOUBLE, SYMBOL, VARCHAR survive the round-trip alongside non-NULL values.
- testSchemasSidecar -- basic schemas emission.
- testSchemaEvolutionMapping -- ADD COLUMN bumps structureVersion 0 to 1; both versions in schemas.json; manifest segments reference correct versions.
- testDropColumnSchemaEvolution -- DROP COLUMN counterpart; verifies the dropped column is absent from both the post-drop schema AND the post-drop recovered Parquet.
- testRenameColumnSchemaEvolution -- RENAME COLUMN counterpart; verifies the pre- and post-rename names land in their respective __schemas.json entries, and that manifest segments reference the correct version each.
- testMultiSegmentRecovery -- two ALTERs roll the WAL into three segments; verifies three Parquet files are emitted with correct per-segment row counts and the total matches the inserted row count via parquet_scan.
- testUnreadableEventRecordedAsPlaceholder -- truncated _event produces UNKNOWN_EVENT_UNREADABLE entry, manifest segment marked unrecoverable, no Parquet emitted (asserts the now-fixed bogus-row fabrication).
- testUnderscorePrefixedUserTable -- discoverTables surfaces user tables whose directory name starts with _ (QuestDB's isValidTableName permits the prefix).
- testNoShoulderFlag -- --no-shoulder omits the shoulder columns.
- V2 integration tests: testV2TxnlogFormatVersionRecorded (V2 selection sanity), testV2Tier3RecoversFromCorruptEvent (truncate _event, assert __tier3.parquet carries the correct row count from txnlog), and testV2Tier3PerRowRecoveryStatusAcrossTxns (apply watermark falls inside a multi-txn segment; per-row _recovery_status_ must reflect each row's own seqTxn, not a segment-wide constant; asserts specific row counts per (seqTxn, status) bucket).
- testTier2CorruptTxnlog -- truncate _txnlog, assert filesystem-scan fallback produces __tier2.parquet with correct row count and the manifest's txnLog.status is error.
- testTier4PeerMetaFallback -- corrupt both _event and _meta for one segment, verify the peer-meta fallback inside the tier-3 path borrows the schema from an intact peer segment and emits Parquet with a tier-4 note in skippedColumns.
- testPartialColumnFileLoss -- delete one column's .d file; assert manifest surfaces the specific missing column AND the segment is marked skipped_reader_open_failed AND no Parquet is emitted.
- Some tests (testHappyPathRecovery, testPartialColumnFileLoss, testV2Tier3PerRowRecoveryStatusAcrossTxns) wrap their bodies in a local assertMemoryLeak helper that tracks MemoryTag.NATIVE_DEFAULT against an 8 KB engine-warmup slack. The harness pins the partial-malloc-leak fixes in synthesizeSymbolBuffersPerTxn and makeRecoveryStatusColumn against future regression.
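The operator workflow from the Output section (Parquet file -> manifest segment -> structureVersion -> schemas[version]) amounts to a two-step lookup. The sketch below models the two sidecars as in-memory maps; the class name and map shapes are hypothetical, and a real script would first parse __manifest.json and __schemas.json with a JSON library.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the manifest -> structureVersion -> schemas join.
final class SchemaLookupSketch {
    static List<String> columnsFor(String parquetFile,
                                   Map<String, Long> structureVersionByFile,
                                   Map<Long, List<String>> columnsByVersion) {
        Long version = structureVersionByFile.get(parquetFile);
        // Assumption: a missing entry or a negative version means "unknown".
        if (version == null || version < 0) {
            return Collections.emptyList();
        }
        List<String> columns = columnsByVersion.get(version);
        return columns == null ? Collections.emptyList() : columns;
    }

    public static void main(String[] args) {
        Map<String, Long> manifest = Map.of(
                "trades__wal1__seg0__seqTxn1-2.parquet", 0L,
                "trades__wal1__seg1__seqTxn3-4.parquet", 1L);
        Map<Long, List<String>> schemas = Map.of(
                0L, List.of("ts", "price"),
                1L, List.of("ts", "price", "venue"));
        System.out.println(columnsFor("trades__wal1__seg1__seqTxn3-4.parquet", manifest, schemas));
    }
}
```

The file names and column names in `main` are made up for the example.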