Skip to content

[dbsp] Improving logging for treatment of files in storage at startup.#6289

Merged
blp merged 1 commit into
mainfrom
log-gc
May 21, 2026
Merged

[dbsp] Improving logging for treatment of files in storage at startup.#6289
blp merged 1 commit into
mainfrom
log-gc

Conversation

@blp

@blp blp commented May 20, 2026

Copy link
Copy Markdown
Member

This makes it clearer what's happening for GC of files at startup:

  • It logs at INFO level the names of the first several files and directories that it deletes.

  • It logs the names of unexpected files that it leaves in place.

  • It logs the counts of all kinds of files that it encounters.

In addition, it logs the names of batch files that should appear in storage but do not.

It adds tests for these behaviors.

Describe Manual Test Plan

Added unit tests.

Checklist

  • Unit tests added/updated
  • Integration tests added/updated
  • Documentation updated
  • Changelog updated

@blp blp requested a review from gz May 20, 2026 22:27
@blp blp self-assigned this May 20, 2026
@blp blp added bug Something isn't working DBSP core Related to the core DBSP library storage Persistence for internal state in DBSP operators rust Pull requests that update Rust code user-reported Reported by a user or customer labels May 20, 2026
Comment thread crates/dbsp/src/circuit/checkpointer.rs Outdated
Comment thread crates/dbsp/src/circuit/checkpointer.rs Outdated
Comment thread crates/dbsp/src/circuit/checkpointer.rs Outdated
Comment thread crates/dbsp/src/circuit/checkpointer.rs Outdated

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved already by two reviewers; I only have nits. The GC logic itself reads correctly, and tolerating duplicate list() reports by clearing indexes rather than removing the entry is a nice touch.

Comment thread crates/dbsp/src/circuit/checkpointer.rs
Comment thread crates/dbsp/src/circuit/checkpointer.rs
Comment thread crates/dbsp/src/circuit/checkpointer.rs Outdated
Comment thread crates/dbsp/src/circuit/checkpointer.rs

@mythical-fred mythical-fred left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup — the INFO summary plus first-N sampling of kept/removed paths is exactly what an on-call wants when storage GC misbehaves, and the missing-file ERROR with UUIDs makes silent corruption loud. Test coverage is excellent (cold start, healthy reopen, stray-file disposition, lost-batch detection with a fixed UUID for stable assertions).

A couple of non-blocking notes inline.

Comment thread crates/dbsp/src/circuit/checkpointer.rs
Comment thread crates/dbsp/src/circuit/checkpointer.rs
Comment thread crates/dbsp/src/circuit/checkpointer.rs
@gz

gz commented May 21, 2026

Copy link
Copy Markdown
Contributor

we should also make sure we surface any errors we encounter during GC when reading those files in the log (if we don't already)

@blp

blp commented May 21, 2026

Copy link
Copy Markdown
Member Author

we should also make sure we surface any errors we encounter during GC when reading those files in the log (if we don't already)

I'm having trouble understanding this suggestion.

This makes it clearer what's happening for GC of files at startup:

- It logs at INFO level the names of the first several files and
  directories that it deletes.

- It logs the names of unexpected files that it leaves in place.

- It logs the counts of all kinds of files that it encounters.

In addition, it logs the names of batch files that should appear in
storage but do not.

It adds tests for these behaviors.

Signed-off-by: Ben Pfaff <blp@feldera.com>
@blp

blp commented May 21, 2026

Copy link
Copy Markdown
Member Author

I pushed an update for everything in the resolved comments.

@blp blp added this pull request to the merge queue May 21, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 21, 2026
@gz

gz commented May 21, 2026

Copy link
Copy Markdown
Contributor

I'm having trouble understanding this suggestion.

sorry I just meant, "in case the gather_files_for_checkpoint_function" has an error, can we make sure we surface it in the log? it may already happen

@blp

blp commented May 21, 2026

Copy link
Copy Markdown
Member Author

I'm having trouble understanding this suggestion.

sorry I just meant, "in case the gather_files_for_checkpoint_function" has an error, can we make sure we surface it in the log? it may already happen

Ah.

We can do better there, I'll work on it.

@blp blp added this pull request to the merge queue May 21, 2026
Merged via the queue into main with commit 6922a49 May 21, 2026
1 check passed
@blp blp deleted the log-gc branch May 21, 2026 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working DBSP core Related to the core DBSP library rust Pull requests that update Rust code storage Persistence for internal state in DBSP operators user-reported Reported by a user or customer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants