Skip to content

br: Cherry pick from feature restore from replication storage#68601

Open
Leavrth wants to merge 4 commits into
pingcap:release-8.5from
Leavrth:cherry-pick-from-feature-restore-from-replication-storage
Open

br: Cherry pick from feature restore from replication storage#68601
Leavrth wants to merge 4 commits into
pingcap:release-8.5from
Leavrth:cherry-pick-from-feature-restore-from-replication-storage

Conversation

@Leavrth
Copy link
Copy Markdown
Contributor

@Leavrth Leavrth commented May 22, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • CLI flags: --retain-latest-mvcc-version, --restore-phase, --replication-status-sub-prefix.
    • Runtime support & probe for batch-download of newest MVCC-per-key SSTs.
    • Tagged backup-meta filenames and helpers to detect DDL metas and persist resume-state.
  • Refactor

    • Split compacted-SST restore into collect + restore phases; checkpoint-aware progress and phase-based lifecycle.
    • Improved metadata read/caching and concurrent meta-KV file reading.
  • Tests

    • Added SST collection validation and concurrency-focused meta/KV read tests.

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
@ti-chi-bot ti-chi-bot Bot added do-not-merge/cherry-pick-not-approved release-note-none Denotes a PR that doesn't merit a release note. labels May 22, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 22, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must be LGTMed and approved by the reviewers firstly.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 22, 2026
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 3pointer for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 22, 2026

Hi @Leavrth. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 33283bb2-e7c9-41be-b2a7-bd70815d2185

📥 Commits

Reviewing files that changed from the base of the PR and between d455458 and 2f6522f.

📒 Files selected for processing (5)
  • br/pkg/restore/log_client/client.go
  • br/pkg/stream/stream_mgr.go
  • br/pkg/stream/stream_misc_test.go
  • br/pkg/utils/iter/combinator_test.go
  • br/pkg/utils/iter/combinator_types.go
💤 Files with no reviewable changes (1)
  • br/pkg/restore/log_client/client.go

📝 Walkthrough

Walkthrough

Adds BatchDownloadLatestMVCC and a support probe; implements retain-latest-MVCC SST download and routing in SnapFileImporter; refactors SST restore into collect+restore phases; adds two‑phase stream restore and replication resume-state; introduces tagged backupmeta filename flags and makes metadata reads concurrency-safe; updates tests and build pins.

Changes

Latest MVCC SST Download and Restoration

Layer / File(s) Summary
Import client latest-MVCC RPC methods
br/pkg/restore/internal/import_client/import_client.go
Interface and implementation of BatchDownloadLatestMVCC and CheckBatchDownloadLatestMVCCSupport RPC methods for downloading latest-version SSTs.
SnapFileImporter latest-MVCC infrastructure
br/pkg/restore/snap_client/import.go
Adds retainLatestMVCCVersion flag to options and importer; implements capability check and batchDownloadNewestVersionSST with rewrite-rule validation, per-store throttling, response truncation, and peer de-duplication.
SST download routing for latest-MVCC mode
br/pkg/restore/snap_client/import.go
Routes TiDBCompacted downloads through newest-version batch download when flag enabled; adjusts region workerpool sizing and sets ApiVersion after truncation.
LogClient SST collection refactoring
br/pkg/restore/log_client/client.go, br/pkg/restore/log_client/client_test.go
Adds CollectSSTFileSets to pre-collect compacted SST groups and KV counts; RestoreSSTFileSets restores from pre-collected sets; InitClients accepts retainLatestMVCCVersion.
Meta-KV filtering and concurrent reads
br/pkg/restore/log_client/*
Adds shouldReadMetaKVFile/countReadableMetaKVFiles, separates files by CF, and reads meta-KV entries concurrently with a bounded worker pool.
SnapClient option call-sites
br/pkg/restore/snap_client/client.go, br/pkg/restore/snap_client/export_test.go
Update NewSnapFileImporterOptions invocations to pass new boolean argument for the retain flag.

Two-Phase Stream Restore with Checkpoint Preservation

Layer / File(s) Summary
RestoreConfig flags and fields for two-phase restore
br/pkg/task/restore.go
Adds CLI flags and config fields: RetainLatestMVCCVersion, ReplicationStatusSubPrefix, RestorePhase, FromReplicationStorage, RestoreInPhase.
Flag parsing and validation
br/pkg/task/restore.go
Parses new flags, validates allowed phase values, and requires checkpoint mode when phase-based restore is active.
RunRestore phase 1 early exit & abort wiring
br/pkg/task/restore.go
Pauses the restore task and returns early for phase 1 to preserve checkpoints; RunRestoreAbort uses replication-storage parameters.
Stream restore flow changes
br/pkg/task/stream.go
Adds phase control (early exit for phase 1), checkpoint-aware SST progress accounting, switches to CollectSSTFileSets + RestoreSSTFileSets, and conditionally disables user DML restore when retain-latest-MVCC is enabled after probing.
Replication status persistence
br/pkg/task/stream.go
Adds PersistentState, status filename helpers, and a reader that extracts LastCheckpoint from replication storage.
Helpers: startTS and log-file probe
br/pkg/task/stream.go
Tightens restore start-TS selection and adds hasAnyLogFiles to detect remaining log files.

Metadata filename flags and cache concurrency

Layer / File(s) Summary
Tagged backupmeta parser
br/pkg/stream/backupmetas/parser.go
Adds NameFlagsTag, ParsedName.Flags/HasFlags, CalculateShiftTS, HasDDLFiles, and TryParseTaggedBackupMetaFileName.
Shift-TS filename-based update
br/pkg/stream/stream_metas.go
Switches UpdateShiftTS to try filename parsing first and adds wrapper/fallback to metadata-based UpdateShiftTSFromMetadata; metadata keyed by filename.
Metadata cache & ReadFile API
br/pkg/stream/stream_mgr.go
Make ContentRef/MetadataHelper concurrency-safe (per-entry mutex + cache RW lock), add rawLength parameter to ReadFile/decode, and add skipCondition to FastUnmarshalMetaData.
Metadata read tests
br/pkg/stream/*_test.go, br/pkg/restore/log_client/*_test.go
Add gated read wrappers, concurrency assertions, and tests for DDL-only metadata filtering and readable-file counting.

Utility refactors and build updates

Layer / File(s) Summary
iter chunkMapping refactor
br/pkg/utils/iter/combinator_types.go, .../combinator_test.go, BUILD.bazel
Replace errgroup-based buffer fill with producer/consumer mapping, update tests, and remove errgroup from build deps.
Dependency and Bazel patches
go.mod, DEPS.bzl, br/pkg/task/BUILD.bazel, br/pkg/restore/log_client/BUILD.bazel
Update kvproto pseudo-version, update DEPS pin, increment test shard_count, and add //br/pkg/stream/backupmetas to log_client deps; adjust NewSnapFileImporterOptions call-sites.
Task flags test
br/pkg/task/config_test.go
Add TestRestorePhaseRequiresCheckpoint verifying checkpoint requirement when restore-phase is enabled.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • pingcap/tidb#66655: Overlapping changes to tagged backupmeta filename parsing and related stream logic.
  • pingcap/tidb#67730: Related work enabling retain-latest-MVCC mode and threading it through PITR/restore clients.
  • pingcap/tidb#66990: Prior extensions to backupmeta tagged parsing and consumers.

Suggested labels

type/cherry-pick-for-release-8.5, needs-cherry-pick-release-8.5, ok-to-test

Suggested reviewers

  • YuJuncen
  • RidRisR
  • bb7133

"I tunneled through branches to fetch the newest leaf—
The SSTs I carried, trimmed to just the briefest grief.
Phase one I curtsied, left checkpoints snug and warm,
Metadata flags I nibbled through every file and form.
Hop, hop, restore, the rabbit hums a tiny charm."

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is incomplete, with blank fields for Issue Number, Problem Summary, and implementation details; only the test checklist is filled (unit/manual tests marked). Complete the Issue Number reference, provide Problem Summary and implementation details explaining the changes, and update other relevant sections in the template.
Docstring Coverage ⚠️ Warning Docstring coverage is 30.61% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title adequately references the cherry-pick source (feature restore from replication storage) but lacks specificity about what the PR implements or changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
br/pkg/restore/log_client/client.go (1)

337-338: ⚡ Quick win

Add contextual wrapping when SST collection fails.

Returning raw iterator/rewrite errors here makes restore failures harder to triage (which table/subcompaction failed). Please annotate with table context before returning.

Suggested diff
@@
-       if r.Err != nil {
-           return nil, 0, r.Err
+       if r.Err != nil {
+           return nil, 0, errors.Annotate(r.Err, "failed while iterating compacted SSTs")
        }
@@
-       newRules, err := rc.rewriteRulesFor(i, rewriteRules)
+       newRules, err := rc.rewriteRulesFor(i, rewriteRules)
        if err != nil {
-           return nil, 0, err
+           return nil, 0, errors.Annotatef(err, "failed to prepare rewrite rules for table %d", i.TableID())
        }
As per coding guidelines: Go code: Keep error handling actionable and contextual; avoid silently swallowing errors.

Also applies to: 351-353

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/restore/log_client/client.go` around lines 337 - 338, Wrap the raw
iterator/rewrite error (r.Err) at the return sites so the error includes table
and subcompaction context before returning; replace returns like "return nil, 0,
r.Err" with a wrapped error using fmt.Errorf (or errors.Wrapf) that adds
identifiers present in scope (e.g. tableID/tableName, cfName, subCompactionID or
range/start/end) such as fmt.Errorf("collect SSTs failed for table %s cf %s
subcompaction %d: %w", tableName, cfName, subCompactionID, r.Err); do the same
for the other return site(s) that currently return r.Err (lines around the
second occurrence).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/task/restore.go`:
- Around line 535-553: The validation and normalization of RestorePhase,
ReplicationStatusSubPrefix and their derived booleans (RestoreInPhase,
FromReplicationStorage) are currently only performed in ParseFromFlags; move
that logic into the config's Adjust() method so non-CLI code paths also enforce
the invariant and normalize values. Specifically, perform: (1) compute
RestoreInPhase = flags-equivalent check (i.e. whether RestorePhase != 0 or
ReplicationStatusSubPrefix non-empty) inside Adjust(), (2) validate RestorePhase
is 1 or 2 when RestoreInPhase is true, (3) enforce the checkpoint requirement
(UseCheckpoint) when RestoreInPhase is true, and (4) set FromReplicationStorage
based on ReplicationStatusSubPrefix presence inside Adjust(); remove or keep
only minimal flag parsing in ParseFromFlags so it only reads raw values and
defers normalization/validation to Adjust().
- Around line 1084-1093: The code currently treats a failed
restoreRegistry.PauseTask as non-fatal: it logs a warning then unconditionally
logs that phase 1 finished and returns nil; change this so that inside the
IsStreamRestore && cfg.RestorePhase == 1 branch, if cfg.RestoreID != 0 and
restoreRegistry.PauseTask(c, cfg.RestoreID) returns an error you return that
error (or wrap it) instead of continuing; only emit the success
log.Info("restore phase 1 finished...") and return nil when PauseTask succeeded
(or when cfg.RestoreID == 0), and remove the misleading success log when
PauseTask failed. Ensure the change touches the PauseTask call, the surrounding
if block (cfg.RestoreID check), and the final return path.

In `@br/pkg/task/stream.go`:
- Around line 1738-1745: The async DML-check using
hasDMLFilesResultCh/hasAnyLogFiles must be made synchronous and performed before
calling RestoreSSTFileSets when cfg.RetainLatestMVCCVersion is true so we
fail-fast on DML-bearing inputs; replace the goroutine+buffered channel pattern
that defers waiting with a direct call to hasAnyLogFiles(ctx, logFilesIter),
handle its error immediately, and if hasFiles is true return the appropriate
error (or abort) before invoking RestoreSSTFileSets (references:
hasDMLFilesResultCh, hasAnyLogFiles, RestoreSSTFileSets,
cfg.RetainLatestMVCCVersion, logFilesIter).

In `@go.mod`:
- Around line 349-350: The go.mod currently replaces github.com/pingcap/kvproto
with a fork github.com/leavrth/kvproto at v0.0.0-20260505121738-8ce467678bf2
which is a fork-only commit; for a release either remove that replace and
restore usage of github.com/pingcap/kvproto (undo the replace line) or, if the
forked changes are absolutely required, keep the replace but add a clear
justification and remediation steps: add a short comment above the replace in
go.mod referencing a tracking issue/PR ID, document the reason and
compatibility/interop implications in RELEASE_NOTES (or
SECURITY/THIRD-PARTY.md), open an upstreaming PR to pingcap/kvproto and include
a TODO in go.mod to remove the replace once upstreamed (and state the target
release version). Ensure the unique symbol to change is the replace directive
for github.com/pingcap/kvproto => github.com/leavrth/kvproto
v0.0.0-20260505121738-8ce467678bf2.

---

Nitpick comments:
In `@br/pkg/restore/log_client/client.go`:
- Around line 337-338: Wrap the raw iterator/rewrite error (r.Err) at the return
sites so the error includes table and subcompaction context before returning;
replace returns like "return nil, 0, r.Err" with a wrapped error using
fmt.Errorf (or errors.Wrapf) that adds identifiers present in scope (e.g.
tableID/tableName, cfName, subCompactionID or range/start/end) such as
fmt.Errorf("collect SSTs failed for table %s cf %s subcompaction %d: %w",
tableName, cfName, subCompactionID, r.Err); do the same for the other return
site(s) that currently return r.Err (lines around the second occurrence).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ccfab626-c6f1-47fc-b0bd-f7640f6b0a59

📥 Commits

Reviewing files that changed from the base of the PR and between 15cb1cf and e525781.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (11)
  • br/pkg/restore/internal/import_client/import_client.go
  • br/pkg/restore/log_client/client.go
  • br/pkg/restore/log_client/client_test.go
  • br/pkg/restore/snap_client/client.go
  • br/pkg/restore/snap_client/export_test.go
  • br/pkg/restore/snap_client/import.go
  • br/pkg/task/BUILD.bazel
  • br/pkg/task/config_test.go
  • br/pkg/task/restore.go
  • br/pkg/task/stream.go
  • go.mod

Comment thread br/pkg/task/restore.go
Comment thread br/pkg/task/restore.go
Comment thread br/pkg/task/stream.go
Comment thread go.mod Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 22, 2026

Codecov Report

❌ Patch coverage is 45.75972% with 307 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5@15cb1cf). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             release-8.5     #68601   +/-   ##
================================================
  Coverage               ?   55.5705%           
================================================
  Files                  ?       1823           
  Lines                  ?     656274           
  Branches               ?          0           
================================================
  Hits                   ?     364695           
  Misses                 ?     264475           
  Partials               ?      27104           
Flag Coverage Δ
integration 39.0926% <40.4593%> (?)
unit 64.9903% <31.9788%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9954% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 62.2351% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Leavrth Leavrth mentioned this pull request May 22, 2026
13 tasks
Leavrth added 2 commits May 22, 2026 22:27
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
@Leavrth Leavrth force-pushed the cherry-pick-from-feature-restore-from-replication-storage branch from e525781 to d455458 Compare May 22, 2026 14:29
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
br/pkg/task/stream.go (1)

2083-2089: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a doc comment for GetStatusFileName.

This new exported helper is missing a Go doc comment.

As per coding guidelines, "Go code: Keep exported-symbol doc comments, and prefer semantic constraints over name restatement".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/stream.go` around lines 2083 - 2089, Add a Go doc comment for the
exported function GetStatusFileName that explains what the function returns and
under what conditions it can error: mention that it normalizes the provided
storage sub-directory using normalizeStorageSubDir and returns the path to the
status file "resume-state.json" within that normalized subdirectory, and that an
error is returned if normalization fails; prefer a semantic description rather
than repeating the function name.
♻️ Duplicate comments (3)
br/pkg/task/stream.go (1)

1738-1745: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Block on the DML probe before restoring SSTs.

This still applies SST data before waiting on hasDMLFilesResultCh, so --retain-latest-mvcc-version can fail only after a partial restore has already happened. The DML-file check needs to complete before RestoreSSTFileSets.

Also applies to: 1764-1785

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/stream.go` around lines 1738 - 1745, The DML probe launched into
hasDMLFilesResultCh must be awaited before applying SSTs: when
cfg.RetainLatestMVCCVersion is true, block on reading from hasDMLFilesResultCh
(the value produced by hasAnyLogFiles via hasLogFilesResult) and handle its
error/result before calling RestoreSSTFileSets; do the same for the other
occurrence that currently applies SST data (the second block using
hasDMLFilesResultCh around RestoreSSTFileSets) so SST restore is skipped or
aborted if the probe reports an error or indicates DML files exist.
br/pkg/task/restore.go (2)

1084-1093: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make phase-1 registry pause failure fatal.

If PauseTask fails here, the function still logs phase-1 success and returns nil, which leaves checkpoint data behind without a paused restore registration for the follow-up phase.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/restore.go` around lines 1084 - 1093, The current block treats
PauseTask failure as non-fatal: in IsStreamRestore + cfg.RestorePhase == 1 path,
if restoreRegistry.PauseTask(c, cfg.RestoreID) returns an error the code only
logs a warning and then logs phase-1 success and returns nil; change this so
that when restoreRegistry.PauseTask(...) fails you log the error (use log.Error
or similar) and return that error (or a wrapped error) immediately instead of
proceeding to log success and return nil; ensure the success log ("restore phase
1 finished...") only runs when PauseTask succeeds and reference the same
cfg.RestoreID and restoreRegistry.PauseTask symbols when modifying control flow.

535-552: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize phase/replication restore state in Adjust(), not only during flag parsing.

These derived fields are still set only in ParseFromFlags, so non-CLI paths that populate RestoreConfig and then call Adjust() can miss replication-storage mode or skip phase validation entirely.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/restore.go` around lines 535 - 552, Move the derived/validation
logic for phase and replication-storage out of ParseFromFlags into
RestoreConfig.Adjust(): in Adjust(), compute RestoreInPhase = (RestorePhase >
0), validate RestorePhase is either 1 or 2 when RestoreInPhase, enforce that
RestoreInPhase requires UseCheckpoint, and set FromReplicationStorage =
(len(ReplicationStatusSubPrefix) > 0); keep ParseFromFlags responsible only for
reading flags into fields (it can still call cfg.Adjust() at the end) so non-CLI
code paths that populate RestoreConfig and call Adjust() get consistent
normalization and validation.
🧹 Nitpick comments (1)
br/pkg/stream/stream_metas.go (1)

295-315: ⚡ Quick win

Add doc comments for the exported helpers in this hunk.

TryParseTaggedBackupMetaFileNameWrapper, UpdateShiftTS, and UpdateShiftTSFromMetadata are exported but undocumented, which makes the package surface harder to understand and breaks the Go guideline we follow.

As per coding guidelines, "Go code: Keep exported-symbol doc comments, and prefer semantic constraints over name restatement".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/stream/stream_metas.go` around lines 295 - 315, Add concise Go doc
comments for the three exported helpers:
TryParseTaggedBackupMetaFileNameWrapper, UpdateShiftTS, and
UpdateShiftTSFromMetadata. For each function, add a one- or two-sentence comment
immediately above the function declaration that explains what the function does,
its key inputs/outputs (e.g., filename and metadata for parsing; startTS and
restoreTS and the returned shifted timestamp and found flag), and any important
semantic behavior (e.g., that TryParse... strips metaSuffix and delegates to
backupmetas.TryParseTaggedBackupMetaFileName, that UpdateShiftTS prefers
filename parsing and falls back to metadata, and that UpdateShiftTSFromMetadata
computes shift from Metadata). Keep comments idiomatic (no restating the name)
and follow Go doc style (start with the function name).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/restore/log_client/log_file_manager.go`:
- Around line 179-199: The anonymous callback that calls
stream.TryParseTaggedBackupMetaFileNameWrapper currently returns false on parse
error, which prevents the raw-metadata fallback from running; instead, change
the error branch so that parse failures do not abort the callback (i.e., do not
return false on err) and allow the function to return true so the raw-metadata
handler runs (preserving shiftStartTS behavior); update the block around
stream.TryParseTaggedBackupMetaFileNameWrapper, the err branch, and the
surrounding logic that sets shiftTS (the closure referencing lm.startTS,
lm.restoreTS and shiftTS) to ensure untagged/older .meta files trigger the
fallback path.

In `@br/pkg/stream/stream_mgr.go`:
- Around line 438-440: The call to the callback skipCondition in stream_mgr.go
is not nil-checked and will panic if callers pass nil; update the code that
currently does if skipCondition(path) { return nil } to first guard the callback
(e.g., check skipCondition != nil) before invoking it, or initialize
skipCondition to a no-op default earlier where it's accepted, so that calls to
skipCondition(path) are safe; ensure this change references the skipCondition
variable in the function or method that contains the current check so callers
who pass nil no longer cause a panic.
- Around line 208-214: In decodeCompressedData, avoid passing rawLength directly
into make (which can panic if it's too large); validate rawLength before using
it to set slice capacity in the call to m.decoder.DecodeAll: ensure rawLength is
within a safe bound and convertible to int (e.g., check rawLength <= math.MaxInt
and <= a reasonable maximum like 1<<30), fallback to 0 or a safe default when
invalid, then convert to int and call m.decoder.DecodeAll(data, make([]byte, 0,
int(validatedRawLength))). This prevents a "cap out of range" panic while
keeping the use of m.decoder.DecodeAll consistent.

In `@br/pkg/stream/stream_misc_test.go`:
- Around line 26-39: The ReadFile implementation on gatedReadStorage currently
recurses into itself; replace the final recursive call with a call to the
underlying storage by delegating to the embedded ExternalStorage (e.g., call
s.ExternalStorage.ReadFile(ctx, name)), preserving the active counter defer and
the readGate wait behavior; ensure you reference gatedReadStorage.ReadFile, the
embedded ExternalStorage, and fields active, maxActive and readGate when making
the change.

In `@br/pkg/utils/iter/combinator_types.go`:
- Around line 48-52: The code is double-draining the m.outstanding channel
causing possible deadlock: remove or guard the second blocking receive so
outstanding is only drained once when an upstream error occurs. In TryNext (the
method calling m.inner.TryNext(ctx)), after receiving r := m.inner.TryNext(ctx)
and before sending the error result, drain outstanding exactly once; then when
you receive back from the mapper result avoid doing a second <-m.outstanding if
r.Err != nil (either skip the receive when r.Err != nil or replace it with a
non-blocking select to drain only if available). Update the logic around
m.inner.TryNext, m.cancel, m.finished and uses of <-m.outstanding so the channel
is drained in one place and the second drain is conditional/non-blocking to
prevent blocking forever.

---

Outside diff comments:
In `@br/pkg/task/stream.go`:
- Around line 2083-2089: Add a Go doc comment for the exported function
GetStatusFileName that explains what the function returns and under what
conditions it can error: mention that it normalizes the provided storage
sub-directory using normalizeStorageSubDir and returns the path to the status
file "resume-state.json" within that normalized subdirectory, and that an error
is returned if normalization fails; prefer a semantic description rather than
repeating the function name.

---

Duplicate comments:
In `@br/pkg/task/restore.go`:
- Around line 1084-1093: The current block treats PauseTask failure as
non-fatal: in IsStreamRestore + cfg.RestorePhase == 1 path, if
restoreRegistry.PauseTask(c, cfg.RestoreID) returns an error the code only logs
a warning and then logs phase-1 success and returns nil; change this so that
when restoreRegistry.PauseTask(...) fails you log the error (use log.Error or
similar) and return that error (or a wrapped error) immediately instead of
proceeding to log success and return nil; ensure the success log ("restore phase
1 finished...") only runs when PauseTask succeeds and reference the same
cfg.RestoreID and restoreRegistry.PauseTask symbols when modifying control flow.
- Around line 535-552: Move the derived/validation logic for phase and
replication-storage out of ParseFromFlags into RestoreConfig.Adjust(): in
Adjust(), compute RestoreInPhase = (RestorePhase > 0), validate RestorePhase is
either 1 or 2 when RestoreInPhase, enforce that RestoreInPhase requires
UseCheckpoint, and set FromReplicationStorage = (len(ReplicationStatusSubPrefix)
> 0); keep ParseFromFlags responsible only for reading flags into fields (it can
still call cfg.Adjust() at the end) so non-CLI code paths that populate
RestoreConfig and call Adjust() get consistent normalization and validation.

In `@br/pkg/task/stream.go`:
- Around line 1738-1745: The DML probe launched into hasDMLFilesResultCh must be
awaited before applying SSTs: when cfg.RetainLatestMVCCVersion is true, block on
reading from hasDMLFilesResultCh (the value produced by hasAnyLogFiles via
hasLogFilesResult) and handle its error/result before calling
RestoreSSTFileSets; do the same for the other occurrence that currently applies
SST data (the second block using hasDMLFilesResultCh around RestoreSSTFileSets)
so SST restore is skipped or aborted if the probe reports an error or indicates
DML files exist.

---

Nitpick comments:
In `@br/pkg/stream/stream_metas.go`:
- Around line 295-315: Add concise Go doc comments for the three exported
helpers: TryParseTaggedBackupMetaFileNameWrapper, UpdateShiftTS, and
UpdateShiftTSFromMetadata. For each function, add a one- or two-sentence comment
immediately above the function declaration that explains what the function does,
its key inputs/outputs (e.g., filename and metadata for parsing; startTS and
restoreTS and the returned shifted timestamp and found flag), and any important
semantic behavior (e.g., that TryParse... strips metaSuffix and delegates to
backupmetas.TryParseTaggedBackupMetaFileName, that UpdateShiftTS prefers
filename parsing and falls back to metadata, and that UpdateShiftTSFromMetadata
computes shift from Metadata). Keep comments idiomatic (no restating the name)
and follow Go doc style (start with the function name).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 13a391dd-c793-4c23-a55b-fb6b3f9ac059

📥 Commits

Reviewing files that changed from the base of the PR and between e525781 and d455458.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (23)
  • DEPS.bzl
  • br/pkg/restore/internal/import_client/import_client.go
  • br/pkg/restore/log_client/BUILD.bazel
  • br/pkg/restore/log_client/client.go
  • br/pkg/restore/log_client/client_test.go
  • br/pkg/restore/log_client/export_test.go
  • br/pkg/restore/log_client/log_file_manager.go
  • br/pkg/restore/log_client/log_file_manager_test.go
  • br/pkg/restore/snap_client/client.go
  • br/pkg/restore/snap_client/export_test.go
  • br/pkg/restore/snap_client/import.go
  • br/pkg/stream/backupmetas/parser.go
  • br/pkg/stream/stream_metas.go
  • br/pkg/stream/stream_metas_test.go
  • br/pkg/stream/stream_mgr.go
  • br/pkg/stream/stream_mgr_fuzz_test.go
  • br/pkg/stream/stream_misc_test.go
  • br/pkg/task/restore.go
  • br/pkg/task/stream.go
  • br/pkg/utils/iter/BUILD.bazel
  • br/pkg/utils/iter/combinator_test.go
  • br/pkg/utils/iter/combinator_types.go
  • go.mod
💤 Files with no reviewable changes (1)
  • br/pkg/utils/iter/BUILD.bazel
✅ Files skipped from review due to trivial changes (1)
  • br/pkg/restore/log_client/BUILD.bazel

Comment thread br/pkg/restore/log_client/log_file_manager.go
Comment thread br/pkg/restore/snap_client/import.go
Comment thread br/pkg/stream/stream_mgr.go Outdated
Comment thread br/pkg/stream/stream_mgr.go
Comment thread br/pkg/stream/stream_misc_test.go
Comment thread br/pkg/utils/iter/combinator_types.go
@Leavrth
Copy link
Copy Markdown
Contributor Author

Leavrth commented May 23, 2026

/retest

@tiprow
Copy link
Copy Markdown

tiprow Bot commented May 23, 2026

@Leavrth: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
@ti-chi-bot
Copy link
Copy Markdown

ti-chi-bot Bot commented May 23, 2026

@Leavrth: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
idc-jenkins-ci-tidb/mysql-test 2f6522f link true /test mysql-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/cherry-pick-not-approved release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant