[WIP] Add AS-aware relay selection #1514
Coverage Report for CI Build 26363362204: coverage increased (+0.2%) to 85.299%.
Coverage regressions: 23 previously-covered lines in 3 files lost coverage.
(Coveralls)
cc @nothingmuch for approach ACK/NACK
Replace the singular v2 directory setting with a trusted directory list while preserving the CLI override behavior.
Load ASMap settings behind an optional feature and reject ASMap configuration when the binary was built without support.
Choose relay order from the receiver key, request kind, time window, and relay bucket.
Route key fetching and protocol requests through selected relay candidates with failover for retryable network errors.
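A minimal sketch of the failover behavior described in the last commit message, not the PR's actual code. `RequestError`, `with_failover`, and the `send` callback are hypothetical names, and splitting errors into retryable and fatal is an assumption about how the distinction might be modeled.

```rust
/// Hypothetical error type: which failures count as retryable is an assumption.
#[derive(Debug, PartialEq, Eq)]
enum RequestError {
    /// Network-level failure (timeout, connection refused): try the next relay.
    Retryable(String),
    /// Anything else: stop immediately and report it.
    Fatal(String),
}

/// Try each relay candidate in order. Move on to the next candidate only when
/// the failure is retryable; return the first success or the first fatal error.
fn with_failover<T>(
    candidates: &[&str],
    mut send: impl FnMut(&str) -> Result<T, RequestError>,
) -> Result<T, RequestError> {
    let mut last_retryable = None;
    for &relay in candidates {
        match send(relay) {
            Ok(value) => return Ok(value),
            Err(RequestError::Retryable(msg)) => {
                last_retryable = Some(RequestError::Retryable(msg));
            }
            Err(fatal) => return Err(fatal),
        }
    }
    // Every candidate failed with a retryable error, or the list was empty.
    Err(last_retryable.unwrap_or_else(|| RequestError::Fatal("no relay candidates".into())))
}
```

Because the candidate list is recomputed per request, a helper like this would not need to remember which relay succeeded last time.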
Add ASMap/OHTTP e2e observation test that runs the payjoin-cli v2 flow against public directories and relays. The test resolves each endpoint, maps IPs through the ASMap file, and writes a Markdown summary showing selected directories, relay eligibility, and observed ASNs. Also log OHTTP relay attempts so CI output shows which relays were used during the live flow.
nothingmuch left a comment
approach ack
not 100% sure about relaymanager getting removed; it may still have a job (keeping track of failures, arguably important enough to be in scope for the reference client)
relay selection looks good, but i didn't review carefully since only seeking approach ack
    Post(MessageKind),
    Poll(MessageKind),
is message kind for diagnostic purposes?
    pub(crate) const WINDOW_SECS: u64 = 30;
    const CLOCK_SKEW_WINDOWS: i64 = 2;
    const POST_RESERVED_COUNT: usize = 3;
at time t_i, avoid making any real GET requests (i.e. dummy GET requests are OK) using the relays
whose hash is lexicographically smallest not just at time t_i, but at times t_{i-2}..t_{i+2} or at least
t_{i-1}..t_{i+1}
they should always use a different relay for the post and the get, avoiding the worst case leak
k=3 and skew=2 are fixed at the upper end of #919 (comment)
with 3-5 public OHTTP relays, the reservation budget (up to 15 buckets: k=3 over 2·2+1 = 5 windows) can swallow the whole pool, so POLL falls back (286-289) to POST-reserved buckets, silently violating the spec when poll_candidates is empty (small pool)
picking k and skew based on pool size (k=2/skew=1 when small) would reduce the reserved buckets to 6 (k=2 over 3 windows) and avoid the fallback for B ≥ 7, while keeping max protection when B ≥ 16. The remaining edge case (B ≤ 6) would still hit the warn!, possibly worth surfacing to the caller, maybe with a typed error
makes sense?
Yeah, that makes sense. Fixed k=3 / skew=2 is good for larger pools but too aggressive for small relay sets since it can reserve every bucket.
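A minimal sketch of the pool-size-dependent choice discussed above, not the PR's implementation. `ReservationParams`, `PoolTooSmall`, and `params_for_pool` are hypothetical names. The thresholds follow from the arithmetic in this thread: k × (2·skew + 1) buckets are reserved, and `buckets` plays the role of B.

```rust
/// Hypothetical pairing of k (POST-reserved count) and clock-skew windows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ReservationParams {
    post_reserved_count: usize,
    clock_skew_windows: usize,
}

impl ReservationParams {
    /// Worst-case number of buckets reserved for POST across
    /// the current window and `clock_skew_windows` on each side.
    fn reserved_buckets(&self) -> usize {
        self.post_reserved_count * (2 * self.clock_skew_windows + 1)
    }
}

/// k=3 / skew=2: reserves up to 15 buckets.
const LARGE_POOL: ReservationParams =
    ReservationParams { post_reserved_count: 3, clock_skew_windows: 2 };
/// k=2 / skew=1: reserves up to 6 buckets.
const SMALL_POOL: ReservationParams =
    ReservationParams { post_reserved_count: 2, clock_skew_windows: 1 };

/// Hypothetical typed error for pools too small to keep POLL off POST buckets.
#[derive(Debug, PartialEq, Eq)]
struct PoolTooSmall {
    buckets: usize,
    required: usize,
}

/// Pick the strongest parameters that still leave at least one bucket for POLL.
fn params_for_pool(buckets: usize) -> Result<ReservationParams, PoolTooSmall> {
    for params in [LARGE_POOL, SMALL_POOL] {
        if buckets > params.reserved_buckets() {
            return Ok(params);
        }
    }
    Err(PoolTooSmall { buckets, required: SMALL_POOL.reserved_buckets() + 1 })
}
```

With this shape, B ≥ 16 keeps k=3/skew=2, 7 ≤ B ≤ 15 drops to k=2/skew=1, and B ≤ 6 surfaces an error to the caller instead of silently falling back.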
    ///
    /// The ordering is recomputed from the receiver key and current time, so no
    /// relay-selection progress needs to be stored between requests.
    pub(crate) fn select_relays_for_request(
would be good to add tests for the deterministic selection
selection_hash stability (interop), deterministic ordering per (pubkey, RequestKind, TimeWindow), POLL/POST exclusion, and round-robin fairness
Yeah, I was trying to keep the PR small and include only the essentials so it's easier to review and get an approach ack. Tests should be added with the smaller follow-up PRs.
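A rough sketch of what the determinism and POLL/POST exclusion properties could look like in code, so that tests have something concrete to pin down. Everything here is an assumption for illustration: the FNV-1a hash is only a stand-in for the real `selection_hash` (whose construction is not shown in this thread), and `score`, `ordered_buckets`, `post_reserved`, and `poll_candidates` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestKind {
    Post,
    Poll,
}

/// FNV-1a, used only as a deterministic stand-in for the real selection hash.
fn fnv1a(bytes: impl IntoIterator<Item = u8>) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Score one bucket for a (receiver key, request kind, time window) triple.
fn score(receiver_key: &[u8], kind: RequestKind, window: u64, bucket: u32) -> u64 {
    let tag = match kind {
        RequestKind::Post => 0u8,
        RequestKind::Poll => 1u8,
    };
    let input = receiver_key
        .iter()
        .copied()
        .chain([tag])
        .chain(window.to_be_bytes())
        .chain(bucket.to_be_bytes());
    fnv1a(input)
}

/// Order buckets by score; ties break on the bucket id so the result
/// does not depend on the order of the input slice.
fn ordered_buckets(receiver_key: &[u8], kind: RequestKind, window: u64, buckets: &[u32]) -> Vec<u32> {
    let mut ordered = buckets.to_vec();
    ordered.sort_by_key(|&bucket| (score(receiver_key, kind, window, bucket), bucket));
    ordered
}

/// Buckets POST may use in any window within `skew` of `window`.
fn post_reserved(receiver_key: &[u8], window: u64, skew: u64, k: usize, buckets: &[u32]) -> HashSet<u32> {
    let first = window.saturating_sub(skew);
    let last = window.saturating_add(skew);
    (first..=last)
        .flat_map(|w| ordered_buckets(receiver_key, RequestKind::Post, w, buckets).into_iter().take(k))
        .collect()
}

/// POLL ordering with every POST-reserved bucket for nearby windows removed.
fn poll_candidates(receiver_key: &[u8], window: u64, skew: u64, k: usize, buckets: &[u32]) -> Vec<u32> {
    let reserved = post_reserved(receiver_key, window, skew, k, buckets);
    ordered_buckets(receiver_key, RequestKind::Poll, window, buckets)
        .into_iter()
        .filter(|bucket| !reserved.contains(bucket))
        .collect()
}
```

Tests along these lines could assert that repeated calls with the same inputs return the same ordering, and that no POLL candidate appears among the first k POST buckets of any nearby window. Hash stability for interop would additionally need fixed test vectors against the real hash.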
    bail!("No valid relays available for the selected directory {}", directory.url.as_str());
    }

    relays.shuffle(&mut thread_rng());
btw, relays.shuffle seems inert as ordered_candidates reorders everything afterwards
leftover of the old shuffle?
Yep, good catch. This looks like leftover behavior from the old random relay ordering. Since request-time selection now goes through ordered_candidates, the initial relays.shuffle does not affect POST/POLL ordering. I’ll remove it.
    [dependencies]
    anyhow = "1.0.99"
    asmap = { version = "0.1.0", optional = true }
would be good to know what the team thinks about adding this dep to parse the binary, especially if we are moving to the payjoin crate
is it a good idea or would it be better to have our own parser? #1452 (comment)
I did check out the library, and also saw that a contributor to Bitcoin Core's asmap library gave it a star, but more eyes on the library would be nice
    pub struct AsmapConfig {
    #[serde(rename = "asmap_file")]
    pub asmap: LoadedAsmap,
a configured, per-integration asmap file is a client fingerprint two ways: (1) only some integrations load one, so the AS-aware ones stand out (uses-it-or-not), and (2) selection is deterministic from the asmap, so even those that do load one diverge unless they're on the exact same snapshot.
could the lib ship one bundled snapshot per version instead, so every integration on a given version has the same file with nothing to fetch?
it's ~1.5 MB, and asmap-data updates ~monthly (irregular, gaps up to ~2 months), but relay ASNs are stable, so selection should stay consistent for everyone even across versions
https://github.com/bitcoin-core/asmap-data
wdyt?
No answer for this yet. This might require more discussion; I'll get back to you.
    #[serde(default)]
    pub user_public_ips: Vec<IpAddr>,
    #[serde(default)]
    pub user_asns: Vec<u32>,
Given a list of trusted directories and relays, these lists should be filtered to exclude servers that share an AS with the user. This is potentially tricky as it requires users to be able to determine their public IP(s).
requiring the user's own ASN/IP makes AS-aware mode operator-only: a NAT'd wallet can't supply it without an external lookup, which is a privacy leak of its own
#919 lists four AS-overlap risks:
- sender∩receiver → inherent: relay selection can't fix it (it's the two endpoints themselves). VPN/Tor territory, out of scope.
of the three that selection can address, two don't need your own AS:
- relay∩directory → exclude relays in the directory's AS. Needs only the directory + relay ASes.
- relay∩relay → the windowed ordering derived from the receiver key. Needs only the relays' ASes.
both use IPs you resolve via DNS anyway to connect, like Bitcoin Core's asmap, which diversifies its peers by AS (anti-eclipse) and never learns its own ASN; you bucket the servers, not yourself.
only user-AS exclusion (user∩directory) needs your own AS, and that's the one that could be dropped to the network layer (VPN/Tor), not the selector:
- a directory's AS (datacenter) vs a user's AS (residential/mobile) almost never coincide.
- the sender uses the directory that came in the URI, so it can't pick a different one to escape its own AS. it can't be addressed in selection.
so, if we drop this and keep just:
- directory-AS exclusion (relay∩directory)
- windowed ordering (relay∩relay)
that way both use only server ASes, zero input, so AS-aware selection can be on by default for everyone, which removes the uses-it-or-not fingerprint: default and zero-input, instead of operator-only.
makes sense?
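A minimal sketch of directory-AS exclusion using only server-side information, as proposed above. `Endpoint`, `asns_of`, and `relays_outside_directory_as` are hypothetical names, and `lookup` stands in for whatever IP-to-ASN mapping the asmap provides; its signature is an assumption. Relays whose IPs have no mapping are kept here, which is a policy choice that a stricter implementation might reverse.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical server record: a name plus the IPs its hostname resolved to.
struct Endpoint {
    name: &'static str,
    ips: Vec<IpAddr>,
}

/// Collect the ASNs of every resolved IP that the lookup can map.
fn asns_of(endpoint: &Endpoint, lookup: &impl Fn(IpAddr) -> Option<u32>) -> HashSet<u32> {
    endpoint.ips.iter().filter_map(|ip| lookup(*ip)).collect()
}

/// Keep only relays that share no AS with the directory.
/// No information about the user's own network is needed.
fn relays_outside_directory_as<'a>(
    relays: &'a [Endpoint],
    directory: &Endpoint,
    lookup: &impl Fn(IpAddr) -> Option<u32>,
) -> Vec<&'a Endpoint> {
    let directory_asns = asns_of(directory, lookup);
    relays
        .iter()
        .filter(|relay| asns_of(relay, lookup).is_disjoint(&directory_asns))
        .collect()
}
```

The remaining relays would then feed into the windowed ordering, which covers relay∩relay.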
    pub(crate) struct SystemNetwork {
    #[cfg(feature = "asmap")]
    asmap: Option<LoadedAsmap>,
with the asmap bundled and the user-AS input dropped, could this Option and the random fallback go, so AS-aware is the default instead of an opt-in?
This PR is a proof of concept for AS-aware, stateless relay selection in payjoin-cli and is meant to move the discussion in #919 forward. Reviewers will probably want to read #919 for the full context behind the design and tradeoffs.
The implementation adds optional ASMap-based filtering for trusted directories and OHTTP relays, then demonstrates relay selection that separates POST and POLL traffic without storing a selected relay or relay index. Relay ordering is derived from the receiver key, request/message type, and short time windows, with polling avoiding POST-reserved AS buckets for nearby windows.
I also added comments around parts of the implementation that may be candidates for moving into the payjoin crate or possibly into a reusable external crate. These comments are suggestions, not final API proposals, and reviewer feedback would be very welcome. One reason for calling these pieces out is the concern raised in #919: asking every downstream wallet to reimplement this behavior could create extra work and consistency problems.
Disclosure: co-authored by Codex