[Experimental] Shared-memory transport for large node IPC payloads #14054

Draft
JanProvaznik wants to merge 1 commit into main from dev/janprovaznik/shared-memory-ipc

Summary

Experimental / exploratory — not for merge as-is. This PR adds an opt-in shared-memory fast path for large MSBuild node-IPC payloads and the instrumentation used to measure it. It targets the -mt (multithreaded) build, where non-thread-safe tasks (notably Csc/Vbc, which are not [MSBuildMultiThreadableTask]) are routed to out-of-proc TaskHost sidecars and their TaskHostConfiguration/result packets cross the named-pipe transport.

The named pipe still carries every packet header and all small/control packets inline, so framing, version negotiation, disconnect detection, node reuse and shutdown are unchanged. Only packet bodies ≥ 8 KB are moved through a per-direction memory-mapped slot.
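The routing rule above, combined with the length-field flag described under Approach, can be sketched roughly as follows. This is an illustrative Python model, not the PR's C# code. The 5-byte header layout (one type byte plus a little-endian 32-bit length), the choice of bit 31 as the spare bit, and the names `encode_header` / `decode_header` are all assumptions made for the example.

```python
import struct
from typing import Tuple

SHM_THRESHOLD = 8 * 1024   # assumed: "8 KB" read as 8192 bytes
SHM_FLAG = 0x80000000      # assumed: the PR does not say which high bit is used
LENGTH_MASK = 0x7FFFFFFF   # remaining 31 bits carry the body length


def encode_header(packet_type: int, body_len: int) -> bytes:
    """Build a header that always travels over the pipe.

    Assumed layout: 1 type byte + 32-bit little-endian length field.
    Bodies at or above the threshold get the shared-memory flag set.
    """
    if not 0 <= body_len <= LENGTH_MASK:
        raise ValueError("body length does not fit in 31 bits")
    length_field = body_len
    if body_len >= SHM_THRESHOLD:
        length_field |= SHM_FLAG
    return struct.pack("<BI", packet_type, length_field)


def decode_header(header: bytes) -> Tuple[int, int, bool]:
    """Return (packet_type, body_len, body_is_in_shared_memory)."""
    packet_type, length_field = struct.unpack("<BI", header)
    via_shm = bool(length_field & SHM_FLAG)
    return packet_type, length_field & LENGTH_MASK, via_shm
```

Because the flag lives in the header, a reader can decide where to fetch the body from after reading only the pipe, which is why small and control packets need no special handling.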

Problem (measured)

The out-of-proc transport sends serialized packets with a synchronous PipeStream.Write on a per-node drain thread (NodeProviderOutOfProcBase.DrainPacketQueue). That write is backpressure-bound: it blocks until the child drains the 128 KB kernel pipe buffer.

Instrumenting the actual codepath (MSBUILDIPCSTATS, see IpcTransferStats) over a deterministic OrchardCore.Cms.Web -t:Rebuild -mt -nr:false build, which moves 434 MB of large (≥ 8 KB) packet bodies, the main process spends:

| codepath (main process, large packets) | named pipe (baseline) | shared memory | speedup |
| --- | --- | --- | --- |
| send (~5,000 pkts / 434 MB) | 83.4 / 109.4 s (3.8–5.0 MB/s) | 0.38 / 0.40 s (~1.1 GB/s) | ~210–290× |
| receive (1,664 pkts / 120 MB) | 0.32 / 0.34 s | 0.054 / 0.078 s | ~4–6× |
| total transport time | 83.7 / 109.8 s | 0.43 / 0.48 s | ~190–250× |

(2 runs each; same binary, only the env var toggles the feature.) The ~5 MB/s send throughput confirms the cost is blocking on backpressure, not raw copy bandwidth.

Approach

  • A spare high bit of the 32-bit packet-length header field flags "body delivered via shared memory" (independent of the existing type-byte extended-header bit).
  • NodeSharedMemoryChannel: one memory-mapped slot (1 MB) + two named semaphores per direction, strictly single-producer/single-consumer. Writer creates, reader opens (race-free: the writer creates its slot before the flagged header reaches the pipe; the reader only opens after observing the header). Payloads larger than the slot stream in chunks.
  • Ordering is direction-specific (measured): the parent sends configs back-to-back and must write the header first (body-first was ~25× slower — 12–14 s — from AcquireEmpty stalls). The child sends results spaced by task execution, so it publishes the body before the header, which collapses the parent's receive time from ~1.7 s to ~0.05 s.
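The single-slot, two-semaphore handoff with chunking described in the list above can be modelled in-process as a rough sketch. Here a `bytearray` stands in for the memory-mapped view and `threading.Semaphore` stands in for the named semaphores; the real channel is cross-process C#. `SlotChannel` and `round_trip` are hypothetical names, and the reader is assumed to learn the body length from the header it already read off the pipe.

```python
import threading

SLOT_SIZE = 1024 * 1024  # 1 MB slot, as stated in the PR description


class SlotChannel:
    """One direction of a single-producer/single-consumer slot (illustrative)."""

    def __init__(self, slot_size: int = SLOT_SIZE) -> None:
        self._slot = bytearray(slot_size)     # stand-in for the mapped view
        self._empty = threading.Semaphore(1)  # slot is free for the writer
        self._full = threading.Semaphore(0)   # slot holds a chunk for the reader

    def write(self, body: bytes) -> None:
        """Copy the body into the slot one chunk at a time."""
        view = memoryview(body)
        size = len(self._slot)
        for offset in range(0, len(view), size):
            chunk = view[offset:offset + size]
            self._empty.acquire()             # wait until the reader drained the slot
            self._slot[:len(chunk)] = chunk
            self._full.release()              # publish the chunk

    def read(self, body_len: int) -> bytes:
        """Reassemble a body whose length was announced in the pipe header."""
        out = bytearray()
        while len(out) < body_len:
            chunk_len = min(body_len - len(out), len(self._slot))
            self._full.acquire()              # wait for the next chunk
            out += self._slot[:chunk_len]
            self._empty.release()             # hand the slot back to the writer
        return bytes(out)


def round_trip(payload: bytes, slot_size: int = SLOT_SIZE) -> bytes:
    """Send a payload through a fresh channel using a writer thread."""
    channel = SlotChannel(slot_size)
    writer = threading.Thread(target=channel.write, args=(payload,))
    writer.start()
    received = channel.read(len(payload))
    writer.join()
    return received
```

With one slot, the writer blocks on the "empty" semaphore whenever the reader has not yet drained the previous chunk. That is consistent with the AcquireEmpty stalls reported for back-to-back sends, although this model does not reproduce the measured timings.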

Correctness & safety

  • Opt-in via MSBUILDSHAREDMEMORYIPC=1; default behavior is byte-for-byte unchanged.
  • The env var is inherited by launched nodes, so both endpoints agree. The measurement runs also set a distinct MSBUILDNODEHANDSHAKESALT so flagged/unflagged nodes can never pair (defense against node reuse mixing).
  • Windows-only at runtime (named MMF/semaphores are cross-process there) and compiled only under #if NET, so the .NET Framework / CLR2 task host always uses the pipe.
  • No serialization change: both paths still do the full Translate/BinaryTranslator round-trip; only the byte transport differs.
  • Validated: a standalone two-role protocol test (round-trips up to 5 MB, multi-chunk, byte-for-byte); every instrumented OrchardCore build exits 0; tracing confirms both-side engagement (~16 MB offloaded per project across all TaskHost children).

End-to-end effect (for context)

Full -mt Rebuild wall-clock time improved only ~2–6% in the best case (often within machine noise), because the 80–110 s of pipe-write time is largely overlapped across parallel TaskHost drain threads and sits off the critical path. When IPC is forced to be the bottleneck (MSBUILDFORCEALLTASKSOUTOFPROC=1), shm is consistently ~5% faster, with no overlap between the timings of the two arms. The headline is the transport codepath number, not e2e.

Limitations / open questions (why this is a draft)

  • Cross-platform: needs a Unix story (named semaphores/MMF differ); currently Windows-gated.
  • Engagement should be negotiated in the handshake, not via an env-var + salt trick, before this could ship.
  • Node reuse across builds is only lightly exercised here (measurements use -nr:false); slot lifetime/renaming across reconnects needs hardening.
  • Threshold (8 KB) and slot size (1 MB) are untuned.
  • IpcTransferStats is measurement scaffolding (opt-in, zero overhead when off) — would be dropped or moved behind ETW for a real change.
  • A zero-copy read (deserialize directly from the MMF view) was evaluated and rejected: the reader-side copy already runs at ~1.5–2.1 GB/s, so it would save ~30–60 ms/build for real correctness risk on the IPC path.

Try it

set MSBUILDSHAREDMEMORYIPC=1
set MSBUILDNODEHANDSHAKESALT=shmopt
dotnet msbuild <proj> -t:Rebuild -mt

Add MSBUILDIPCSTATS=1 (and optionally MSBUILDIPCSTATSFILE=<path>) to dump the per-mechanism transfer timings at process exit.

Offloads packet bodies >=8KB from the named pipe to a per-direction
memory-mapped slot (1MB) + two semaphores in out-of-proc TaskHost / worker
communication. The pipe still carries every header and small/control packets
inline, so framing, versioning, disconnect and reuse are unchanged.

Measured on a deterministic OrchardCore.Cms.Web -mt Rebuild (434MB of large
packets): main-process transport time drops from ~84-110s (synchronous,
backpressure-bound PipeStream.Write at ~5 MB/s) to ~0.43-0.48s (~1 GB/s),
a ~190-250x reduction on the codepath.

Opt-in via MSBUILDSHAREDMEMORYIPC=1, Windows-only, #if NET. Includes
MSBUILDIPCSTATS measurement scaffolding.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>