[Experimental] Shared-memory transport for large node IPC payloads #14054
Draft
JanProvaznik wants to merge 1 commit into
Offloads packet bodies >=8KB from the named pipe to a per-direction memory-mapped slot (1MB) + two semaphores in out-of-proc TaskHost / worker communication. The pipe still carries every header and small/control packets inline, so framing, versioning, disconnect and reuse are unchanged. Measured on a deterministic OrchardCore.Cms.Web -mt Rebuild (434MB of large packets): main-process transport time drops from ~84-110s (synchronous, backpressure-bound PipeStream.Write at ~5 MB/s) to ~0.43-0.48s (~1 GB/s), a ~190-250x reduction on the codepath. Opt-in via MSBUILDSHAREDMEMORYIPC=1, Windows-only, #if NET. Includes MSBUILDIPCSTATS measurement scaffolding. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Experimental / exploratory: not for merge as-is. This PR adds an opt-in shared-memory fast path for large MSBuild node-IPC payloads and the instrumentation used to measure it. It targets the -mt (multithreaded) build, where non-thread-safe tasks (notably Csc/Vbc, which are not [MSBuildMultiThreadableTask]) are routed to out-of-proc TaskHost sidecars and their TaskHostConfiguration/result packets cross the named-pipe transport.
The named pipe still carries every packet header and all small/control packets inline, so framing, version negotiation, disconnect detection, node reuse and shutdown are unchanged. Only packet bodies ≥ 8 KB are moved through a per-direction memory-mapped slot.
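To make the split concrete, here is a minimal sketch of the routing rule described above: the header always travels on the pipe, and only bodies at or above the 8 KB cutoff go to a side channel. The 5-byte header layout, the `SHM_FLAG` bit and the `frame_packet` name are illustrative assumptions, not MSBuild's actual wire format or this PR's code.

```python
import struct
from typing import Optional, Tuple

LARGE_BODY_THRESHOLD = 8 * 1024  # 8 KB cutoff stated in the PR description
SHM_FLAG = 0x80                  # hypothetical flag bit, not MSBuild's real header layout


def frame_packet(packet_type: int, body: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (bytes to write to the pipe, bytes for shared memory or None)."""
    if not 0 <= packet_type < SHM_FLAG:
        raise ValueError("packet_type must fit below the flag bit")
    if len(body) >= LARGE_BODY_THRESHOLD:
        # Large body: only a flagged header goes on the pipe.
        header = struct.pack("<BI", packet_type | SHM_FLAG, len(body))
        return header, body
    # Small or control packet: header and body stay inline on the pipe.
    header = struct.pack("<BI", packet_type, len(body))
    return header + body, None
```

Because the header keeps its place on the pipe in both branches, a reader can keep parsing the pipe exactly as before and only consult the side channel when it sees the flag.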
Problem (measured)
The out-of-proc transport sends serialized packets with a synchronous PipeStream.Write on a per-node drain thread (NodeProviderOutOfProcBase.DrainPacketQueue). That write is backpressure-bound: it blocks until the child drains the 128 KB kernel pipe buffer.
Instrumenting the actual codepath (MSBUILDIPCSTATS, see IpcTransferStats) over a deterministic OrchardCore.Cms.Web -t:Rebuild -mt -nr:false, with 434 MB of large (≥ 8 KB) packet bodies, the main process spends roughly 84-110 s in transport over the pipe versus roughly 0.43-0.48 s with shared memory.
(2 runs each; same binary, only the env var toggles the feature.) The ~5 MB/s send throughput confirms the cost is blocking on backpressure, not raw copy bandwidth.
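The throughput figures follow directly from the numbers quoted in the commit message (434 MB of large packets, ~84-110 s over the pipe, ~0.43-0.48 s over shared memory). The short calculation below only reproduces that arithmetic; the variable and function names are illustrative.

```python
PAYLOAD_MB = 434             # large-packet volume quoted in the commit message
PIPE_SECONDS = (84, 110)     # main-process transport time over the pipe
SHM_SECONDS = (0.43, 0.48)   # main-process transport time over shared memory


def throughput_mb_per_s(payload_mb: float, seconds: float) -> float:
    """Effective throughput, assuming the whole payload moved in `seconds`."""
    return payload_mb / seconds


pipe_rates = [throughput_mb_per_s(PAYLOAD_MB, s) for s in PIPE_SECONDS]
shm_rates = [throughput_mb_per_s(PAYLOAD_MB, s) for s in SHM_SECONDS]

# Corner-to-corner time ratios between the two transports.
slowest_ratio = min(PIPE_SECONDS) / max(SHM_SECONDS)
fastest_ratio = max(PIPE_SECONDS) / min(SHM_SECONDS)

print(f"pipe: {min(pipe_rates):.1f}-{max(pipe_rates):.1f} MB/s")
print(f"shm:  {min(shm_rates):.0f}-{max(shm_rates):.0f} MB/s")
print(f"time ratio: {slowest_ratio:.0f}x-{fastest_ratio:.0f}x")
```

This gives about 3.9-5.2 MB/s for the pipe and about 0.9-1.0 GB/s for shared memory, and corner-to-corner ratios of roughly 175x to 256x, which brackets the ~190-250x figure quoted in the commit message.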
Approach
- NodeSharedMemoryChannel: one memory-mapped slot (1 MB) + two named semaphores per direction, strictly single-producer/single-consumer. Writer creates, reader opens (race-free: the writer creates its slot before the flagged header reaches the pipe; the reader only opens after observing the header). Payloads larger than the slot stream in chunks.
- […] AcquireEmpty stalls). The child sends results spaced by task execution, so it publishes the body before the header, which collapses the parent's receive time from ~1.7 s to ~0.05 s.
Correctness & safety
- Opt-in via MSBUILDSHAREDMEMORYIPC=1; default behavior is byte-for-byte unchanged.
- […] MSBUILDNODEHANDSHAKESALT so flagged/unflagged nodes can never pair (defense against node reuse mixing).
- Windows-only, #if NET: the .NET Framework / CLR2 task host always uses the pipe.
- […] Translate/BinaryTranslator round-trip; only the byte transport differs.
End-to-end effect (for context)
Full -mt Rebuild wall-clock improved only ~2–6% best case (often within machine noise) because that 80–110 s of pipe-write is largely overlapped across parallel TaskHost drain threads and off the critical path. When IPC is forced to be the bottleneck (MSBUILDFORCEALLTASKSOUTOFPROC=1), shm is a consistent ~5% faster, with no overlap between the two arms' timings. The headline is the transport codepath number, not e2e.
Limitations / open questions (why this is a draft)
- […] -nr:false); slot lifetime/renaming across reconnects needs hardening.
- IpcTransferStats is measurement scaffolding (opt-in, zero overhead when off); it would be dropped or moved behind ETW for a real change.
Try it
Add MSBUILDIPCSTATS=1 (and optionally MSBUILDIPCSTATSFILE=<path>) to dump the per-mechanism transfer timings at process exit.
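For readers who want a concrete picture of the handoff described under Approach, the following is a toy in-process model of one direction of the channel: a fixed-size slot guarded by an "empty" and a "full" semaphore, single producer and single consumer, with payloads larger than the slot streamed in chunks. It uses Python threads and a `bytearray` in place of processes, a memory-mapped file and named semaphores; `SlotChannel` and `round_trip` are hypothetical names, and none of this is the PR's NodeSharedMemoryChannel code.

```python
import threading


class SlotChannel:
    """One fixed-size slot guarded by 'empty' and 'full' semaphores (SPSC only)."""

    def __init__(self, slot_size: int):
        self._slot = bytearray(slot_size)     # stand-in for the memory-mapped slot
        self._used = 0                        # bytes of the slot holding valid data
        self._empty = threading.Semaphore(1)  # slot is free for the writer
        self._full = threading.Semaphore(0)   # slot holds a chunk for the reader

    def write(self, payload: bytes) -> None:
        view = memoryview(payload)
        size = len(self._slot)
        for offset in range(0, len(view), size):
            chunk = view[offset:offset + size]
            self._empty.acquire()             # wait until the reader drained the slot
            self._slot[:len(chunk)] = chunk
            self._used = len(chunk)
            self._full.release()              # hand the chunk to the reader

    def read(self, total_length: int) -> bytes:
        # Assumption: the reader learns total_length from the header on the pipe.
        out = bytearray()
        while len(out) < total_length:
            self._full.acquire()              # wait for the next chunk
            out += self._slot[:self._used]
            self._empty.release()             # give the slot back to the writer
        return bytes(out)


def round_trip(payload: bytes, slot_size: int = 1024 * 1024) -> bytes:
    """Send payload through a SlotChannel on a writer thread and read it back."""
    channel = SlotChannel(slot_size)
    writer = threading.Thread(target=channel.write, args=(payload,))
    writer.start()
    received = channel.read(len(payload))
    writer.join()
    return received
```

With two semaphores the writer never overwrites a chunk the reader has not consumed, and the reader never reads a slot that has not been filled, which is why the scheme only works with exactly one producer and one consumer per direction.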