Java GC automatically reclaims memory from unreachable objects by tracing from GC roots
The heap is divided into young (Eden + Survivor) and old generations — most objects die young
G1 is the default collector, using region-based evacuation with tunable pause targets
ZGC and Shenandoah provide sub-10ms pauses at the cost of higher CPU and native memory overhead
Biggest production mistake: treating GC as a black box without enabling GC logging or measuring allocation rate
Plain-English First
Imagine you're at a big party and everyone keeps leaving empty cups on tables. You hired a cleaner (the Garbage Collector) whose only job is to walk around, spot cups nobody is holding anymore, and throw them away so there's room for fresh drinks. The cleaner doesn't interrupt the party every second — they work in bursts, and sometimes they have to pause everything to do a deep clean. That pause is what Java developers are always trying to shrink. Java's GC is exactly that cleaner: it automatically finds objects your program no longer references and reclaims their memory so you never have to call free() yourself.
Every Java application runs a second program inside the JVM — the Garbage Collector. It decides when memory gets freed, how long your threads pause, and whether your latency SLAs hold up under load. Most developers treat it like a black box and then wonder why their microservice spikes to 500ms every few seconds in production.
Before automatic memory management, C and C++ developers had to manually allocate and free every byte. Java solved this with a managed heap and a runtime that tracks object reachability — if nothing in your program can reach an object, its memory can be reclaimed. That single idea eliminated an entire class of bugs but introduced a new challenge: the collector itself consumes CPU and introduces pauses.
The core misconception is that GC pauses are inevitable and unfixable. They are not. Modern collectors such as ZGC and Shenandoah keep pause times largely independent of heap size, but only if you understand the trade-offs and tune for your workload.
What is Garbage Collection in Java?
Garbage Collection in Java is the JVM's automatic memory management mechanism. The GC periodically identifies objects that are no longer reachable from any GC root (thread stacks, static fields, JNI references) and reclaims their heap memory. This eliminates manual memory management but introduces pauses and CPU overhead that must be managed in production.
The JVM determines object reachability through a reachability analysis starting from GC roots. An object is considered dead, and eligible for collection, when no chain of references from any root can reach it. This is fundamentally different from pure reference counting, which cannot reclaim cyclic structures on its own (CPython, for example, pairs reference counting with a separate cycle detector). Java's tracing GC handles cycles naturally because it only cares about reachability, not reference count.
The key production insight: GC is not triggered by the system running low on memory. A young collection is triggered when eden fills up, so how often GC runs depends on how fast your application allocates. A service with a large heap but low allocation rate may run GC infrequently, while a service with a small heap and high allocation rate may run GC constantly. Allocation rate relative to young generation size, not total heap size, is the primary driver of GC frequency.
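The relationship between allocation rate and GC frequency can be sketched as simple arithmetic. The class and method names below are hypothetical, and the model assumes a steady allocation rate and a fixed eden size, which real collectors such as G1 do not guarantee because they resize eden adaptively. Treat it as a rough estimate, not a prediction.

```java
final class YoungGcIntervalSketch {

    /**
     * Rough estimate of the time between young collections:
     * eden capacity divided by allocation rate.
     * Assumes steady allocation and a fixed eden size.
     */
    static double secondsBetweenYoungGcs(long edenBytes, long allocBytesPerSecond) {
        if (edenBytes <= 0 || allocBytesPerSecond <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (double) edenBytes / allocBytesPerSecond;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Large eden, modest allocation rate: collections are far apart.
        System.out.println(secondsBetweenYoungGcs(4096 * mb, 50 * mb));

        // Small eden, high allocation rate: several collections per second.
        System.out.println(secondsBetweenYoungGcs(256 * mb, 1024 * mb));
    }
}
```

The second case fills a 256MB eden four times per second, which is why reducing allocation rate is often more effective than adding heap.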
io/thecodeforge/gc/ReachabilityDemo.java
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how Java GC determines object reachability.
 *
 * Key concept: An object is reachable if any GC root can access it
 * through a chain of references. When the chain breaks, the object
 * becomes eligible for collection.
 */
public class ReachabilityDemo {

    public static void main(String[] args) {
        // Object created on the heap, referenced by local variable 'order'.
        // 'order' is a GC root (stack reference).
        Order order = new Order("ORD-001", 149.99);
        System.out.println("Order created: " + order.getId());

        // After this reassignment, the original Order object has no
        // reachable references. It becomes eligible for GC.
        order = new Order("ORD-002", 299.99);
        // The first Order("ORD-001") is now unreachable and GC may reclaim it.

        // Demonstrating cyclic references: GC handles this correctly.
        OrderNode nodeA = new OrderNode("A");
        OrderNode nodeB = new OrderNode("B");
        nodeA.next = nodeB;
        nodeB.next = nodeA; // cycle: A -> B -> A

        // Even though A and B reference each other, if we null out
        // our stack references, both become unreachable and are collected.
        nodeA = null;
        nodeB = null;
        // The cycle A -> B -> A is still intact in memory, but no GC root
        // can reach either node. Both are eligible for collection.
    }

    static class Order {
        private final String id;
        private final double amount;

        Order(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }

        String getId() { return id; }
    }

    static class OrderNode {
        final String name;
        OrderNode next;

        OrderNode(String name) {
            this.name = name;
        }
    }
}
Output
Order created: ORD-001
GC Roots — What Counts as a Root
Local variables on thread stacks — every active method frame holds references to objects it is using
Static fields of loaded classes — ClassLoader roots keep static objects alive for the lifetime of the class
JNI references — native code can hold references that the JVM must respect
Active monitors — objects currently locked by a thread are temporarily rooted during GC
Production Insight
The most common cause of memory leaks in production Java services is unintentional GC root retention. A static Map that accumulates entries, a ThreadLocal that is never cleaned, or a listener that is never deregistered creates a chain of references from a root that the GC cannot break. Take a heap dump (jmap -dump:live,format=b,file=heap.hprof <pid>) and analyze it with Eclipse MAT. The dominator tree shows which objects keep the most memory alive through root chains.
Key Takeaway
GC reclaims memory from objects that no GC root can reach. Cyclic references are handled correctly by tracing GC. The #1 production memory leak pattern is objects retained through static fields, ThreadLocals, or unremoved listeners — not missing free() calls.
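One common mitigation for the static-Map leak pattern is to bound the cache so that old entries become unreachable. The following is a minimal sketch, assuming a single-threaded caller and a hypothetical class name BoundedCacheSketch. It relies on the standard LinkedHashMap.removeEldestEntry hook and is not thread-safe, so a production service would typically wrap it or use a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * A size-bounded LRU cache built on LinkedHashMap.
 * Once maxEntries is exceeded, the least recently used entry is dropped,
 * so its value is no longer reachable from this map and can be collected.
 * Not thread-safe.
 */
final class BoundedCacheSketch<K, V> extends LinkedHashMap<K, V> {

    private final int maxEntries;

    BoundedCacheSketch(int maxEntries) {
        super(16, 0.75f, true); // true = access order, which gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedCacheSketch<String, byte[]> cache = new BoundedCacheSketch<>(3);
        for (int i = 0; i < 10; i++) {
            cache.put("key-" + i, new byte[1024]);
        }
        // Only the three most recently inserted keys remain.
        System.out.println(cache.size());   // 3
        System.out.println(cache.keySet()); // [key-7, key-8, key-9]
    }
}
```

The point is that the map itself can stay reachable from a static field as long as it stops holding references to entries that are no longer needed.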
The Generational Heap — Why Most Objects Die Young
The JVM heap is divided into generations based on the weak generational hypothesis: most objects die young, and objects that survive one collection are likely to survive many more. This observation drives the generational heap design used by Serial, Parallel, and G1, and more recently by generational ZGC.
The young generation consists of eden space (where new objects are allocated) and two survivor spaces (S0 and S1). New objects are allocated in eden. When eden fills up, a minor GC (young collection) runs: live objects in eden are copied to one survivor space, and live objects in the other survivor space are also copied and aged. Objects that survive enough young collections (controlled by -XX:MaxTenuringThreshold) are promoted to the old generation.
The old generation holds long-lived objects. When the old generation fills up or a collection threshold is reached, a major GC runs. In G1, this is a mixed GC that collects both young and old regions. In extreme cases, a full GC (stop-the-world compaction of the entire heap) is triggered — this is the catastrophic failure mode you must avoid.
The critical production insight: the tenuring threshold determines how quickly objects move to old generation. Too low, and short-lived objects pollute old generation, increasing old gen GC frequency. Too high, and survivor spaces overflow, forcing premature promotion. Both paths degrade performance.
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how allocation patterns interact with the generational heap.
 *
 * Objects that survive young collections are promoted to old generation.
 * Understanding this promotion mechanism is critical for tuning.
 */
public class GenerationalBehaviorDemo {

    /**
     * Pattern 1: Short-lived objects, ideal for generational GC.
     * These objects die in eden and never reach old generation.
     * GC can reclaim them with a fast young collection.
     */
    public void processRequest() {
        // These objects are created, used, and become unreachable
        // within a single method call. They die in eden.
        String requestId = java.util.UUID.randomUUID().toString();
        byte[] payload = new byte[4096];
        List<String> validationErrors = new ArrayList<>();
        // After this method returns, all three objects become unreachable
        // because they are only referenced by local variables (stack roots).
    }

    /**
     * Pattern 2: Long-lived cached objects, promoted to old gen.
     * These objects survive young collections and get promoted.
     * They occupy old generation permanently (or until eviction).
     *
     * Production risk: If this cache grows unbounded, old generation
     * fills up and triggers full GC or OOM.
     */
    private final List<byte[]> longLivedCache = new ArrayList<>();

    public void cacheData(byte[] data) {
        // This reference keeps the byte array alive indefinitely.
        // After surviving MaxTenuringThreshold young collections,
        // it is promoted to old generation.
        longLivedCache.add(data);
    }

    /**
     * Pattern 3: Premature promotion. Objects that should die young
     * get promoted because survivor space is full.
     *
     * If the objects that are live at a young GC do not fit in survivor
     * space, the overflow is promoted directly to old generation even if
     * those objects are short-lived.
     * This is called premature promotion and it pollutes old generation.
     *
     * Fix: Increase survivor space ratio (-XX:SurvivorRatio)
     * or reduce allocation rate.
     */
    public void burstAllocation() {
        // Under a high allocation rate, young GCs run often. Any temporary
        // objects that happen to be live when a young GC runs can overflow
        // survivor space and be promoted, even though they die moments later.
        for (int i = 0; i < 100_000; i++) {
            byte[] temp = new byte[256];
            // temp is short-lived, but under pressure it may be
            // prematurely promoted to old generation
        }
    }

    /**
     * Production tuning flags for generational behavior:
     *
     * -XX:NewRatio=2               // old:young = 2:1 (default for most collectors)
     * -XX:SurvivorRatio=8          // eden:survivor = 8:1 (default)
     * -XX:MaxTenuringThreshold=15  // objects survive 15 young GCs before promotion
     * -XX:+AlwaysTenure            // promote immediately (dangerous, avoid)
     * -XX:+NeverTenure             // never promote (survivor overflow goes to old gen)
     *
     * Monitor promotion rate with:
     *   jstat -gcutil <pid> 1000
     * Watch the 'O' column (old gen utilization) for steady growth.
     * Steady growth with low live data = premature promotion.
     */
}
The Weak Generational Hypothesis — The Foundation of All Modern GC
If 90% of objects die in eden, collecting eden reclaims 90% of garbage with minimal work
Young collection only scans eden + survivor spaces — not the entire heap. This is fast.
Old generation collection is expensive because it must handle long-lived object graphs
The hypothesis fails for workloads with uniform object lifetimes — batch processing, data pipelines
When the hypothesis fails, you see high promotion rates and frequent old gen collections
Production Insight
Monitor promotion rate as a leading indicator of GC health. Run jstat -gcutil <pid> 1000 and watch how much the O column (old gen utilization) grows after each young collection; that growth is a proxy for the volume promoted from young to old generation. As a rule of thumb, a healthy service promotes less than 5% of young gen per cycle. If promotion exceeds 20%, objects are leaving young gen too early. Either increase -XX:MaxTenuringThreshold, increase survivor space (-XX:SurvivorRatio=6), or investigate why short-lived objects are escaping young gen (common cause: objects stored in thread-local caches or request-scoped maps that persist across requests).
Key Takeaway
The generational heap exploits the statistical fact that most objects die young. Young collection is fast because it only scans eden + survivor. Old generation is expensive to collect. Premature promotion — short-lived objects reaching old gen — is a silent performance killer. Monitor promotion rate with jstat -gcutil.
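Besides jstat, the same counters are available in-process through the standard java.lang.management API. The sketch below is one possible way to sample them; the class name GcStatsSketch is hypothetical, and the collector and pool names it prints vary by collector and JDK version. Comparing old-generation pool usage between two snapshots gives a rough view of promotion volume, assuming no old-generation collection ran in between.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class GcStatsSketch {

    /** Sum of collection counts across all collectors registered in this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Prints per-collector counts and times, then current usage of each heap pool. */
    static void printSnapshot() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, timeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is invalid
            if (usage != null) {
                System.out.printf("%s: used=%d KB%n", pool.getName(), usage.getUsed() / 1024);
            }
        }
    }

    public static void main(String[] args) {
        printSnapshot();
        System.out.println("total collections so far: " + totalCollections());
    }
}
```

Calling printSnapshot on a timer and exporting the values to a metrics system is a common pattern, though the sampling interval and export mechanism are left out here.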
GC Algorithms — Mark-Sweep, Copying, and Compaction
All GC algorithms are built on three fundamental operations: marking (identifying live objects), sweeping (reclaiming dead objects' memory), and compacting (defragmenting live objects to create contiguous free space). Different collectors combine these operations differently to optimize for pause time, throughput, or memory efficiency.
Mark-and-sweep identifies live objects (mark phase) then reclaims unmarked memory (sweep phase). The problem: it creates fragmentation. After many allocation-deallocation cycles, free memory is scattered in small chunks. Large object allocations may fail even when total free memory is sufficient — this is external fragmentation.
Copying collectors solve fragmentation by copying live objects to a fresh region and discarding the old region entirely. This is inherently compacting — live objects end up contiguous. The cost: copying live objects takes time proportional to the live data set, and you need double the memory (from-space and to-space). The generational heap reduces this cost by only copying in young generation.
Mark-and-compact identifies live objects then slides them to one end of the heap, creating one contiguous free region. This avoids the double-memory cost of copying but requires updating every reference to moved objects — a potentially expensive operation that must be done during a stop-the-world pause or with complex concurrent mechanisms.
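The difference between sweeping and compacting can be illustrated with a toy model in which the heap is an array of fixed-size slots. This is a deliberately simplified sketch with hypothetical names: real collectors deal with variable-sized objects and must also update every reference to a moved object, which is omitted here.

```java
final class CompactionSketch {

    /** Length of the longest run of contiguous free (false) slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Sliding compaction: packs all occupied slots at the start of a new heap image. */
    static boolean[] compact(boolean[] occupied) {
        boolean[] result = new boolean[occupied.length];
        int next = 0;
        for (boolean slot : occupied) {
            if (slot) {
                result[next++] = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fill the heap, then free every other slot, as a sweep might leave it.
        boolean[] heap = new boolean[100];
        for (int i = 0; i < heap.length; i++) {
            heap[i] = (i % 2 == 1);
        }

        // 50 slots are free, but no two of them are adjacent.
        System.out.println(largestFreeRun(heap));          // 1

        // After compaction the same 50 free slots form one contiguous block.
        System.out.println(largestFreeRun(compact(heap))); // 50
    }
}
```

The amount of free memory is identical before and after; only its contiguity changes, which is what determines whether a large allocation can succeed.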
io/thecodeforge/gc/GCAlgorithmDemo.java
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Demonstrates how different GC algorithm characteristics
 * affect production behavior.
 *
 * This is not a GC implementation. It illustrates the concepts
 * that drive real collector design decisions.
 */
public class GCAlgorithmDemo {

    /**
     * MARK-AND-SWEEP characteristic:
     * - Fast reclaim but creates fragmentation
     * - External fragmentation: total free > requested, but not contiguous
     *
     * Production impact: After hours of operation, allocation of large
     * objects fails even though 40% of heap is free, because it is fragmented.
     * This triggers unnecessary full GC or OOM.
     */
    public void demonstrateFragmentation() {
        // Imagine this array is the heap, each index is a memory block.
        // true = occupied, false = free
        boolean[] heap = new boolean[100];

        // Simulate allocation pattern: allocate all blocks, then free alternating blocks
        for (int i = 0; i < 100; i++) {
            heap[i] = true; // allocate
        }
        for (int i = 0; i < 100; i += 2) {
            heap[i] = false; // free every other block
        }
        // Result: 50% free, but no contiguous block of size > 1.
        // A request for 3 contiguous blocks fails despite 50 free blocks.
        // This is external fragmentation, the problem compaction solves.
    }

    /**
     * COPYING COLLECTOR characteristic:
     * - Copies live objects to to-space, discards from-space
     * - Inherently compacting, no fragmentation
     * - Cost: proportional to live data, not dead data
     * - Requires double the memory (from + to spaces)
     *
     * Production insight: Copying cost is why large live data sets
     * cause longer young collection pauses. If your service has
     * 2GB of live objects in young gen, copying takes measurable time.
     */
    public void demonstrateCopyingCost() {
        // Simulating live data that must be copied during young GC
        List<byte[]> liveObjects = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            liveObjects.add(new byte[1024]); // 1KB each = ~10MB live data
        }
        // During young GC, all 10MB must be copied to survivor space.
        // If only 1MB were live, the cost would be 10x lower.
        // This is why reducing live data in young gen reduces pause time.
        //
        // Real production fix: avoid holding references to temporary
        // objects across request boundaries. Let them die in eden.
    }
}
The Three Fundamental GC Operations
Serial GC: mark-sweep-compact, all stop-the-world. Simple but pauses grow with heap.
Parallel GC: same algorithm as Serial but uses multiple threads. Faster but same pause characteristics.
G1: concurrent marking plus stop-the-world evacuation of selected regions. Compaction happens per-region, not whole-heap.
ZGC: concurrent mark + concurrent compact via colored pointers. All heavy phases run concurrently; only brief synchronization pauses around marking and relocation remain.
Shenandoah: concurrent mark + concurrent compact via per-object forwarding pointers (Brooks pointers in its original design). Similar to ZGC with a different implementation.
Production Insight
Fragmentation is the silent killer of long-running services. After days of operation, a heap with 40% free memory may fail to allocate a 10MB object because no contiguous 10MB block exists. This triggers a full GC to compact the heap. Monitor fragmentation with jcmd <pid> GC.heap_info and look at free region distribution. G1 handles fragmentation well through region-based evacuation. If you see increasing full GC frequency over time without increasing live data, fragmentation is the cause.
Key Takeaway
All GC algorithms are built on mark, sweep, and compact. Fragmentation is the primary failure mode of mark-and-sweep. Copying collectors solve fragmentation but cost proportional to live data. Modern collectors (G1, ZGC, Shenandoah) do as much work concurrently as possible to minimize stop-the-world pauses.
G1 GC — The Default Workhorse
G1 (Garbage-First) has been the default JVM collector since Java 9. It divides the heap into equal-sized regions (1MB to 32MB) and prioritizes collecting regions with the most garbage — hence 'garbage-first'. G1 maintains a remembered set per region tracking incoming references, enabling independent region collection without scanning the entire heap.
G1 operates in young-only and mixed collection cycles. Young GC collects survivor and eden regions. When the heap occupancy exceeds the Initiating Heap Occupancy Percent (IHOP), G1 triggers a concurrent marking cycle. After marking completes, subsequent mixed GCs collect both young and old regions identified as mostly garbage.
The critical production insight: G1's pause time is primarily driven by the number of regions it must collect in a single pause, not heap size. A 64GB heap with aggressive evacuation can pause longer than a 4GB heap with conservative settings. This is the opposite of what most engineers assume.
io/thecodeforge/gc/G1TuningExample.java
package io.thecodeforge.gc;

import java.util.concurrent.ConcurrentHashMap;
import java.util.Map;

/**
 * Demonstrates allocation patterns that stress G1 differently.
 *
 * Key insight: G1 humongous objects (>50% region size) bypass normal
 * allocation and can trigger to-space exhausted failures.
 */
public class G1TuningExample {

    // Cache with large value objects, a common source of humongous allocations
    private final Map<String, byte[]> payloadCache = new ConcurrentHashMap<>();

    /**
     * BAD: Allocates objects that may exceed the humongous threshold.
     * With a 1MB region size, objects > 512KB are humongous.
     * With 32MB regions, the threshold is 16MB, which is much safer for large payloads.
     * (G1 picks the region size from the heap size unless you set it explicitly.)
     *
     * Tuning: -XX:G1HeapRegionSize=32M
     *         -XX:G1ReservePercent=15
     *         -XX:InitiatingHeapOccupancyPercent=35
     */
    public void cacheLargePayload(String key, int sizeBytes) {
        byte[] payload = new byte[sizeBytes];
        for (int i = 0; i < Math.min(sizeBytes, 1024); i++) {
            payload[i] = (byte) (i & 0xFF);
        }
        payloadCache.put(key, payload);
    }

    /**
     * BETTER: Chunk large payloads to stay below the humongous threshold.
     * Each chunk is independently collectible as a regular object.
     */
    public void cacheChunkedPayload(String key, byte[] fullPayload) {
        int chunkSize = 256 * 1024; // 256KB chunks
        int numChunks = (fullPayload.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < numChunks; i++) {
            int offset = i * chunkSize;
            int length = Math.min(chunkSize, fullPayload.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(fullPayload, offset, chunk, 0, length);
            payloadCache.put(key + ":chunk:" + i, chunk);
        }
    }

    /**
     * Production G1 flags for a 16GB heap with mixed allocation profile:
     *
     * -XX:+UseG1GC
     * -Xms16g -Xmx16g
     * -XX:G1HeapRegionSize=16m
     * -XX:MaxGCPauseMillis=200
     * -XX:G1ReservePercent=15
     * -XX:InitiatingHeapOccupancyPercent=35
     * -XX:G1MixedGCCountTarget=8
     * -XX:G1MixedGCLiveThresholdPercent=85
     * -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log:time,uptime,level,tags
     */
}
G1's Core Mental Model: Region-Based Evacuation
Pause time scales with live data in collected regions, not total heap size
Humongous objects break this model — they span multiple regions and cannot be partially evacuated
Remembered sets consume 5-10% of heap as off-heap overhead — budget for this when setting -Xmx
To-space exhausted means G1 literally ran out of regions to evacuate into — this is a full GC fallback
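The humongous threshold described above can be checked with a few lines of arithmetic before choosing a region size. This is a minimal sketch with hypothetical names. It follows the article's rule of "more than half a region" and ignores object headers and alignment, so treat results near the boundary as approximate.

```java
final class HumongousThresholdSketch {

    /** True if an object of this size would be treated as humongous (more than half a region). */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous regions a humongous object of this size would occupy. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // ceiling division
    }

    public static void main(String[] args) {
        long kb = 1024L;
        long mb = 1024L * kb;

        // A 600KB payload is humongous with 1MB regions...
        System.out.println(isHumongous(600 * kb, 1 * mb));  // true
        // ...but a regular object with 32MB regions.
        System.out.println(isHumongous(600 * kb, 32 * mb)); // false
        // 256KB chunks stay below the threshold even with 1MB regions.
        System.out.println(isHumongous(256 * kb, 1 * mb));  // false
        // A 2.5MB object spans three 1MB regions.
        System.out.println(regionsNeeded(2560 * kb, 1 * mb)); // 3
    }
}
```

Running typical payload sizes through a check like this shows whether raising -XX:G1HeapRegionSize or chunking at the application level is the more practical fix.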
Production Insight
G1's -XX:MaxGCPauseMillis is a soft target, not a hard guarantee. G1 will attempt to meet this by adjusting how many regions to collect per cycle, but allocation rate spikes can violate it. If you need hard latency guarantees, G1 is the wrong collector. Monitor actual pause times against your SLA — if G1 violates MaxGCPauseMillis more than 5% of the time, the workload demands ZGC or Shenandoah.
Key Takeaway
G1 is the right default for most workloads, but it has a hard ceiling on pause-time predictability. Once your latency budget drops below ~100ms p99, evaluate ZGC or Shenandoah. Never tune G1 without GC logs enabled — the default logging is insufficient for production diagnosis.
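Checking measured pauses against the pause target is straightforward once pause durations have been extracted from the GC log. The sketch below assumes the pauses are already available as an array of milliseconds (log parsing is not shown) and uses hypothetical names throughout. The percentile uses the simple nearest-rank method.

```java
import java.util.Arrays;

final class PauseTargetCheckSketch {

    /** Fraction of pauses (0.0 to 1.0) that exceeded the target. */
    static double violationRatio(double[] pausesMs, double targetMs) {
        if (pausesMs.length == 0) {
            return 0.0;
        }
        long over = Arrays.stream(pausesMs).filter(p -> p > targetMs).count();
        return (double) over / pausesMs.length;
    }

    /** Nearest-rank percentile, e.g. pct = 99 for p99. */
    static double percentile(double[] pausesMs, double pct) {
        if (pausesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = pausesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical sample of pause times in milliseconds.
        double[] pauses = {50, 80, 120, 210, 90, 60, 70, 300, 40, 100};
        double target = 200; // matches -XX:MaxGCPauseMillis=200

        System.out.println(violationRatio(pauses, target)); // 0.2
        System.out.println(percentile(pauses, 99));         // 300.0
    }
}
```

In this made-up sample, 20% of pauses exceed the target, well above the 5% rule of thumb mentioned above.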
G1 Tuning Decision Tree
If: Humongous allocations appear in GC logs
→ Use: Increase -XX:G1HeapRegionSize to raise the humongous threshold. Max region size is 32MB. Chunk large objects at the application level if possible.
If: Mixed GCs are too frequent, causing throughput loss
→ Use: Increase -XX:G1MixedGCCountTarget (default 8) to spread collection over more cycles. Lower -XX:G1MixedGCLiveThresholdPercent to collect only regions with more garbage.
If: Full GC appears despite adequate heap
→ Use: IHOP is miscalibrated. Set -XX:InitiatingHeapOccupancyPercent lower (try 35). Note that adaptive IHOP (-XX:+G1UseAdaptiveIHOP, enabled by default since JDK 9) treats this value as an initial estimate and then self-tunes.
If: Pause times exceed MaxGCPauseMillis consistently
→ Use: Live data set is too large for G1's evacuation budget. Either reduce live data (caching strategy) or migrate to ZGC/Shenandoah where pause times are independent of live data size.
ZGC — Sub-Millisecond Pause Collector
ZGC (Z Garbage Collector) was introduced as experimental in JDK 11 and became production-ready in JDK 15. Its defining characteristic: pause times do not grow with heap size. The original design goal was pauses below 10ms, and since JDK 16 typical pauses are below 1ms, on heaps of up to 16TB. ZGC achieves this by doing the heavy work concurrently: marking, relocation, and reference processing all happen while application threads run.
ZGC uses load barriers with colored pointers. In the original (non-generational) design, every object reference carries metadata bits (Marked0, Marked1, Remapped, Finalizable) embedded in the pointer itself. The load barrier runs whenever a reference is loaded from the heap and checks whether that reference needs remapping. This is the fundamental trade-off: ZGC replaces long GC pauses with a small overhead on every reference load.
As of JDK 21, ZGC supports generational mode (-XX:+ZGenerational), which dramatically improves throughput by focusing collection on young objects; it became the default mode in JDK 23. Non-generational ZGC collects the entire heap every cycle, which limits throughput on allocation-heavy workloads.
io/thecodeforge/gc/ZGCTuningExample.java
package io.thecodeforge.gc;

import java.util.concurrent.atomic.AtomicLong;

/**
 * ZGC-specific considerations for production workloads.
 *
 * ZGC trades per-access overhead for near-zero pause times.
 * The load barrier adds ~4-8% overhead on pointer-heavy workloads.
 */
public class ZGCTuningExample {

    private final AtomicLong allocationCounter = new AtomicLong(0);

    /*
     * Production ZGC flags for a 32GB heap, latency-sensitive service:
     *
     * -XX:+UseZGC
     * -XX:+ZGenerational          // JDK 21+, critical for throughput
     * -Xms32g -Xmx32g             // Always set Xms=Xmx for ZGC
     * -XX:SoftMaxHeapSize=28g     // ZGC-specific: target heap occupancy
     * -XX:ZCollectionInterval=5   // Suggest GC cycle every 5 seconds
     * -XX:ConcGCThreads=4         // Concurrent GC threads
     * -Xlog:gc*:file=/var/log/zgc.log:time,uptime,level,tags
     *
     * CRITICAL: ZGC uses ~20% native memory overhead beyond -Xmx.
     * Container memory limit must be heap * 1.25 minimum.
     */

    /**
     * ZGC SoftMaxHeapSize is unique: it tells ZGC to try to stay below
     * this threshold, but ZGC can exceed it under allocation pressure.
     *
     * Use case: Set heap to 32GB, SoftMaxHeapSize to 28GB.
     * ZGC will trigger cycles aggressively to stay under 28GB.
     * It only allocates into the remaining 4GB under extreme pressure.
     */
    public void demonstrateSoftMaxHeapConcept() {
        // With SoftMaxHeapSize=28g and Xmx=32g:
        // - ZGC targets 28GB occupancy
        // - If allocation pressure pushes past 28GB, ZGC cycles more aggressively
        // - If it hits 32GB, allocation stalls (not OOM, but backpressure)
    }
}
Pause times are independent of heap size and live data size, on heaps of up to 16TB
The trade-off is per-access CPU overhead, not pause time — you pay on every object load
ZGC cannot use compressed object pointers (UseCompressedOops) — increases memory usage by ~15% on heaps < 32GB
Generational ZGC (JDK 21+) reduces overhead dramatically by focusing on young generation
Production Insight
ZGC's biggest production risk is native memory consumption. Non-generational ZGC multi-maps the heap at several virtual addresses for colored pointer management, which eats into the process's virtual address space and can inflate the memory usage that monitoring tools report. Budget container memory as heap × 1.25 for ZGC versus heap × 1.15 for G1. Also, ZGC requires a 64-bit system; it does not run on 32-bit.
Key Takeaway
ZGC is the correct choice when p99 latency must be below 10ms and you can afford 10-15% throughput overhead. Enable generational mode on JDK 21+. Budget 25% extra native memory beyond heap size. ZGC's SoftMaxHeapSize is the most underrated production feature for containerized deployments.
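The container budgeting rule can be turned into a small helper that works in both directions. The sketch below uses the article's rule-of-thumb factors (1.25 for ZGC, 1.15 for G1) as inputs; they are starting assumptions to validate with Native Memory Tracking, not measured constants, and the class and method names are hypothetical.

```java
final class ContainerBudgetSketch {

    /** Container memory limit needed for a given heap size and overhead factor. */
    static long containerLimitBytes(long heapBytes, double overheadFactor) {
        return (long) Math.ceil(heapBytes * overheadFactor);
    }

    /** Largest heap that fits in a given container limit with the same overhead factor. */
    static long maxHeapBytes(long containerLimitBytes, double overheadFactor) {
        return (long) Math.floor(containerLimitBytes / overheadFactor);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;

        // A 32GB ZGC heap with a 1.25 factor needs a 40GB container.
        System.out.println(containerLimitBytes(32 * gb, 1.25) / gb); // 40

        // A 40GB container with a 1.25 factor leaves room for a 32GB heap.
        System.out.println(maxHeapBytes(40 * gb, 1.25) / gb);        // 32

        // The same 40GB container could host a larger G1 heap under the 1.15 assumption.
        System.out.println(maxHeapBytes(40 * gb, 1.15) / (1024L * 1024L) + " MB");
    }
}
```

Working backwards from the container limit is usually the more useful direction, since the limit is often fixed by the platform.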
Shenandoah — Red Hat's Low-Pause Contender
Shenandoah is Red Hat's concurrent compacting collector, introduced as an experimental feature in JDK 12 and declared production-ready in JDK 15. It achieves low pause times through concurrent evacuation, moving live objects while application threads run. Its original design relied on Brooks pointers, an indirection word attached to every object.
Shenandoah differs from ZGC in a critical way: it tracks moved objects with a per-object forwarding pointer instead of colored pointers. In the original design this was a dedicated Brooks pointer field on every object; from JDK 13 onward the forwarding pointer is stored in the object header's mark word, which removes the extra word. Because Shenandoah does not require a specific pointer bit layout, it works with compressed oops, reducing memory overhead compared to ZGC on heaps under 32GB.
Shenandoah operates in three concurrent phases: concurrent mark, concurrent evacuate, and concurrent update-refs. The initial mark and final mark phases are short stop-the-world pauses, typically under 10ms. Shenandoah's pacing mechanism backpressures allocation threads proportionally when the collector falls behind, creating smoother degradation than ZGC's hard allocation stalls.
package io.thecodeforge.gc;

import java.util.ArrayList;
import java.util.List;

/**
 * Shenandoah-specific production considerations.
 *
 * In Shenandoah's original design (JDK 12), every object has an extra
 * forwarding (Brooks) pointer field, adding 8 bytes per object on 64-bit
 * systems. JDK 13+ stores the forwarding pointer in the object header.
 */
public class ShenandoahTuningExample {

    /**
     * Brooks pointer overhead calculation (original design only):
     *
     * Object with 2 fields (16 bytes header + 16 bytes data = 32 bytes)
     * + 8 bytes Brooks pointer = 40 bytes per object
     * Overhead: 25% increase per object
     *
     * For 10 million small objects: ~80MB additional memory
     * For 100 million small objects: ~800MB additional memory
     */
    public long estimateBrooksOverhead(int objectCount) {
        return (long) objectCount * 8;
    }

    /*
     * Production Shenandoah flags for a 16GB heap:
     *
     * -XX:+UseShenandoahGC
     * -Xms16g -Xmx16g
     * -XX:ShenandoahGCHeuristics=adaptive
     * -XX:ShenandoahAllocationThreshold=10
     * -XX:+UseCompressedOops   // works with Shenandoah (unlike ZGC)
     * -Xlog:gc*:file=/var/log/shenandoah.log:time,uptime,level,tags
     */

    /**
     * Shenandoah pacing is a unique feature that backpressures allocation
     * threads when the collector falls behind.
     *
     * Unlike ZGC, which stalls allocation entirely, Shenandoah slows down
     * allocating threads proportionally. This creates smoother latency
     * degradation under load rather than sharp spikes.
     */
    public void demonstratePacingBehavior() {
        List<byte[]> allocations = new ArrayList<>();
        // Under heavy allocation, Shenandoah will pace this loop
        // by adding small delays to each allocation.
        // The delay is proportional to how far behind the collector is.
        for (int i = 0; i < 100_000; i++) {
            allocations.add(new byte[1024]);
        }
    }
}
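Barrier cost differs from ZGC: early Shenandoah used read and write barriers, and since JDK 13 it uses load-reference barriers that fire only when a reference is loaded from the heap
Works with compressed oops, which saves ~15% memory compared to ZGC on heaps under 32GB
Per-object overhead of 8 bytes in the original Brooks-pointer design (JDK 12), significant for workloads with many small objects; JDK 13+ stores the forwarding pointer in the object header instead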
Pacing mechanism creates graceful degradation instead of hard allocation stalls
Production Insight
On builds that still use a separate Brooks pointer (the original JDK 12 design), Shenandoah's biggest production risk is the per-object overhead on small-object-heavy workloads. If your service has 100M+ objects under 64 bytes, the 8-byte Brooks pointer per object adds ~800MB of overhead; from JDK 13 onward this extra word is gone. To see worst-case memory consumption, profile with compressed oops disabled. Additionally, Shenandoah's pacing can create subtle latency degradation that is hard to distinguish from application-level slowness, so always correlate pacing delays with latency metrics.
Key Takeaway
Shenandoah is the right choice when you need low-pause GC on moderate heaps (< 32GB) and want compressed oops support. Its pacing mechanism creates smoother degradation than ZGC's allocation stalls. On JVMs that use a separate Brooks pointer, budget 8 bytes per object for this hidden cost.
JVM Flags That Actually Matter
Most JVM GC flags have sensible defaults. A small subset moves the needle in production. Understanding which flags to adjust — and when — prevents the common anti-pattern of blindly copying flags from blog posts without understanding their impact on your specific workload.
Flags fall into three categories: heap sizing, collector behavior, and logging. Heap sizing flags (-Xms, -Xmx, -XX:NewRatio) control memory layout. Collector behavior flags (-XX:MaxGCPauseMillis, -XX:InitiatingHeapOccupancyPercent) control collection strategy. Logging flags (-Xlog:gc) enable observability. The third category is the most important — you cannot tune what you cannot measure.
io/thecodeforge/gc/ProductionJVMFlags.java
package io.thecodeforge.gc;

/**
 * Production JVM flag configurations organized by collector.
 * These are starting points; tune based on measured workload characteristics.
 */
public class ProductionJVMFlags {
    /**
     * UNIVERSAL FLAGS (apply to all collectors):
     *
     * -Xms<size> -Xmx<size>              // Set min=max to avoid resize overhead
     * -XX:+AlwaysPreTouch                // Pre-zero heap pages at startup
     * -XX:+DisableExplicitGC             // Ignore System.gc() calls
     * -XX:+HeapDumpOnOutOfMemoryError    // Auto heap dump on OOM
     * -XX:HeapDumpPath=/var/log/         // Where to write heap dumps
     * -XX:+UseContainerSupport           // Respect cgroup limits (default JDK 10+)
     * -XX:MaxRAMPercentage=75.0          // Set heap as % of container memory
     * -XX:NativeMemoryTracking=detail    // Track off-heap memory usage
     *
     * LOGGING FLAGS (always enable in production):
     * -Xlog:gc*:file=/var/log/gc.log:time,uptime,level
     */
}
The Flag Hierarchy — What to Tune First
First: Set -Xms = -Xmx to prevent resize overhead. Size heap based on container limits, not guesswork.
Second: Enable GC logging. You cannot tune what you cannot measure. This alone solves 50% of debugging issues.
Third: Adjust collector-specific flags only after measuring with logging enabled.
Never: Copy flags from blog posts without understanding your workload's allocation profile.
Production Insight
The most impactful single flag change is enabling GC logging. Most production services run with default or minimal GC logging, making post-incident diagnosis impossible. A single line -Xlog:gc*:file=/var/log/gc.log:time,uptime,level,tags:filecount=5,filesize=50m provides pause time breakdowns, heap occupancy trends, and humongous allocation detection. Enable it before you need it: a GC log can only explain an incident if logging was already on when the incident happened.
Key Takeaway
Most JVM GC flags have sensible defaults. The three flags that matter most: (1) -Xms=-Xmx to prevent resize, (2) GC logging flags for observability, (3) collector-specific flags only after measuring. Never copy-paste JVM flags from the internet without profiling your own workload.
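One way to confirm that heap-sizing flags actually reached the JVM is to ask the running process. The sketch below is illustrative only (the class name EffectiveJvmFlags is made up for this example); it relies on the standard Runtime and RuntimeMXBean APIs to print the effective maximum heap and the arguments the JVM was started with.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Illustrative helper: reports what the running JVM actually received. */
class EffectiveJvmFlags {

    /** Maximum heap the JVM will try to use, in MiB (reflects -Xmx or MaxRAMPercentage). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Arguments the JVM was started with, for example -Xms, -Xmx, -Xlog. */
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        inputArguments().forEach(arg -> System.out.println("JVM arg: " + arg));
    }
}
```

Running this once at service startup and logging the result makes it easy to spot a container image that silently dropped a flag.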
Choosing the right garbage collector depends on your workload's pause-time sensitivity, heap size, and throughput requirements. The table below summarizes the key characteristics of each major collector available in the JVM.
| Collector | Pause Model | Heap Size | Primary Use Case | Java Version |
|---|---|---|---|---|
| Serial | Stop-the-world (STW) single-thread | <1GB | Small applications, client-side, embedded | Since JDK 1.2 |
| Parallel | STW multi-thread | 1-8GB | Throughput-oriented batch jobs, analytics | Since JDK 1.2 (default JDK 5-8) |
| G1 | Region-based STW + concurrent marking | 1GB-64GB+ | General-purpose server applications | Since JDK 7 (default JDK 9+) |
| ZGC | Concurrent (STW < 10ms) | 4GB-16TB | Ultra-low latency, large heaps | Experimental JDK 11, prod JDK 15+ |
| Shenandoah | Concurrent (STW < 10ms) | 1GB-64GB | Low latency with memory efficiency | Since JDK 12 (backported to 8, 11) |
Key takeaway: For most web services, start with G1. Only move to ZGC or Shenandoah when your measured p99 latency exceeds 100ms after tuning G1. Serial and Parallel are legacy choices for resource-constrained or batch workloads.
Production Insight
The comparison above is based on default configurations. Actual production behavior depends on allocation rate, live data size, and object distribution. Always profile with your workload before making a collector switch. The most common mistake is switching to ZGC for a 2GB heap service — the native memory overhead and lack of compressed oops can increase memory consumption by 30%, leading to OOM kills.
Key Takeaway
G1 is the default for a reason — it balances throughput and pause time for most workloads. ZGC and Shenandoah offer sub-10ms pauses but cost throughput and memory. Serial and Parallel are specialized tools for batch processing or tiny heaps.
System.gc() and finalize() — Patterns to Avoid
Two legacy Java mechanisms should be avoided in production: System.gc() and finalize(). Both degrade GC performance and make its behavior less predictable.
System.gc(): An explicit request to run the garbage collector. It is a hint, not a command, but unless -XX:+DisableExplicitGC is set the JVM usually responds with a full GC. Calling it frequently causes unnecessary full GC pauses and wrecks latency. Some frameworks, such as RMI, NIO, and JNDI, also call it internally. Set -XX:+DisableExplicitGC in production to neutralize accidental calls.
finalize(): The finalize() method, defined in Object, runs before an object is reclaimed. It is unpredictable: the JVM may never call it before exit, and finalizers run in no guaranteed order. Additionally, finalize() can resurrect objects by assigning this to a reachable reference. It also adds latency, because the JVM must finalize objects in a separate pass before their memory can be reclaimed. finalize() has been deprecated since Java 9. Use Cleaner (JDK 9+), PhantomReference with a cleanup thread, or AutoCloseable / try-with-resources instead.
package io.thecodeforge.gc;

import java.lang.ref.Cleaner;

/**
 * Demonstrates how to avoid System.gc() and finalize().
 *
 * BAD PRACTICES:
 * 1. Calling System.gc() - triggers unnecessary full GC
 * 2. Overriding finalize() - unpredictable, deprecated.
 *
 * GOOD: Use Cleaner (JDK 9+) or PhantomReference with reference queue.
 */
public class AvoidSystemGCAndFinalize {

    // BAD - Avoid this
    @Override
    @Deprecated(since = "9")
    protected void finalize() throws Throwable {
        try {
            // Cleanup logic here - but this may never run!
            close();
        } finally {
            super.finalize();
        }
    }

    private void close() {
        System.out.println("Cleanup (if finalize runs)");
    }

    // GOOD - Use Cleaner (JDK 9+)
    private static final Cleaner CLEANER = Cleaner.create();

    // State that needs cleaning
    private final Cleaner.Cleanable cleanable;

    public AvoidSystemGCAndFinalize() {
        // Register a cleaning action. The action must not capture `this`,
        // otherwise the object can never become phantom-reachable.
        this.cleanable = CLEANER.register(this, () -> {
            // This runs when the object becomes phantom-reachable
            System.out.println("Cleanup via Cleaner");
        });
    }

    public static void main(String[] args) {
        // NEVER do this:
        // System.gc(); // tells JVM to run GC - pauses, unpredictable
        // Better: let GC decide.
        // Disable explicit calls with -XX:+DisableExplicitGC
        // Use Cleaner or try-with-resources for cleanup.
    }
}
Production Risk: System.gc() in Libraries
Some third-party libraries (RMI, JNDI, direct buffer management) call System.gc() internally. Without -XX:+DisableExplicitGC, these calls trigger full GC in your application, causing latency spikes. Always disable explicit GC in production, but test thoroughly — some frameworks rely on it for cleanup.
Production Insight
With -XX:+DisableExplicitGC set, System.gc() is silently ignored. Best practice: always set this flag in production. For resource cleanup (file handles, sockets), use try-with-resources or Cleaner. Never rely on finalize(): it has been deprecated since JDK 9 and deprecated for removal since JDK 18 (JEP 421).
Key Takeaway
Avoid System.gc() and finalize() at all costs in production code. Use -XX:+DisableExplicitGC to ignore explicit GC calls. Prefer try-with-resources for deterministic cleanup, and Cleaner for native resource cleanup.
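To make the try-with-resources recommendation concrete, here is a minimal sketch. NativeBufferHandle is a hypothetical resource invented for this example, not a JDK class; the point is only that close() runs at a known moment, when the try block exits, rather than whenever the collector gets around to it.

```java
/** Hypothetical resource used only for illustration. */
class NativeBufferHandle implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        // Runs deterministically when the try block exits, even on exceptions.
        open = false;
    }
}

class DeterministicCleanupDemo {

    /** Uses the handle, then returns it so callers can observe that it was closed. */
    static NativeBufferHandle useAndClose() {
        NativeBufferHandle handle = new NativeBufferHandle();
        try (handle) { // Java 9+ form: an effectively final variable as the resource
            if (!handle.isOpen()) {
                throw new IllegalStateException("handle must be open inside the block");
            }
        }
        return handle;
    }

    public static void main(String[] args) {
        // prints: Open after try block? false
        System.out.println("Open after try block? " + useAndClose().isOpen());
    }
}
```

A Cleaner is still useful as a safety net for callers who forget the try block, but the deterministic path should be the primary one.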
Advantages and Disadvantages of Garbage Collection
Garbage Collection is a mixed blessing. It eliminates manual memory management bugs but introduces new operational challenges. The table below summarizes the trade-offs.
| Advantages | Disadvantages |
|---|---|
| Eliminates memory leaks caused by forgotten free() calls | Introduces pauses (stop-the-world) that affect latency |
| Prevents dangling pointer bugs: memory is only reused after an object becomes unreachable | |
| Reduces developer cognitive load: no manual memory management | Performance unpredictability: pauses vary with allocation pattern |
| Enables memory-safe concurrent programming with bounded overhead | Full GC occasionally compacts the entire heap, causing multi-second pauses |
| Provides tools for analysis (heap dumps, GC logs) to diagnose issues | Tuning requires deep understanding of collector algorithms and application behavior |
| Monitored at runtime: GC logs give insight into object lifetimes | Cannot control exactly when memory is reclaimed: objects may linger in old gen |
Key takeaway: The disadvantages can be mitigated with proper collector selection and tuning. For most production services, the benefits far outweigh the costs, but ignore the downsides at your peril.
Production Insight
The biggest hidden disadvantage is the 'death by a thousand cuts' effect: a service with 50ms young GC pauses every second spends 5% of its time in GC. Combined with mixed GCs, remark pauses, and occasional full GCs, the total GC overhead can exceed 10% without any single pause being catastrophic. Track total GC time as a percentage of wall-clock time using GC logs – alert if it exceeds 5% for latency-sensitive services.
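The overhead arithmetic above (50ms of pause per second is 5%) can be tracked from inside the process. The following is a rough sketch, not a monitoring product: the class name GcOverhead is invented here, and it assumes the accumulated collection time reported by the standard GarbageCollectorMXBean is a good enough proxy. For concurrent collectors that figure may include concurrent cycle time as well as pauses, so treat the result as an upper bound.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Illustrative calculation of GC time as a share of wall-clock time. */
class GcOverhead {

    /** GC time as a percentage of wall-clock time; 0 when no time has elapsed. */
    static double overheadPercent(long gcMillis, long wallClockMillis) {
        if (wallClockMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / wallClockMillis;
    }

    /** Sum of accumulated collection time (ms) across all collector beans. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead since start: %.2f%%%n",
                overheadPercent(totalGcMillis(), uptimeMillis));
    }
}
```

Sampling totalGcMillis() at a fixed interval and feeding the deltas into overheadPercent gives a rolling figure that can back the 5% alert.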
Key Takeaway
GC removes an entire class of programming errors but introduces pause and CPU overhead. Modern collectors minimize pauses but cannot eliminate them entirely. The key is to choose the right collector for your latency and throughput budget.
GC Tuning Flags Reference Table
This table lists the most important GC tuning flags along with their purpose and typical values. Use it as a quick reference when configuring JVM options for production.
| Flag | Affects | Purpose | Typical Value / Range |
|---|---|---|---|
| -Xms, -Xmx | Heap size | Set initial and maximum heap | Equal values, e.g., -Xms4g -Xmx4g |
| -XX:MaxGCPauseMillis | G1 | Soft target for maximum pause time | 50–200ms (default 200) |
| -XX:G1HeapRegionSize | G1 | Size of each region (humongous threshold = 50% of region) | 1–32MB, power of 2 |
| -XX:InitiatingHeapOccupancyPercent | G1 | Heap occupancy % to trigger concurrent marking | 30–45 (default 45) |
| -XX:G1ReservePercent | G1 | % of heap kept free to reduce the risk of evacuation failures | 10–20 (default 10) |
| -XX:ConcGCThreads | All concurrent | Number of threads for concurrent GC work | Auto-detected (for G1, about a quarter of ParallelGCThreads) |
| -XX:+DisableExplicitGC | All | Ignore System.gc() calls | Always enable in production |
| -XX:+UseContainerSupport | All | Respect container memory limits | Enabled by default JDK 10+ |
| -XX:MaxRAMPercentage | All | Set max heap as % of container memory | 75–85 (default 25 if not set!) |
| -XX:+AlwaysPreTouch | All | Commit heap pages at startup to reduce runtime latency | Enable for large heaps |
| -XX:NativeMemoryTracking | All | Track off-heap memory usage | summary or detail |
| -XX:+HeapDumpOnOutOfMemoryError | All | Generate heap dump on OOM | Enable for diagnosis |
| -XX:+ZGenerational | ZGC | Enable generational mode (JDK 21+) | Enable on JDK 21 and 22 (the default from JDK 23) |
| -XX:SoftMaxHeapSize | ZGC | Target heap occupancy for ZGC (hints GC to cycle earlier) | 75–90% of Xmx |
| -XX:ShenandoahGCHeuristics | Shenandoah | Collection policy: adaptive, compact, or static | adaptive |
| -Xlog:gc* | Logging | Enable GC logging with details | -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime |
Key takeaway: The most impactful flags are GC logging (for observability) and heap sizing. Tuning collector-specific flags without enabling logs is like fixing a car engine blindfolded – possible but wasteful.
Production Insight
A common mistake is leaving -XX:MaxRAMPercentage unset. Many container images run with the default (25%), so the JVM uses only a quarter of container memory as heap. Set it explicitly, for example -XX:MaxRAMPercentage=75.0, and prefer it over the older, deprecated -XX:MaxRAMFraction. Also, never set -Xmx equal to container memory: you need room for native overhead.
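The effect of the percentage is easy to underestimate, so here is the arithmetic as a small sketch. The class HeapFromContainer and its method are illustrative names, and the result is approximate: the real JVM aligns the heap to internal boundaries, so the actual figure can differ slightly.

```java
/** Illustrative arithmetic for -XX:MaxRAMPercentage; real JVMs align the result. */
class HeapFromContainer {

    /** Approximate max heap in MiB for a container limit and a MaxRAMPercentage value. */
    static long maxHeapMiB(long containerMiB, double maxRamPercentage) {
        return (long) (containerMiB * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long containerMiB = 2048; // hypothetical 2 GiB container limit
        System.out.println("Default 25%: " + maxHeapMiB(containerMiB, 25.0) + " MiB heap");
        System.out.println("Explicit 75%: " + maxHeapMiB(containerMiB, 75.0) + " MiB heap");
    }
}
```

With a 2 GiB limit the default leaves roughly 512 MiB of heap, while 75% gives about 1536 MiB and still keeps 512 MiB for metaspace, thread stacks, and direct buffers.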
Key Takeaway
Use this reference table when configuring JVM flags for a new service. Start with logging and heap sizing, then add collector-specific flags based on observed behavior. Test flag changes in staging before applying to production.
Practice Problems: GC Diagnosis and Tuning
Test your understanding of GC concepts with these five practical problems. Each problem presents a real-world scenario; identify the issue and propose a fix or tuning change.
Problem 1: Unbounded Cache
Scenario: A user service caches profile objects in a HashMap. Over a weekend spike, GC logs show rising old-gen occupancy followed by frequent full GCs. P99 latency jumps from 50ms to 5s.
Question: What is the likely cause and the immediate fix?
Answer: The unbounded cache retains every entry and fills old gen. Immediate fix: apply size- and time-based eviction (e.g., Caffeine with maximumSize and expireAfterWrite).
Problem 2: Large Object Allocation
Scenario: A service using G1 with a 1MB region size allocates many 800KB byte arrays. GC logs show numerous humongous allocation warnings and to-space exhaustion.
Question: What tuning change can reduce humongous objects?
Answer: Increase G1HeapRegionSize (e.g., -XX:G1HeapRegionSize=4m) so 800KB objects are under the 50% humongous threshold. Alternatively, chunk large allocations.
Problem 3: Metaspace OOM
Scenario: After deploying a new microservice, pods restart every few hours with OutOfMemoryError. Heap is not full; metaspace shows steady growth. Thread count is stable.
Question: What is the likely root cause and how do you diagnose it?
Answer: A class loader leak (e.g., from repeated dynamic class generation or redeployment). Use -XX:NativeMemoryTracking=detail and jcmd to monitor metaspace. Consider -XX:MaxMetaspaceSize as a limit, but fix the leak.
Problem 4: Long Pauses on 64GB Heap
Scenario: A data processing service uses Parallel GC on a 64GB heap. Full GC pauses exceed 60 seconds. Changing to G1 reduces pauses but they are still >2s.
Question: What should be the next step?
Answer: G1 pauses scale with live data. If live data is >30GB, G1 cannot meet sub-second pauses. Consider switching to ZGC or Shenandoah, which have pause times independent of heap size.
Problem 5: Allocation Rate Spike
Scenario: During a flash sale, the order service's allocation rate spikes from 100 MB/s to 2 GB/s. GC is triggered every few hundred milliseconds, and GC threads consume 80% of CPU.
Question: What is the best approach to reduce GC pressure?
Answer: Optimize application code to reduce allocation (use object pooling, reuse buffers, avoid String concatenation in loops). If spikes are unavoidable, adjust heap sizing and consider ZGC for concurrent collection. Also, increase NewSize to absorb young allocation spikes.
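The humongous-object reasoning in Problem 2 comes down to one comparison against half the region size. The sketch below works through the numbers; HumongousCheck is an illustrative name, and the inclusive boundary follows the wording of the G1 documentation ("larger or equal to half a region"), so treat the exact edge case as an assumption.

```java
/** Illustrative check of G1's humongous threshold (half of the region size). */
class HumongousCheck {

    /** True when an object of the given size would be treated as humongous. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long object800Kb = 800L * 1024;
        long region1Mb = 1024L * 1024;
        long region4Mb = 4L * 1024 * 1024;
        System.out.println("800KB in 1MB regions humongous? " + isHumongous(object800Kb, region1Mb));
        System.out.println("800KB in 4MB regions humongous? " + isHumongous(object800Kb, region4Mb));
    }
}
```

With 1MB regions the threshold is 512KB, so every 800KB array takes the humongous path; with 4MB regions the threshold rises to 2MB and the same arrays are allocated normally.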
gc.log
[2026-03-05T14:23:45.123+0000] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 2048M->512M(8192M) 48.123ms
[2026-03-05T14:23:45.171+0000] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 2560M->1024M(8192M) 51.789ms
[2026-03-05T14:23:45.223+0000] GC(54) Pause Full (Allocation Failure) 4096M->2048M(8192M) 12345.678ms # <-- Problem 1: full GC due to old gen exhaustion
[2026-03-05T14:23:57.568+0000] GC(55) Pause Young (Normal) (G1 Evacuation Pause) 2048M->1024M(8192M) 52.345ms
[2026-03-05T14:24:01.234+0000] GC(56) Humongous allocation of size 819200 bytes (800KB) detected. Region size 1MB, threshold 512KB. # <-- Problem 2: humongous allocation warning
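Pause lines like these are regular enough to summarize with a short script. The parser below is a sketch under one assumption: that pause events follow the usual unified-logging shape "GC(n) Pause Kind ... before->after(total) duration". GcPauseParser and its sample data are illustrative, and a production tool would need to handle more line variants.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for unified-logging pause lines (-Xlog:gc). */
class GcPauseParser {

    // Captures: GC id, pause kind, heap before/after/total (MB), pause duration (ms).
    private static final Pattern PAUSE = Pattern.compile(
            "GC\\((\\d+)\\) (Pause \\w+).*? (\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+(?:\\.\\d+)?)ms");

    static final List<String> SAMPLE = List.of(
            "[2026-03-05T14:23:45.123+0000] GC(52) Pause Young (Normal) (G1 Evacuation Pause) 2048M->512M(8192M) 48.123ms",
            "[2026-03-05T14:23:45.171+0000] GC(53) Pause Young (Normal) (G1 Evacuation Pause) 2560M->1024M(8192M) 51.789ms",
            "[2026-03-05T14:23:45.223+0000] GC(54) Pause Full (Allocation Failure) 4096M->2048M(8192M) 12345.678ms");

    /** Sums the pause durations (ms) of every line that looks like a pause event. */
    static double totalPauseMillis(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(6));
            }
        }
        return total;
    }

    /** Counts "Pause Full" events, the ones worth alerting on. */
    static long fullGcCount(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find() && m.group(2).equals("Pause Full")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.printf("Total pause: %.3f ms, full GCs: %d%n",
                totalPauseMillis(SAMPLE), fullGcCount(SAMPLE));
    }
}
```

Dividing the total pause time by the wall-clock span of the log gives the GC overhead percentage; a non-zero full GC count in steady state is the signal to start looking for retained objects.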
Approach to These Problems
Start by asking: what is the allocation pattern? Is the leak in heap or non-heap? Use GC logs, jstat, and heap dumps to gather data before proposing a tuning change.
Production Insight
These practice problems are distilled from real incidents. The unbounded cache problem alone accounts for 30% of GC-related production outages. Practicing diagnosis in a controlled setting trains the instincts needed during a live incident. Internalizing these five patterns covers 80% of common GC failures.
Key Takeaway
GC problems almost always stem from application code (caching, allocation rate) rather than default JVM settings. Tune the application first, then the collector. Use GC logs to confirm your hypothesis before making changes.
Why Objects Become Unreachable (And Why That Matters)
Every production outage I've debugged that boiled down to a GC problem started with one thing: an object that should have died but didn't. Or worse, an object that died too late.
Unreachable means zero active references. Not "I think it's done." Not "nobody should need it." Zero references on the stack or from any GC root (static fields, JNI handles, active threads). The JVM doesn't care about your intentions. It traces live references from roots outward. Everything not reached during that trace is dead.
Here's the kicker: objects can become unreachable faster than you expect. A local reference inside a method block? Gone after the method returns. An object passed to a collection that gets cleared? Eligible immediately. But the reverse is also true — a single stray reference keeps an entire object graph alive. That's how "small" memory leaks bring down production services.
Understanding reachability isn't academic. It's the difference between writing code that the GC can efficiently reclaim and code that forces full GCs every hour.
ReachabilityTrap.java
// io.thecodeforge - java tutorial
import java.util.ArrayList;
import java.util.List;

public class ReachabilityTrap {
    private static List<byte[]> leakList = new ArrayList<>();

    public static void main(String[] args) {
        // This object is local: it becomes unreachable after method exit
        processData();

        // These objects are held by a static reference: they stay alive forever
        while (true) {
            leakList.add(new byte[1024 * 1024]); // 1 MB each
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop
                return;
            }
        }
    }

    static void processData() {
        byte[] temp = new byte[10 * 1024 * 1024]; // 10 MB
        // temp is the ONLY reference to this 10 MB array
        // After this method returns, temp goes out of scope
        System.out.println("Data processed");
    }
}
Output
Data processed
(program eventually crashes with OutOfMemoryError; at roughly 10 MB/s, how soon depends on -Xmx)
Production Trap:
A static collection that accumulates objects but never clears is the single most common memory leak pattern I see in Java services. Use WeakHashMap or explicit size limits if the data isn't truly immortal.
Key Takeaway
An object stays alive as long as any active reference chain exists. One forgotten reference = infinite lifetime.
The Two Types of GC Activity: Minor vs. Major
You can't tune GC properly if you don't understand that garbage collection runs in two distinct modes: minor and major. They're not the same thing, and confusing them gets you fired.
Minor GC happens in the Young Generation. It's fast. The JVM stops the world, copies all live objects from Eden to a survivor space, clears Eden, and resumes. Typical pause: 1-10 milliseconds. This is your friend. A healthy application should survive on minor GCs alone for 99% of its lifetime.
Major GC (or Full GC) hits the Old Generation. This is where the JVM does mark-sweep-compact across the entire heap. Pause times balloon: 100ms, 500ms, even seconds. A full GC every few hours? Fine. Every few minutes? You have a problem — either your survivor space sizing is wrong, or you're creating long-lived objects that should be short-lived.
The critical insight: you want to avoid promoting objects to Old Generation prematurely. Each object that survives a minor GC gets an age increment. When its age exceeds the tenuring threshold (-XX:MaxTenuringThreshold, default 15), it's promoted. If your survivor spaces are too small, objects get promoted early, fill Old Gen, and trigger frequent full GCs.
Monitor your promotion rate. If it's higher than expected, your objects are living too long.
PromotionAnalysis.java
// io.thecodeforge - java tutorial
// Simulate different object lifetimes to see GC impact
import java.util.ArrayList;
import java.util.List;

public class PromotionAnalysis {
    private static final int OBJECT_COUNT = 1_000_000;

    public static void main(String[] args) {
        // Short-lived objects: die in Young Gen
        for (int i = 0; i < OBJECT_COUNT; i++) {
            byte[] temp = new byte[100];
        }
        System.out.println("Short-lived done — minor GC only");

        // Long-lived objects: promoted to Old Gen
        List<byte[]> holders = new ArrayList<>();
        for (int i = 0; i < OBJECT_COUNT / 10; i++) {
            holders.add(new byte[100]);
        }
        // Holders survive: these get promoted
        System.out.println("Long-lived done — full GC coming");
    }
}
Output
Short-lived done — minor GC only
Long-lived done — full GC coming
(Full GC pause: ~150ms on a 4GB heap)
Senior Shortcut:
On JDK 9+, add -Xlog:gc*,gc+age=trace to your JVM flags (on JDK 8, use -XX:+PrintGCDetails -XX:+PrintTenuringDistribution). Watch the 'Desired survivor size' line. If survivor spaces fill above 50%, lower -XX:SurvivorRatio from the default 8 to 4, which enlarges each survivor space.
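SurvivorRatio is the ratio of Eden to one survivor space, so a smaller value means larger survivors. The sketch below shows the arithmetic; SurvivorSizing is an illustrative name, and the formula only applies to collectors that honor a fixed ratio (G1 sizes its survivor regions adaptively, and Parallel GC does too unless adaptive sizing is turned off).

```java
/** Illustrative arithmetic for -XX:SurvivorRatio under fixed young-generation sizing. */
class SurvivorSizing {

    /**
     * Size of ONE survivor space in MiB.
     * Young gen = Eden + 2 survivors, with Eden = ratio x survivor,
     * so survivor = young / (ratio + 2).
     */
    static long survivorMiB(long youngGenMiB, int survivorRatio) {
        return youngGenMiB / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long youngGenMiB = 1200; // hypothetical young generation size
        System.out.println("SurvivorRatio=8: " + survivorMiB(youngGenMiB, 8) + " MiB per survivor");
        System.out.println("SurvivorRatio=4: " + survivorMiB(youngGenMiB, 4) + " MiB per survivor");
    }
}
```

Going from 8 to 4 raises each survivor space from 120 MiB to 200 MiB in this example, at the cost of a smaller Eden and therefore somewhat more frequent minor GCs.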
Key Takeaway
Minor GCs are cheap and should dominate. Full GCs are expensive — minimize them by keeping objects short-lived and survivor spaces properly sized.
Requesting GC: System.gc() Is a Hint, Not a Command
I've seen junior devs sprinkle System.gc() like seasoning. "The app's memory is high, I'll tell GC to run." Stop. Please.
System.gc() is a suggestion. The JVM is allowed to ignore it, and does when -XX:+DisableExplicitGC is set. By default, though, HotSpot honors the call, and with most collectors that means a full stop-the-world collection (G1 can be told to run a concurrent cycle instead with -XX:+ExplicitGCInvokesConcurrent). A forced collection also happens at a moment the collector did not choose, so it can skew the statistics the JVM gathers (allocation rates, promotion patterns, pause times) to schedule and size future GCs.
There are three legitimate reasons to call System.gc():
1. Right before a heap dump (to minimize garbage in the dump)
2. During testing, to verify GC behavior under controlled conditions
3. After a known burst of short-lived object creation that the collector hasn't processed yet
That's it. If you think you need it in production, you almost certainly have a different problem: a memory leak, oversized heap, or wrong collector choice. Fix the real problem, don't call System.gc().
And for heaven's sake, never call System.gc() in a loop, in a request handler, or inside a timer thread. I've seen all three. Each time it caused a production incident.
DontDoThis.java
// io.thecodeforge - java tutorial
// Demonstrates why System.gc() hurts performance
import java.util.ArrayList;
import java.util.List;

public class DontDoThis {
    public static void main(String[] args) {
        long start = System.nanoTime();

        // Real work
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10_000_000; i++) {
            data.add(i);
        }
        long mid = System.nanoTime();
        System.out.println("Work took: " + (mid - start) / 1_000_000 + " ms");

        // Production antipattern: force GC
        for (int i = 0; i < 5; i++) {
            System.gc();
        }
        long end = System.nanoTime();
        System.out.println("After GC spam: " + (end - mid) / 1_000_000 + " ms wasted");
    }
}
Output
Work took: 143 ms
After GC spam: 890 ms wasted
(Your request latency just went up 6x for nothing)
Production Trap:
If you hot-deploy code that calls System.gc(), you'll see a latency spike immediately. The GC trigger is synchronous — it blocks the calling thread until GC completes. Don't. Just don't.
Key Takeaway
System.gc() is a hint. HotSpot honors it by default, usually with a full GC, and ignores it when -XX:+DisableExplicitGC is set. If you think you need it, you have a real problem elsewhere. Fix that instead.
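Whether System.gc() calls are ignored depends on a flag, so it is worth checking what a given process is running with. The sketch below is HotSpot-specific and illustrative: ExplicitGcCheck is a made-up name, and it assumes the com.sun.management.HotSpotDiagnosticMXBean is available, which is the case on standard OpenJDK builds but not guaranteed on other JVMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Illustrative, HotSpot-specific check of the DisableExplicitGC flag. */
class ExplicitGcCheck {

    /** True when the running JVM ignores System.gc() calls. */
    static boolean explicitGcDisabled() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption returns the current value of the flag as a String ("true"/"false").
        return Boolean.parseBoolean(hotspot.getVMOption("DisableExplicitGC").getValue());
    }

    public static void main(String[] args) {
        System.out.println("System.gc() ignored: " + explicitGcDisabled());
    }
}
```

Logging this value at startup removes guesswork during an incident: if it prints false, any library that calls System.gc() can still trigger a collection.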
Customizing GC Settings in Jelastic PaaS
Jelastic PaaS exposes heap and collector flags via topology manifests and environment variables, not raw JVM arguments. You set JAVA_OPTS or use the Cloud Scripting env block to override GC strategies per node. For example, switching from G1 to Shenandoah on a production layer requires adding JAVA_OPTS=-XX:+UseShenandoahGC to the manifest. Memory limits are tied to cloudlet quotas: the heap maximum defaults to 85% of the container's RAM, which you can cap with -Xmx inside the same variable. The trap is that Jelastic's auto-tuning may silently revert your flags on node restart if you edit them via the admin panel instead of the jelastic.env file. Always commit GC changes through version-controlled manifest sections — every cloudlet restart will read from there. This prevents production surprises when horizontal scaling spawns new nodes that inherit the wrong collector.
JelasticGCConfig.java
// io.theforge - java tutorial
// (C) Jelastic manifest snippet, not runnable directly
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JelasticGCConfig {
    // Exposed via Jelastic env variable JAVA_OPTS
    // Switch default G1 to Shenandoah:
    // -XX:+UseShenandoahGC
    // -Xmx2048m
    // -XX:MaxGCPauseMillis=100
    // This is injected before main() by the platform
    public static void main(String[] args) {
        // Verify active GC at runtime:
        for (GarbageCollectorMXBean gc : ManagementFactory
                .getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
            // Expected to include: Shenandoah Pauses
        }
    }
}
Output
Shenandoah Pauses
Production Trap:
Jelastic automatically recycles dead containers. If you set GC flags via the admin dashboard instead of the manifest, flags disappear after a node restart. Always version your GC settings in the manifest file.
Key Takeaway
In PaaS environments, GC customization lives in deployment manifests, not inside application code.
GC Implementations: HotSpot's Historical Lineage
The JVM isn't one GC: seven major implementations have been baked into HotSpot. The Serial collector uses a single thread for both minor and full GCs, suitable for single-core or client machines. Parallel (Throughput) GC employs multiple threads but stops the world for both young and old collection, which is good for batch jobs. G1 splits the heap into regions and works toward a pause-time target; it became the default in JDK 9. ZGC uses load barriers and colored pointers to achieve sub-millisecond pauses regardless of heap size, because marking and relocation of live objects happen concurrently with the application. Shenandoah evolved differently: it uses a Brooks pointer forwarding technique to relocate objects concurrently, even during the compaction phase. Each implementation reflects a different trade-off: Serial sacrifices throughput for footprint, Parallel sacrifices latency for throughput, and ZGC and Shenandoah sacrifice throughput for latency. Choose based on your application's tolerance for pause time versus raw processing speed.
ListGCImplementations.java
// io.theforge - java tutorial
import java.lang.management.ManagementFactory;

public class ListGCImplementations {
    public static void main(String[] args) {
        System.out.println("Active GCs:");
        ManagementFactory.getGarbageCollectorMXBeans()
            .forEach(gc -> System.out.println(" " + gc.getName()));
        // Run with: -XX:+UseZGC
        // Output: bean names depend on the JDK version. Older ZGC builds report
        // a single "ZGC" bean; newer ones report separate cycle and pause beans.
    }
}
Output
Active GCs:
ZGC
Hidden Variation:
Each GC implementation manages the heap differently — Serial uses a contiguous old space, G1 uses regions, ZGC uses a sparse heap with forwarding tables. The same -Xmx flag changes behavior across collectors.
Key Takeaway
There is no single 'Java GC' — the JVM hosts at least seven distinct implementations, each with a unique pause/throughput profile.
Overview
Garbage collection in Java is an automatic memory management process that reclaims heap space occupied by objects no longer referenced by the application. It frees developers from manual memory deallocation and prevents two classic bugs of manual memory management: dangling pointers and leaks from forgotten deallocation (leaks through lingering references are still possible). However, GC is not free: it consumes CPU cycles and introduces pauses (stop-the-world events). The JVM's heap is divided into young generation (Eden, Survivor spaces) and old generation. Most objects die young; a Minor GC in Eden is cheap. Objects that survive multiple cycles get promoted to the old generation, where a Major GC (full collection) is more expensive. Understanding when and why objects become unreachable is essential: an object is collectable once all strong references to it are gone, when it is only part of a cycle of otherwise unreachable objects, or when it is reachable only through weak or soft references that the collector may clear. The choice of GC algorithm (Serial, Parallel, G1, ZGC, or Shenandoah) depends on latency vs. throughput trade-offs. Java's GC has evolved from a simple mark-sweep to ultra-low-pause collectors that handle terabytes of heap without freezing applications.
GCOverviewDemo.java
// io.thecodeforge - java tutorial
public class GCOverviewDemo {
    public static void main(String[] args) {
        // Object becomes unreachable after scope ends
        for (int i = 0; i < 100_000; i++) {
            String s = new String("temp");
        } // s eligible for GC here

        // Explicitly nulling a reference (unnecessary usually)
        Object leak = new Object();
        leak = null; // eligible now

        // Circular reference still collectable
        class Node { Node next; }
        Node a = new Node();
        Node b = new Node();
        a.next = b; b.next = a;
        a = null; b = null; // both collectable

        System.gc(); // hint, not guarantee
    }
}
Output
No output (GC runs asynchronously)
Production Trap:
Calling System.gc() can trigger full GCs that pause all threads. The call is ignored only when the JVM runs with -XX:+DisableExplicitGC, which is off by default.
Key Takeaway
GC automatically frees unreachable objects; understanding reachability is key to avoiding performance pitfalls.
Conclusion
Java’s garbage collection is a powerful abstraction that eliminates manual memory management, but it requires thoughtful tuning to avoid latency spikes and throughput degradation. The key insight is that GC behavior is determined by object reachability: as long as references exist from live roots (stack, static fields, JNI handles), objects remain alive. Understanding why objects become unreachable—scope exit, null assignment, weak reference clearing—lets you predict GC load. The two types of GC activity, Minor and Major, have drastically different pause profiles; optimizing object allocation rates reduces Minor GC frequency, while avoiding accidental retention prevents expensive Major collections. Modern collectors like ZGC and Shenandoah achieve sub-millisecond pauses even on multi-terabyte heaps by performing most work concurrently. However, no collector is a silver bullet: low-latency collectors trade CPU overhead for responsiveness. The future of Java GC includes generational ZGC and continued improvements to G1. Effective GC tuning starts with monitoring (GC logs, JFR), identifying pause patterns, and then adjusting flags like heap size, survivor ratio, and concurrent threads. Always test changes under realistic production load, and favor default settings from Java 17+ unless metrics prove otherwise.
GCTuningCheck.java
// io.thecodeforge - java tutorial
import java.util.ArrayList;
import java.util.List;

public class GCTuningCheck {
    public static void main(String[] args) {
        List<byte[]> holder = new ArrayList<>();
        // Simulate accidental retention (bad)
        for (int i = 0; i < 100; i++) {
            holder.add(new byte[1024 * 1024]); // 1MB each
        }
        // Without clearing holder, objects stay reachable
        // This forces Major GC if memory tight
        System.out.println("Allocated 100 MB; clearing reference");
        holder.clear(); // now eligible for GC
        System.gc(); // hint only
    }
}
Output
Allocated 100 MB; clearing reference
Production Trap:
Accidentally holding references in static collections is the #1 cause of memory leaks. Always clear collections or use WeakHashMap for caches.
Key Takeaway
GC tuning is about measuring, not guessing. Monitor GC logs and start with defaults before tweaking flags.
● Production incident · POST-MORTEM · severity: high
Full GC Spiral Crashes Order Processing Service During Flash Sale
Symptom
Order API p99 latency spiked from 80ms to 30+ seconds. Kubernetes liveness probes failed, triggering pod restarts. After restart, the pattern repeated within 10 minutes. GC logs showed 'Pause Full (Allocation Failure)' with increasing frequency.
Assumption
Team assumed the heap was too small and doubled -Xmx from 4GB to 8GB. The problem persisted — full GC pauses were longer because the live data set was larger.
Root cause
The service cached order objects in a ConcurrentHashMap with no eviction policy. Under flash sale traffic, the cache grew unbounded until old generation was 98% full. G1 could not reclaim enough space during mixed GCs because most old regions contained live cached data. Concurrent marking kept running but found almost nothing collectible. Eventually, young generation allocation failed and G1 fell back to a full GC stop-the-world pause. Doubling the heap only delayed the inevitable — the cache still grew unbounded.
Fix
Three-part fix: (1) Added size-bounded eviction to the order cache using Caffeine with maximumSize(50000) and expireAfterWrite(Duration.ofMinutes(30)). (2) Enabled GC logging with -Xlog:gc*,gc+humongous=debug:file=/var/log/gc.log to monitor heap pressure proactively. (3) Set -XX:InitiatingHeapOccupancyPercent=35 to trigger concurrent marking earlier, giving mixed GCs more cycles to reclaim space before allocation pressure hit.
Key lesson
Unbounded caches are the #1 cause of GC-related production incidents in Java services
Full GC 'Allocation Failure' means the collector cannot free enough space — it is not a tuning problem, it is an application memory management problem
Doubling heap without fixing the allocation pattern just delays the same failure with a longer full GC pause
Every production service must have a bounded eviction strategy for any in-memory data structure
Monitor old generation utilization sustained above 85% as a leading indicator of full GC risk
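The last lesson, watching old generation utilization, can also be checked from inside the JVM. The following is a minimal sketch using the standard `java.lang.management` API. The class name `OldGenMonitor` and the name-matching rule are assumptions for illustration: pool names are collector-specific (for example "G1 Old Gen"), and some collectors expose no separate old pool, in which case the method returns -1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class OldGenMonitor {
    /** Hypothetical alert threshold, taken from the 85% guideline above. */
    static final double ALERT_THRESHOLD_PERCENT = 85.0;

    /**
     * Returns old generation occupancy as a percentage of its maximum,
     * or -1 if no old/tenured heap pool with a defined maximum is found.
     */
    static double oldGenUsedPercent() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue;
            }
            String name = pool.getName();
            // Assumption: old-generation pools contain "Old" or "Tenured" in their name.
            if (!(name.contains("Old") || name.contains("Tenured"))) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null || usage.getMax() <= 0) {
                continue; // pool is invalid or has no defined maximum
            }
            return 100.0 * usage.getUsed() / usage.getMax();
        }
        return -1.0;
    }

    public static void main(String[] args) {
        double percent = oldGenUsedPercent();
        if (percent < 0) {
            System.out.println("No old generation pool found for this collector");
        } else if (percent > ALERT_THRESHOLD_PERCENT) {
            System.out.println("WARNING: old gen above threshold: " + percent + "%");
        } else {
            System.out.println("Old gen utilization: " + percent + "%");
        }
    }
}
```

A single reading says little; the lesson above is about utilization that stays high, so a real check would sample this value periodically and alert only on a sustained breach.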
Production debug guide
Follow this path when GC is suspected as the root cause of latency or availability issues.
5 entries
Symptom · 01
Latency spikes correlate with GC pauses in application logs
→
Fix
Enable GC logging with -Xlog:gc*,gc+phases=debug:file=gc.log:time,uptime,level,tags and correlate pause timestamps with latency metrics. Check if pauses are young GC, mixed GC, or full GC.
Symptom · 02
Full GC appearing frequently in steady-state traffic
→
Fix
Full GC signals the collector cannot keep up. Check for unbounded caches, humongous allocation rate, heap fragmentation, or metaspace exhaustion. Use jmap -histo to identify which object types dominate the heap.
Symptom · 03
Throughput drops but pause times are acceptable
→
Fix
Collector is consuming too much CPU. Check concurrent GC thread count (-XX:ConcGCThreads). Reduce if GC CPU usage exceeds 15-20% of total. Profile allocation rate — if > 2GB/sec, reduce allocation pressure at the application level.
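To put a number on allocation pressure for a specific code path, one option is HotSpot's per-thread allocation counter. This is a rough sketch, assuming a HotSpot-based JVM where `com.sun.management.ThreadMXBean` is available and allocation accounting is enabled (the default); the class name `AllocationRateProbe` is hypothetical. It measures a single thread only, so service-wide allocation rate is still best read from GC logs.

```java
import java.lang.management.ManagementFactory;

final class AllocationRateProbe {
    /** Bytes allocated by the current thread while the task runs, per HotSpot's counter. */
    static long allocatedBytesDuring(Runnable task) {
        // HotSpot-specific extension of the standard ThreadMXBean.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        task.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = allocatedBytesDuring(() -> {
            // Simulated workload: 256 arrays of 64 KB each.
            byte[][] chunks = new byte[256][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[64 * 1024];
            }
        });
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("Allocated bytes: " + bytes);
        System.out.println("Approximate rate (bytes/sec): " + (bytes / seconds));
    }
}
```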
Symptom · 04
OOM kill with no heap exhaustion visible in metrics
→
Fix
Check native memory: metaspace, thread stacks, direct byte buffers, mmap regions. Use -XX:NativeMemoryTracking=detail and jcmd <pid> VM.native_memory summary.
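Direct byte buffers are one of the native memory consumers listed above, and they can be inspected without restarting the JVM. A small sketch using the standard `BufferPoolMXBean`; the class name `DirectMemoryProbe` is hypothetical, and the pool names "direct" and "mapped" are the ones HotSpot reports. It covers buffer pools only, not metaspace or thread stacks.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class DirectMemoryProbe {
    /** Bytes currently held by the named buffer pool, or -1 if the pool is absent. */
    static long poolBytes(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = poolBytes("direct");
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024 * 1024); // lives off-heap
        long after = poolBytes("direct");
        System.out.println("Direct pool before: " + before + " bytes");
        System.out.println("Direct pool after:  " + after + " bytes");
        System.out.println("Buffer capacity:    " + buffer.capacity() + " bytes");
    }
}
```

Heap metrics do not include this memory, which is why a container can be OOM-killed while the heap looks healthy.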
Symptom · 05
GC pause time increases linearly with heap size
→
Fix
G1 pauses scale with live data set, not heap size. If pauses scale with heap, evaluate switching to ZGC or Shenandoah where pauses are independent of heap size.
★ GC Triage Cheat Sheet: First 60 Seconds
Fast diagnostic commands when GC is suspected. Run these before diving into GC logs.
Application unresponsive, suspected full GC
Immediate action
Check if JVM is in a GC stop-the-world pause
Commands
jcmd <pid> GC.heap_info
jstat -gcutil <pid> 1000 10
Fix now
If Full GC count is incrementing, check for unbounded caches and heap fragmentation immediately. Restart with -Xlog:gc+humongous=debug
High CPU with low application throughput
Immediate action
Check if GC threads are consuming CPU
Commands
top -b -H -n 1 -p <pid> | grep -E 'VM Thread|GC Thread'
jcmd <pid> VM.flags | grep -i conc
Fix now
Reduce -XX:ConcGCThreads or -XX:ParallelGCThreads if GC CPU > 20%. Consider if allocation rate can be reduced at application level.
Latency spikes at regular intervals
Immediate action
Correlate spike timing with GC cycle phases
Commands
jstat -gcutil <pid> 500 20
grep 'Pause' gc.log | tail -20
Fix now
If spikes align with 'mixed' or 'remark' phases, tune -XX:G1MixedGCCountTarget or -XX:MaxGCPauseMillis.
OOM kill by container orchestrator (k8s)
Immediate action
Compare container memory limit with JVM heap + native overhead
Commands
kubectl describe pod <pod> | grep -A5 'OOMKilled'
jcmd <pid> VM.native_memory summary
Fix now
Set -XX:MaxRAMPercentage to 75% at most (not 90%). Account for ~20% native overhead. For ZGC, set the container memory limit to at least heap * 1.3.
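The sizing arithmetic can be sketched in a few lines. The class and method names below are hypothetical, and the overhead factor is the rule of thumb quoted in this guide rather than a measured value; actual native overhead should be confirmed with Native Memory Tracking.

```java
final class ContainerMemoryBudget {
    /** Heap size the JVM would pick for a given container limit and -XX:MaxRAMPercentage. */
    static long heapBytesFor(long containerLimitBytes, double maxRamPercentage) {
        return (long) (containerLimitBytes * maxRamPercentage / 100.0);
    }

    /** Container limit needed for a given heap, using a rule-of-thumb overhead factor. */
    static long containerLimitFor(long heapBytes, double overheadFactor) {
        return (long) Math.ceil(heapBytes * overheadFactor);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;

        // A 4 GiB container at 75% leaves 1 GiB for native memory.
        long heap = heapBytesFor(4 * gib, 75.0);
        System.out.println("Heap for 4 GiB container: " + heap + " bytes"); // 3 GiB

        // Working the other way: a 6 GiB heap with the 1.3 factor suggested for ZGC.
        long limit = containerLimitFor(6 * gib, 1.3);
        System.out.println("Container limit for 6 GiB heap: " + limit + " bytes");
    }
}
```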
Allocation failure in logs, to-space exhausted
Immediate action
G1 cannot evacuate objects — critical failure
Commands
grep 'to-space exhausted' gc.log | wc -l
grep 'humongous' gc.log | tail -20
Fix now
Increase -XX:G1ReservePercent to 15. Increase region size. Reduce allocation rate. This triggers full GC — treat as P1.
Key takeaways
1
GC reclaims memory from unreachable objects. The primary driver of GC frequency is allocation rate, not heap size.
2
The generational heap exploits that most objects die young. Premature promotion pollutes old gen and increases full GC risk.
3
G1 is the default collector for good reason, but its pause time scales with live data, not heap size.
4
ZGC and Shenandoah achieve sub-10ms pauses at the cost of throughput or memory overhead.
5
Always enable GC logging in production. You cannot tune what you cannot measure.
6
Unbounded caches are the #1 cause of GC-related production incidents. Apply eviction policies to every in-memory cache.
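As an illustration of the last takeaway, here is a minimal size-bounded LRU cache built only on the JDK. It is a stand-in for a dedicated caching library, not a replacement for one: it has no time-based expiry, and the synchronized wrapper serializes every access. The class name `BoundedLruCache` is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    /** Called by LinkedHashMap after each insertion; returning true evicts the eldest entry. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    /** Thread-safe view; LinkedHashMap itself is not safe for concurrent use. */
    static <K, V> Map<K, V> newSynchronized(int maxEntries) {
        return Collections.synchronizedMap(new BoundedLruCache<K, V>(maxEntries));
    }

    public static void main(String[] args) {
        Map<String, String> cache = newSynchronized(2);
        cache.put("order-1", "A");
        cache.put("order-2", "B");
        cache.get("order-1");      // touch order-1 so order-2 becomes the eldest
        cache.put("order-3", "C"); // exceeds the bound and evicts order-2
        System.out.println(cache.keySet()); // [order-1, order-3]
    }
}
```

Because the map can never hold more than `maxEntries` values, the live data set it contributes to old generation has a fixed ceiling regardless of traffic.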
Common mistakes to avoid
5 patterns
×
Using System.gc() in application code or relying on finalize() for cleanup
Symptom
Unexpected full GC pauses or objects never reclaimed before OOM
Fix
Set -XX:+DisableExplicitGC in production. Replace finalize() with Cleaner or try-with-resources.
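A sketch of the Cleaner plus try-with-resources pattern (JDK 9+). The class name `NativeHandle` and the boolean flag standing in for a real resource are assumptions for illustration. The key constraint is that the cleanup action must not hold a reference to the object being cleaned, otherwise the object never becomes unreachable.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup state lives in a static nested class so it cannot capture the NativeHandle. */
    private static final class ReleaseAction implements Runnable {
        private final AtomicBoolean released;

        ReleaseAction(AtomicBoolean released) {
            this.released = released;
        }

        @Override
        public void run() {
            // A real implementation would free the native resource here.
            released.set(true);
        }
    }

    private final AtomicBoolean released = new AtomicBoolean(false);
    private final Cleaner.Cleanable cleanable;

    NativeHandle() {
        // Safety net: the action runs if the handle becomes unreachable without close().
        this.cleanable = CLEANER.register(this, new ReleaseAction(released));
    }

    boolean isReleased() {
        return released.get();
    }

    /** Deterministic path: clean() runs the action at most once, even if called repeatedly. */
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) {
        try (NativeHandle handle = new NativeHandle()) {
            System.out.println("In use, released = " + handle.isReleased()); // false
        }
        // close() has run by this point, without waiting for any GC cycle.
    }
}
```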
×
Setting -Xmx equal to container memory limit without considering native overhead
Symptom
OOM kills by container orchestrator even though heap is under 100%
Fix
Budget container memory as heap * 1.15 for G1 and heap * 1.25 for ZGC. Use -XX:MaxRAMPercentage=75.
×
Tuning collector-specific flags without first enabling GC logging
Symptom
Random latency spikes with no diagnostic data to correlate
Fix
Enable GC logging first: -Xlog:gc*:file=gc.log:time,uptime. Only then adjust flags.
×
Blindly copying JVM flags from blog posts or other services
Symptom
Poor GC performance or unexpected full GC behavior in production
Fix
Profile your own application's allocation rate and live data set before tuning. Test in staging.
×
Ignoring humongous allocations in G1 until to-space exhaustion occurs
Symptom
Full GC triggered by to-space exhaustion, causing multi-second pauses
Fix
Monitor GC logs for 'humongous allocation' warnings. Increase -XX:G1HeapRegionSize or chunk large objects.
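To make the last fix concrete: G1 treats an object of about half a region or larger as humongous, so the same array may or may not be humongous depending on region size. The sketch below is hypothetical (class and method names are illustrative), ignores object header size, and treats the exact boundary as an approximation. It shows the threshold check and one way to chunk a large payload into smaller arrays.

```java
import java.util.Arrays;

final class HumongousCheck {
    /** Approximation: an object of half a region or more is allocated as humongous by G1. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Splits a large array into chunks of at most chunkBytes each. */
    static byte[][] chunk(byte[] data, int chunkBytes) {
        int count = (data.length + chunkBytes - 1) / chunkBytes; // ceiling division
        byte[][] chunks = new byte[count][];
        for (int i = 0; i < count; i++) {
            int from = i * chunkBytes;
            int to = Math.min(data.length, from + chunkBytes);
            chunks[i] = Arrays.copyOfRange(data, from, to);
        }
        return chunks;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long payload = 2 * mib;

        // The same 2 MiB payload under two region sizes.
        System.out.println("2 MiB regions: " + isHumongous(payload, 2 * mib)); // true
        System.out.println("8 MiB regions: " + isHumongous(payload, 8 * mib)); // false

        // Alternative to larger regions: keep each piece well below the threshold.
        byte[][] pieces = chunk(new byte[(int) payload], 256 * 1024);
        System.out.println("Chunks: " + pieces.length); // 8
    }
}
```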
INTERVIEW PREP · PRACTICE MODE
Interview Questions on This Topic
Q01 of 03SENIOR
Explain the difference between a young GC, mixed GC, and full GC in G1. When does each occur?
ANSWER
Young GC collects eden and survivor regions when young generation fills. Mixed GC collects both young and old regions after concurrent marking, targeting regions with mostly garbage. Full GC is a stop-the-world compaction triggered when allocation fails (to-space exhausted, or concurrent mark could not free enough space). Full GC is the catastrophic failure mode you want to avoid.
Q02 of 03SENIOR
A service using G1 with 8GB heap experiences full GC every 15 minutes under steady load. GC logs show old gen occupancy at 85% before each full GC. What should you check first?
ANSWER
First check for unbounded caches or static collections that retain objects forever. Use jmap -histo to identify the dominant object types. Full GC at 85% suggests old gen is filling with long-lived objects faster than mixed GCs can reclaim. Likely cause: application-level memory leak, not GC tuning. After fixing the leak, adjust -XX:InitiatingHeapOccupancyPercent to start concurrent marking earlier (e.g., 35%).
Q03 of 03SENIOR
What are the trade-offs between ZGC and Shenandoah? When would you choose one over the other?
ANSWER
ZGC uses colored pointers with load barriers, achieving sub-10ms pauses on heaps up to 16TB, but it cannot use compressed oops, which increases memory usage on heaps under 32GB (roughly 15%, depending on how reference-heavy the data is). Shenandoah relies on forwarding pointers (originally separate Brooks pointers, kept in the object header in current JDKs) together with load-reference barriers, and it supports compressed oops, reducing memory overhead on smaller heaps. Shenandoah's pacing mechanism provides smoother degradation under load. Choose ZGC for very large heaps (over 64GB) or where load barrier overhead is acceptable. Choose Shenandoah for heaps under 32GB where memory efficiency matters, or when compressed oops support is needed.
FAQ · 4 QUESTIONS
Frequently Asked Questions
01
What is the difference between a young GC and a full GC?
Young GC collects only the young generation (eden and survivor) and is fast because most objects die young. Full GC collects the entire heap, compacting all regions, and can take seconds or tens of seconds. Full GC should be a rare event in a well-tuned service.
02
How do I enable GC logging for a running JVM without restarting?
Use jcmd <pid> VM.log output=gc.log what='gc*' (JDK 9+, where unified logging was introduced; the quotes stop the shell from expanding the asterisk). JDK 8 has no VM.log command, so restart with -Xloggc:<file> -XX:+PrintGCDetails -XX:+PrintGCDateStamps. To make the change persistent, add the flag to the JVM startup command.
03
What is the recommended heap size for a containerized Java application?
Set -XX:MaxRAMPercentage=75.0 so the JVM uses 75% of container memory for heap. Avoid the older -XX:MaxRAMFraction=1: it is deprecated and sizes the heap to 100% of container memory. The remaining 25% is for native memory (threads, metaspace, GC overhead). Never set -Xmx equal to the container memory limit.
04
Why does my application run out of memory even though the heap is not full?
Check native memory: metaspace, thread stacks, direct byte buffers, code cache, GC structures. Use -XX:NativeMemoryTracking=detail and jcmd <pid> VM.native_memory summary. Metaspace leaks from class loader re-deployment are a common cause.