JVM Garbage Collection - What Actually Matters in Production?
- From HTTP Request to Heap Allocation
- A Request Is an Allocation Pipeline
- Backend Services Allocate Constantly
- Most Request Objects Should Die Young
- JSON Serialization: a Major Allocation Source
- Hibernate Can Increase Object Lifetime
- Logging Can Allocate More Than Expected
- Heap Usage Alone Is Not Enough
- What This Means for Java Backend Developers
- The JVM Heap Model for Backend Services
- GC and API Latency: p99 Over Average
- Modern Collectors
- Java Backend Memory Leaks That Look Like GC Problems
- Spring Boot, Hibernate, Jackson, and GC Pressure
- Conclusion
Garbage collection is not just JVM internals. For a Java backend developer, GC is connected to request latency, allocation patterns, object lifetime, framework behavior, container memory limits, observability, and incident response.
From HTTP Request to Heap Allocation
When a request enters a Java backend service, it does not only consume CPU and network resources. It also creates objects.
A lot of them!
For a single request, that may not matter. But in a production service handling hundreds or thousands of requests per second, those allocations become one of the main drivers of garbage collection behavior.
A Request Is an Allocation Pipeline
Consider a typical Spring Boot REST API request:
@PostMapping("/beers")
public BeerResponse createBeer(@RequestBody CreateBeerRequest request) {
Beer beer = beerService.createBeer(request);
return beerMapper.toResponse(beer);
}
This may look simple, but several allocations happen before and after this method runs.
A typical request may involve:
- request and response DTOs
- service-layer objects
- domain entities
- collections and strings
- logging messages
- database result objects
- etc…
Most of these objects are short-lived. They exist only during the lifetime of the request.
That is good. The JVM is optimized for this pattern.
But at scale, even short-lived objects can become expensive.
Backend Services Allocate Constantly
A Java backend service is usually allocation-heavy by design.
Frameworks like Spring, Hibernate, Jackson, Bean Validation, Micrometer, logging libraries, and HTTP servers create many temporary objects to provide abstraction and developer productivity.
That is not automatically a bad thing; the problem starts when the allocation rate becomes too high.
Allocation rate means how much memory the application allocates per second.
For example:
500 MB/s allocation rate
This means that the application is creating half a gigabyte of new objects every second.
Even if most objects die quickly, the garbage collector still needs to process that memory. This is why allocation rate is often more important than total heap usage. A service can have plenty of free heap and still suffer from frequent garbage collections because it allocates too aggressively.
Most Request Objects Should Die Young
Modern JVM garbage collectors rely heavily on the generational hypothesis: most objects die young. This fits backend applications very well. For example, these objects usually die quickly:
CreateBeerRequest request
BeerResponse response
List<BeerBreweryResponse> breweries
String formattedMessage
Map<String, Object> logContext
They are created during request processing and become unreachable shortly after the response is sent. These objects are usually allocated in the young generation, specifically in Eden space. When Eden fills up, the JVM performs a young GC to remove dead objects. This is normally cheap because most young objects are already garbage. That is the happy path :)
Problems start when objects that should be short-lived accidentally survive.
For example:
private static final List<CreateBeerRequest> requests = new ArrayList<>();
If these objects remain reachable, the GC cannot collect them. After surviving enough young collections, they may be promoted to the old generation, which changes the situation completely.
Young generation garbage is expected. Old generation growth is more serious. In backend systems, old generation growth often means one of these things:
- a legitimate cache is growing
- a queue is not being drained fast enough
- a memory leak exists
- request-scoped data escaped into long-lived state
- the application is holding too many database entities
- the service is processing payloads that are too large
This code looks harmless at first glance:
public List<BeerResponse> getBeers(User user) {
    return beerRepository.findByUserId(user.getId())
            .stream()
            .map(beer -> new BeerResponse(
                    beer.getId(),
                    beer.getStatus().name(),
                    beer.getCreatedAt().toString()
            ))
            .toList();
}
But under load, it may allocate many objects:
- database entity objects
- stream pipeline objects
- lambda-related structures
- response DTOs
- strings from name()
- strings from toString()
- internal list storage
This does not mean streams are bad. It means hot paths must be understood in terms of allocation behavior. In low-traffic code, readability may matter more. In high-QPS endpoints, allocation patterns can affect GC frequency and latency.
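If a hot endpoint does show up in allocation profiles, a plain loop with a pre-sized list removes the stream pipeline and lambda-related allocations. A sketch reusing the hypothetical repository and DTO from above; note that the per-element DTOs and strings remain:
public List<BeerResponse> getBeersLowAlloc(User user) {
    List<Beer> beers = beerRepository.findByUserId(user.getId());
    // pre-sizing avoids internal array resizing; no stream pipeline objects
    List<BeerResponse> responses = new ArrayList<>(beers.size());
    for (Beer beer : beers) {
        responses.add(new BeerResponse(
                beer.getId(),
                beer.getStatus().name(),
                beer.getCreatedAt().toString()
        ));
    }
    return responses;
}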
JSON Serialization: a Major Allocation Source
For backend APIs, JSON processing is one of the most common sources of allocation.
When using Jackson, the application may allocate:
- request DTOs
- nested DTOs
- temporary parsing buffers
- strings
- collections
- reflection metadata structures
- response serialization buffers
Example:
@PostMapping("/users/search")
public List<UserResponse> search(@RequestBody SearchRequest request) {
return userService.search(request);
}
If the response contains thousands of users, the service may allocate a large object graph:
List<UserResponse>
-> UserResponse
-> AddressResponse
-> List<RoleResponse>
-> String fields
-> serialization buffers
Large responses increase memory pressure, even if the endpoint seems functionally correct. This is why pagination is not only a database concern; it is also a memory and GC concern.
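A paginated variant keeps the per-request object graph bounded. A minimal sketch, assuming Spring Data's web support resolves the Pageable and that the service has a matching pageable-aware overload:
@PostMapping("/users/search")
public Page<UserResponse> search(@RequestBody SearchRequest request,
                                 @PageableDefault(size = 50) Pageable pageable) {
    return userService.search(request, pageable); // assumed overload accepting Pageable
}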
Hibernate Can Increase Object Lifetime
ORM frameworks can also influence GC behavior. With Hibernate/JPA, a query does not just return raw database rows. It may create:
- entity objects
- proxy objects
- collections
- persistence context entries
- lazy-loading structures
For example:
List<Beer> beers = beerRepository.findAll();
This can be dangerous if the result set is large. The persistence context may keep entities reachable longer than expected. That means objects that could have died quickly may stay alive until the transaction or session ends. In batch jobs or large backend operations, this can cause old generation pressure.
A safer pattern may be:
Page<Beer> page = beerRepository.findAll(pageable);
or streaming/clearing the persistence context carefully in batch processing.
This seems obvious, but I thought it was worth mentioning that database result size affects heap pressure.
Logging Can Allocate More Than Expected
Logging is another common source of hidden allocation. This is especially true when logs are built eagerly:
log.debug("Beer details: " + expensiveObject.toString());
Even if debug logging is disabled, the string concatenation and the toString() call still happen before the logger decides whether to write the message.
I prefer parameterized logging:
log.debug("Beer details: {}", expensiveObject);
But even parameterized logging is not free if the object’s toString() is eventually called. This becomes important in high-throughput systems where every request logs multiple lines. Logging full payloads is especially risky. It can create large strings, increase allocation rate, slow down request processing, and of course expose sensitive data.
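When an argument itself is expensive to build (not just to serialize), a level guard keeps that work out of the hot path entirely. A sketch; buildExpensiveSummary() is a hypothetical helper:
if (log.isDebugEnabled()) {
    // the summary object and its strings are only allocated when debug is on
    log.debug("Beer details: {}", buildExpensiveSummary(beer));
}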
Good backend logging should be intentional, bounded, and production-safe. So only log what is necessary.
Heap Usage Alone Is Not Enough
I’ve seen many developers look only at heap usage:
Heap used: 2.5 GB / 4 GB
That number is useful, but incomplete. For backend performance, we also need to understand:
- Allocation rate
- Young GC frequency
- Old generation growth
- Promotion rate
- GC pause duration
- Live object set size
A service with stable heap usage may still have bad GC behavior if it allocates too much temporary memory.
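Most of these signals are easy to expose. A minimal sketch using Micrometer’s built-in JVM binders (Spring Boot Actuator registers these automatically; shown here with a simple registry):
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.binder.jvm.JvmGcMetrics;
import io.micrometer.core.instrument.binder.jvm.JvmMemoryMetrics;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

MeterRegistry registry = new SimpleMeterRegistry();
new JvmGcMetrics().bindTo(registry);     // jvm.gc.pause, jvm.gc.memory.allocated, jvm.gc.memory.promoted
new JvmMemoryMetrics().bindTo(registry); // jvm.memory.used per memory pool
Allocation rate can then be derived from the jvm.gc.memory.allocated counter over time.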
What This Means for Java Backend Developers
A Java backend developer does not need to manually manage memory like in C or C++. But that does not mean memory is irrelevant. In Java, our responsibility is different:
- keep object lifetimes short when possible
- avoid unbounded data structures
- be careful with large responses
- understand framework allocation behavior
- monitor allocation rate and GC pauses
- investigate old-generation growth
- use evidence before tuning JVM flags
Backend engineers must know that performance is not only about algorithms or database indexes. It is also about runtime behavior!
The JVM Heap Model for Backend Services
A Java backend developer does not need to manage memory manually, but they do need to understand where objects live, how long they live, and why that affects latency. The JVM heap is not just “where objects go.” In production, it is the space where request traffic, framework behavior, caching, JSON processing, database access, and application design all meet.
The most important idea is this:
The heap is shaped by object lifetime.
Not all objects are equal. A request DTO that lives for 20 milliseconds is very different from a cache entry that lives for hours.
Heap vs Stack
In Java, local variables and method calls are managed on thread stacks, while objects usually live on the heap. Example:
public BeerResponse getBeer(Long id) {
    Beer beer = beerRepository.findById(id).orElseThrow();
    return beerMapper.toResponse(beer);
}
In simplified terms:
Stack:
- id
- beer reference
- return value reference
Heap:
- Long object, depending on boxing/caching
- Beer entity
- BeerResponse DTO
- Strings, collections, nested objects
The stack stores references and method execution state. The heap stores the actual objects.
When the method returns, stack variables disappear. But heap objects are only collectible if nothing still references them, and that distinction is critical. This object can be collected after the request:
BeerResponse response = new BeerResponse(...);
Unless something keeps a reference to it:
recentResponses.add(response);
The garbage collector does not care whether the request is finished. It only cares whether the object is still reachable.
Reachability
The JVM collects objects that are no longer reachable from known starting points called GC roots. Common GC roots may include:
- local variables on active thread stacks
- static fields
- class metadata references
- JNI references
Example:
private static final Map<Long, UserProfile> cache = new HashMap<>();
Anything inside this static map is reachable through a GC root. That means the GC cannot collect it, even if the data is no longer useful. This is why memory leaks in Java are not usually “lost memory” in the C/C++ sense. They are usually unwanted reachable objects. A Java memory leak means that the application still holds references to objects it no longer needs.
That is one of the most important distinctions for us backend developers.
Young Generation
Most new objects are initially allocated in the young generation. The young generation commonly contains:
- Eden space
- Survivor space 1
- Survivor space 2
The usual flow is:
- allocated in Eden
- survives a young GC and moves to a Survivor space
- survives multiple young GCs and is promoted to the Old Generation
This is the ideal case. Short-lived request objects should be born, used, and collected quickly.
A young GC collects the young generation. Usually, it is efficient because most young objects are dead.
Before young GC:
- Eden: many request DTOs, JSON buffers, temporary lists
After young GC:
- dead objects removed
- surviving objects copied to Survivor or promoted
Young GCs are normally expected in Java backend applications. The problem is not that young GCs happen; it is when they happen too often or take too long. Frequent young GCs usually indicate a high allocation rate. That often comes from:
- large responses
- excessive DTO mapping
- object-heavy transformations
- unnecessary intermediate collections
- verbose logging
- inefficient serialization
- high request volume
- large database result sets
Survivor Spaces
Objects that survive a young GC may be moved into a Survivor space. Survivor spaces exist to avoid promoting objects to the old generation too quickly. It is something like this:
- Request starts
- Object allocated in Eden
- Young GC happens while request is still running
- Object is still reachable
- Object moves to Survivor
- Request finishes
- Next young GC collects it
And this is normal!
A request object may survive one young GC simply because the request was still being processed at the time. That does not mean it is a leak. But if many objects keep surviving multiple young GCs, the JVM may eventually promote them to the old generation. That is where things become more expensive.
Old Generation
The old generation stores objects that survived long enough to be considered long-lived. In backend services, old generation commonly contains:
- Spring beans
- application configuration
- caches
- connection pools
- thread pools
- loaded class structures and framework objects
- long-lived domain data
- retained Hibernate entities
- queued messages
- large collections
- session data
Old generation is not bad; the problem is unexpected old-generation growth. For example:
private final List<BeerEvent> goodBeers = new ArrayList<>();
If this list grows without bounds, old generation usage may continuously increase. The GC can run repeatedly and still fail to reclaim memory because those objects are still reachable. That is not a garbage collector failure; it is object retention!
Live Set
A very important production concept is the live set. The live set is the amount of memory still occupied after a full or major collection. For example:
- Heap before GC: 6 GB
- Heap after GC: 2 GB
- Live set: about 2 GB
The live set represents objects the application is actually retaining. If the live set keeps growing over time, you may have:
- a memory leak
- an unbounded cache
- growing queues
- large long-lived maps
- legitimate growth in business data
This is much more useful than looking only at total heap usage. A healthy service often has a pattern like this:
- heap grows
- GC runs
- heap drops
- heap grows
- GC runs
- heap drops
An unhealthy service may look like this:
- heap grows
- GC runs
- heap drops less than before
- heap grows again
- GC runs
- heap drops even less
- old generation keeps increasing
Heap Size Lies
A common mistake is assuming that a bigger heap always means fewer problems. Sometimes increasing the heap does help: if a service has a bursty allocation pattern, a slightly larger heap may reduce GC frequency. But a bigger heap can also make things worse.
Larger heaps may:
- hide memory leaks for longer
- increase time spent scanning live objects
- delay failure instead of fixing it
- increase container memory cost
- make old-generation collections more expensive
For backend systems, heap sizing should be based on:
- live set size
- allocation rate
- latency requirements
- traffic profile
- container memory limits
- collector choice
- headroom for spikes
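In containers, a common starting point is sizing the heap relative to the container memory limit instead of hard-coding it. These flags exist since Java 10; the percentages are illustrative, not recommendations:
java -XX:InitialRAMPercentage=50.0 -XX:MaxRAMPercentage=75.0 -jar app.jar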
Heap Layout in Modern Collectors
When learning GC, diagrams often show this:
Heap
├── Young Generation
│ ├── Eden
│ ├── Survivor 0
│ └── Survivor 1
└── Old Generation
This model is useful, but modern collectors may implement it differently. G1, for example, divides the heap into regions, and a region can serve as Eden, Survivor, or Old depending on current GC needs. The generational model still matters conceptually, but physically the heap is no longer one large contiguous young area and one large contiguous old area.
Backend Example
Consider this endpoint:
@GetMapping("/beers/{id}")
public BeerResponse getBeers(@PathVariable Long id) {
Beer beer = beerService.getBeer(id);
return beerMapper.toResponse(beer);
}
A healthy memory profile might look like this:
Long-lived:
- BeerService
- BeerRepository
- DataSource
- BeerMapper
- Spring MVC infrastructure
Short-lived:
- request wrapper
- path variable objects
- Beer entity
- BeerResponse DTO
- JSON serialization buffer
After the response is sent, most request-specific objects should become unreachable. That is exactly what the JVM is good at.
Now consider this:
@Component
public class ProductDebugStore {

    private final List<ProductResponse> responses = new ArrayList<>();

    public void save(ProductResponse response) {
        responses.add(response);
    }
}
And this endpoint:
@GetMapping("/products/{id}")
public ProductResponse getProduct(@PathVariable Long id) {
Product product = productService.getProduct(id);
ProductResponse response = productMapper.toResponse(product);
productDebugStore.save(response);
return response;
}
Now every response is kept alive. The object is no longer request-scoped. It is reachable through a Spring singleton! The GC cannot collect it. Eventually, these objects may move to the old generation and stay there. This kind of issue often appears in production as:
- old generation keeps growing
- GC runs more often
- pause times increase
- eventually OutOfMemoryError or container restart
The root cause is not “bad GC.” It is accidental retention.
GC and API Latency: p99 Over Average
In backend systems, garbage collection becomes important when it affects request latency. A GC event is not just a memory-management detail. During certain phases of garbage collection, the JVM may pause application threads. When that happens, your service temporarily stops processing requests. That means GC can directly affect:
- HTTP response time
- p95 / p99 latency
- timeouts
- retries
- service-to-service communication
The key production idea is that GC problems usually appear as latency problems before they appear as memory problems.
Average Latency Can Hide GC Issues
Many developers look at average latency first, for example an average of 25ms. That number may look healthy, but averages are often misleading in backend systems. A service can have a good average latency while still having serious tail-latency problems.
For example:
- Average latency: 25ms
- p95 latency: 80ms
- p99 latency: 750ms
This means most requests are fine, but the slowest 1% are very slow and for high-traffic services, that 1% matters. If a service handles 2,000 requests per second, then 1% means 20 slow requests every second!
For backend APIs:
- p50 tells you the normal user experience
- p95 tells you about common slowness
- p99 tells you about production pain
- p999 tells you about rare but dangerous behavior
GC often appears in p99 or p999 because pauses may not happen constantly. They happen periodically, under pressure, or during traffic bursts. That is why GC issues can stay hidden if you only monitor average latency.
Stop-the-World Pauses
A stop-the-world pause means application threads are paused so the JVM can perform some GC work safely. During this pause:
- request threads stop running
- scheduled tasks stop running
- database calls may wait for application code to resume
From the outside, the service looks slow or frozen. For example:
- Request starts
- Service begins processing
- GC pause happens for 300ms
- Application resumes
- Request completes
Even if the application logic only needed 20ms, the observed latency may become 20ms application work + 300ms GC pause = 320ms response time. The user does not care that only 20ms was “real work.” The user sees a 320ms response.
A database query usually affects one request. A slow external API call usually affects one request. A GC pause can affect many requests at the same time. Suppose a service has 100 active request-processing threads. If a stop-the-world pause lasts 200ms, all those threads may be paused. That means one GC event can delay many in-flight requests simultaneously. For example:
- 100 requests in progress
- 200ms stop-the-world pause
- all 100 requests become at least 200ms slower
This is why GC has a strong effect on tail latency. The pause is not isolated to one unlucky request. It can hit the whole process. And that is why the problem gets worse under load.
While application threads are paused, new requests may continue arriving. Those requests wait in queues, and after the JVM resumes, the service may need to process both the requests that were paused and the requests that arrived during the pause.
This can create a temporary backlog. A 200ms GC pause may cause more than 200ms of user-visible latency if the service was already close to capacity. The pattern is something like this:
- GC pause
- request processing stops
- queues grow
- service resumes
- backlog causes more latency
- clients timeout or retry
And it gets worse in microservices, because latency does not stay local. Imagine Service A calls Service B. If Service B has a GC pause, Service A may timeout. Then Service A retries:
- Service B pauses
- Service A times out
- Service A retries
- Service B receives even more requests
Now Service B has more load than before. More load means:
- more request objects
- more JSON parsing
- more DTO creation
- more logging
- more allocations
- more GC pressure
This creates a dangerous feedback loop:
- GC pause
- latency spike
- timeout
- retry
- more traffic
- higher allocation rate
- more GC pressure
- more pauses
This is how a local JVM memory issue can become a distributed-system incident.
GC and Thread Pools
Java backend services usually depend on thread pools:
- Tomcat request threads
- database connection pools
- async executor pools
- scheduler pools
- Kafka listener threads
GC pauses interact badly with these pools. During a pause, worker threads stop making progress. If requests keep arriving, thread pools and queues may fill up. This is why GC should be analyzed together with thread pool metrics. A p99 spike may not be caused only by “slow code.” It may be caused by a JVM pause that prevented otherwise healthy threads from running.
Latency-Oriented vs Throughput-Oriented GC
Garbage collectors make trade-offs. A throughput-oriented collector tries to maximize the amount of useful application work completed over time. A latency-oriented collector tries to minimize pause duration, often by doing more work concurrently.
Backend services usually care about latency because they serve users or other services. But that does not mean the lowest-pause collector is always the best choice. There is a trade-off: lower pause times mean more concurrent GC work, more CPU overhead, and potentially lower throughput. For a backend service, the question is not “which GC is fastest?” The better question is “which GC gives acceptable p99 latency at acceptable CPU and memory cost for this workload?” That is the production mindset, in my view.
Example: A GC Pause Hidden in p99
Imagine this service:
- Endpoint: GET /products/{id}
- Average latency: 18ms
- p95 latency: 45ms
- p99 latency: 480ms
- Error rate: low
- CPU: normal
- Database latency: normal
At first glance, nothing obvious is broken. Then you check GC logs and see something like this:
12:00:01.200 Young GC pause: 12ms
12:00:05.700 Young GC pause: 18ms
12:00:10.400 Young GC pause: 22ms
12:00:15.900 Mixed GC pause: 410ms
12:00:16.000 p99 latency spike observed
The application code did not suddenly become slower; the JVM paused request processing. The important skill is not memorizing GC terminology. The important skill is connecting the runtime event to user-visible behavior.
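If GC logs are not yet enabled, unified JVM logging (Java 9+) produces records like the ones above. One illustrative configuration:
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar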
GC Pauses Can Break Health Checks
In containerized systems, GC pauses can also interfere with health checks. If a service pauses long enough, Kubernetes or a load balancer may see failed readiness or liveness checks. Possible outcomes may be:
- service temporarily removed from load balancer
- pod restarted
- traffic shifted to other pods
- remaining pods receive more load
- more GC pressure on remaining pods
This can create a cascading failure. A single service with unstable GC behavior can cause the platform to make the situation worse by restarting or rerouting traffic. That does not mean health checks are bad. It means GC pauses need to be considered when setting timeouts and thresholds.
Why Low Traffic Testing Misses GC Problems
GC behavior depends heavily on workload. Local testing usually has:
- low request volume
- small payloads
- short test duration
- low concurrency
Production has:
- high QPS
- large payloads
- long-running processes
- real user traffic
- traffic bursts
- large caches
- background jobs
- database variability
- retries
A service can look perfect locally and still have GC-related p99 spikes in production. This is why load testing matters. A realistic performance test should simulate:
- expected QPS
- concurrent users
- payload size distribution
- large responses
- database behavior
- cache warmup
- background jobs
- traffic bursts
GC tuning without realistic load is mostly guesswork.
What to Monitor
For backend services, monitor GC together with application metrics. Useful JVM metrics:
- GC pause duration
- GC pause count
- young GC frequency
- old GC frequency
- heap used before/after GC
- old generation usage
- allocation rate
- promotion rate
Useful application metrics:
- request latency p95/p99
- request throughput
- error rate
- timeout rate
- retry rate
- thread pool usage
- connection pool usage
- CPU usage
- container memory usage
The goal is correlation. You may want to answer:
- Did p99 latency increase at the same time as GC pauses?
- Did old generation grow before the incident?
- Did allocation rate increase after a deployment?
- Did retries increase after GC pauses?
- Did thread pools saturate after the JVM resumed?
Modern Collectors
You do not need to know every garbage collector in detail. What matters is knowing which collector fits which production problem. The decision is usually about the trade-off: latency vs throughput vs CPU cost vs memory footprint.
G1 GC
For most modern Java backend services, G1 GC is the default choice. Oracle’s Java 25 documentation states that G1 is selected by default on most hardware and operating system configurations. It also recommends starting with G1 defaults, optionally setting a pause-time goal and maximum heap size. G1 is a good fit for:
- Spring Boot APIs
- microservices
- medium-size heaps
- general backend workloads
- systems that need balanced latency and throughput
Why it works well:
- region-based heap
- incremental collection
- predictable pause-time goals
- good default behavior
- less manual tuning than older collectors
The production mindset is to start with G1 unless you have evidence that your latency requirements need something else.
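In practice that often means running with defaults plus, at most, a heap size and a pause-time goal. The values below are illustrative, not recommendations:
java -XX:+UseG1GC -Xmx4g -XX:MaxGCPauseMillis=200 -jar app.jar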
ZGC
ZGC is designed for low-latency applications where long GC pauses are unacceptable. Modern ZGC is generational: it splits the heap into young and old generations so it can collect recently allocated objects separately from long-lived objects. ZGC is interesting for backend services with:
- large heaps
- strict p99/p999 latency targets
- real-time-ish APIs
- highly interactive systems
- services where GC pauses are visible to users
You might consider ZGC when:
- G1 pause times are too high
- old-gen collections affect p99 latency
- heap size is large
- the business cares more about predictable latency than maximum throughput
The trade-off is that ZGC reduces pause times by doing more work concurrently, but that work consumes CPU. So ZGC is not automatically “better”. It may reduce latency while increasing CPU usage.
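Testing it is a one-flag change. On JDK 21 the generational mode is opt-in via an extra flag; on newer JDKs it is the default (heap size is illustrative):
java -XX:+UseZGC -XX:+ZGenerational -Xmx16g -jar app.jar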
Shenandoah
Shenandoah is also a low-pause collector. Its key idea is to reduce pause times by doing more garbage collection work concurrently with the running Java application, including concurrent compaction. It is useful for similar cases as ZGC.
Like ZGC, Shenandoah shifts more work out of stop-the-world pauses and into concurrent execution. The trade-offs are also similar.
How to Choose in Backend Systems
A practical decision model:
- Default service? Use G1
- Typical Spring Boot microservice? Use G1
- No clear GC problem? Stay with G1
- Strict p99/p999 latency target? Test ZGC or Shenandoah
- Large heap with painful pauses? Test ZGC or Shenandoah
- CPU-constrained environment? Be careful with concurrent collectors
- Batch job focused on total throughput? Do not blindly choose low-latency GC
The important point is that the collector choice should be driven by workload and latency requirements, not by hype.
Java Backend Memory Leaks That Look Like GC Problems
Many production “GC problems” are not caused by the garbage collector. They are caused by the application keeping references to objects it no longer needs. The key idea is that GC cannot collect objects that are still reachable. A memory leak is often an object-retention problem, not a garbage-collector problem. In Java, memory leaks usually do not happen because memory is “lost.” They happen because objects are still reachable from somewhere: a static field, a cache, a queue, a thread, a session, a listener, or a framework context.
Unbounded Caches
Caching is one of the most common sources of Java backend leaks. For example:
private final Map<String, UserProfile> cache = new HashMap<>();
If this map has no size limit, expiration policy, or eviction strategy, it can grow forever. This often happens with:
- user profiles
- authorization data
- product catalogs
- API responses
- tenant configuration
- lookup tables
A cache should usually have:
- maximum size
- time-based expiration
- metrics
- eviction policy
Using a library like Caffeine is usually better than building a cache with a raw HashMap.
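A minimal sketch with Caffeine, reusing the UserProfile type from above; the size and TTL are illustrative:
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;

Cache<String, UserProfile> cache = Caffeine.newBuilder()
        .maximumSize(10_000)                      // hard upper bound on entries
        .expireAfterWrite(Duration.ofMinutes(10)) // time-based expiration
        .recordStats()                            // expose hit/miss/eviction metrics
        .build();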
Static Collections
Static collections are dangerous because they live as long as the classloader lives. For example:
import java.util.ArrayList;
import java.util.List;

public class DebugStore {

    private static final List<Object> EVENTS = new ArrayList<>();

    public static void add(Object event) {
        EVENTS.add(event);
    }
}
Every object added to EVENTS remains reachable. This is especially dangerous for:
- debug stores
- temporary registries
- in-memory audit logs
- test utilities accidentally used in production
- static maps of request data
A static reference is effectively a global root. If it grows without bounds, GC cannot help.
Forgotten Listeners and Callbacks
Listeners can leak memory when they are registered but never removed. For example:
eventBus.register(listener);
If the event bus is long-lived and the listener references request, user, or component state, that state may remain alive. Common places are:
- event buses
- message listeners
- application lifecycle hooks
- WebSocket subscriptions
- observer patterns
- reactive streams
- scheduler callbacks
The fix is to unregister listeners when they are no longer needed:
eventBus.unregister(listener);
This matters especially in long-running backend services where leaks accumulate slowly.
ThreadLocal Misuse
ThreadLocal is useful, but dangerous in backend applications because request threads are reused. For example:
private static final ThreadLocal<RequestContext> context = new ThreadLocal<>();

public void handle(RequestContext requestContext) {
    context.set(requestContext);
}
If you do not call remove(), the request context may stay attached to the thread after the request finishes. In servlet containers, executor pools, and async processing, those threads may live for the lifetime of the application. The correct pattern is:
try {
    context.set(requestContext);
    process();
} finally {
    context.remove();
}
Leaks through ThreadLocal are common with:
- security context
- tenant context
- correlation IDs
- request metadata
- large user/session objects
- custom tracing data
The rule is simple: if you set a ThreadLocal in request processing, remove it in a finally block or try-with-resources.
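One way to make the remove() hard to forget is a small AutoCloseable wrapper, so try-with-resources does the cleanup. A hypothetical helper around the RequestContext from the example above:
final class ContextScope implements AutoCloseable {

    private static final ThreadLocal<RequestContext> CONTEXT = new ThreadLocal<>();

    private ContextScope() { }

    static ContextScope open(RequestContext ctx) {
        CONTEXT.set(ctx);
        return new ContextScope();
    }

    static RequestContext current() {
        return CONTEXT.get();
    }

    @Override
    public void close() {
        CONTEXT.remove(); // runs even if request processing throws
    }
}

// usage:
try (ContextScope scope = ContextScope.open(requestContext)) {
    process();
}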
Classloader Leaks
Classloader leaks usually appear in application servers, plugin systems, hot reload environments, or repeated redeployments. A classloader can be retained by:
- static fields
- running threads
- ThreadLocals
- JDBC drivers
- logging frameworks
- custom registries
The result is that old application classes and objects cannot be collected after redeployment. In modern Spring Boot applications packaged as containers, this is less common than in traditional app servers, but it still matters in:
- Tomcat deployments
- application servers
Large Session Objects
Sessions can retain much more memory than expected. Example:
session.setAttribute("cart", largeCartObject);
session.setAttribute("userProfile", fullUserProfile);
session.setAttribute("lastSearchResults", hugeResultList);
This becomes dangerous when:
- sessions are long-lived
- many users are active
- objects stored in session are large
- session replication is enabled
- old session data is not removed
A common mistake is storing full domain objects or large result sets in session instead of storing small identifiers. Prefer:
- user ID instead of full user object
- cart ID instead of full cart graph
- pagination token instead of full search result list
Sessions should be small, intentional, and time-bounded.
ORM Persistence Context Growth
Hibernate/JPA can retain entities longer than expected. Inside a transaction, the persistence context tracks managed entities for dirty checking and identity management. This is fine for small operations:
Beer beer = entityManager.find(Beer.class, id);
But it becomes dangerous in large operations:
List<Beer> beers = beerRepository.findAll();
for (Beer beer : beers) {
    process(beer);
}
The persistence context may retain many entities until the transaction ends. In batch jobs, this can cause old-generation pressure or OutOfMemoryError. Safer approaches may include:
- pagination
- streaming carefully
- smaller transactions
- read-only queries
- clearing the persistence context
- avoiding huge findAll operations
Example in batch processing:
entityManager.flush();
entityManager.clear();
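Combining pagination with clearing, a batch loop might look like this. A sketch assuming a Spring Data repository; the page size is illustrative:
Pageable pageable = PageRequest.of(0, 500);
Page<Beer> batch;
do {
    batch = beerRepository.findAll(pageable);
    batch.forEach(this::process);
    entityManager.flush();  // write pending changes
    entityManager.clear();  // detach processed entities so they become collectible
    pageable = pageable.next();
} while (batch.hasNext());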
Queues Growing Faster Than Consumers
Queues are another common backend leak pattern. Example:
private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
If the queue is unbounded and producers are faster than consumers, memory usage grows continuously. This happens with:
- async event processing
- Kafka/RabbitMQ consumers
- email jobs
- audit logging
- background task queues
- retry queues
Symptoms:
- heap grows during traffic spikes
- old generation increases
- latency gets worse
- consumer lag increases
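The usual mitigation is to bound the queue and make producers handle rejection explicitly. A sketch; the capacity is an illustrative value:
private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>(10_000);

public boolean publish(Event event) {
    // offer() fails fast instead of growing the heap without limit;
    // the caller can drop, block with a timeout, or apply backpressure
    return queue.offer(event);
}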
Spring Boot, Hibernate, Jackson, and GC Pressure
In real Java backend systems, GC pressure rarely comes from one obvious new statement. It usually comes from the ecosystem around the request:
- Spring MVC
- Jackson
- Hibernate/JPA
- Bean Validation
- logging
- DTO mappers
- AOP proxies
- reflection
- metrics/tracing
These tools are productive and valuable, but they also create objects. At production traffic levels, framework allocation becomes part of your performance profile. The point is not to avoid frameworks; it is to understand their runtime cost.
DTO Mapping Overhead
Most backend services use DTOs to separate API models from domain models. For example:
public BeerResponse toResponse(Beer beer) {
    return new BeerResponse(
            beer.getId(),
            beer.getStatus().name(),
            beer.getCustomer().getName(),
            beer.getBreweries().stream()
                    .map(this::toBreweryResponse)
                    .toList()
    );
}
This is clean design, whether written by hand or generated by MapStruct, but it allocates:
- BeerResponse
- BreweryResponse objects
- lists
- strings
- stream pipeline objects
- temporary mapping structures
DTO mapping becomes expensive when:
- responses are large
- mapping happens in hot endpoints
- nested object graphs are deep
- multiple mapping layers exist
- intermediate collections are created
This does not mean DTOs are bad. It means large mappings should be intentional, measured, and paginated where possible.
Jackson Object Creation
JSON serialization and deserialization are major allocation sources in REST APIs. For inbound requests, Jackson may allocate:
- request DTOs
- nested objects
- collections
- strings
- parser buffers
- temporary metadata structures
For outbound responses, it may allocate:
- serialization buffers
- field name strings
- nested DTO traversal structures
- temporary byte/char arrays
Example:
@PostMapping("/search")
public List<BeerResponse> search(@RequestBody SearchRequest request) {
return beerService.search(request);
}
If this endpoint returns thousands of beers, the cost is not only database time. It is also heap pressure from the returned object graph and serialization process. Use pagination, streaming responses when appropriate, response-size limits, and avoid returning unnecessary fields.
Hibernate Persistence Context
Hibernate does not just return objects from the database; it also manages them. Inside a transaction, Hibernate keeps entities in the persistence context for identity tracking and dirty checking. For example:
@Transactional
public void processBeers() {
    List<Beer> beers = beerRepository.findAll();
    for (Beer beer : beers) {
        process(beer);
    }
}
This can retain a large number of entities until the transaction ends. The persistence context may hold:
- entity instances
- entity snapshots
- proxy objects
- collections
- dirty-checking data
- relationship graphs
For small requests, this is fine. For large reads or batch operations, it can create old-generation pressure. Safer patterns:
- pagination
- read-only transactions
- smaller transaction boundaries
- DTO projections
- clearing the persistence context in batches
- avoiding large findAll operations
Example:
entityManager.flush();
entityManager.clear();
The key idea is that a large Hibernate session can keep objects alive longer than your code suggests.
N+1 Queries Can Create Excessive Entities
N+1 queries are usually discussed as a database performance problem, but they are also a memory problem. Example:
List<Beer> beers = beerRepository.findAll();
for (Beer beer : beers) {
    beer.getBreweries().size(); // may trigger lazy loading
}
This can create:
- many SQL queries
- many entity objects
- many Hibernate proxies
- many collections
- large object graphs
- extra persistence context entries
So the cost is not only too many database round trips; it is also too many Java objects allocated and retained. Fixes may include:
- fetch joins
- entity graphs
- DTO projections
- batch fetching
- pagination
- query-specific data loading
A backend engineer should understand both sides: database impact and heap impact.
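For example, a fetch join loads the association in the same query instead of one lazy query per beer. A sketch on the repository from earlier examples:
public interface BeerRepository extends JpaRepository<Beer, Long> {

    // one SQL query loads beers together with their breweries,
    // avoiding N extra queries and N proxy initializations
    @Query("select distinct b from Beer b join fetch b.breweries")
    List<Beer> findAllWithBreweries();
}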
Large Result Sets
This is one of the simplest ways to create GC pressure. Example:
@GetMapping("/beers")
public List<BeerResponse> getBeers() {
return beerRepository.findAll()
.stream()
.map(beerMapper::toResponse)
.toList();
}
This may allocate:
- all beer entities
- related entities
- DTOs
- lists
- strings
- JSON serialization buffers
Large result sets are dangerous because they create memory pressure across multiple layers:
- database driver
- Hibernate
- DTO mapper
- Jackson
- HTTP response buffer
A production-safe API should usually use:
- pagination
- limits
- filters
- cursor-based pagination
- streaming for specific use cases
Logging Full Payloads
Logging is often underestimated. Example:
log.info("Request: {}", request);
log.info("Response: {}", response);
This can allocate large strings, especially when DTOs have generated toString() methods that include nested objects. Problems with full-payload logging:
- large string allocations
- sensitive data exposure
- slower request processing
- larger log volume
- more pressure on logging appenders
- possible async logging queue growth
This is worse when logging happens on every request. Prefer:
- request ID
- user/tenant ID
- endpoint
- status code
- duration
- small business identifiers
- error code
- payload size
Instead of logging the whole object graph, log the useful identifiers.
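For example, a log line built from identifiers stays small and safe (field names are illustrative):
log.info("beer created beerId={} status={} durationMs={}",
        beer.getId(), beer.getStatus().name(), durationMs);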
Validation Frameworks
Bean Validation is convenient and common in Spring Boot APIs. Example:
@PostMapping("/beers")
public BeerResponse create(@Valid @RequestBody CreateBeerRequest request) {
    return beerService.create(request);
}
Validation may allocate:
- constraint violation objects
- property paths
- message templates
- interpolated messages
- metadata lookups
- temporary collections
Usually this is fine; it becomes more relevant when:
- payloads are large
- nested validation is deep
- many invalid requests arrive
- validation runs in hot paths
- custom validators allocate heavily
- error responses include many details
Validation should be clear and useful, but avoid creating huge validation responses for massive payloads.
Reflection and Proxy-Heavy Frameworks
Spring Boot applications use reflection, dynamic proxies, annotations, and generated infrastructure heavily. Examples:
- Spring AOP proxies
- @Transactional proxies
- security proxies
- repository proxies
- reflection-based binding
- annotation scanning
- method interceptors
- metrics/tracing instrumentation
Most of this cost is acceptable and often paid during startup or cached internally. But in request paths, proxies and interceptors can still add allocation and call overhead. This matters especially in:
- very high-QPS endpoints
- deep service call chains
- heavy AOP usage
- reflection-heavy mappers
- dynamic serialization/deserialization
- excessive instrumentation
Again, the point is not to avoid Spring. The point is to know that abstractions are not free.
How This Connects to GC
A typical Spring Boot request may look like this:
- Spring MVC objects
- Jackson request DTOs
- validation objects
- service-layer allocations
- Hibernate entities/proxies
- DTO mapping
- Jackson response serialization
- logging/tracing/metrics
- GC later cleans temporary objects
This is why GC behavior is connected to everyday backend code. If traffic increases, all of these allocations increase too. If payloads get larger, object graphs get larger. If Hibernate loads too much data, old-generation pressure increases. If logs include full payloads, string allocation increases. If queues or sessions retain these objects, they stop being temporary.
Conclusion
Garbage collection issues in Java backend systems are rarely just “GC problems.” More often, they are symptoms of how the application allocates, retains, and processes objects under real production load.
For backend engineers, the important skill is not memorizing every GC algorithm. It is understanding the connection between application code and runtime behavior: request allocation patterns, object lifetime, old-generation growth, p99 latency, container memory limits, and observability data. A strong Java developer knows that GC tuning should come after evidence. Before changing JVM flags, look at allocation rate, GC logs, heap usage after collection, live set growth, thread pools, database pools, and request latency.
The core lesson is simple: GC performance starts in application design, not in JVM flags. If most request-scoped objects die young, caches and queues are bounded, ThreadLocals are cleaned up, large payloads are controlled, and GC metrics are correlated with production behavior, the JVM can do its job efficiently :)