@gniadeck commented Dec 14, 2025

Introduces a Prometheus histogram metric, jvm_gc_duration_seconds, that records JVM garbage collection pause durations. Existing metrics provide limited visibility into long GC pauses, which makes latency spikes hard to detect. This change reuses the GC notifications that are already registered (as in JvmMemoryPoolAllocationMetrics) to capture pause durations without additional instrumentation.

The histogram uses buckets of 0.01, 0.1, 1, and 10 seconds and includes labels for GC name, action, and cause, which allows monitoring of both short and long GC pauses. This addresses the lack of visibility highlighted in community discussions such as this one. The bucket boundaries follow the OpenTelemetry semantic conventions spec.

Example result:

# HELP jvm_gc_duration_seconds JVM GC pause duration histogram.
# TYPE jvm_gc_duration_seconds histogram
jvm_gc_duration_seconds_bucket{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation",le="0.01"} 806
jvm_gc_duration_seconds_bucket{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation",le="0.1"} 806
jvm_gc_duration_seconds_bucket{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation",le="1.0"} 806
jvm_gc_duration_seconds_bucket{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation",le="10.0"} 806
jvm_gc_duration_seconds_bucket{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation",le="+Inf"} 806
jvm_gc_duration_seconds_count{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation"} 806
jvm_gc_duration_seconds_sum{action="end of minor GC",cause="G1 Evacuation Pause",gc="G1 Young Generation"} 0.7360000000000005
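
Editorial sketch: the mechanism described above (GC notifications feeding a histogram with the 0.01, 0.1, 1, 10 second buckets) can be illustrated with plain JDK classes. This is not the code from the PR and not the client_java API; the class `GcPauseHistogramSketch` and its methods are hypothetical stand-ins, labels are left out for brevity, and only the JMX calls (`GarbageCollectorMXBean`, `GarbageCollectionNotificationInfo`, `GcInfo.getDuration()`) are real JDK APIs.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.DoubleAdder;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

/** Hypothetical stand-in for a Prometheus histogram; not the client_java API. */
class GcPauseHistogramSketch {
    // Upper bounds in seconds, as listed in the PR description.
    static final double[] BUCKETS = {0.01, 0.1, 1.0, 10.0};

    // One counter per bucket plus a final slot for +Inf.
    private final AtomicLongArray counts = new AtomicLongArray(BUCKETS.length + 1);
    private final DoubleAdder sum = new DoubleAdder();

    /** Counts one observation in the first bucket whose bound is >= seconds. */
    void observe(double seconds) {
        int i = 0;
        while (i < BUCKETS.length && seconds > BUCKETS[i]) {
            i++;
        }
        counts.incrementAndGet(i);
        sum.add(seconds);
    }

    /** Returns cumulative counts, like the le="..." series in the exposition format. */
    long[] cumulativeCounts() {
        long[] out = new long[BUCKETS.length + 1];
        long running = 0;
        for (int i = 0; i < out.length; i++) {
            running += counts.get(i);
            out[i] = running;
        }
        return out;
    }

    double sum() {
        return sum.sum();
    }

    /** Subscribes to GC notifications on every collector bean that emits them. */
    void register() {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(bean instanceof NotificationEmitter)) {
                continue;
            }
            NotificationListener listener = (notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo.from(
                        (CompositeData) notification.getUserData());
                // getGcName(), getGcAction() and getGcCause() would supply the label values.
                // GcInfo.getDuration() is in milliseconds; convert to seconds.
                observe(info.getGcInfo().getDuration() / 1000.0);
            };
            ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
        }
    }
}
```

Because the counts are cumulative, a run in which every pause is shorter than 10 ms reports the same value in every bucket, which is the pattern in the example result above.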

@gniadeck force-pushed the gniadeck/gc-pause-histogram branch from 0dabc29 to 208482b on December 14, 2025 13:28
@zeitlinger
Member

Would it make sense to implement it exactly like the OTel semantic conventions?

@gniadeck
Author

hi, to be honest I used the sem conv just for a reference - I wasn't sure what buckets to set so the Histogram is useful for all of the library's users, and it turns out that there was already an exact same discussion in otel community :)

This PR is motivated by a real need to alert when a long GC cycle happens and to improve visibility into STW pause durations. I was considering simply implementing this metric on my side, but I thought it might be worth suggesting it as a change to the library, so that others could benefit too :D

@zeitlinger
Member

> hi, to be honest I used the sem conv just for a reference - I wasn't sure what buckets to set so the Histogram is useful for all of the library's users, and it turns out that there was already an exact same discussion in otel community :)

I think it's a great addition to this library!

I know that OTel spent a lot of time figuring out how to name metrics and labels - so if at all possible, I'd like to do it the same way, so that users can reuse queries and dashboards.

You can try pointing your favorite AI tool at the semantic conventions to get most of the work done.

@gniadeck
Author

I'm happy you also see value in this :) I refactored the metric so it aligns with the OTel conventions - I changed the name to jvm_gc_duration, the labels to jvm_gc_action, jvm_gc_name, and jvm_gc_cause, and had to remove the unit declaration. This is how it looks now:

# HELP jvm_gc_duration Duration of JVM garbage collection actions.
# TYPE jvm_gc_duration histogram
jvm_gc_duration_bucket{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation",le="0.01"} 208
jvm_gc_duration_bucket{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation",le="0.1"} 208
jvm_gc_duration_bucket{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation",le="1.0"} 208
jvm_gc_duration_bucket{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation",le="10.0"} 208
jvm_gc_duration_bucket{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation",le="+Inf"} 208
jvm_gc_duration_count{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation"} 208
jvm_gc_duration_sum{jvm_gc_action="end of minor GC",jvm_gc_cause="G1 Evacuation Pause",jvm_gc_name="G1 Young Generation"} 0.22500000000000017
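
Editorial sketch: as a quick reading of the sample above, the mean pause can be derived from the `_sum` and `_count` series. The helper class `GcDurationSummary` below is hypothetical and only illustrates the arithmetic; it assumes the values are still recorded in seconds, as in the earlier version of the metric.

```java
import java.util.Locale;

/** Hypothetical helper for reading histogram samples; not part of the PR. */
class GcDurationSummary {
    /** Mean observation: _sum divided by _count (0 when nothing was observed). */
    static double meanSeconds(double sum, long count) {
        return count == 0 ? 0.0 : sum / count;
    }

    /** Share of observations at or below a bucket bound, from cumulative counts. */
    static double fractionAtOrBelow(long bucketCount, long totalCount) {
        return totalCount == 0 ? 0.0 : (double) bucketCount / totalCount;
    }

    public static void main(String[] args) {
        // Values copied from the jvm_gc_duration sample output above.
        double mean = meanSeconds(0.22500000000000017, 208);
        double under10ms = fractionAtOrBelow(208, 208);
        System.out.printf(Locale.ROOT, "mean pause: %.2f ms%n", mean * 1000);
        System.out.printf(Locale.ROOT, "share <= 10 ms: %.0f%%%n", under10ms * 100);
    }
}
```

For this sample the mean works out to roughly 1.08 ms, and every one of the 208 collections falls into the le="0.01" bucket.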

Signed-off-by: gniadeck <77535280+gniadeck@users.noreply.github.com>
@gniadeck force-pushed the gniadeck/gc-pause-histogram branch from ffd3ad4 to 042e03c on December 17, 2025 07:51