Improve histogram, summary performance under contention by striping observationCount #1794

jack-berg · 2026-01-21T04:47:33Z

Was working on improving the performance of opentelemetry-java metrics under high contention, and realized that the same strategy I identified to help over there helps for the prometheus implementation as well!

The idea here is recognizing that Buffer.observationCount is the bottleneck under contention. In contrast to the other histogram / summary fields, which are LongAdder, Buffer.observationCount is an AtomicLong, which performs much worse than LongAdder under high contention. It's necessary that the type is AtomicLong because the CAS APIs accommodate the two-way communication that the record / collect paths need: signaling that a collection has started, and that all in-flight records have successfully completed (preventing partial writes).
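To make the two-way signaling concrete, here is a minimal sketch of one way a single AtomicLong can carry both a count and a "collection in progress" flag. The class name CollectSignal, its method names, and the choice of the sign bit as the flag are assumptions made for illustration; this is not the Buffer code itself, and it uses getAndAdd / incrementAndGet rather than an explicit compareAndSet loop. A LongAdder could not be used this way, since LongAdder.increment() returns nothing and sum() is not an atomic snapshot.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: one AtomicLong holds the number of started
 * observations (low 63 bits) and a "collection in progress" flag (sign bit).
 */
public class CollectSignal {
    // Adding this value flips the sign bit and leaves the low 63 bits untouched.
    private static final long COLLECTING_BIT = 1L << 63;

    private final AtomicLong observationCount = new AtomicLong();

    /**
     * Record path. The increment and the flag check happen in one atomic
     * step, so a recorder always learns whether it raced with a collection.
     *
     * @return true if a collection is currently in progress
     */
    public boolean recordAndCheckCollecting() {
        return observationCount.incrementAndGet() < 0;
    }

    /**
     * Collect path. Sets the flag and returns how many observations had
     * started before it was set, so the collector can wait for that many
     * to finish writing before it reads the data.
     */
    public long startCollection() {
        return observationCount.getAndAdd(COLLECTING_BIT);
    }

    /** Clears the flag and returns the number of observations started so far. */
    public long endCollection() {
        return observationCount.getAndAdd(COLLECTING_BIT) & ~COLLECTING_BIT;
    }
}
```

In this sketch a recorder that sees true would divert its observation (for example into a side buffer) instead of writing into data that is being read, which is the kind of partial-write protection described above.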

However, we can "have our cake and eat it too" by striping Buffer.observationCount into many instances, such that the contention on any one instance is reduced. This is essentially what LongAdder does under the covers. This implementation stripes it into Runtime.getRuntime().availableProcessors() instances, and uses Thread.currentThread().getId() % stripedObservationCounts.length to select which instance any particular record thread should use.
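The striping itself can be sketched as follows. This is a simplified illustration, not the code in this PR: the class StripedCount, its methods, and the runDemo helper are hypothetical names, and the sketch only counts, leaving out the collect-path signaling. The stripe selection uses the same expression described above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a counter striped across several AtomicLong instances. */
public class StripedCount {
    private final AtomicLong[] stripes;

    /** One stripe per available processor, as described in the PR text. */
    public StripedCount() {
        this(Runtime.getRuntime().availableProcessors());
    }

    public StripedCount(int stripeCount) {
        stripes = new AtomicLong[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new AtomicLong();
        }
    }

    /** Increments the stripe chosen by the calling thread's id and returns that stripe's new value. */
    public long increment() {
        // Thread ids are positive, so the remainder is a valid array index.
        int index = (int) (Thread.currentThread().getId() % stripes.length);
        return stripes[index].incrementAndGet();
    }

    /** Sums all stripes; the result is exact only while no increments are in flight. */
    public long sum() {
        long total = 0;
        for (AtomicLong stripe : stripes) {
            total += stripe.get();
        }
        return total;
    }

    /** Demo helper: runs threadCount threads, each incrementing incrementsPerThread times. */
    public static long runDemo(int threadCount, int incrementsPerThread) {
        StripedCount counter = new StripedCount();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }
}
```

Calling StripedCount.runDemo(8, 1000) returns 8000 once all threads have joined. Because each stripe is still an AtomicLong, any atomic signaling between the record and collect paths would have to be applied to every stripe rather than to a single counter.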

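The performance increase is substantial: prometheusClassic goes from about 5,126 to about 14,624 ops/s (roughly 2.9x), and prometheusNative from about 3,854 to about 7,406 ops/s (roughly 1.9x). Here's the before and after of HistogramBenchmark on my machine (Apple M4 Mac Pro w/ 48 GB RAM):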

Before:

Benchmark                                     Mode  Cnt      Score      Error  Units
HistogramBenchmark.openTelemetryClassic      thrpt   25   1138.465 ±  165.921  ops/s
HistogramBenchmark.openTelemetryExponential  thrpt   25    677.483 ±   28.765  ops/s
HistogramBenchmark.prometheusClassic         thrpt   25   5126.048 ±  153.878  ops/s
HistogramBenchmark.prometheusNative          thrpt   25   3854.323 ±  107.789  ops/s
HistogramBenchmark.simpleclient              thrpt   25  13285.351 ± 1784.506  ops/s

After:

Benchmark                                     Mode  Cnt      Score      Error  Units
HistogramBenchmark.openTelemetryClassic      thrpt   25    925.528 ±   13.744  ops/s
HistogramBenchmark.openTelemetryExponential  thrpt   25    584.404 ±   32.762  ops/s
HistogramBenchmark.prometheusClassic         thrpt   25  14623.971 ± 2117.588  ops/s
HistogramBenchmark.prometheusNative          thrpt   25   7405.672 ±  857.611  ops/s
HistogramBenchmark.simpleclient              thrpt   25  13102.822 ± 3081.096  ops/s

jack-berg force-pushed the stripe-buffer-observation-counts branch from 8cd7e50 to 4c2146c Compare January 21, 2026 14:35

Improve histogram, summary performance under contention by striping o…

ad1e7cb

…bservationCount Signed-off-by: Jack Berg <34418638+jack-berg@users.noreply.github.com>

jack-berg force-pushed the stripe-buffer-observation-counts branch from 4c2146c to ad1e7cb Compare January 21, 2026 14:39