Split out cumulative vs. delta storage #8015
Conversation
      aggregatorHolder.activeRecordingThreads.addAndGet(-2);
    }
  } while (true);
}
This is the key work that cumulative instruments get to avoid.
On each collection, delta instruments swap out the ConcurrentHashMap which holds the AggregatorHandle instances corresponding to each series. To prevent lost writes in sequences like the following, delta requires additional coordination between collection and recording threads (a minimal sketch of one such scheme follows the list):
- T1: record thread grabs instance 1 of ConcurrentHashMap
- T2: collect thread obtains instance 1 of ConcurrentHashMap, and collects from all series
- T3: collect thread updates the ConcurrentHashMap to a new instance 2
- T4: record thread records to instance 1 of ConcurrentHashMap, which is never seen by any current or future collect
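A minimal sketch of one way to arrange that coordination, loosely mirroring the `activeRecordingThreads` counter visible in the diff above. The class and method names (and the exact protocol) are illustrative assumptions, not the PR's actual implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: record threads register on a per-holder counter before writing, and
// the collect thread swaps in a fresh holder and then waits for in-flight recordings
// against the old holder to drain, so the T4 lost write above cannot happen.
class DeltaCoordinationSketch {

  static final class Holder {
    final ConcurrentHashMap<String, LongAdder> map = new ConcurrentHashMap<>();
    final AtomicInteger activeRecordingThreads = new AtomicInteger();
  }

  private final AtomicReference<Holder> active = new AtomicReference<>(new Holder());

  void record(String series, long value) {
    do {
      Holder holder = active.get();
      holder.activeRecordingThreads.incrementAndGet();
      try {
        if (holder != active.get()) {
          continue; // a collection swapped the holder out from under us (T3); retry
        }
        // If a collection swaps the holder after this check, it still has to wait for
        // our decrement before reading the old map, so this write is never lost (T4).
        holder.map.computeIfAbsent(series, k -> new LongAdder()).add(value);
        return;
      } finally {
        holder.activeRecordingThreads.decrementAndGet();
      }
    } while (true);
  }

  ConcurrentHashMap<String, LongAdder> collect() {
    Holder previous = active.getAndSet(new Holder()); // T3: install instance 2
    // Wait for recordings that registered against the previous holder to finish.
    while (previous.activeRecordingThreads.get() > 0) {
      Thread.onSpinWait();
    }
    return previous.map; // every write that targeted instance 1 is now visible
  }
}
```

Cumulative storage skips all of this because it never replaces the map between collections.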
In prom client java, we have 2 sets of variables (for the map in this case) and switch between them using an atomic. I wonder if we could apply this pattern here as well.
It's quite involved, but you can ask the AI to explain to you how it works: https://github.com/prometheus/client_java/blob/main/prometheus-metrics-core/src/main/java/io/prometheus/metrics/core/metrics/Buffer.java
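For readers who don't want to dig through Buffer.java, here is a rough, hypothetical sketch of the general idea described above (two structures selected by an atomic whose low bit names the active one); it is not the actual client_java implementation and it simplifies the writer-draining step:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

class DoubleBufferSketch {
  private final ConcurrentHashMap<String, LongAdder> bufferA = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<String, LongAdder> bufferB = new ConcurrentHashMap<>();

  // Bit 0 selects the active buffer (0 = A, 1 = B); the remaining bits hold
  // (number of in-flight writers) * 2.
  private final AtomicLong state = new AtomicLong();

  void record(String series, long value) {
    // Register as a writer and learn the active buffer in a single atomic step.
    long s = state.addAndGet(2);
    ConcurrentHashMap<String, LongAdder> active = (s & 1L) == 0 ? bufferA : bufferB;
    try {
      active.computeIfAbsent(series, k -> new LongAdder()).add(value);
    } finally {
      state.addAndGet(-2); // deregister
    }
  }

  ConcurrentHashMap<String, LongAdder> collect() {
    long previous = flipActiveBit(); // new writers now target the other buffer
    ConcurrentHashMap<String, LongAdder> drained = (previous & 1L) == 0 ? bufferA : bufferB;
    // Simplification: wait for ALL in-flight writers (old and new) to drain before
    // reading; the real pattern only needs to wait for writers on the old buffer.
    while ((state.get() >>> 1) != 0) {
      Thread.onSpinWait();
    }
    // Caller reads `drained` and must clear it before the active bit flips back to it.
    return drained;
  }

  private long flipActiveBit() {
    while (true) {
      long s = state.get();
      if (state.compareAndSet(s, s ^ 1L)) {
        return s; // previous state; its low bit identifies the buffer to drain
      }
    }
  }
}
```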
I'm familiar with how the prometheus client works and will be incrementally applying techniques where useful / applicable - I'm trying to optimize the perf over there as well. The differences between prom and otel boil down to otel needing to accommodate all combinations of memory mode (immutable, reusable) and temporality (cumulative, delta), whereas prom client java only has the equivalent of immutable memory mode and cumulative temporality.
Codecov Report
❌ Patch coverage is

Additional details and impacted files:

@@            Coverage Diff            @@
##              main    #8015    +/-  ##
=========================================
  Coverage    90.17%   90.18%
+ Complexity    7479     7456     -23
=========================================
  Files          834      834
  Lines        22540    22569     +29
  Branches      2236     2241      +5
=========================================
+ Hits         20326    20353     +27
+ Misses        1511     1510      -1
- Partials       703      706      +3

☔ View full report in Codecov by Sentry.
Pull request overview
Refactors synchronous metric storage to separate cumulative vs. delta aggregation temporality implementations, aiming to simplify logic and improve performance under contention (especially for cumulative).
Changes:
- Replaced the monolithic `DefaultSynchronousMetricStorage` implementation with distinct delta and cumulative implementations selected via a factory method.
- Updated `SynchronousMetricStorage.create` to delegate to the new `DefaultSynchronousMetricStorage.create` factory.
- Updated unit tests to construct storage via the factory and adjust internal assertions accordingly.
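Below is a hedged sketch of the factory-split shape this describes; the type and method names are assumptions for illustration and may not match the PR exactly:

```java
enum AggregationTemporality { DELTA, CUMULATIVE } // stand-in for the SDK enum
enum MemoryMode { IMMUTABLE_DATA, REUSABLE_DATA } // stand-in for the SDK enum

abstract class SynchronousStorageSketch {

  // One factory decision replaces scattered
  // "if (aggregationTemporality == x && memoryMode == y)" branches in a single class.
  static SynchronousStorageSketch create(
      AggregationTemporality temporality, MemoryMode memoryMode) {
    return temporality == AggregationTemporality.DELTA
        ? new DeltaStorage(memoryMode)
        : new CumulativeStorage(memoryMode);
  }

  abstract void record(long value);

  static final class DeltaStorage extends SynchronousStorageSketch {
    DeltaStorage(MemoryMode memoryMode) {}

    @Override
    void record(long value) {
      // Delta-specific path: swap-and-drain coordination, handle pooling, etc.
    }
  }

  static final class CumulativeStorage extends SynchronousStorageSketch {
    CumulativeStorage(MemoryMode memoryMode) {}

    @Override
    void record(long value) {
      // Cumulative-specific path: one long-lived series map, no swap coordination.
    }
  }
}
```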
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/DefaultSynchronousMetricStorage.java | Splits storage logic into delta vs. cumulative implementations and introduces a factory constructor path. |
| sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/state/SynchronousMetricStorage.java | Switches to using the new storage factory rather than directly instantiating the default storage. |
| sdk/metrics/src/test/java/io/opentelemetry/sdk/metrics/internal/state/SynchronousMetricStorageTest.java | Updates tests to use the new factory and adapts assertions around pooling/reset behavior. |
assertThat(storage)
    .extracting("aggregatorHandlePool", as(collection(AggregatorHandle.class)))
    .hasSize(0);
Copilot AI (Jan 27, 2026)
These assertions reach into a private implementation detail via reflective field extraction ("aggregatorHandlePool"). This is brittle (renames/refactors won’t be caught by the compiler) and leads to repeated boilerplate throughout the test. Consider reintroducing a small test hook on DefaultSynchronousMetricStorage (e.g., a package-private/VisibleForTesting method that returns the pool size or an optional pool) so tests can assert pooling behavior without relying on reflection, or alternatively assert pooling indirectly via createHandle() call counts and remove the pool-size checks.
The aggregatorHandlePool field now only exists on the delta storage, and so tests would need to cast the storage to the delta implementation prior to invoking the visible-for-testing method. I don't want to move aggregatorHandlePool to the base class DefaultSynchronousMetricStorage because it's not a concern for cumulative.
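For context, a self-contained illustration of the two test styles being weighed here; every type name below (HandleForTest, DeltaStorageForTest) is hypothetical and exists only for this sketch:

```java
import static org.assertj.core.api.Assertions.as;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.InstanceOfAssertFactories.collection;

import java.util.ArrayDeque;
import java.util.Queue;

class PoolAssertionSketch {

  static final class HandleForTest {}

  static final class DeltaStorageForTest {
    private final Queue<HandleForTest> aggregatorHandlePool = new ArrayDeque<>();

    // Option A: a test-visible accessor, checked by the compiler, but callers must
    // hold (or cast to) the delta type because cumulative storage has no pool.
    int aggregatorHandlePoolSize() {
      return aggregatorHandlePool.size();
    }
  }

  public static void main(String[] args) {
    Object storage = new DeltaStorageForTest();

    // Option A: cast + accessor.
    assertThat(((DeltaStorageForTest) storage).aggregatorHandlePoolSize()).isZero();

    // Option B: reflective field extraction, as in the current tests; it works
    // against the common type but silently breaks if the field is renamed.
    assertThat(storage)
        .extracting("aggregatorHandlePool", as(collection(HandleForTest.class)))
        .hasSize(0);
  }
}
```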
Split our `DefaultSynchronousMetricStorage` into delta and cumulative implementations. This serves two purposes:

1. It removes the `if (aggregationTemporality == x && memoryMode == y) {..}` style special handling. It's hard to keep it all straight when it's all mashed together.
2. It improves performance, particularly for cumulative instruments under contention.

Before (From #8000):
After:
I calculate a ~3x improvement for cumulative instruments under contention (i.e. the 4-thread cases) and a more modest ~1.2x improvement for cumulative instruments without contention (i.e. the 1-thread cases).
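For intuition about where the cumulative win comes from, here is a hypothetical sketch of the simpler cumulative record path (contrast it with the delta coordination sketch earlier in the thread); it is not the PR's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class CumulativeStorageSketch {
  // One long-lived map for the lifetime of the stream; collections read it in place
  // instead of swapping it out, so no recording/collection handshake is needed.
  private final ConcurrentHashMap<String, LongAdder> series = new ConcurrentHashMap<>();

  void record(String attributes, long value) {
    series.computeIfAbsent(attributes, k -> new LongAdder()).add(value);
  }

  long collect(String attributes) {
    LongAdder adder = series.get(attributes);
    return adder == null ? 0 : adder.sum(); // running total since the start of the stream
  }
}
```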