-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Support upsert semantics for OFFLINE dimension tables #17536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Support upsert semantics for OFFLINE dimension tables #17536
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds upsert semantics to offline dimension tables, allowing later segments to deterministically overwrite earlier rows with the same primary key instead of producing duplicates. The implementation includes configuration validation to prevent conflicting settings.
Changes:
- Added
enableUpsertflag toDimensionTableConfigwith JSON-aware and backwards-compatible constructors - Modified
DimensionTableDataManagerto sort segments by creation time when upsert is enabled and conditionally allow duplicate keys - Added validation in
TableConfigUtilsto reject configurations that enable bothenableUpsertanderrorOnDuplicatePrimaryKey
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/config/table/DimensionTableConfig.java | Adds _enableUpsert field with JSON-aware constructor and backwards-compatible overloads |
| pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java | Adds validation method to reject conflicting upsert and error-on-duplicate settings |
| pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/TableConfigUtilsTest.java | Tests validation of invalid dimension table configuration combining upsert and error-on-duplicate |
| pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java | Implements upsert logic with segment sorting and conditional duplicate-key error handling |
| pinot-core/src/test/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManagerTest.java | Adds test helper and tests for upsert behavior across segments, updates existing tests with new parameter |
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
Outdated
Show resolved
Hide resolved
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/TableConfigUtils.java
Outdated
Show resolved
Hide resolved
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17536 +/- ##
============================================
+ Coverage 63.21% 63.22% +0.01%
Complexity 1476 1476
============================================
Files 3170 3170
Lines 189508 189552 +44
Branches 28997 29002 +5
============================================
+ Hits 119789 119845 +56
+ Misses 60417 60399 -18
- Partials 9302 9308 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
|
|
||
| private File createSegmentFromCsv(File csvFile, TableConfig tableConfig, Schema schema, String segmentName) | ||
| throws Exception { | ||
| File tableDataDir = new File(TEMP_DIR, OFFLINE_TABLE_NAME + "_upsert"); |
Copilot
AI
Jan 21, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hardcoded suffix '_upsert' in the directory name could cause collisions if createSegmentFromCsv is called multiple times in the same test or across tests. Consider using a unique directory name per invocation, such as appending the segment name or a timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Motivation
Description
enableUpsertflag toDimensionTableConfigwith a JSON-aware constructor, backwards-compatible constructors, and anisEnableUpsert()getter inpinot-spi.enableUpsertintoDimensionTableDataManagerby adding_enableUpsert, reading it fromDimensionTableConfig, and changing duplicate-key handling to only throw when!_enableUpsert && _errorOnDuplicatePrimaryKeyin both fast-lookup and memory-optimized loading paths.sortSegmentsForUpsert(...)which sorts byindexCreationTimethensegmentNameso later segments overwrite earlier ones.validateDimensionTableConfig(...)inTableConfigUtils.validateto reject configs that enable bothenableUpsertanderrorOnDuplicatePrimaryKeysimultaneously.DimensionTableDataManagerTest(includingtestUpsertOverwritesDuplicatePrimaryKeyand newtestUpsertDedupesAcrossSegments) and add a validation test inTableConfigUtilsTest; also addcreateSegmentFromCsvtest helper and update existing tests to pass the new flag.Testing
DimensionTableDataManagerTest#testUpsertOverwritesDuplicatePrimaryKeyandDimensionTableDataManagerTest#testUpsertDedupesAcrossSegments, plus an invalid-config check inTableConfigUtilsTest; these cover overwrite semantics and invalid config detection.mvnruns or CI) were executed as part of this change.