LAST/FIRST/TAKE aggregation should support TEXT type and Scripts#5091
LAST/FIRST/TAKE aggregation should support TEXT type and Scripts#5091LantaoJin wants to merge 9 commits intoopensearch-project:mainfrom
Conversation
…ation Signed-off-by: Lantao Jin <ltjin@amazon.com>
📝 WalkthroughSummary by CodeRabbit
WalkthroughCentralized TopHits construction and parsing; added script-sort and alias handling; enabled TopHits size pushdown; switched field-fetching between Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client Query
participant AA as AggregateAnalyzer
participant PA as PredicateAnalyzer
participant TBuild as TopHitsBuilder
participant OS as OpenSearch API
participant Parser as TopHitsParser
Client->>AA: Request plan for FIRST/LAST/TAKE
AA->>PA: Ask for script sort type when needed
PA-->>AA: Return ScriptSortType
AA->>TBuild: createTopHitsBuilder(size, sort, script, singleCol)
TBuild->>TBuild: Configure sort/_source/fields/size/scriptFields
AA->>OS: Push OpenSearch aggregation (top_hits) request
OS-->>Parser: Return aggregation response (may include _source or fields)
Parser->>Parser: Choose _source vs fields, extract leaf values or maps
Parser-->>Client: Aggregation result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
fetchSource API instead of fetchFields for LAST/FIRST/TAKE aggregationfetchSource API instead of fetchFields for LAST/FIRST/TAKE aggregation on TEXT type
fetchSource API instead of fetchFields for LAST/FIRST/TAKE aggregation on TEXT typeSigned-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAliasFieldAggregationIT.java`:
- Line 171: Remove the leftover debug print System.out.println(actual) from the
test method in CalciteAliasFieldAggregationIT; locate the occurrence (the
System.out.println call printing the variable actual) and delete it (or replace
with an appropriate assertion or logger call if output is required for test
diagnostics) so the test contains no debug println statements.
In
`@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java`:
- Around line 314-381: Add unit tests covering null/empty/invalid inputs for the
new text/script FIRST/LAST/TAKE functionality: create tests alongside
testFirstAggregationOnTextField, testLastAggregationOnTextField,
testFirstLastByGroupOnTextField, testFirstLastWithOtherAggregationsOnTextField
and testFirstLastMixedFields that (1) query rows where the text field (e.g.,
employer/email) contains NULL and assert FIRST/LAST return nulls via
verifySchema/verifyDataRows, (2) run queries that yield empty result sets and
assert schema is correct and verifyDataRows handles zero rows, and (3) run
invalid TAKE sizes (0 and negative) and assert the query fails with the expected
error (use executeQuery and assert exception or error message). Ensure tests use
the same helper methods (executeQuery, verifySchema, verifyDataRows) and include
descriptive names like testFirstWithNullTextField, testLastEmptyResultSet,
testTakeInvalidSize.
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java`:
- Around line 842-848: The sort type is being derived from the whole row type
(helper.rowType) rather than the specific script expression type, so change the
call to getScriptSortType to use the expression's return type for rex.getKey():
obtain the expression type (e.g., from helper.inferScript(rex.getKey())'s
metadata or rex.getKey().getType()) and pass that type into getScriptSortType
instead of helper.rowType, then use that resulting
ScriptSortBuilder.ScriptSortType when calling
builder.sort(SortBuilders.scriptSort(...)); update any variable names
accordingly (referencing getScriptSortType, helper.inferScript(rex.getKey()),
rex.getKey(), builder.sort and ScriptSortBuilder.ScriptSortType).
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java`:
- Around line 477-484: Rename the mistakenly named pattern variable
hotHitsAggBuilder to topHitsAggBuilder in AggPushDownAction where you check
builder instanceof TopHitsAggregationBuilder; update the instanceof declaration
and all subsequent references in that block (e.g., the size() comparisons and
hotHitsAggBuilder.size(size) call) so they use topHitsAggBuilder to match the
TopHitsAggregationBuilder class name.
🧹 Nitpick comments (3)
integ-test/src/test/resources/alias.json (1)
2-6: Consider adding an edge-case record for text/date fields.The new values look realistic and match ISO-8601, but if alias tests need boundary coverage, consider one doc with an empty/whitespace-only text or a non-UTC offset timestamp.
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java (1)
485-486: Update the comment to include TopHits.The comment on line 485 should be updated to reflect that TopHits is now also supported.
📝 Suggested update
- // now we only have Composite, Terms and MultiTerms bucket aggregations, + // now we only have Composite, Terms, MultiTerms bucket aggregations, and TopHits metric aggregation, // add code here when we could support more in the future.opensearch/src/main/java/org/opensearch/sql/opensearch/response/agg/TopHitsParser.java (1)
150-156: Consider handling List type in getLeafValue.The recursive
getLeafValuemethod handles nestedMapvalues but doesn't account forListvalues. If the source contains a list (e.g., multi-value fields), this would return the list itself rather than extracting a leaf value.Is this the intended behavior? If multi-value fields are expected in the source, you may need to handle
Listsimilarly:private Object getLeafValue(Object object) { if (object instanceof Map map) { return getLeafValue(map.values().iterator().next()); } else if (object instanceof List list && !list.isEmpty()) { return getLeafValue(list.get(0)); } else { return object; } }
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAliasFieldAggregationIT.java
Outdated
Show resolved
Hide resolved
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLAggregationIT.java
Show resolved
Hide resolved
...arch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/AggPushDownAction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java`:
- Around line 844-851: The script-based sort added in AggregateAnalyzer (inside
the branch handling rex.getKey() instanceof RexCall || RexLiteral) currently
calls builder.sort(SortBuilders.scriptSort(script, sortType)) without applying
the requested sortOrder; update this branch to apply the same sortOrder used for
keyword sorts by setting the order on the ScriptSortBuilder (use the existing
sortOrder variable) before passing it to builder.sort, so script sorts honor
ascending/descending like the keyword path.
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAliasFieldAggregationIT.java (1)
158-193: Consider adding schema assertions for the new alias aggregation mix.
Data-row checks alone won’t catch type regressions across text/alias/numeric outputs; averifySchemaInOrder(...)would tighten coverage.
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
Show resolved
Hide resolved
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Description
This PR includes following fixings:
LAST/FIRST/TAKEaggregations onTEXTtype should use root name:fetchFieldcan be used onTEXTtype, but only with root namefetchScriptAPI instead offetchFieldforLAST/FIRST/TAKEaggregations on Script:fetchFieldcannot be used in script pushdown, for example| eval new_name = upper(name) | stats last(new_name)LAST/FIRST/TAKEaggregations onALIAStype should use alias name:fetchSourcecannot be used in Alias type,fetchFieldwill dedupe the list if we use its original name.Pushdown LIMIT to TopHits Aggregation
Use
fetchFieldinstead offetchSourceas possiblefetchFieldcan return good format ondatetypeRelated Issues
Resolves #5086
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.