Support fetch_size API for PPL #5109
Conversation
📝 Walkthrough (Summary by CodeRabbit)

Walkthrough: adds PPL changes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant REST
    participant PPLService
    participant Request
    participant Builder
    participant Planner
    Client->>REST: POST PPL request (JSON body and/or URL) with fetch_size
    REST->>REST: if url fetch_size && body lacks it, inject into payload
    REST->>PPLService: create/forward PPLQueryRequest
    PPLService->>Request: parse jsonContent
    Request-->>PPLService: getFetchSize()
    PPLService->>Builder: plan(context with fetchSize)
    Builder->>Builder: build AST into rawPlan
    alt fetchSize > 0
        Builder->>Builder: wrap rawPlan with Head(fetchSize)
    end
    Builder->>Planner: finalize plan (addSelectAll, explain/exec)
    Planner->>Client: return results limited by Head (no cursor)
```
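The REST-layer step in the diagram (a `fetch_size` URL parameter injected into the payload only when the body lacks one) can be sketched as follows. This is a hypothetical illustration, not the plugin's actual code: the helper name and the use of plain `Map`s instead of the real JSON/REST types are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the REST-layer step above: a fetch_size URL
// parameter is copied into the JSON payload only when the body does not
// already carry one, so a body value always takes precedence.
public class FetchSizeInjection {
    static Map<String, Object> inject(Map<String, Object> body, Map<String, String> urlParams) {
        Map<String, Object> payload = new HashMap<>(body);
        String urlValue = urlParams.get("fetch_size");
        if (urlValue != null && !payload.containsKey("fetch_size")) {
            payload.put("fetch_size", Integer.parseInt(urlValue));
        }
        return payload;
    }

    public static void main(String[] args) {
        Map<String, Object> body = new HashMap<>();
        body.put("query", "source=accounts");
        Map<String, String> params = new HashMap<>();
        params.put("fetch_size", "10");
        // URL parameter is injected because the body lacks fetch_size
        System.out.println(FetchSizeInjection.inject(body, params).get("fetch_size")); // 10

        body.put("fetch_size", 3);
        // Body value wins once present
        System.out.println(FetchSizeInjection.inject(body, params).get("fetch_size")); // 3
    }
}
```

The same precedence rule is what the docs later describe: "If both are provided, the JSON body value takes precedence."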
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
Force-pushed from 688ea8d to 6470b8e
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/ast/statement/Query.java`:
- Around line 27-31: Add a JavaDoc comment to the public constructor
Query(UnresolvedPlan plan, int fetchSize, QueryType queryType) in class Query
describing the constructor purpose and include `@param` tags for each parameter
(plan, fetchSize, queryType) and an optional `@see` or `@throws` if applicable;
place the comment immediately above the constructor signature to follow project
documentation guidelines.
In `@ppl/src/main/java/org/opensearch/sql/ppl/domain/PPLQueryRequest.java`:
- Around line 106-116: The getFetchSize method currently accepts negative
fetch_size values via jsonContent.optInt(FETCH_SIZE_FIELD, 0); add validation to
reject negative values: after reading fetchSize, if fetchSize < 0 throw an
IllegalArgumentException with a clear message (e.g., "fetch_size must be
non-negative"), and keep the existing upper-bound check against MAX_FETCH_SIZE;
ensure you reference FETCH_SIZE_FIELD, MAX_FETCH_SIZE and jsonContent.optInt in
your change.
In `@ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java`:
- Around line 60-94: Add a test that verifies an explicit JSON null fetch_size
returns 0: in PPLQueryRequestTest create a test method (e.g.,
testGetFetchSizeHandlesExplicitNull) that builds a JSONObject with
"query":"source=t" and sets "fetch_size" to JSONObject.NULL (or inserts a
null-valued field), construct a PPLQueryRequest with that JSON, and assert that
request.getFetchSize() == 0; reference PPLQueryRequest and getFetchSize to
locate the behavior under test.
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1)
2551-2555: Use EXPLAIN_API_ENDPOINT constant to avoid hardcoded path.
Keeps endpoint usage consistent with other tests and reduces string duplication.

♻️ Suggested refactor

```diff
+import static org.opensearch.sql.plugin.rest.RestPPLQueryAction.EXPLAIN_API_ENDPOINT;
 ...
-    Request request =
-        new Request(
-            "POST",
-            String.format(
-                "/_plugins/_ppl/_explain?format=%s&mode=%s", Format.YAML, ExplainMode.STANDARD));
+    Request request =
+        new Request(
+            "POST",
+            String.format(
+                EXPLAIN_API_ENDPOINT + "?format=%s&mode=%s", Format.YAML, ExplainMode.STANDARD));
```
```java
public int getFetchSize() {
  if (jsonContent == null) {
    return 0;
  }
  int fetchSize = jsonContent.optInt(FETCH_SIZE_FIELD, 0);
  if (fetchSize > MAX_FETCH_SIZE) {
    throw new IllegalArgumentException(
        String.format(
            Locale.ROOT, "fetch_size must be less than or equal to %d", MAX_FETCH_SIZE));
  }
  return fetchSize;
```
Validate fetch_size lower bound (non-negative).
Line 110 allows negative values, which can silently bypass intended limits and produce confusing behavior. Please reject negative inputs.
🔧 Proposed fix

```diff
   int fetchSize = jsonContent.optInt(FETCH_SIZE_FIELD, 0);
+  if (fetchSize < 0) {
+    throw new IllegalArgumentException("fetch_size must be non-negative");
+  }
   if (fetchSize > MAX_FETCH_SIZE) {
     throw new IllegalArgumentException(
         String.format(
             Locale.ROOT, "fetch_size must be less than or equal to %d", MAX_FETCH_SIZE));
   }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
public int getFetchSize() {
  if (jsonContent == null) {
    return 0;
  }
  int fetchSize = jsonContent.optInt(FETCH_SIZE_FIELD, 0);
  if (fetchSize < 0) {
    throw new IllegalArgumentException("fetch_size must be non-negative");
  }
  if (fetchSize > MAX_FETCH_SIZE) {
    throw new IllegalArgumentException(
        String.format(
            Locale.ROOT, "fetch_size must be less than or equal to %d", MAX_FETCH_SIZE));
  }
  return fetchSize;
}
```
ppl/src/test/java/org/opensearch/sql/ppl/domain/PPLQueryRequestTest.java (resolved)
```java
private static final String DEFAULT_PPL_PATH = "/_plugins/_ppl";
private static final String FETCH_SIZE_FIELD = "fetch_size";
private static final int MAX_FETCH_SIZE = 10000;
```
suggestion: This limit could probably be based on the index.max_result_window setting. https://docs.opensearch.org/latest/install-and-configure/configuring-opensearch/index-settings/#dynamic-index-level-index-settings
If we have this 10k limit artificially defined for our plugin elsewhere, we should reuse that constant instead of making a new one (it might just be the max size setting).
Good catch. Removed the hardcoded MAX_FETCH_SIZE = 10000 — the effective cap is already enforced by LogicalSystemLimit, which uses the plugins.query.size_limit cluster setting. This way it also respects custom configurations.
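A minimal sketch of the accessor shape this change describes. Names and the `Integer` parameter are illustrative stand-ins (the real method reads `jsonContent`); the point is that no hardcoded upper bound remains, since `LogicalSystemLimit` enforces `plugins.query.size_limit` downstream.

```java
// Illustrative sketch of the accessor after the change described above:
// no hardcoded upper bound; the cluster-level plugins.query.size_limit
// setting (enforced by LogicalSystemLimit) caps the result size downstream.
public class FetchSizeAccessor {
    static int getFetchSize(Integer raw) {
        int fetchSize = (raw == null) ? 0 : raw; // 0 means "use system default"
        if (fetchSize < 0) {
            throw new IllegalArgumentException("fetch_size must be non-negative");
        }
        return fetchSize;
    }
}
```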
```java
 * @return fetch_size value from request, or 0 if not specified (meaning use system default)
 * @throws IllegalArgumentException if fetch_size exceeds MAX_FETCH_SIZE (10000)
 */
public int getFetchSize() {
```
suggestion: it might also be worth supporting a `size` URL parameter?
Similar to how core lets you supply both ?size=n or "size": n in requests. Also similar to how we support specifying response formats in both the URL and the body.
Added. fetch_size is now supported as a URL parameter (e.g., POST /_plugins/_ppl?fetch_size=5)
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@integ-test/src/test/java/org/opensearch/sql/ppl/FetchSizeIT.java`:
- Around line 107-174: Add boundary and invalid-input tests to FetchSizeIT:
create three new `@Test` methods using executeQueryWithFetchSize/executeQuery —
one that calls executeQueryWithFetchSize(String.format("source=%s",
TEST_INDEX_BANK), 10000) and asserts behavior when fetch_size==10000 (e.g.,
returns all available docs or capped at 10000 using
result.getJSONArray("datarows").length()), one that calls with fetch_size
greater than max (e.g., 10001) and asserts an error/validation response (check
for a non-2xx response or error field in returned JSONObject), and one that
calls with a negative fetch_size (e.g., -1) and asserts an error/validation
response; reference executeQueryWithFetchSize, executeQuery,
TEST_INDEX_BANK/TEST_INDEX_ACCOUNT and ensure each test is annotated with `@Test`
and checks the JSONObject for the expected success or error condition.
```java
@Test
public void testFetchSizeZeroReturnsAllResults() throws IOException {
  // fetch_size=0 should be treated as "no limit" (use system default)
  JSONObject result = executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), 0);
  JSONArray dataRows = result.getJSONArray("datarows");
  // Bank index has 7 documents
  assertEquals(7, dataRows.length());
}

@Test
public void testFetchSizeLargerThanDataset() throws IOException {
  // When fetch_size is larger than the dataset, return all available results
  JSONObject result =
      executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), 1000);
  JSONArray dataRows = result.getJSONArray("datarows");
  // Bank index has 7 documents, so we should get 7, not 1000
  assertEquals(7, dataRows.length());
}

@Test
public void testFetchSizeOne() throws IOException {
  JSONObject result =
      executeQueryWithFetchSize(
          String.format("source=%s | fields firstname", TEST_INDEX_ACCOUNT), 1);
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(1, dataRows.length());
}

@Test
public void testFetchSizeWithStats() throws IOException {
  // Stats aggregation - fetch_size should still apply to aggregation results
  JSONObject result =
      executeQueryWithFetchSize(
          String.format("source=%s | stats count() by gender", TEST_INDEX_ACCOUNT), 100);
  JSONArray dataRows = result.getJSONArray("datarows");
  // Stats by gender should return 2 rows (M and F)
  assertEquals(2, dataRows.length());
}

@Test
public void testFetchSizeWithHead() throws IOException {
  // Both head command and fetch_size - the smaller limit should win
  // head 3 limits to 3, fetch_size 10 would allow 10, so we get 3
  JSONObject result =
      executeQueryWithFetchSize(
          String.format("source=%s | head 3 | fields firstname", TEST_INDEX_ACCOUNT), 10);
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(3, dataRows.length());
}

@Test
public void testFetchSizeSmallerThanHead() throws IOException {
  // fetch_size smaller than head - fetch_size should further limit
  // head 100 would return 100, but fetch_size 5 limits to 5
  JSONObject result =
      executeQueryWithFetchSize(
          String.format("source=%s | head 100 | fields firstname", TEST_INDEX_ACCOUNT), 5);
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(5, dataRows.length());
}

@Test
public void testWithoutFetchSizeReturnsDefaultBehavior() throws IOException {
  // Without fetch_size, should return results up to system default
  JSONObject result = executeQuery(String.format("source=%s", TEST_INDEX_BANK));
  JSONArray dataRows = result.getJSONArray("datarows");
  assertEquals(7, dataRows.length());
}
```
Add max/invalid fetch_size coverage (negative and > max).
Current tests cover 0/1 and “larger than dataset,” but don’t exercise the max bound (10,000) or invalid inputs. Add explicit tests for fetch_size = 10000, > 10000, and negative values to satisfy boundary/error coverage.
✅ Suggested additions

```diff
@@
 import org.opensearch.client.Request;
 import org.opensearch.client.RequestOptions;
 import org.opensearch.client.Response;
+import org.opensearch.client.ResponseException;
@@
   public void testFetchSizeLargerThanDataset() throws IOException {
     // When fetch_size is larger than the dataset, return all available results
     JSONObject result =
-        executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), 1000);
+        executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), 10000);
     JSONArray dataRows = result.getJSONArray("datarows");
     // Bank index has 7 documents, so we should get 7, not 1000
     assertEquals(7, dataRows.length());
   }
+
+  @Test
+  public void testFetchSizeExceedsMaxReturnsError() {
+    ResponseException ex =
+        expectThrows(
+            ResponseException.class,
+            () -> executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), 10001));
+    assertEquals(400, ex.getResponse().getStatusLine().getStatusCode());
+  }
+
+  @Test
+  public void testFetchSizeNegativeReturnsError() {
+    ResponseException ex =
+        expectThrows(
+            ResponseException.class,
+            () -> executeQueryWithFetchSize(String.format("source=%s", TEST_INDEX_BANK), -1));
+    assertEquals(400, ex.getResponse().getStatusLine().getStatusCode());
+  }
```

As per coding guidelines, "Include boundary condition tests (min/max values, empty inputs) for all new functions" and "Include error condition tests (invalid inputs, exceptions) for all new functions".
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@plugin/src/main/java/org/opensearch/sql/plugin/request/PPLQueryRequestFactory.java`:
- Around line 88-94: The code in PPLQueryRequestFactory that parses
QUERY_PARAMS_FETCH_SIZE from restRequest.params() can throw
NumberFormatException; wrap the
Integer.parseInt(restRequest.params().get(QUERY_PARAMS_FETCH_SIZE)) call in a
try-catch and handle NumberFormatException by throwing a clear bad-request style
exception (e.g., new IllegalArgumentException with a descriptive message about
invalid fetch_size) or otherwise converting it into a 400 response, so the
invalid URL parameter doesn't propagate as a 500; update the block that writes
into jsonContent to only put the parsed int when parsing succeeds.
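A sketch of the suggested guard. The helper name is hypothetical; the real code lives in `PPLQueryRequestFactory` and writes the parsed value into `jsonContent`.

```java
// Hypothetical helper illustrating the fix described above: an unparsable
// fetch_size URL parameter is rejected with IllegalArgumentException (which
// the REST layer can map to HTTP 400) instead of leaking a
// NumberFormatException that would surface as a 500.
public class FetchSizeParamParser {
    static int parseFetchSizeParam(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Invalid fetch_size value [" + raw + "]; expected an integer", e);
        }
    }
}
```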
plugin/src/main/java/org/opensearch/sql/plugin/request/PPLQueryRequestFactory.java (resolved)
Signed-off-by: Kai Huang <ahkcs@amazon.com>
docs/user/interfaces/endpoint.rst (Outdated)
```
    "status": 200
}

Fetch Size (PPL)
```
Mark as experimental. We can target prod at 3.7 release.
docs/user/interfaces/endpoint.rst (Outdated)
```rst
Description
-----------

PPL also supports the ``fetch_size`` parameter, but with different semantics from SQL. In PPL, ``fetch_size`` limits the number of rows returned in a single, complete response. **PPL does not support cursor-based pagination** — no cursor is returned and there is no way to fetch additional pages. The value of ``fetch_size`` should be greater than ``0``. In absence of ``fetch_size`` or a value of ``0``, it will use the system default behavior (no limit). The effective upper bound is governed by the ``plugins.query.size_limit`` cluster setting (defaults to ``index.max_result_window``, which is 10000).
```
Rephrase:

- Introduce what PPL ``fetch_size`` is in the Description section.
- Move the PPL and SQL difference into a note.

> system default behavior (no limit)

Not correct: ``plugins.query.size_limit`` is the limit.

> index.max_result_window

This setting is not visible to the PPL user.
Rephrased. Description now introduces PPL fetch_size directly, SQL/PPL differences moved to a note section. Corrected the default behavior — now states result size is governed by plugins.query.size_limit. Removed index.max_result_window reference.
```
* Kmeans
* `show datasources` and command
* Commands with `fetch_size` parameter
* SQL commands with `fetch_size` parameter (cursor-based pagination). Note: PPL's `fetch_size` (response size limiting, no cursor) is supported in Calcite Engine.
```
What are "SQL commands"? Rephrase it.
Updated to SQL queries
```java
Request request = new Request("POST", QUERY_API_ENDPOINT + "?fetch_size=10");
String jsonBody =
    String.format(
        Locale.ROOT,
        "{\n \"query\": \"source=%s | fields firstname\",\n \"fetch_size\": 3\n}",
        TEST_INDEX_ACCOUNT);
```
- Any reason we support both the query parameter and the query body?
- It is not documented in the API doc.
URL parameter support was added per @Swiddis's suggestion — #5109 (comment).
For documentation, endpoint.rst covers this: "fetch_size can be specified either as a URL parameter or in the JSON request body. If both are provided, the JSON body value takes precedence." The Example 2 section also demonstrates the URL parameter syntax.
```rst
PPL query::

    >> curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{
      "fetch_size" : 5,
```
What if fetch_size is larger than plugins.query.size_limit? Which one should be followed?
Since our implementation of the PPL fetch_size API essentially appends a HEAD command to the end of the query, a fetch_size larger than plugins.query.size_limit behaves the same as a HEAD command larger than plugins.query.size_limit: the cap set by plugins.query.size_limit applies.
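The interaction described here can be modeled as taking the minimum of all applicable limits. This is a simplification for illustration; in the plugin the behavior emerges from Head nodes plus LogicalSystemLimit, not from a single function like this.

```java
// Simplified model of the semantics described above: the smallest of the
// head argument, fetch_size, and plugins.query.size_limit wins. Zero means
// "not specified" for head and fetch_size.
public class EffectiveLimit {
    static int effectiveLimit(int headN, int fetchSize, int sizeLimit) {
        int limit = sizeLimit; // cluster-level cap always applies
        if (headN > 0) limit = Math.min(limit, headN);
        if (fetchSize > 0) limit = Math.min(limit, fetchSize);
        return limit;
    }
}
```

For example, `head 3` with `fetch_size=10` yields 3 rows, while `head 100` with `fetch_size=5` yields 5, matching the integration tests above.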
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Support fetch_size API for PPL
  # Conflicts: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
* update
* remove upper limit
* Support fetch_size as a URL parameter
* add Test coverage
* update docs
* formatting

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit 18b1610)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary

Add a `fetch_size` parameter to the PPL REST API (`POST /_plugins/_ppl`) that limits the number of rows returned in a single response.

* When `fetch_size` is specified, a `Head` AST node is injected on top of the query plan, which pushes the limit down to OpenSearch, so only the requested number of documents are fetched, not the full result set.
* Unlike SQL's `fetch_size`, which enables cursor-based pagination, PPL's `fetch_size` returns a single complete response with no cursor.

Resolves #3314
Design

When `fetch_size > 0`, we inject `Head(fetchSize, 0)` before `addSelectAll` in `AstStatementBuilder.visitPplStatement()`. This is equivalent to appending `| head N` to the query at the AST level.

The existing optimizer handles limit pushdown to OpenSearch for both the Calcite and Legacy execution paths; no changes to the execution engine are needed.
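The wrapping step can be sketched with toy AST types. Only `Head(fetchSize, 0)` mirrors the PR description; the plan classes here are simplified stand-ins for the project's real `UnresolvedPlan` hierarchy.

```java
// Toy model of the Design step above: when fetch_size > 0, the raw plan is
// wrapped in a Head node, equivalent to appending "| head N" at the AST level.
public class FetchSizePlanning {
    interface UnresolvedPlan {}
    record Relation(String index) implements UnresolvedPlan {}
    record Head(int size, int from, UnresolvedPlan child) implements UnresolvedPlan {}

    static UnresolvedPlan applyFetchSize(UnresolvedPlan rawPlan, int fetchSize) {
        return fetchSize > 0 ? new Head(fetchSize, 0, rawPlan) : rawPlan;
    }
}
```

Because the wrapping happens before plan finalization, the existing limit-pushdown optimization applies to the injected `Head` exactly as it would to a user-written `| head N`.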
API Example
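The example content was not captured in this extract; the following sketch is consistent with the curl request shown earlier in the thread. The helper and the index name "accounts" are illustrative, not part of the PR.

```java
// Sketch only: builds the JSON body from the thread's curl example. A real
// client would POST it to localhost:9200/_plugins/_ppl with
// Content-Type: application/json; the response is a single page of at most
// fetch_size rows, with no cursor.
public class PplFetchSizeExample {
    static String requestBody(int fetchSize, String pplQuery) {
        return "{\n"
            + "  \"fetch_size\": " + fetchSize + ",\n"
            + "  \"query\": \"" + pplQuery + "\"\n"
            + "}";
    }

    public static void main(String[] args) {
        System.out.println(requestBody(5, "source=accounts | fields firstname"));
    }
}
```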
Constraints

* `fetch_size` must be between 1 and 10,000
* `0` or omitting the parameter uses the system default (current behavior)

Interaction with `head` command

The smaller limit always wins:
Related Issues
Resolves #3314
Check List
… using `--signoff` or `-s`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.