Skip to content

Mvexpand feature#4944

Open
srikanthpadakanti wants to merge 85 commits intoopensearch-project:mainfrom
srikanthpadakanti:main
Open

Mvexpand feature#4944
srikanthpadakanti wants to merge 85 commits intoopensearch-project:mainfrom
srikanthpadakanti:main

Conversation

@srikanthpadakanti
Copy link
Contributor

Description

This pull request adds native support for the mvexpand command in PPL to OpenSearch SQL, enabling users to expand multivalue fields (arrays) into separate rows directly within queries. This functionality is analogous to Splunk's mvexpand command and streamlines analytics, dashboarding, and data preparation involving arrays or multivalue fields.

Key features introduced:

Native mvexpand command for PPL queries to expand array fields into separate rows/events.
Optional limit parameter to restrict the number of expanded values per event/document.
Robust handling of empty/null arrays, large arrays (with memory/resource limits), and non-array fields.
Streaming/distributable execution for performance and scalability.
Comprehensive documentation and edge case coverage.
This feature makes OpenSearch SQL more powerful and user-friendly for log analytics, data exploration, and migration from platforms like Splunk.

Related Issues

Resolves #4439
#4439

Check List

  • [ X] New functionality includes testing.
  • [ X] New functionality has been documented.
  • [ X] New functionality has javadoc added.
  • [ X] New functionality has a user manual doc added.
  • [ X] New PPL command checklist all confirmed.
  • [ X] API changes companion pull request created.
  • [ X] Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Srikanth Padakanti added 30 commits November 22, 2025 14:42
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
…it supposed to

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/mvexpand.md:
- Around line 7-10: The fenced code block showing the mvexpand syntax is missing
a language identifier; update the triple-backtick fence for the snippet
"mvexpand <field> [limit=<int>]" to include the ppl language (i.e., change ```
to ```ppl) so it matches other examples and satisfies markdownlint MD040.
- Line 4: The description for the mvexpand command currently includes a
comparative phrase "similar to Splunk's `mvexpand` command"; remove that
comparative wording and rewrite the sentence to describe only the OpenSearch
implementation—e.g., state that the `mvexpand` command expands each value in a
multivalue (array) field into a separate row and that for each document every
element in the specified array field is returned as a new row, keeping the
reference to `mvexpand` but omitting any mention of Splunk.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e067b46 and 514a7dc.

📒 Files selected for processing (2)
  • docs/user/ppl/cmd/mvexpand.md
  • docs/user/ppl/index.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/user/ppl/index.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/cmd/mvexpand.md
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/mvexpand.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
docs/user/ppl/cmd/mvexpand.md (2)

12-23: LGTM.

Field descriptions and output naming guidance are clear and consistent with the design intent.


15-18: Reposition the "Notes about these doctests" section.

This section is not inherent to the mvexpand command specification and interrupts the natural flow (Syntax → Output Naming → Examples). Per prior feedback, either remove it entirely or move it to the end of the documentation as procedural guidance.

💡 Suggested actions

Option 1 (Preferred): Move lines 15–18 to the end of the document (after all examples), as a meta-note about how the examples are constructed.

Option 2: Remove the section if it's redundant; the examples themselves make it clear that array() is being used.

Likely an incorrect or invalid review comment.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @docs/user/ppl/cmd/mvexpand.md:
- Around line 7-10: The fenced code block under the "Syntax" section is missing
a language identifier (MD040); update that block to include the same language
tag used elsewhere (use "ppl") so the block becomes a fenced code block with the
ppl language identifier, matching other examples like the blocks at lines 27 and
52 for linting consistency and clarity.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 514a7dc and 9e8ea2c.

📒 Files selected for processing (1)
  • docs/user/ppl/cmd/mvexpand.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/cmd/mvexpand.md
🪛 markdownlint-cli2 (0.18.1)
docs/user/ppl/cmd/mvexpand.md

8-8: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

…instead of checking message

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Srikanth Padakanti added 2 commits January 7, 2026 16:38
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
@integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java:
- Around line 26-67: Replace the hard-coded mapping and inline bulkInsert JSONs
in init() with the resource-based loading helpers: call
loadFromFile("mvexpand_edge_cases_mapping.json") to get the mapping string and
pass it to createIndex(INDEX, mapping), then call
loadFromFile("mvexpand_edge_cases.json") to get the test documents and feed
those documents into the existing bulk insertion flow (e.g., split or parse the
file contents into individual JSON docs and pass them to bulkInsert(INDEX,
...)). Keep the existing refreshIndex(INDEX) call and remove the inline mapping
and JSON literals so the test uses the resource files.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00f440b and 50031ce.

📒 Files selected for processing (1)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
🧠 Learnings (5)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Verify AST generation for new PPL parser syntax
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Update corresponding AST builder classes when making PPL grammar changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to integ-test/**/*IT.java : End-to-end scenarios need integration tests in `integ-test/` module

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
  • PPLIntegTestCase (36-415)
integ-test/src/test/java/org/opensearch/sql/sql/IdentifierIT.java (1)
  • Index (218-262)
🔇 Additional comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)

304-346: Remove duplicate helper methods and use inherited utilities.

These helper methods (createIndex, deleteIndexIfExists, bulkInsert, refreshIndex) duplicate functionality available in SQLIntegTestCase. A previous review indicated these should be replaced with existing helpers, but they remain in the file.

Use the inherited methods from SQLIntegTestCase to follow DRY principles and maintain consistency with other integration tests.

Based on coding guidelines: "Identify code reuse opportunities across similar implementations."

⛔ Skipped due to learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

69-85: OpenSearch SQL mvexpand behavior diverges from Splunk—clarify if this is intentional.

This test validates OpenSearch SQL's mvexpand on typed nested fields: projecting the full field returns the nested object {"name":"go"}, while projecting skills.name returns the scalar "go". However, Splunk's mvexpand works fundamentally differently—it expands multivalue fields and requires explicit spath JSON parsing to extract nested fields. Verify whether OpenSearch SQL's implementation (type-aware nested field expansion) is intentionally different from Splunk's approach, or if Splunk compatibility is required. If compatibility is mandatory, the implementation and test approach may need adjustment.

@srikanthpadakanti
Copy link
Contributor Author

@penghuo @anasalkouz @dai-chen Can you please review this. Thanks

srikanthpadakanti and others added 4 commits January 16, 2026 10:30
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`:
- Around line 3141-3149: buildEmptyResultWithStableSchema currently creates an
ARRAY<ANY> null placeholder which mismatches mvexpand's element-type schema;
change the placeholder to a scalar ANY by replacing the arrayAny creation
(typeFactory.createArrayType(...)) with a scalar any type
(typeFactory.createSqlType(SqlTypeName.ANY)) and use that scalar when calling
relBuilder.getRexBuilder().makeNullLiteral(...) for the alias generation (keep
relBuilder.alias(..., fieldName) and the subsequent
relBuilder.filter(relBuilder.literal(false)) unchanged).
♻️ Duplicate comments (2)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (2)

303-345: Remove duplicate helper methods and use inherited utilities.

createIndex, deleteIndexIfExists, bulkInsert, and refreshIndex duplicate helpers already provided by PPLIntegTestCase/SQLIntegTestCase. Removing the local versions keeps tests consistent and reduces maintenance.

As per coding guidelines, prefer reuse of existing helpers to avoid duplication.


26-67: Use resource-backed mapping/data instead of inline JSON.

Integration tests should load mappings and test data from files under integ-test/src/test/resources/. The inline mapping/docs duplicate mvexpand_edge_cases_mapping.json and mvexpand_edge_cases.json, which risks drift and violates the test data guideline. Consider loading those files and bulk-inserting from the parsed array.

♻️ Suggested refactor
   `@Override`
   public void init() throws Exception {
     super.init();
     enableCalcite();
     deleteIndexIfExists(INDEX);

-    // Single shared mapping for ALL cases (no extra indices)
-    // - skills: nested (mvexpand target)
-    // - skills_not_array: keyword (semantic error test)
-    // - skills_int: integer (semantic error test)
-    final String mapping =
-        "{ \"mappings\": { \"properties\": { "
-            + "\"username\": { \"type\": \"keyword\" },"
-            + "\"skills\": { \"type\": \"nested\" },"
-            + "\"skills_not_array\": { \"type\": \"keyword\" },"
-            + "\"skills_int\": { \"type\": \"integer\" }"
-            + "} } }";
-
-    createIndex(INDEX, mapping);
-
-    bulkInsert(
-        INDEX,
-        "{\"username\":\"happy\",\"skills\":[{\"name\":\"python\"},{\"name\":\"java\"},{\"name\":\"sql\"}]}",
-        "{\"username\":\"single\",\"skills\":[{\"name\":\"go\"}]}",
-        "{\"username\":\"empty\",\"skills\":[]}",
-        "{\"username\":\"nullskills\",\"skills\":null}",
-        "{\"username\":\"noskills\"}",
-        "{\"username\":\"partial\",\"skills\":[{\"name\":\"kotlin\"},{\"level\":\"intern\"},{\"name\":null}]}",
-        "{\"username\":\"mixed_shapes\",\"skills\":[{\"name\":\"elixir\",\"meta\":{\"years\":3}},{\"name\":\"haskell\"}]}",
-        "{\"username\":\"duplicate\",\"skills\":[{\"name\":\"dup\"},{\"name\":\"dup\"}]}",
-        "{\"username\":\"complex\",\"skills\":[{\"name\":\"ml\",\"level\":\"expert\"},{\"name\":\"ai\"},{\"level\":\"novice\"}]}",
-        "{\"username\":\"large\",\"skills\":["
-            + "{\"name\":\"s1\"},{\"name\":\"s2\"},{\"name\":\"s3\"},{\"name\":\"s4\"},{\"name\":\"s5\"},"
-            + "{\"name\":\"s6\"},{\"name\":\"s7\"},{\"name\":\"s8\"},{\"name\":\"s9\"},{\"name\":\"s10\"}"
-            + "]}",
-        "{\"username\":\"hetero_types\",\"skills\":[{\"level\":\"senior\"},{\"level\":3}]}",
-        "{\"username\":\"u1\",\"skills_not_array\":\"scala\"}",
-        "{\"username\":\"u_int\",\"skills_int\":5}",
-        "{\"username\":\"limituser\",\"skills\":[{\"name\":\"a\"},{\"name\":\"b\"},{\"name\":\"c\"},{\"name\":\"d\"},{\"name\":\"e\"}]}");
+    final String mapping = loadFromFile("mvexpand_edge_cases_mapping.json");
+    createIndex(INDEX, mapping);
+
+    String docsJson = loadFromFile("mvexpand_edge_cases.json");
+    var docs = new JSONArray(docsJson);
+    List<String> bulkDocs = new ArrayList<>(docs.length());
+    for (int i = 0; i < docs.length(); i++) {
+      bulkDocs.add(docs.getJSONObject(i).toString());
+    }
+    bulkInsert(INDEX, bulkDocs.toArray(new String[0]));

     refreshIndex(INDEX);
   }

Additional imports:

import java.util.ArrayList;
import java.util.List;
import org.json.JSONArray;

As per coding guidelines, integration tests should load mappings/data from resources.

🧹 Nitpick comments (2)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

3396-3449: Consider extracting helper(s) from buildExpandRelNode (now >50 lines).

This method now exceeds the 50-line guideline and mixes correlation setup, right-node construction, limit application, and projection/rename. Extracting right-side build (unnest + limit) into a small helper would improve readability and maintainability.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (1)

166-179: Avoid relying on implicit null sort ordering.

sort skills.name includes nulls, and the expected row order assumes nulls sort last. Please confirm Calcite/PPL null ordering or make it explicit (e.g., nulls last if supported) or assert without ordering to prevent flakiness.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c743933 and 658c1e1.

📒 Files selected for processing (2)
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring

  • Flag classes >500 lines as needing organization review
  • Check for dead code, unused imports, and unused variables
  • Identify code reuse opportunities across similar implementations
  • Assess holistic maintainability - is code easy to understand and modify?
  • Flag code that appears AI-generated without sufficient human review
  • Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Integration tests MUST use valid test data from resources

  • Verify test data files exist in integ-test/src/test/resources/
  • Check test assertions are meaningful and specific
  • Validate tests clean up resources after execution
  • Ensure tests are independent and can run in any order
  • Flag tests that reference non-existent indices (e.g., EMP)
  • Verify integration tests are in correct module (integ-test/)
  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify NULL input tests for all new functions

  • Check boundary condition tests (min/max values, empty inputs)
  • Validate error condition tests (invalid inputs, exceptions)
  • Ensure multi-document tests for per-document operations
  • Flag smoke tests without meaningful assertions
  • Check test naming follows pattern: test
  • Verify test data is realistic and covers edge cases
  • Verify test coverage for new business logic
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing Calcite integration patterns

  • Verify RelNode visitor implementations are complete
  • Check RexNode handling follows project conventions
  • Validate SQL generation is correct and optimized
  • Ensure Calcite version compatibility
  • Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/**/*.java

⚙️ CodeRabbit configuration file

core/src/main/java/**/*.java: - New functions MUST have unit tests in the same commit

  • Public methods MUST have JavaDoc with @param, @return, and @throws
  • Follow existing function implementation patterns in the same package
  • New expression functions should follow ExpressionFunction interface patterns
  • Validate function naming follows project conventions (camelCase)

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

⚙️ CodeRabbit configuration file

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java: - Flag methods >50 lines - this file is known to be hard to read

  • Suggest extracting complex logic into helper methods
  • Check for code organization and logical grouping
  • Validate all RelNode types are handled

Files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🧠 Learnings (12)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to integ-test/**/*IT.java : End-to-end scenarios need integration tests in `integ-test/` module

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2026-01-13T17:39:10.972Z
Learnt from: ykmr1224
Repo: opensearch-project/sql PR: 5028
File: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathCommandIT.java:23-35
Timestamp: 2026-01-13T17:39:10.972Z
Learning: In OpenSearch SQL integration tests extending PPLIntegTestCase or SQLIntegTestCase, index cleanup is handled automatically by the parent class infrastructure. Tests do not need explicit After methods to delete indices created during tests.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Follow existing patterns in `CalciteRelNodeVisitor` and `CalciteRexNodeVisitor` for Calcite integration

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Flag generic exception handling during code review

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Use specific exception types with meaningful messages for error handling

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Flag missing JavaDoc on public APIs during code review

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : All public classes and methods must have proper JavaDoc

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
📚 Learning: 2025-12-11T05:27:39.856Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 0
File: :0-0
Timestamp: 2025-12-11T05:27:39.856Z
Learning: In opensearch-project/sql, for SEMI and ANTI join types in CalciteRelNodeVisitor.java, the `max` option has no effect because these join types only use the left side to filter records based on the existence of matches in the right side. The join results are identical regardless of max value (max=1, max=2, or max=∞). The early return for SEMI/ANTI joins before processing the `max` option is intentional and correct behavior.

Applied to files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
🔇 Additional comments (15)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMvExpandCommandIT.java (15)

69-85: LGTM.


87-110: LGTM.


112-120: LGTM.


122-131: LGTM.


133-142: LGTM.


144-153: LGTM.


155-164: LGTM.


181-190: LGTM.


192-212: LGTM.


214-226: LGTM.


228-238: LGTM.


240-250: LGTM.


252-262: LGTM.


264-287: LGTM.


289-301: LGTM.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@LantaoJin
Copy link
Member

One question in my mind is what's the different between mvexpand and the existed expand and should we combine them?

@penghuo do you know why mvexpand is in the roadmap and is it a task to refactor the existed expand command?

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java (1)

959-960: ⚠️ Potential issue | 🔴 Critical

Build failure: Missing perDocLimit parameter in buildExpandRelNode call.

The method signature was updated to include a @Nullable Integer perDocLimit parameter, but this call site was not updated, causing the compilation failure reported by CI.

🐛 Proposed fix
         buildExpandRelNode(
-            context.relBuilder.field(node.getAlias()), node.getAlias(), node.getAlias(), context);
+            context.relBuilder.field(node.getAlias()), node.getAlias(), node.getAlias(), null, context);
🤖 Fix all issues with AI agents
In `@core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java`:
- Around line 476-478: Add JavaDoc for the new public method visitMvExpand in
class AbstractNodeVisitor: document the method purpose in a short sentence and
include `@param` tags for both parameters (MvExpand node and C context) and an
`@return` describing the returned T value; if the method can throw checked or
runtime exceptions, add an `@throws` tag listing them. Ensure the JavaDoc follows
existing core guidelines and formatting consistent with other visitor methods in
AbstractNodeVisitor.

In `@ppl/src/main/antlr/OpenSearchPPLParser.g4`:
- Around line 82-84: Add negative parser tests covering mvexpandCommand to
CalcitePPLMvExpandTest (or the existing mvexpand test class): create cases that
assert parse failures for non-integer limit values (e.g., mvexpand field
limit="abc"), missing field expressions (e.g., mvexpand with no argument or
empty brackets), and invalid limit operations (e.g., negative limits or limit
specified with invalid tokens). Use the parser test harness already used for
positive cases to feed these malformed queries to the parser and assert that
parsing fails or produces syntax errors, referencing the mvexpandCommand grammar
rule and existing test utilities used in CalcitePPLMvExpandTest.java.

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In
`@core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java`:
- Around line 655-662: The public method visitMvExpand in class
FieldResolutionVisitor lacks Javadoc; add a brief Javadoc block above the method
that describes its purpose and includes `@param` for the MvExpand node and
FieldResolutionContext context, `@return` describing the returned Node, and (if
applicable) `@throws` for any runtime exceptions it may propagate (otherwise omit
`@throws`). Ensure the Javadoc uses standard JavaDoc formatting and is placed
immediately above the visitMvExpand method declaration.

In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 216-224: The testMvExpandCommand currently only calls verifyQuery
when executeQuery throws a ResponseException; to cover the success path capture
the successful response into result by assigning result = new
JSONObject(executeQuery(...)) or by creating the JSONObject from the response
returned by executeQuery inside the try, then move the verifyQuery(result) call
outside the try/catch so it runs for both success and exception paths; update
references to executeQuery, result, and verifyQuery in testMvExpandCommand
accordingly.

In
`@integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java`:
- Around line 217-226: Update the misleading comment in the
testCrossClusterAddTotals() test: replace or remove the copy-pasted line "Test
query_string without fields parameter on remote cluster" so it correctly
describes this test's purpose (e.g., "Test addtotals on remote cluster" or
similar). Locate the comment near the testCrossClusterAddTotals method and
update it to reference addtotals and the TEST_INDEX_BANK_REMOTE symbol.
- Around line 228-240: Update the incorrect Javadoc/comment above the
testCrossClusterAddColTotals method: replace the copy-pasted "Test query_string
without fields parameter on remote cluster" description with an accurate brief
description such as "Test cross-cluster addcoltotals" or "Test addcoltotals
across remote cluster" so the comment matches the testAddColTotals behavior in
CrossClusterSearchIT.testCrossClusterAddColTotals.

In
`@ppl/src/test/java/org/opensearch/sql/ppl/parser/FieldResolutionVisitorTest.java`:
- Around line 364-372: Add tests in FieldResolutionVisitorTest to cover mvexpand
boundary/error cases: create new `@Test` methods that call
assertSingleRelationFields (or assertThrows/appropriate assertion) for
"source=logs | mvexpand skills limit=0", "source=logs | mvexpand skills
limit=-1", "source=logs | mvexpand skills limit=abc" (non-numeric), "source=logs
| mvexpand" (missing field), and a null/empty-field case like "source=logs |
mvexpand " or passing null input; for each test assert the expected
error/behavior (use assertThrows when invalid syntax should raise an exception,
or assertSingleRelationFields expecting empty/validated field set where
applicable) so mvexpand's edge cases are covered.

@srikanthpadakanti
Copy link
Contributor Author

One question in my mind is what's the different between mvexpand and the existed expand and should we combine them?

@penghuo do you know why mvexpand is in the roadmap and is it a task to refactor the existed expand command?

Refactoring the existing expand command was not part of the original task. However, @ykmr1224 suggested reusing a common implementation for both expand and mvexpand. Based on that guidance, I refactored expand to align with and reuse the shared logic.

@LantaoJin Please look into this and see if this can be merged. Thanks

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti
Copy link
Contributor Author

@LantaoJin The failing workflows - I am unable to reproduce in the local.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@integ-test/src/test/java/org/opensearch/sql/ppl/NewAddedCommandsIT.java`:
- Around line 216-225: The testMvExpandCommand currently runs mvexpand against
the scalar field address in TEST_INDEX_BANK; change it to use an actual
multi-valued array field or index (for example replace TEST_INDEX_BANK|address
with TEST_INDEX_BANK|skills if the bank index has a skills array, or run against
the mvexpand_edge_cases index) so mvexpand exercises array expansion; update the
executeQuery invocation in testMvExpandCommand to target the chosen array
field/index (keep the existing try/catch around executeQuery and the call to
verifyQuery(result) unchanged).

Signed-off-by: Srikanth Padakanti <srikanth_padakanti@apple.com>
@srikanthpadakanti
Copy link
Contributor Author

srikanthpadakanti commented Feb 5, 2026

@penghuo I have addressed all the comments from everyone and made the changes. Please look into this and see if this can be merged for this release.
The security workflows are failing.

@LantaoJin explicitly asked me to remove the enableCalcite() method from the CrossClusterSearchIT tests and i did the same for previous command eg: mvcombine.
I am unable to replicate the failing scenarios on my local.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev enhancement New feature or request PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add native support for mvexpand command in PPL

5 participants