Skip to content

Support multi-field syntax for unified SQL relevance functions#5427

Draft
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/array-multi-field-bracket-syntax
Draft

Support multi-field syntax for unified SQL relevance functions#5427
dai-chen wants to merge 1 commit into
opensearch-project:mainfrom
dai-chen:feature/array-multi-field-bracket-syntax

Conversation

@dai-chen
Copy link
Copy Markdown
Collaborator

@dai-chen dai-chen commented May 8, 2026

Description

This PR adds support for standard SQL ARRAY[] syntax in multi-field relevance functions (multi_match, simple_query_string, query_string) for the unified SQL engine. The NamedArgRewriter is extended to detect the ARRAY[] argument and expands it into a MAP constructor with VARCHAR-typed field names, producing plans consistent with the PPL path and compatible with the Analytics Engine's pushdown pattern matching in opensearch-project/OpenSearch#21562.

Example:

Syntax:
  SELECT * FROM t WHERE multi_match(ARRAY['name', 'department'], 'John')

Plan output (matches PPL):
  multi_match(MAP('fields', MAP('name':VARCHAR, 1, 'department':VARCHAR, 1)), MAP('query', 'John'))

Note: This uses ARRAY['field1', 'field2'] rather than V2's bracket syntax ['field1', 'field2'], as Calcite's SQL parser does not support it and requires complex changes in the .ftl parser template. A follow-up PR will document all such syntax differences between V2 and unified SQL engine.

Related Issues

Part of #5248

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen dai-chen self-assigned this May 8, 2026
@dai-chen dai-chen added enhancement New feature or request SQL labels May 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Reviewer Guide 🔍

(Review updated until commit 0d2dbe1)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

The arrayArgToMap method creates a numeric literal '1.0' using createApproxNumeric, which produces a DOUBLE type. However, the test assertions expect 'DOUBLE' in the output (e.g., '1.0E0:DOUBLE'). If the Analytics Engine's pushdown pattern matching expects exact type consistency with the PPL path, any type mismatch could cause the pattern match to fail. Verify that the PPL path also produces DOUBLE literals with the same representation, or consider using createExactNumeric("1", ...) if INTEGER is expected.

mapArgs.add(SqlLiteral.createApproxNumeric("1.0", SqlParserPos.ZERO));
Missing Validation

The arrayArgToMap method does not validate that the ARRAY elements are string literals or identifiers. If a user passes ARRAY[123, 456] or ARRAY[some_column], the code will attempt to cast these to VARCHAR and construct a MAP, which may produce unexpected results or fail downstream. Consider adding validation to ensure array elements are string literals, or document the expected behavior for non-string inputs.

private static SqlNode arrayArgToMap(SqlCall arrayCall) {
  List<SqlNode> mapArgs = new ArrayList<>();
  for (SqlNode element : arrayCall.getOperandList()) {
    mapArgs.add(cast(element, VARCHAR));
    mapArgs.add(SqlLiteral.createApproxNumeric("1.0", SqlParserPos.ZERO));
  }
  return map(mapArgs);
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

PR Code Suggestions ✨

Latest suggestions up to 0d2dbe1
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate array element types

The method assumes all array elements are valid field names but doesn't validate
them. If an array element is not a string literal or identifier, casting it to
VARCHAR may produce unexpected results or fail. Add validation to ensure array
elements are appropriate field references before processing.

api/src/main/java/org/opensearch/sql/api/spec/search/NamedArgRewriter.java [95-102]

 private static SqlNode arrayArgToMap(SqlCall arrayCall) {
   List<SqlNode> mapArgs = new ArrayList<>();
   for (SqlNode element : arrayCall.getOperandList()) {
+    if (!(element instanceof SqlIdentifier || element instanceof SqlLiteral)) {
+      throw new IllegalArgumentException(
+          "Array elements must be field identifiers or string literals");
+    }
     mapArgs.add(cast(element, VARCHAR));
     mapArgs.add(SqlLiteral.createApproxNumeric("1.0", SqlParserPos.ZERO));
   }
   return map(mapArgs);
 }
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that arrayArgToMap doesn't validate array element types before casting to VARCHAR. Adding validation would improve robustness, though the current implementation may work for expected use cases. The validation logic is reasonable but the error handling could be more specific.

Low
General
Improve argument validation order

The logic checks i >= paramNames.size() before processing array arguments, but this
validation should occur after determining the argument type. If an array is provided
at an invalid position, the error message won't accurately reflect that an array was
involved. Reorder the checks to validate position after identifying the argument
type.

api/src/main/java/org/opensearch/sql/api/spec/search/NamedArgRewriter.java [64-73]

 } else { // Positional arg
-  if (i >= paramNames.size()) {
-    throw new IllegalArgumentException(
-        String.format("Invalid arguments for function '%s'", call.getOperator().getName()));
-  } else if (isArrayArg(op)) {
+  if (isArrayArg(op)) {
+    if (i >= paramNames.size()) {
+      throw new IllegalArgumentException(
+          String.format("Invalid arguments for function '%s'", call.getOperator().getName()));
+    }
     maps[i] = map(paramNames.get(i), arrayArgToMap((SqlCall) op));
   } else {
+    if (i >= paramNames.size()) {
+      throw new IllegalArgumentException(
+          String.format("Invalid arguments for function '%s'", call.getOperator().getName()));
+    }
     maps[i] = map(paramNames.get(i), op);
   }
 }
Suggestion importance[1-10]: 3

__

Why: While the suggestion proposes reordering validation checks, the current logic is functionally correct and the error message is already adequate. The proposed change adds code duplication without significant benefit, as the validation occurs before processing regardless of argument type.

Low

Previous suggestions

Suggestions up to commit 35978d7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate array element types

The method does not validate that the array elements are valid field identifiers or
string literals. Consider adding validation to ensure array elements are of expected
types (SqlIdentifier or SqlLiteral) before processing to prevent runtime errors with
malformed input.

api/src/main/java/org/opensearch/sql/api/spec/search/NamedArgRewriter.java [89-97]

 private static SqlNode expandFieldsArray(SqlCall arrayCall) {
   List<SqlNode> mapArgs = new ArrayList<>();
   for (SqlNode element : arrayCall.getOperandList()) {
+    if (!(element instanceof SqlIdentifier || element instanceof SqlLiteral)) {
+      throw new IllegalArgumentException(
+          "Array elements must be field identifiers or string literals");
+    }
     mapArgs.add(castToVarchar(element));
     mapArgs.add(SqlLiteral.createExactNumeric(BigDecimal.ONE.toPlainString(), SqlParserPos.ZERO));
   }
   return SqlStdOperatorTable.MAP_VALUE_CONSTRUCTOR.createCall(
       SqlParserPos.ZERO, mapArgs.toArray(SqlNode[]::new));
 }
Suggestion importance[1-10]: 7

__

Why: Adding validation for array element types (SqlIdentifier or SqlLiteral) would prevent runtime errors from malformed input. This is a reasonable defensive programming practice that improves robustness.

Medium

Add support for standard SQL ARRAY syntax in multi-field relevance
functions (multi_match, simple_query_string, query_string). The
NamedArgRewriter expands ARRAY['f1','f2'] into a MAP with VARCHAR-typed
field names and default weight 1.0, producing plans compatible with
the Analytics engine pushdown rules.

Syntax: multi_match(ARRAY['name', 'department'], 'John')

The key technique is wrapping each field literal in CAST(... AS VARCHAR)
at the SqlNode level so Calcite's validator produces bare RexLiterals
without type-widening CASTs in the final plan.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen force-pushed the feature/array-multi-field-bracket-syntax branch from 35978d7 to 0d2dbe1 Compare May 11, 2026 18:22
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 0d2dbe1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant