Add cluster command for PPL by ritvibhatt · Pull Request #5265 · opensearch-project/sql

ritvibhatt · 2026-03-25T16:58:00Z

Description

Adds the cluster command to PPL, which groups documents into clusters based on text similarity.

Syntax

cluster [t=] [match=] [labelfield=] [countfield=] [showcount=] [labelonly=] [delims=]

Supported algorithms

termlist (default) — positional term frequency, word order matters
termset — bag-of-words, word order ignored
ngramset — character trigram frequency for fuzzy matching

All algorithms use cosine similarity against the configured threshold to determine cluster membership.

Behavior

By default (labelonly=false), deduplicates results to one
representative row per cluster
labelonly=true keeps all rows and adds the cluster label
showcount=true adds a count of how many documents belong to each cluster
Rows where the source field is null are filtered out before clustering

Changes

CalciteRelNodeVisitor: implements visitCluster with window-based clustering, null filtering, and optional dedup/showcount
ClusterLabelAggFunction: window aggregate that buffers rows, runs greedy clustering, and returns an array of cluster labels
TextSimilarityClustering: cosine similarity computation with LRU-cached vectorization for termlist, termset, and ngramset modes
Cluster AST node, parser rules, and PPL grammar additions
Documentation with 7 doctest-verified examples
Unit tests for logical plan verification
Integration tests covering all parameters, dedup behavior, labelonly mode, and multi-row clustering

Related Issues

Resolves #5255

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-25T16:58:40Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit df7873a.

Path	Line	Severity	Description
common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java	80	medium	Cache eviction is performed inside ConcurrentHashMap.computeIfAbsent() via parallelStream().forEach(vectorCache::remove). Modifying a ConcurrentHashMap while a computeIfAbsent call is in progress on the same map can cause a deadlock or infinite loop (known JDK issue). If triggered under adversarial input designed to hit MAX_CACHE_SIZE boundary conditions, this could be used as a denial-of-service vector against the query engine. The pattern is anomalous and not necessary — eviction could be done outside the lambda.
ppl/src/main/antlr/OpenSearchPPLParser.g4	1592	low	The ident grammar rule was restructured: previously keywords-as-identifiers fell through a separate alternative; now they are inlined as (ID \| keywordsCanBeId) in the primary alternative. Combined with adding the single-character token T as both a parameter keyword and a searchable keyword, this increases the ambiguity surface of the PPL grammar. Misidentification of tokens could allow carefully crafted queries to bypass identifier-based access control checks, though no specific exploit path is evident from this diff alone.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-25T16:59:20Z

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T07:18:56Z

PR Reviewer Guide 🔍

(Review updated until commit `5bfaf69`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Core clustering algorithm and UDAF implementation Relevant files: common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java Sub-PR theme: PPL grammar, AST, and Calcite plan visitor for cluster command Relevant files: ppl/src/main/antlr/OpenSearchPPLLexer.g4 ppl/src/main/antlr/OpenSearchPPLParser.g4 core/src/main/java/org/opensearch/sql/ast/tree/Cluster.java core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java Sub-PR theme: Tests and documentation for cluster command Relevant files: integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteClusterCommandIT.java integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLClusterTest.java ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java docs/user/ppl/cmd/cluster.md docs/user/ppl/index.md
⚡ Recommended focus areas for review Thread Safety The `ClusterLabelAggFunction` class stores mutable state (`threshold`, `matchMode`, `delims`, `bufferLimit`, `maxClusters`) as instance fields that are mutated in the `add` methods. If the same function instance is reused across multiple queries or threads, this could lead to race conditions or incorrect results. The state should be encapsulated in the `Acc` accumulator instead. private int bufferLimit = 50000; // Configurable buffer size private int maxClusters = 10000; // Limit cluster count to prevent memory explosion private double threshold = 0.8; private String matchMode = "termlist"; private String delims = " "; @Override public Acc init() { return new Acc(); } @Override public Object result(Acc acc) { return acc.value(threshold, matchMode, delims, maxClusters); } @Override public Acc add(Acc acc, Object... values) { // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } throw new SyntaxCheckException( "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + " output field name, default 'cluster_label'), (countfield: optional string count" + " field name, default 'cluster_count')"); } public Acc add( Acc acc, String field, double threshold, String matchMode, String delims, int bufferLimit, int maxClusters) { // Process all rows, even when field is null - convert null to empty string // This ensures the result array matches input row count String processedField = (field != null) ? field : ""; this.threshold = threshold; this.matchMode = matchMode; this.delims = delims; this.bufferLimit = bufferLimit; this.maxClusters = maxClusters; acc.evaluate(processedField); if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) { acc.partialMerge(threshold, matchMode, delims, maxClusters); acc.clearBuffer(); } return acc; } Possible Issue In `add(Acc acc, Object... values)`, when `values.length == 1`, the method delegates to `add(Acc acc, String field)` which uses the instance-level defaults for `threshold`, `matchMode`, etc. However, the Calcite visitor in `CalciteRelNodeVisitor.visitCluster` always passes 4 parameters (field, threshold, matchMode, delims) to the window function. The single-argument path may never be exercised as intended, and if it is, it silently uses stale defaults rather than the user-specified values. public Acc add(Acc acc, Object... values) { // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } throw new SyntaxCheckException( "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + " output field name, default 'cluster_label'), (countfield: optional string count" + " field name, default 'cluster_count')"); } Cache Concurrency The `vectorCache` is a `LinkedHashMap` with LRU eviction, but `LinkedHashMap` is not thread-safe. If `TextSimilarityClustering` instances are shared across threads (e.g., via the UDAF), concurrent access to `vectorCache` via `computeIfAbsent` could cause data corruption or infinite loops. Consider using `Collections.synchronizedMap` or a `ConcurrentLinkedHashMap`. private final Map<String, Map<CharSequence, Integer>> vectorCache = new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) { @Override protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) { return size() > MAX_CACHE_SIZE; } }; Keyword Conflict Adding `T` as a dedicated lexer token is very broad and may conflict with existing identifiers or other grammar rules that use `T` as a field/variable name. This could break existing queries that use `t` as a field name. The `keywordsCanBeId` list includes `T`, but the lexer token priority may still cause tokenization issues for common single-letter field names. T: 'T'; Row Ordering The cluster implementation relies on `ROW_NUMBER() OVER ()` (no ORDER BY) to index into the cluster labels array. Without a deterministic ORDER BY clause in the window spec, the row ordering is non-deterministic, meaning the array index may not correspond to the correct row across different executions. This could produce incorrect cluster label assignments. String rowNumAlias = "_cluster_row_idx"; RexNode rowNum = context .relBuilder .aggregateCall(SqlStdOperatorTable.ROW_NUMBER) .over() .rowsBetween(RexWindowBounds.UNBOUNDED_PRECEDING, RexWindowBounds.CURRENT_ROW) .as(rowNumAlias); context.relBuilder.projectPlus(rowNum);

github-actions · 2026-03-30T07:19:43Z

PR Code Suggestions ✨

Latest suggestions up to 5bfaf69

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Handle all expected argument counts in generic add method The generic `add(Acc, Object...)` method only handles the case where `values.length ==` `1`, but Calcite may call this method with 4 parameters (field, threshold, matchMode, delims) as configured in `visitCluster`. This would cause an unexpected `SyntaxCheckException` at runtime instead of routing to the correct overload. Add handling for the 4-argument case to match the parameters passed from `visitCluster`. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } + if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = ((Number) values[1]).doubleValue(); + String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode; + String delims = (values[3] != null) ? values[3].toString() : this.delims; + return add(acc, field, threshold, matchMode, delims); + } throw new SyntaxCheckException( - "Unsupported function signature for cluster aggregate. ..."); + "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); } Suggestion importance[1-10]: 7 __ Why: The generic `add` method only handles `values.length == 1`, but Calcite may invoke it with 4 arguments (field, threshold, matchMode, delims) as configured in `visitCluster`. Without handling this case, a `SyntaxCheckException` would be thrown at runtime instead of routing to the correct overload.	Medium
Possible issue	Avoid race conditions from mutable shared instance state The `add` method mutates instance fields (`this.threshold`, `this.matchMode`, etc.) which are shared state on the `ClusterLabelAggFunction` instance. In a concurrent or multi-threaded execution environment (common in query engines), this creates a race condition where one thread's parameters can overwrite another's. These configuration values should be stored in the `Acc` accumulator instead of on the function instance. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84] -public Acc add( - Acc acc, - String field, - double threshold, - String matchMode, - String delims, - int bufferLimit, - int maxClusters) { - ... - this.threshold = threshold; - this.matchMode = matchMode; - this.delims = delims; - this.bufferLimit = bufferLimit; - this.maxClusters = maxClusters; +public static class Acc implements Accumulator { + private final List<String> buffer = new ArrayList<>(); + private final List<ClusterRepresentative> globalClusters = new ArrayList<>(); + private final List<Integer> allLabels = new ArrayList<>(); + private int nextClusterId = 1; + // Store config in accumulator to avoid shared mutable state + double threshold = 0.8; + String matchMode = "termlist"; + String delims = " "; + int bufferLimit = 50000; + int maxClusters = 10000; ... } +// In add(...) method, update acc fields instead of this fields: +acc.threshold = threshold; +acc.matchMode = matchMode; +acc.delims = delims; +acc.bufferLimit = bufferLimit; +acc.maxClusters = maxClusters; Suggestion importance[1-10]: 6 __ Why: Mutating instance fields like `this.threshold`, `this.matchMode`, etc. in the `add` method creates potential race conditions in concurrent environments. Moving these configuration values into the `Acc` accumulator would be safer, though the actual risk depends on how Calcite instantiates and uses the aggregate function.	Low
General	Conditionally include optional parameters in anonymized output The anonymizer always appends `labelfield` and `countfield` regardless of whether `showcount` or `labelonly` were set, which produces output that doesn't match the original command structure and may confuse log analysis. Additionally, `labelonly` and `showcount` boolean parameters are never included in the anonymized output, losing information about the command's behavior. Include these parameters conditionally to produce a more accurate anonymized representation. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987] @Override public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) { String child = node.getChild().get(0).accept(this, context); - ... + String sourceField = visitExpression(node.getSourceField()); + StringBuilder command = new StringBuilder(); + command.append(child).append(" \| cluster ").append(sourceField); + + command.append(" t=").append(node.getThreshold()); + + if (!"termlist".equals(node.getMatchMode())) { + command.append(" match=").append(node.getMatchMode()); + } + command.append(" labelfield=").append(MASK_COLUMN); - command.append(" countfield=").append(MASK_COLUMN); + + if (node.isShowCount()) { + command.append(" countfield=").append(MASK_COLUMN); + command.append(" showcount=true"); + } + + if (node.isLabelOnly()) { + command.append(" labelonly=true"); + } return command.toString(); } Suggestion importance[1-10]: 4 __ Why: The anonymizer always appends `labelfield` and `countfield` and omits `showcount`/`labelonly` flags, producing output that doesn't accurately reflect the original command structure. The improved code conditionally includes these parameters for a more faithful anonymized representation.	Low
General	Declare constant before use in field initializer `MAX_CACHE_SIZE` is referenced in the field initializer before it is declared as a `static final` field below it. In Java, instance field initializers can reference static fields declared later in the class, but this ordering is fragile and may cause confusion or issues in some edge cases. Move the `MAX_CACHE_SIZE` declaration before the `vectorCache` field declaration. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33] +private static final int MAX_CACHE_SIZE = 10000; + private final Map<String, Map<CharSequence, Integer>> vectorCache = new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) { @Override protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) { return size() > MAX_CACHE_SIZE; } }; -private static final int MAX_CACHE_SIZE = 10000; Suggestion importance[1-10]: 3 __ Why: In Java, static fields are initialized before instance fields regardless of declaration order, so `MAX_CACHE_SIZE` being declared after `vectorCache` is technically valid. This is a minor code style/readability improvement with no functional impact.	Low

Previous suggestions

Suggestions up to commit 3fbd709

Category	Suggestion	Impact
Possible issue	Handle all expected argument counts in generic add method The generic `add(Acc, Object...)` method only handles the case where `values.length ==` `1`, but Calcite may call this method with 4 parameters (field, threshold, matchMode, delims) as defined in the window function invocation. This would cause an unexpected `SyntaxCheckException` at runtime instead of routing to the correct overloaded `add` method. Add handling for the 4-argument case (and potentially others) before throwing the exception. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } + if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = values[1] != null ? ((Number) values[1]).doubleValue() : this.threshold; + String matchMode = values[2] != null ? values[2].toString() : this.matchMode; + String delims = values[3] != null ? values[3].toString() : this.delims; + return add(acc, field, threshold, matchMode, delims); + } throw new SyntaxCheckException( - "Unsupported function signature for cluster aggregate. ..."); + "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); } Suggestion importance[1-10]: 7 __ Why: The generic `add` method only handles `values.length == 1`, but Calcite may invoke it with 4 arguments (field, threshold, matchMode, delims) as passed in the window function call. This could cause a runtime `SyntaxCheckException` instead of routing to the correct overload, making this a real potential issue.	Medium
Possible issue	Avoid mutable shared state in aggregate function instance Mutating instance fields (`this.threshold`, `this.matchMode`, etc.) inside the `add` method is not thread-safe and can cause incorrect behavior when the aggregate function instance is shared across rows or threads. These configuration values should be stored in the `Acc` accumulator instead of as mutable instance fields, or the instance should be treated as stateless with parameters passed through the accumulator. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84] -public Acc add( - Acc acc, - String field, - double threshold, - String matchMode, - String delims, - int bufferLimit, - int maxClusters) { - ... - this.threshold = threshold; - this.matchMode = matchMode; - this.delims = delims; - this.bufferLimit = bufferLimit; - this.maxClusters = maxClusters; - ... +// Move threshold, matchMode, delims, bufferLimit, maxClusters into Acc +public static class Acc implements Accumulator { + private final List<String> buffer = new ArrayList<>(); + private final List<ClusterRepresentative> globalClusters = new ArrayList<>(); + private final List<Integer> allLabels = new ArrayList<>(); + private int nextClusterId = 1; + // Configuration stored per-accumulator + double threshold = 0.8; + String matchMode = "termlist"; + String delims = " "; + int bufferLimit = 50000; + int maxClusters = 10000; + // ... rest of Acc } Suggestion importance[1-10]: 6 __ Why: Mutating instance fields like `this.threshold`, `this.matchMode`, etc. inside `add` is not thread-safe and can cause incorrect behavior if the function instance is shared. Moving these into the `Acc` accumulator would be a cleaner design, though the practical impact depends on how Calcite manages function instances.	Low
General	Declare constant before its use in field initializer `MAX_CACHE_SIZE` is referenced in the field initializer before it is declared as a `static final` field below it. In Java, instance field initializers can reference static fields declared later in the class, but this ordering is confusing and error-prone. Move the `MAX_CACHE_SIZE` declaration before the `vectorCache` field to make the dependency order explicit and avoid potential issues. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33] +private static final int MAX_CACHE_SIZE = 10000; + private final Map<String, Map<CharSequence, Integer>> vectorCache = new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) { @Override protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) { return size() > MAX_CACHE_SIZE; } }; -private static final int MAX_CACHE_SIZE = 10000; Suggestion importance[1-10]: 4 __ Why: While Java technically allows referencing a `static final` field declared later in the class from an instance field initializer, the current ordering of `vectorCache` before `MAX_CACHE_SIZE` is confusing and error-prone. Moving `MAX_CACHE_SIZE` before `vectorCache` improves readability and makes the dependency explicit.	Low
General	Include all parameters in anonymized cluster command output The anonymizer always appends `labelfield` and `countfield` regardless of whether `showcount` or `labelonly` were set, which produces output that doesn't match the original command structure and may confuse log analysis. Additionally, `labelonly` and `showcount` parameters are silently dropped from the anonymized output, losing information about the command's behavior. These parameters should be included in the anonymized output. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987] @Override public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) { String child = node.getChild().get(0).accept(this, context); - ... + String sourceField = visitExpression(node.getSourceField()); + StringBuilder command = new StringBuilder(); + command.append(child).append(" \| cluster ").append(sourceField); + + command.append(" t=").append(node.getThreshold()); + + if (!"termlist".equals(node.getMatchMode())) { + command.append(" match=").append(node.getMatchMode()); + } + command.append(" labelfield=").append(MASK_COLUMN); command.append(" countfield=").append(MASK_COLUMN); + + if (node.isLabelOnly()) { + command.append(" labelonly=true"); + } + if (node.isShowCount()) { + command.append(" showcount=true"); + } return command.toString(); } Suggestion importance[1-10]: 4 __ Why: The `visitCluster` anonymizer silently drops `labelonly` and `showcount` parameters, which reduces the fidelity of anonymized output for log analysis. Including these boolean flags (which don't contain sensitive data) would make the anonymized representation more accurate.	Low

Suggestions up to commit 1763fb8

Category	Suggestion	Impact
Possible issue	Handle all expected argument counts in generic add method The generic `add(Acc, Object...)` method only handles the case where `values.length ==` `1`, but Calcite may call this method with 4 parameters (field, threshold, matchMode, delims) as configured in `visitCluster`. This would cause a `SyntaxCheckException` at runtime instead of routing to the correct overload. Add handling for the 4-argument case to match the parameters passed from `visitCluster`. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } + if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = ((Number) values[1]).doubleValue(); + String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode; + String delims = (values[3] != null) ? values[3].toString() : this.delims; + return add(acc, field, threshold, matchMode, delims); + } throw new SyntaxCheckException( - "Unsupported function signature for cluster aggregate. ..."); + "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); } Suggestion importance[1-10]: 6 __ Why: The generic `add` method only handles `values.length == 1`, but Calcite may call it with 4 arguments (field, threshold, matchMode, delims) as configured in `visitCluster`. This could cause a `SyntaxCheckException` at runtime. However, Calcite's reflection-based dispatch typically routes to the most specific typed overload, so this may not be triggered in practice.	Low
Possible issue	Avoid mutating shared instance state in accumulator method Mutating instance fields (`this.threshold`, `this.matchMode`, etc.) inside the `add` method is not thread-safe and can cause incorrect behavior when the aggregate function is used concurrently across multiple partitions or threads. These configuration values should be set once at construction time or passed explicitly through the accumulator rather than stored as mutable instance state. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84] public Acc add( Acc acc, String field, double threshold, String matchMode, String delims, int bufferLimit, int maxClusters) { - ... - this.threshold = threshold; - this.matchMode = matchMode; - this.delims = delims; - this.bufferLimit = bufferLimit; - this.maxClusters = maxClusters; - ... + String processedField = (field != null) ? field : ""; + + acc.evaluate(processedField); + + if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) { + acc.partialMerge(threshold, matchMode, delims, maxClusters); + acc.clearBuffer(); + } + + return acc; } Suggestion importance[1-10]: 6 __ Why: Mutating instance fields like `this.threshold`, `this.matchMode`, etc. inside `add` is not thread-safe and can cause incorrect behavior in concurrent scenarios. The improved code removes these mutations and passes parameters directly, which is a valid correctness concern.	Low
General	Declare constant before its use in field initializer `MAX_CACHE_SIZE` is referenced in the field initializer before it is declared as a `static final` field below it. In Java, instance field initializers can reference static fields declared later in the class, but this ordering is confusing and error-prone. Move the `MAX_CACHE_SIZE` declaration before the `vectorCache` field to make the dependency explicit and avoid potential issues. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33] +private static final int MAX_CACHE_SIZE = 10000; + private final Map<String, Map<CharSequence, Integer>> vectorCache = new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) { @Override protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) { return size() > MAX_CACHE_SIZE; } }; -private static final int MAX_CACHE_SIZE = 10000; Suggestion importance[1-10]: 4 __ Why: While Java allows static fields to be referenced in instance initializers regardless of declaration order, moving `MAX_CACHE_SIZE` before `vectorCache` improves readability and avoids confusion. This is a minor style/maintainability improvement.	Low
General	Guard against empty child list before access Calling `node.getChild().get(0)` without checking if the child list is empty will throw an `IndexOutOfBoundsException` if the cluster node has no child (e.g., when used as the root node or in error scenarios). Add a null/empty check before accessing the first child, consistent with how `getChild()` can return an empty list per the `Cluster.getChild()` implementation. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987] @Override public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) { - String child = node.getChild().get(0).accept(this, context); - ... + List<UnresolvedPlan> children = node.getChild(); + String child = children.isEmpty() ? "" : children.get(0).accept(this, context); + String sourceField = visitExpression(node.getSourceField()); + StringBuilder command = new StringBuilder(); + command.append(child).append(" \| cluster ").append(sourceField); + + command.append(" t=").append(node.getThreshold()); + + if (!"termlist".equals(node.getMatchMode())) { + command.append(" match=").append(node.getMatchMode()); + } + + command.append(" labelfield=").append(MASK_COLUMN); + command.append(" countfield=").append(MASK_COLUMN); + + return command.toString(); } Suggestion importance[1-10]: 4 __ Why: Calling `node.getChild().get(0)` without checking for an empty list could throw an `IndexOutOfBoundsException` in edge cases. However, in normal PPL query processing, the cluster node always has a child, making this a low-probability defensive improvement.	Low

Suggestions up to commit 2d99d92

Category	Suggestion	Impact
Possible issue	Avoid mutable shared state in aggregate function instance Mutating instance fields (`this.threshold`, `this.matchMode`, etc.) inside `add()` is not thread-safe and can cause incorrect behavior when the same `ClusterLabelAggFunction` instance is reused across concurrent queries or multiple partitions. These configuration values should be stored in the `Acc` accumulator instead of on the function instance, or passed directly through to `partialMerge` and `value` without side effects on the function object. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84] public Acc add( Acc acc, String field, double threshold, String matchMode, String delims, int bufferLimit, int maxClusters) { - // Process all rows, even when field is null - convert null to empty string String processedField = (field != null) ? field : ""; - this.threshold = threshold; - this.matchMode = matchMode; - this.delims = delims; - this.bufferLimit = bufferLimit; - this.maxClusters = maxClusters; + // Store config in accumulator rather than mutating function instance fields + acc.setConfig(threshold, matchMode, delims, bufferLimit, maxClusters); + acc.evaluate(processedField); - acc.evaluate(processedField); - ... + if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) { + acc.partialMerge(threshold, matchMode, delims, maxClusters); + acc.clearBuffer(); + } + + return acc; } Suggestion importance[1-10]: 6 __ Why: Mutating instance fields like `this.threshold`, `this.matchMode`, etc. inside `add()` is a legitimate thread-safety concern when the same function instance is reused across concurrent queries. However, the `improved_code` references a `setConfig` method on `Acc` that doesn't exist in the PR, making the suggested implementation incomplete and not directly applicable.	Low
Possible issue	Handle all parameter counts in varargs dispatch method The generic `add(Acc, Object...)` method only handles the case where `values.length ==` `1`, but the actual invocation from `visitCluster` passes 4 parameters (field, threshold, matchMode, delims). When Calcite dispatches through the generic varargs method with 4 arguments, it will throw a `SyntaxCheckException` instead of routing to the correct overload. Add a case for `values.length == 4` to handle the standard invocation path. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); } + if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = ((Number) values[1]).doubleValue(); + String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode; + String delims = (values[3] != null) ? values[3].toString() : this.delims; + return add(acc, field, threshold, matchMode, delims); + } throw new SyntaxCheckException( "Unsupported function signature for cluster aggregate. Valid parameters include (field:" - + " required string), ..."); + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); } Suggestion importance[1-10]: 5 __ Why: The concern is valid in theory, but Calcite's reflection-based dispatch typically routes to the most specific typed overload rather than the varargs method. The actual call path from `visitCluster` passes 4 typed parameters which would match the `add(Acc, String, double, String, String)` overload directly. The risk is real only if Calcite falls back to the generic varargs method, making this a moderate defensive improvement.	Low
General	Use LRU eviction policy for vector cache The `vectorCache` is a `HashMap` (not a `LinkedHashMap`), so the eviction loop removes an arbitrary half of the entries rather than the least-recently-used ones. More importantly, the eviction check happens before inserting the new entry, so after eviction the cache still holds `MAX_CACHE_SIZE / 2` entries and the new entry is added, which is correct, but using a plain `HashMap` means the eviction order is unpredictable and may repeatedly evict recently-computed vectors. Consider using a `LinkedHashMap` with access-order to implement a proper LRU eviction policy. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [87-102] -private Map<CharSequence, Integer> vectorizeWithCache(String value) { - Map<CharSequence, Integer> cached = vectorCache.get(value); - if (cached != null) { - return cached; - } - if (vectorCache.size() > MAX_CACHE_SIZE) { - var it = vectorCache.keySet().iterator(); - for (int i = 0; i < MAX_CACHE_SIZE / 2 && it.hasNext(); i++) { - it.next(); - it.remove(); - } - } - Map<CharSequence, Integer> result = vectorize(value); - vectorCache.put(value, result); - return result; -} +private final Map<String, Map<CharSequence, Integer>> vectorCache = + new java.util.LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) { + @Override + protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) { + return size() > MAX_CACHE_SIZE; + } + }; Suggestion importance[1-10]: 4 __ Why: The suggestion to use a `LinkedHashMap` with access-order for proper LRU eviction is a valid improvement over the current arbitrary eviction from a plain `HashMap`. However, this is a performance/quality optimization rather than a correctness issue, and the current approach still bounds memory usage correctly.	Low
General	Guard against empty child list before access `node.getChild().get(0)` will throw an `IndexOutOfBoundsException` if the child list is empty (e.g., when the `Cluster` node has not yet been attached to a child plan). The `getChild()` method in `Cluster` returns an empty list when `child` is `null`. Add a null/empty guard before accessing the first child. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987] @Override public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) { - String child = node.getChild().get(0).accept(this, context); - ... + List<UnresolvedPlan> children = node.getChild(); + if (children.isEmpty()) { + return ""; + } + String child = children.get(0).accept(this, context); + String sourceField = visitExpression(node.getSourceField()); + StringBuilder command = new StringBuilder(); + command.append(child).append(" \| cluster ").append(sourceField); + + command.append(" t=").append(node.getThreshold()); + + if (!"termlist".equals(node.getMatchMode())) { + command.append(" match=").append(node.getMatchMode()); + } + + command.append(" labelfield=").append(MASK_COLUMN); + command.append(" countfield=").append(MASK_COLUMN); + + return command.toString(); } Suggestion importance[1-10]: 3 __ Why: While the guard is technically correct, in practice `visitCluster` is only called on a fully-constructed AST where the `Cluster` node always has a child attached via the parser pipeline. The `IndexOutOfBoundsException` scenario is unlikely in normal usage, making this a low-impact defensive check.	Low

Suggestions up to commit e009b93

Category	Suggestion	Impact
Possible issue	Handle multi-argument varargs dispatch correctly The generic `add(Acc, Object...)` method only handles the single-argument case and throws for any other number of arguments. However, Calcite may call this varargs method with multiple arguments (e.g., field + threshold + matchMode + delims) when the typed overloads are not resolved at runtime. This would cause a `SyntaxCheckException` at query execution time instead of routing to the correct typed overload. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); + } else if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = ((Number) values[1]).doubleValue(); + String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode; + String delims = (values[3] != null) ? values[3].toString() : this.delims; + return add(acc, field, threshold, matchMode, delims); } throw new SyntaxCheckException( - "Unsupported function signature for cluster aggregate. ... + "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); +} Suggestion importance[1-10]: 5 __ Why: The suggestion raises a valid concern that Calcite may invoke the varargs `add` method with multiple arguments at runtime, bypassing typed overloads. However, the PR already passes 4 parameters (`field`, `threshold`, `matchMode`, `delims`) via the `visitCluster` method, so this could be a real runtime issue worth addressing.	Low
Possible issue	Fix off-by-one in cache eviction threshold The cache eviction removes arbitrary entries since `HashMap` has no defined iteration order, which may not evict the least-recently-used entries. More critically, the eviction check happens before inserting the new entry, so the cache can grow to `MAX_CACHE_SIZE + 1` entries before eviction triggers. Consider checking `>=` `MAX_CACHE_SIZE` instead of `> MAX_CACHE_SIZE`. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [92-98] -if (vectorCache.size() > MAX_CACHE_SIZE) { +if (vectorCache.size() >= MAX_CACHE_SIZE) { var it = vectorCache.keySet().iterator(); for (int i = 0; i < MAX_CACHE_SIZE / 2 && it.hasNext(); i++) { it.next(); it.remove(); } } Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that using `>` instead of `>=` allows the cache to grow to `MAX_CACHE_SIZE + 1` before eviction. This is a minor off-by-one issue that doesn't cause functional problems but is worth fixing for correctness.	Low
General	Avoid mutable instance state in aggregate function Mutating instance fields (`threshold`, `matchMode`, etc.) inside `add()` is not thread-safe and can cause incorrect behavior if the same `ClusterLabelAggFunction` instance is reused across concurrent queries. These configuration values should be set once at construction time or passed through the `Acc` object rather than stored as mutable instance state that is overwritten on every row. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [70-74] -this.threshold = threshold; -this.matchMode = matchMode; -this.delims = delims; -this.bufferLimit = bufferLimit; -this.maxClusters = maxClusters; +// Pass configuration through Acc or set at construction; do not mutate instance fields per-row. +// Remove the instance field mutations here and rely on the values passed as parameters. Suggestion importance[1-10]: 5 __ Why: Mutating instance fields like `threshold`, `matchMode`, etc. on every call to `add()` is a legitimate thread-safety concern. The `improved_code` is just a comment and doesn't provide a concrete fix, but the issue itself is valid for concurrent query scenarios.	Low
General	Preserve all parameters in anonymized output The anonymizer always emits `labelfield` and `countfield` regardless of whether `showcount` is enabled or whether these fields differ from their defaults. This produces anonymized output that doesn't accurately reflect the original query structure. Additionally, `labelonly` and `showcount` boolean parameters are silently dropped, which may cause the anonymized query to be semantically different from the original. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [977-986] command.append(" t=").append(node.getThreshold()); if (!"termlist".equals(node.getMatchMode())) { command.append(" match=").append(node.getMatchMode()); } command.append(" labelfield=").append(MASK_COLUMN); command.append(" countfield=").append(MASK_COLUMN); +if (node.isLabelOnly()) { + command.append(" labelonly=true"); +} +if (node.isShowCount()) { + command.append(" showcount=true"); +} + Suggestion importance[1-10]: 3 __ Why: The suggestion correctly notes that `labelonly` and `showcount` are silently dropped in the anonymized output, making it semantically different from the original query. However, this is a minor issue in an anonymizer utility and doesn't affect query execution correctness.	Low

Suggestions up to commit 5792e98

Category	Suggestion	Impact
Possible issue	Fix unsafe cache eviction inside computeIfAbsent The cache eviction logic inside `computeIfAbsent` is unsafe and ineffective. Modifying the `ConcurrentHashMap` while inside `computeIfAbsent` can cause a `ConcurrentModificationException` or deadlock (as documented in Java). The eviction check should be performed outside the `computeIfAbsent` call, and the parallel stream removal is also unreliable for a `ConcurrentHashMap`. common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [88-99] private Map<CharSequence, Integer> vectorizeWithCache(String value) { - return vectorCache.computeIfAbsent( - value, - k -> { - if (vectorCache.size() > MAX_CACHE_SIZE) { - vectorCache.keySet().parallelStream() - .limit(MAX_CACHE_SIZE / 2) - .forEach(vectorCache::remove); - } - return vectorize(k); - }); + if (vectorCache.size() > MAX_CACHE_SIZE) { + vectorCache.keySet().stream() + .limit(MAX_CACHE_SIZE / 2) + .forEach(vectorCache::remove); + } + return vectorCache.computeIfAbsent(value, this::vectorize); } Suggestion importance[1-10]: 8 __ Why: Modifying a `ConcurrentHashMap` inside `computeIfAbsent` is explicitly documented as unsafe in Java and can cause deadlocks or infinite loops. Moving the eviction check outside the lambda is a critical correctness fix.	Medium
Possible issue	Handle all parameter counts in varargs add method The generic `add(Acc acc, Object... values)` method only handles the case where `values.length == 1`, but Calcite may call this method with the full set of parameters (field, threshold, matchMode, delims). This means calls with more than one argument will always throw a `SyntaxCheckException` instead of being dispatched correctly, breaking the function when all parameters are provided via the varargs path. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56] @Override public Acc add(Acc acc, Object... values) { - // Handle case where Calcite calls generic method with null field value if (values.length == 1) { String field = (values[0] != null) ? values[0].toString() : null; return add(acc, field); + } else if (values.length == 4) { + String field = (values[0] != null) ? values[0].toString() : null; + double threshold = ((Number) values[1]).doubleValue(); + String matchMode = (String) values[2]; + String delims = (String) values[3]; + return add(acc, field, threshold, matchMode, delims); } throw new SyntaxCheckException( - "Unsupported function signature for cluster aggregate. ... + "Unsupported function signature for cluster aggregate. Valid parameters include (field:" + + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:" + + " optional string algorithm 'termlist'\|'termset'\|'ngramset', default 'termlist')," + + " (delims: optional string delimiters, default ' '), (labelfield: optional string" + + " output field name, default 'cluster_label'), (countfield: optional string count" + + " field name, default 'cluster_count')"); +} Suggestion importance[1-10]: 5 __ Why: The suggestion is valid in principle, but the actual call path from Calcite uses the typed overloads rather than the varargs method, so this is a defensive improvement rather than a critical bug fix. The improved code adds handling for 4-argument calls which may not be needed in practice.	Low
General	Remove unsafe mutable instance state in add method Mutating instance fields (`this.threshold`, `this.matchMode`, etc.) inside the `add` method is not thread-safe and can cause incorrect behavior when the aggregate function is used concurrently. These configuration values should be passed through the `Acc` accumulator or stored immutably, rather than being set as mutable instance state during accumulation. core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84] public Acc add( Acc acc, String field, double threshold, String matchMode, String delims, int bufferLimit, int maxClusters) { - ... - this.threshold = threshold; - this.matchMode = matchMode; - this.delims = delims; - this.bufferLimit = bufferLimit; - this.maxClusters = maxClusters; - ... + String processedField = (field != null) ? field : ""; + acc.evaluate(processedField); + + if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) { + acc.partialMerge(threshold, matchMode, delims, maxClusters); + acc.clearBuffer(); + } + + return acc; +} + Suggestion importance[1-10]: 6 __ Why: Mutating instance fields like `this.threshold` and `this.matchMode` inside `add` is not thread-safe and could cause incorrect behavior under concurrent use. However, the `result` method relies on these instance fields, so the fix requires also updating `result` to pass parameters differently, making the improved code incomplete as shown.	Low
General	Guard against empty child list access Calling `node.getChild().get(0)` without checking if the child list is empty will throw an `IndexOutOfBoundsException` if the node has no child (e.g., when the `Cluster` node is the root or is not yet attached). A null/empty check should be added before accessing the first child. ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-972] @Override public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) { - String child = node.getChild().get(0).accept(this, context); - ... + List<UnresolvedPlan> children = node.getChild(); + if (children.isEmpty()) { + return ""; + } + String child = children.get(0).accept(this, context); + String sourceField = visitExpression(node.getSourceField()); + StringBuilder command = new StringBuilder(); + command.append(child).append(" \| cluster ").append(sourceField); + command.append(" t=").append(node.getThreshold()); + + if (!"termlist".equals(node.getMatchMode())) { + command.append(" match=").append(node.getMatchMode()); + } + + command.append(" labelfield=").append(MASK_COLUMN); + command.append(" countfield=").append(MASK_COLUMN); + + return command.toString(); +} + Suggestion importance[1-10]: 4 __ Why: The `Cluster` node in normal usage always has a child attached via `attach()`, so this is a defensive check for an edge case. The suggestion is valid but has low practical impact since the anonymizer is only called on fully-constructed ASTs.	Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T08:05:43Z

Persistent review updated to latest commit 152d2d4

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T08:46:06Z

Persistent review updated to latest commit 74001d0

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T16:26:51Z

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T18:57:45Z

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T22:59:12Z

Persistent review updated to latest commit 5792e98

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T23:08:12Z

Persistent review updated to latest commit e009b93

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-30T23:24:32Z

Persistent review updated to latest commit 2d99d92

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-31T05:45:29Z

Persistent review updated to latest commit 1763fb8

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-31T16:23:21Z

Persistent review updated to latest commit 3fbd709

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

github-actions · 2026-03-31T16:34:00Z

Persistent review updated to latest commit 5bfaf69

implement cluster command

8f2804e

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix anonymizer

b3cc8ce

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix default values

152d2d4

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix anonymizer

74001d0

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt added 2 commits March 30, 2026 02:49

fix integ tests

2edff79

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix explain tests

082a8e9

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix doctest

df7873a

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix docs and null handling

5792e98

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

address concurrency

e009b93

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix cross cluster tests

2d99d92

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix integ tests

1763fb8

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix formatting

3fbd709

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

fix tests

5bfaf69

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

ritvibhatt marked this pull request as ready for review March 31, 2026 16:51

ritvibhatt requested review from joshuali925 and ps48 as code owners March 31, 2026 16:51

ritvibhatt requested review from LantaoJin, RyanL1997, Swiddis, acarbonetto, ahkcs, anirudha, dai-chen, mengweieric, noCharger, penghuo, qianheng-aws, songkant-aws, vamsimanohar, ykmr1224 and yuancu as code owners March 31, 2026 16:51

Conversation

ritvibhatt commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Description

Syntax

Supported algorithms

Behavior

Changes

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 5bfaf69)

Uh oh!

github-actions bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 30, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ritvibhatt commented Mar 25, 2026 •

edited

Loading

github-actions bot commented Mar 25, 2026 •

edited

Loading

github-actions bot commented Mar 30, 2026 •

edited

Loading

(Review updated until commit `5bfaf69`)

github-actions bot commented Mar 30, 2026 •

edited

Loading