Skip to content

Add cluster command for PPL#5265

Open
ritvibhatt wants to merge 13 commits intoopensearch-project:mainfrom
ritvibhatt:ppl-cluster-command
Open

Add cluster command for PPL#5265
ritvibhatt wants to merge 13 commits intoopensearch-project:mainfrom
ritvibhatt:ppl-cluster-command

Conversation

@ritvibhatt
Copy link
Copy Markdown
Contributor

@ritvibhatt ritvibhatt commented Mar 25, 2026

Description

Description

Adds the cluster command to PPL, which groups documents into clusters based on text similarity.

Syntax

cluster [t=] [match=] [labelfield=] [countfield=] [showcount=] [labelonly=] [delims=]

Supported algorithms

  • termlist (default) — positional term frequency, word order matters
  • termset — bag-of-words, word order ignored
  • ngramset — character trigram frequency for fuzzy matching

All algorithms use cosine similarity against the configured threshold to determine cluster membership.

Behavior

  • By default (labelonly=false), deduplicates results to one
    representative row per cluster
  • labelonly=true keeps all rows and adds the cluster label
  • showcount=true adds a count of how many documents belong to each cluster
  • Rows where the source field is null are filtered out before clustering

Changes

  • CalciteRelNodeVisitor: implements visitCluster with window-based clustering, null filtering, and optional dedup/showcount
  • ClusterLabelAggFunction: window aggregate that buffers rows, runs greedy clustering, and returns an array of cluster labels
  • TextSimilarityClustering: cosine similarity computation with LRU-cached vectorization for termlist, termset, and ngramset modes
  • Cluster AST node, parser rules, and PPL grammar additions
  • Documentation with 7 doctest-verified examples
  • Unit tests for logical plan verification
  • Integration tests covering all parameters, dedup behavior, labelonly mode, and multi-row clustering

Related Issues

Resolves #5255

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 25, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit df7873a.

PathLineSeverityDescription
common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java80mediumCache eviction is performed inside ConcurrentHashMap.computeIfAbsent() via parallelStream().forEach(vectorCache::remove). Modifying a ConcurrentHashMap while a computeIfAbsent call is in progress on the same map can cause a deadlock or infinite loop (known JDK issue). If triggered under adversarial input designed to hit MAX_CACHE_SIZE boundary conditions, this could be used as a denial-of-service vector against the query engine. The pattern is anomalous and not necessary — eviction could be done outside the lambda.
ppl/src/main/antlr/OpenSearchPPLParser.g41592lowThe ident grammar rule was restructured: previously keywords-as-identifiers fell through a separate alternative; now they are inlined as (ID | keywordsCanBeId) in the primary alternative. Combined with adding the single-character token T as both a parameter keyword and a searchable keyword, this increases the ambiguity surface of the PPL grammar. Misidentification of tokens could allow carefully crafted queries to bypass identifier-based access control checks, though no specific exploit path is evident from this diff alone.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 5bfaf69)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Core clustering algorithm and UDAF implementation

Relevant files:

  • common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java
  • core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java
  • core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
  • core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java

Sub-PR theme: PPL grammar, AST, and Calcite plan visitor for cluster command

Relevant files:

  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • core/src/main/java/org/opensearch/sql/ast/tree/Cluster.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Sub-PR theme: Tests and documentation for cluster command

Relevant files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteClusterCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CrossClusterSearchIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLClusterTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
  • docs/user/ppl/cmd/cluster.md
  • docs/user/ppl/index.md

⚡ Recommended focus areas for review

Thread Safety

The ClusterLabelAggFunction class stores mutable state (threshold, matchMode, delims, bufferLimit, maxClusters) as instance fields that are mutated in the add methods. If the same function instance is reused across multiple queries or threads, this could lead to race conditions or incorrect results. The state should be encapsulated in the Acc accumulator instead.

private int bufferLimit = 50000; // Configurable buffer size
private int maxClusters = 10000; // Limit cluster count to prevent memory explosion
private double threshold = 0.8;
private String matchMode = "termlist";
private String delims = " ";

@Override
public Acc init() {
  return new Acc();
}

@Override
public Object result(Acc acc) {
  return acc.value(threshold, matchMode, delims, maxClusters);
}

@Override
public Acc add(Acc acc, Object... values) {
  // Handle case where Calcite calls generic method with null field value
  if (values.length == 1) {
    String field = (values[0] != null) ? values[0].toString() : null;
    return add(acc, field);
  }

  throw new SyntaxCheckException(
      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
          + " output field name, default 'cluster_label'), (countfield: optional string count"
          + " field name, default 'cluster_count')");
}

public Acc add(
    Acc acc,
    String field,
    double threshold,
    String matchMode,
    String delims,
    int bufferLimit,
    int maxClusters) {
  // Process all rows, even when field is null - convert null to empty string
  // This ensures the result array matches input row count
  String processedField = (field != null) ? field : "";

  this.threshold = threshold;
  this.matchMode = matchMode;
  this.delims = delims;
  this.bufferLimit = bufferLimit;
  this.maxClusters = maxClusters;

  acc.evaluate(processedField);

  if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) {
    acc.partialMerge(threshold, matchMode, delims, maxClusters);
    acc.clearBuffer();
  }

  return acc;
}
Possible Issue

In add(Acc acc, Object... values), when values.length == 1, the method delegates to add(Acc acc, String field) which uses the instance-level defaults for threshold, matchMode, etc. However, the Calcite visitor in CalciteRelNodeVisitor.visitCluster always passes 4 parameters (field, threshold, matchMode, delims) to the window function. The single-argument path may never be exercised as intended, and if it is, it silently uses stale defaults rather than the user-specified values.

public Acc add(Acc acc, Object... values) {
  // Handle case where Calcite calls generic method with null field value
  if (values.length == 1) {
    String field = (values[0] != null) ? values[0].toString() : null;
    return add(acc, field);
  }

  throw new SyntaxCheckException(
      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
          + " output field name, default 'cluster_label'), (countfield: optional string count"
          + " field name, default 'cluster_count')");
}
Cache Concurrency

The vectorCache is a LinkedHashMap with LRU eviction, but LinkedHashMap is not thread-safe. If TextSimilarityClustering instances are shared across threads (e.g., via the UDAF), concurrent access to vectorCache via computeIfAbsent could cause data corruption or infinite loops. Consider using Collections.synchronizedMap or a ConcurrentLinkedHashMap.

private final Map<String, Map<CharSequence, Integer>> vectorCache =
    new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) {
        return size() > MAX_CACHE_SIZE;
      }
    };
Keyword Conflict

Adding T as a dedicated lexer token is very broad and may conflict with existing identifiers or other grammar rules that use T as a field/variable name. This could break existing queries that use t as a field name. The keywordsCanBeId list includes T, but the lexer token priority may still cause tokenization issues for common single-letter field names.

T:                                  'T';
Row Ordering

The cluster implementation relies on ROW_NUMBER() OVER () (no ORDER BY) to index into the cluster labels array. Without a deterministic ORDER BY clause in the window spec, the row ordering is non-deterministic, meaning the array index may not correspond to the correct row across different executions. This could produce incorrect cluster label assignments.

String rowNumAlias = "_cluster_row_idx";
RexNode rowNum =
    context
        .relBuilder
        .aggregateCall(SqlStdOperatorTable.ROW_NUMBER)
        .over()
        .rowsBetween(RexWindowBounds.UNBOUNDED_PRECEDING, RexWindowBounds.CURRENT_ROW)
        .as(rowNumAlias);
context.relBuilder.projectPlus(rowNum);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to 5bfaf69

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle all expected argument counts in generic add method

The generic add(Acc, Object...) method only handles the case where values.length ==
1, but Calcite may call this method with 4 parameters (field, threshold, matchMode,
delims) as configured in visitCluster. This would cause an unexpected
SyntaxCheckException at runtime instead of routing to the correct overload. Add
handling for the 4-argument case to match the parameters passed from visitCluster.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
   }
+  if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = ((Number) values[1]).doubleValue();
+    String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode;
+    String delims = (values[3] != null) ? values[3].toString() : this.delims;
+    return add(acc, field, threshold, matchMode, delims);
+  }
 
   throw new SyntaxCheckException(
-      "Unsupported function signature for cluster aggregate. ...");
+      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
 }
Suggestion importance[1-10]: 7

__

Why: The generic add method only handles values.length == 1, but Calcite may invoke it with 4 arguments (field, threshold, matchMode, delims) as configured in visitCluster. Without handling this case, a SyntaxCheckException would be thrown at runtime instead of routing to the correct overload.

Medium
Avoid race conditions from mutable shared instance state

The add method mutates instance fields (this.threshold, this.matchMode, etc.) which
are shared state on the ClusterLabelAggFunction instance. In a concurrent or
multi-threaded execution environment (common in query engines), this creates a race
condition where one thread's parameters can overwrite another's. These configuration
values should be stored in the Acc accumulator instead of on the function instance.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84]

-public Acc add(
-    Acc acc,
-    String field,
-    double threshold,
-    String matchMode,
-    String delims,
-    int bufferLimit,
-    int maxClusters) {
-  ...
-  this.threshold = threshold;
-  this.matchMode = matchMode;
-  this.delims = delims;
-  this.bufferLimit = bufferLimit;
-  this.maxClusters = maxClusters;
+public static class Acc implements Accumulator {
+  private final List<String> buffer = new ArrayList<>();
+  private final List<ClusterRepresentative> globalClusters = new ArrayList<>();
+  private final List<Integer> allLabels = new ArrayList<>();
+  private int nextClusterId = 1;
+  // Store config in accumulator to avoid shared mutable state
+  double threshold = 0.8;
+  String matchMode = "termlist";
+  String delims = " ";
+  int bufferLimit = 50000;
+  int maxClusters = 10000;
   ...
 }
+// In add(...) method, update acc fields instead of this fields:
+acc.threshold = threshold;
+acc.matchMode = matchMode;
+acc.delims = delims;
+acc.bufferLimit = bufferLimit;
+acc.maxClusters = maxClusters;
Suggestion importance[1-10]: 6

__

Why: Mutating instance fields like this.threshold, this.matchMode, etc. in the add method creates potential race conditions in concurrent environments. Moving these configuration values into the Acc accumulator would be safer, though the actual risk depends on how Calcite instantiates and uses the aggregate function.

Low
General
Conditionally include optional parameters in anonymized output

The anonymizer always appends labelfield and countfield regardless of whether
showcount or labelonly were set, which produces output that doesn't match the
original command structure and may confuse log analysis. Additionally, labelonly and
showcount boolean parameters are never included in the anonymized output, losing
information about the command's behavior. Include these parameters conditionally to
produce a more accurate anonymized representation.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987]

 @Override
 public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) {
   String child = node.getChild().get(0).accept(this, context);
-  ...
+  String sourceField = visitExpression(node.getSourceField());
+  StringBuilder command = new StringBuilder();
+  command.append(child).append(" | cluster ").append(sourceField);
+
+  command.append(" t=").append(node.getThreshold());
+
+  if (!"termlist".equals(node.getMatchMode())) {
+    command.append(" match=").append(node.getMatchMode());
+  }
+
   command.append(" labelfield=").append(MASK_COLUMN);
-  command.append(" countfield=").append(MASK_COLUMN);
+
+  if (node.isShowCount()) {
+    command.append(" countfield=").append(MASK_COLUMN);
+    command.append(" showcount=true");
+  }
+
+  if (node.isLabelOnly()) {
+    command.append(" labelonly=true");
+  }
 
   return command.toString();
 }
Suggestion importance[1-10]: 4

__

Why: The anonymizer always appends labelfield and countfield and omits showcount/labelonly flags, producing output that doesn't accurately reflect the original command structure. The improved code conditionally includes these parameters for a more faithful anonymized representation.

Low
Declare constant before use in field initializer

MAX_CACHE_SIZE is referenced in the field initializer before it is declared as a
static final field below it. In Java, instance field initializers can reference
static fields declared later in the class, but this ordering is fragile and may
cause confusion or issues in some edge cases. Move the MAX_CACHE_SIZE declaration
before the vectorCache field declaration.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33]

+private static final int MAX_CACHE_SIZE = 10000;
+
 private final Map<String, Map<CharSequence, Integer>> vectorCache =
     new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) {
       @Override
       protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) {
         return size() > MAX_CACHE_SIZE;
       }
     };
-private static final int MAX_CACHE_SIZE = 10000;
Suggestion importance[1-10]: 3

__

Why: In Java, static fields are initialized before instance fields regardless of declaration order, so MAX_CACHE_SIZE being declared after vectorCache is technically valid. This is a minor code style/readability improvement with no functional impact.

Low

Previous suggestions

Suggestions up to commit 3fbd709
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle all expected argument counts in generic add method

The generic add(Acc, Object...) method only handles the case where values.length ==
1, but Calcite may call this method with 4 parameters (field, threshold, matchMode,
delims) as defined in the window function invocation. This would cause an unexpected
SyntaxCheckException at runtime instead of routing to the correct overloaded add
method. Add handling for the 4-argument case (and potentially others) before
throwing the exception.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
   }
+  if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = values[1] != null ? ((Number) values[1]).doubleValue() : this.threshold;
+    String matchMode = values[2] != null ? values[2].toString() : this.matchMode;
+    String delims = values[3] != null ? values[3].toString() : this.delims;
+    return add(acc, field, threshold, matchMode, delims);
+  }
 
   throw new SyntaxCheckException(
-      "Unsupported function signature for cluster aggregate. ...");
+      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
 }
Suggestion importance[1-10]: 7

__

Why: The generic add method only handles values.length == 1, but Calcite may invoke it with 4 arguments (field, threshold, matchMode, delims) as passed in the window function call. This could cause a runtime SyntaxCheckException instead of routing to the correct overload, making this a real potential issue.

Medium
Avoid mutable shared state in aggregate function instance

Mutating instance fields (this.threshold, this.matchMode, etc.) inside the add
method is not thread-safe and can cause incorrect behavior when the aggregate
function instance is shared across rows or threads. These configuration values
should be stored in the Acc accumulator instead of as mutable instance fields, or
the instance should be treated as stateless with parameters passed through the
accumulator.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84]

-public Acc add(
-    Acc acc,
-    String field,
-    double threshold,
-    String matchMode,
-    String delims,
-    int bufferLimit,
-    int maxClusters) {
-  ...
-  this.threshold = threshold;
-  this.matchMode = matchMode;
-  this.delims = delims;
-  this.bufferLimit = bufferLimit;
-  this.maxClusters = maxClusters;
-  ...
+// Move threshold, matchMode, delims, bufferLimit, maxClusters into Acc
+public static class Acc implements Accumulator {
+  private final List<String> buffer = new ArrayList<>();
+  private final List<ClusterRepresentative> globalClusters = new ArrayList<>();
+  private final List<Integer> allLabels = new ArrayList<>();
+  private int nextClusterId = 1;
+  // Configuration stored per-accumulator
+  double threshold = 0.8;
+  String matchMode = "termlist";
+  String delims = " ";
+  int bufferLimit = 50000;
+  int maxClusters = 10000;
+  // ... rest of Acc
 }
Suggestion importance[1-10]: 6

__

Why: Mutating instance fields like this.threshold, this.matchMode, etc. inside add is not thread-safe and can cause incorrect behavior if the function instance is shared. Moving these into the Acc accumulator would be a cleaner design, though the practical impact depends on how Calcite manages function instances.

Low
General
Declare constant before its use in field initializer

MAX_CACHE_SIZE is referenced in the field initializer before it is declared as a
static final field below it. In Java, instance field initializers can reference
static fields declared later in the class, but this ordering is confusing and
error-prone. Move the MAX_CACHE_SIZE declaration before the vectorCache field to
make the dependency order explicit and avoid potential issues.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33]

+private static final int MAX_CACHE_SIZE = 10000;
+
 private final Map<String, Map<CharSequence, Integer>> vectorCache =
     new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) {
       @Override
       protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) {
         return size() > MAX_CACHE_SIZE;
       }
     };
-private static final int MAX_CACHE_SIZE = 10000;
Suggestion importance[1-10]: 4

__

Why: While Java technically allows referencing a static final field declared later in the class from an instance field initializer, the current ordering of vectorCache before MAX_CACHE_SIZE is confusing and error-prone. Moving MAX_CACHE_SIZE before vectorCache improves readability and makes the dependency explicit.

Low
Include all parameters in anonymized cluster command output

The anonymizer always appends labelfield and countfield regardless of whether
showcount or labelonly were set, which produces output that doesn't match the
original command structure and may confuse log analysis. Additionally, labelonly and
showcount parameters are silently dropped from the anonymized output, losing
information about the command's behavior. These parameters should be included in the
anonymized output.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987]

 @Override
 public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) {
   String child = node.getChild().get(0).accept(this, context);
-  ...
+  String sourceField = visitExpression(node.getSourceField());
+  StringBuilder command = new StringBuilder();
+  command.append(child).append(" | cluster ").append(sourceField);
+
+  command.append(" t=").append(node.getThreshold());
+
+  if (!"termlist".equals(node.getMatchMode())) {
+    command.append(" match=").append(node.getMatchMode());
+  }
+
   command.append(" labelfield=").append(MASK_COLUMN);
   command.append(" countfield=").append(MASK_COLUMN);
+
+  if (node.isLabelOnly()) {
+    command.append(" labelonly=true");
+  }
+  if (node.isShowCount()) {
+    command.append(" showcount=true");
+  }
 
   return command.toString();
 }
Suggestion importance[1-10]: 4

__

Why: The visitCluster anonymizer silently drops labelonly and showcount parameters, which reduces the fidelity of anonymized output for log analysis. Including these boolean flags (which don't contain sensitive data) would make the anonymized representation more accurate.

Low
Suggestions up to commit 1763fb8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle all expected argument counts in generic add method

The generic add(Acc, Object...) method only handles the case where values.length ==
1, but Calcite may call this method with 4 parameters (field, threshold, matchMode,
delims) as configured in visitCluster. This would cause a SyntaxCheckException at
runtime instead of routing to the correct overload. Add handling for the 4-argument
case to match the parameters passed from visitCluster.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
   }
+  if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = ((Number) values[1]).doubleValue();
+    String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode;
+    String delims = (values[3] != null) ? values[3].toString() : this.delims;
+    return add(acc, field, threshold, matchMode, delims);
+  }
 
   throw new SyntaxCheckException(
-      "Unsupported function signature for cluster aggregate. ...");
+      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
 }
Suggestion importance[1-10]: 6

__

Why: The generic add method only handles values.length == 1, but Calcite may call it with 4 arguments (field, threshold, matchMode, delims) as configured in visitCluster. This could cause a SyntaxCheckException at runtime. However, Calcite's reflection-based dispatch typically routes to the most specific typed overload, so this may not be triggered in practice.

Low
Avoid mutating shared instance state in accumulator method

Mutating instance fields (this.threshold, this.matchMode, etc.) inside the add
method is not thread-safe and can cause incorrect behavior when the aggregate
function is used concurrently across multiple partitions or threads. These
configuration values should be set once at construction time or passed explicitly
through the accumulator rather than stored as mutable instance state.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84]

 public Acc add(
     Acc acc,
     String field,
     double threshold,
     String matchMode,
     String delims,
     int bufferLimit,
     int maxClusters) {
-  ...
-  this.threshold = threshold;
-  this.matchMode = matchMode;
-  this.delims = delims;
-  this.bufferLimit = bufferLimit;
-  this.maxClusters = maxClusters;
-  ...
+  String processedField = (field != null) ? field : "";
+
+  acc.evaluate(processedField);
+
+  if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) {
+    acc.partialMerge(threshold, matchMode, delims, maxClusters);
+    acc.clearBuffer();
+  }
+
+  return acc;
 }
Suggestion importance[1-10]: 6

__

Why: Mutating instance fields like this.threshold, this.matchMode, etc. inside add is not thread-safe and can cause incorrect behavior in concurrent scenarios. The improved code removes these mutations and passes parameters directly, which is a valid correctness concern.

Low
General
Declare constant before its use in field initializer

MAX_CACHE_SIZE is referenced in the field initializer before it is declared as a
static final field below it. In Java, instance field initializers can reference
static fields declared later in the class, but this ordering is confusing and
error-prone. Move the MAX_CACHE_SIZE declaration before the vectorCache field to
make the dependency explicit and avoid potential issues.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [26-33]

+private static final int MAX_CACHE_SIZE = 10000;
+
 private final Map<String, Map<CharSequence, Integer>> vectorCache =
     new LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) {
       @Override
       protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) {
         return size() > MAX_CACHE_SIZE;
       }
     };
-private static final int MAX_CACHE_SIZE = 10000;
Suggestion importance[1-10]: 4

__

Why: While Java allows static fields to be referenced in instance initializers regardless of declaration order, moving MAX_CACHE_SIZE before vectorCache improves readability and avoids confusion. This is a minor style/maintainability improvement.

Low
Guard against empty child list before access

Calling node.getChild().get(0) without checking if the child list is empty will
throw an IndexOutOfBoundsException if the cluster node has no child (e.g., when used
as the root node or in error scenarios). Add a null/empty check before accessing the
first child, consistent with how getChild() can return an empty list per the
Cluster.getChild() implementation.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987]

 @Override
 public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) {
-  String child = node.getChild().get(0).accept(this, context);
-  ...
+  List<UnresolvedPlan> children = node.getChild();
+  String child = children.isEmpty() ? "" : children.get(0).accept(this, context);
+  String sourceField = visitExpression(node.getSourceField());
+  StringBuilder command = new StringBuilder();
+  command.append(child).append(" | cluster ").append(sourceField);
+
+  command.append(" t=").append(node.getThreshold());
+
+  if (!"termlist".equals(node.getMatchMode())) {
+    command.append(" match=").append(node.getMatchMode());
+  }
+
+  command.append(" labelfield=").append(MASK_COLUMN);
+  command.append(" countfield=").append(MASK_COLUMN);
+
+  return command.toString();
 }
Suggestion importance[1-10]: 4

__

Why: Calling node.getChild().get(0) without checking for an empty list could throw an IndexOutOfBoundsException in edge cases. However, in normal PPL query processing, the cluster node always has a child, making this a low-probability defensive improvement.

Low
Suggestions up to commit 2d99d92
CategorySuggestion                                                                                                                                    Impact
Possible issue
Avoid mutable shared state in aggregate function instance

Mutating instance fields (this.threshold, this.matchMode, etc.) inside add() is not
thread-safe and can cause incorrect behavior when the same ClusterLabelAggFunction
instance is reused across concurrent queries or multiple partitions. These
configuration values should be stored in the Acc accumulator instead of on the
function instance, or passed directly through to partialMerge and value without side
effects on the function object.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84]

 public Acc add(
     Acc acc,
     String field,
     double threshold,
     String matchMode,
     String delims,
     int bufferLimit,
     int maxClusters) {
-  // Process all rows, even when field is null - convert null to empty string
   String processedField = (field != null) ? field : "";
 
-  this.threshold = threshold;
-  this.matchMode = matchMode;
-  this.delims = delims;
-  this.bufferLimit = bufferLimit;
-  this.maxClusters = maxClusters;
+  // Store config in accumulator rather than mutating function instance fields
+  acc.setConfig(threshold, matchMode, delims, bufferLimit, maxClusters);
+  acc.evaluate(processedField);
 
-  acc.evaluate(processedField);
-  ...
+  if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) {
+    acc.partialMerge(threshold, matchMode, delims, maxClusters);
+    acc.clearBuffer();
+  }
+
+  return acc;
 }
Suggestion importance[1-10]: 6

__

Why: Mutating instance fields like this.threshold, this.matchMode, etc. inside add() is a legitimate thread-safety concern when the same function instance is reused across concurrent queries. However, the improved_code references a setConfig method on Acc that doesn't exist in the PR, making the suggested implementation incomplete and not directly applicable.

Low
Handle all parameter counts in varargs dispatch method

The generic add(Acc, Object...) method only handles the case where values.length ==
1, but the actual invocation from visitCluster passes 4 parameters (field,
threshold, matchMode, delims). When Calcite dispatches through the generic varargs
method with 4 arguments, it will throw a SyntaxCheckException instead of routing to
the correct overload. Add a case for values.length == 4 to handle the standard
invocation path.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
   }
+  if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = ((Number) values[1]).doubleValue();
+    String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode;
+    String delims = (values[3] != null) ? values[3].toString() : this.delims;
+    return add(acc, field, threshold, matchMode, delims);
+  }
 
   throw new SyntaxCheckException(
       "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
-          + " required string), ...");
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
 }
Suggestion importance[1-10]: 5

__

Why: The concern is valid in theory, but Calcite's reflection-based dispatch typically routes to the most specific typed overload rather than the varargs method. The actual call path from visitCluster passes 4 typed parameters which would match the add(Acc, String, double, String, String) overload directly. The risk is real only if Calcite falls back to the generic varargs method, making this a moderate defensive improvement.

Low
General
Use LRU eviction policy for vector cache

The vectorCache is a HashMap (not a LinkedHashMap), so the eviction loop removes an
arbitrary half of the entries rather than the least-recently-used ones. More
importantly, the eviction check happens before inserting the new entry, so after
eviction the cache still holds MAX_CACHE_SIZE / 2 entries and the new entry is
added, which is correct, but using a plain HashMap means the eviction order is
unpredictable and may repeatedly evict recently-computed vectors. Consider using a
LinkedHashMap with access-order to implement a proper LRU eviction policy.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [87-102]

-private Map<CharSequence, Integer> vectorizeWithCache(String value) {
-  Map<CharSequence, Integer> cached = vectorCache.get(value);
-  if (cached != null) {
-    return cached;
-  }
-  if (vectorCache.size() > MAX_CACHE_SIZE) {
-    var it = vectorCache.keySet().iterator();
-    for (int i = 0; i < MAX_CACHE_SIZE / 2 && it.hasNext(); i++) {
-      it.next();
-      it.remove();
-    }
-  }
-  Map<CharSequence, Integer> result = vectorize(value);
-  vectorCache.put(value, result);
-  return result;
-}
+private final Map<String, Map<CharSequence, Integer>> vectorCache =
+    new java.util.LinkedHashMap<>(MAX_CACHE_SIZE, 0.75f, true) {
+      @Override
+      protected boolean removeEldestEntry(Map.Entry<String, Map<CharSequence, Integer>> eldest) {
+        return size() > MAX_CACHE_SIZE;
+      }
+    };
Suggestion importance[1-10]: 4

__

Why: The suggestion to use a LinkedHashMap with access-order for proper LRU eviction is a valid improvement over the current arbitrary eviction from a plain HashMap. However, this is a performance/quality optimization rather than a correctness issue, and the current approach still bounds memory usage correctly.

Low
Guard against empty child list before access

node.getChild().get(0) will throw an IndexOutOfBoundsException if the child list is
empty (e.g., when the Cluster node has not yet been attached to a child plan). The
getChild() method in Cluster returns an empty list when child is null. Add a
null/empty guard before accessing the first child.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-987]

 @Override
 public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) {
-  String child = node.getChild().get(0).accept(this, context);
-  ...
+  List<UnresolvedPlan> children = node.getChild();
+  if (children.isEmpty()) {
+    return "";
+  }
+  String child = children.get(0).accept(this, context);
+  String sourceField = visitExpression(node.getSourceField());
+  StringBuilder command = new StringBuilder();
+  command.append(child).append(" | cluster ").append(sourceField);
+
+  command.append(" t=").append(node.getThreshold());
+
+  if (!"termlist".equals(node.getMatchMode())) {
+    command.append(" match=").append(node.getMatchMode());
+  }
+
+  command.append(" labelfield=").append(MASK_COLUMN);
+  command.append(" countfield=").append(MASK_COLUMN);
+
+  return command.toString();
 }
Suggestion importance[1-10]: 3

__

Why: While the guard is technically correct, in practice visitCluster is only called on a fully-constructed AST where the Cluster node always has a child attached via the parser pipeline. The IndexOutOfBoundsException scenario is unlikely in normal usage, making this a low-impact defensive check.

Low
Suggestions up to commit e009b93
CategorySuggestion                                                                                                                                    Impact
Possible issue
Handle multi-argument varargs dispatch correctly

The generic add(Acc, Object...) method only handles the single-argument case and
throws for any other number of arguments. However, Calcite may call this varargs
method with multiple arguments (e.g., field + threshold + matchMode + delims) when
the typed overloads are not resolved at runtime. This would cause a
SyntaxCheckException at query execution time instead of routing to the correct typed
overload.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
+  } else if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = ((Number) values[1]).doubleValue();
+    String matchMode = (values[2] != null) ? values[2].toString() : this.matchMode;
+    String delims = (values[3] != null) ? values[3].toString() : this.delims;
+    return add(acc, field, threshold, matchMode, delims);
   }
 
   throw new SyntaxCheckException(
-      "Unsupported function signature for cluster aggregate. ...
+      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern that Calcite may invoke the varargs add method with multiple arguments at runtime, bypassing typed overloads. However, the PR already passes 4 parameters (field, threshold, matchMode, delims) via the visitCluster method, so this could be a real runtime issue worth addressing.

Low
Fix off-by-one in cache eviction threshold

The cache eviction removes arbitrary entries since HashMap has no defined iteration
order, which may not evict the least-recently-used entries. More critically, the
eviction check happens before inserting the new entry, so the cache can grow to
MAX_CACHE_SIZE + 1 entries before eviction triggers. Consider checking >=
MAX_CACHE_SIZE instead of > MAX_CACHE_SIZE.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [92-98]

-if (vectorCache.size() > MAX_CACHE_SIZE) {
+if (vectorCache.size() >= MAX_CACHE_SIZE) {
   var it = vectorCache.keySet().iterator();
   for (int i = 0; i < MAX_CACHE_SIZE / 2 && it.hasNext(); i++) {
     it.next();
     it.remove();
   }
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies that using > instead of >= allows the cache to grow to MAX_CACHE_SIZE + 1 before eviction. This is a minor off-by-one issue that doesn't cause functional problems but is worth fixing for correctness.

Low
General
Avoid mutable instance state in aggregate function

Mutating instance fields (threshold, matchMode, etc.) inside add() is not
thread-safe and can cause incorrect behavior if the same ClusterLabelAggFunction
instance is reused across concurrent queries. These configuration values should be
set once at construction time or passed through the Acc object rather than stored as
mutable instance state that is overwritten on every row.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [70-74]

-this.threshold = threshold;
-this.matchMode = matchMode;
-this.delims = delims;
-this.bufferLimit = bufferLimit;
-this.maxClusters = maxClusters;
+// Pass configuration through Acc or set at construction; do not mutate instance fields per-row.
+// Remove the instance field mutations here and rely on the values passed as parameters.
Suggestion importance[1-10]: 5

__

Why: Mutating instance fields like threshold, matchMode, etc. on every call to add() is a legitimate thread-safety concern. The improved_code is just a comment and doesn't provide a concrete fix, but the issue itself is valid for concurrent query scenarios.

Low
Preserve all parameters in anonymized output

The anonymizer always emits labelfield and countfield regardless of whether
showcount is enabled or whether these fields differ from their defaults. This
produces anonymized output that doesn't accurately reflect the original query
structure. Additionally, labelonly and showcount boolean parameters are silently
dropped, which may cause the anonymized query to be semantically different from the
original.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [977-986]

 command.append(" t=").append(node.getThreshold());
 
 if (!"termlist".equals(node.getMatchMode())) {
   command.append(" match=").append(node.getMatchMode());
 }
 
 command.append(" labelfield=").append(MASK_COLUMN);
 command.append(" countfield=").append(MASK_COLUMN);
 
+if (node.isLabelOnly()) {
+  command.append(" labelonly=true");
+}
+if (node.isShowCount()) {
+  command.append(" showcount=true");
+}
+
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly notes that labelonly and showcount are silently dropped in the anonymized output, making it semantically different from the original query. However, this is a minor issue in an anonymizer utility and doesn't affect query execution correctness.

Low
Suggestions up to commit 5792e98
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix unsafe cache eviction inside computeIfAbsent

The cache eviction logic inside computeIfAbsent is unsafe and ineffective. Modifying
the ConcurrentHashMap while inside computeIfAbsent can cause a
ConcurrentModificationException or deadlock (as documented in Java). The eviction
check should be performed outside the computeIfAbsent call, and the parallel stream
removal is also unreliable for a ConcurrentHashMap.

common/src/main/java/org/opensearch/sql/common/cluster/TextSimilarityClustering.java [88-99]

 private Map<CharSequence, Integer> vectorizeWithCache(String value) {
-  return vectorCache.computeIfAbsent(
-      value,
-      k -> {
-        if (vectorCache.size() > MAX_CACHE_SIZE) {
-          vectorCache.keySet().parallelStream()
-              .limit(MAX_CACHE_SIZE / 2)
-              .forEach(vectorCache::remove);
-        }
-        return vectorize(k);
-      });
+  if (vectorCache.size() > MAX_CACHE_SIZE) {
+    vectorCache.keySet().stream()
+        .limit(MAX_CACHE_SIZE / 2)
+        .forEach(vectorCache::remove);
+  }
+  return vectorCache.computeIfAbsent(value, this::vectorize);
 }
Suggestion importance[1-10]: 8

__

Why: Modifying a ConcurrentHashMap inside computeIfAbsent is explicitly documented as unsafe in Java and can cause deadlocks or infinite loops. Moving the eviction check outside the lambda is a critical correctness fix.

Medium
Handle all parameter counts in varargs add method

The generic add(Acc acc, Object... values) method only handles the case where
values.length == 1, but Calcite may call this method with the full set of parameters
(field, threshold, matchMode, delims). This means calls with more than one argument
will always throw a SyntaxCheckException instead of being dispatched correctly,
breaking the function when all parameters are provided via the varargs path.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [42-56]

 @Override
 public Acc add(Acc acc, Object... values) {
-  // Handle case where Calcite calls generic method with null field value
   if (values.length == 1) {
     String field = (values[0] != null) ? values[0].toString() : null;
     return add(acc, field);
+  } else if (values.length == 4) {
+    String field = (values[0] != null) ? values[0].toString() : null;
+    double threshold = ((Number) values[1]).doubleValue();
+    String matchMode = (String) values[2];
+    String delims = (String) values[3];
+    return add(acc, field, threshold, matchMode, delims);
   }
 
   throw new SyntaxCheckException(
-      "Unsupported function signature for cluster aggregate. ...
+      "Unsupported function signature for cluster aggregate. Valid parameters include (field:"
+          + " required string), (t: optional double threshold 0.0-1.0, default 0.8), (match:"
+          + " optional string algorithm 'termlist'|'termset'|'ngramset', default 'termlist'),"
+          + " (delims: optional string delimiters, default ' '), (labelfield: optional string"
+          + " output field name, default 'cluster_label'), (countfield: optional string count"
+          + " field name, default 'cluster_count')");
+}
Suggestion importance[1-10]: 5

__

Why: The suggestion is valid in principle, but the actual call path from Calcite uses the typed overloads rather than the varargs method, so this is a defensive improvement rather than a critical bug fix. The improved code adds handling for 4-argument calls which may not be needed in practice.

Low
General
Remove unsafe mutable instance state in add method

Mutating instance fields (this.threshold, this.matchMode, etc.) inside the add
method is not thread-safe and can cause incorrect behavior when the aggregate
function is used concurrently. These configuration values should be passed through
the Acc accumulator or stored immutably, rather than being set as mutable instance
state during accumulation.

core/src/main/java/org/opensearch/sql/calcite/udf/udaf/ClusterLabelAggFunction.java [58-84]

 public Acc add(
     Acc acc,
     String field,
     double threshold,
     String matchMode,
     String delims,
     int bufferLimit,
     int maxClusters) {
-  ...
-  this.threshold = threshold;
-  this.matchMode = matchMode;
-  this.delims = delims;
-  this.bufferLimit = bufferLimit;
-  this.maxClusters = maxClusters;
-  ...
+  String processedField = (field != null) ? field : "";
 
+  acc.evaluate(processedField);
+
+  if (bufferLimit > 0 && acc.bufferSize() == bufferLimit) {
+    acc.partialMerge(threshold, matchMode, delims, maxClusters);
+    acc.clearBuffer();
+  }
+
+  return acc;
+}
+
Suggestion importance[1-10]: 6

__

Why: Mutating instance fields like this.threshold and this.matchMode inside add is not thread-safe and could cause incorrect behavior under concurrent use. However, the result method relies on these instance fields, so the fix requires also updating result to pass parameters differently, making the improved code incomplete as shown.

Low
Guard against empty child list access

Calling node.getChild().get(0) without checking if the child list is empty will
throw an IndexOutOfBoundsException if the node has no child (e.g., when the Cluster
node is the root or is not yet attached). A null/empty check should be added before
accessing the first child.

ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java [971-972]

 @Override
 public String visitCluster(org.opensearch.sql.ast.tree.Cluster node, String context) {
-  String child = node.getChild().get(0).accept(this, context);
-  ...
+  List<UnresolvedPlan> children = node.getChild();
+  if (children.isEmpty()) {
+    return "";
+  }
+  String child = children.get(0).accept(this, context);
+  String sourceField = visitExpression(node.getSourceField());
+  StringBuilder command = new StringBuilder();
+  command.append(child).append(" | cluster ").append(sourceField);
 
+  command.append(" t=").append(node.getThreshold());
+
+  if (!"termlist".equals(node.getMatchMode())) {
+    command.append(" match=").append(node.getMatchMode());
+  }
+
+  command.append(" labelfield=").append(MASK_COLUMN);
+  command.append(" countfield=").append(MASK_COLUMN);
+
+  return command.toString();
+}
+
Suggestion importance[1-10]: 4

__

Why: The Cluster node in normal usage always has a child attached via attach(), so this is a defensive check for an edge case. The suggestion is valid but has low practical impact since the anonymizer is only called on fully-constructed ASTs.

Low

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 152d2d4

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 74001d0

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Failed to generate code suggestions for PR

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5792e98

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit e009b93

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2d99d92

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 1763fb8

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 3fbd709

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 5bfaf69

@ritvibhatt ritvibhatt marked this pull request as ready for review March 31, 2026 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] PPL cluster Command

1 participant