Skip to content

Adding a default range to events that don't have the range field on searching/filtering/slicing operations#25577

Open
janheise wants to merge 11 commits intomasterfrom
events/fix-range-filter
Open

Adding a default range to events that don't have the range field on searching/filtering/slicing operations#25577
janheise wants to merge 11 commits intomasterfrom
events/fix-range-filter

Conversation

@janheise
Copy link
Copy Markdown
Contributor

@janheise janheise commented Apr 8, 2026

Description

Adding a default range to events that don't have the range field on searching/filtering/slicing operations.
Also fixes the correct inclusion of missing elements into asc/desc sorting.

/nocl because it's in unreleased functionality

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have requested a documentation update.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@janheise janheise requested a review from luk-kaminski April 13, 2026 13:50
@janheise janheise marked this pull request as ready for review April 13, 2026 13:56
@luk-kaminski luk-kaminski requested a review from Copilot April 13, 2026 14:14
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds logic intended to treat missing range fields (notably scores.normalized_risk) as having a default lower bound during event search/filter/slice operations, and adjusts sorting behavior for missing values.

Changes:

  • Introduces a helper to detect >=0 range filter values.
  • Adds an event-field constant for scores.normalized_risk and updates ES/OS adapters to include documents missing that field in certain range scenarios.
  • Adjusts missing-value handling for asc/desc sorting and refactors slice aggregation helpers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
graylog2-server/src/main/java/org/graylog/events/search/MoreSearchAdapter.java Adds helper to detect a “lower bound is 0” range filter value.
graylog2-server/src/main/java/org/graylog/events/event/EventDto.java Adds constant for scores.normalized_risk field path used in queries.
graylog-storage-opensearch3/src/main/java/org/graylog/storage/opensearch3/MoreSearchAdapterOS.java Alters query construction and range slicing to (attempt to) include docs missing normalized risk; fixes missing sort placement logic.
graylog-storage-opensearch2/src/main/java/org/graylog/storage/opensearch2/MoreSearchAdapterOS2.java Alters query construction and range slicing to (attempt to) include docs missing normalized risk.
graylog-storage-elasticsearch7/src/main/java/org/graylog/storage/elasticsearch7/MoreSearchAdapterES7.java Alters query construction and range slicing to (attempt to) include docs missing normalized risk.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
return Query.of(a -> a.bool(b -> b.should(Query.of(c -> c.bool(boolQuery.build())),
Query.of(d -> d.bool(inner -> inner.mustNot(mn -> mn.exists(e -> e.field(EventDto.NORMALIZED_RISK_PATH))))))
.minimumShouldMatch("0")));
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new top-level bool query uses only should(...) clauses and explicitly sets minimumShouldMatch("0"). With minimumShouldMatch=0, all should clauses become optional and the query can match all documents, effectively bypassing the existing filters. To keep filtering semantics, require at least one should clause (minimumShouldMatch("1")) or avoid setting it and rely on the default behavior.

Suggested change
.minimumShouldMatch("0")));
.minimumShouldMatch("1")));

Copilot uses AI. Check for mistakes.
Comment on lines +211 to +213
return Query.of(a -> a.bool(b -> b.should(Query.of(c -> c.bool(boolQuery.build())),
Query.of(d -> d.bool(inner -> inner.mustNot(mn -> mn.exists(e -> e.field(EventDto.NORMALIZED_RISK_PATH))))))
.minimumShouldMatch("0")));
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback should branch that matches mustNot exists(scores.normalized_risk) is not constrained by the base filters (query string, timerange, stream filters, etc.), so it can include unrelated documents that only happen to be missing the field. This should be implemented as (baseFilters) AND (normalized_risk_range OR missing(normalized_risk)), not (baseFilters) OR missing(normalized_risk).

Copilot uses AI. Check for mistakes.
}
return Query.of(b -> b.bool(boolQuery.build()));

if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition checks only the values of extraFilters, so any filter value like ">=0" on an unrelated field will trigger the normalized_risk-missing fallback and broaden results unexpectedly. Gate this behavior on the field key as well (e.g., only when the extra filter key equals EventDto.NORMALIZED_RISK_PATH).

Suggested change
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
final boolean includeDefaultForMissingNormalizedRisk = extraFilters.entrySet().stream()
.filter(entry -> EventDto.NORMALIZED_RISK_PATH.equals(entry.getKey()))
.map(Map.Entry::getValue)
.flatMap(Collection::stream)
.anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0);
if (isRangeQueryIncludeDefaultForMissingField || includeDefaultForMissingNormalizedRisk) {

Copilot uses AI. Check for mistakes.
Comment on lines +371 to +373
if(from == null || from == 0d) {
filter.set(createRangeQueryIncludeDefaultForMissingField(queryString, timerange, eventStreams, filterString, forbiddenSourceStreams, extraFilters));
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range slicing: toggling the query when from is null/0 targets scores.normalized_risk unconditionally, so slicing any other numeric field with a 0 lower bound will incorrectly broaden the result set. Also, including missing-field docs in the query won’t make them appear in range buckets; to treat missing values as 0, set the range aggregation’s missing value to 0 (where supported) and only do this when slicingColumn is EventDto.NORMALIZED_RISK_PATH.

Copilot uses AI. Check for mistakes.
Comment on lines +276 to +279
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0");
} else {
return filter;
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the query to (baseFilter) OR (NOT exists(scores.normalized_risk)) and sets minimumShouldMatch("0"), which can make the should clauses optional and match all documents. Even without that, the NOT exists branch is not constrained by the base filters (streams/timerange/query). This should be structured as (baseFilters) AND (normalized_risk_range OR missing(normalized_risk)), with at least one should required when using should.

Copilot uses AI. Check for mistakes.
}

return filter;
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback is triggered by any extra filter value matching ">=0"/">=0.0" regardless of field. This can broaden results unexpectedly for unrelated fields. Gate this on the filter key (only apply to EventDto.NORMALIZED_RISK_PATH).

Suggested change
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
final boolean normalizedRiskHasLowerBoundZeroRange = extraFilters
.getOrDefault(EventDto.NORMALIZED_RISK_PATH, Collections.emptySet())
.stream()
.anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0);
if (isRangeQueryIncludeDefaultForMissingField || normalizedRiskHasLowerBoundZeroRange) {

Copilot uses AI. Check for mistakes.
Comment on lines +370 to +372
if(from == null || from == 0d) {
filter.set(createRangeQueryIncludeDefaultForMissingField(queryString, timerange, eventStreams, filterString, forbiddenSourceStreams, extraFilters));
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range slicing: toggling the query when from is null/0 targets scores.normalized_risk unconditionally, so slicing any other numeric field with a 0 lower bound will incorrectly broaden results. Also, adding missing-field docs to the query won’t bucket them in a range aggregation; if the intent is to treat missing values as 0, set the range aggregation’s missing value to 0 (where supported) and only do this when slicingColumn is EventDto.NORMALIZED_RISK_PATH.

Copilot uses AI. Check for mistakes.
Comment on lines +276 to +277
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0");
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior in this file is not covered by the existing unit/integration tests: there are tests for extraFilters and sorting, but none that assert how events missing scores.normalized_risk should be handled when applying a ">=0" normalized_risk filter or when slicing by normalized_risk ranges. Adding an IT using the existing fixture to cover these cases would prevent regressions across ES7/OS2/OS3 adapters.

Suggested change
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0");
final boolean includeMissingNormalizedRisk = isRangeQueryIncludeDefaultForMissingField
|| extraFilters.entrySet().stream()
.filter(entry -> EventDto.NORMALIZED_RISK_PATH.equals(entry.getKey()))
.flatMap(entry -> entry.getValue().stream())
.anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0);
if (includeMissingNormalizedRisk) {
return boolQuery()
.should(filter)
.should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH)))
.minimumShouldMatch(1);

Copilot uses AI. Check for mistakes.
}

return filter;
if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd consider letting the client code decide on each "extra filter" level if to include empty values or not, rather than polluting this general class with:

  • reference to certain, specific field (EventDto.NORMALIZED_RISK_PATH),
  • difficult to understand var (isRangeQueryIncludeDefaultForMissingField),
  • difficult to understand default method (isRangeValueLowerBoundsIs0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants