Adding a default range to events that don't have the range field on searching/filtering/slicing operations#25577
Adding a default range to events that don't have the range field on searching/filtering/slicing operations#25577
Conversation
…ents/fix-range-filter
There was a problem hiding this comment.
Pull request overview
Adds logic intended to treat missing range fields (notably scores.normalized_risk) as having a default lower bound during event search/filter/slice operations, and adjusts sorting behavior for missing values.
Changes:
- Introduces a helper to detect
>=0range filter values. - Adds an event-field constant for
scores.normalized_riskand updates ES/OS adapters to include documents missing that field in certain range scenarios. - Adjusts missing-value handling for asc/desc sorting and refactors slice aggregation helpers.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
graylog2-server/src/main/java/org/graylog/events/search/MoreSearchAdapter.java |
Adds helper to detect a “lower bound is 0” range filter value. |
graylog2-server/src/main/java/org/graylog/events/event/EventDto.java |
Adds constant for scores.normalized_risk field path used in queries. |
graylog-storage-opensearch3/src/main/java/org/graylog/storage/opensearch3/MoreSearchAdapterOS.java |
Alters query construction and range slicing to (attempt to) include docs missing normalized risk; fixes missing sort placement logic. |
graylog-storage-opensearch2/src/main/java/org/graylog/storage/opensearch2/MoreSearchAdapterOS2.java |
Alters query construction and range slicing to (attempt to) include docs missing normalized risk. |
graylog-storage-elasticsearch7/src/main/java/org/graylog/storage/elasticsearch7/MoreSearchAdapterES7.java |
Alters query construction and range slicing to (attempt to) include docs missing normalized risk. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
graylog2-server/src/main/java/org/graylog/events/search/MoreSearchAdapter.java
Show resolved
Hide resolved
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | ||
| return Query.of(a -> a.bool(b -> b.should(Query.of(c -> c.bool(boolQuery.build())), | ||
| Query.of(d -> d.bool(inner -> inner.mustNot(mn -> mn.exists(e -> e.field(EventDto.NORMALIZED_RISK_PATH)))))) | ||
| .minimumShouldMatch("0"))); |
There was a problem hiding this comment.
The new top-level bool query uses only should(...) clauses and explicitly sets minimumShouldMatch("0"). With minimumShouldMatch=0, all should clauses become optional and the query can match all documents, effectively bypassing the existing filters. To keep filtering semantics, require at least one should clause (minimumShouldMatch("1")) or avoid setting it and rely on the default behavior.
| .minimumShouldMatch("0"))); | |
| .minimumShouldMatch("1"))); |
| return Query.of(a -> a.bool(b -> b.should(Query.of(c -> c.bool(boolQuery.build())), | ||
| Query.of(d -> d.bool(inner -> inner.mustNot(mn -> mn.exists(e -> e.field(EventDto.NORMALIZED_RISK_PATH)))))) | ||
| .minimumShouldMatch("0"))); |
There was a problem hiding this comment.
The fallback should branch that matches mustNot exists(scores.normalized_risk) is not constrained by the base filters (query string, timerange, stream filters, etc.), so it can include unrelated documents that only happen to be missing the field. This should be implemented as (baseFilters) AND (normalized_risk_range OR missing(normalized_risk)), not (baseFilters) OR missing(normalized_risk).
| } | ||
| return Query.of(b -> b.bool(boolQuery.build())); | ||
|
|
||
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { |
There was a problem hiding this comment.
This condition checks only the values of extraFilters, so any filter value like ">=0" on an unrelated field will trigger the normalized_risk-missing fallback and broaden results unexpectedly. Gate this behavior on the field key as well (e.g., only when the extra filter key equals EventDto.NORMALIZED_RISK_PATH).
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | |
| final boolean includeDefaultForMissingNormalizedRisk = extraFilters.entrySet().stream() | |
| .filter(entry -> EventDto.NORMALIZED_RISK_PATH.equals(entry.getKey())) | |
| .map(Map.Entry::getValue) | |
| .flatMap(Collection::stream) | |
| .anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0); | |
| if (isRangeQueryIncludeDefaultForMissingField || includeDefaultForMissingNormalizedRisk) { |
| if(from == null || from == 0d) { | ||
| filter.set(createRangeQueryIncludeDefaultForMissingField(queryString, timerange, eventStreams, filterString, forbiddenSourceStreams, extraFilters)); | ||
| } |
There was a problem hiding this comment.
Range slicing: toggling the query when from is null/0 targets scores.normalized_risk unconditionally, so slicing any other numeric field with a 0 lower bound will incorrectly broaden the result set. Also, including missing-field docs in the query won’t make them appear in range buckets; to treat missing values as 0, set the range aggregation’s missing value to 0 (where supported) and only do this when slicingColumn is EventDto.NORMALIZED_RISK_PATH.
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | ||
| return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0"); | ||
| } else { | ||
| return filter; |
There was a problem hiding this comment.
This changes the query to (baseFilter) OR (NOT exists(scores.normalized_risk)) and sets minimumShouldMatch("0"), which can make the should clauses optional and match all documents. Even without that, the NOT exists branch is not constrained by the base filters (streams/timerange/query). This should be structured as (baseFilters) AND (normalized_risk_range OR missing(normalized_risk)), with at least one should required when using should.
| } | ||
|
|
||
| return filter; | ||
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { |
There was a problem hiding this comment.
The fallback is triggered by any extra filter value matching ">=0"/">=0.0" regardless of field. This can broaden results unexpectedly for unrelated fields. Gate this on the filter key (only apply to EventDto.NORMALIZED_RISK_PATH).
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | |
| final boolean normalizedRiskHasLowerBoundZeroRange = extraFilters | |
| .getOrDefault(EventDto.NORMALIZED_RISK_PATH, Collections.emptySet()) | |
| .stream() | |
| .anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0); | |
| if (isRangeQueryIncludeDefaultForMissingField || normalizedRiskHasLowerBoundZeroRange) { |
| if(from == null || from == 0d) { | ||
| filter.set(createRangeQueryIncludeDefaultForMissingField(queryString, timerange, eventStreams, filterString, forbiddenSourceStreams, extraFilters)); | ||
| } |
There was a problem hiding this comment.
Range slicing: toggling the query when from is null/0 targets scores.normalized_risk unconditionally, so slicing any other numeric field with a 0 lower bound will incorrectly broaden results. Also, adding missing-field docs to the query won’t bucket them in a range aggregation; if the intent is to treat missing values as 0, set the range aggregation’s missing value to 0 (where supported) and only do this when slicingColumn is EventDto.NORMALIZED_RISK_PATH.
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | ||
| return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0"); |
There was a problem hiding this comment.
New behavior in this file is not covered by the existing unit/integration tests: there are tests for extraFilters and sorting, but none that assert how events missing scores.normalized_risk should be handled when applying a ">=0" normalized_risk filter or when slicing by normalized_risk ranges. Adding an IT using the existing fixture to cover these cases would prevent regressions across ES7/OS2/OS3 adapters.
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { | |
| return boolQuery().should(filter).should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))).minimumShouldMatch("0"); | |
| final boolean includeMissingNormalizedRisk = isRangeQueryIncludeDefaultForMissingField | |
| || extraFilters.entrySet().stream() | |
| .filter(entry -> EventDto.NORMALIZED_RISK_PATH.equals(entry.getKey())) | |
| .flatMap(entry -> entry.getValue().stream()) | |
| .anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0); | |
| if (includeMissingNormalizedRisk) { | |
| return boolQuery() | |
| .should(filter) | |
| .should(boolQuery().mustNot(existsQuery(EventDto.NORMALIZED_RISK_PATH))) | |
| .minimumShouldMatch(1); |
| } | ||
|
|
||
| return filter; | ||
| if(isRangeQueryIncludeDefaultForMissingField || extraFilters.values().stream().flatMap(Collection::stream).anyMatch(MoreSearchAdapter::isRangeValueLowerBoundsIs0)) { |
There was a problem hiding this comment.
I'd consider letting the client code decide on each "extra filter" level if to include empty values or not, rather than polluting this general class with:
- reference to certain, specific field (EventDto.NORMALIZED_RISK_PATH),
- difficult to understand var (isRangeQueryIncludeDefaultForMissingField),
- difficult to understand default method (isRangeValueLowerBoundsIs0)
Description
Adding a default range to events that don't have the range field on searching/filtering/slicing operations.
Also fixes the correct inclusion of missing elements into asc/desc sorting.
/nocl because it's in unreleased functionality
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: