(feat) Add aggregations framework with analytics and bucket aggregations #2248

ajroetker · 2025-11-18T04:31:16Z

This PR combines the aggregations framework and additional aggregation types into a unified implementation.

Builds on discussions in #2243 and ports #2242.

Aggregations Framework

Enable analytics and data exploration that go beyond simple faceting. Users can now compute metrics (sum, avg, min, max, count, sumsquares, stats) over search results and group them by field values or ranges, nesting sub-aggregations for multi-dimensional analysis.

Problems addressed:

Computing statistics across filtered result sets (e.g., "average price of products matching 'laptop'")
Multi-level grouping and metrics (e.g., "total sales per region per category")
Complex analytics queries without requiring separate aggregation passes

Notes:

Metric aggregations: sum, avg, min, max, count, sumsquares, stats
Bucket aggregations: terms (group by values), range (group by ranges)
Nested sub-aggregations for multi-dimensional analytics
Computed during query execution via the visitor pattern, without a separate aggregation pass
Fully backward compatible - Facets API unchanged
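The sketch below illustrates the single-pass idea in Go. It is a terms bucket with an `avg` sub-aggregation that is fed while each document is visited. It is not bleve's API:

- `Doc`, `Bucket` and `termsWithAvg` are hypothetical names.
- Documents are assumed to be already-flattened (brand, price) pairs.
- Buckets are ordered by count, then key, so output is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Doc is a hypothetical flattened search hit: one keyword field, one numeric field.
type Doc struct {
	Brand string
	Price float64
}

// Bucket is one terms bucket plus the running state of its "avg" sub-aggregation.
type Bucket struct {
	Key      string
	DocCount int
	sum      float64 // keep sum (not the finished average) so results stay mergeable
}

// Avg returns the bucket's average price, or 0 for an empty bucket.
func (b *Bucket) Avg() float64 {
	if b.DocCount == 0 {
		return 0
	}
	return b.sum / float64(b.DocCount)
}

// termsWithAvg groups docs by Brand in one pass and keeps the top `size`
// buckets, ordered by count desc then key asc for deterministic output.
func termsWithAvg(docs []Doc, size int) []*Bucket {
	byKey := map[string]*Bucket{}
	for _, d := range docs {
		b, ok := byKey[d.Brand]
		if !ok {
			b = &Bucket{Key: d.Brand}
			byKey[d.Brand] = b
		}
		b.DocCount++
		b.sum += d.Price // the sub-aggregation is updated during the same visit
	}
	out := make([]*Bucket, 0, len(byKey))
	for _, b := range byKey {
		out = append(out, b)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].DocCount != out[j].DocCount {
			return out[i].DocCount > out[j].DocCount
		}
		return out[i].Key < out[j].Key
	})
	if len(out) > size {
		out = out[:size]
	}
	return out
}

func main() {
	docs := []Doc{{"acme", 10}, {"acme", 20}, {"zeta", 5}}
	for _, b := range termsWithAvg(docs, 10) {
		fmt.Printf("%s count=%d avg_price=%.2f\n", b.Key, b.DocCount, b.Avg())
	}
}
```

Each bucket stores `sum` rather than the finished average, which is what lets the result be merged across shards later.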

Prefix and Regex Filtering for Terms Aggregations

(Port of #2242)

Enable search-as-you-type style aggregations where bucket terms dynamically match user input. Users can now aggregate by field values that match what is being typed in a search box, making autosuggestions cleaner and more focused (e.g., as the user types "ste", show matching authors, titles, and categories, each filtered to terms starting with "ste").

Problems addressed:

Dynamic faceted autosuggestions that update as users type
Filtering high-cardinality fields to relevant matches only
Consistent filtering API between facets and aggregations

Notes:

Add TermPrefix and TermPattern fields to AggregationRequest
Pre-compile regex patterns in NewTermsAggregation (which now returns an error)
Add NewTermsAggregationWithFilter helper
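Here is a minimal sketch of the filtering order this feature describes:

1. The regex is compiled once, up front, so a bad pattern surfaces as an error before any documents are visited.
2. A cheap `bytes.HasPrefix` check runs first.
3. The regex runs only on terms that pass the prefix check.

`termFilter`, `newTermFilter` and `filterTerms` are hypothetical helpers, not the actual `TermPrefix`/`TermPattern` implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// termFilter is a hypothetical stand-in for TermPrefix/TermPattern settings.
type termFilter struct {
	prefix []byte
	re     *regexp.Regexp // nil means "no pattern"
}

// newTermFilter compiles the pattern once so an invalid regex is reported early.
func newTermFilter(prefix, pattern string) (*termFilter, error) {
	f := &termFilter{prefix: []byte(prefix)}
	if pattern != "" {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
		f.re = re
	}
	return f, nil
}

// accept runs the cheap prefix check before the regex and works on the raw
// []byte term, so rejected terms are never converted to strings.
func (f *termFilter) accept(term []byte) bool {
	if len(f.prefix) > 0 && !bytes.HasPrefix(term, f.prefix) {
		return false
	}
	return f.re == nil || f.re.Match(term)
}

// filterTerms converts only the accepted terms to strings.
func filterTerms(f *termFilter, terms [][]byte) []string {
	var out []string
	for _, t := range terms {
		if f.accept(t) {
			out = append(out, string(t))
		}
	}
	return out
}

func main() {
	terms := [][]byte{[]byte("steak"), []byte("stem"), []byte("apple"), []byte("stone")}
	f, err := newTermFilter("ste", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(filterTerms(f, terms)) // [steak stem]
}
```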

Additional Aggregation Types

Cardinality aggregation:

Unique value counting using HyperLogLog++
Configurable precision (10-18) with ~1% standard error at default (14)
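For reference, the textbook HyperLogLog relative standard error is 1.04/√m, where m = 2^precision registers. At precision 14 that gives about 0.81%, consistent with the ~1% figure above.

The sketch below only evaluates that formula across the configurable range. It does not call github.com/axiomhq/hyperloglog, whose observed error may differ somewhat, especially at small cardinalities where HLL++ applies bias corrections.

```go
package main

import (
	"fmt"
	"math"
)

// hllStdError returns the textbook HyperLogLog relative standard error,
// 1.04/sqrt(m), for m = 2^precision registers. It ignores HLL++'s
// small-range corrections, so treat it as an approximation.
func hllStdError(precision int) float64 {
	m := math.Exp2(float64(precision))
	return 1.04 / math.Sqrt(m)
}

func main() {
	// Sweep the configurable precision range (10-18).
	for p := 10; p <= 18; p++ {
		fmt.Printf("precision=%2d registers=%6d stderr~%.2f%%\n", p, 1<<p, 100*hllStdError(p))
	}
}
```

Each extra bit of precision doubles the register count but only shrinks the error by a factor of √2.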

Bucket aggregations:

histogram: Fixed-interval numeric buckets with minDocCount filtering
date_histogram: minute/hour/day/week/month/quarter/year time intervals
geohash_grid: Geo point clustering by geohash cells (precision 1-12)
geo_distance: Distance range buckets from a center point

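The two geo aggregations can be sketched in a similar way:

- **geohash_grid:** a standard geohash encoder assigns cells. It interleaves longitude and latitude bisection bits and emits one base-32 character per 5 bits.
- **geo_distance:** a haversine distance assigns points to distance ranges.

Several parts of this sketch are assumptions, not the PR's implementation: the helper names, the [From, To) kilometre ranges, and the spherical-Earth radius of 6371 km.

```go
package main

import (
	"fmt"
	"math"
)

const geohashBase32 = "0123456789bcdefghjkmnpqrstuvwxyz"

// geohashEncode bisects longitude and latitude alternately (longitude first),
// packing one bit per step and emitting a base-32 character every 5 bits.
func geohashEncode(lat, lon float64, precision int) string {
	latLo, latHi := -90.0, 90.0
	lonLo, lonHi := -180.0, 180.0
	out := make([]byte, 0, precision)
	bit, ch, even := 0, 0, true
	for len(out) < precision {
		if even { // longitude bit
			mid := (lonLo + lonHi) / 2
			if lon >= mid {
				ch = ch<<1 | 1
				lonLo = mid
			} else {
				ch <<= 1
				lonHi = mid
			}
		} else { // latitude bit
			mid := (latLo + latHi) / 2
			if lat >= mid {
				ch = ch<<1 | 1
				latLo = mid
			} else {
				ch <<= 1
				latHi = mid
			}
		}
		even = !even
		bit++
		if bit == 5 {
			out = append(out, geohashBase32[ch])
			bit, ch = 0, 0
		}
	}
	return string(out)
}

// haversineKm is the great-circle distance on a sphere of radius 6371 km.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const r = 6371.0
	toRad := math.Pi / 180
	dLat := (lat2 - lat1) * toRad
	dLon := (lon2 - lon1) * toRad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*toRad)*math.Cos(lat2*toRad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * r * math.Asin(math.Sqrt(a))
}

// distanceRange is a hypothetical [From, To) bucket in kilometres.
type distanceRange struct{ From, To float64 }

// distanceBuckets returns every range (they may overlap) containing the
// point's distance from the centre.
func distanceBuckets(ranges []distanceRange, cLat, cLon, lat, lon float64) []int {
	d := haversineKm(cLat, cLon, lat, lon)
	var hits []int
	for i, r := range ranges {
		if d >= r.From && d < r.To {
			hits = append(hits, i)
		}
	}
	return hits
}

func main() {
	fmt.Println(geohashEncode(42.6, -5.6, 5)) // ezs42
	ranges := []distanceRange{{0, 100}, {100, 500}, {500, math.Inf(1)}}
	fmt.Println(distanceBuckets(ranges, 0, 0, 0, 1)) // [1] (about 111 km away)
}
```

A coarser geohash_grid precision simply means fewer characters, so every cell at precision n is a prefix of its cells at precision n+1.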
Significant terms aggregation:

Identifies terms that are unusually common in the results compared with the entire index
Four algorithms: JLH, Mutual Information, Chi-Squared, Percentage
Two-phase architecture using pre-search infrastructure for background stats
Configurable size, minDocCount, and scoring algorithm
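The sketch below illustrates two of the scoring heuristics, JLH and Percentage, written the way Elasticsearch documents them. bleve's implementation may handle edge cases differently. Mutual Information and Chi-Squared are omitted.

`termStats` is a hypothetical container for two sets of counts: foreground counts gathered during the search, and background counts supplied by the pre-search phase.

```go
package main

import "fmt"

// termStats holds hypothetical foreground (result set) and background (whole
// index) document counts for one term.
type termStats struct {
	FgDocs, FgTotal int64 // result docs with the term / total result docs
	BgDocs, BgTotal int64 // index docs with the term / total index docs
}

// jlh multiplies the absolute change in probability by the relative change.
// Terms that are not more frequent in the foreground score 0.
func jlh(s termStats) float64 {
	if s.FgTotal == 0 || s.BgTotal == 0 || s.BgDocs == 0 {
		return 0
	}
	fg := float64(s.FgDocs) / float64(s.FgTotal)
	bg := float64(s.BgDocs) / float64(s.BgTotal)
	if fg <= bg {
		return 0
	}
	return (fg - bg) * (fg / bg)
}

// percentage is the share of background docs with the term that are also in
// the results.
func percentage(s termStats) float64 {
	if s.BgDocs == 0 {
		return 0
	}
	return float64(s.FgDocs) / float64(s.BgDocs)
}

func main() {
	s := termStats{FgDocs: 10, FgTotal: 100, BgDocs: 100, BgTotal: 10000}
	fmt.Printf("jlh=%.3f percentage=%.3f\n", jlh(s), percentage(s))
}
```

Both scores need the background counts, which is why a pre-search phase is used to collect index-wide statistics before the main search runs.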

All aggregations support sub-aggregations and distributed queries.

Dependencies: Added github.com/axiomhq/hyperloglog for HLL++

Example - average price per brand:

byBrand := bleve.NewTermsAggregation("brand", 10)
byBrand.AddSubAggregation("avg_price", bleve.NewAggregationRequest("avg", "price"))
searchRequest.Aggregations = bleve.AggregationsRequest{"by_brand": byBrand}

This ports the existing facet filtering feature to aggregations.

Performance benefits:

- Allocation-free rejection: only matching terms are converted from []byte to string
- Filters apply before bucket creation and sub-aggregation processing
- Fast prefix checks with bytes.HasPrefix before regex evaluation

Example - autocomplete aggregation:

agg, _ := bleve.NewTermsAggregationWithFilter("brand", 10, userInput, "")

Fixes a bug in nested bucket aggregations where metric values were duplicated because fields were registered twice in SubAggregationFields(). Also fixes the StartDoc/EndDoc lifecycle for bucket sub-aggregations and the min/max comparison logic in optimized aggregations. Adds a Clone() method to the AggregationBuilder interface for proper deep copying of nested aggregation hierarchies. Adopts a setter pattern for aggregation filters (SetPrefixFilter, SetRegexFilter).
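Here is a minimal sketch of why Clone() must be deep for nested hierarchies. If each new bucket only shallow-copied its template, every bucket would share, and double-update, the same sub-aggregation state. `aggNode` is a hypothetical stand-in for an AggregationBuilder, not the real interface.

```go
package main

import "fmt"

// aggNode is a hypothetical aggregation with mutable per-query state and
// named sub-aggregations.
type aggNode struct {
	Field string
	Count int
	Subs  map[string]*aggNode
}

// Clone returns a deep copy: each sub-aggregation is cloned recursively so
// the copy never shares child state with the original.
func (a *aggNode) Clone() *aggNode {
	c := &aggNode{Field: a.Field, Count: a.Count}
	if a.Subs != nil {
		c.Subs = make(map[string]*aggNode, len(a.Subs))
		for name, sub := range a.Subs {
			c.Subs[name] = sub.Clone()
		}
	}
	return c
}

func main() {
	template := &aggNode{Field: "brand", Subs: map[string]*aggNode{"avg_price": {Field: "price"}}}
	perBucket := template.Clone() // each new bucket gets its own sub-aggregation tree
	perBucket.Subs["avg_price"].Count++
	fmt.Println(template.Subs["avg_price"].Count, perBucket.Subs["avg_price"].Count) // 0 1
}
```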

- Fix double-counting in bucket aggregations with sawValue guard
- Remove unused count fields from Sum and SumSquares aggregations
- Move StatsResult to search package for cleaner stats merging
- Add field deduplication and validation for term filters
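The double-counting problem can be illustrated with a per-document guard. The field visitor may report the same term several times for one document, so each bucket is incremented at most once between StartDoc and EndDoc.

This is a hypothetical sketch of the general technique: `termCounter` is not bleve's type, and this is not the exact sawValue change.

```go
package main

import (
	"fmt"
	"sort"
)

// termCounter counts documents per term, guarding against a term being
// reported more than once for the same document.
type termCounter struct {
	counts map[string]int
	seen   map[string]bool // reset at the start of every document
}

func newTermCounter() *termCounter {
	return &termCounter{counts: map[string]int{}}
}

// StartDoc begins a new document with an empty "seen" set.
func (c *termCounter) StartDoc() { c.seen = map[string]bool{} }

// Visit is called once per term occurrence reported for the current document.
func (c *termCounter) Visit(term string) {
	if c.seen[term] {
		return // already counted for this document
	}
	c.seen[term] = true
	c.counts[term]++
}

// EndDoc drops the per-document state.
func (c *termCounter) EndDoc() { c.seen = nil }

func main() {
	c := newTermCounter()
	for _, doc := range [][]string{{"a", "a", "b"}, {"a"}} {
		c.StartDoc()
		for _, term := range doc {
			c.Visit(term)
		}
		c.EndDoc()
	}
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, c.counts[k])
	}
}
```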

Also adds proper support for merging averages.
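Averages need special merge handling because averaging per-shard averages is wrong when shard counts differ. A mergeable result therefore keeps count, sum and sum of squares, and derives avg and variance at the end. The `stats` type below is a hypothetical sketch, not bleve's StatsResult.

```go
package main

import (
	"fmt"
	"math"
)

// stats is a hypothetical mergeable stats result; Avg and Variance are
// derived from the stored totals rather than stored themselves.
type stats struct {
	Count      int64
	Sum        float64
	SumSquares float64
	Min, Max   float64
}

func newStats() stats { return stats{Min: math.Inf(1), Max: math.Inf(-1)} }

// Add records one value.
func (s *stats) Add(v float64) {
	s.Count++
	s.Sum += v
	s.SumSquares += v * v
	s.Min = math.Min(s.Min, v)
	s.Max = math.Max(s.Max, v)
}

// Merge folds another shard's partial result into s.
func (s *stats) Merge(o stats) {
	s.Count += o.Count
	s.Sum += o.Sum
	s.SumSquares += o.SumSquares
	s.Min = math.Min(s.Min, o.Min)
	s.Max = math.Max(s.Max, o.Max)
}

// Avg is sum/count over all merged values.
func (s stats) Avg() float64 {
	if s.Count == 0 {
		return 0
	}
	return s.Sum / float64(s.Count)
}

// Variance is the population variance, E[x^2] - E[x]^2.
func (s stats) Variance() float64 {
	if s.Count == 0 {
		return 0
	}
	m := s.Avg()
	return s.SumSquares/float64(s.Count) - m*m
}

func main() {
	a, b := newStats(), newStats()
	a.Add(1)
	a.Add(2) // shard A: avg 1.5
	b.Add(6) // shard B: avg 6
	a.Merge(b)
	fmt.Println(a.Count, a.Avg(), a.Min, a.Max) // 3 3 1 6
}
```

With shards {1, 2} and {6}, the merged average is 3, whereas averaging the two shard averages would give 3.75.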


…re-complex-aggregation-support

Add AddNumericRange, AddDateTimeRange, AddDateTimeRangeString, and AddDistanceRange methods to AggregationRequest, matching the pattern used by FacetRequest. This allows external code to add range buckets without needing access to the unexported range types.
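For comparison, bleve's existing FacetRequest.AddNumericRange takes a range name plus optional *float64 min/max bounds. Assuming the new AggregationRequest methods follow the same pattern, numeric range bucketing might look like the sketch below. Its conventions are assumptions:

- `numericRange`, `ptr` and `rangeCounts` are hypothetical.
- `From` is treated as inclusive and `To` as exclusive.
- A nil bound means the range is unbounded on that side.
- Overlapping ranges may each count the same value.

```go
package main

import "fmt"

// numericRange is a hypothetical range bucket; a nil bound means unbounded.
type numericRange struct {
	Name     string
	From, To *float64
}

// ptr returns a pointer to f, for building optional bounds inline.
func ptr(f float64) *float64 { return &f }

// contains reports whether v lies in [From, To).
func (r numericRange) contains(v float64) bool {
	if r.From != nil && v < *r.From {
		return false
	}
	if r.To != nil && v >= *r.To {
		return false
	}
	return true
}

// rangeCounts counts values per named range.
func rangeCounts(ranges []numericRange, values []float64) map[string]int {
	out := make(map[string]int, len(ranges))
	for _, r := range ranges {
		out[r.Name] = 0
		for _, v := range values {
			if r.contains(v) {
				out[r.Name]++
			}
		}
	}
	return out
}

func main() {
	ranges := []numericRange{
		{Name: "cheap", To: ptr(10)},
		{Name: "mid", From: ptr(10), To: ptr(100)},
		{Name: "expensive", From: ptr(100)},
	}
	fmt.Println(rangeCounts(ranges, []float64{5, 10, 99.5, 250})) // map[cheap:1 expensive:1 mid:2]
}
```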

ajroetker and others added 8 commits November 12, 2025 16:52

(bug) Add aggregations to SearchResult merging

5be06f6

Also adds proper support for merging averages.

Fix issue with subaggregations and non-deterministic ordering

aec56d4

Merge branch 'ajroetker/add-bleve-aggregations' into ajroetker/add-mo…

4d26ce3

…re-complex-aggregation-support

abhinavdangeti added this to the v2.6.0 milestone Nov 18, 2025

ajroetker mentioned this pull request Dec 10, 2025

(feat) Add aggregations framework to enable numeric analytics on search results #2244

Closed

ajroetker changed the title ~~(feat) Add cardinality, histogram, date_histogram, geohash_grid, geo_distance, and significant_terms aggregations~~ (feat) Add aggregations framework with cardinality, histogram, date_histogram, geohash_grid, geo_distance, and significant_terms Jan 1, 2026

ajroetker changed the title ~~(feat) Add aggregations framework with cardinality, histogram, date_histogram, geohash_grid, geo_distance, and significant_terms~~ (feat) Add aggregations framework with analytics and bucket aggregations Jan 1, 2026
