Skip to content

chore: reuse time bin logic across apis#1660

Open
nikhilsinhaparseable wants to merge 3 commits into
parseablehq:mainfrom
nikhilsinhaparseable:reuse-time-bin
Open

chore: reuse time bin logic across apis#1660
nikhilsinhaparseable wants to merge 3 commits into
parseablehq:mainfrom
nikhilsinhaparseable:reuse-time-bin

Conversation

@nikhilsinhaparseable
Copy link
Copy Markdown
Contributor

@nikhilsinhaparseable nikhilsinhaparseable commented Jun 2, 2026

remove binning logic in counts api
make reusable component
reuse in counts api, errors api, agent-observability related apis

Summary by CodeRabbit

  • New Features
    • Added time-binning utilities with an epoch-aligned anchor and dynamic interval selection for consistent bin labels and boundaries.
  • Refactor
    • Count queries now compute adaptive bin intervals and build aligned bin start/end expressions instead of using fixed thresholds.
  • Bug Fixes
    • Reject zero-bin requests and surface binning errors instead of continuing with invalid computations.
  • Tests
    • Added unit tests for zero-bin handling, epoch-aligned boundaries, and timezone-aware matching.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0b6e02e1-75a7-4b17-ac64-1660fbc3314d

📥 Commits

Reviewing files that changed from the base of the PR and between cf8d7ef and d0631f8.

📒 Files selected for processing (1)
  • src/utils/time.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/time.rs

Walkthrough

Extracts time-binning logic into new utilities and updates CountsRequest SQL generation to compute DATE_BIN intervals dynamically, using a shared epoch anchor for bin start/end expressions.

Changes

Dynamic time-binning for counts query

Layer / File(s) Summary
Time-binning utility functions
src/utils/time.rs
Adds DATE_BIN_EPOCH_ANCHOR, TimeBinError, and exported helpers: count_api_bin_interval, interval_for_num_bins, expected_time_bins, and match_time_bin_key, plus unit tests.
Counts query refactoring
src/query/mod.rs
Imports new utilities; CountsRequest::get_bounds is fallible and rejects num_bins == 0, get_bin_density propagates errors, and get_df_sql computes a dynamic bin_interval and anchors DATE_BIN expressions to DATE_BIN_EPOCH_ANCHOR.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • parseablehq/parseable#1425: Prior refactor of counts query date-binning SQL that relates to this PR's interval selection and formatting changes.

Poem

I nibble at epochs, carrots in rows, 🥕
New bins spring where the cool wind blows,
Anchored to time, neat slices unfold,
Counts march steady, tidy and bold,
A rabbit's tally, crisp and gold.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is minimal and lacks the structure specified in the template, missing detailed sections for rationale, key changes, testing confirmation, and documentation. Expand the description to follow the template: add 'Fixes #XXXX', describe the goal and rationale, explain key changes in detail, and confirm testing and documentation have been added.
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'chore: reuse time bin logic across apis' directly describes the main objective of the PR—consolidating binning logic and making it reusable across multiple APIs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/utils/time.rs (1)

60-121: ⚡ Quick win

Missing unit tests for new time-binning utilities.

The four new exported functions (count_api_bin_interval, interval_for_num_bins, expected_time_bins, match_time_bin_key) lack unit test coverage. These are foundational utilities that will be reused across multiple APIs per the PR objectives.

Do you want me to generate unit tests for these functions?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/time.rs` around lines 60 - 121, Tests are missing for the four
exported utilities: count_api_bin_interval, interval_for_num_bins,
expected_time_bins, and match_time_bin_key; add a new test module (or file) that
exercises these functions: verify count_api_bin_interval returns
"1m","5m","1h","1d" for durations around the 5-minute, 24-hour, and 10-day
thresholds using DateTime<Utc> instances; test interval_for_num_bins for correct
seconds calculation and formatting and for edge case num_bins=0/1 (or guard
against 0 by using at least 1); test expected_time_bins returns num_bins
entries, that bins cover contiguous ranges and that the final bin_end equals the
provided end; and test match_time_bin_key with RFC3339 strings with and without
timezone suffixes (Z, +00:00, other offsets) to ensure normalization and
matching within 1 second and that non-matching inputs are returned unchanged—use
DateTime::parse_from_rfc3339 and Utc conversions in asserts and reference the
functions count_api_bin_interval, interval_for_num_bins, expected_time_bins, and
match_time_bin_key when locating code to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/time.rs`:
- Around line 101-106: The timezone normalization in match_time_bin_key
incorrectly treats strings with negative offsets as lacking a timezone; change
the logic that builds normalized from sql_bin_start so it detects any explicit
timezone suffix (either 'Z' or a +/-HH:MM offset) instead of only checking for
'+'; for example, use a regex or attempt parse_from_rfc3339 to detect a present
offset and only append "+00:00" when no timezone is present, keeping function
name match_time_bin_key and variable sql_bin_start as the points to modify.

---

Nitpick comments:
In `@src/utils/time.rs`:
- Around line 60-121: Tests are missing for the four exported utilities:
count_api_bin_interval, interval_for_num_bins, expected_time_bins, and
match_time_bin_key; add a new test module (or file) that exercises these
functions: verify count_api_bin_interval returns "1m","5m","1h","1d" for
durations around the 5-minute, 24-hour, and 10-day thresholds using
DateTime<Utc> instances; test interval_for_num_bins for correct seconds
calculation and formatting and for edge case num_bins=0/1 (or guard against 0 by
using at least 1); test expected_time_bins returns num_bins entries, that bins
cover contiguous ranges and that the final bin_end equals the provided end; and
test match_time_bin_key with RFC3339 strings with and without timezone suffixes
(Z, +00:00, other offsets) to ensure normalization and matching within 1 second
and that non-matching inputs are returned unchanged—use
DateTime::parse_from_rfc3339 and Utc conversions in asserts and reference the
functions count_api_bin_interval, interval_for_num_bins, expected_time_bins, and
match_time_bin_key when locating code to test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f5900f5a-f916-4783-a03d-a99b6fa0c426

📥 Commits

Reviewing files that changed from the base of the PR and between cbfbfa1 and 06d123e.

📒 Files selected for processing (2)
  • src/query/mod.rs
  • src/utils/time.rs

Comment thread src/utils/time.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/utils/time.rs (1)

101-106: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Negative UTC offsets are still normalized incorrectly.

Strings like 2023-01-01T00:00:00-05:00 still fail the '+'/'Z' check, so this appends +00:00 and produces an invalid RFC3339 value. That leaves valid SQL timestamps unmatched.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/time.rs` around lines 101 - 106, The code in match_time_bin_key
incorrectly assumes absence of '+' or 'Z' means no timezone and appends
"+00:00", which breaks negative offsets like "2023-01-01T00:00:00-05:00"; update
the normalization to detect a timezone suffix properly by checking for a
trailing 'Z' or a numeric offset pattern (e.g. use a regex like
r"[+-]\d{2}:\d{2}$" or equivalent) against sql_bin_start, and only append
"+00:00" when that pattern does not match; modify the normalization logic around
sql_bin_start in match_time_bin_key to use this check.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/time.rs`:
- Around line 80-97: The expected_time_bins function currently anchors bins at
time_range.start which diverges from the SQL DATE_BIN(...,
DATE_BIN_EPOCH_ANCHOR) behavior; change expected_time_bins to compute the first
bin start aligned to the same epoch anchor used by SQL (or accept a resolved
anchor/bin_start parameter) and then generate subsequent bins by adding
bin_seconds increments so the Rust keys match the SQL keys; modify the signature
of expected_time_bins (or add an extra parameter) to accept an epoch_anchor or a
precomputed first_bin_start, locate logic in expected_time_bins and replace the
bin_start/bin_end calculation to start from that aligned anchor (use TimeRange
and num_bins to compute total_seconds and bin_seconds as before), and ensure the
last bin_end still uses time_range.end.
- Around line 74-77: interval_for_num_bins currently divides by num_bins and
will panic on num_bins == 0; change the function to validate num_bins first and
return a Result<String, ValidationError> (or a suitable crate error) instead of
a bare String, returning Err when num_bins == 0 and Ok(format!("{bin_seconds}
seconds")) otherwise; apply the same fix to the other public helper that
performs division by num_bins (the sibling helper defined nearby), update
callers to handle the Result, and preserve the existing calculation logic after
the guard.

---

Duplicate comments:
In `@src/utils/time.rs`:
- Around line 101-106: The code in match_time_bin_key incorrectly assumes
absence of '+' or 'Z' means no timezone and appends "+00:00", which breaks
negative offsets like "2023-01-01T00:00:00-05:00"; update the normalization to
detect a timezone suffix properly by checking for a trailing 'Z' or a numeric
offset pattern (e.g. use a regex like r"[+-]\d{2}:\d{2}$" or equivalent) against
sql_bin_start, and only append "+00:00" when that pattern does not match; modify
the normalization logic around sql_bin_start in match_time_bin_key to use this
check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 416da683-6946-4d4e-a3ca-14a900e1e4e0

📥 Commits

Reviewing files that changed from the base of the PR and between 06d123e and 28361c9.

📒 Files selected for processing (2)
  • src/query/mod.rs
  • src/utils/time.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/query/mod.rs

Comment thread src/utils/time.rs Outdated
Comment thread src/utils/time.rs Outdated
remove binning logic in counts api
make reusable component
reuse in counts api, errors api, agent-observability related apis
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/query/mod.rs (1)

569-637: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

get_bounds creates one extra bin when total_minutes % num_bins != 0.

Current control flow emits num_bins bins in the loop and then appends a final remainder bin, resulting in num_bins + 1 bins for non-divisible ranges.

🔧 Suggested fix
-        let loop_end = if have_remainder {
-            num_bins
-        } else {
-            num_bins - 1
-        };
+        let loop_end = num_bins - 1;

         // Create bins for all but the last date
         for _ in 0..loop_end {
             let end = start + Duration::minutes(quotient as i64);
             bounds.push(TimeBounds { start, end });
             start = end;
         }

-        // Add the last bin, accounting for any remainder, should we include it?
-        if have_remainder {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(remainder as i64),
-            });
-        } else {
-            bounds.push(TimeBounds {
-                start,
-                end: start + Duration::minutes(quotient as i64),
-            });
-        }
+        // Add the last bin and fold remainder into it.
+        bounds.push(TimeBounds {
+            start,
+            end: start + Duration::minutes((quotient + remainder) as i64),
+        });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/query/mod.rs` around lines 569 - 637, get_bounds currently emits num_bins
bins in the loop and then adds a remainder bin, producing num_bins+1 when
total_minutes % num_bins != 0; fix by iterating exactly num_bins-1 times (or
zero if num_bins==1) and make the final push produce the remaining time span
(quotient + remainder) so total bins equal num_bins. Update the loop_end/loop
logic in get_bounds to always produce num_bins-1 initial bins using quotient
minutes each, then push one final TimeBounds from start to start +
Duration::minutes((quotient + remainder) as i64); keep the existing num_bins
validation and variable names (num_bins, total_minutes, quotient, remainder,
start, bounds, TimeBounds).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/time.rs`:
- Around line 96-136: The function expected_time_bins can produce out-of-range
or inverted bins when num_bins exceeds the available seconds; before computing
bin_seconds, compute total_seconds from TimeRange (as already done) and if
num_bins > total_seconds return Err(TimeBinError::InvalidNumBins) (or clamp
num_bins to total_seconds), then proceed; also ensure when computing bin_end you
cap it to time_range.end (e.g., take the minimum of computed bin_end and
time_range.end) so bin_start never exceeds bin_end; update checks around
total_seconds, bin_seconds, first_bin_start, bin_start and bin_end in
expected_time_bins accordingly.

---

Outside diff comments:
In `@src/query/mod.rs`:
- Around line 569-637: get_bounds currently emits num_bins bins in the loop and
then adds a remainder bin, producing num_bins+1 when total_minutes % num_bins !=
0; fix by iterating exactly num_bins-1 times (or zero if num_bins==1) and make
the final push produce the remaining time span (quotient + remainder) so total
bins equal num_bins. Update the loop_end/loop logic in get_bounds to always
produce num_bins-1 initial bins using quotient minutes each, then push one final
TimeBounds from start to start + Duration::minutes((quotient + remainder) as
i64); keep the existing num_bins validation and variable names (num_bins,
total_minutes, quotient, remainder, start, bounds, TimeBounds).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ec56e752-1bfb-4b61-a998-f8d7a1753594

📥 Commits

Reviewing files that changed from the base of the PR and between 28361c9 and cf8d7ef.

📒 Files selected for processing (2)
  • src/query/mod.rs
  • src/utils/time.rs

Comment thread src/utils/time.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant