Skip to content

Piggyback routing update on persist response#6173

Merged
nadav-govari merged 9 commits intonadav/feature-node-based-routingfrom
nadav/pr3.5
Mar 3, 2026
Merged

Piggyback routing update on persist response#6173
nadav-govari merged 9 commits intonadav/feature-node-based-routingfrom
nadav/pr3.5

Conversation

@nadav-govari
Copy link
Collaborator

@nadav-govari nadav-govari commented Feb 26, 2026

Description

Addressing #6163 (comment), ingesters had more current shard availability data than the router that was sending the request. In the shard based table, we would use that information; the new node based table didnt.

This PR adds support for piggybacking the most up to date routing information on persist requests, including closed shards. It works because:

  1. Subsequent control plane calls will overwrite this data, as needed
  2. Crucially, it keeps track of when shards are closed, so that newly updated totals can be merged, including if open shard count is 0.

How was this PR tested?

Unit tests. Will test on a local cluster as well.

Copy link
Member

@guilload guilload left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

let state = IngesterState::load(wal_dir_path, rate_limiter_settings);

let weak_state = state.weak();
let wal_capacity_time_series =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fold the wal_capacity_time_series into the state. It's doable with a small refactor and gets rid of another Arc<<Mutex<_>>>.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice idea :)

/// table. For existing nodes, updates their open shard count, including counts of 0, from the
/// CP response while preserving capacity scores if they already exist.
/// New nodes get a default capacity_score of 5.
pub fn merge_from_shards(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment is off because regular ingesters also provide routing updates and then why we need to handle the case where not all shards in the routing update are open.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case was if we had an ingester that had open shards before, but doesn't have any open shards now, an entry wouldn't be created in this function, and updating from (example) 4 open shards to 0 wouldn't happen (they'd all get filtered out). So the only case that matters here is the case where there are shards on the node but none of them are open.

@nadav-govari nadav-govari merged commit 006f951 into nadav/feature-node-based-routing Mar 3, 2026
5 of 7 checks passed
@nadav-govari nadav-govari deleted the nadav/pr3.5 branch March 3, 2026 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants