
Exclude decommissioning nodes when opening new shards, using chitchat #6165

Open
ncoiffier-celonis wants to merge 8 commits into quickwit-oss:main from ncoiffier-celonis:fix-ingestion-gap-when-decomissioning-node

ncoiffier-celonis commented Feb 20, 2026

Description

Attempt to fix #6158

This PR:

  • broadcasts the ingester status through chitchat
  • enriches the ControlPlaneModel to maintain a list of decommissioning indexers
  • filters out the decommissioning nodes when opening new shards, rebalancing shards, or scaling up shards
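
As a rough illustration of the filtering step, here is a minimal, self-contained sketch. It is not the code from this PR: `IngesterStatus`, `DecommissioningTracker`, and their methods are hypothetical names, and the real ControlPlaneModel and chitchat integration are assumed to be considerably more involved.

```rust
use std::collections::BTreeSet;

/// Hypothetical ingester status, as it might be gossiped through chitchat.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IngesterStatus {
    Ready,
    Decommissioning,
}

/// Hypothetical, simplified stand-in for the part of the control plane model
/// that keeps track of decommissioning indexers.
#[derive(Default)]
pub struct DecommissioningTracker {
    decommissioning: BTreeSet<String>,
}

impl DecommissioningTracker {
    /// Applies a status update received for `node_id`.
    pub fn on_status_update(&mut self, node_id: &str, status: IngesterStatus) {
        match status {
            IngesterStatus::Decommissioning => {
                self.decommissioning.insert(node_id.to_string());
            }
            IngesterStatus::Ready => {
                self.decommissioning.remove(node_id);
            }
        }
    }

    /// Returns the candidates that are still eligible to host new shards,
    /// preserving the order of `candidates`.
    pub fn eligible_nodes<'a>(&self, candidates: &'a [String]) -> Vec<&'a str> {
        candidates
            .iter()
            .map(String::as_str)
            .filter(|node_id| !self.decommissioning.contains(*node_id))
            .collect()
    }
}
```

In this sketch, the same `eligible_nodes` filter would be applied before opening new shards, rebalancing shards, or scaling up shards, so that a node stops receiving new shards as soon as its decommissioning status reaches the tracker.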

Even with this approach, and even with some propagation delay before decommissioning, some documents can still fail to be ingested if chitchat takes longer than expected to gossip the ingester state to the control plane.

This is an alternative approach to #6166: it uses chitchat instead of a gRPC stream to propagate the decommissioning status to the control plane.

I am wondering, though, whether this could conflict with the approach implemented in #6163.

Any feedback is welcome!!

How was this PR tested?

In addition to the unit and integration tests, I ran it against a local cluster with 2 indexers and observed that the number of errors reported in #6158 decreased from a few hundred to fewer than 10.

Other considerations

I also considered these 3 approaches:

  • re-using the indexer state (i.e. READY/NOT_READY, by adding a DRAINING state), but an indexer needs to be ready to successfully complete the decommissioning process
  • using the shard status itself in the decommissioning routine, but the changes were much more "spaghetti", and I couldn't quite get them working
  • using a gRPC stream call to have the indexer notify the control plane when decommissioning (see #6166, "Exclude decommissioning nodes when opening new shards, using gRPC stream")

If we want to de-risk this change, we could put it behind a feature flag or config property.
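
A minimal sketch of what such gating could look like, assuming an opt-in flag whose raw value comes from an environment variable or the node config. The function names and the accepted values ("true" or "1") are hypothetical and only meant to illustrate the idea.

```rust
/// Parses a hypothetical opt-in flag value. Anything other than an explicit
/// "true" or "1" (case-insensitive, surrounding whitespace ignored) keeps
/// the previous behavior.
pub fn parse_exclusion_flag(raw_value: Option<&str>) -> bool {
    match raw_value.map(|value| value.trim().to_ascii_lowercase()) {
        Some(value) => value == "true" || value == "1",
        None => false,
    }
}

/// Returns the nodes eligible for new shards. Decommissioning nodes are only
/// filtered out when the flag is enabled.
pub fn eligible_nodes<'a>(
    candidates: &[&'a str],
    decommissioning: &[&str],
    exclusion_enabled: bool,
) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|node_id| {
            !exclusion_enabled || !decommissioning.iter().any(|d| *d == *node_id)
        })
        .collect()
}
```

With the flag off, the candidate list is returned unchanged, which would make it possible to roll the change out gradually and fall back to the previous behavior without a redeploy of different code.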

ncoiffier-celonis changed the title from "Exclude decommissioning nodes when opening new shards" to "Exclude decommissioning nodes when opening new shards, using chitchat" on Feb 24, 2026

Development

Successfully merging this pull request may close these issues.

Indexer graceful shutdown causes ingestion gap and 500 errors "no shards available"
