Rebalance shards when ingester status changes#6185

Open
ncoiffier-celonis wants to merge 9 commits into quickwit-oss:main from
ncoiffier-celonis:ingester-status-rebased

Description

Attempt to fix #6158

Following @guilload's suggestion here, this PR:

  • gossips the ingester status over chitchat
  • updates the ingester pool when the ingester status changes
  • updates the indexer pool too when the ingester status changes (to fix the `no open shard found on ingester` error)
  • has the control plane rebalance the shards when the ingester status changes
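
The flow in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ClusterView`, `on_status_gossiped`, `is_routable`, `rebalance_count`, and the two status variants are hypothetical names, and a single struct stands in for the ingester pool, the indexer pool, and the control plane.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical status values; the real `IngesterStatus` in Quickwit may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IngesterStatus {
    Ready,
    Decommissioning,
}

/// Hypothetical stand-in for the state that the pools and the control plane
/// derive from gossiped statuses.
#[derive(Default)]
pub struct ClusterView {
    /// Last status gossiped by each node.
    statuses: BTreeMap<String, IngesterStatus>,
    /// Nodes currently eligible to receive new documents.
    pool: BTreeSet<String>,
    /// Number of rebalances triggered so far.
    pub rebalance_count: usize,
}

impl ClusterView {
    /// Applies one gossiped status update and returns `true` if the status changed.
    pub fn on_status_gossiped(&mut self, node_id: &str, status: IngesterStatus) -> bool {
        let previous = self.statuses.insert(node_id.to_string(), status);
        if previous == Some(status) {
            // Repeated gossip of the same status: nothing to do.
            return false;
        }
        // Keep the pool in sync with the new status.
        match status {
            IngesterStatus::Ready => {
                self.pool.insert(node_id.to_string());
            }
            IngesterStatus::Decommissioning => {
                self.pool.remove(node_id);
            }
        }
        // A status change is what triggers a shard rebalance in this sketch.
        self.rebalance_count += 1;
        true
    }

    /// Returns `true` if the node may still be picked for new documents.
    pub fn is_routable(&self, node_id: &str) -> bool {
        self.pool.contains(node_id)
    }
}
```

In this sketch a node that gossips `Decommissioning` leaves the pool and triggers one rebalance, while repeated gossip of an unchanged status is ignored.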

With this approach, even with a propagation delay of about 10s before decommissioning, ingesting some documents can still fail if chitchat takes longer than expected to gossip the ingester status to the control plane.

Any feedback is welcome!!

How was this PR tested?

In addition to the unit and integration tests, I ran it against a local cluster with 2 indexers and observed that the number of errors reported in #6158 dropped from a few hundred to zero.

Other approaches

This PR is nearly identical to the branch guilload/ingester-status, rebased on main, with some additional bug fixes:

  • fix a bug where timeout_after was always 0, so the wait was skipped
  • update the ingester pool when IngesterStatus changes (not only the indexer pool)
  • more unit and integration tests
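
To illustrate why a `timeout_after` of 0 amounts to not waiting, here is a minimal sketch of a polling wait. `wait_until` is a hypothetical helper written for this example, not the function fixed in this PR, and the 1 ms poll interval is an arbitrary assumption.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `condition` until it returns `true` or
/// `timeout_after` has elapsed. Returns whether the condition was met.
pub fn wait_until(mut condition: impl FnMut() -> bool, timeout_after: Duration) -> bool {
    let deadline = Instant::now() + timeout_after;
    loop {
        if condition() {
            return true;
        }
        // With a zero timeout the deadline has already passed here, so the
        // function gives up after a single check.
        if Instant::now() >= deadline {
            return false;
        }
        // Arbitrary poll interval chosen for this sketch.
        std::thread::sleep(Duration::from_millis(1));
    }
}
```

With `Duration::ZERO` the condition is checked once and the call returns immediately, so a condition that only becomes true after a short delay is never observed.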

Development

Successfully merging this pull request may close these issues.

Indexer graceful shutdown causes ingestion gap and 500 errors "no shards available"