
Conversation

@michaelfeil
Owner

What does this PR do?

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@michaelfeil
Owner Author

@codex review


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 218 to 220
// fall-back to num_cpus - 1 to leave some CPU for the backend, and at most 64 workers.
let tokenization_workers =
    tokenization_workers.unwrap_or_else(|| std::cmp::min(std::cmp::max(2, num_cpus::get() - 1), 64));


P2: Fallback workers oversubscribe small hosts

When tokenization_workers is not set, the new fallback min(max(2, num_cpus::get() - 1), 64) still spawns at least two tokenization threads even on 1–2 CPU hosts, leaving no core free for the backend despite the intent expressed in the preceding comment. On such small instances this change can regress inference throughput by competing with the backend for all available CPU, unlike the previous default of matching the core count. Consider leaving one core free without forcing a minimum of two workers.

Useful? React with 👍 / 👎.

@michaelfeil
Owner Author


fair, i'll change the minimum to 1.
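
For reference, a minimal sketch of what that revised fallback could look like, assuming the only change is lowering the floor from 2 to 1 (the variable name and the num_cpus call are taken from the snippet above):

// fall back to num_cpus - 1, with at least 1 and at most 64 workers,
// so 1-2 CPU hosts are no longer forced to spawn two tokenization threads
let tokenization_workers =
    tokenization_workers.unwrap_or_else(|| std::cmp::min(std::cmp::max(1, num_cpus::get() - 1), 64));

On a 2-CPU host this yields a single tokenization worker and leaves one core for the backend, while larger hosts keep the num_cpus - 1 behavior up to the 64-worker cap.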

chatgpt-codex-connector[bot]

This comment was marked as off-topic.

@michaelfeil
Owner Author

@codex review

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

