Skip to content

fix: avoid NLTK download race condition in parallel test workers#477

Open
bsbodden wants to merge 1 commit intomainfrom
nltk-race-condition-fix
Open

fix: avoid NLTK download race condition in parallel test workers#477
bsbodden wants to merge 1 commit intomainfrom
nltk-race-condition-fix

Conversation

@bsbodden
Copy link
Collaborator

Try loading stopwords before downloading to prevent concurrent pytest-xdist workers from racing on nltk.download(), which raises FileExistsError when multiple processes write to the same path.

Try loading stopwords before downloading to prevent concurrent
pytest-xdist workers from racing on nltk.download(), which raises
FileExistsError when multiple processes write to the same path.
Copilot AI review requested due to automatic review settings February 17, 2026 03:07
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a race condition that occurs when multiple pytest-xdist workers attempt to download NLTK stopwords data concurrently. The fix implements a "try-load-first" pattern that checks if the data is already available before attempting to download it, preventing FileExistsError exceptions when multiple processes write to the same path simultaneously.

Changes:

  • Wrapped NLTK stopwords loading in a try-except block to check if data exists before downloading
  • Applied the fix consistently in both _get_stopwords and _set_stopwords methods

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
redisvl/utils/full_text_query_helper.py Updated _get_stopwords to try loading stopwords before downloading, preventing race conditions in parallel test workers
redisvl/query/query.py Updated _set_stopwords in TextQuery class with the same race condition fix

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@vishal-bala vishal-bala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I'd guess there's technically still a race condition in that

  • two workers could try to start downloading at the same time, or
  • a worker could try to load a partially downloaded dataset, fail, and then decide to start downloading it as well

But these are both fairly niche and unlikely, and probably not worth designing for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants