[Common] Enable determinism for cuDNN >= 9.18.1 on Blackwell #2584
Open: cyanguwa wants to merge 34 commits into NVIDIA:main from cyanguwa:blackwell_determinism
34 commits
259662b update FE to 1.17 (cyanguwa)
c243794 add determinism flag (cyanguwa)
5578a60 add determinism to test (cyanguwa)
9bc1d64 add determinism to qa/ (cyanguwa)
b1bdab7 move bias/dbias/versioning/dropout logic to C API (cyanguwa)
ea109c2 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
70fc94b Update qa/L0_pytorch_unittest/test.sh (cyanguwa)
e82bd96 add determinism to Jax extension (cyanguwa)
8365962 add determinism to Jax tests (cyanguwa)
c7db02b [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
4aaa627 Update tests/jax/test_fused_attn.py (cyanguwa)
0ee6b87 Update transformer_engine/common/fused_attn/fused_attn.cpp (cyanguwa)
4bd5e95 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
bd31e01 fix the AI fixes (cyanguwa)
eb2e055 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6f0e515 fix Jax extension call (cyanguwa)
2c22cbf minor fixes based on comments (cyanguwa)
279f2f6 Merge branch 'main' into blackwell_determinism (cyanguwa)
aae98f3 fix selection logic and fwd arg (cyanguwa)
b962d32 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
1068594 fix version check in Jax test (cyanguwa)
fed07f2 Merge branch 'main' into blackwell_determinism (cyanguwa)
c51cf44 fix pytorch CI failures (cyanguwa)
3885684 fix Jax CI failures (cyanguwa)
8bf3a0f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
f3a0234 Merge branch 'main' into blackwell_determinism (cyanguwa)
f526569 fix non-/determinism logic and CI (cyanguwa)
0cb374a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
9162ff6 fix formatting (cyanguwa)
ee90c5a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
77d0f2a Update transformer_engine/common/fused_attn/fused_attn.cpp (cyanguwa)
6a3346f Merge branch 'main' into blackwell_determinism (cyanguwa)
65a67c6 update to 9.18.1 for requirement (cyanguwa)
7187d02 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
Submodule cudnn-frontend updated 102 files
It looks like this will first run the non-deterministic fused attn tests as part of L31, which runs all non-distributed tests, and then run the deterministic fused attn tests as part of L32.
Is that the intention: to run fused attn twice, with and without determinism?
That would greatly increase our test time and might be unnecessary. The last pipeline launched was for L1, so I cannot tell from it how this L0 change will affect timing. Could you report that in the PR, please?
Thanks!
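To make the concern concrete, here is a minimal sketch of the two passes as described in this comment. It assumes determinism is toggled through the `NVTE_ALLOW_NONDETERMINISTIC_ALGO` environment variable; whether the `test.sh` in this PR sets it that way is not visible in this thread. `plan_runs`, `fused_attn_run_count`, and `test_other.py` are hypothetical names used only for illustration.

```python
# Assumption: determinism is selected through this environment variable
# ("0" requests deterministic algorithms, "1" allows non-deterministic ones).
DETERMINISM_ENV = "NVTE_ALLOW_NONDETERMINISTIC_ALGO"


def plan_runs(all_tests, fused_attn_test):
    """Return (env overrides, pytest targets) for the two passes discussed above."""
    return [
        # Pass 1: every non-distributed test, non-deterministic algorithms allowed.
        ({DETERMINISM_ENV: "1"}, list(all_tests)),
        # Pass 2: only the fused attention test, deterministic algorithms requested.
        ({DETERMINISM_ENV: "0"}, [fused_attn_test]),
    ]


def fused_attn_run_count(runs, fused_attn_test):
    """Count how many passes include the fused attention test."""
    return sum(fused_attn_test in targets for _, targets in runs)


# Hypothetical file list; only test_fused_attn.py is named in this thread.
runs = plan_runs(["test_fused_attn.py", "test_other.py"], "test_fused_attn.py")
print(fused_attn_run_count(runs, "test_fused_attn.py"))
```

Under these assumptions the fused attention file appears in both passes, which is the duplication the comment asks about.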
Maybe we could come up with an approach that runs half of the fused attn tests deterministically and the other half non-deterministically?
Or run all of them deterministically only?
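One way the half-and-half idea could look, as a rough sketch rather than anything taken from this PR: assign each test case to a mode with a stable hash of its ID, so the split does not change between CI runs. The helper names and the example test IDs below are hypothetical.

```python
import zlib


def runs_deterministically(test_id: str) -> bool:
    """Assign a test case to the deterministic half using a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the built-in
    # hash(), which is randomized per interpreter run for strings.
    return zlib.crc32(test_id.encode("utf-8")) % 2 == 0


def split_cases(test_ids):
    """Partition test IDs into (deterministic, non-deterministic) lists."""
    deterministic = [t for t in test_ids if runs_deterministically(t)]
    non_deterministic = [t for t in test_ids if not runs_deterministically(t)]
    return deterministic, non_deterministic


# Hypothetical test IDs, for illustration only.
ids = [f"test_case_{i}" for i in range(20)]
det, nondet = split_cases(ids)
print(len(det), len(nondet))
```

A hash-based split is only roughly even, and each configuration is covered in just one mode, so this trades coverage for test time.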
Yes, this extra line tests test_fused_attn.py with determinism, while the line before tests everything with non-determinism. The extra test_fused_attn.py test takes ~20 mins on Blackwell: