Implement 4over6 NVFP4 recipe #2972
Open
zianglih wants to merge 69 commits into NVIDIA:main from zianglih:4over6
69 commits
19b6b08 Initial implementation (zianglih)
7b0b2d0 Make 4over6 compile time for dequant (zianglih)
1e5b6ad Expand 1d fwd+bwd test (zianglih)
99660fc Refactor (zianglih)
cb2e0a3 Clean up (zianglih)
2c066f9 Clean up (zianglih)
69e8f3a Add gemm test (zianglih)
009e651 Add more tests and fix offload (zianglih)
3153fc3 Fix offload (zianglih)
e31b758 Clean up arg (zianglih)
fcd526c Add more test (zianglih)
100c378 Add more tests (zianglih)
1c9f26b Clean up test (zianglih)
93fe922 Refactor cuh kernel impl (zianglih)
f4e4a4e Further extract (zianglih)
b3f59ee Clean up (zianglih)
31decf9 Add recipe_id (zianglih)
2fa6b8c Fix failing unit tests (zianglih)
7df2db0 Clean up test (zianglih)
ce85be2 Clean up (zianglih)
1b68038 Refactor ref (zianglih)
bb722a3 Update comments and docs (zianglih)
fe18a1e Drop unnecessary test_sanity workaround (zianglih)
522e93e Refactor `QuantizerRole` (zianglih)
782b7ee Allow separate recipe 4over6 config (zianglih)
d9cd12c Support 2d (zianglih)
708c1ec Refactor 2d (zianglih)
4d31f18 Clean up anti pattern (zianglih)
dfc15f2 Enforce 4over6 consistency (zianglih)
9453670 Update comments (zianglih)
6d871da Update docs (zianglih)
f8338e8 Fix test (zianglih)
c9bc921 Drop test_fusible_ops (zianglih)
00ba694 Revert "Drop test_fusible_ops" (zianglih)
3252d4e Refactor test_fusible_ops (zianglih)
3f33c1d Refactor ref and extend cpp test (zianglih)
8607e03 Clean up cpp test (zianglih)
d3dbf34 Minor comment (zianglih)
565f33f Drop doc (zianglih)
54b4da8 Explicit handle conditional smem buffer (zianglih)
fa09200 Further clean up (zianglih)
e57e8be More templates (zianglih)
a1df319 Simplify cpp (zianglih)
21720da Drop write back lifting (zianglih)
b1d073a Add MAE and dedicated fast math env var (zianglih)
0392708 Harden cpp test (zianglih)
0b77a37 Add warning and err fast math coverage (zianglih)
81e579e Fold test case and clean up cpp test (zianglih)
1e311ef Initial 448 vs 256 implementation (zianglih)
38a1c4c Use e4m3 max instead of boolean, more template (zianglih)
3cdd9d9 Add benchmark script and minor optimization (zianglih)
7deba75 Use standalone kernels (zianglih)
93dbf2b Use cp async (zianglih)
8819d12 Add benchmark script (zianglih)
24e417b Minor fix after rebase (zianglih)
472e5b8 Naming consistency (zianglih)
83e2308 Remove 4over6 benchmark (zianglih)
2980cb1 Refactor modes (zianglih)
967293f Relax tol for `test_layernorm_mlp` for `nvfp4_4over6` (zianglih)
f555bf2 Minor fix recipe naming (zianglih)
7a4b5c0 Remove gradient 4over6 quantization and partially allow SR/RHT (zianglih)
e036a7c Allow RHT in pytorch ref (zianglih)
f8c4373 Update transformer_engine/pytorch/csrc/quantizer.cpp (timmoon10)
96fcb43 Minor fix TODO lint (zianglih)
3e6d4cd Use standard nvfp4 for grad ref in test_fusible_ops.py since 4over6 i… (zianglih)
1a5c19d Minor fix test-fusible_ops 4over6 helper (zianglih)
63b82a5 Default to 256 for 4over6 (zianglih)
3e130f7 Reset RNG state for each TE ops test (timmoon10)
5f2d761 Merge branch 'main' into 4over6 (zianglih)