Add low precision attention API from torchao to TorchAoConfig (#13285)

Draft

howardzhang-cv wants to merge 1 commit into huggingface:main from howardzhang-cv:feature/fp8_attn_ao

Conversation

@howardzhang-cv (Contributor) commented Mar 19, 2026

What does this PR do?

Adds the low-precision attention API from TorchAO to diffusers by adding an `attn_backend` option to `TorchAoConfig`.
Note: this will require torchao 0.17.0.

Todo:

Results:

Results were measured on FLUX.1-dev at 2048x2048 image size.

| Config                        | Median Time (s) | Speedup |
|-------------------------------|-----------------|---------|
| bf16 baseline                 | 4.39            | 1.00x   |
| fp8_attn                      | 4.14            | 1.06x   |
| bf16 baseline + torch.compile | 3.17            | 1.00x   |
| fp8_attn + torch.compile      | 2.66            | 1.19x   |
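For reference, each speedup figure above is just the ratio of the baseline's median time to the variant's, computed separately for the eager and `torch.compile` settings. A quick sanity check of the arithmetic:

```python
# Speedup = baseline median time / variant median time,
# using the numbers from the table above.
eager_speedup = 4.39 / 4.14      # bf16 baseline vs. fp8_attn (eager)
compiled_speedup = 3.17 / 2.66   # same comparison under torch.compile

print(f"{eager_speedup:.2f}x")    # 1.06x
print(f"{compiled_speedup:.2f}x") # 1.19x
```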
