Fix the attention mask in ulysses SP for QwenImage#13278
zhtmike wants to merge 3 commits into huggingface:main from
Conversation
```python
if attn_mask is not None and attn_mask.dim() == 2:
    attn_mask = attn_mask[:, None, None, :]
```
I haven't tested with other models, but I think models with a 2D mask input should have a similar problem.
Possible to check out one other? And also run the
Thanks. From a quick scan, most models seem to handle the mask shape correctly in their own implementations. So I’ve limited the modification to QwenImage only.
Should I run any test cases?
Thanks! Maybe we could add a similar test to
? I will give you a ping once it's refactored to follow the latest pattern.
Could we try with "native_cudnn"?
How to enable that?
Sorry disregard my suggestion on using the CUDNN backend.
Yes, native attention x Ulysses works fine for single-prompt input. Currently, batched inputs have a problem.
Is it the case just for Qwen, or does the same happen for Flux as well? Also, the test under consideration -- does it not use a single prompt?
> Is it the case just for Qwen or the same happens for Flux, as well?
So far, I have only found that Qwen has this problem. Other models, such as Z-Image and HunyuanImage, expand the attention mask in a similar way before entering the attention block. For Flux, I tested with the main branch, and it works fine with both CP and batched inputs.
> Also, the test under consideration -- does it not use a single prompt?
I am wondering whether we should add a batched-input test if possible. For now, I think we should first ensure that all existing unit tests pass without modifying them.
Some background on this bug: we are working on a training engine based on the Diffusers backend, using QwenImage as the first example. We therefore need batched inputs (for high throughput) combined with Ulysses SP support, which is how we ran into this bug during the forward pass.
Agreed and thanks so much for the context!
> I am wondering whether we should add a batch input test if possible. At the beginning, I think we should first ensure that all unit tests pass without modifying them.
Would you like to take a crack at this? We'll be quick to review.
I think we first need to ensure that the test_context_parallel_inference() test is xfailed when ring attention is enabled with SDPA. #13182 adds a test suite for CP backends and attention backends.
And then a test for batched inputs.
Then let's revisit this PR?
WDYT?
Sure, no problem. I will add a unit test for batched inputs.
@naykun if you want to take a look
```python
batch_size, image_seq_len = hidden_states.shape[:2]
image_mask = torch.ones((batch_size, image_seq_len), dtype=torch.bool, device=hidden_states.device)
joint_attention_mask = torch.cat([encoder_hidden_states_mask, image_mask], dim=1)
joint_attention_mask = joint_attention_mask[:, None, None, :]
```
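As a standalone sketch of the shapes involved in the diff above (the sequence lengths here are illustrative, not taken from the model): the text mask and an all-ones image mask are concatenated along the sequence dimension, then given singleton head and query dimensions so SDPA can broadcast over them.

```python
import torch

# Illustrative sizes: 2 prompts, 10 text tokens, 64 image tokens.
batch_size, text_seq_len, image_seq_len = 2, 10, 64
hidden_states = torch.randn(batch_size, image_seq_len, 128)
encoder_hidden_states_mask = torch.ones(batch_size, text_seq_len, dtype=torch.bool)

# All image tokens are valid, so the image mask is all ones.
image_mask = torch.ones((batch_size, image_seq_len), dtype=torch.bool, device=hidden_states.device)
# Concatenate text and image masks along the sequence dimension...
joint_attention_mask = torch.cat([encoder_hidden_states_mask, image_mask], dim=1)
# ...then add singleton head/query dims: [B, S] -> [B, 1, 1, S].
joint_attention_mask = joint_attention_mask[:, None, None, :]
print(joint_attention_mask.shape)  # torch.Size([2, 1, 1, 74])
```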
Yes. The image is the same with and without this change.
What does this PR do?
Fixes issue #13277.
QwenImagePipeline cannot run with Ulysses SP together with batched prompt inputs, because the attention mask is not correctly broadcast.
We need to broadcast the attention mask from [B, S] to [B, H, S_q, S_kv], or simply [B, 1, 1, S_kv], before feeding it into SDPA.
Before Fix:
Running the code snippet mentioned in the issue raises an error.
After Fix:
The images are correctly produced.


Who can review?
@sayakpaul
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.