fix: compute image_seq_len from spatial dims in Lumina2 pipeline #13272
Open
gambletan wants to merge 1 commit into huggingface:main from
…na2 pipeline

Fixes huggingface#12913

`image_seq_len` was computed as `latents.shape[1]`, which gives the channel dimension (e.g. 16) since Lumina2 latents have shape `(batch, channels, height, width)` and are NOT packed/reshaped before this point. The Lumina2 transformer internally patchifies the latents with `patch_size=2`, so the correct spatial sequence length is `(H // patch_size) * (W // patch_size)`.

This incorrect value was passed to `calculate_shift()`, which computes the `mu` parameter for the flow-matching scheduler. Using channel count instead of token count produces a completely wrong shift, degrading generation quality.

The fix reads `patch_size` from `self.transformer.config.patch_size` and computes `image_seq_len` from the last two (spatial) dimensions of the latents tensor, matching how the transformer itself computes its input sequence length.

For reference, the Flux pipeline correctly uses `latents.shape[1]` because Flux latents are pre-packed into `(batch, seq_len, channels)` before this computation. Lumina2 does not pre-pack, so the same indexing does not apply.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
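The size of the error can be checked with a small standalone sketch. It uses plain shape tuples instead of tensors and assumes a 1024x1024 image, an 8x VAE downsampling factor, 16 latent channels and `patch_size=2`. `buggy_image_seq_len` and `fixed_image_seq_len` are hypothetical helper names, and `calculate_shift` here is a local re-implementation of the linear interpolation with defaults assumed to match the Flux-style helper in diffusers; the real Lumina2 pipeline may pass different values from its scheduler config.

```python
# Sketch only: plain shape tuples stand in for tensors. Assumed values:
# 1024x1024 image, 8x VAE downsampling, 16 latent channels, patch_size=2.
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (batch, channels, height, width)


def buggy_image_seq_len(latent_shape: Shape) -> int:
    # Old behaviour: index 1 of an unpacked latent is the channel count.
    return latent_shape[1]


def fixed_image_seq_len(latent_shape: Shape, patch_size: int) -> int:
    # New behaviour: number of patch tokens over the two spatial dims.
    height, width = latent_shape[-2], latent_shape[-1]
    return (height // patch_size) * (width // patch_size)


def calculate_shift(
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> float:
    # Local re-implementation (assumed defaults): mu is linear in the
    # sequence length, passing through (base_seq_len, base_shift) and
    # (max_seq_len, max_shift).
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


latent_shape: Shape = (1, 16, 1024 // 8, 1024 // 8)  # (1, 16, 128, 128)

old_len = buggy_image_seq_len(latent_shape)
new_len = fixed_image_seq_len(latent_shape, patch_size=2)
print(old_len, round(calculate_shift(old_len), 3))  # 16 0.459
print(new_len, round(calculate_shift(new_len), 3))  # 4096 1.15
```

Under these assumptions the old code places `mu` near the bottom of the interpolation range (about 0.46, i.e. below even the 256-token base), while the intended value for a 1024x1024 image sits at the top of the range.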
Summary
Fixes #12913
- `image_seq_len = latents.shape[1]` takes the channel dimension (e.g. 16) instead of the spatial sequence length. Lumina2 latents have shape `(batch, channels, height, width)` and are NOT packed before this point.
- `image_seq_len` feeds into `calculate_shift()`, which computes `mu` for the flow-matching scheduler. Using the channel count (~16) instead of the token count (e.g. 4096 for 1024x1024 images) produces a completely wrong shift value, degrading generation quality.
- Compute `image_seq_len` as `(latents.shape[-2] // patch_size) * (latents.shape[-1] // patch_size)`, reading `patch_size` from `self.transformer.config.patch_size`. This matches how the Lumina2 transformer internally patchifies its input.

Why Flux uses `latents.shape[1]` but Lumina2 cannot

The Flux pipeline correctly uses `latents.shape[1]` because Flux latents are pre-packed into `(batch, seq_len, channels)` before `image_seq_len` is computed. Lumina2 does not pre-pack its latents (the transformer handles patchification internally), so `shape[1]` gives channels, not sequence length.

Changes
- `src/diffusers/pipelines/lumina2/pipeline_lumina2.py`: Replace `latents.shape[1]` with a spatial sequence length computation using `patch_size` from the transformer config.
- `tests/pipelines/lumina2/test_pipeline_lumina2.py`: Add a test verifying that `mu` is computed from the spatial dimensions (not the channel dimension), using dimensions where the channel count != spatial seq_len to catch regressions.
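The idea behind the regression test can be sketched without loading a model: pick latent dimensions where the channel count differs from the spatial token count, so the old indexing and the fix cannot accidentally agree. The sketch also shows why index 1 is valid for Flux, assuming Flux-style 2x2 packing into `(batch, (H // 2) * (W // 2), channels * 4)`. The helper names and dimensions are hypothetical and illustrative; this is not the code added to `test_pipeline_lumina2.py`.

```python
# Illustrative regression check (hypothetical helpers, not the PR's test code).
from typing import Tuple


def lumina2_image_seq_len(latent_shape: Tuple[int, ...], patch_size: int) -> int:
    # Unpacked (batch, channels, height, width): count patches over H and W.
    return (latent_shape[-2] // patch_size) * (latent_shape[-1] // patch_size)


def flux_style_packed_shape(latent_shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    # Assumed Flux-style packing: each 2x2 spatial patch becomes one token
    # whose feature size is channels * 4.
    batch, channels, height, width = latent_shape
    return (batch, (height // 2) * (width // 2), channels * 4)


def test_seq_len_uses_spatial_dims() -> None:
    # Small dims chosen so that channels (4) != spatial tokens (4 * 5 = 20).
    latent_shape = (1, 4, 8, 10)
    seq_len = lumina2_image_seq_len(latent_shape, patch_size=2)
    assert seq_len == 20
    assert seq_len != latent_shape[1]  # the old indexing would give 4
    # For a packed layout, index 1 already is the token count.
    assert flux_style_packed_shape(latent_shape)[1] == seq_len


test_seq_len_uses_spatial_dims()
print("ok")
```

Choosing a non-square latent (8x10) also catches a fix that accidentally uses the same spatial dimension twice.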