
Conversation

@sayakpaul

What does this PR do?

This PR aims to simplify a few things being discussed in huggingface#12999.

However, I am seeing that in from_pretrained(), when use_flashpack=True, a lot of the code dealing with parallelism and quantization is being skipped, which is undesirable. Is that expected?

@devanshi00
Owner

devanshi00 commented Jan 27, 2026

FlashPack support in Diffusers is intentionally designed as a fast-path loader that assumes the checkpoint already represents the final, fully-materialized model state. When use_flashpack=True is specified, from_pretrained performs an early return immediately after loading tensors via load_flashpack, thereby bypassing the standard Diffusers loading pipeline.
This means that the following steps are explicitly skipped during FlashPack loading:

  • Quantizer initialization and preprocessing (hf_quantizer.preprocess_model)
  • Device map inference and dispatch (_determine_device_map, dispatch_model)
  • Accelerate-based sharding and offloading
  • Post-processing hooks associated with quantization or parallelism
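To make the early return concrete, here is a minimal sketch of the control flow (illustrative only, not the actual Diffusers source; `load_flashpack` is the loader discussed above, and the comments reference the real pipeline steps listed here):

```python
# Minimal, illustrative sketch of the fast path described above -- not
# the actual diffusers source. `load_flashpack` stands in for the real
# loader; the elided branch is where the standard pipeline lives.
def from_pretrained(cls, pretrained_model_name_or_path, use_flashpack=False, **kwargs):
    if use_flashpack:
        # Early return: tensors are loaded fully materialized, so the
        # quantizer preprocessing, device-map inference, dispatch, and
        # offloading steps below are never reached.
        return load_flashpack(cls, pretrained_model_name_or_path)

    # Standard path (elided): hf_quantizer.preprocess_model(...),
    # _determine_device_map(...), dispatch_model(...), and the
    # quantization/parallelism post-processing hooks.
    ...
```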

Supporting quantization or device placement during FlashPack loading would require a fundamental change to how FlashPack works. Instead of loading tensors directly into the model, FlashPack would first need to read only the tensor metadata (shape and dtype), decide how each tensor should be transformed or where it should live (CPU, GPU, quantized format), and only then materialize it.
This would turn FlashPack from a fast, straightforward save/load format into a much more complex system that understands and applies loading policies such as quantization and device mapping. Because that would add significant complexity and change FlashPack's original purpose, it is not supported in the current implementation.
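For illustration, a metadata-first loader of the kind described above might look like the following sketch. Everything here is hypothetical (the metadata format, the `plan_placement` and `materialize` helpers, and the device map are all invented for the example); it only shows the two-phase "plan, then materialize" shape that FlashPack would have to adopt:

```python
import torch

# Hypothetical two-phase loader: decide placement from tensor metadata
# (shape/dtype) first, and only then allocate. Names and formats here
# are made up for illustration; FlashPack has no such API today.

def plan_placement(meta, device_map):
    """Phase 1: map each tensor name to (shape, dtype, target device)
    without touching the tensor data itself."""
    plan = {}
    for name, (shape, dtype) in meta.items():
        device = "cpu"  # default when no device_map entry matches
        for prefix, dev in device_map.items():
            if name.startswith(prefix):
                device = dev
        plan[name] = (shape, dtype, device)
    return plan

def materialize(plan):
    """Phase 2: allocate tensors on their planned devices. A real loader
    would fill them from the file instead of using torch.empty."""
    return {
        name: torch.empty(shape, dtype=dtype, device=device)
        for name, (shape, dtype, device) in plan.items()
    }

# Made-up metadata for a single tensor; "cpu" keeps the example runnable
# anywhere (a real device_map would typically point at "cuda:0").
meta = {"unet.conv_in.weight": ((320, 4, 3, 3), torch.float16)}
state = materialize(plan_placement(meta, {"unet": "cpu"}))
```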

@sayakpaul
Author

Okay, then it really should have been clarified in huggingface#12999. Let's do that, and I will request thoughts from other maintainers.

@devanshi00
Owner

Yes, you’re absolutely right that this should have been clarified in huggingface#12999.
My sincere apologies for the delayed response; I was away for the past five days without access to my laptop. I’m available now and will follow up promptly.
