Feat: flux2 dev support #9234
Open
Pfannkuchensack wants to merge 6 commits into
Adds end-to-end support for FLUX.2 [dev] alongside the existing Klein implementation. Dev uses Mistral Small 3.1 (24B) as its sole text encoder instead of Klein's Qwen3, with joint_attention_dim=15360 and the guidance-distilled 32B transformer.
Backend
- taxonomy: Flux2VariantType.Dev, ModelType.MistralEncoder, ModelFormat.MistralEncoder, MistralVariantType
- configs: probe dev via context_in_dim=15360 (main + LoRA); new mistral_encoder.py with Diffusers / Checkpoint / GGUF configs; Main_Diffusers_Flux2_Config accepts Flux2Pipeline class name
- loaders: new mistral_encoder.py (AutoModel for Diffusers folder, MistralModel for single-file + GGUF with llama.cpp key conversion). Existing Klein transformer loaders are generic enough for dev
- ModelRecordChanges.variant union extended with MistralVariantType
Invocations
- flux2_dev_model_loader, flux2_dev_text_encoder (Mistral chat-template with FLUX2_DEV_SYSTEM_MESSAGE and layer-stacking 10/20/30), flux2_dev_lora_loader (+ collection variant)
- MistralEncoderField on model.py; flux2_denoise / flux2_vae_decode / flux2_vae_encode reused unchanged (already model-agnostic)
Frontend
- types/hooks/selectors for MistralEncoder, isFlux2DevMainModelConfig, selectFlux2DevDiffusersModels, useMistralEncoderModels
- params slice fields flux2DevVaeModel / flux2DevMistralEncoderModel / flux2DevSourceModel + reducers, selectIsFlux2Dev / selectIsFlux2Klein
- ParamFlux2DevModelSelect component, wired into AdvancedSettingsAccordion
- buildFLUXGraph dev branch with full txt2img / img2img / inpaint / outpaint + multi-reference image editing (same flux_kontext + collect chain as Klein, since Flux2RefImageExtension is model-agnostic)
- addFlux2DevLoRAs helper for dev LoRA wiring
- zModelType / zModelFormat / zFlux2VariantType extended for mistral_encoder / mistral_small_3_1 / dev
- OpenAPI schema regenerated, TS types updated
Starter models
- FLUX.2 [dev] Diffusers (bf16 + NF4), three GGUFs (Q4/Q6/Q8), Mistral encoder (bf16 + NF4)
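The 10/20/30 layer stacking is what turns Mistral's 5120-wide hidden states into the 15360-wide `joint_attention_dim` input (3 × 5120). A minimal sketch in plain Python, assuming `hidden_states` follows the Hugging Face `output_hidden_states` convention (index 0 is the embedding output, index i the output of decoder layer i); real code works on `(B, seq, hidden)` tensors, and `stack_hidden_states` is a hypothetical name, not the invocation's API.

```python
from typing import Sequence

# Layer indices used for FLUX.2 [dev] text conditioning, per the commit message.
STACK_LAYERS = (10, 20, 30)


def stack_hidden_states(
    hidden_states: Sequence[Sequence[Sequence[float]]],
    layers: Sequence[int] = STACK_LAYERS,
) -> list[list[float]]:
    """Concatenate per-token features from the selected layers.

    hidden_states[i] is a (seq, hidden) nested list; index 0 is assumed to be
    the embedding output, so index i is the output of decoder layer i.
    Returns a (seq, len(layers) * hidden) nested list.
    """
    seq_len = len(hidden_states[layers[0]])
    stacked: list[list[float]] = []
    for t in range(seq_len):
        row: list[float] = []
        for layer in layers:
            # Feature-dimension concatenation, token by token.
            row.extend(hidden_states[layer][t])
        stacked.append(row)
    return stacked


# Three stacked 5120-wide layers give the transformer's joint_attention_dim.
assert len(STACK_LAYERS) * 5120 == 15360
```

With a 30-layer encoder, indices 10, 20 and 30 land at one third, two thirds and the final layer.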
Follow-up fixes after first end-to-end run with FLUX.2 [dev] GGUF +
Mistral 3.x GGUF + standalone FLUX.2 VAE.
Frontend
- buildFLUXGraph: wire dev model loader's vae into both flux2_denoise
(required for BN statistics / inpaint) and flux2_vae_decode; missing
edge was raising RequiredConnectionException at runtime
- readiness.ts: variant-aware FLUX.2 readiness check — dev requires
flux2DevVaeModel + flux2DevMistralEncoderModel (or a Dev diffusers
source); Klein keeps Qwen3/VAE check. Threads
hasFlux2DevDiffusersSource through generate + canvas tabs and updates
buildGenerateTabArg / buildCanvasTabArg test helpers
- en.json: noFlux2DevVaeModelSelected, noFlux2DevMistralEncoderModelSelected
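The variant-aware readiness rule can be paraphrased as a small decision function. This Python sketch is not the actual `readiness.ts` code: it assumes a Dev Diffusers source satisfies both the VAE and the encoder requirement, and the Klein reason keys are hypothetical placeholders (only the two Dev keys come from `en.json` above).

```python
from typing import Optional


def flux2_blocking_reasons(
    variant: str,
    vae_model: Optional[str],
    mistral_encoder_model: Optional[str],
    qwen3_encoder_model: Optional[str],
    has_flux2_dev_diffusers_source: bool,
) -> list[str]:
    """Return the reasons generation is blocked (an empty list means ready)."""
    reasons: list[str] = []
    if variant == "dev":
        # Assumption: a Dev Diffusers source can supply the VAE and encoder itself.
        if has_flux2_dev_diffusers_source:
            return reasons
        if vae_model is None:
            reasons.append("noFlux2DevVaeModelSelected")
        if mistral_encoder_model is None:
            reasons.append("noFlux2DevMistralEncoderModelSelected")
    else:
        # Klein keeps its Qwen3 / VAE check; these key names are hypothetical.
        if vae_model is None:
            reasons.append("noKleinVaeModelSelected")
        if qwen3_encoder_model is None:
            reasons.append("noKleinQwen3EncoderModelSelected")
    return reasons
```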
Mistral encoder loader (GGUF / single-file)
- Fix "Cannot copy out of meta tensor": llama.cpp conversion produced
`model.*` keys but loader instantiated bare MistralModel (no `model.`
prefix). Add _convert_for_bare_mistral_model to strip the prefix and
drop lm_head before load_state_dict
- _materialize_remaining_meta_tensors: after load_state_dict, replace any
still-meta parameters (norms→ones, others→zeros) and buffers so the
cache→VRAM move can't fail on partial state dicts, with a warning
listing what was missing
- llama.cpp converter: map attn_q_norm/attn_k_norm (Mistral 3.x qk-norm
variants), ordered before attn_q/attn_k so the shorter patterns can't rewrite them first
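The key-conversion fixes above can be sketched with ordered prefix rewrites plus a final pass for the bare `MistralModel`. The llama.cpp → Hugging Face name table is an assumption based on common GGUF naming (the `q_norm` / `k_norm` targets especially), and both function names are hypothetical stand-ins rather than the PR's converter.

```python
# Ordered (llama.cpp name, HF name) pairs, matched by prefix. The QK-norm
# entries must come first: "attn_q_norm.weight" also starts with "attn_q",
# so a later position would rewrite it to a bogus "self_attn.q_proj_norm".
BLOCK_RULES = [
    ("attn_q_norm", "self_attn.q_norm"),  # hypothetical target name
    ("attn_k_norm", "self_attn.k_norm"),  # hypothetical target name
    ("attn_q", "self_attn.q_proj"),
    ("attn_k", "self_attn.k_proj"),
    ("attn_v", "self_attn.v_proj"),
    ("attn_output", "self_attn.o_proj"),
    ("attn_norm", "input_layernorm"),
    ("ffn_norm", "post_attention_layernorm"),
    ("ffn_gate", "mlp.gate_proj"),
    ("ffn_up", "mlp.up_proj"),
    ("ffn_down", "mlp.down_proj"),
]
ROOT_RULES = {
    "token_embd": "model.embed_tokens",
    "output_norm": "model.norm",
    "output": "lm_head",
}


def llama_cpp_to_hf(key: str) -> str:
    """Map one llama.cpp GGUF tensor name to a MistralForCausalLM-style name."""
    if key.startswith("blk."):
        _, layer, rest = key.split(".", 2)  # "blk", "3", "attn_q_norm.weight"
        for src, dst in BLOCK_RULES:
            if rest.startswith(src):  # first match wins, so order matters
                return f"model.layers.{layer}.{dst}{rest[len(src):]}"
        raise KeyError(f"unmapped block key: {key}")
    root, _, suffix = key.partition(".")
    if root in ROOT_RULES:
        return f"{ROOT_RULES[root]}.{suffix}"
    raise KeyError(f"unmapped root key: {key}")


def convert_for_bare_mistral_model(state_dict: dict) -> dict:
    """Strip the 'model.' prefix and drop lm_head to match a bare MistralModel."""
    out = {}
    for key, value in state_dict.items():
        if key.startswith("lm_head."):
            continue  # a bare MistralModel has no LM head
        out[key.removeprefix("model.")] = value
    return out
```

Matching on `src + "."` instead of a bare prefix would remove the ordering dependency altogether; the sketch keeps the prefix match to show why the QK-norm entries must come first.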
Tokenizer / processor fallback
- _load_processor_with_offline_fallback walks a list of sources
(black-forest-labs/FLUX.2-dev tokenizer subfolder, then
mistralai/Mistral-Small-3.1-… and 3.2-…), trying AutoProcessor then
AutoTokenizer for each, cache-first then online. Final error spells
out the three workarounds (install Diffusers folder, set HF_ENDPOINT,
pre-cache the tokenizer)
- flux2_dev_text_encoder: try multimodal `[{type, text}]` chat template
first (PixtralProcessor / Mistral3Processor), fall back to plain
string content (AutoTokenizer), then to manual [INST]…[/INST]
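The source-walking fallback can be sketched as a nested loop that returns the first loader that succeeds and otherwise raises one error carrying every failure plus the workarounds. The loader callables are stand-ins; real ones would wrap `AutoProcessor.from_pretrained` / `AutoTokenizer.from_pretrained` (cache-first, then online), and `load_with_fallback` is a hypothetical name.

```python
from typing import Any, Callable, Sequence

# The three workarounds named in the final error message.
WORKAROUNDS = (
    "install the FLUX.2 [dev] Diffusers folder",
    "set HF_ENDPOINT",
    "pre-cache the tokenizer",
)


def load_with_fallback(
    sources: Sequence[str],
    loaders: Sequence[Callable[[str], Any]],
) -> Any:
    """Try every loader against every source, in order; return the first success."""
    errors: list[str] = []
    for source in sources:
        for loader in loaders:
            try:
                return loader(source)
            except Exception as exc:  # keep going, but remember why it failed
                name = getattr(loader, "__name__", repr(loader))
                errors.append(f"{source} via {name}: {exc}")
    raise RuntimeError(
        "Could not load a Mistral processor/tokenizer.\n"
        + "\n".join(errors)
        + "\nWorkarounds: " + "; ".join(WORKAROUNDS)
    )
```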
Qwen3 encoder probe strictness
- _get_qwen3_variant_from_state_dict and _get_variant_from_config now
return None / raise NotAMatchError for unknown hidden_size instead of
silently defaulting to qwen3_4b. The old fallback meant any llama.cpp
GGUF causal LM (Mistral, Llama, …) was wrongly classified as Qwen3 —
visible when the Mistral 3.x GGUF was identified as a Qwen3-4B encoder
- Checkpoint / GGUF / Diffusers loaders propagate the strictness
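The strictness change amounts to swapping a defaulting lookup for one that reports "no match". A sketch, assuming the variant is keyed on `hidden_size` alone; 2560 and 4096 are the published hidden sizes of Qwen3-4B and Qwen3-8B, but the `qwen3_8b` name and the function are illustrative assumptions.

```python
from typing import Optional

# hidden_size -> variant name. Only "qwen3_4b" appears in the PR text;
# the "qwen3_8b" entry is an illustrative assumption.
QWEN3_VARIANTS_BY_HIDDEN_SIZE = {
    2560: "qwen3_4b",
    4096: "qwen3_8b",
}


def qwen3_variant_from_hidden_size(hidden_size: int) -> Optional[str]:
    # Old behaviour (the bug): .get(hidden_size, "qwen3_4b") silently turned any
    # unknown causal LM (e.g. a Mistral with hidden_size 5120) into Qwen3-4B.
    # New behaviour: unknown sizes return None, which the caller treats as no match.
    return QWEN3_VARIANTS_BY_HIDDEN_SIZE.get(hidden_size)
```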
…andlers
Upstream Mistral Small 3.1/3.2 (40 layers) produces off-distribution embeddings under FLUX.2's static (10, 20, 30) hidden-state extraction. The joint attention was actually trained against BFL's 30-layer cow-mistral3-small distillation; both Comfy-Org's safetensors and gguf-org's cow GGUFs ship the same 30-layer weights, just packaged differently.
- Probing (configs/mistral_encoder.py) now rejects non-cow Mistrals across all three formats (Diffusers / Checkpoint / GGUF) with a clear error.
- Loader (load/model_loaders/mistral_encoder.py) extracts the embedded Tekken tokenizer from the `tekken_model` U8 (safetensors) / fp16-per-byte (cow GGUF) tensor via mistral_common, falling back to the BFL HF tokenizer. Removes the INVOKEAI_MISTRAL_TOKENIZER_SOURCE env var.
- Starter models: drop upstream Mistral 3.x entries, add Comfy-Org bf16/fp8/fp4 variants alongside the cow GGUFs.
- MistralVariantType: drop Small3_1, keep only Cow.
- pyproject.toml: add mistral-common dependency.
Frontend recall:
- Add Flux2DevVAEModel + Flux2DevMistralEncoderModel handlers, disambiguating Klein vs dev via presence of `mistral_encoder` / `qwen3_encoder` metadata fields (both bases are `flux2`).
- Wire both into the Recall Parameters panel (hardcoded list was missing them).
- Add `metadata.mistralEncoder` i18n key + colocated tests.
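The cow-only probe described in this commit reduces to a geometry test on the encoder config. A sketch, assuming the config is available as a plain mapping (e.g. a parsed `config.json` or GGUF metadata); `NotAMatchError` is a local stand-in for the probe exception of the same name, and `check_cow_mistral` is hypothetical. The last commit on this branch relaxes the check to also accept 40-layer Mistrals with a warning.

```python
from typing import Any, Mapping

# Geometry of BFL's cow-mistral3-small distillation, per the PR.
COW_HIDDEN_SIZE = 5120
COW_NUM_LAYERS = 30


class NotAMatchError(Exception):
    """Local stand-in for the probe's 'config does not match' error."""


def check_cow_mistral(config: Mapping[str, Any]) -> None:
    """Raise NotAMatchError unless the config has the cow geometry."""
    hidden = config.get("hidden_size")
    layers = config.get("num_hidden_layers")
    if hidden != COW_HIDDEN_SIZE or layers != COW_NUM_LAYERS:
        # Name the expected geometry so the user knows what to install instead.
        raise NotAMatchError(
            f"expected hidden_size={COW_HIDDEN_SIZE}, "
            f"num_hidden_layers={COW_NUM_LAYERS}; "
            f"got hidden_size={hidden}, num_hidden_layers={layers}"
        )
```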
…encoders
After studying ComfyUI's `Flux2Tokenizer` / `Mistral3_24BModel` reference
implementation, align the FLUX.2 [dev] text-encoder path with their setup:
- Probing now accepts both 30-layer (cow distillation) and 40-layer (Mistral
Small 3, BFL canonical / upstream) Mistrals. Re-adds `MistralVariantType.Mistral24B`
alongside `Cow`. All three configs (Diffusers / Checkpoint / GGUF) updated.
- Loaders strip `model.norm` (replace with Identity) when the loaded weights
are the 30-layer cow distillation. Matches Comfy's `final_norm=False` for
the pruned variant; for transformers' `MistralModel` the final RMSNorm is
always built but the cow was trained against the raw post-layer-29 state.
- 40-layer loads now log a clear warning that upstream Mistral 3.1 / 3.2 is
NOT what FLUX.2's joint attention was trained against and recommends the
Comfy-Org bf16/fp8/fp4 or gguf-org cow GGUF variants. BFL's canonical
bundled text_encoder is also 40-layer so we don't hard-reject; the warning
is advisory only.
- Text encoder invocation switches from `apply_chat_template(messages, ...)`
to a raw text template `[SYSTEM_PROMPT]{sys}[/SYSTEM_PROMPT][INST]{prompt}[/INST]`
fed straight to the tokenizer — byte-for-byte matches Comfy's
`Flux2Tokenizer.llama_template.format(text)`. The system prompt now includes
the literal `\n` that Comfy ships between "object" and "attribution".
- `_TekkenChatTemplateAdapter` renamed to `_TekkenRawTextAdapter` and exposes
a `__call__(text, padding_side='left', ...)` interface that Tekken-encodes
the raw string (BOS=1, no EOS) and left-pads with token id 11. Matches
Comfy's `pad_left=True` / `pad_token=11` settings.
Frontend types extended for the new `mistral3_24b` variant
(zMistralVariantType, MODEL_VARIANT_TO_LONG_NAME, schema.ts).
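The raw-text path can be sketched independently of Tekken: format the template, prepend BOS, and left-pad. The template string, BOS=1, pad id 11 and the absence of EOS come from the commit message above; the toy `encode` callable stands in for the Tekken tokenizer, and `max_length`, right-side truncation and the attention-mask layout are assumptions.

```python
from typing import Callable, Sequence

TEMPLATE = "[SYSTEM_PROMPT]{sys}[/SYSTEM_PROMPT][INST]{prompt}[/INST]"
BOS_ID = 1
PAD_ID = 11


def encode_left_padded(
    prompt: str,
    system_message: str,
    encode: Callable[[str], Sequence[int]],
    max_length: int,
) -> tuple[list[int], list[int]]:
    """Return (input_ids, attention_mask), left-padded to max_length.

    BOS is prepended and no EOS is appended. Over-long inputs are truncated
    on the right here, which is an assumption rather than PR behaviour.
    """
    text = TEMPLATE.format(sys=system_message, prompt=prompt)
    ids = [BOS_ID, *encode(text)][:max_length]
    pad = max_length - len(ids)
    input_ids = [PAD_ID] * pad + ids          # padding goes on the left
    attention_mask = [0] * pad + [1] * len(ids)
    return input_ids, attention_mask
```

Left padding keeps the last real token at the end of the sequence, which is the layout Comfy's `pad_left=True` setting describes.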
Summary
Adds end-to-end support for FLUX.2 [dev] alongside the existing FLUX.2 Klein implementation. Dev is a 32B guidance-distilled rectified flow transformer that uses the BFL "cow-mistral3-small" 30-layer Mistral distillation as its sole text encoder (Mistral Small 3 family, hidden 5120, joint_attention_dim 15360). It shares the 32-channel AutoencoderKLFlux2VAE and the 4D-RoPE sampling backend with Klein, so most of the existing infrastructure is reused; only Mistral-specific loaders, configs, and graph wiring are new.
A key finding during this work: only the 30-layer cow distillation works. Upstream Mistral Small 3.1 / 3.2 (40 layers) produces off-distribution embeddings under FLUX.2's static (10, 20, 30) hidden-state extraction, because the joint attention was trained against the 30-layer model. The probing layer now rejects 40-layer Mistrals outright with a clear error.
Backend — taxonomy & probing
- `Flux2VariantType.Dev`, `ModelType.MistralEncoder`, `ModelFormat.MistralEncoder`, single-variant `MistralVariantType.Cow`. Added to `AnyVariant` and `variant_type_adapter`. `ModelRecordChanges.variant` union extended.
- Dev is probed via `context_in_dim = 15360` (main) / `vec_in_dim = 5120` / `hidden_size = 6144` (LoRA, all formats). Existing Klein and FLUX.1 probes preserved. `Main_Diffusers_Flux2_Config` also accepts `Flux2Pipeline` / `Flux2Transformer2DModel`.
- `configs/mistral_encoder.py`: Diffusers / single-file / GGUF configs under `BaseModelType.Any` / `ModelType.MistralEncoder`. All three formats reject non-cow Mistrals (`hidden_size != 5120` or `num_hidden_layers != 30`) with a `NotAMatchError` that names the expected geometry. The Diffusers folder probe excludes full pipelines (those match `Main_Diffusers_Flux2_Config`).
Backend — Mistral loaders
New `load/model_loaders/mistral_encoder.py` with three loaders (Diffusers via `AutoModel`, single-file via `MistralModel`, GGUF via `MistralModel` + llama.cpp key conversion). Includes:
- `_convert_for_bare_mistral_model`: strips the `model.` prefix and drops `lm_head` so a `MistralForCausalLM` state dict loads cleanly into a bare `MistralModel`.
- `_drop_quantization_metadata`: dequantizes Comfy-Org's FP8/FP4 weights in place using the per-layer `*.weight_scale` (and optional `*.input_scale`) tensors and strips the `_quantization_metadata` / `scaled_fp8` markers before `load_state_dict`.
- `_materialize_remaining_meta_tensors`: replaces any params/buffers still on the meta device after `load_state_dict` (norms → ones, others → zeros) with a warning listing what was missing, so the model cache → VRAM move can't fail on partial state dicts.
- The llama.cpp key converter maps `attn_q_norm` / `attn_k_norm` (Mistral 3.x QK-norm), `attn_q/k/v/output`, `attn_norm`, `ffn_*`, plus the root `token_embd` / `output_norm` / `output` keys.
- The GGUF loader reads `llama.rope.freq_base` (1e9) and `llama.context_length` (131072) from the header so the rebuilt `MistralConfig` matches the trained-against RoPE.
Backend — embedded Tekken tokenizer
Both Comfy-Org's single-file safetensors and gguf-org's cow GGUFs ship the canonical Mistral Tekken tokenizer JSON embedded as a tensor named `tekken_model`: Comfy as a U8 blob, cow GGUFs as one fp16 value per byte. New `_extract_tekken_bytes()` reads either layout, `_TekkenChatTemplateAdapter` wraps `mistral_common.MistralTokenizer` to expose the HF `apply_chat_template(tokenize=True, return_tensors='pt', padding='max_length', …)` surface the invocation uses, and `_load_tokenizer_for_model()` walks: embedded Tekken → sibling Diffusers `tokenizer/` → `black-forest-labs/FLUX.2-dev:tokenizer` HF fallback. Adds `mistral-common` as a dependency.
This makes the Mistral encoder self-contained for the canonical Comfy / cow distributions: no HF tokenizer fetch needed.
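Both `tekken_model` layouts can be decoded with the standard library. A sketch, assuming the tensor payload is already available as raw little-endian bytes together with its dtype string (`U8`: one byte per element; `F16`: one half-precision value per byte, each holding 0..255); the function names are hypothetical, and real code would read the tensor with a safetensors / GGUF reader before handing the JSON to `mistral_common`.

```python
import json
import struct


def extract_tekken_bytes(raw: bytes, dtype: str) -> bytes:
    """Recover the Tekken tokenizer JSON bytes from a tensor payload."""
    if dtype == "U8":
        # Comfy-Org layout: the tensor data is the byte string itself.
        return bytes(raw)
    if dtype == "F16":
        # cow GGUF layout: one little-endian float16 per original byte.
        count = len(raw) // 2
        values = struct.unpack(f"<{count}e", raw[: count * 2])
        out = bytearray()
        for v in values:
            b = int(v)
            if b != v or not 0 <= b <= 255:
                raise ValueError(f"value {v} is not a byte")
            out.append(b)
        return bytes(out)
    raise ValueError(f"unsupported tekken_model dtype: {dtype}")


def load_tekken_json(raw: bytes, dtype: str) -> dict:
    """Decode the recovered bytes as UTF-8 JSON."""
    return json.loads(extract_tekken_bytes(raw, dtype).decode("utf-8"))
```

float16 represents every integer from 0 to 2048 exactly, so storing one byte per fp16 value is lossless, just twice the size of the U8 form.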
Backend — invocations
- `flux2_dev_model_loader`, `flux2_dev_text_encoder`, `flux2_dev_lora_loader` (+ collection variant). `MistralEncoderField` added to `model.py`.
- `flux2_dev_text_encoder` formats the prompt with `FLUX2_DEV_SYSTEM_MESSAGE` (following the `_get_mistral_3_small_prompt_embeds` reference) and stacks hidden states from layers (10, 20, 30) → `(B, seq, 15360)`, matching the transformer's `joint_attention_dim`. For the 30-layer cow those are exactly (1/3, 2/3, last). It tries multimodal `[{type, text}]` content first (PixtralProcessor / Mistral3Processor / Tekken adapter) and falls back to plain-string content, then to manual `[INST]…[/INST]` formatting.
- `flux2_denoise` / `flux2_vae_decode` / `flux2_vae_encode` reused unchanged; they are already model-agnostic.
Qwen3 probe strictness (bugfix)
`_get_qwen3_variant_from_state_dict` / `_get_variant_from_config` now return `None` / raise `NotAMatchError` for unknown `hidden_size` instead of silently defaulting to `qwen3_4b`. The old fallback meant any llama.cpp GGUF causal LM (Mistral, Llama, …) was misclassified as Qwen3; this was caught when a Mistral 3.x GGUF was identified as `qwen3_4b`.
Frontend
- `MistralEncoderModelConfig` type + `isMistralEncoderModelConfig`, `isFlux2DevMainModelConfig`, `isFlux2DevDiffusersMainModelConfig` guards.
- `selectMistralEncoderModels`, `selectFlux2DevDiffusersModels`, `useMistralEncoderModels`, `useFlux2DevDiffusersModels` hooks/selectors.
- `paramsSlice`: `flux2DevVaeModel` / `flux2DevMistralEncoderModel` / `flux2DevSourceModel` fields + reducers + `selectIsFlux2Dev` / `selectIsFlux2Klein` selectors.
- `ParamFlux2DevModelSelect` component (VAE + Mistral Encoder dropdowns), wired into `AdvancedSettingsAccordion` (Dev shows the Mistral selector, Klein keeps the Qwen3 selector).
- `buildFLUXGraph`: dev branch for `flux2_dev_model_loader` / `flux2_dev_text_encoder` / shared `flux2_denoise` with full txt2img / img2img / inpaint / outpaint support, plus multi-reference image editing via the `flux_kontext` collect chain (`Flux2RefImageExtension` is model-agnostic). The Dev model loader's `vae` is wired into both `flux2_denoise` and `flux2_vae_decode`.
- `addFlux2DevLoRAs` helper, which wires LoRAs through `flux2_dev_lora_collection_loader`.
- `readiness.ts`: variant-aware FLUX.2 readiness; dev requires `flux2DevVaeModel` + `flux2DevMistralEncoderModel` (or a Dev Diffusers source), Klein keeps the Qwen3/VAE check. `hasFlux2DevDiffusersSource` is threaded through both the generate and canvas tabs.
- `zModelType` / `zModelFormat` / `zFlux2VariantType` extended for `mistral_encoder` / `dev`. The `zMistralVariantType` enum carries only `cow_mistral3_small`. Display-name maps updated.
- `en.json`: `noFlux2DevVaeModelSelected`, `noFlux2DevMistralEncoderModelSelected`, `metadata.mistralEncoder`.
Frontend — metadata recall
- `Flux2DevVAEModel` + `Flux2DevMistralEncoderModel` handlers in `parsing.tsx`. Both gate on `base === 'flux2'` and use the presence of `metadata.mistral_encoder` (dev only) vs `metadata.qwen3_encoder` (Klein only) to disambiguate, dispatching into `flux2DevVaeModelSelected` / `flux2DevMistralEncoderModelSelected` instead of the Klein slices.
- `KleinVAEModel` updated to also reject when `mistral_encoder` is present in the metadata, so it no longer swallows Dev images into the Klein VAE slice.
- Both new handlers wired into `ImageMetadataActions.tsx` (the panel iterates a hardcoded handler list; Klein VAE / Encoder were the only FLUX.2 entries before). Remix Image already iterates `Object.values(ImageMetadataHandlers)` automatically.
Starter models (cow-only Mistral encoders)
- `black-forest-labs/FLUX.2-dev` Diffusers (~80 GB, Non-Commercial)
- `diffusers/FLUX.2-dev-bnb-4bit` Diffusers (NF4)
- `gguf-org/flux2-dev-gguf` Q3_K_M / Q4_K_M / Q5_K_M / Q6_K / Q8_0 transformer-only entries depending on the FLUX.2 VAE + a cow Mistral encoder
Related Issues / Discussions
None yet. The upstream `feature/flux2-noncommercial-license` work is unrelated but worth coordinating with: FLUX.2 [dev] inherits the BFL Non-Commercial License and is flagged in `isNonCommercialMainModelConfig`.
QA Instructions
Backend probing (no model load required, takes seconds):
Manual probe against any local FLUX.2 [dev] artifacts (folder, transformer subfolder, text_encoder subfolder, GGUF, single-file VAE, cow Mistral). All should classify with the matching dev / mistral_encoder configs and existing FLUX.2 Klein fixtures should still classify identically. Probing a 40-layer Mistral Small 3.1 / 3.2 GGUF or safetensors should be rejected with a clear "expected hidden_size=5120, num_hidden_layers=30" error — this is intentional.
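For a quick manual check on a single-file checkpoint, shapes can be read from the safetensors header without loading any weights: the file starts with an 8-byte little-endian header length followed by a JSON table of `dtype` / `shape` / `data_offsets`. A sketch; the `context_embedder.weight` key is an assumption about Diffusers-style FLUX.2 transformer naming and will differ for other layouts, so this is a heuristic, not InvokeAI's probe.

```python
import json
import struct
from pathlib import Path

FLUX2_DEV_CONTEXT_DIM = 15360


def read_safetensors_shapes(path: Path) -> dict[str, list[int]]:
    """Return {tensor_name: shape} from a .safetensors header without reading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian u64 length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string map, not a tensor entry.
    return {name: info["shape"] for name, info in header.items() if name != "__metadata__"}


def looks_like_flux2_dev(
    shapes: dict[str, list[int]], key: str = "context_embedder.weight"
) -> bool:
    """Heuristic: the text-context projection takes 15360-wide inputs.

    The key name is an assumption. nn.Linear stores its weight as
    (out_features, in_features), so the input width is shape[-1].
    """
    shape = shapes.get(key)
    return shape is not None and shape[-1] == FLUX2_DEV_CONTEXT_DIM
```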
Frontend checks (from `invokeai/frontend/web`):
End-to-end with a real model:
- Transformer: one of the `gguf-org/flux2-dev-gguf` quants (Q4_K_M is a good balance at ~20 GB) or the full `black-forest-labs/FLUX.2-dev` Diffusers folder.
- VAE: `flux2-vae.safetensors` (or use the one bundled in the Diffusers source).
- Mistral encoder: `Comfy-Org/flux2-dev/split_files/text_encoders/mistral_3_small_flux2_fp8.safetensors` (recommended; 18 GB, embedded Tekken) or any `gguf-org/flux2-dev-gguf/cow-mistral3-small-q*.gguf` variant.
- Defaults (`MainModelDefaultSettings.from_base(Flux2, Dev)`): 28 steps, guidance 3.5, CFG 1.0, 1024×1024.
- The first run should log `Loaded embedded Tekken tokenizer from <filename>` (no HF roundtrip). Subsequent runs go straight to denoise.
Embedded-Tekken caveat: if you install an older cow GGUF that doesn't embed `tekken_model`, or a Diffusers folder without a sibling `tokenizer/`, the loader falls back to `black-forest-labs/FLUX.2-dev:tokenizer` on HF. With no internet and no HF cache, you'll see a clear error documenting three workarounds (install a Comfy-Org safetensors that bundles Tekken, populate the HF cache, or run `huggingface-cli download black-forest-labs/FLUX.2-dev --include 'tokenizer/*'`).
Merge Plan
- New params slice fields default to `null` and the slice version did not change. Existing user states keep working: the Dev-specific fields just stay `null` until the user picks a Dev model.
- Adds `mistral-common` to `pyproject.toml` dependencies (~6 MB + pycountry transitive ~8 MB). Required for the embedded Tekken tokenizer path; without it the loader still works via HF fallback.
- Verified with `flux2-dev-Q2_K.gguf` transformer + Comfy fp8 Mistral encoder + standalone FLUX.2 VAE: full prompt adherence confirmed on the canonical "origami lighthouse on cliff" test prompt. Lower-fidelity setups (base Mistral 3.x at any quant level) produce visibly degraded compositions, which is why those configs are now rejected at probe time.
- FLUX.2 [dev] is flagged as Non-Commercial via `isNonCommercialMainModelConfig`.
Checklist
- `feat(flux2): add FLUX.2 [dev] support (cow Mistral, embedded Tekken, recall)`
- `Flux2DevVAEModel` + `Flux2DevMistralEncoderModel` cases including Klein/dev cross-rejection
- `null`, slice version unchanged
- What's New copy (if doing a release after this PR)