
Feat: flux2 dev support #9234

Open
Pfannkuchensack wants to merge 6 commits into invoke-ai:main from Pfannkuchensack:feature/flux2-dev-support

Conversation

@Pfannkuchensack
Collaborator

@Pfannkuchensack Pfannkuchensack commented May 25, 2026

Summary

Adds end-to-end support for FLUX.2 [dev] alongside the existing FLUX.2 Klein implementation. Dev is a 32B guidance-distilled rectified flow transformer that uses the BFL "cow-mistral3-small" 30-layer Mistral distillation as its sole text encoder (Mistral Small 3 family, hidden 5120, joint_attention_dim 15360). It shares the 32-channel AutoencoderKLFlux2 VAE and the 4D-RoPE sampling backend with Klein, so most of the existing infrastructure is reused — only Mistral-specific loaders, configs, and graph wiring are new.

A key finding during this work: only the 30-layer cow distillation works. Upstream Mistral Small 3.1 / 3.2 (40 layers) produces off-distribution embeddings under FLUX.2's static (10, 20, 30) hidden-state extraction, because the joint attention was trained against the 30-layer model. The probing layer now rejects 40-layer Mistrals outright with a clear error.

Backend — taxonomy & probing

  • Taxonomy: Flux2VariantType.Dev, ModelType.MistralEncoder, ModelFormat.MistralEncoder, single-variant MistralVariantType.Cow. Added to AnyVariant and variant_type_adapter. ModelRecordChanges.variant union extended.
  • Probing: Dev is detected by context_in_dim = 15360 (main) / vec_in_dim = 5120 / hidden_size = 6144 (LoRA, all formats). Existing Klein and FLUX.1 probes preserved. Main_Diffusers_Flux2_Config also accepts Flux2Pipeline / Flux2Transformer2DModel.
  • configs/mistral_encoder.py: Diffusers / single-file / GGUF configs under BaseModelType.Any / ModelType.MistralEncoder. All three formats reject non-cow Mistrals (hidden_size != 5120 or num_hidden_layers != 30) with a NotAMatchError that names the expected geometry. Diffusers folder probe excludes full pipelines (those match Main_Diffusers_Flux2_Config).
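
The geometry check described above can be sketched roughly as follows. This is a minimal illustration, not the code in configs/mistral_encoder.py: the `NotAMatchError` class here is a local stand-in, and `MistralGeometry` and `check_cow_geometry` are hypothetical names. Only the expected values (hidden_size 5120, 30 layers) come from this PR.

```python
from dataclasses import dataclass


class NotAMatchError(Exception):
    """Local stand-in for the probe error raised when a config does not match."""


@dataclass(frozen=True)
class MistralGeometry:
    hidden_size: int
    num_hidden_layers: int


# Expected geometry of the 30-layer cow distillation, as stated in this PR.
COW_GEOMETRY = MistralGeometry(hidden_size=5120, num_hidden_layers=30)


def check_cow_geometry(config: dict) -> MistralGeometry:
    """Return the geometry if it matches the cow distillation, else raise."""
    found = MistralGeometry(
        hidden_size=int(config.get("hidden_size", -1)),
        num_hidden_layers=int(config.get("num_hidden_layers", -1)),
    )
    if found != COW_GEOMETRY:
        # Name the expected geometry so the user can see why the probe failed.
        raise NotAMatchError(
            f"expected hidden_size={COW_GEOMETRY.hidden_size}, "
            f"num_hidden_layers={COW_GEOMETRY.num_hidden_layers}; "
            f"got hidden_size={found.hidden_size}, "
            f"num_hidden_layers={found.num_hidden_layers}"
        )
    return found
```

A 40-layer upstream Mistral config would raise here, while the cow config passes through unchanged.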

Backend — Mistral loaders
New load/model_loaders/mistral_encoder.py with three loaders (Diffusers via AutoModel, single-file via MistralModel, GGUF via MistralModel + llama.cpp key conversion). Includes:

  • _convert_for_bare_mistral_model: strips the model. prefix and drops lm_head so a MistralForCausalLM state dict loads cleanly into bare MistralModel.
  • _drop_quantization_metadata: dequantizes Comfy-Org's FP8/FP4 weights in place using the per-layer *.weight_scale (and optional *.input_scale) tensors and strips the _quantization_metadata / scaled_fp8 markers before load_state_dict.
  • _materialize_remaining_meta_tensors: replaces any params/buffers still on the meta device after load_state_dict (norms → ones, others → zeros) with a warning listing what was missing, so the model cache → VRAM move can't fail on partial state dicts.
  • llama.cpp converter handles attn_q_norm/attn_k_norm (Mistral 3.x QK-norm), attn_q/k/v/output, attn_norm, ffn_*, plus the root token_embd/output_norm/output keys.
  • GGUF metadata reader pulls llama.rope.freq_base (1e9) and llama.context_length (131072) from the header so the rebuilt MistralConfig matches the trained-against rope.
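
As an illustration of the prefix handling described for _convert_for_bare_mistral_model, here is a minimal sketch that works on a plain dict. It assumes the causal-LM keys start with `model.` and the head keys start with `lm_head.`; the function name and the exact key handling in the PR may differ.

```python
def convert_for_bare_mistral_model(state_dict: dict) -> dict:
    """Rewrite MistralForCausalLM-style keys so they fit a bare MistralModel.

    - keys under "lm_head." are dropped (the bare model has no LM head)
    - a leading "model." prefix is stripped from every other key
    """
    converted = {}
    for key, value in state_dict.items():
        if key.startswith("lm_head."):
            continue
        if key.startswith("model."):
            key = key[len("model."):]
        converted[key] = value
    return converted
```

Keys that already lack the prefix pass through untouched, so the function is safe to apply to a state dict that was saved from a bare model.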

Backend — embedded Tekken tokenizer
Both Comfy-Org's single-file safetensors and gguf-org's cow GGUFs ship the canonical Mistral Tekken tokenizer JSON embedded as a tensor named tekken_model: Comfy as a U8 blob, cow GGUFs as one fp16 value per byte. New _extract_tekken_bytes() reads either layout. _TekkenChatTemplateAdapter wraps mistral_common.MistralTokenizer to expose the HF apply_chat_template(tokenize=True, return_tensors='pt', padding='max_length', …) surface the invocation uses. _load_tokenizer_for_model() walks the sources in order: embedded Tekken → sibling diffusers tokenizer/ → black-forest-labs/FLUX.2-dev:tokenizer HF fallback. Adds mistral-common as a dependency.

This makes the Mistral encoder self-contained for the canonical Comfy / cow distributions — no HF tokenizer fetch needed.
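
The two embedded layouts can be decoded with the standard library alone. The sketch below is an assumption-laden illustration, not the PR's _extract_tekken_bytes(): it takes the tensor's raw little-endian payload plus a dtype string, whereas the real loader presumably works on tensor objects.

```python
import struct


def extract_tekken_bytes(raw: bytes, dtype: str) -> bytes:
    """Recover the tokenizer JSON bytes from an embedded tensor payload.

    "uint8":   the payload already is the JSON, one byte per element.
    "float16": each element is a half-precision float holding one byte
               value (0..255), so every 2-byte element maps to one byte.
    """
    if dtype == "uint8":
        return bytes(raw)
    if dtype == "float16":
        if len(raw) % 2:
            raise ValueError("float16 payload must have an even byte length")
        count = len(raw) // 2
        # "<e" is struct's little-endian IEEE 754 half-precision format.
        values = struct.unpack(f"<{count}e", raw)
        return bytes(int(v) for v in values)
    raise ValueError(f"unsupported dtype for embedded tokenizer: {dtype}")
```

Integers up to 2048 are exactly representable in fp16, so the one-value-per-byte layout round-trips without loss.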

Backend — invocations

  • flux2_dev_model_loader, flux2_dev_text_encoder, flux2_dev_lora_loader (+ collection variant). MistralEncoderField added to model.py.
  • Text encoder runs Mistral's chat template with a fixed system message (matches the diffusers _get_mistral_3_small_prompt_embeds reference) and stacks hidden states from layers (10, 20, 30) → (B, seq, 15360) matching the transformer's joint_attention_dim. For the 30-layer cow that's exactly (1/3, 2/3, last). Tries multimodal [{type, text}] content first (PixtralProcessor / Mistral3Processor / Tekken adapter) and falls back to plain-string content, then to manual [INST]…[/INST] formatting.
  • flux2_denoise / flux2_vae_decode / flux2_vae_encode reused unchanged — already model-agnostic.
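
The layer stacking amounts to concatenating three hidden states per token, so 3 × 5120 = 15360 features. The sketch below shows the shape arithmetic with plain Python lists for a single batch item. It assumes the HF convention that `hidden_states[0]` is the embedding output and `hidden_states[i]` is the output of layer `i`; the real invocation operates on tensors, and `stack_layers` is a hypothetical name.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one batch item: [seq][hidden]


def stack_layers(hidden_states: Sequence[Matrix], layers=(10, 20, 30)) -> Matrix:
    """Concatenate the selected layers' hidden states along the feature axis.

    Returns a [seq][len(layers) * hidden] matrix, with the features of each
    token ordered layer by layer (layer 10 first, then 20, then 30).
    """
    picked = [hidden_states[i] for i in layers]
    seq_len = len(picked[0])
    return [
        [x for layer in picked for x in layer[t]]
        for t in range(seq_len)
    ]
```

With hidden size 5120 each output row has 15360 entries, matching the transformer's joint_attention_dim.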

Qwen3 probe strictness (bugfix)

  • _get_qwen3_variant_from_state_dict / _get_variant_from_config now return None / raise NotAMatchError for unknown hidden_size instead of silently defaulting to qwen3_4b. The old fallback meant any llama.cpp GGUF causal LM (Mistral, Llama, …) was misclassified as Qwen3 — caught when a Mistral 3.x GGUF was identified as qwen3_4b.
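
The difference between the old and new behavior can be sketched as below. The hidden-size table is an assumption made for illustration (2560 and 4096 are the values I would expect for Qwen3 4B and 8B, and `qwen3_8b` is a hypothetical variant name); only the `qwen3_4b` default and the switch to returning None come from this PR.

```python
from typing import Optional

# Assumed mapping for illustration only; the real probe table is not shown here.
KNOWN_QWEN3_HIDDEN_SIZES = {2560: "qwen3_4b", 4096: "qwen3_8b"}


def qwen3_variant_lenient(hidden_size: int) -> str:
    """Old behavior: unknown sizes silently fall back to qwen3_4b."""
    return KNOWN_QWEN3_HIDDEN_SIZES.get(hidden_size, "qwen3_4b")


def qwen3_variant_strict(hidden_size: int) -> Optional[str]:
    """New behavior: unknown sizes return None so the caller can reject the model."""
    return KNOWN_QWEN3_HIDDEN_SIZES.get(hidden_size)
```

A Mistral GGUF with hidden_size 5120 is labeled `qwen3_4b` by the lenient version and returns None from the strict one, which lets the Mistral configs claim it instead.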

Frontend

  • New MistralEncoderModelConfig type + isMistralEncoderModelConfig, isFlux2DevMainModelConfig, isFlux2DevDiffusersMainModelConfig guards. selectMistralEncoderModels, selectFlux2DevDiffusersModels, useMistralEncoderModels, useFlux2DevDiffusersModels hooks/selectors.
  • paramsSlice: flux2DevVaeModel / flux2DevMistralEncoderModel / flux2DevSourceModel fields + reducers + selectIsFlux2Dev / selectIsFlux2Klein selectors.
  • ParamFlux2DevModelSelect component (VAE + Mistral Encoder dropdowns), wired into AdvancedSettingsAccordion (Dev shows the Mistral selector, Klein keeps the Qwen3 selector).
  • buildFLUXGraph: dev branch for flux2_dev_model_loader / flux2_dev_text_encoder / shared flux2_denoise with full txt2img / img2img / inpaint / outpaint support, plus multi-reference image editing via flux_kontext collect chain (Flux2RefImageExtension is model-agnostic). Dev model loader's vae is wired into both flux2_denoise and flux2_vae_decode.
  • New addFlux2DevLoRAs helper, wires LoRAs through flux2_dev_lora_collection_loader.
  • readiness.ts: variant-aware FLUX.2 readiness — dev requires flux2DevVaeModel + flux2DevMistralEncoderModel (or a Dev diffusers source), Klein keeps the Qwen3/VAE check. hasFlux2DevDiffusersSource threaded through both generate and canvas tabs.
  • zModelType / zModelFormat / zFlux2VariantType extended for mistral_encoder / dev. zMistralVariantType enum carries only cow_mistral3_small. Display-name maps updated.
  • New i18n keys: noFlux2DevVaeModelSelected, noFlux2DevMistralEncoderModelSelected, metadata.mistralEncoder.
  • OpenAPI schema regenerated; TS types up to date.

Frontend — metadata recall

  • New Flux2DevVAEModel + Flux2DevMistralEncoderModel handlers in parsing.tsx. Both gate on base === 'flux2' and use the presence of metadata.mistral_encoder (dev only) vs metadata.qwen3_encoder (Klein only) to disambiguate — dispatching into flux2DevVaeModelSelected / flux2DevMistralEncoderModelSelected instead of the Klein slices.
  • KleinVAEModel updated to also reject when mistral_encoder is present in the metadata, so it no longer swallows Dev images into the Klein VAE slice.
  • Wired both new handlers into the Recall Parameters panel in ImageMetadataActions.tsx (the panel iterates a hardcoded handler list; Klein VAE / Encoder were the only FLUX.2 entries before). Remix Image already iterates Object.values(ImageMetadataHandlers) automatically.

Starter models (cow-only Mistral encoders)

  • black-forest-labs/FLUX.2-dev Diffusers (~80 GB, Non-Commercial)
  • diffusers/FLUX.2-dev-bnb-4bit Diffusers (NF4)
  • gguf-org/flux2-dev-gguf Q3_K_M / Q4_K_M / Q5_K_M / Q6_K / Q8_0 transformer-only entries depending on the FLUX.2 VAE + a cow Mistral encoder
  • Mistral encoders (cow-only):
    • Comfy-Org single-file safetensors: bf16 (35.6 GB), fp8 (18 GB), fp4_mixed (12.3 GB) — all embed the Tekken tokenizer
    • gguf-org cow GGUFs: Q4_0, Q8_0, IQ4_XS — also embed Tekken
  • Upstream Mistral Small 3.1 / 3.2 entries removed — they don't work for FLUX.2 (wrong layer count).

Related Issues / Discussions

None yet. The upstream feature/flux2-noncommercial-license work is unrelated but worth coordinating with — FLUX.2 [dev] inherits the BFL Non-Commercial License and is flagged in isNonCommercialMainModelConfig.

QA Instructions

Backend probing (no model load required, takes seconds):

uv run --extra cuda pytest tests/test_imports.py tests/model_identification/test_identification.py::test_default_settings_main tests/model_identification/test_identification.py::test_controlnet_t2i_default_settings

Manually probe any local FLUX.2 [dev] artifacts (folder, transformer subfolder, text_encoder subfolder, GGUF, single-file VAE, cow Mistral). Each should classify with the matching dev / mistral_encoder config, and existing FLUX.2 Klein fixtures should still classify identically. Probing a 40-layer Mistral Small 3.1 / 3.2 GGUF or safetensors should be rejected with a clear "expected hidden_size=5120, num_hidden_layers=30" error; this is intentional.

Frontend checks (from invokeai/frontend/web):

pnpm lint:tsc
pnpm lint:eslint
pnpm test:no-watch

End-to-end with a real model:

  1. Install via the Model Manager:
    • Transformer: any of the gguf-org/flux2-dev-gguf quants (Q4_K_M is a good balance at ~20 GB) or the full black-forest-labs/FLUX.2-dev Diffusers folder
    • VAE: flux2-vae.safetensors (or use the one bundled in the Diffusers source)
    • Mistral encoder: either Comfy-Org/flux2-dev/split_files/text_encoders/mistral_3_small_flux2_fp8.safetensors (recommended — 18 GB, embedded Tekken) or any gguf-org/flux2-dev-gguf/cow-mistral3-small-q*.gguf variant.
  2. Select the FLUX.2 [dev] transformer in the main-model picker. The Advanced Settings accordion should now show "FLUX.2 [dev] VAE" and "FLUX.2 [dev] Mistral Encoder" dropdowns (not the Klein Qwen3 selector).
  3. For GGUF / single-file transformers: pick the standalone VAE + Mistral encoder. The pre-queue check should produce Dev-specific error messages if either is missing.
  4. Defaults from MainModelDefaultSettings.from_base(Flux2, Dev): 28 steps, guidance 3.5, CFG 1.0, 1024×1024.
  5. Generate. Watch the log on first run — for Comfy / cow files you should see Loaded embedded Tekken tokenizer from <filename> (no HF roundtrip). Subsequent runs go straight to denoise.
  6. Try a multi-reference image: add a FLUX.2 ref image (the same UI as Klein), generate.
  7. Apply a FLUX.2 LoRA and generate — variant mismatch (Klein LoRA on Dev) should log a warning but not crash.
  8. Recall test: open any generated FLUX.2 [dev] image. Under "Recall Parameters" the panel should now include Mistral Encoder and VAE rows alongside Model / Steps / Scheduler. Click "Remix Image" — both Dev VAE and Mistral encoder should restore into the params slice. Repeat with a Klein image and confirm Klein VAE + Qwen3 encoder still recall correctly (no cross-contamination).

Embedded-Tekken caveat: if you install an older cow GGUF that doesn't embed tekken_model, or a diffusers folder without a sibling tokenizer/, the loader falls back to black-forest-labs/FLUX.2-dev:tokenizer on HF. With no internet and no HF cache, you'll see a clear error documenting three workarounds (install a Comfy-Org safetensors that bundles Tekken, populate the HF cache, or run huggingface-cli download black-forest-labs/FLUX.2-dev --include 'tokenizer/*').

Merge Plan

  • No DB migration: the params slice gained new fields, but they default to null and the slice version did not change. Existing user states keep working — Dev-specific fields just stay null until the user picks a Dev model.
  • Adds mistral-common to pyproject.toml dependencies (~6 MB + pycountry transitive ~8 MB). Required for the embedded Tekken tokenizer path; without it the loader still works via HF fallback.
  • Tested locally end-to-end with flux2-dev-Q2_K.gguf transformer + Comfy fp8 Mistral encoder + standalone FLUX.2 VAE — full prompt adherence confirmed on the canonical "origami lighthouse on cliff" test prompt. Lower-fidelity setups (base Mistral 3.x at any quant level) produce visibly degraded compositions, which is why those configs are now rejected at probe time.
  • Non-Commercial License: dev inherits the existing FLUX dev / Klein 9B non-commercial flagging in isNonCommercialMainModelConfig.

Checklist

  • The PR has a short but descriptive title, suitable for a changelog, e.g. feat(flux2): add FLUX.2 [dev] support (cow Mistral, embedded Tekken, recall)
  • Tests added / updated (if applicable) — model-identification + readiness fixtures cover new fields; metadata-recall test suite gained Flux2DevVAEModel + Flux2DevMistralEncoderModel cases including Klein/dev cross-rejection
  • ❗Changes to a redux slice have a corresponding migration — N/A, new fields default to null, slice version unchanged
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

Adds end-to-end support for FLUX.2 [dev] alongside the existing Klein
implementation. Dev uses Mistral Small 3.1 (24B) as its sole text encoder
instead of Klein's Qwen3, with joint_attention_dim=15360 and the
guidance-distilled 32B transformer.

Backend
- taxonomy: Flux2VariantType.Dev, ModelType.MistralEncoder,
  ModelFormat.MistralEncoder, MistralVariantType
- configs: probe dev via context_in_dim=15360 (main + LoRA); new
  mistral_encoder.py with Diffusers / Checkpoint / GGUF configs;
  Main_Diffusers_Flux2_Config accepts Flux2Pipeline class name
- loaders: new mistral_encoder.py (AutoModel for Diffusers folder,
  MistralModel for single-file + GGUF with llama.cpp key conversion).
  Existing Klein transformer loaders are generic enough for dev
- ModelRecordChanges.variant union extended with MistralVariantType

Invocations
- flux2_dev_model_loader, flux2_dev_text_encoder (Mistral chat-template
  with FLUX2_DEV_SYSTEM_MESSAGE and layer-stacking 10/20/30),
  flux2_dev_lora_loader (+ collection variant)
- MistralEncoderField on model.py; flux2_denoise / flux2_vae_decode /
  flux2_vae_encode reused unchanged (already model-agnostic)

Frontend
- types/hooks/selectors for MistralEncoder, isFlux2DevMainModelConfig,
  selectFlux2DevDiffusersModels, useMistralEncoderModels
- params slice fields flux2DevVaeModel / flux2DevMistralEncoderModel /
  flux2DevSourceModel + reducers, selectIsFlux2Dev / selectIsFlux2Klein
- ParamFlux2DevModelSelect component, wired into AdvancedSettingsAccordion
- buildFLUXGraph dev branch with full txt2img / img2img / inpaint /
  outpaint + multi-reference image editing (same flux_kontext +
  collect chain as Klein, since Flux2RefImageExtension is model-agnostic)
- addFlux2DevLoRAs helper for dev LoRA wiring
- zModelType / zModelFormat / zFlux2VariantType extended for
  mistral_encoder / mistral_small_3_1 / dev
- OpenAPI schema regenerated, TS types updated

Starter models
- FLUX.2 [dev] Diffusers (bf16 + NF4), three GGUFs (Q4/Q6/Q8), Mistral
  encoder (bf16 + NF4)

Follow-up fixes after first end-to-end run with FLUX.2 [dev] GGUF +
Mistral 3.x GGUF + standalone FLUX.2 VAE.

Frontend
- buildFLUXGraph: wire dev model loader's vae into both flux2_denoise
  (required for BN statistics / inpaint) and flux2_vae_decode; missing
  edge was raising RequiredConnectionException at runtime
- readiness.ts: variant-aware FLUX.2 readiness check — dev requires
  flux2DevVaeModel + flux2DevMistralEncoderModel (or a Dev diffusers
  source); Klein keeps Qwen3/VAE check. Threads
  hasFlux2DevDiffusersSource through generate + canvas tabs and updates
  buildGenerateTabArg / buildCanvasTabArg test helpers
- en.json: noFlux2DevVaeModelSelected, noFlux2DevMistralEncoderModelSelected

Mistral encoder loader (GGUF / single-file)
- Fix "Cannot copy out of meta tensor": llama.cpp conversion produced
  `model.*` keys but loader instantiated bare MistralModel (no `model.`
  prefix). Add _convert_for_bare_mistral_model to strip the prefix and
  drop lm_head before load_state_dict
- _materialize_remaining_meta_tensors: after load_state_dict, replace any
  still-meta parameters (norms→ones, others→zeros) and buffers so the
  cache→VRAM move can't fail on partial state dicts, with a warning
  listing what was missing
- llama.cpp converter: map attn_q_norm/attn_k_norm (Mistral 3.x qk-norm
  variants), with ordering before attn_q/attn_k to avoid bad rewrites
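
Why the ordering matters can be seen in a small sketch. The rename table and target names below are hypothetical (the converter's real mapping is not shown in this PR); the point is only that `attn_q` is a substring of `attn_q_norm`, so a first-match substring rewrite must try the longer name first.

```python
# Hypothetical rename table: the longer *_norm names come before their prefixes.
RENAMES = [
    ("attn_q_norm", "self_attn.q_norm"),
    ("attn_k_norm", "self_attn.k_norm"),
    ("attn_q", "self_attn.q_proj"),
    ("attn_k", "self_attn.k_proj"),
]


def rename_key(key: str, renames=RENAMES) -> str:
    """Apply the first matching substring rename and return the new key."""
    for old, new in renames:
        if old in key:
            return key.replace(old, new)
    return key
```

With the table reversed, `blk.0.attn_q_norm.weight` would match `attn_q` first and become `blk.0.self_attn.q_proj_norm.weight`, which is the kind of bad rewrite the commit avoids.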

Tokenizer / processor fallback
- _load_processor_with_offline_fallback walks a list of sources
  (black-forest-labs/FLUX.2-dev tokenizer subfolder, then
  mistralai/Mistral-Small-3.1-… and 3.2-…), trying AutoProcessor then
  AutoTokenizer for each, cache-first then online. Final error spells
  out the three workarounds (install Diffusers folder, set HF_ENDPOINT,
  pre-cache the tokenizer)
- flux2_dev_text_encoder: try multimodal `[{type, text}]` chat template
  first (PixtralProcessor / Mistral3Processor), fall back to plain
  string content (AutoTokenizer), then to manual [INST]…[/INST]

Qwen3 encoder probe strictness
- _get_qwen3_variant_from_state_dict and _get_variant_from_config now
  return None / raise NotAMatchError for unknown hidden_size instead of
  silently defaulting to qwen3_4b. The old fallback meant any llama.cpp
  GGUF causal LM (Mistral, Llama, …) was wrongly classified as Qwen3 —
  visible when the Mistral 3.x GGUF was identified as a Qwen3-4B encoder
- Checkpoint / GGUF / Diffusers loaders propagate the strictness
@github-actions github-actions Bot added the python, invocations, backend, services, and frontend labels May 25, 2026
@lstein lstein self-assigned this May 27, 2026
@lstein lstein added the 6.14.x label May 27, 2026
@lstein lstein moved this to 6.14.x Theme: USER EXPERIENCE in Invoke - Community Roadmap May 27, 2026
…andlers

Upstream Mistral Small 3.1/3.2 (40 layers) produces off-distribution embeddings
under FLUX.2's static (10, 20, 30) hidden-state extraction. The joint attention
was actually trained against BFL's 30-layer cow-mistral3-small distillation —
both Comfy-Org's safetensors and gguf-org's cow GGUFs ship the same 30-layer
weights, just packaged differently.

- Probing (configs/mistral_encoder.py) now rejects non-cow Mistrals across all
  three formats (Diffusers / Checkpoint / GGUF) with a clear error.
- Loader (load/model_loaders/mistral_encoder.py) extracts the embedded Tekken
  tokenizer from the `tekken_model` U8 (safetensors) / fp16-per-byte (cow GGUF)
  tensor via mistral_common, falling back to the BFL HF tokenizer. Removes the
  INVOKEAI_MISTRAL_TOKENIZER_SOURCE env var.
- Starter models: drop upstream Mistral 3.x entries, add Comfy-Org bf16/fp8/fp4
  variants alongside the cow GGUFs.
- MistralVariantType: drop Small3_1, keep only Cow.
- pyproject.toml: add mistral-common dependency.

Frontend recall:
- Add Flux2DevVAEModel + Flux2DevMistralEncoderModel handlers, disambiguating
  Klein vs dev via presence of `mistral_encoder` / `qwen3_encoder` metadata
  fields (both bases are `flux2`).
- Wire both into the Recall Parameters panel (hardcoded list was missing them).
- Add `metadata.mistralEncoder` i18n key + colocated tests.
@github-actions github-actions Bot added the Root and python-deps labels Jun 6, 2026
…encoders

After studying ComfyUI's `Flux2Tokenizer` / `Mistral3_24BModel` reference
implementation, align the FLUX.2 [dev] text-encoder path with their setup:

- Probing now accepts both 30-layer (cow distillation) and 40-layer (Mistral
  Small 3, BFL canonical / upstream) Mistrals. Re-adds `MistralVariantType.Mistral24B`
  alongside `Cow`. All three configs (Diffusers / Checkpoint / GGUF) updated.

- Loaders strip `model.norm` (replace with Identity) when the loaded weights
  are the 30-layer cow distillation. Matches Comfy's `final_norm=False` for
  the pruned variant; for transformers' `MistralModel` the final RMSNorm is
  always built but the cow was trained against the raw post-layer-29 state.

- 40-layer loads now log a clear warning stating that upstream Mistral 3.1 / 3.2 is
  NOT what FLUX.2's joint attention was trained against and recommending the
  Comfy-Org bf16/fp8/fp4 or gguf-org cow GGUF variants. BFL's canonical
  bundled text_encoder is also 40-layer so we don't hard-reject; the warning
  is opt-in self-discipline.

- Text encoder invocation switches from `apply_chat_template(messages, ...)`
  to a raw text template `[SYSTEM_PROMPT]{sys}[/SYSTEM_PROMPT][INST]{prompt}[/INST]`
  fed straight to the tokenizer — byte-for-byte matches Comfy's
  `Flux2Tokenizer.llama_template.format(text)`. System prompt now includes
  the literal `\n` between "object" and "attribution" Comfy ships.

- `_TekkenChatTemplateAdapter` renamed to `_TekkenRawTextAdapter` and exposes
  a `__call__(text, padding_side='left', ...)` interface that Tekken-encodes
  the raw string (BOS=1, no EOS) and left-pads with token id 11. Matches
  Comfy's `pad_left=True` / `pad_token=11` settings.

Frontend types extended for the new `mistral3_24b` variant
(zMistralVariantType, MODEL_VARIANT_TO_LONG_NAME, schema.ts).
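
The raw-text template and left-padding described in this commit can be sketched as below. The template string, BOS id 1, and pad id 11 come from the commit message; the function names are hypothetical, tokenization itself is left out, and the overflow handling (raising instead of truncating) is an assumption, since the commit does not say what happens to over-long prompts.

```python
from typing import List, Tuple

TEMPLATE = "[SYSTEM_PROMPT]{sys}[/SYSTEM_PROMPT][INST]{prompt}[/INST]"
BOS_TOKEN_ID = 1   # prepended by the Tekken encode step, no EOS appended
PAD_TOKEN_ID = 11  # left-padding token id named in the commit message


def build_raw_prompt(system_message: str, prompt: str) -> str:
    """Fill the raw template that is fed straight to the tokenizer."""
    return TEMPLATE.format(sys=system_message, prompt=prompt)


def left_pad(
    token_ids: List[int], max_length: int, pad_id: int = PAD_TOKEN_ID
) -> Tuple[List[int], List[int]]:
    """Left-pad token ids to max_length and build the matching attention mask."""
    if len(token_ids) > max_length:
        # Assumption: refuse rather than guess a truncation policy.
        raise ValueError("sequence is longer than max_length")
    pad = max_length - len(token_ids)
    input_ids = [pad_id] * pad + list(token_ids)
    attention_mask = [0] * pad + [1] * len(token_ids)
    return input_ids, attention_mask
```

Left-padding keeps the real tokens at the end of the sequence, so the final positions always hold prompt content regardless of prompt length.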

Labels

6.14.x, backend, frontend, invocations, python, python-deps, Root, services

Projects

Status: 6.14.x Theme: USER EXPERIENCE


2 participants