[Bug] Z-Image-base quantized versions yield black pictures

### Git commit

commit f957fa3

### Operating System & Version

Windows 10 Pro - 22H2

### GGML backends

CUDA

### Command-line arguments used

sd-cli.exe --diffusion-model models\unet\z-image-Q3_K_M.gguf --vae models\vae\flux_vae.safetensors --llm models\text_encoders\Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --offload-to-cpu --diffusion-fa -H 1024 -W 512 --vae-tiling

### Steps to reproduce

Tested command (taken from the [documentation](https://github.com/leejet/stable-diffusion.cpp/blob/master/docs/z_image.md), with `--vae-tiling` added and `-v` removed):
```
sd-cli.exe --diffusion-model models\unet\z-image-Q3_K_M.gguf --vae models\vae\flux_vae.safetensors --llm models\text_encoders\Qwen3-4B-Instruct-2507-Q4_K_M.gguf -p "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic" --cfg-scale 5.0 --offload-to-cpu --diffusion-fa -H 1024 -W 512 --vae-tiling
```





### What you expected to happen

A picture comparable to the one shown in the documentation.

For example, this is the result with Z-Image-turbo (same command, except `--cfg-scale 1.0` and a Q4_K version of the diffusion model):

<img width="512" height="1024" alt="Image" src="https://github.com/user-attachments/assets/ce685c82-c82d-4f19-ada6-2d1a0e3c3a89" />

### What actually happened

The quantized version of Z-Image-base produces black pictures, while Z-Image-turbo works without issue.

Output of above command:

<img width="512" height="1024" alt="Image" src="https://github.com/user-attachments/assets/8e35fc6e-8bfe-4bd0-b10a-4530990e9618" />

Previews using `proj` are also fully black.

### Logs / error messages / stack trace

Base logs without debug:
```
[INFO ] ggml_extend.hpp:78   - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78   -   Device 0: NVIDIA GeForce GTX 1060, compute capability 6.1, VMM: yes
[INFO ] stable-diffusion.cpp:260  - loading diffusion model from 'models\unet\z-image-Q3_K_M.gguf'
[INFO ] model.cpp:370  - load models\unet\z-image-Q3_K_M.gguf using gguf format
[INFO ] stable-diffusion.cpp:307  - loading llm from 'models\text_encoders\Qwen3-4B-Instruct-2507-Q4_K_M.gguf'
[INFO ] model.cpp:370  - load models\text_encoders\Qwen3-4B-Instruct-2507-Q4_K_M.gguf using gguf format
[INFO ] stable-diffusion.cpp:321  - loading vae from 'models\vae\flux_vae.safetensors'
[INFO ] model.cpp:373  - load models\vae\flux_vae.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:337  - Version: Z-Image
[INFO ] stable-diffusion.cpp:365  - Weight type stat:                      f32: 390  |    q3_K: 55   |    q4_K: 303  |    q5_K: 17   |    q6_K: 58   |    bf16: 272
[INFO ] stable-diffusion.cpp:366  - Conditioner weight type stat:          f32: 145  |    q4_K: 216  |    q6_K: 37
[INFO ] stable-diffusion.cpp:367  - Diffusion model weight type stat:      f32: 245  |    q3_K: 55   |    q4_K: 87   |    q5_K: 17   |    q6_K: 21   |    bf16: 28
[INFO ] stable-diffusion.cpp:368  - VAE weight type stat:                 bf16: 244
[INFO ] stable-diffusion.cpp:735  - Using flash attention in the diffusion model
  |====================>                             | 453/1095 - 218.63it/s
  |======================================>           | 851/1095 - 233.73it/s
  |==================================================| 1095/1095 - 283.83it/s
[INFO ] model.cpp:1629 - loading tensors completed, taking 3.86s (process: 0.00s, read: 3.06s, memcpy: 0.00s, convert: 0.19s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:876  - total params memory size = 8000.09MB (VRAM 8000.09MB, RAM 0.00MB): text_encoders 3555.38MB(VRAM), diffusion_model 4350.14MB(VRAM), vae 94.57MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:945  - running in FLOW mode
[INFO ] stable-diffusion.cpp:3527 - sampling using Euler method
[INFO ] denoiser.hpp:494  - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3654 - TXT2IMG
[INFO ] ggml_extend.hpp:1862 - qwen3 offload params (3555.38 MB, 398 tensors) to runtime backend (CUDA0), taking 1.71s
[INFO ] ggml_extend.hpp:1862 - qwen3 offload params (3555.38 MB, 398 tensors) to runtime backend (CUDA0), taking 0.85s
[INFO ] stable-diffusion.cpp:3271 - get_learned_condition completed, taking 3226 ms
[INFO ] stable-diffusion.cpp:3382 - generating image: 1/1 - seed 42
[INFO ] ggml_extend.hpp:1862 - z_image offload params (4350.17 MB, 453 tensors) to runtime backend (CUDA0), taking 1.08s
  |==================================================| 20/20 - 16.21s/it
[INFO ] stable-diffusion.cpp:3424 - sampling completed, taking 324.51s
[INFO ] stable-diffusion.cpp:3435 - generating 1 latent images completed, taking 325.04s
[INFO ] stable-diffusion.cpp:3438 - decoding 1 latents
[INFO ] ggml_extend.hpp:1862 - vae offload params ( 94.57 MB, 138 tensors) to runtime backend (CUDA0), taking 0.07s
  |==================================================| 21/21 - 1.64it/s
[INFO ] stable-diffusion.cpp:3448 - latent 1 decoded, taking 13.10s
[INFO ] stable-diffusion.cpp:3452 - decode_first_stage completed, taking 13.10s
[INFO ] stable-diffusion.cpp:3762 - generate_image completed in 341.37s
[INFO ] main.cpp:421  - save result image 0 to 'output.png' (success)
```
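
To make it easier to compare the quantization mix of this GGUF against a working Z-Image-turbo file, the `weight type stat` lines can be tallied with a short script. This is only a minimal sketch that assumes the log format shown above; `parse_weight_stats` is a hypothetical helper, not part of stable-diffusion.cpp.

```python
import re

# Log lines copied from the run above.
LOG = """\
[INFO ] stable-diffusion.cpp:366  - Conditioner weight type stat:          f32: 145  |    q4_K: 216  |    q6_K: 37
[INFO ] stable-diffusion.cpp:367  - Diffusion model weight type stat:      f32: 245  |    q3_K: 55   |    q4_K: 87   |    q5_K: 17   |    q6_K: 21   |    bf16: 28
[INFO ] stable-diffusion.cpp:368  - VAE weight type stat:                 bf16: 244
"""

# "<component> weight type stat: <type>: <count> | <type>: <count> ..."
STAT_LINE = re.compile(r"- (?P<name>.+?) weight type stat:\s*(?P<stats>.+)$")
PAIR = re.compile(r"(\w+):\s*(\d+)")


def parse_weight_stats(log_text):
    """Return {component: {weight_type: tensor_count}} (hypothetical helper)."""
    result = {}
    for line in log_text.splitlines():
        match = STAT_LINE.search(line)
        if match:
            pairs = PAIR.findall(match.group("stats"))
            result[match.group("name")] = {t: int(n) for t, n in pairs}
    return result


stats = parse_weight_stats(LOG)
for name, counts in stats.items():
    print(f"{name}: {sum(counts.values())} tensors, {counts}")
```

For this run the per-component totals are 398 (conditioner) and 453 (diffusion model), which match the tensor counts in the `qwen3 offload params` and `z_image offload params` lines, so the files appear to load completely.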

Debug info:
```
[DEBUG] main.cpp:500  - version: stable-diffusion.cpp version unknown, commit f957fa3
[DEBUG] main.cpp:501  - System Info:
    SSE3 = 1 |     AVX = 1 |     AVX2 = 1 |     AVX512 = 0 |     AVX512_VBMI = 0 |     AVX512_VNNI = 0 |     FMA = 1 |     NEON = 0 |     ARM_FMA = 0 |     F16C = 1 |     FP16_VA = 0 |     WASM_SIMD = 0 |     VSX = 0 |
[DEBUG] main.cpp:502  - SDCliParams {
  mode: img_gen,
  output_path: "output.png",
  verbose: true,
  color: false,
  canny_preprocess: false,
  convert_name: false,
  preview_method: none,
  preview_interval: 1,
  preview_path: "preview.png",
  preview_fps: 16,
  taesd_preview: false,
  preview_noisy: false
}
[DEBUG] main.cpp:503  - SDContextParams {
  n_threads: 6,
  model_path: "",
  clip_l_path: "",
  clip_g_path: "",
  clip_vision_path: "",
  t5xxl_path: "",
  llm_path: "models\text_encoders\Qwen3-4B-Instruct-2507-Q4_K_M.gguf",
  llm_vision_path: "",
  diffusion_model_path: "models\unet\z-image-Q3_K_M.gguf",
  high_noise_diffusion_model_path: "",
  vae_path: "models\vae\flux_vae.safetensors",
  taesd_path: "",
  esrgan_path: "",
  control_net_path: "",
  embedding_dir: "",
  embeddings: {
  }
  wtype: NONE,
  tensor_type_rules: "",
  lora_model_dir: ".",
  photo_maker_path: "",
  rng_type: cuda,
  sampler_rng_type: NONE,
  flow_shift: INF
  offload_params_to_cpu: true,
  enable_mmap: false,
  control_net_cpu: false,
  clip_on_cpu: false,
  vae_on_cpu: false,
  flash_attn: false,
  diffusion_flash_attn: true,
  diffusion_conv_direct: false,
  vae_conv_direct: false,
  circular: false,
  circular_x: false,
  circular_y: false,
  chroma_use_dit_mask: true,
  qwen_image_zero_cond_t: false,
  chroma_use_t5_mask: false,
  chroma_t5_mask_pad: 1,
  prediction: NONE,
  lora_apply_mode: auto,
  vae_tiling_params: { 0, 0, 0, 0.5, 0, 0 },
  force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:504  - SDGenerationParams {
  loras: "{
  }",
  high_noise_loras: "{
  }",
  prompt: "A cinematic, melancholic photograph of a solitary hooded figure walking through a sprawling, rain-slicked metropolis at night. The city lights are a chaotic blur of neon orange and cool blue, reflecting on the wet asphalt. The scene evokes a sense of being a single component in a vast machine. Superimposed over the image in a sleek, modern, slightly glitched font is the philosophical quote: 'THE CITY IS A CIRCUIT BOARD, AND I AM A BROKEN TRANSISTOR.' -- moody, atmospheric, profound, dark academic",
  negative_prompt: "",
  clip_skip: -1,
  width: 512,
  height: 1024,
  batch_count: 1,
  init_image_path: "",
  end_image_path: "",
  mask_image_path: "",
  control_image_path: "",
  ref_image_paths: [],
  control_video_path: "",
  auto_resize_ref_image: true,
  increase_ref_index: false,
  pm_id_images_dir: "",
  pm_id_embed_path: "",
  pm_style_strength: 20,
  skip_layers: [7, 8, 9],
  sample_params: (txt_cfg: 5.00, img_cfg: 5.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
  high_noise_skip_layers: [7, 8, 9],
  high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: 0.00, shifted_timestep: 0),
  custom_sigmas: [],
  cache_mode: "",
  cache_option: "",
  cache: disabled (threshold=1, start=0.15, end=0.95),
  moe_boundary: 0.875,
  video_frames: 1,
  fps: 16,
  vace_strength: 1,
  strength: 0.75,
  control_strength: 0.9,
  seed: 42,
  upscale_repeats: 1,
  upscale_tile_size: 128,
}
[DEBUG] stable-diffusion.cpp:166  - Using CUDA backend
[INFO ] ggml_extend.hpp:78   - ggml_cuda_init: found 1 CUDA devices:
[INFO ] ggml_extend.hpp:78   -   Device 0: NVIDIA GeForce GTX 1060, compute capability 6.1, VMM: yes
```

### Additional context / environment details

Model sources:

- https://huggingface.co/unsloth/Z-Image-GGUF/blob/main/z-image-Q3_K_M.gguf
- https://huggingface.co/unsloth/Qwen3-4B-Instruct-2507-GGUF/blob/main/Qwen3-4B-Instruct-2507-Q4_K_M.gguf
- https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/vae/diffusion_pytorch_model.safetensors

nvidia-smi:
```
Thu Feb  5 14:48:09 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 581.80                 Driver Version: 581.80         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060      WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8              2W /   78W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
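
For context on why `--offload-to-cpu` is used: the parameter sizes reported in the log add up to more than the memory of this card. A rough sketch of that arithmetic follows, assuming the "MB" in the log and the "MiB" in nvidia-smi are the same unit and ignoring compute buffers; the variable names are illustrative only.

```python
# Component sizes in MB, from the "total params memory size" log line.
components_mb = {
    "text_encoders": 3555.38,
    "diffusion_model": 4350.14,
    "vae": 94.57,
}
vram_mib = 6144  # GTX 1060 total memory, from nvidia-smi (assumed same unit)

total_mb = round(sum(components_mb.values()), 2)
fits_all_at_once = total_mb <= vram_mib
fits_one_at_a_time = all(size <= vram_mib for size in components_mb.values())

print(f"total params: {total_mb} MB, VRAM: {vram_mib} MiB")
print(f"all components resident at once: {fits_all_at_once}")
print(f"each component alone: {fits_one_at_a_time}")
```

The weights do not fit simultaneously, but each component fits on its own, which is consistent with the per-component `offload params ... to runtime backend (CUDA0)` lines in the log.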
