
[UniLLaDA] Add UniLLaDA multimodal discrete diffusion pipeline#13686

Open
ChinChyi wants to merge 1 commit into huggingface:main from ChinChyi:add-unillada-pipeline

Conversation


@ChinChyi ChinChyi commented May 6, 2026

What does this PR do?

Adds support for LLaDA 2.0-Uni, a unified multimodal discrete diffusion language model that supports text understanding, image understanding, and image generation in a single framework.

Paper: LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

New Components

  • LLaDA2UniImageTransformer2DModel — Image diffusion transformer for decoding VQ tokens to images
  • UniLLaDaPipeline — Unified pipeline supporting three modes:
    • Text-to-image generation
    • Image understanding (VQA, captioning)
    • Image editing
  • LLaDA2UniFlowMatchEulerScheduler — Flow matching scheduler with Euler ODE integration
  • Image tokenizer utilities — SigVQ-based image encoding/decoding
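The flow-matching scheduler listed above integrates the probability-flow ODE with plain Euler steps. As a rough sketch of what one such update does (illustrative only; the function name, the sigma schedule, and the stand-in velocity are not the actual `LLaDA2UniFlowMatchEulerScheduler` API):

```python
import torch

def euler_flow_match_step(sample, velocity, sigma, sigma_next):
    # One Euler step of the flow-matching ODE:
    # x_{next} = x + (sigma_next - sigma) * v(x, sigma)
    return sample + (sigma_next - sigma) * velocity

# Toy rollout over a linear sigma schedule (8 steps, turbo-mode-sized).
sigmas = torch.linspace(1.0, 0.0, 9)
x = torch.randn(1, 4)
for s, s_next in zip(sigmas[:-1], sigmas[1:]):
    # Stand-in velocity: exact field when the target distribution
    # collapses to a point at zero (x_t = sigma * noise => v = x / sigma).
    v = x / s
    x = euler_flow_match_step(x, v, s, s_next)
# x has been integrated to the target (zero) by the final step
```

In the real pipeline the velocity would come from the transformer's prediction at each sigma; the Euler update itself is the only part sketched faithfully here.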

Key Features

  • Multimodal capabilities: Single model handles both vision and language tasks
  • Discrete diffusion: Block-wise iterative refinement for token generation
  • FP8 quantization support: Efficient inference with quantized weights
  • Flexible decoding: Supports both quality mode (50 steps) and turbo mode (8 steps)
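To make "block-wise iterative refinement" concrete: masked token positions inside a block are filled over several passes, committing the most confident predictions first. A minimal sketch under that assumption (the mask id, vocab size, commit rule, and toy random "model" are illustrative, not the pipeline's actual decoding logic):

```python
import torch

MASK_ID = 10  # mask token id, chosen outside the toy vocab [0, 10)
VOCAB = 10

def refine_block(logits_fn, tokens, steps):
    # Iteratively unmask a block: each pass commits the highest-confidence
    # predictions among still-masked positions, finishing within `steps`.
    tokens = tokens.clone()
    for step in range(steps):
        masked = tokens == MASK_ID
        remaining = int(masked.sum())
        if remaining == 0:
            break
        conf, pred = logits_fn(tokens).softmax(-1).max(-1)
        # Only still-masked positions compete for commitment.
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        k = -(-remaining // (steps - step))  # ceil division
        idx = conf.topk(k).indices
        tokens[idx] = pred[idx]
    return tokens

# Toy "model": random logits stand in for the transformer's predictions.
block = torch.full((8,), MASK_ID, dtype=torch.long)
out = refine_block(lambda t: torch.randn(t.shape[0], VOCAB), block, steps=3)
```

The quality/turbo distinction above maps onto `steps`: more passes commit fewer tokens per pass, trading latency for refinement.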

Usage Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from diffusers import UniLLaDaPipeline, BlockRefinementScheduler
from diffusers.pipelines.unillada.image_tokenizer import ImageTokenizer

model_id = "inclusionAI/LLaDA2.0-Uni"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
scheduler = BlockRefinementScheduler()
image_tokenizer = ImageTokenizer(model_path=model_id)

pipe = UniLLaDaPipeline(
    transformer=model,
    tokenizer=tokenizer,
    scheduler=scheduler,
    image_tokenizer=image_tokenizer,
)

# Text-to-Image
result = pipe(prompt="A cat sitting on a windowsill at sunset")
result.images[0].save("output.png")

# Image Understanding
from PIL import Image
img = Image.open("photo.jpg")
result = pipe(image=img, question="Describe this image in detail.")
print(result.text)

# Image Editing
result = pipe(image=img, instruction="Change the background to a beach.")
result.images[0].save("edited.png")

Testing

  • Added unit tests in tests/pipelines/unillada/test_unillada.py
  • Tests cover all three modes (generation, understanding, editing)
  • Mock components for CI compatibility
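For reviewers unfamiliar with the mock-component pattern, the idea is roughly this (illustrative only; `run_pipeline` is a stand-in, and the real tests live in `tests/pipelines/unillada/test_unillada.py`):

```python
from unittest.mock import MagicMock

# Replace the heavy transformer with a mock returning fixed output, so
# the test exercises pipeline plumbing on CPU-only CI without weights.
transformer = MagicMock(return_value="fixed-logits")

def run_pipeline(transformer, prompt):
    # Stand-in for the pipeline's __call__ dispatch.
    return transformer(prompt)

result = run_pipeline(transformer, "A cat sitting on a windowsill")
```

The mock both supplies deterministic output and records the call, so the test can assert the pipeline invoked the component with the expected arguments.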

Model Weights

Official weights available at: https://huggingface.co/inclusionAI/LLaDA2.0-Uni

Before submitting

  • Did you read the contributor guideline?
  • Did you read our philosophy doc?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@yiyixuxu @a-r-r-o-w @DN6

Add UniLLaDA pipeline supporting text-to-image, image understanding,
and image editing via block-wise iterative discrete diffusion.

New components:
- UniLLaDaPipeline: main pipeline (DiffusionPipeline subclass)
- LLaDA2UniImageTransformer2DModel: image transformer model
- LLaDA2UniFlowMatchEulerScheduler: flow matching scheduler
- ImageTokenizer: VQ image encoder helper
- Documentation and tests
