Conversation
thanks for the PR!

KVAEs are autoencoders with strong reconstruction quality and a well-structured latent space, developed by our team. They weren't used in Kandinsky 5, as they were created a bit later, so this is mainly for future research. They can be used as a VAE backbone for developing new diffusion models.

Hi @yiyixuxu !

Hi @asomoza !

Hi @leffff @ddavidchick, we don't have the bandwidth right now, especially for a model that isn't used yet. Please have a little more patience for the review and comments; we will address this when we have the bandwidth.
yiyixuxu left a comment
thanks, I left some feedback
@leffff @ddavidchick Are the new models going to be unified audio-video generation models or video-only models?

Thanks, @yiyixuxu. I’ve updated the code based on your feedback. Please take a look and let me know if it’s ready to merge.
asomoza left a comment
thanks, left some comments, we should also add some tests like the other VAEs

@asomoza, thanks! I've addressed all the comments and added tests. Let me know if anything else needs updating.
```python
    z_channels: int = 16,
    double_z: bool = True,
    ch_mult: Tuple[int, ...] = (1, 2, 4, 8),
    bottleneck: Optional[nn.Module] = None,
```

This will break `save_pretrained` if used. Since it seems it's not used, maybe remove it from here and from the docstring?
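To illustrate the reviewer's point about `save_pretrained`: a diffusers-style model records its `__init__` arguments in a JSON config, so every config value must be JSON-serializable, and an `nn.Module` instance is not. A minimal sketch of the failure mode (my addition, not from the thread; `save_config` and `FakeModule` are hypothetical stand-ins for the config serialization and an `nn.Module`):

```python
import json

def save_config(init_kwargs):
    """Mimic how a diffusers-style model writes its __init__
    arguments to config.json: every value must be JSON-serializable."""
    return json.dumps(init_kwargs)

# Plain config values (ints, bools, lists) round-trip fine.
ok = save_config({"z_channels": 16, "double_z": True, "ch_mult": [1, 2, 4, 8]})

class FakeModule:
    """Stand-in for an nn.Module instance used as a config default."""

# An object instance cannot be serialized, so saving would fail.
try:
    save_config({"bottleneck": FakeModule()})
    bottleneck_serializable = True
except TypeError:
    bottleneck_serializable = False
```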
@bot /style

Style fix is beginning.... View the workflow run here.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu left a comment
thanks, I left some comments
I think we can merge this soon!
```python
    temb_channels: int = 512,
    zq_ch: Optional[int] = None,
    add_conv: bool = False,
    normalization: nn.Module = nn.GroupNorm,
```

Suggested change:
```diff
-    normalization: nn.Module = nn.GroupNorm,
```
```python
    self.nonlinearity = get_activation(act_fn)

    if zq_ch is None:
        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
```

Suggested change:
```diff
-        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
+        self.norm1 = nn.GroupNorm(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
```
```python
    if zq_ch is None:
        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
    else:
        self.norm1 = normalization(in_channels, zq_channels=zq_ch, add_conv=add_conv)
```

Suggested change:
```diff
-        self.norm1 = normalization(in_channels, zq_channels=zq_ch, add_conv=add_conv)
+        self.norm1 = nn.GroupNorm(in_channels, zq_channels=zq_ch, add_conv=add_conv)
```
```python
    if temb_channels > 0:
        self.temb_proj = torch.nn.Linear(temb_channels, out_channels)
    if zq_ch is None:
        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
```

Suggested change:
```diff
-        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
+        self.norm2 = nn.GroupNorm(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
```
```python
    if zq_ch is None:
        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
    else:
        self.norm2 = normalization(out_channels, zq_channels=zq_ch, add_conv=add_conv)
```

Suggested change:
```diff
-        self.norm2 = normalization(out_channels, zq_channels=zq_ch, add_conv=add_conv)
+        self.norm2 = nn.GroupNorm(out_channels, zq_channels=zq_ch, add_conv=add_conv)
```
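One caveat on the suggestions that target the `else` branch (an observation of mine, not from the thread): `nn.GroupNorm` accepts only `num_groups`, `num_channels`, `eps`, and `affine`, so calling it with `zq_channels=` / `add_conv=` arguments would raise a `TypeError`; that branch presumably needs a spatially-conditioned norm class instead. A quick check of the valid call:

```python
import torch
import torch.nn as nn

# nn.GroupNorm(num_groups, num_channels, eps=..., affine=...) — it has
# no zq_channels or add_conv parameters, so only the `zq_ch is None`
# branch can hardcode it as written.
norm = nn.GroupNorm(num_groups=32, num_channels=64, eps=1e-6, affine=True)
x = torch.randn(2, 64, 8, 8)
y = norm(x)  # normalizes over groups of channels; shape is preserved
```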
What does this PR do?
Add KVAE 1.0 from the Kandinsky team. git repo.
Before submitting: see the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu @asomoza