Conversation
thanks for the PR!

KVAEs are autoencoders with strong reconstruction quality and a well-structured latent space, developed by our team. They weren't used in Kandinsky 5, as they were created a bit later, so this is mainly for future research. They can be used as a VAE backbone for developing new diffusion models.

Hi @yiyixuxu !

Hi @asomoza !

Hi @leffff @ddavidchick, we don't have the bandwidth right now, especially for a model that isn't used yet. Please have a little more patience for the review and comments; we will address this when we have the bandwidth.
yiyixuxu left a comment
thanks, I left some feedback
@leffff @ddavidchick Are the new models going to be unified audio-video generation models or video-only models?

Thanks, @yiyixuxu. I’ve updated the code based on your feedback. Please take a look and let me know if it’s ready to merge.
asomoza left a comment
thanks, left some comments, we should also add some tests like the other VAEs

@asomoza, thanks! I've addressed all the comments and added tests. Let me know if anything else needs updating.
```python
    z_channels: int = 16,
    double_z: bool = True,
    ch_mult: Tuple[int, ...] = (1, 2, 4, 8),
    bottleneck: Optional[nn.Module] = None,
```

This will break `save_pretrained` if used. Since it seems it's not used, maybe remove it from here and from the docstring?
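To illustrate the reviewer's point about `save_pretrained`: a diffusers-style model records its `__init__` arguments in a JSON config, so every config value must be JSON-serializable, and an `nn.Module` instance is not. A minimal sketch of the failure mode (my addition, not from the thread; `save_config` and `FakeModule` are hypothetical stand-ins for the config serialization and an `nn.Module`):

```python
import json

def save_config(init_kwargs):
    """Mimic how a diffusers-style model writes its __init__
    arguments to config.json: every value must be JSON-serializable."""
    return json.dumps(init_kwargs)

# Plain config values (ints, bools, lists) round-trip fine.
ok = save_config({"z_channels": 16, "double_z": True, "ch_mult": [1, 2, 4, 8]})

class FakeModule:
    """Stand-in for an nn.Module instance used as a config default."""

# An object instance cannot be serialized, so saving would fail.
try:
    save_config({"bottleneck": FakeModule()})
    bottleneck_serializable = True
except TypeError:
    bottleneck_serializable = False
```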
@bot /style

Style fix is beginning.... View the workflow run here.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
yiyixuxu left a comment
thanks, I left some comments
I think we can merge this soon!
```python
    temb_channels: int = 512,
    zq_ch: Optional[int] = None,
    add_conv: bool = False,
    normalization: nn.Module = nn.GroupNorm,
```

Suggested change:
```diff
-    normalization: nn.Module = nn.GroupNorm,
```
```python
    self.nonlinearity = get_activation(act_fn)

    if zq_ch is None:
        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
```

Suggested change:
```diff
-        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
+        self.norm1 = nn.GroupNorm(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
```
```python
    if zq_ch is None:
        self.norm1 = normalization(num_channels=in_channels, num_groups=32, eps=1e-6, affine=True)
    else:
        self.norm1 = normalization(in_channels, zq_channels=zq_ch, add_conv=add_conv)
```

Suggested change:
```diff
-        self.norm1 = normalization(in_channels, zq_channels=zq_ch, add_conv=add_conv)
+        self.norm1 = nn.GroupNorm(in_channels, zq_channels=zq_ch, add_conv=add_conv)
```
```python
    if temb_channels > 0:
        self.temb_proj = torch.nn.Linear(temb_channels, out_channels)
    if zq_ch is None:
        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
```

Suggested change:
```diff
-        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
+        self.norm2 = nn.GroupNorm(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
```
```python
    if zq_ch is None:
        self.norm2 = normalization(num_channels=out_channels, num_groups=32, eps=1e-6, affine=True)
    else:
        self.norm2 = normalization(out_channels, zq_channels=zq_ch, add_conv=add_conv)
```

Suggested change:
```diff
-        self.norm2 = normalization(out_channels, zq_channels=zq_ch, add_conv=add_conv)
+        self.norm2 = nn.GroupNorm(out_channels, zq_channels=zq_ch, add_conv=add_conv)
```
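One caveat on the suggestions that target the `else` branch (an observation of mine, not from the thread): `nn.GroupNorm` accepts only `num_groups`, `num_channels`, `eps`, and `affine`, so calling it with `zq_channels=` / `add_conv=` arguments would raise a `TypeError`; that branch presumably needs a spatially-conditioned norm class instead. A quick check of the valid call:

```python
import torch
import torch.nn as nn

# nn.GroupNorm(num_groups, num_channels, eps=..., affine=...) — it has
# no zq_channels or add_conv parameters, so only the `zq_ch is None`
# branch can hardcode it as written.
norm = nn.GroupNorm(num_groups=32, num_channels=64, eps=1e-6, affine=True)
x = torch.randn(2, 64, 8, 8)
y = norm(x)  # normalizes over groups of channels; shape is preserved
```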
What does this PR do?
Add KVAE 1.0 from the Kandinsky team. git repo.
Before submitting: see the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu @asomoza