From edc93828e86b5c59b864b32801f981288474548e Mon Sep 17 00:00:00 2001
From: SII-Tian-yi Liang <41582525+tianyilt@users.noreply.github.com>
Date: Tue, 10 Feb 2026 17:13:33 +0800
Subject: [PATCH] Add citation section to README

Added citation section with reference details for MOVA.
---
 README.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/README.md b/README.md
index f379e5d..dc39444 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ We introduce **MOVA** (**MO**SS **V**ideo and **A**udio), a foundation model des
 - **Asymmetric Dual-Tower Architecture**: Leverages the power of pre-trained video and audio towers, fused via a bidirectional cross-attention mechanism for rich modality interaction.

 ## 🔥News!!!
+- 2026/02/10: 🎉We released **MOVA** [technical report](https://arxiv.org/abs/2602.08794) and update [inference workflow](https://github.com/OpenMOSS/MOVA/pull/29).
 - 2026/01/29: 🎉We released **MOVA**, an open-source foundation model for high-fidelity synchronized video–audio generation!!!

 ## 🎬Demo
@@ -285,6 +286,23 @@ All peak usage numbers below are measured on **360p, 8-second** video training s
 - [x] Technical Report
 - [ ] Diffusers Integration

+## Citation
+If you find our work helpful, please cite us.
+```
+@misc{openmoss_mova_2026,
+  title = {MOVA: Towards Scalable and Synchronized Video-Audio Generation},
+  author = {{SII-OpenMOSS Team} and Donghua Yu and Mingshu Chen and Qi Chen and Qi Luo and Qianyi Wu and Qinyuan Cheng and Ruixiao Li and Tianyi Liang and Wenbo Zhang and Wenming Tu and Xiangyu Peng and Yang Gao and Yanru Huo and Ying Zhu and Yinze Luo and Yiyang Zhang and Yuerong Song and Zhe Xu and Zhiyu Zhang and Chenchen Yang and Cheng Chang and Chushu Zhou and Hanfu Chen and Hongnan Ma and Jiaxi Li and Jingqi Tong and Junxi Liu and Ke Chen and Shimin Li and Songlin Wang and Wei Jiang and Zhaoye Fei and Zhiyuan Ning and Chunguo Li and Chenhui Li and Ziwei He and Zengfeng Huang and Xie Chen and Xipeng Qiu},
+  year = {2026},
+  month = feb,
+  eprint = {2602.08794},
+  archivePrefix = {arXiv},
+  primaryClass = {cs.CV},
+  doi = {10.48550/arXiv.2602.08794},
+  url = {https://arxiv.org/abs/2602.08794},
+  note = {Technical report. Corresponding authors: Xie Chen and Xipeng Qiu. Project leaders: Qinyuan Cheng and Tianyi Liang.}
+}
+```
+
 ## Acknowledgement
 We would like to thank the contributors to [Wan](https://github.com/Wan-Video/Wan2.2), [SGLang](https://github.com/sgl-project/sglang), [diffusers](https://huggingface.co/docs/diffusers/en/index), [HuggingFace](https://huggingface.co/), [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio), and [HunyuanVideo-Foley](https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley) for their great open-source work, which is helpful to this project.