Skip to content

[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.

License

Notifications You must be signed in to change notification settings

ByteDance-Seed/VideoWorld

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 

Repository files navigation

VideoWorld Series: Learning World Models from Unlabeled Videos

This repository hosts the VideoWorld research project series, exploring how deep generative models can learn complex world knowledge, physics, and dynamics solely from visual inputs.

This project encompasses two main iterations:

  • VideoWorld (CVPR 2025): The first generation model using Latent Dynamics Model (LDM) for knowledge acquisition.
  • VideoWorld 2: The second generation focusing on transferable knowledge using disentangled Latent Dynamics Model (dLDM).

📂 Projects Overview

Accepted by CVPR 2025

Highlights:

  • demonstrates that video generation models can learn complex rules (e.g., Go game) without reward signals.
  • Introduces the Latent Dynamics Model (LDM) to compress visual changes into informative latent codes.
  • Achieves 5-dan professional level in Go and strong performance in robotic control tasks (CALVIN).

Highlights:

  • Focuses on transferable knowledge and long-horizon tasks in real-world settings.
  • Proposes the disentangled Latent Dynamics Model (dLDM) to decouple action dynamics from visual appearance.
  • Significant improvements in task success rates (up to 70%) on challenging handcraft benchmarks.

🚀 Getting Started

Since the two projects were developed with slightly different dependencies to maintain reproducibility, we recommend using separate environments for each.

Prerequisites

  • Anaconda or Miniconda
  • NVIDIA GPU with CUDA support

Setup for VideoWorld

Please refer to VideoWorld/README.md for detailed instructions.

conda create -n videoworld python=3.10 -y
conda activate videoworld
# Detailed install steps inside the sub-folder

Setup for VideoWorld 2

Please refer to VideoWorld2/README.md for detailed instructions.

conda create -n videoworld2 python=3.11 -y
conda activate videoworld2
# Detailed install steps inside the sub-folder

📚 Citation

If you use this codebase or models in your research, please cite our papers:

VideoWorld (CVPR 2025)

@article{ren2025videoworld,
  title={VideoWorld: Exploring Knowledge Learning from Unlabeled Videos},
  author={Ren, Zhongwei and Wei, Yunchao and Guo, Xun and Zhao, Yao and Kang, Bingyi and Feng, Jiashi and Jin, Xiaojie},
  journal={arXiv preprint arXiv:2501.09781},
  year={2025}
}

VideoWorld 2

@article{ren2025videoworld2,
  title={VideoWorld 2: Learning Transferable Knowledge from Real-world Video},
  author={Ren, Zhongwei and Wei, Yunchao and Yu, Xiao and Luo, Guixun and Zhao, Yao and Feng, Jiashi and Jin, Xiaojie},
  journal={arXiv preprint},
  year={2025}
}

📄 License

This project is released under the LICENSE terms.

About

[CVPR 2025] VideoWorld is a simple generative model that learns purely from unlabeled videos—much like how babies learn by observing their environment.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages