Synthetic Video Enhances Physical Fidelity in Video Synthesis

Qi Zhao1 Xingyu Ni1, 2 Ziyu Wang1, 3 Feng Cheng1 Ziyan Yang1
Lu Jiang1, * Bohan Wang4, *
1ByteDance Seed    2Peking University    3ShanghaiTech University    4National University of Singapore
*Corresponding Authors   

Abstract

We investigate how to enhance the physical fidelity of video generation models by leveraging synthetic videos generated via standard computer graphics techniques. These rendered videos respect real-world physics -- such as maintaining 3D consistency -- thereby serving as a valuable resource that can potentially improve video generation models. To harness this potential, we propose a solution that curates and integrates synthetic data while introducing a method to transfer its physical realism to the model, minimizing unwanted artifacts. Through experiments on three representative tasks emphasizing physical consistency, we demonstrate its effectiveness in enhancing physical fidelity. While our model still lacks a deep understanding of physics, our work offers one of the first empirical demonstrations that synthetic video enhances physical fidelity in video synthesis.

Synthetic Video Data from CGI

Our synthetic video data generation pipeline first plans the scene layout: 3D assets, characters, animations, background, and camera motion. We then render each planned scene with graphics engines, producing videos that specifically target the physical fidelity of video generation models.
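As a rough illustration of the planning stage described above, the sketch below samples a scene plan with the five ingredients named in the text (assets, character, animation, background, camera motion). All class names, field names, and option lists here are hypothetical placeholders, not the authors' actual pipeline, and the rendering step is left out entirely.

```python
import random
from dataclasses import dataclass
from typing import List

# Hypothetical option pools; the real asset libraries are not described in the text.
ASSETS = ["chair", "table", "lamp", "bicycle"]
CHARACTERS = ["robot", "dancer", "astronaut"]
ANIMATIONS = ["walking", "jumping", "spinning"]
BACKGROUNDS = ["city street", "forest", "studio"]
CAMERA_MOTIONS = ["static", "orbit", "dolly_in"]


@dataclass
class ScenePlan:
    """One planned scene, to be handed to a renderer (not shown here)."""
    assets: List[str]
    character: str
    animation: str
    background: str
    camera_motion: str


def sample_scene_plan(rng: random.Random) -> ScenePlan:
    """Draw one scene plan by sampling each layout component independently."""
    n_assets = rng.randint(1, 3)
    return ScenePlan(
        assets=rng.sample(ASSETS, n_assets),  # distinct assets, no repeats
        character=rng.choice(CHARACTERS),
        animation=rng.choice(ANIMATIONS),
        background=rng.choice(BACKGROUNDS),
        camera_motion=rng.choice(CAMERA_MOTIONS),
    )


if __name__ == "__main__":
    plan = sample_scene_plan(random.Random(0))
    print(plan)
```

Seeding the random generator makes each plan reproducible, which would let a rendered clip be traced back to the exact layout that produced it.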

Method Overview

Method Pipeline

We introduce a novel synthetic data creation approach that harnesses the modern computer-generated imagery (CGI) production pipeline to improve the physical fidelity of video generation models. It consists of three key components:

  1. A synthetic data curation and generation process that creates physically accurate synthetic videos to augment the real-world video dataset.
  2. A compositional caption generation module that generates fine-grained text descriptions for the synthetic videos.
  3. Mixed training with real-world videos and SimDrop to transfer the physical properties of the synthetic videos while preserving photorealism.
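To make components 2 and 3 more concrete, here is a minimal sketch under stated assumptions. `compose_caption` assembles a caption from per-component scene metadata, which is one plausible reading of "compositional"; `mixed_batch` draws a training batch from real and synthetic pools at a fixed ratio. The function names, the dictionary keys, and the `synthetic_ratio` parameter are all hypothetical, and SimDrop itself is not reproduced because this page does not describe its mechanism.

```python
import random
from typing import Dict, List, Tuple


def compose_caption(plan: Dict) -> str:
    """Build a caption by describing each scene component in turn."""
    subject = f"A {plan['character']} is {plan['animation']}"
    objects = f"near {', '.join(plan['assets'])}"
    place = f"in {plan['background']}"
    return f"{subject} {objects} {place}. Camera motion: {plan['camera']}."


def mixed_batch(
    real: List[str],
    synthetic: List[str],
    batch_size: int,
    synthetic_ratio: float,
    rng: random.Random,
) -> List[Tuple[str, str]]:
    """Sample a batch in which a fixed fraction of clips is synthetic.

    Each item is tagged with its source so that later training logic
    can treat real and synthetic clips differently if needed.
    """
    n_syn = round(batch_size * synthetic_ratio)
    n_real = batch_size - n_syn
    batch = [("real", rng.choice(real)) for _ in range(n_real)]
    batch += [("synthetic", rng.choice(synthetic)) for _ in range(n_syn)]
    rng.shuffle(batch)  # avoid a fixed real-then-synthetic ordering
    return batch


if __name__ == "__main__":
    plan = {
        "character": "robot",
        "animation": "walking",
        "assets": ["a chair", "a table"],
        "background": "a park",
        "camera": "orbit",
    }
    print(compose_caption(plan))
    print(mixed_batch(["r1", "r2"], ["s1"], 8, 0.25, random.Random(0)))
```

Keeping the source tag on every sampled clip is a deliberate choice in this sketch: any technique that handles synthetic clips specially needs to know which clips those are.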

We show that our method improves the physical fidelity of generated videos on three challenging video generation tasks.

Video Comparisons of Physics-Respecting Generations

3D Reconstruction

Our generation model tuned on synthetic data produces videos with markedly better 3D consistency. Because more reference points can be found across frames, the resulting 3D object reconstructions are more accurate and more detailed.

BibTeX

If you find this work useful, please cite it as follows:

@article{zhao2025synthetic,
  title={Synthetic Video Enhances Physical Fidelity in Video Synthesis},
  author={Zhao, Qi and Ni, Xingyu and Wang, Ziyu and Cheng, Feng and Yang, Ziyan and Jiang, Lu and Wang, Bohan},
  journal={arXiv preprint arXiv:2503.20822},
  year={2025}
}