MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

Zhiqiang Xia*, Zhaokang Chen*, Bin Wu, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou (*co-first author; Corresponding Author: benbinwu@tencent.com)

GitHub, Hugging Face, Hugging Face Space, Technical report (coming soon)

What is MuseV

MuseV is a diffusion-based framework for virtual human video generation. It offers the following:

  1. Supports infinite-length generation using a novel Visual Conditioned Parallel Denoising scheme.
  2. Provides a checkpoint for virtual human video generation, trained on a human dataset.
  3. Supports Image2Video, Text2Image2Video, and Video2Video.
  4. Is compatible with the Stable Diffusion ecosystem, including base_model, lora, controlnet, etc.
  5. Supports multi-reference image technology, including IPAdapter, ReferenceOnly, ReferenceNet, and IPAdapterFaceID.
  6. Training code (coming very soon).

Overview of model structure

[Figure: model structure pipeline]

Parallel Denoising

[Figure: parallel denoising pipeline]

Long Video Generation

Source Video Output Video

Text2Video Generation

image video prompt
(masterpiece, best quality, highres:1),(1boy, solo:1),(eye blinks:1.8),(head wave:1.3)
(masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3)
(masterpiece, best quality, highres:1), peaceful beautiful sea scene
(masterpiece, best quality, highres:1), peaceful beautiful sea scene
(masterpiece, best quality, highres:1), playing guitar
(masterpiece, best quality, highres:1), playing guitar
(masterpiece, best quality, highres:1), playing guitar
(masterpiece, best quality, highres:1), playing guitar
(masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3),Chinese ink painting style
(masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3)
(masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3)
(masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3), animate
(masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3)
(masterpiece, best quality, highres:1), peaceful beautiful waterfall, an endless waterfall
(masterpiece, best quality, highres:1), peaceful beautiful river
(masterpiece, best quality, highres:1), peaceful beautiful sea scene
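
Several of the prompts above contain an `{eye_blinks_factor}` placeholder and group terms as `(text:weight)`. The following is a minimal sketch of how such a template could be filled and inspected, assuming the placeholder is substituted with ordinary Python string formatting. The helper names `fill_prompt` and `parse_weights` are hypothetical and are not part of MuseV; how MuseV itself interprets the weights is not specified here.

```python
import re
from typing import List, Tuple

# Prompt template copied from the list above; {eye_blinks_factor} is the
# only placeholder it contains.
TEMPLATE = (
    "(masterpiece, best quality, highres:1),(1girl, solo:1),"
    "(beautiful face, soft skin, costume:1),"
    "(eye blinks:{eye_blinks_factor}),(head wave:1.3)"
)

# Matches one "(text:weight)" group with a non-negative decimal weight.
GROUP = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def fill_prompt(template: str, eye_blinks_factor: float) -> str:
    """Substitute the placeholder (hypothetical helper, not MuseV's API)."""
    return template.format(eye_blinks_factor=eye_blinks_factor)


def parse_weights(prompt: str) -> List[Tuple[str, float]]:
    """Return (text, weight) pairs for every weighted group in the prompt."""
    return [(text.strip(), float(weight)) for text, weight in GROUP.findall(prompt)]


prompt = fill_prompt(TEMPLATE, 1.8)
weights = parse_weights(prompt)

for text, weight in weights:
    print(f"{weight:>4} | {text}")
```

Raising or lowering the value passed for `eye_blinks_factor` changes only the weight attached to the `eye blinks` group, so the same template can be reused across examples with different blink emphasis.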

Video2Video Generation

Image Video