Neural Video Portrait Relighting in Real-time via Consistency Modeling

Longwen Zhang¹,² Qixuan Zhang¹,² Minye Wu¹,³ Jingyi Yu¹ Lan Xu¹

¹ShanghaiTech University ²Deemos Technology ³University of Chinese Academy of Sciences

Abstract

Video portrait relighting is critical in user-facing human photography, especially for immersive VR/AR experiences. Recent advances still fail to recover consistent relit results under dynamic illumination from a monocular RGB stream, because they lack video consistency supervision. In this paper, we propose a neural approach for real-time, high-quality and coherent video portrait relighting, which jointly models semantic, temporal and lighting consistency using a new dynamic OLAT dataset. We propose a hybrid structure and lighting disentanglement in an encoder-decoder architecture, which combines a multi-task and adversarial training strategy for semantic-aware consistency modeling. We adopt a temporal modeling scheme via flow-based supervision to encode the conjugated temporal consistency in a cross manner. We also propose a lighting sampling strategy to model illumination consistency and mutation for natural portrait light manipulation in the real world. Extensive experiments demonstrate the effectiveness of our approach for consistent video portrait light-editing and relighting, even on mobile computing devices.

Pipeline

The training pipeline of our approach. It consists of structure and lighting disentanglement (Sec. 4.1), temporal consistency modeling (Sec. 4.2) and lighting sampling (Sec. 4.3), and generates consistent relit video results from an RGB stream in real time.
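
This page does not spell out the loss used for flow-based temporal supervision, so the following is only a minimal sketch of a generic flow-based temporal consistency term, not the authors' implementation. It warps the previous relit frame into the current frame with an optical flow field and penalizes the remaining difference. The function names, the flow convention (current-to-previous displacement in pixels), the L1 penalty and the optional validity mask are all assumptions made for illustration.

```python
import numpy as np


def warp_with_flow(image, flow):
    """Backward-warp `image` (H, W, C) with `flow` (H, W, 2).

    Assumed convention: flow[y, x] = (dx, dy) is the displacement from pixel
    (x, y) in the target frame to its source location in `image`.
    Sampling is bilinear, with coordinates clamped to the image border.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each source location.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (src_x - x0)[..., None]
    wy = (src_y - y0)[..., None]

    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def temporal_consistency_loss(relit_prev, relit_curr, flow_curr_to_prev,
                              valid_mask=None):
    """Mean L1 difference between the current relit frame and the
    flow-warped previous relit frame (hypothetical loss form).

    `valid_mask` (H, W), if given, zeroes out pixels where the flow is
    unreliable, e.g. occlusions.
    """
    warped = warp_with_flow(relit_prev, flow_curr_to_prev)
    diff = np.abs(warped - relit_curr)
    if valid_mask is None:
        return float(diff.mean())
    mask = valid_mask[..., None].astype(diff.dtype)
    denom = max(float(mask.sum()) * diff.shape[-1], 1.0)
    return float((diff * mask).sum() / denom)
```

In a training loop, a term of this kind would be evaluated on pairs of adjacent relit frames and added to the other objectives. Applying it in both temporal directions (previous to current and current to previous) is one possible reading of the "cross manner" supervision mentioned above, but that reading is an assumption.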

Gallery

Our relighting results under dynamic illuminations. Each triplet includes the input frame and two relit result examples.

Results

YouTube video

Dataset

Apply for Dynamic OLAT Dataset

Code

We will publish the code and data for training (coming soon).

Downloads

Paper (thecvf)
link

arXiv
link

Citation

@InProceedings{Zhang_2021_ICCV,
    author    = {Zhang, Longwen and Zhang, Qixuan and Wu, Minye and Yu, Jingyi and Xu, Lan},
    title     = {Neural Video Portrait Relighting in Real-Time via Consistency Modeling},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {802-812}
}

Acknowledgments

The authors would like to thank all participants of the Light Stage recordings. We also thank the authors of Wang et al. [2020] for providing the results of their method for comparison.

Contact

Longwen Zhang
zhanglw2@shanghaitech.edu.cn

Qixuan Zhang
zhangqx1@shanghaitech.edu.cn