Poster

EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

Jiajian Xie ⋅ Shengyu Zhang ⋅ Mengze Li ⋅ chengfei lv ⋅ Zhou Zhao ⋅ Fei Wu

2025 Poster

Project Page [ Poster] [ OpenReview]

Abstract

Speech-driven 3D facial animation has attracted significant attention due to its wide range of applications in animation production and virtual reality. Recent research has explored speech-emotion disentanglement to enhance facial expressions rather than manually assigning emotions. However, this approach face issues such as feature confusion, emotions weakening and mean-face. To address these issues, we present EcoFace, a framework that (1) proposes a novel collaboration objective to provide a explicit signal for emotion representation learning from the speaker's expressive movements and produced sounds, constructing an audio-visual joint and coordinated emotion space that is independent of speech content. (2) constructs a universal facial motion distribution space determined by speech features and implement speaker-specific generation. Extensive experiments show that our method achieves more generalized and emotionally realistic talking face generation compared to previous methods.

Video

Chat is not available.