EGSTalker: Real-Time Facial Animation Generation Based on 3D Gaussian Splatting

cs.AI updates on arXiv.org, October 13, 12:11

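This paper presents EGSTalker, a real-time audio-driven facial animation framework based on 3D Gaussian Splatting. It needs only 3-5 minutes of training video to generate high-quality facial animation, and it works in two stages, static Gaussian initialization and audio-driven deformation, to produce fast, high-fidelity results.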

arXiv:2510.08587v1 Announce Type: cross

Abstract: This paper presents EGSTalker, a real-time audio-driven talking head generation framework based on 3D Gaussian Splatting (3DGS). Designed to enhance both speed and visual fidelity, EGSTalker requires only 3-5 minutes of training video to synthesize high-quality facial animations. The framework comprises two key stages: static Gaussian initialization and audio-driven deformation. In the first stage, a multi-resolution hash triplane and a Kolmogorov-Arnold Network (KAN) are used to extract spatial features and construct a compact 3D Gaussian representation. In the second stage, we propose an Efficient Spatial-Audio Attention (ESAA) module to fuse audio and spatial cues, while KAN predicts the corresponding Gaussian deformations. Extensive experiments demonstrate that EGSTalker achieves rendering quality and lip-sync accuracy comparable to state-of-the-art methods, while significantly outperforming them in inference speed. These results highlight EGSTalker's potential for real-time multimedia applications.
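
The data flow of the second stage (fuse audio and spatial cues, then predict a per-Gaussian deformation) can be illustrated with a toy sketch. This is not the paper's implementation: plain scaled dot-product cross-attention stands in for the ESAA module, a fixed linear map stands in for the KAN predictor, random vectors stand in for the hash-triplane features, and only position offsets are modeled. Every function and variable name below is hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(spatial_feats, audio_feats):
    """Each Gaussian's spatial feature (query) attends over the audio
    feature vectors (used as both keys and values).
    Assumption: a generic stand-in for ESAA, not the paper's module."""
    dim = len(audio_feats[0])
    fused = []
    for q in spatial_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in audio_feats]
        weights = softmax(scores)
        fused.append([sum(w * a[d] for w, a in zip(weights, audio_feats))
                      for d in range(dim)])
    return fused


def predict_offsets(fused_feats, weight_rows):
    """Map each fused feature to an (x, y, z) offset.
    Assumption: a linear map stands in for the KAN predictor."""
    return [[sum(f * w for f, w in zip(feat, row)) for row in weight_rows]
            for feat in fused_feats]


def deform(positions, offsets):
    """Apply per-Gaussian offsets to the static (stage-one) positions."""
    return [[p + o for p, o in zip(pos, off)]
            for pos, off in zip(positions, offsets)]


# Toy data: 4 Gaussians, 8-dim features, 3 audio feature vectors for one frame.
random.seed(0)
num_gaussians, dim, num_audio = 4, 8, 3
positions = [[random.uniform(-1, 1) for _ in range(3)]
             for _ in range(num_gaussians)]
spatial = [[random.gauss(0, 1) for _ in range(dim)]
           for _ in range(num_gaussians)]
audio = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_audio)]
weight_rows = [[random.gauss(0, 0.01) for _ in range(dim)] for _ in range(3)]

fused = cross_attention(spatial, audio)
deformed = deform(positions, predict_offsets(fused, weight_rows))
```

In this sketch the static positions are computed once and only the offsets change per audio frame, which mirrors the split between static initialization and audio-driven deformation described in the abstract.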
