cs.AI updates on arXiv.org, October 9
Research on an AI Gaze Tracking System Based on RGBD Images

This thesis presents an AI-based gaze tracking system built on RGBD images, which carry both color and depth information. A module based on the Transformer architecture is used to fuse the image features, a combination of RGBD input and Transformers that had not been investigated before. Because existing datasets either lack depth information or only provide gaze-point labels that are unsuitable for gaze angle estimation, a new dataset was created for this task. Several model configurations were trained, validated, and evaluated on three different datasets. Comparing architectures, the study finds that removing the pre-trained GAN module markedly reduces the mean Euclidean error on the ShanghaiTechGaze+ dataset, and that replacing the Transformer module with an MLP improves accuracy further. Results on the ETH-XGaze dataset show the same trend, although a gap to a dedicated model from prior work remains.

💡 **Building an AI gaze tracking system**: The work develops an AI gaze tracking system that uses RGBD images (color plus depth) to estimate the direction of a user's gaze precisely.

🚀 **Novel feature fusion approach**: A module based on the Transformer architecture fuses the features extracted from the RGBD images; according to the thesis, this combination had not previously been explored for gaze tracking and is intended to improve tracking accuracy.
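
As a rough illustration of the fusion step described above, the sketch below treats one RGB feature vector and one depth feature vector as a two-token sequence and fuses them with a small Transformer encoder before regressing a gaze angle. This is a minimal, assumed PyTorch setup; the module names, feature dimension, and yaw/pitch output head are illustrative, not the thesis's actual architecture.

```python
# Hedged sketch: Transformer-based fusion of RGB and depth feature vectors.
import torch
import torch.nn as nn

class TransformerFusionHead(nn.Module):
    def __init__(self, feat_dim=256, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.regressor = nn.Linear(feat_dim, 2)  # e.g. yaw and pitch

    def forward(self, rgb_feat, depth_feat):
        # Treat the two modality features as a length-2 token sequence.
        tokens = torch.stack([rgb_feat, depth_feat], dim=1)  # (B, 2, feat_dim)
        fused = self.encoder(tokens).mean(dim=1)             # (B, feat_dim)
        return self.regressor(fused)                         # (B, 2)

# Example: a batch of 8 feature vectors from separate RGB and depth backbones.
head = TransformerFusionHead()
gaze = head(torch.randn(8, 256), torch.randn(8, 256))
print(gaze.shape)  # torch.Size([8, 2])
```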

📊 **New dataset and model evaluation**: To address the lack of depth information and suitable labels in existing datasets, a new dataset was created, and multiple model configurations were rigorously trained, validated, and evaluated on it and on two existing datasets to assess performance comprehensively.

📉 **Architecture optimization and performance gains**: Experiments show that removing the pre-trained GAN module significantly lowers the mean Euclidean error, and that replacing the Transformer module with an MLP (multilayer perceptron) reduces the error further, demonstrating the effectiveness of the architectural simplification.
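
The MLP replacement mentioned above can be pictured as swapping the Transformer encoder for a plain feed-forward network over the concatenated modality features. Again a hedged sketch under the same assumed dimensions, not the thesis code:

```python
# Hedged sketch: MLP fusion head as a drop-in replacement for the Transformer one.
import torch
import torch.nn as nn

class MLPFusionHead(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # e.g. yaw and pitch
        )

    def forward(self, rgb_feat, depth_feat):
        # Concatenate both modality features and regress the gaze angle directly.
        return self.mlp(torch.cat([rgb_feat, depth_feat], dim=1))

head = MLPFusionHead()
gaze = head(torch.randn(8, 256), torch.randn(8, 256))
print(gaze.shape)  # torch.Size([8, 2])
```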

🎯 **Comparison with prior work**: The proposed models are compared across datasets with earlier results (e.g., Lian et al. and Zhang et al.); while a gap remains on some metrics, the study provides a solid basis and clear directions for further improvement.

arXiv:2510.06298v1 Announce Type: cross Abstract: Subject of this thesis is the implementation of an AI-based Gaze Tracking system using RGBD images that contain both color (RGB) and depth (D) information. To fuse the features extracted from the images, a module based on the Transformer architecture is used. The combination of RGBD input images and Transformers was chosen because it has not yet been investigated. Furthermore, a new dataset is created for training the AI models as existing datasets either do not contain depth information or only contain labels for Gaze Point Estimation that are not suitable for the task of Gaze Angle Estimation. Various model configurations are trained, validated and evaluated on a total of three different datasets. The trained models are then to be used in a real-time pipeline to estimate the gaze direction and thus the gaze point of a person in front of a computer screen. The AI model architecture used in this thesis is based on an earlier work by Lian et al. It uses a Generative Adversarial Network (GAN) to simultaneously remove depth map artifacts and extract head pose features. Lian et al. achieve a mean Euclidean error of 38.7mm on their own dataset ShanghaiTechGaze+. In this thesis, a model architecture with a Transformer module for feature fusion achieves a mean Euclidean error of 55.3mm on the same dataset, but we show that using no pre-trained GAN module leads to a mean Euclidean error of 30.1mm. Replacing the Transformer module with a Multilayer Perceptron (MLP) improves the error to 26.9mm. These results are coherent with the ones on the other two datasets. On the ETH-XGaze dataset, the model with Transformer module achieves a mean angular error of 3.59° and without Transformer module 3.26°, whereas the fundamentally different model architecture used by the dataset authors Zhang et al. achieves a mean angular error of 2.04°. On the OTH-Gaze-Estimation dataset created for...
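
The abstract quotes two evaluation metrics: mean Euclidean error in millimetres on gaze-point datasets such as ShanghaiTechGaze+, and mean angular error in degrees on ETH-XGaze. Below is a minimal sketch of how such metrics are commonly computed, assuming NumPy arrays of 3D gaze points and gaze direction vectors; the thesis's own evaluation code may differ.

```python
# Hedged sketch of the two metrics named in the abstract.
import numpy as np

def mean_euclidean_error_mm(pred_points, true_points):
    """Mean Euclidean distance between predicted and true gaze points (N, 3), in mm."""
    return np.linalg.norm(pred_points - true_points, axis=1).mean()

def mean_angular_error_deg(pred_dirs, true_dirs):
    """Mean angle, in degrees, between predicted and true gaze direction vectors (N, 3)."""
    pred = pred_dirs / np.linalg.norm(pred_dirs, axis=1, keepdims=True)
    true = true_dirs / np.linalg.norm(true_dirs, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()
```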


Related tags

AI Gaze Tracking, RGBD Images, Transformer, Deep Learning, Computer Vision, Gaze Tracking