cs.AI updates on arXiv.org, September 11
PianoVAM: A Multimodal Piano Performance Dataset

 

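This paper introduces PianoVAM, a comprehensive piano performance dataset comprising video, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded on a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice, together with synchronized top-view video. The paper discusses the data collection process and the alignment of the different modalities, and presents benchmark results for audio-only and audio-visual piano transcription.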

arXiv:2509.08800v1 Announce Type: cross Abstract: The multimodal nature of music performance has driven increasing interest in data beyond the audio domain within the music information retrieval (MIR) community. This paper introduces PianoVAM, a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions, alongside synchronized top-view videos in realistic and varied performance conditions. Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering annotation algorithm. We discuss the challenges encountered during data collection and the alignment process across different modalities. Additionally, we describe our fingering annotation method based on hand landmarks extracted from videos. Finally, we present benchmarking results for both audio-only and audio-visual piano transcription using the PianoVAM dataset and discuss additional potential applications.

Related tags

Piano performance, multimodal dataset, music information retrieval