A New Framework for Sound Recognition: Improving Audio Classification Accuracy

cs.AI updates on arXiv.org, September 25

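This paper proposes a framework that lets neural models "think" while listening to everyday sounds, thereby improving audio classification performance. Drawing on recent advances in the reasoning capabilities of large language models, it addresses two core questions: how to incorporate thinking into existing audio classification pipelines to improve performance, and whether a new architecture can be designed from the ground up to support both thinking and test-time scaling. Experiments show improved classification accuracy in both settings.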

arXiv:2509.19676v1 (announce type: cross)

Abstract: We propose a framework that enables neural models to "think while listening" to everyday sounds, thereby enhancing audio classification performance. Motivated by recent advances in the reasoning capabilities of large language models, we address two central questions: (i) how can thinking be incorporated into existing audio classification pipelines to enable reasoning in the category space and improve performance, and (ii) can a new architecture be designed from the ground up to support both thinking and test-time scaling? We demonstrate that in both settings, our models exhibit improved classification accuracy. Leveraging test-time scaling, we observe consistent gains as the number of sampled traces increases. Furthermore, we evaluate two open-source reasoning models, GPT-OSS-20B and Qwen3-14B, showing that while such models are capable of zero-shot reasoning, a lightweight approach--retraining only the embedding matrix of a frozen, smaller model like GPT-2--can surpass the performance of billion-parameter text-based reasoning models.
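The abstract reports gains as the number of sampled traces grows, but it does not say how the traces are combined. The sketch below assumes simple majority voting over the labels predicted by each trace, which is one common way to do test-time scaling and may differ from the paper's method. The label set, the `sample_trace` stand-in, and the `p_correct` value are all hypothetical and exist only to make the toy simulation runnable.

```python
import random
from collections import Counter

# Hypothetical label set, for illustration only.
LABELS = ["dog_bark", "siren", "rain", "speech", "engine"]


def sample_trace(true_label, p_correct, rng):
    """Stand-in for one sampled reasoning trace (assumption, not the paper's model).

    Returns the true label with probability p_correct, otherwise a
    uniformly chosen wrong label.
    """
    if rng.random() < p_correct:
        return true_label
    return rng.choice([label for label in LABELS if label != true_label])


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy_with_n_traces(n_traces, n_clips=2000, p_correct=0.4, seed=0):
    """Simulated accuracy when n_traces predictions per clip are aggregated by voting."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_clips):
        true_label = rng.choice(LABELS)
        votes = [sample_trace(true_label, p_correct, rng) for _ in range(n_traces)]
        if majority_vote(votes) == true_label:
            correct += 1
    return correct / n_clips


if __name__ == "__main__":
    for n in (1, 3, 7, 15):
        print(n, round(accuracy_with_n_traces(n), 3))
```

In this toy setting each trace is right more often than it picks any single wrong label, so voting over more traces tends to raise accuracy. Real traces are correlated, so the actual gains depend on the model and the data.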

Related tags

audio classification, neural models, thinking ability, test-time scaling
