Articles related to "Model Performance"
Towards Robust Mathematical Reasoning
cs.AI updates on arXiv.org
2025-11-05T05:31:07.000000Z
Full attention, no slowdown on complex reasoning: MiniMax M2 turns the Agent into an "executable capability"
PaperWeekly
2025-11-04T11:45:34.000000Z
Not All Instances Are Equally Valuable: Towards Influence-Weighted Dataset Distillation
cs.AI updates on arXiv.org
2025-11-03T05:19:25.000000Z
Interaction as Intelligence Part II: Asynchronous Human-Agent Rollout for Long-Horizon Task Training
cs.AI updates on arXiv.org
2025-11-03T05:18:31.000000Z
Lean4Physics: Comprehensive Reasoning Framework for College-level Physics in Lean4
cs.AI updates on arXiv.org
2025-10-31T04:01:55.000000Z
Measuring what matters: How offline evaluation of GitHub MCP Server works
The GitHub Blog
2025-10-30T22:00:37.000000Z
New tools in Google AI Studio to explore, debug and share logs
Google AI News
2025-10-30T17:20:03.000000Z
Accelerate Scaling of LLM Finetuning via Quantifying the Coverage and Depth of Instruction Set
cs.AI updates on arXiv.org
2025-10-29T04:32:17.000000Z
[Programmers] A third-grade math problem stumped the large models; weren't they all supposed to be great at mathematical reasoning?
V2EX
2025-10-29T03:59:33.000000Z
Meta's new research, Free Transformer: at only 3% added cost, large models generate more diverse and more capable answers
我爱计算机视觉
2025-10-24T09:14:31.000000Z
SAID: Empowering Large Language Models with Self-Activating Internal Defense
cs.AI updates on arXiv.org
2025-10-24T04:22:55.000000Z
CreativityPrism: A Holistic Benchmark for Large Language Model Creativity
cs.AI updates on arXiv.org
2025-10-24T04:22:33.000000Z
The Lock-In Phase Hypothesis: Identity Consolidation as a Precursor to AGI
cs.AI updates on arXiv.org
2025-10-24T04:17:19.000000Z
Beyond Reactivity: Measuring Proactive Problem Solving in LLM Agents
cs.AI updates on arXiv.org
2025-10-23T04:11:32.000000Z
Alibaba Tongyi's Qwen3-VL adds two dense model sizes, 2B and 32B, which can run even on phones
IT之家
2025-10-22T06:11:36.000000Z
WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality
cs.AI updates on arXiv.org
2025-10-22T04:23:52.000000Z
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
cs.AI updates on arXiv.org
2025-10-21T04:28:32.000000Z
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
cs.AI updates on arXiv.org
2025-10-21T04:28:24.000000Z
The Road Less Traveled: Enhancing Exploration in LLMs via Sequential Sampling
cs.AI updates on arXiv.org
2025-10-20T04:14:12.000000Z
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
cs.AI updates on arXiv.org
2025-10-20T04:12:32.000000Z