Benchmark Platforms_Fishai

Articles related to "Benchmark Platforms"

InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research

cs.AI updates on arXiv.org 2025-11-03T05:18:27.000000Z

PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature

cs.AI updates on arXiv.org 2025-10-14T04:09:48.000000Z

SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation On Diverse Modalities

cs.AI updates on arXiv.org 2025-10-08T04:15:26.000000Z

TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

cs.AI updates on arXiv.org 2025-08-19T04:21:11.000000Z

BALSAM: A Platform for Benchmarking Arabic Large Language Models

cs.AI updates on arXiv.org 2025-07-31T04:48:13.000000Z

PhysGym: Benchmarking LLMs in Interactive Physics Discovery with Controlled Priors

cs.AI updates on arXiv.org 2025-07-22T04:34:13.000000Z

GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning

cs.AI updates on arXiv.org 2025-07-08T04:33:41.000000Z

Copyright © 2019 FISHAI. All Rights Reserved.