PutnamGAP Dataset

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

cs.AI updates on arXiv.org 2025-08-13T04:14:51.000000Z