Evaluation_Fishai

Trending

Articles related to "Evaluation"

FreeSliders: Training-Free, Modality-Agnostic Concept Sliders for Fine-Grained Diffusion Control in Images, Audio, and Video

cs.AI updates on arXiv.org 2025-11-05T05:18:58.000000Z

Webinar recap: Eval best practices

Braintrust Blog 2025-11-05T04:39:32.000000Z

Scalable Oversight via Partitioned Human Supervision

cs.AI updates on arXiv.org 2025-10-28T04:14:32.000000Z

PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading

cs.AI updates on arXiv.org 2025-10-28T04:14:09.000000Z

Harnessing the Power of Large Language Models for Software Testing Education: A Focus on ISTQB Syllabus

cs.AI updates on arXiv.org 2025-10-28T04:14:09.000000Z

Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence

cs.AI updates on arXiv.org 2025-10-28T04:11:10.000000Z

Learning "Partner-Aware" Collaborators in Multi-Party Collaboration

cs.AI updates on arXiv.org 2025-10-28T04:02:55.000000Z

List of lists of project ideas in AI Safety

LessWrong (少点错误) 2025-10-27T08:42:17.000000Z

How to Write Good AI Prompts?

Juejin AI (掘金人工智能) 2025-10-24T19:00:52.000000Z

Evaluating Latent Knowledge of Public Tabular Datasets in Large Language Models

cs.AI updates on arXiv.org 2025-10-24T04:27:00.000000Z

WebDevJudge: Evaluating (M)LLMs as Critiques for Web Development Quality

cs.AI updates on arXiv.org 2025-10-22T04:23:52.000000Z

Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

cs.AI updates on arXiv.org 2025-10-22T04:17:01.000000Z

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs

cs.AI updates on arXiv.org 2025-10-21T04:18:41.000000Z

DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios

cs.AI updates on arXiv.org 2025-10-20T04:14:11.000000Z

StressTransfer: Stress-Aware Speech-to-Speech Translation with Emphasis Preservation

cs.AI updates on arXiv.org 2025-10-16T04:26:14.000000Z

Scheming Ability in LLM-to-LLM Strategic Interactions

cs.AI updates on arXiv.org 2025-10-16T04:23:05.000000Z

Do Large Language Models Respect Contracts? Evaluating and Enforcing Contract-Adherence in Code Generation

cs.AI updates on arXiv.org 2025-10-15T04:34:54.000000Z

Andrew Ng's New Agentic AI Course: A Hands-On Guide to Building Agent Workflows, Where GPT-3.5 Beating GPT-4 Comes Easily

QbitAI (量子位) 2025-10-14T09:14:35.000000Z

How to Evaluate Your RAG Pipeline with Synthetic Data?

MarkTechPost@AI 2025-10-13T21:33:59.000000Z

Objective Features Extracted from Motor Activity Time Series for Food Addiction Analysis Using Machine Learning - A Pilot Study

cs.AI updates on arXiv.org 2025-10-10T04:20:58.000000Z

Copyright © 2019 FISHAI. All Rights Reserved