Evaluation Metrics_Fishai

Articles related to "Evaluation Metrics"

Driving scenario generation and evaluation using a structured layer representation and foundational models

cs.AI updates on arXiv.org 2025-11-05T05:30:38.000000Z

SecureReviewer: Enhancing Large Language Models for Secure Code Review through Secure-aware Fine-tuning

cs.AI updates on arXiv.org 2025-10-31T04:08:08.000000Z

The Quest for Reliable Metrics of Responsible AI

cs.AI updates on arXiv.org 2025-10-31T04:05:01.000000Z

ClaimGen-CN: A Large-scale Chinese Dataset for Legal Claim Generation

cs.AI updates on arXiv.org 2025-10-28T04:14:40.000000Z

PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

cs.AI updates on arXiv.org 2025-10-23T04:15:20.000000Z

Invoice Information Extraction: Methods and Performance Evaluation

cs.AI updates on arXiv.org 2025-10-20T04:09:45.000000Z

On the Design and Evaluation of Human-centered Explainable AI Systems: A Systematic Review and Taxonomy

cs.AI updates on arXiv.org 2025-10-15T04:39:34.000000Z

What Is Your Agent's GPA? A Framework for Evaluating Agent Goal-Plan-Action Alignment

cs.AI updates on arXiv.org 2025-10-13T04:09:00.000000Z

ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models

cs.AI updates on arXiv.org 2025-10-08T04:07:25.000000Z

WebRenderBench: Enhancing Web Interface Generation through Layout-Style Consistency and Reinforcement Learning

cs.AI updates on arXiv.org 2025-10-07T04:07:46.000000Z

Reward Models are Metrics in a Trench Coat

cs.AI updates on arXiv.org 2025-10-06T04:28:23.000000Z

Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025

cs.AI updates on arXiv.org 2025-10-03T04:18:38.000000Z

FINCH: Financial Intelligence using Natural language for Contextualized SQL Handling

cs.AI updates on arXiv.org 2025-10-03T04:18:08.000000Z

Can LLMs Refuse Questions They Do Not Know? Measuring Knowledge-Aware Refusal in Factual Tasks

cs.AI updates on arXiv.org 2025-10-03T04:17:58.000000Z

Mailbag: How to Bootstrap Labels for Relevant Docs in Search

https://eugeneyan.com/rss 2025-09-30T11:12:10.000000Z

Copyright © 2019 FISHAI. All Rights Reserved