Communications of the ACM - Artificial Intelligence
AI Trust Evaluation Tool

 

Researchers have developed a software tool called VizTrust that can automatically identify and locate, in real time, the key trust-building moments in conversations between artificial intelligence (AI) and humans. Unlike traditional post-conversation satisfaction surveys, VizTrust analyzes the dialogue turn by turn and precisely flags the conversational points that affect trust, helping AI systems earn user trust more effectively. By assessing the AI's performance along the dimensions of competence, benevolence, integrity, and predictability, the tool provides strong support for improving trust in human-AI interaction.

📊 **VizTrust transforms how AI trust is evaluated**: Traditionally, assessing whether an AI has earned a user's trust has relied on manual methods such as post-conversation questionnaires, interviews, and focus groups. VizTrust instead uses software to identify and locate trust-building moments in human-AI dialogue precisely, automatically, and in real time, greatly improving the efficiency and accuracy of evaluation.

🔍 **Real-time, turn-by-turn trust analysis**: VizTrust directly flags the conversational "turns" (the AI's responses) that affect user trust. Researchers and developers can track how trust shifts as the conversation unfolds, rather than relying on the user's memory of the whole exchange, and the AI can adjust its strategy during the interaction to build and reinforce trust.

⚖️ **Trust measured along multiple dimensions**: The tool is based on a clinical definition of "trust" and scores the AI on four core dimensions: competence, benevolence, integrity, and predictability. Quantifying these four dimensions lets VizTrust present a comprehensive, objective picture of the AI's potential trustworthiness.

💡 **Broad application prospects**: VizTrust's real-time trust assessment has great potential across many fields. In healthcare, for example, it could monitor how much patients trust digital health assistants; in education, AI tutors could adjust their teaching style and tone as a student's trust fluctuates. This provides data-driven support for optimizing human-AI interaction.

Today, evaluating the extent to which artificial intelligence (AI) successfully gains a user's trust is still performed manually, as it long has been, through follow-up surveys, human-to-human interviews, and focus groups, according to researchers at Binghamton University, the University of Hawaii, and Clemson University.

Despite the billions of lines of code written so that a large language model (LLM) can win a human's trust, in the end an overall measure of "trust" is a blunt instrument.

These researchers say a software tool they developed sharpens, automates, and promptly identifies the precise location of trust-building turns in a human/AI conversation. (The first "turn" in a human/AI conversation is a query from the human; the answer from the AI is also a turn; the collection of these turns over time makes up the whole conversation.)
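The turn structure itself is simple; a minimal Python sketch (the field names below are illustrative, not drawn from the VizTrust paper) looks like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Turn:
    speaker: str  # "human" or "ai"
    text: str     # the query or the response

# A conversation is simply the ordered list of turns; VizTrust-style analysis
# attaches trust cues to each AI turn as the list grows.
conversation: List[Turn] = [
    Turn("human", "Can you help me plan a low-sodium diet?"),
    Turn("ai", "Of course. Do you have any other dietary restrictions?"),
]
```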

Usually the manual survey is conducted immediately after the human/AI conversation is completed, and measures the overall effect the human/AI conversation has on trust. But Binghamton doctoral candidate Xin Wang said the world needs trust-recognition software that can directly flag, in real time, the precise location of a trust-affecting turn during a human/AI conversation.

“By identifying turns independently, VizTrust”—the app, named after its dashboard “viz”ualization of trust parameters—“pinpoints when a conversation turns toward or away from trust,” said Wang.

The entire process of trust-building with AIs is described in “VizTrust: A Visual Analytics Tool for Capturing User Trust Dynamics in Human-AI Communication,” presented in April at the ACM Conference on Human Factors in Computing Systems in Yokohama, Japan.

Wang claims her software is the world’s first “trust-tracking app” between human and AI partners that unfolds turn-by-turn in real time, rather than relying on what surveyed humans remember after their conversations with AIs are completed.

For her prototype, Wang first searched for an open-source clinical definition of "trust," which she found in Principles of Social Psychology at the Open Textbook Library, a collection of peer-reviewed textbooks available worldwide under Creative Commons licenses. Using this definition, the AI can be measured on four scales: competence, benevolence, integrity, and predictability, all of which indicate a human's potential to trust an AI. Real-world applications could substitute other definitions of "trust" for the one used in the VizTrust prototype.

“Measurable indicators of human trust in AI have many applications,” according to Zicheng Zhu, who was not involved in the research, and serves as a research specialist in AI assistants and human well-being at the National University of Singapore’s AI for Social Good lab.

For instance, “In healthcare, a trust-assessment system would monitor if patients feel safe and understood by digital health assistants. In education, it can help AI tutors adjust their explanations and tone of voice when a student’s trust begins to waver,” Wang said. “Manual surveys still provide valuable insights, especially when capturing subjective experiences or emotions. However, our software shows exactly when and how trust was affected, turn-by-turn. A whole conversation can be evaluated this way in real time with supporting graphics and text for quick comprehension.”

Evaluating for Trust

The VizTrust LLM-based software has a front end managed by Meta’s Llama LLM, which carries on a conversation with the human. At the end of each turn, the dialog is passed to multiple Mixtral LLM agents, which were designed to be used together. The Mixtral agents evaluate the four trust parameters; each trust-specific LLM agent processes conversation turns in real time to assess the AI system’s impact on user trust. Rather than relying on a single powerful LLM, the researchers emphasized their use of multi-agent collaboration to extract trust-related cues from dialogue using interpretable, context-aware techniques. The dialog is also passed to a suite of Python programs that analyze the human’s behavior. After each conversation turn has been analyzed, it is visualized graphically in real time on the dashboard for possible modification by the AI stakeholders.
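To make that flow concrete, here is a minimal sketch of the per-turn evaluation loop. The function names (`chat_model`, `trust_agent`) and the 1-to-5 scoring are assumptions for illustration, with the LLM calls stubbed out; in VizTrust itself the front end is a Llama model and each trust agent is a Mixtral-based LLM.

```python
# Minimal sketch of the per-turn evaluation flow described above.
TRUST_DIMENSIONS = ["competence", "benevolence", "integrity", "predictability"]

def chat_model(user_message: str) -> str:
    """Stand-in for the conversational front end (Llama in VizTrust)."""
    return f"(assistant reply to: {user_message})"

def trust_agent(dimension: str, history: list[dict]) -> dict:
    """Stand-in for one trust agent (a Mixtral LLM in VizTrust).

    The real agent is prompted to rate the latest AI turn on its dimension
    and to quote the span of text supporting that rating.
    """
    return {"dimension": dimension, "score": 3, "evidence": history[-1]["text"][:40]}

def run_turn(history: list[dict], user_message: str) -> dict:
    history.append({"speaker": "human", "text": user_message})
    reply = chat_model(user_message)
    history.append({"speaker": "ai", "text": reply})
    # Fan the updated dialog out to one agent per trust dimension.
    ratings = [trust_agent(dim, history) for dim in TRUST_DIMENSIONS]
    return {"reply": reply, "ratings": ratings}

history: list[dict] = []
result = run_turn(history, "Can you explain how my data will be used?")
print(result["ratings"])
```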

“You can review turns that led to changes in trust and use these insights to improve the design and capabilities of the AI system, guiding interactions toward more trustworthy and effective human-AI communications,” said Wang.

Measuring the amount of trust attained by the AI relies on six LLMs, all open-source Mixtral models: four specialized to measure one variety of trust each; an initializing LLM that establishes configurations at the beginning of a conversation; and a finalizing LLM that collects the rating scores from each trust agent and assembles supporting evidence from the trust-bearing conversation turns into real-time visualizations, shown at the upper left of the dashboard in the figure below.
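A sketch of how the finalizing step could fold the four agents' per-turn ratings into the per-dimension timelines the dashboard plots; the record format, the 1-to-5 scale, and the example evidence strings are assumptions for illustration, not details from the paper.

```python
from collections import defaultdict

def update_timeline(timeline: dict, turn_ratings: list[dict]) -> None:
    """Append this turn's score for each trust dimension to its series."""
    for rating in turn_ratings:
        timeline[rating["dimension"]].append(rating["score"])

timeline: dict = defaultdict(list)
update_timeline(timeline, [
    {"dimension": "competence", "score": 4, "evidence": "answered the question accurately"},
    {"dimension": "benevolence", "score": 3, "evidence": "acknowledged the user's concern"},
    {"dimension": "integrity", "score": 4, "evidence": "disclosed its uncertainty"},
    {"dimension": "predictability", "score": 3, "evidence": "tone consistent with prior turns"},
])
print(dict(timeline))  # one growing series per trust dimension, ready to plot
```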

The VizTrust dashboard tracks the “trust” shown in an expert AI by a human novice in four trust categories. Visualizations express the “trustworthiness” of the AI as perceived by the human after each turn of their Q&A conversation. Unlike traditional surveys, which address the results of an entire conversation after its conclusion, the VizTrust dashboard explores trust “cues” at each turn in the conversation.
Credit: Binghamton University, State University of New York

The upper right of the dashboard is dedicated to measuring the human’s “engagement” with the AI. Engagement is measured using two factors: the length of the human prompt (longer prompts indicate greater openness to engage with the AI), and the “informativeness” of each word in the prompt.
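A rough sketch of those two engagement signals, using a toy stopword list as a stand-in for word informativeness; the actual VizTrust metric may be computed differently.

```python
# Two simple engagement signals: prompt length and the share of content words.
STOPWORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "i", "you", "please"}

def engagement_signals(prompt: str) -> dict:
    words = prompt.lower().split()
    content_words = [w for w in words if w not in STOPWORDS]
    return {
        "length": len(words),  # longer prompts suggest greater openness to engage
        "informativeness": len(content_words) / max(len(words), 1),
    }

print(engagement_signals("Please explain how the medication interacts with grapefruit"))
```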

The bottom left of the dashboard graphic shows the emotion—anger, fear, joy, love, sadness, surprise—expressed with each turn. The bottom right plots the “theory of user politeness,” including slots for apologizing, gratitude, deference, factuality, and hedges.
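A simplified sketch of flagging politeness cues with keyword matching; the markers and patterns below are illustrative assumptions, and the tool's own emotion and politeness classifiers are presumably model-based rather than rule-based.

```python
import re

# Keyword patterns for a few of the politeness cues named above;
# "factuality" is omitted because it is not easily caught by keywords.
POLITENESS_MARKERS = {
    "apologizing": r"\b(sorry|apolog\w+)\b",
    "gratitude": r"\b(thanks?|thank you|grateful)\b",
    "deference": r"\b(would you|could you|if you don't mind)\b",
    "hedges": r"\b(maybe|perhaps|i think|possibly)\b",
}

def politeness_cues(utterance: str) -> dict:
    text = utterance.lower()
    return {cue: bool(re.search(pattern, text)) for cue, pattern in POLITENESS_MARKERS.items()}

print(politeness_cues("Sorry to bother you, but could you maybe check that figure again?"))
```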

In addition, alternative user interfaces can plug users directly into the AI for human/AI conversations, into the dashboard where trust-based analytics are displayed, or into a stakeholder module that tracks specific trust-level milestones during the development of an application. Design stakeholders also gain detailed information on trend-changing points, as supporting evidence for why each trust score changed from the previous turn.

“Turn-by-turn trust evaluation represents a significant step forward for AI in science and user research, offering a transformative approach to measuring human trust—a crucial factor in human-AI interaction,” said Zhu.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.
