AWS Machine Learning Blog 08月15日
Citations with Amazon Nova understanding models
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

大型语言模型(LLMs)在日常和企业应用中日益普及,但其“幻觉”和提供错误信息的倾向带来了信任问题。借鉴人类专家的模式,LLMs通过展示思考过程和引用可靠来源来增强可信度。本文介绍了如何通过特定提示(prompting)指导Amazon Nova理解模型提供引用,并阐述了评估这些引用准确性的方法。引用不仅能确保事实准确性、建立透明度、支持伦理实践,还能提升用户体验、弥合LLMs局限性,并满足学术和专业标准。文章以Amazon Nova Pro为例,展示了如何通过包含引用格式的指令,使其在回答问题时附带原文摘录,从而提高回答的可验证性和可靠性。

💡 LLMs的信任挑战源于“幻觉”现象,即模型可能自信地提供不准确的信息。为了解决这个问题,LLMs需要像人类专家一样,能够解释其推理过程并提供信息来源的引用,从而使用户能够验证其回答的准确性。

📝 引用在LLM应用中至关重要,它能确保事实准确性,减少错误信息;建立用户信任和透明度,使用户能交叉核对信息来源;支持伦理实践,尊重知识产权并避免抄袭;提升可用性,方便用户查找相关资料;并能弥合LLMs因无法访问实时数据或准确记忆训练源而产生的局限性。

🚀 Amazon Nova理解模型,包括Nova Micro、Lite、Pro和Premier,可以通过精心的提示工程来生成包含引用的响应。通过在提示中明确指示模型引用来源并指定输出格式,可以引导模型从提供的上下文中提取精确的文本片段作为引用,从而提高回答的可信度和可追溯性。

📊 评估LLM生成引用的准确性是关键一步。文中提到了使用“LLM-as-a-judge”技术,即利用另一个LLM(如Anthropic Claude 3.5 Sonnet)来评估模型响应的多个指标,包括忠实度、正确性、连贯性等,以量化和优化AI模型的表现,确保引用的可靠性。

🌟 通过实例展示,当要求Amazon Nova Pro根据提供的股东信回答问题时,模型能够准确地提取相关信息,并附带精确的引用标记(如[1]、[2]等),这些引用直接指向原文中的具体句子,有效证明了其回答的来源和准确性。

Large language models (LLMs) have become increasingly prevalent across both consumer and enterprise applications. However, their tendency to “hallucinate” information and deliver incorrect answers with seeming confidence has created a trust problem. Think of LLMs as you would a human expert: we typically trust experts who can back up their claims with references and walk us through their reasoning process. The same principle applies to LLMs – they become more trustworthy when they can demonstrate their thought process and cite reliable sources for their information. Fortunately, with proper prompting, LLMs can be instructed to provide these citations, making their outputs more verifiable and dependable.

In this post, we demonstrate how to prompt Amazon Nova understanding models to cite sources in responses. Further, we will also walk through how we can evaluate the responses (and citations) for accuracy.

What are citations and why are they useful? 

Citations are references to sources that indicate where specific information, ideas, or concepts in a work originated. Citations play a crucial role in addressing the following issues, enhancing the credibility, usability, and ethical grounding of LLM-based applications.

    Ensuring factual accuracy: LLMs are prone to “hallucinations,” where they generate plausible but incorrect information. Citations allow users to verify claims by tracing them back to reliable sources, improving factual correctness and reducing misinformation risks. Building trust and transparency: Citations foster trust in AI-generated content so users can cross-check information and understand its origins. This transparency is vital for applications in research, healthcare, law, and education. Supporting ethical practices: Citing sources ensures proper attribution to original authors, respecting intellectual property rights and scholarly contributions. It prevents plagiarism and promotes ethical AI use. Enhancing usability: Citations improve user experience by providing a pathway to explore related materials. Features like inline citations or bibliographies help users find relevant sources easily. Addressing Limitations of LLMs: LLMs often fabricate references due to their inability to access real-time data or remember training sources accurately. Retrieval augmented generation (RAG) systems and citation tools mitigate this issue by grounding responses in external data. Professional and academic standards: In academic contexts, citations are indispensable for replicating research methods and validating findings. AI-generated outputs must adhere to these standards to maintain scholarly integrity.

Citations with Amazon Nova models

Amazon Nova, launched in Dec 2024, is a new generation of foundation models that deliver frontier intelligence and industry leading price performance, available on Amazon Bedrock. Amazon Nova models include four understanding models (Nova Micro, Nova Lite, Nova Pro and Nova Premier), two creative content generation models (Nova Canvas and Nova Reel), and one speech-to-speech model (Nova Sonic). Through seamless integration with Amazon Bedrock, developers can build and scale generative AI applications with Amazon Nova foundation models.

Citations for the Amazon Nova understanding models can be achieved by crafting prompts where we instruct the model to cite its sources and indicate the response format. To illustrate this, we’ve picked an example where we ask questions to Nova Pro about Amazon shareholder letters. We will include the shareholder letter in the prompt as context and ask Nova Pro to answer questions and include citations from the letter(s).

Here’s an example prompt that we constructed for Amazon Nova Pro following best practices for prompt engineering for Amazon Nova.

Note the output format that we included in the prompt to distinguish the actual answers from the citations.

System prompt 

##InstructionYou are a QA agent. You answer questions based on the context provided. You will answer the question and also include exact excerpts from the context and quote them as quotes. If you are unable to answer the question based on the context, your response should be “I’m sorry I cannot answer based on the context”.##Examples: Question: What factors contributed to the growth of Amazon Quotes: [1] Ourvision for Kindle is every book ever printed in any language, all available in less than 60 seconds.Publishers—including all the major publishers—have embraced Kindle, and we’re thankful for that. From apublisher’s point of view, there are a lot of advantages to Kindle. Books never go out of print, and they never goout of stock. Nor is there ever waste from over-printing. Most important, Kindle makes it more convenient forreaders to buy more books. Anytime you make something simpler and lower friction, you get more of it.Answer: Inovation with Kindle and publisher collaboration contributed to the growth of Amazon [1] ##Output Format Quotes: [1] ....[2] ...Answer:

User Prompt

Question: What contributed to net sales?Context: <Amazon 2009 shareholder letter... document not shown for brevity> 

Here’s the response from Nova Pro for the above prompt

Quotes: [1] "Net sales increased 28% year-over-year to $24.51 billion in 2009." [2] "More customers are taking advantage of Amazon Prime, with worldwide memberships up significantly over last year." [3] "We added 21 new product categories around the world in 2009..." [4] "It was a busy year for our shoes business. In November we acquired Zappos..." [5] "Sales of products by third party sellers on our websites represented 30% of unit sales in 2009." [6] "Amazon Web Services continued its rapid pace of innovation, launching many new services and features..." [7] "The U.S. Kindle Store now has more than 460,000 books..."Answer: Several factors contributed to the increase in net sales for Amazon in 2009. These include the significant growth in Amazon Prime memberships [2], the addition of 21 new product categories globally [3], the acquisition of Zappos which enhanced the shoes business [4], and the strong performance of third-party sellers on Amazon's platforms [5]. Additionally, the innovation and expansion of Amazon Web Services (AWS) attracted more customers [6], and the growth of the Kindle Store with a larger selection of books and content [7] also played a role. These combined efforts resulted in a 28% year-over-year increase in net sales to $24.51 billion [1].

As you can see Nova Pro is following our instructions and providing the answer along with the citations. We’ve verified the quotes are indeed present in the 2009 shareholder letter.

Here’s another user prompt (with the same system prompt as above) along with the model’s response

User Prompt:

Question: How does Bezos describe Amazon's approach to failure, and how does he tie it to innovation?Context: <Amazon 2006 shareholder letter> 

Model response

Quotes: [1] "We won’t always choose right, and we won’t always succeed. But we will be choosy, and we will work hard and patiently."[2] "In my view, Amazon’s culture is unusually supportive of small businesses with big potential, and I believe that’s a source of competitive advantage."[3] "The culture demands that these new businesses be high potential and that they be innovative and differentiated, but it does not demand that they be large on the day that they are born."Answer: Bezos describes Amazon's approach to failure as one that is inherently tied to innovation and experimentation. He acknowledges that not every venture will be successful, stating, "We won’t always choose right, and we won’t always succeed" [1]. However, he emphasizes the importance of being selective and persistent, working hard and patiently despite potential failures. Bezos ties this approach to innovation by highlighting Amazon's supportive culture towards small, high-potential businesses. He notes, "In my view, Amazon’s culture is unusually supportive of small businesses with big potential, and I believe that’s a source of competitive advantage" [2]. This culture allows for the nurturing of innovative ideas, even if they start small. He further explains that the company’s culture demands innovation and differentiation in new businesses, but does not require them to be large from the outset [3]. This patient and experimental approach fosters an environment where failure is seen as a part of the innovation process, rather than a deterrent.

Evaluating citations 

While citations are good, it’s important to evaluate that the model is following our instructions and including the citation verbatim from the context and not making up the citations.

To evaluate the citations at scale, we used another LLM to judge the responses from Amazon Nova Pro. We used the LLM-as-a-judge technique in Amazon Bedrock evaluations and evaluated 10 different prompts. LLM-as-a-judge on Amazon Bedrock Model Evaluation provides a comprehensive, end-to-end solution for assessing and optimizing AI model performance. This automated process uses the power of LLMs to evaluate responses across multiple metric categories (such as correctness, completeness, harmfulness, helpfulness and more) offering insights that can significantly improve your AI applications.

We prepared the input dataset for evaluation. The input dataset is a jsonl file containing our prompts that we want to evaluate. Each line in the jsonl file must include key-value pairs. Here are the required and optional fields for the input dataset:

Here’s an example jsonl file for evaluating our prompts (full jsonl file not shown for brevity).

{   "prompt": "##Model Instructions You are a QA agent. You answer questions based on the context provided. You will answer the question and also include exact excerpts from the context and quote them as quotes. \n ##Examples: \nQuestion: What factors contributed to the growth of Amazon\nQuotes: [1] Ourvision for Kindle is every book ever printed in any language, all available in less than 60 seconds. Publishers—including all the major publishers—have embraced Kindle, and we're thankful for that. From a publisher’s point of view, there are a lot of advantages to Kindle. Books never go out of print, and they never go out of stock. Nor is there ever waste from over-printing. Most important, Kindle makes it more convenient for readers to buy more books. Anytime you make something simpler and lower friction, you get more of it.\n Answer: Inovation with Kindle and publisher collaboration contributed to the growth of Amazon [1]\n\n ##Output Format\nQuotes: [1] ....\n[2] ...\n\n Answer: \n\nQuestion:How does Bezos describe Amazon's approach to failure, and how does he tie it to innovation?\n Context: <Amazon shareholder letter…. Not included here for brevity”}{ "prompt":……..}

We then started a model evaluation job using the Bedrock API with Anthropic Claude 3.5 Sonnet v1 as the evaluator/judge model. We have open sourced our code on the AWS Samples GitHub.

We evaluated our prompts and responses for the following built-in metrics

    Helpfulness Correctness Professional style and tone Faithfulness Completeness Coherence Following instructions Relevance Readability Harmfuless

Here’s the result summary of our evaluation. As you can see, Nova Pro had a 0.78 score on coherence and faithfulness and 0.67 on correctness. The high scores indicate that Nova Pro’s responses were holistic, useful, complete and accurate while being coherent as evaluated by Claude 3.5 Sonnet.

Conclusion

In this post, we walked through how we can prompt Amazon Nova understanding models to cite sources from the context through simple instructions. Amazon Nova’s capability to include citations in its responses demonstrates a practical approach to implementing this feature, showcasing how simple instructions can lead to more reliable and trustworthy AI interactions. The evaluation of these citations, using an LLM-as-a-judge technique, further underscores the importance of assessing the quality and faithfulness of AI-generated responses. To learn more about prompting for Amazon Nova models please visit this prompt library. You can learn more about Amazon Bedrock evaluations on the AWS website.


About the authors

Sunita Koppar is a Senior Specialist Solutions Architect in Generative AI and Machine Learning at AWS, where she partners with customers across diverse industries to design solutions, build proof-of-concepts, and drive measurable business outcomes. Beyond her professional role, she is deeply passionate about learning and teaching Sanskrit, actively engaging with student communities to help them upskill and grow.

Veda Raman is a Senior Specialist Solutions Architect for generative AI and machine learning at AWS. Veda works with customers to help them architect efficient, secure, and scalable machine learning applications. Veda specializes in generative AI services like Amazon Bedrock and Amazon SageMaker.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

大型语言模型 LLMs 引用 可信度 Amazon Nova
相关文章