NVIDIA Blog, September 4
AI Learns Common Sense

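AI models are advancing at a rapid rate and scale, but they lack the common sense most humans have: an understanding, developed through real-world experience, that birds can't fly backwards, mirrors are reflective and ice melts into water. To address this, NVIDIA is developing a set of tests to coach AI models on the limitations of the physical world, in other words, to teach AI common sense. These tests are used to develop reasoning models such as NVIDIA Cosmos Reason, an open reasoning vision language model (VLM) for physical AI applications that is proficient in generating temporally grounded responses. Cosmos Reason tops the physical reasoning leaderboard on Hugging Face. Unlike previous VLMs, it is designed to accelerate physical AI development in fields such as robotics, autonomous vehicles and smart spaces, and it can infer and reason through unprecedented scenarios using physical common-sense knowledge. To understand complex environments, including industrial spaces and laboratories, models must start small: in one test, Cosmos Reason is asked to answer a multiple-choice question about relative motion in a video. To develop their reasoning capabilities, NVIDIA models are taught physical common sense via reinforcement learning. Robots, for example, don't intuitively know which way is left, right, up or down; they learn these spatial-temporal limitations through training. AI-powered robots used in safety testing must be taught how their physical forms interact with their surroundings. Without common sense embedded in their training, issues can arise in deployment.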


AI models are advancing at a rapid rate and scale.

But what might they lack that (most) humans don’t? Common sense: an understanding, developed through real-world experiences, that birds can’t fly backwards, mirrors are reflective and ice melts into water.

While such principles seem obvious to humans, they must be taught to AI models tasked with accurately answering complex questions and navigating unpredictable physical environments, such as industrial warehouses or roads.

NVIDIA is tackling this challenge by developing a set of tests to coach AI models on the limitations of the physical world. In other words, to teach AI common sense.

These tests are used to develop reasoning models such as NVIDIA Cosmos Reason, an open reasoning vision language model (VLM) for physical AI applications that is proficient in generating temporally grounded responses. Cosmos Reason just topped the physical reasoning leaderboard on Hugging Face.

Cosmos Reason is unique compared with previous VLMs as it’s designed to accelerate physical AI development for fields such as robotics, autonomous vehicles and smart spaces. The model can infer and reason through unprecedented scenarios using physical common-sense knowledge.

For models to understand complex environments — including industrial spaces and laboratories — they must start small. For example, in the test depicted below, the Cosmos Reason model is tasked with answering a multiple-choice question about the relative motion in the video:

Example from Cosmos Reason evaluation dataset

What Does Reasoning Look Like for an AI Model? 

To develop their reasoning capabilities, NVIDIA models are being taught physical common sense about the real world via reinforcement learning.

For example, robots don’t intuitively know which way is left, right, up or down. They’re taught these spatial-temporal limitations through training. AI-powered robots used in safety testing, such as vehicle crash testing, must be taught to be aware of how their physical forms interact with their surroundings.

Without embedding common sense into the training of these robots, issues can arise in deployment.

“Without basic knowledge about the physical world, a robot may fall down or accidentally break something, causing danger to the surrounding people and environment,” said Yin Cui, a Cosmos Reason research scientist at NVIDIA.

Distilling human common sense about the physical world into models is how NVIDIA is bringing about the next generation of AI.

Enter the NVIDIA data factory team: a group of global analysts who come from various backgrounds — including bioengineering, business and linguistics. They’re working to develop, analyze and compile hundreds of thousands of data units that will be used to train generative AI models on how to reason.

The Data Curation Process

One of the NVIDIA data factory team’s projects focuses on the development of world foundation models for physical AI applications. These virtual environments create deep learning neural networks that are safer and more effective for training reasoning models, based on simulated domains.

It all starts with an NVIDIA annotation group that creates question-and-answer pairs based on video data. These videos are all from the real world and can include any type of footage, whether depicting chickens walking around in their coop or cars driving on a rural road.

For example, an annotator might ask about the video below: “The person uses which hand to cut the spaghetti?”

Example from Cosmos Reason evaluation dataset

The annotators then come up with four multiple-choice answers labeled A, B, C and D. The model is fed the data and has to reason and choose the correct answer.

“We’re basically coming up with a test for the model,” said Cui. “All of our questions are multiple choice, like what students would see on a school exam.”
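As a rough illustration of what one of these test items might look like as data, here is a minimal Python sketch. The schema (`MultipleChoiceItem`, `video_id`, `options`, `answer`) and the `accuracy` helper are hypothetical names chosen for this example, not NVIDIA's actual format. The answer options and the "correct" label for the spaghetti question are invented, since the article does not list them.

```python
from dataclasses import dataclass

LABELS = ("A", "B", "C", "D")


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One annotated question about a video clip (hypothetical schema)."""
    video_id: str
    question: str
    options: dict[str, str]  # label -> answer text
    answer: str              # label of the correct option

    def __post_init__(self) -> None:
        # Every item must offer exactly the four labeled choices.
        if tuple(sorted(self.options)) != LABELS:
            raise ValueError("options must be labeled exactly A, B, C and D")
        if self.answer not in LABELS:
            raise ValueError("answer must be one of A, B, C or D")


def accuracy(items, predictions):
    """Fraction of items whose predicted label matches the annotated answer."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(pred == item.answer for item, pred in zip(items, predictions))
    return correct / len(items)


# Invented options and answer, for illustration only.
example = MultipleChoiceItem(
    video_id="clip-001",
    question="The person uses which hand to cut the spaghetti?",
    options={"A": "Left hand", "B": "Right hand",
             "C": "Both hands", "D": "Neither hand"},
    answer="B",
)

print(accuracy([example], ["B"]))  # prints 1.0
```

Scoring a model then reduces to comparing its chosen letter with the annotated one, much like grading a school exam.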

These question-and-answer pairs are then quality checked by NVIDIA analysts, such as Michelle Li.

Li has a background in public health and data analytics, which allows her to look at the broader purpose of the data she analyzes.

“For physical AI, we have a specific goal of wanting to train models on understanding the physical world, which helps me think about the bigger picture when I’m looking at the Q&A pairs and the types of questions that are being presented,” Li said. “I ask myself, do the Q&A pairs that I’m looking at align with our objectives for the guidelines that we have for the project?”

After this, the data is reviewed by the data factory leads of the project, who make sure it’s up to quality standards and ready to be sent to the Cosmos Reason research team. The scientists then feed the hundreds of thousands of data units, in this case the Q&A pairs, to the model, training it with reinforcement learning on the bounds and limitations of the physical world.
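The article does not say how the reinforcement learning reward is computed. One common pattern for multiple-choice data is a rule-based reward: the model earns credit only when its final letter matches the annotated answer. The sketch below assumes that pattern. The "Answer: X" response convention and the function names are hypothetical, not taken from Cosmos Reason.

```python
import re
from typing import Optional

# Hypothetical convention: the model ends its reasoning with "Answer: <letter>".
ANSWER_PATTERN = re.compile(r"Answer:\s*([A-D])\b")


def extract_choice(response: str) -> Optional[str]:
    """Return the last A-D letter the response commits to, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


def multiple_choice_reward(response: str, correct_label: str) -> float:
    """Binary reward: 1.0 for the annotated answer, 0.0 otherwise."""
    return 1.0 if extract_choice(response) == correct_label else 0.0


response = "The knife is held in the right hand throughout the clip. Answer: B"
print(multiple_choice_reward(response, "B"))  # prints 1.0
```

Because the answer key is fixed by the annotators, a reward like this can be checked automatically for every Q&A pair without a human grader in the loop.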

What Are the Applications of Reasoning AI? 

Reasoning models are exceptional because they can make sense of their temporal space as well as predict outcomes. They can analyze a situation, come up with a thought web of probable outcomes and infer the most likely scenario.

Simply put, reasoning AI demonstrates humanlike thinking. It shows its work, giving the user insight into the logic behind its responses.

Users can ask these models to analyze a video, such as one of two cars driving on a road. When asked a question like, “What would happen if the cars were driving toward each other in the same lane?” the model can reason and determine the most probable outcome of the proposed scenario: for example, a car crash.

“We’re building a pioneering reasoning model focused on physical AI,” said Tsung-Yi Lin, a principal research scientist on the Cosmos Reason team at NVIDIA.

The data factory team’s ability to produce high-quality data will be imperative for driving the development of intelligent autonomous agents and physical AI systems that can safely interact with the real world as NVIDIA reasoning model innovation continues.

Preview NVIDIA Cosmos-Reason1 or download the model on Hugging Face and GitHub.
