Salesforce Blog AI Research, October 15
Salesforce Releases eVerse Framework to Improve AI Agent Capability and Consistency

 

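Salesforce AI Research has introduced eVerse, an enterprise simulation framework aimed at the “jagged intelligence” problem, in which AI agents perform inconsistently on seemingly simple tasks. Through three interconnected steps, Synthesize, Measure, and Train, the framework gives AI agents realistic simulated enterprise environments and reinforcement learning training to optimize their capability and consistency. eVerse uses synthetic data to build an enterprise “digital twin” that simulates real business scenarios, and it allows rigorous testing without touching real data, particularly for complex voice interactions. By incorporating reinforcement learning guided by human experts, eVerse significantly improves agent performance on enterprise tasks and works toward “Enterprise General Intelligence”: AI that combines high capability with high consistency in specific business domains.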

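🧩 **Addressing “jagged intelligence” to make AI agents more reliable**: The article notes that even advanced AI models can perform poorly on simple tasks, and that this “jagged intelligence” poses a risk for enterprise applications. Salesforce’s eVerse framework aims to significantly improve the stability and reliability of AI agents in complex enterprise environments through simulation and training, so that they perform consistently in critical business processes.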

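🛠️ **eVerse’s three core steps: Synthesize, Measure, Train**: The framework has three key stages. First, the Synthesize stage builds an enterprise “digital twin” to generate realistic synthetic data and simulated environments, keeping user data safe. Second, the Measure stage stress-tests AI agents in highly realistic scenarios, especially complex voice interactions. Finally, the Train stage uses reinforcement learning guided by human experts to close performance gaps and improve agent capability.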

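📈 **Significant gains in performance and consistency**: After eVerse training, AI agents performed significantly better on enterprise tasks, with success rates rising from 19% to 88%. The approach turns agents from generic language models into highly specialized “champion”-level systems with both high capability and high consistency, meeting enterprises’ strict requirements for reliability and efficiency.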

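🌐 **Toward “Enterprise General Intelligence”**: eVerse is a key step toward Salesforce’s vision of Enterprise General Intelligence (EGI). EGI focuses on optimizing AI for specific business applications and emphasizes stability and reliability across complex multi-step workflows, in contrast to consumer AI models that pursue broad general-purpose capability.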

Salesforce AI Research announces framework to optimize agent capability and consistency through synthetic data, realistic testing, and reinforcement learning.

Even as AI models grow more sophisticated, a curious challenge persists: systems that solve PhD-level mathematics struggle with surprisingly simple tasks. Ask a leading language model the famous riddle “Where does Christmas come before Thanksgiving?” and it correctly answers “in the dictionary”—because alphabetically, ‘C’ precedes ‘T.’

But swap the words—ask “Where does Thanksgiving come before Christmas?”—and watch the same model confidently explain that “in the dictionary, Thanksgiving comes before Christmas alphabetically.” This phenomenon, which we call “jagged intelligence,” reveals sharp peaks of brilliance alongside unexpected valleys of weakness.
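The riddle pair above suggests a simple way to probe for jagged intelligence: ask closely related variants of one question and count how many the model answers correctly. The following is a minimal sketch, not a method described in the post. The `consistency_rate` helper and the `memorizing_model` stub are hypothetical, and the expected answer for the swapped question ("calendar") is our assumption, since the post does not state it.

```python
from typing import Callable, Dict


def consistency_rate(model: Callable[[str], str],
                     variants: Dict[str, str]) -> float:
    """Fraction of prompt variants whose answer contains the expected text.

    `variants` maps each prompt to a substring a correct answer must contain.
    """
    if not variants:
        raise ValueError("need at least one prompt variant")
    hits = sum(
        1 for prompt, expected in variants.items()
        if expected.lower() in model(prompt).lower()
    )
    return hits / len(variants)


def memorizing_model(prompt: str) -> str:
    """Hypothetical stand-in for a model that pattern-matches the riddle."""
    return "In the dictionary."


# Assumed answer key: the swapped question is answered by the calendar,
# because Thanksgiving falls earlier in the year than Christmas.
riddle_variants = {
    "Where does Christmas come before Thanksgiving?": "dictionary",
    "Where does Thanksgiving come before Christmas?": "calendar",
}

rate = consistency_rate(memorizing_model, riddle_variants)
print(f"consistent on {rate:.0%} of variants")
```

Substring matching is a deliberately crude grader. A real evaluation would need a more careful judgment of correctness.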

For enterprise businesses, this inconsistency isn’t academic—it’s operational. When AI agents handle customer service calls, process sales workflows, or manage healthcare billing, jagged intelligence creates real business risk. An agent might flawlessly handle complex multi-step tasks one moment, then stumble on straightforward requests the next. This unpredictability is a dealbreaker for enterprises where reliability matters as much as capability.

At Salesforce AI Research, we’ve developed a new methodology to mitigate these risks. Today, we’re announcing eVerse: an enterprise simulation framework that trains AI agents like elite athletes, optimizing them for both capability and consistency through three interconnected steps: Synthesize, Measure, and Train.

eVerse: Synthesize – Building the Enterprise “Digital Twin”

Training best-in-class AI agents requires best-in-class training environments. Just as Formula 1 drivers spend thousands of hours in sophisticated simulators before competing at Monaco, enterprise AI agents need realistic practice grounds that mirror the complexity of actual business operations.

Because trust is Salesforce’s #1 value, we’ve designed a training approach that never puts your real data at risk. Our recent research work with CRMArena-Pro is a great example. It creates completely synthetic training grounds with realistic customer data, multi-step workflows, and the edge cases that make business operations unpredictable. Agents learn in environments that mirror real enterprise systems, while your and your customers’ data remains private, secure, and completely untouched. Learn more about our work in simulation environments in my recent blog, The New AI Agent Training Ground: Simulating Enterprise Environments.

The validation speaks for itself: 90% of domain experts rate our synthetic data generation as realistic or very realistic. Even more telling—the majority of the demos you’re seeing at Dreamforce this week use synthetic data generated by CRMArena-Pro.

eVerse: Measure – Stress-Testing in Realistic Scenarios

Synthesis alone isn’t enough. We must rigorously measure agent performance across the scenarios that matter most to enterprises. This includes one of the most critical—and challenging—modalities: voice interactions.

Voice conversations introduce layers of complexity that text-based testing misses: background noise, diverse accents, translation errors, poor connections, multiple speakers. eVerse simulates these realistic voice interactions, generating synthetic phone conversations that sound remarkably human while testing agents against comprehensive enterprise scenarios.

This measurement infrastructure operates behind the scenes throughout Salesforce. It’s how we validated Agentforce voice capabilities before launch, running thousands of synthetic conversations to ensure agents could handle real-world complexity with both high capability and unwavering consistency.
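Of the voice degradations listed above, background noise is the easiest to illustrate. The sketch below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio, which is a standard audio augmentation technique. It is an assumption-laden illustration and not eVerse's implementation, which the post does not describe. The function names are hypothetical, and a pure tone stands in for synthetic speech.

```python
import math
import random
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square amplitude of a waveform."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_noise_at_snr(speech: List[float], snr_db: float,
                     seed: int = 0) -> List[float]:
    """Mix white Gaussian noise into `speech` at the requested SNR in dB."""
    if not speech or rms(speech) == 0.0:
        raise ValueError("speech must be non-empty and non-silent")
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    # SNR(dB) = 20 * log10(rms(speech) / rms(noise)), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    scale = target_noise_rms / rms(noise)
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Recover the SNR from a clean/noisy pair, as a sanity check."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 20 * math.log10(rms(clean) / rms(residual))


# One second of a 440 Hz tone at 16 kHz stands in for synthetic speech.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy_call = add_noise_at_snr(tone, snr_db=10.0)
```

Sweeping `snr_db` from clean to very noisy would give one axis along which an agent's speech handling could be stress-tested.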

eVerse: Train – Closing Performance Gaps with Human Expertise

After measurement reveals performance gaps, eVerse’s training engine closes them through reinforcement learning guided by human expertise. Our research has demonstrated remarkable improvements using this method: a 69-percentage-point gain on enterprise tasks (success rates rising from 19% to 88%). We’re currently piloting eVerse with customers. One example is UCSF Health, where we’re partnering with human experts to train and refine AI that helps simplify and improve the healthcare billing experience.

“When used responsibly, we believe AI can help our teams simplify one of the most complex parts of healthcare, creating a billing experience that feels more seamless and truly patient-centered,” said Sara Murray, MD, MAS, VP & Chief Health AI Officer at UCSF Health.

This continuous loop—synthesize environments, measure performance, train on gaps—transforms agents from generic language models into enterprise-specialized systems ready for production deployment.
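The control flow of that loop can be sketched in a few lines. Everything here is hypothetical: `ToyAgent`, the scenario names, and the `train` step, which simply nudges a score upward as a placeholder for reinforcement learning guided by human experts. The sketch shows only how the three steps feed one another, not how eVerse implements them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyAgent:
    # Hypothetical per-scenario success rate in [0, 1].
    skill: Dict[str, float] = field(default_factory=dict)

    def success_rate(self, scenario: str) -> float:
        return self.skill.get(scenario, 0.0)


def synthesize() -> List[str]:
    """Stand-in for generating synthetic enterprise scenarios."""
    return ["billing_dispute", "refund_request", "address_change"]


def measure(agent: ToyAgent, scenarios: List[str]) -> Dict[str, float]:
    """Score the agent on every scenario."""
    return {s: agent.success_rate(s) for s in scenarios}


def train(agent: ToyAgent, scores: Dict[str, float],
          target: float, step: float = 0.25) -> None:
    """Placeholder update: improve only the scenarios below target."""
    for scenario, score in scores.items():
        if score < target:
            agent.skill[scenario] = min(1.0, score + step)


def run_loop(agent: ToyAgent, target: float = 0.8,
             max_rounds: int = 10) -> Tuple[int, Dict[str, float]]:
    """Repeat measure -> train until every scenario meets the target."""
    scenarios = synthesize()
    scores = measure(agent, scenarios)
    rounds = 0
    while rounds < max_rounds and min(scores.values()) < target:
        train(agent, scores, target)
        scores = measure(agent, scenarios)
        rounds += 1
    return rounds, scores


agent = ToyAgent({"billing_dispute": 0.9, "refund_request": 0.5})
rounds, final_scores = run_loop(agent)
print(rounds, final_scores)
```

Training only where measurement finds a gap leaves the scenarios that already meet the target untouched.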

The Path to Enterprise General Intelligence

This work advances our vision for what we call Enterprise General Intelligence (EGI): AI optimized for business applications that excels in both capability and consistency. While consumer AI prioritizes broad general-purpose capabilities, enterprise AI demands reliable performance across specific and complex, multi-step workflows where inconsistency carries real business risk.

eVerse addresses this by moving agents along both dimensions simultaneously. Generic LLM agents underperform in business settings—high capability but low consistency creates the “prodigy” problem: brilliant when it works, unreliable when it matters. eVerse-trained agents achieve the “champion” quadrant: high capability combined with high consistency, exactly what enterprises require.
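To make the two dimensions concrete, here is one possible way to score them from repeated trials. The definitions are our assumptions, not Salesforce's metrics: capability is the share of tasks solved at least once, and consistency is the share of those tasks solved on every trial. The 0.8 threshold is arbitrary, and only the “prodigy” and “champion” labels come from the post.

```python
from typing import Dict, List, Tuple


def capability_and_consistency(
        trials: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Return (capability, consistency) from per-task trial outcomes."""
    if not trials or any(not runs for runs in trials.values()):
        raise ValueError("every task needs at least one trial")
    solved_once = [runs for runs in trials.values() if any(runs)]
    capability = len(solved_once) / len(trials)
    if not solved_once:
        return 0.0, 0.0
    # Among tasks the agent can solve, how many does it solve every time?
    consistency = sum(all(runs) for runs in solved_once) / len(solved_once)
    return capability, consistency


def quadrant(capability: float, consistency: float,
             threshold: float = 0.8) -> str:
    """Name the quadrant; the two low-capability labels are descriptive only."""
    high_cap = capability >= threshold
    high_con = consistency >= threshold
    if high_cap and high_con:
        return "champion"
    if high_cap:
        return "prodigy"
    level = "high" if high_con else "low"
    return f"low capability, {level} consistency"


# Hypothetical results: three trials on each of five tasks.
generic_agent = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [True, True, False],
    "task_d": [True, True, True],
    "task_e": [False, True, True],
}
cap, con = capability_and_consistency(generic_agent)
print(cap, con, quadrant(cap, con))
```

In this made-up data the agent can solve every task but solves only two of five reliably, which is the pattern the post calls the “prodigy” problem.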

The Competitive Imperative

The organizations that will lead in the agentic AI era won’t necessarily be those with the most advanced models—they’ll be the ones who recognized early that enterprise AI excellence requires sophisticated training environments bridging the gap between simulation and reality.

This body of research—from eVerse to voice simulation to reinforcement learning from human feedback—represents Salesforce’s commitment to making AI agents genuinely enterprise-ready: trustworthy, reliable, and grounded in enterprise business intelligence. The future belongs to agents trained in environments that simulate millions of realistic business scenarios, validated by domain experts, and continuously refined through real-world feedback loops.

We’re sharing eVerse at Dreamforce because our research advances through continuous customer engagement. The human feedback that trains agents in eVerse comes from our customers’ domain experts—the same organizations who will deploy these systems. This partnership between research and practice is how enterprise AI becomes genuinely reliable.

Join us as we shape what enterprise-ready AI agents can become.  
