Artificial Ignorance, November 28, 2024
Case Study: Scaling customer intelligence

 

This post describes how an AI engineer used large language models (LLMs) to analyze over 10,000 of Pulley's sales call transcripts, map the buyer's journey, and help the company understand the needs of different customer segments. A traditional approach would have required enormous time and manpower, but the analysis was completed in just two weeks, producing filterable, downloadable results that benefit sales, marketing, and other teams. The post also covers the technical challenges encountered and their solutions, including model selection, accuracy, scalability, and user experience, and highlights AI's value in improving efficiency and unlocking new possibilities.

🤔 **Background:** Pulley wanted to analyze its large repository of sales calls to understand the needs of different customer segments and refine its sales and marketing strategy, but manually analyzing more than 10,000 transcripts was effectively impossible.

🚀 **AI solution:** Large language models such as Claude 3.5 Sonnet, combined with retrieval-augmented generation (RAG) and prompt engineering, automated the analysis of sales call transcripts, extracting key details and building the buyer's journey.

📊 **Technical challenges and breakthroughs:** Model selection, accuracy, and scalability all posed challenges, ultimately solved through a multi-layered approach, prompt caching, and longer outputs - dramatically lowering costs and speeding up the analysis.

💡 **Business value:** Beyond giving Pulley customer journey insights, the project provided sales, marketing, and other teams with filterable, downloadable results, improving efficiency and inspiring teams to explore further data analysis needs.

🔄 **Takeaways:** Model choice should follow the task's requirements; AI engineering means combining LLMs with traditional software engineering and sound architecture, while considering additional use cases - the ultimate goal is greater efficiency and new possibilities.

A few months ago, I joined Pulley as an AI Engineer. This case study is just the first example I'm sharing of how we're using LLMs to drive real business impact. If that sounds interesting to you, we're hiring!


Question: how many sales calls can you listen to and take notes on in a day?

Well, if you (conservatively) assume each one is thirty minutes, and you work an eight-hour day, that gives you sixteen calls. If, for some reason, you have absolutely zero work-life balance and only stop listening to sleep, then you'd get closer to 32 in a day. That’s 224 in a week.

I had two weeks to analyze ten thousand of them.

That's the challenge I faced when our CEO wanted to dig through our repository of sales calls to map the buyer's journey for various customer personas. Two years ago, this would have been a Herculean task. But now, it's something that a single AI engineer can accomplish in a fortnight.

Side note: why is creating a buyer's journey useful?

Historically, Pulley's ICP (ideal customer profile) has been "venture-backed startups," but that's pretty broad. Tailoring a sales and marketing strategy to a specific persona, such as a "CTO of an early-stage AI startup," can lead to much more success for that specific segment. Because Pulley has been around for a while, we had a sense of the different customer segments that were interested in the product - but we wanted to know the exact reasons *why*, along with details like how they heard about Pulley, or what alternatives they were considering.


The opportunity

When trying to glean customer insights, there's no beating talking to customers. In my case, I was doing the next best thing - reading transcripts of sales calls that Pulley's sales reps had done over the years.

The only problem was the sheer scale of the data. A quick look at our Gong dashboard showed over 10,000 call transcripts in the database.

A manual approach to analyzing these calls would have been brutal (if not impossible):

Even if you had the time to read or listen to them all (it would take 625 days, or nearly two years), the human brain simply isn't wired to process that much information while maintaining consistent analysis. It's like trying to read an entire library and then write a detailed summary - not just impossible, but fundamentally mismatched with how humans process information.

Luckily, we have modern LLMs. And this intersection of unstructured data and pattern recognition represents a sweet spot for modern AI.

Traditional approaches to analyzing customer calls typically fall into two categories: manual analysis (high quality but completely unscalable) and keyword analysis (scalable but shallow, missing context and nuance). Leveraging LLMs like ChatGPT and Claude represents a third approach.

The technical challenges

What looks simple in hindsight - "just use AI to analyze the calls" - actually required solving several interconnected technical challenges. Like any complex engineering problem, the solution emerged through careful choices about tradeoffs between accuracy, cost, and speed.

Accuracy

The first major decision was choosing the right model. While GPT-4o and Claude 3.5 Sonnet offered the most intelligence¹, they were also the most expensive and slowest options. The temptation to use smaller, cheaper models like 4o-mini or pplx-api² was strong – if they could do the job, the cost savings would be significant.

However, early experiments quickly showed the limitations of this approach. Smaller models produced an alarming number of false positives, often making confident but incorrect assertions. They would classify customers as crypto companies because a sales rep mentioned blockchain features or decided a customer was the company founder despite zero supporting evidence in the transcripts.

These weren't just minor errors – they threatened the entire project's viability. Any savings on speed or cost would be pointless if the answers couldn't be trusted.

The breakthrough came from realizing that even the best models needed proper context and constraints. We developed a multi-layered approach, combining careful prompt engineering with retrieval-augmented generation³ to ground each analysis in verifiable context.

This combination created a system that could reliably extract both accurate company details and meaningful customer journey insights.

Scalability

Using Claude 3.5 Sonnet, though, meant opting for a pretty expensive model. To make matters worse, because we wanted to pull lengthy customer quotes out of the transcripts, we often ran into the 4K token output limit, meaning we'd have to send our entire prompt again to continue the conversation.

To overcome both issues, we leveraged multiple beta features to dramatically lower our costs and improve our analysis speeds.

Prompt Caching: By specifying parts of our prompts to cache ahead of time, we reduced both costs by up to 90% and latency by up to 85%. Most of the prompt context was each call's transcript, meaning we could cache it beforehand and do multiple rounds of analysis.
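Prompt caching works by marking a prefix of the request as cacheable, so follow-up questions about the same transcript skip reprocessing it. A minimal sketch of what a per-transcript request might look like with the Anthropic Messages API - the model string, system text, and transcript are illustrative, not our production setup:

```python
# Sketch: cache a long call transcript so multiple rounds of analysis
# reuse it instead of re-sending (and re-billing) the full context.

def build_cached_request(transcript: str, question: str) -> dict:
    """Build a Messages API payload that marks the transcript prefix as cacheable."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": "You analyze sales call transcripts for customer insights.",
            },
            {
                "type": "text",
                "text": transcript,
                # Everything up to and including this block is cached; later
                # requests with the same prefix hit the cache.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": question}],
    }

# Each round of analysis reuses the cached transcript prefix:
payload = build_cached_request(
    "REP: Thanks for joining today...",
    "What industry is the customer in?",
)
```

Because the transcript dominates the token count, caching it once and asking several questions against it is where the bulk of the cost and latency savings come from.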

Longer Outputs: Access to 8K token outputs (double the default) let us generate extended customer insight summaries in a single pass. While not a major factor in reducing latency, it was still a nice quality-of-life improvement to make sure we didn't have to make unnecessary API calls.
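At the time, the 8K-token output limit was an opt-in beta, enabled via a request header. A rough sketch of wiring that in, with the header value as it existed when this was built (it may have changed since):

```python
# Sketch: opt in to 8192-token outputs via the beta header. The header
# value below reflects the beta at the time of writing and is an assumption
# about the then-current API, not a stable flag.

def with_long_output(payload: dict) -> tuple[dict, dict]:
    """Pair a Messages API payload with the beta header for 8K-token outputs."""
    headers = {"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"}
    request = dict(payload, max_tokens=8192)  # double the default 4K cap
    return request, headers

request, headers = with_long_output({"model": "claude-3-5-sonnet-20240620"})
```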

Taken together, these features let us conduct a $5,000 analysis for closer to $500, and get results in hours, not days.

Usability

Of course, it wasn't enough to process the sales transcripts and store them in a database - they had to be accessible to non-technical users. The data wasn't particularly valuable in its raw format, so we added a few features to make things more usable.

First, we made it filterable. Users could filter the calls by customer role, company stage, industry, geolocation, and more. That way, if we wanted to target different personas in the future, we would be able to replicate our analysis quickly.
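Conceptually, that filtering layer is just attribute matching over the extracted records. A toy sketch, using hypothetical field names rather than our actual schema:

```python
# Sketch: filter analyzed calls by extracted attributes. The field names
# (role, stage, industry) are illustrative, not the real schema.

def filter_calls(calls: list[dict], **criteria) -> list[dict]:
    """Return calls whose extracted attributes match every criterion."""
    return [
        call for call in calls
        if all(call.get(key) == value for key, value in criteria.items())
    ]

calls = [
    {"id": 1, "role": "CTO", "stage": "Seed", "industry": "AI"},
    {"id": 2, "role": "CFO", "stage": "Series A", "industry": "Fintech"},
]
ai_ctos = filter_calls(calls, role="CTO", industry="AI")
```

The point of structuring the LLM output up front is precisely that downstream queries like this become trivial database filters rather than fresh analysis runs.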

Second, the transcripts were downloadable. With Cursor's help, I added buttons to download formatted transcripts. This allowed users to perform their own analysis on the fly. In fact, users ended up repeatedly downloading transcripts and creating Claude Projects with them to start asking questions.

Third, we generated a final report for the main personas that we wanted to analyze - using specific citations, we were able to link references back to specific sales calls and list the number of relevant calls for each aspect of the customer journey. We were triple-dipping on our Claude usage at this point - in addition to the API and projects, the chatbot version of Claude did a great job generating artifacts from our extracted data.

The impact

One thing that really surprised me about this project was its wide-ranging impact. What started as a strategic request from the CEO for customer journey insights quickly evolved into something more significant.

For example, the sales team used the tool to automate what had previously been a manual transcript-downloading process, saving hours each week. The marketing team pulled key quotes about what customers loved about Pulley, helping to complete a positioning exercise. Our VP of Marketing, in particular, noted that this sort of analysis usually takes "tens if not hundreds of hours" to do.

Beyond the time savings, we also saw our mountains of unstructured data in a new light. Now, teams started asking questions they wouldn't have considered before - because the manual analysis would have been too daunting.

The learnings

There were a few key takeaways for me from this project:

Models matter. As much as the AI community believes in open-source and small language models, it was clear that Claude 3.5 Sonnet and GPT-4o were able to handle this task when other models couldn't. We decided to go with Claude because of its prompt caching - at the time, GPT-4o didn't support the feature. Of course, the right tool isn't always the most powerful one; it's the one that best fits your needs.

Don't neglect the scaffolding. Despite AI's capabilities, there were still many wins gained from "traditional" software engineering, like using JSON structures and having good database schemas. That's what AI engineering is at its core - knowing how to build effective software around LLMs. But it's important to remember that AI must still be thoughtfully integrated into existing architectures and design patterns.
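As one example of that scaffolding, here is a sketch of validating the model's JSON output against a fixed shape before it ever touches the database - the required fields are hypothetical, not Pulley's actual schema:

```python
# Sketch: "traditional engineering" around the LLM - enforce a fixed JSON
# shape before inserting anything into the database. Field names are
# hypothetical examples.
import json

REQUIRED_FIELDS = {"company_name", "persona", "funding_stage", "key_quotes"}

def parse_analysis(raw: str) -> dict:
    """Parse a model response, rejecting records that are missing required fields."""
    record = json.loads(raw)  # raises ValueError on malformed JSON too
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"analysis missing fields: {sorted(missing)}")
    return record

row = parse_analysis(
    '{"company_name": "Acme", "persona": "CTO",'
    ' "funding_stage": "Seed", "key_quotes": []}'
)
```

Rejecting malformed responses at the boundary keeps one bad model output from silently corrupting the dataset that every downstream filter and report depends on.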

Consider additional use cases. By building a simple yet flexible tool, what could have been a one-off analysis became a company-wide resource. Doing this well takes more than good engineering chops - it requires being able to work cross-functionally and understand business processes. In truth, I was lucky enough to discover these use cases after the fact, but the project opened my eyes to what they might look like in the future.

Perhaps most importantly, this project showed how AI can transform seemingly impossible tasks into routine operations. It's not about replacing human analysis – it's about augmenting that analysis and removing human-based bottlenecks. With AI, services can now support software-like margins.

That's the real promise of tools like Claude: not just doing things faster, but unlocking new possibilities. And as we continue to discover exactly what these models are good for, the best applications will be where technical capabilities meet practical needs.

Artificial Ignorance is reader-supported. If you found this interesting or insightful, consider becoming a free or paid subscriber.

¹ o1-preview hadn't been released at the time I built this.

² One approach that I wanted to test was using Perplexity's API models - the hope was that with their knowledge of the internet, they could enrich information about the transcripts to fill in any gaps. In practice, it seems that their hosted APIs don't access the internet in the same way that their flagship product does - meaning the results were less than stellar.

³ RAG reduced hallucinations and allowed us to build a more sophisticated understanding of context. When a sales rep mentioned "Series A funding," we could cross-reference this with company databases to verify the funding stage. This meant we could confidently segment conversations even when key details were implied rather than explicitly stated - much like how human analysts pick up on context clues.
