LeadDev · September 30
AI coding assistants' code quality is worrying; use with caution

An analysis of 82,845 chat logs shows that popular AI coding assistants tend to reply at excessive length, typically 14 times longer than the human prompt, and that the code they generate contains many basic errors. The study found that nearly seven in ten conversations involved back-and-forth exchanges, often because users changed goals mid-thread or had to clarify details. Assistant responses averaged about 2,000 characters, far more than Stack Overflow's 836. Code quality problems were serious, including undefined variables in 75% of JavaScript snippets, invalid naming in 83% of Python snippets, and missing headers in 41% of C++ code. Moreover, errors accumulate over time: undefined-variable issues in Python rose from 24% to 33% by the fifth conversation turn. The researchers recommend feeding static-analysis diagnostics back into prompts and stress the importance of clear instructions.

🔍 The study shows that AI coding assistants' replies usually far exceed the human prompt in length, averaging about 2,000 characters versus 836 for a typical Stack Overflow answer, which wastes resources.

⚠️ The quality of AI-generated code is concerning, with basic errors common: undefined variables in 75% of JavaScript snippets, invalid naming in 83% of Python snippets, and missing headers in 41% of C++ code, among others.

🔄 Errors accumulate over time: undefined-variable issues in Python rose from 24% to 33% by the fifth conversation turn, suggesting the assistants' ability to correct their own mistakes is limited.

🗣️ The study stresses the importance of clear instructions: developers should feed static-analysis diagnostics back into their prompts and refine their prompt engineering to improve the assistants' reliability and efficiency.

🧩 Despite these flaws, AI assistants are still tools; used carefully and refined with feedback they can be reliable aids, but their limitations should be recognized and they should not be relied on blindly.


Estimated reading time: 2 minutes

A new Queen’s University study throws up warning signs for those relying on AI output without careful checking.

A new analysis of 82,845 chat logs shows that popular coding assistants often reply at great length – typically 14 times longer than the human prompt – and much of the code they produce contains basic errors.

Researchers from Queen’s University in Canada looked at real-world developer conversations with ChatGPT from the WildChat corpus, containing 368,506 code snippets in more than 20 programming languages. Nearly seven in ten conversations involved some form of back and forth, often because users shifted goals mid-thread or had to clarify missing details. Those chats ran long, too: the average model response was around 2,000 characters, compared to 836 for a typical Stack Overflow answer.

But beyond burning tokens, the quality of the code generated by the AI assistants was cause for concern. The issues identified included undefined variables in 75% of JavaScript snippets, invalid naming in 83% of Python snippets (with undefined variables in 31%), missing headers in 41% of C++ code, missing required comments in 76% of Java snippets, and unresolved namespaces in 49% of C# outputs. Those syntactic mistakes weren’t the only problem: maintainability and style issues were common, too.
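
As an illustration of the defect class the study counts most often (this snippet is constructed for this article, not drawn from the WildChat corpus), a function that uses a name it never defines is exactly what a static checker flags as an undefined variable. The sketch below uses pyflakes as one readily available checker; the study's own tooling may differ.

```python
# Constructed example of an "undefined variable" defect -- not a
# snippet from the corpus. Requires `pip install pyflakes`.
import sys
from pyflakes.api import check
from pyflakes.reporter import Reporter

snippet = (
    "def average(numbers):\n"
    "    return total / len(numbers)\n"  # 'total' is never assigned
)

# check() prints warnings such as "undefined name 'total'" through the
# reporter and returns the number of warnings found.
num_warnings = check(snippet, filename="snippet.py",
                     reporter=Reporter(sys.stdout, sys.stderr))
print(f"{num_warnings} issue(s) found")
```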

Check, check and check again

“I think that is a big issue, that it has a lot of defects,” lead author Suzhen Zhong, a researcher at Queen’s University in Kingston, Canada, said. She’s particularly worried about the risk of defect-ridden AI-generated code being deployed into a large-scale, real-life project.

The idea of spotting errors in conversation with chatbots and then fixing them seems sensible, but even that has its problems. Zhong and her colleagues found that errors persist, and can even worsen, over time.

In Python, the share of conversations with undefined variable issues rose from 24% to 33% by turn five in a chat. That wasn’t the case across all languages, though: Java’s documentation violations improved with iteration by chatbots, dropping from 78% to 63%, suggesting some problems are fixable when users explicitly point them out. 

Zhong was “really surprised” by how often Python import errors cropped up, and by how uneven assistants were across languages. “It means that an LLM has different capability levels in different programming languages,” she concludes.

How to solve the problem 

All those issues don’t mean AI assistants are unusable. Indeed, Zhong’s a fan of the tools in her own work. “I’m using LLMs to generate code a lot,” she says. 

But her practical advice for how to harness AI’s efficiencies while ironing out the wrinkles is simple: run the bot’s output through static analysis and feed the diagnostics back into the next prompt. She also says part of the issue is from humans’ non-specific instructions. “Developers should be very clear about their prompt engineering,” she says. 
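
In practice that loop is straightforward to automate. The sketch below is a minimal, hypothetical version for Python output: it lints a generated snippet with pyflakes and, if there are findings, quotes them back in a follow-up prompt. The `ask_llm` function is a stand-in for whatever chat API you use; it is not part of any particular library.

```python
# A minimal sketch of a static-analysis feedback loop, assuming
# pyflakes is installed and `ask_llm(prompt) -> str` wraps your LLM.
import subprocess
import tempfile

def lint_python(code: str) -> str:
    """Run pyflakes on a snippet and return its diagnostics as text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(["pyflakes", path], capture_output=True, text=True)
    return result.stdout + result.stderr

def refine(ask_llm, task: str, max_rounds: int = 3) -> str:
    """Ask for code, then iterate until the linter is satisfied."""
    code = ask_llm(task)
    for _ in range(max_rounds):
        diagnostics = lint_python(code)
        if not diagnostics.strip():
            break  # static analysis is clean; stop iterating
        # Be explicit: quote the diagnostics and ask only for a corrected version.
        code = ask_llm(
            "This code produced the following static-analysis findings:\n"
            f"{diagnostics}\nReturn a corrected version of the code only:\n{code}"
        )
    return code
```

The same loop extends to the other languages the study flags by swapping in ESLint for JavaScript or clang-tidy for C++.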

Combine those and you get closer to something that can be a reliable colleague rather than an unreliable intern wrecking your codebase. It also helps to bear in mind that your purported productivity gains might not be as significant as you think they are, according to other research.
