Physics World, 4 September
Artificial intelligence: opportunities and challenges coexist, and data quality is key

Artificial intelligence (AI) is spreading at an unprecedented pace, and tools typified by generative AI (GenAI) are changing how we work and live. GenAI's power stems from vast quantities of data and advances in computing, letting it process and summarize large volumes of text efficiently and support complex problems such as climate modelling and drug discovery. AI is far from flawless, however: the quality of its output depends heavily on the quality of the input data and the integrity of the user. AI models may also suffer from information censorship, data bias and the "black box" problem, making their decision-making opaque. Users should therefore stay alert when using AI, protect their personal privacy, and carefully verify AI-generated content to keep the vicious cycle of "garbage in, garbage out" from taking hold.

💡 The spread and power of AI: artificial intelligence, and generative AI (GenAI) in particular, has become a powerful tool that can process and summarize large volumes of unstructured text and support fields such as scientific research. This rests on vast quantities of digitized data and major advances in computing power, which let AI complete tasks that were once unimaginable and bring users substantial gains in efficiency and cost savings.

⚠️ AI's two key ingredients: like any other data-analytics solution, how well an AI system works depends on two core factors: the quality of the input data, and the integrity of the user in ensuring the outputs are fit for purpose. Successful use of AI therefore rests on good data sources and responsible users.

🚫 AI's potential risks and challenges: AI systems carry risks, including but not limited to: 1. information censorship: models may hide or distort information because of constrained training data or deliberate intervention (as in the Tiananmen Square example); 2. model bias: social biases present in the training data can be inherited by AI, producing unfair outputs (as in the gender-bias cases in recruitment and credit approval); 3. the "black box" problem: AI decision-making is opaque, making it hard to understand how conclusions were reached, which can undermine trust in critical systems such as driver-assistance systems.

🔒 Advice for using AI with care: users should be cautious with AI tools. Many GenAI applications store user prompts and conversation histories, which may be used to train future models, so personal information such as medical or financial data should be shared sparingly. Keeping prompts non-specific (avoiding names or dates of birth) helps protect privacy. Users should also check AI-generated output carefully before making important decisions, and remember that AI can make mistakes, lest "garbage in, garbage out" escalate into "garbage in, garbage squared".

Artificial intelligence (AI) is fast becoming the new “Marmite”. Like the salty spread that polarizes taste-buds, you either love AI or you hate it. To some, AI is miraculous, to others it’s threatening or scary. But one thing is for sure – AI is here to stay, so we had better get used to it.

In many respects, AI is very similar to other data-analytics solutions in that how it works depends on two things. One is the quality of the input data. The other is the integrity of the user to ensure that the outputs are fit for purpose.

Previously a niche tool for specialists, AI is now widely available for general-purpose use, in particular through generative AI (GenAI) tools. Also known as large language models (LLMs), they're now accessible through, for example, OpenAI's ChatGPT, Microsoft Copilot, Anthropic's Claude, Adobe Firefly or Google Gemini.

GenAI has become possible thanks to the availability of vast quantities of digitized data and significant advances in computing power. Neural-network models of this size would in fact have been impossible without these two fundamental ingredients.

GenAI is incredibly powerful when it comes to searching and summarizing large volumes of unstructured text. It exploits unfathomable amounts of data and is getting better all the time, offering users significant benefits in terms of efficiency and labour saving.

Many people now use it routinely for writing meeting minutes, composing letters and e-mails, and summarizing the content of multiple documents. AI can also tackle complex problems that would be difficult for humans to solve, such as climate modelling, drug discovery and protein-structure prediction.

I’d also like to give a shout out to tools such as Microsoft Live Captions and Google Translate, which help people from different locations and cultures to communicate. But like all shiny new things, AI comes with caveats, which we should bear in mind when using such tools.

User beware

LLMs, by their very nature, have been trained on historical data. They can’t therefore tell you exactly what may happen in the future, or indeed what may have happened since the model was originally trained. Models can also be constrained in their answers.

Take the Chinese AI app DeepSeek. When the BBC asked it what had happened at Tiananmen Square in Beijing on 4 June 1989 – when Chinese troops cracked down on protestors – the chatbot's answer was suppressed. Now, this is a very obvious piece of information control, but subtler instances of censorship will be harder to spot.

We also need to be conscious of model bias. At least some of the training data will probably come from social media and public chat forums such as X, Facebook and Reddit. Trouble is, we can’t know all the nuances of the data that models have been trained on – or the inherent biases that may arise from this.

One example of unfair gender bias was when Amazon developed an AI recruiting tool. Trained on 10 years' worth of CVs – mostly from men – the tool was found to favour men. Thankfully, Amazon ditched it. But then there was Apple's gender-biased credit-card algorithm, which led to men being given higher credit limits than women with similar credit ratings.
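
A simple first check for this kind of group bias is to compare outcome rates across groups. The sketch below is hypothetical and not drawn from either case: it assumes you have a list of (group, decision) records from the tool under audit, and it computes each group's selection rate plus the "four-fifths" disparate-impact ratio sometimes used as a rule of thumb.

```python
from collections import defaultdict

# Hypothetical screening decisions: (group, 1 = shortlisted, 0 = rejected).
# In a real audit these records would come from the tool under test.
decisions = [
    ("men", 1), ("men", 1), ("men", 0), ("men", 1),
    ("women", 1), ("women", 0), ("women", 0), ("women", 0),
]

# Selection rate per group: shortlisted / total.
counts = defaultdict(lambda: [0, 0])  # group -> [shortlisted, total]
for group, outcome in decisions:
    counts[group][0] += outcome
    counts[group][1] += 1

rates = {group: s / t for group, (s, t) in counts.items()}
for group, rate in rates.items():
    print(f"{group}: selection rate {rate:.2f}")

# Disparate-impact ratio: lowest selection rate over highest.
# A common rule of thumb flags ratios below 0.8 (the "four-fifths rule").
ratio = min(rates.values()) / max(rates.values())
print(f"disparate-impact ratio: {ratio:.2f}" + (" (flagged)" if ratio < 0.8 else ""))
```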

Another problem with AI is that it sometimes acts as a black box, making it hard for us to understand how, why or on what grounds it arrived at a certain decision. Think about those CAPTCHA tests we have to take when accessing online accounts. They often present us with a street scene and ask us to select those parts of the image containing a traffic light.

The tests are designed to distinguish between humans and computers or bots – the expectation being that AI can't consistently recognize traffic lights. However, AI-based advanced driver-assistance systems (ADAS) presumably perform this function seamlessly on our roads. If not, surely drivers are being put at risk?

A colleague of mine, who drives an electric car that happens to share its name with a well-known physicist, confided that the ADAS in his car becomes unresponsive, especially at traffic lights with filter arrows or where there are multiple sets of lights. So what exactly is going on with ADAS? Does anyone know?

Caution needed

My message when it comes to AI is simple: be careful what you ask for. Many GenAI applications will store user prompts and conversation histories and will likely use this data for training future models. Once you enter your data, there's no guarantee it'll ever be deleted. So think carefully before sharing any personal data, such as medical or financial information. It also pays to keep prompts non-specific (avoid using your name or date of birth) so that they cannot be traced directly to you.
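
One low-tech way to follow that advice is to scrub obvious identifiers from a prompt before it leaves your machine. The sketch below is a minimal illustration, not a complete anonymizer: the redact_prompt helper and its patterns are my own hypothetical examples, and real personal data takes many more forms than names, dates and e-mail addresses.

```python
import re

# Hypothetical patterns for obvious identifiers; a real anonymizer would
# need far wider coverage (addresses, ID numbers, account details, ...).
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),   # e.g. 04/06/1989
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),         # e.g. 1989-06-04
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # e-mail addresses
]

def redact_prompt(prompt: str, names: list[str]) -> str:
    """Replace known names and date/e-mail patterns before sending a prompt."""
    for name in names:  # names you know appear in your own prompts
        prompt = re.sub(re.escape(name), "[NAME]", prompt, flags=re.IGNORECASE)
    for pattern, placeholder in PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt

print(redact_prompt(
    "I'm Jane Doe, born 04/06/1989, reachable at jane@example.com. "
    "Summarize my blood-test results.",
    names=["Jane Doe"],
))
# -> I'm [NAME], born [DATE], reachable at [EMAIL]. Summarize my blood-test results.
```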

The democratization of AI is a great enabler, and it's easy for people to apply it without an in-depth understanding of what's going on under the hood. But we should be checking AI-generated output before we use it to make important decisions, and we should be careful about the personal information we divulge.

It’s easy to become complacent when we are not doing all the legwork. We are reminded under the terms of use that “AI can make mistakes”, but I wonder what will happen if models start consuming AI-generated erroneous data. Just as with other data-analytics problems, AI suffers from the old adage of “garbage in, garbage out”.
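
The adage is easy to demonstrate on a toy problem. The sketch below – a made-up illustration assuming scikit-learn and NumPy are installed, not a model of how LLMs actually degrade – trains the same simple classifier on progressively noisier labels and measures its accuracy on clean test data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# A synthetic two-class problem standing in for "good data".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Flip a growing fraction of the training labels ("garbage in") and
# watch the accuracy on clean test data fall ("garbage out").
for noise in (0.0, 0.1, 0.3, 0.5):
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = 1 - y_noisy[flip]
    model = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    print(f"label noise {noise:.0%}: test accuracy {model.score(X_test, y_test):.2f}")
```

At 50% label noise the training labels carry no signal at all, so the model can do no better than chance – a literal case of garbage in, garbage out.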

But sometimes I fear it’s even worse than that. We’ll need a collective vigilance to avoid AI being turned into “garbage in, garbage squared”.

The post Garbage in, garbage out: why the success of AI depends on good data appeared first on Physics World.
