Communications of the ACM - Artificial Intelligence, October 2
AI Helps Decode Animal Communication: From Whales to Elephants

Artificial intelligence is being applied to decode how animals communicate, in particular by using large language models (LLMs) to analyze their calls and behavior. In the Cetacean Translation Initiative (Project CETI), for example, scientists are using machine learning to parse the click sequences (codas) of sperm whales, uncovering complex structure and a potential "phonetic alphabet" that goes far beyond what was previously known. The research also extends to species such as elephants, monkeys, and crows. By analyzing rhythm, pitch, variation, and the links between sounds and behavior, AI models can identify communication between individuals and even predict animal behavior. Fully understanding animal language remains a challenge, but advances in AI are revealing the communication of the animal world in unprecedented ways.

🐋 **New progress in AI-driven decoding of animal language:** Artificial intelligence, particularly large language models (LLMs), is being used to analyze and understand animal communication patterns. By identifying regularities in sounds, such as the click sequences (codas) of sperm whales, researchers can distinguish different "dialects" and uncover rich internal structure, revealing communicative abilities far beyond what was previously recognized.

🗣️ **A "phonetic alphabet" and communication variables:** Research on sperm whale codas has identified not only their rhythm and tempo but also variables such as "rubato" (variation in click timing) and "ornamentation" (extra clicks), which in combination can produce thousands of distinct codas. The researchers even found a "phonetic alphabet" analogous to human phonemes, indicating that animal vocalizations have finer-grained structure than previously assumed.

🐘 **AI for identifying individual-directed communication:** Beyond whales, AI is being applied to other animals, for example using a random forest algorithm to identify the "names" of African elephants. By learning the acoustic features of calls, the algorithm can accurately predict which elephant a given call is addressed to, strong evidence that animals address one another individually.

🌐 **Unsupervised learning and the prospect of translation:** Researchers are exploring unsupervised machine learning techniques, such as generative adversarial networks (GANs) and unsupervised machine translation, to decode animal communication. While AI cannot yet fully understand animal intent, by mimicking how humans learn language it is helping scientists identify patterns in sounds that may carry meaning, laying the groundwork for future translation research.

Artificial intelligence (AI) has been making great strides in generating and translating human language. Large language models (LLMs) have quickly moved beyond simply dealing with human speech to recognizing other patterns that convey information, from DNA sequences to computer code. Now some scientists are cocking the computer’s metaphorical ear to animal vocalizations, hoping to discover if other creatures actually speak to each other in a way that might be recognizable to humans and, if so, what they are saying.

That is the idea behind the Cetacean Translation Initiative (Project CETI), which is using machine learning to try to decode the vocalizations of sperm whales. Other researchers are studying communication in species including elephants, monkeys, and crows. They are using the pattern recognition capabilities of AI to sort a cacophony of caws and rumbles and chatter into individual units that may carry meaning, and then trying to match those units with behavioral observations to determine what those meanings might be.

The research is still in its early days, trying to uncover fundamental building blocks of animal communications systems and attach meanings to the sounds those creatures make. Scientists have started out with relatively simple machine learning techniques such as classifiers to identify individual units of language and are quickly moving to more sophisticated systems, such as deep neural networks, hoping to figure out what animals might be talking about.

Project CETI, for instance, has uncovered new details about the sequences of clicks that sperm whales make. Those sequences, known as codas, range from three to 40 clicks long and vary slightly between different social groups. Essentially, the groups can be distinguished by their different coda dialects, like the difference between an American and a British accent.

Shane Gero, a whale biologist at Carleton University in Ottawa, Canada, has spent the past two decades in the waters off the coast of Dominica in the Caribbean, observing more than 30 families of whales and recording their calls. Over the years, he and his team created spectrograms of the calls—graphic representations that allow them to visualize acoustic features such as frequency and volume—and labeled them by hand, a time-consuming process. Now, as part of Gero’s collaboration with Project CETI, computer scientists at the Massachusetts Institute of Technology (MIT) have used his labeled data and supervised learning to train a model that can annotate new data more quickly and separate the calls according to which whale is making them.
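Spectrograms like the ones Gero's team labeled come from a short-time Fourier transform of the audio. A minimal sketch in Python (NumPy only; the window and hop sizes, sample rate, and synthetic "click" are illustrative, not the team's actual settings):

```python
import numpy as np

def spectrogram(signal, sample_rate, window=256, hop=128):
    """Short-time Fourier transform magnitudes: rows are time frames,
    columns are frequency bins up to the Nyquist frequency."""
    frames = []
    for start in range(0, len(signal) - window + 1, hop):
        chunk = signal[start:start + window] * np.hanning(window)
        frames.append(np.abs(np.fft.rfft(chunk)))
    freqs = np.fft.rfftfreq(window, d=1.0 / sample_rate)
    return np.array(frames), freqs

# A synthetic "click": a short burst of a 5-kHz tone in silence.
rate = 48_000
t = np.arange(rate // 10) / rate                    # 100 ms of audio
audio = np.zeros_like(t)
audio[1000:1256] = np.sin(2 * np.pi * 5000 * t[1000:1256])

spec, freqs = spectrogram(audio, rate)
# The loudest frame should peak near the 5-kHz bin.
peak_bin = spec[spec.sum(axis=1).argmax()].argmax()
```

Plotting `spec` on a log scale against `freqs` gives the familiar time-frequency picture from which features such as frequency and volume can be read off.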

It turned out that the whales’ vocalizations were more sophisticated than had previously been thought. Past research had identified 21 codas in Caribbean sperm whales, and a total of approximately 150 worldwide. The whales’ observed behavior, however, was too complex to be covered by a relatively simple communication system with a small, fixed set of messages.

The CETI researchers discovered that, in addition to the rhythm and tempo that they knew about, the codas had two other features that could vary, which they named rubato and ornamentation. ‘Rubato’ involves subtle changes in the intervals between clicks. ‘Ornamentation’ is the occasional addition of an extra click. With four variables in different combinations, whales can produce a large set of different codas; the researchers identified 8,719 distinct ones.

Researchers also found what they are calling a phonetic alphabet for the whales, similar to the set of phonemes (basic units of sound) that humans use to build words and sentences. “The internal structure mimics in some ways aspects of phonology in human languages, where you have these different ways of putting together bits of the vocal apparatus to make a large set of sounds,” said Jacob Andreas, an associate professor in the Computer Science and Artificial Intelligence Laboratory at MIT, who participated in the research.

Humans are able to build up their basic units of sound into an unlimited array of meanings, said Pratyusha Sharma, a Ph.D. student at MIT and lead author of the paper reporting the phonetic alphabet. “We don’t yet know if whales can create an infinite space of meanings,” she said. “What we do know is they’re a lot more expressive than what was believed one year ago.”

GANs, Trees, and LLMs

This particular paper used fairly basic machine learning tools, such as a Gaussian mixture model, a probabilistic algorithm used to group data points into clusters. Researchers are already working, however, on applying more sophisticated AI to both the data they have and new data that they are continuing to collect. To figure out whether the whales’ vocalizations can convey information, Gero and the MIT team are working on what they call WhaleLM, a type of large language model. Like the LLMs that power chatbots or write computer code, WhaleLM learns the patterns underlying the whales’ calls and then uses what it has learned to predict what should come next.
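A Gaussian mixture model clusters data by alternating expectation and maximization steps. A toy, numpy-only sketch on synthetic one-dimensional inter-click intervals (the data, component count, and initialization are invented for illustration; the features CETI actually clusters are richer):

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=50):
    """Minimal EM for a 1-D Gaussian mixture; returns means, stds, weights."""
    mu = np.quantile(x, np.linspace(0.25, 0.75, k))  # spread initial means
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from responsibility-weighted data
        n = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / n
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n)
        pi = n / len(x)
    return mu, sigma, pi

# Synthetic inter-click intervals (seconds) drawn from two "rhythms".
rng = np.random.default_rng(1)
icis = np.concatenate([rng.normal(0.15, 0.01, 200),
                       rng.normal(0.40, 0.03, 200)])
mu, sigma, pi = fit_gmm_1d(icis)
```

Each fitted component then stands for one cluster; assigning every interval to its highest-responsibility component recovers the two underlying rhythms.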

In a preprint that has not yet been peer-reviewed, the researchers trained the model on codas and measured how accurately it predicted the next coda in a sequence. They then asked whether the modeled coda sequences could predict the whales' current behavior and future actions, such as diving, and found that they could, with an accuracy of 72% for current behavior and 86% for future actions. To test which aspects of the sounds might convey meaning, they trained new models on shorter sequences of codas, on reordered codas, or on codas in which the rhythm, tempo, rubato, or ornamentation was changed one at a time; each of these manipulations made the predictions less accurate. The authors said the study provides the first evidence that the whales' vocalizations do indeed contain information that the creatures can act on.
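The ablation logic, perturb the codas and check whether next-coda prediction degrades, can be illustrated with a toy bigram predictor (the sequences and coda labels below are invented, and a bigram counter is far simpler than WhaleLM):

```python
import random
from collections import Counter, defaultdict

def next_coda_accuracy(sequences):
    """Fit bigram counts, then score how often the most frequent
    successor of each coda matches the coda that actually follows."""
    follow = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            follow[a][b] += 1
    hits = total = 0
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            hits += follow[a].most_common(1)[0][0] == b
            total += 1
    return hits / total

# Toy "exchanges" with a fixed call-and-response structure.
data = [["1+3", "4R", "1+3", "4R", "5A"]] * 50

random.seed(0)
shuffled = [random.sample(seq, len(seq)) for seq in data]
acc_orig = next_coda_accuracy(data)       # structure intact
acc_shuf = next_coda_accuracy(shuffled)   # order destroyed
```

If order carries information, `acc_shuf` comes out lower than `acc_orig`, which is the same signature of meaningful structure the WhaleLM ablations look for.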

Other researchers have used older AI tools to learn about animal calls. Mickey Pardo, a post-doctoral researcher in behavioral ecology at Cornell University, used a random forest algorithm to predict which individual animal in a group of African elephants a particular call was addressing. Once it had learned the acoustic features of those calls, the algorithm was able to examine a fresh set of calls and predict the elephant for which they were intended. In essence, the evidence shows that the elephants address each other by name, Pardo said.
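The random forest idea, many randomized decision trees voting on which elephant a call addresses, can be sketched with stump-depth trees. Everything below is synthetic: the two features, the invented elephant names, and the single-split trees stand in for Pardo's much richer acoustic features and full-depth forests:

```python
import numpy as np
from collections import Counter

def train_forest(X, y, n_trees=60, seed=0):
    """Tiny random forest of decision stumps: each tree fits a bootstrap
    sample on one randomly chosen feature and votes for a class."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))   # bootstrap sample
        f = int(rng.integers(0, X.shape[1]))    # random feature
        Xb, yb = X[idx], y[idx]
        thr = Xb[:, f].mean()
        maj = Counter(yb).most_common(1)[0][0]  # fallback if a side is empty
        left = Counter(yb[Xb[:, f] <= thr]).most_common(1)
        right = Counter(yb[Xb[:, f] > thr]).most_common(1)
        trees.append((f, thr,
                      left[0][0] if left else maj,
                      right[0][0] if right else maj))
    return trees

def predict(trees, x):
    """Majority vote over all stumps."""
    votes = Counter(lc if x[f] <= thr else rc for f, thr, lc, rc in trees)
    return votes.most_common(1)[0][0]

# Synthetic call features: column 0 separates calls addressed to the two
# (invented) elephants; column 1 is uninformative noise.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(4, 1, (60, 2))])
X[:, 1] = rng.normal(size=120)
y = np.array(["Amina"] * 60 + ["Babu"] * 60)

forest = train_forest(X, y)
```

Trees that happen to split on the informative feature vote consistently, while trees on the noise feature vote roughly at random, so the ensemble's majority recovers the intended addressee.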

Gasper Begus, an associate professor at the University of California, Berkeley, whose work covers linguistics, AI, and cognitive science, uses generative adversarial networks (GANs) to discover the building blocks of communication in unknown languages. His neural networks try to learn speech the way a human baby does, by listening and imitating. One part of the GAN, the generator, tries to produce new sounds that seem like they were generated by humans, and another part, the discriminator, decides whether the speech is real or fake. A separate neural network makes sure the sounds produced are not just random words but actually carry information.
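Structurally, a GAN pairs the two roles described above. A skeletal numpy sketch, with single linear layers and no training loop, purely to show the generator/discriminator split (Begus's actual models are far deeper networks trained on real audio):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 4-dim latent noise in, 8-sample "waveform" out.
G = rng.normal(0, 0.1, (4, 8))   # generator weights
D = rng.normal(0, 0.1, (8, 1))   # discriminator weights

def generate(z):
    """Generator: map latent noise to a candidate sound."""
    return np.tanh(z @ G)

def discriminate(x):
    """Discriminator: score how 'real' a sound looks, in (0, 1)."""
    return 1 / (1 + np.exp(-(x @ D)))

z = rng.normal(size=(16, 4))     # a batch of latent codes
fake = generate(z)               # 16 candidate waveforms
scores = discriminate(fake)      # the discriminator's real/fake scores
```

Training would alternate updates: the discriminator learns to push `scores` toward 0 for generated sounds and 1 for recorded ones, while the generator learns to drive its fakes' scores upward.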

Begus and his colleagues have developed techniques to look inside the neural layers of the GANs to determine which features the network identified as important to creating speech. “The model learns that there are sounds in languages, it learns words, it learns all sorts of meaningful things without any supervision,” he said.

Begus, the linguistics lead at Project CETI, applied his AI model to sperm whale vocalizations. The model identified aspects of the calls that had previously been identified as meaningful, such as the number of clicks in a coda. It also identified other patterns that had not been seen as carrying meaning. Those patterns, reported in a preprint, seem to be equivalent to the vowels in human speech.

Though the model can identify acoustic properties that animals may use to convey meaning, it cannot say what that meaning is. Figuring that out requires observing how animals react to the sounds. In cases such as the elephant name study, that can mean playing back the sound the computer has identified as containing a name and seeing if the animal responds as expected.

David Gruber, the founder and president of Project CETI, and Shafi Goldwasser, a computer scientist at MIT and 2012 recipient of the ACM A.M. Turing Award, think unsupervised machine translation may be able to help scientists figure out what whales or other animals are saying, if indeed they have something resembling a language. They developed two stylized models, fed them synthetic data, and used the results to establish bounds on what sample data would have to look like for translation to work.

Gruber, a professor of biology and environmental sciences at Baruch College, City University of New York, said they showed that such translation has a better chance of working when whales and humans have similar concepts, such as family, food, or swimming. Where their experiences do not overlap with humans—these are, after all, animals the size of a couple of school buses that spend hours at ocean depths where light does not penetrate—translation may be more of a challenge. They also showed that the more complex the language the computer was dealing with, the lower its error rate would be.

None of the researchers claim they will be able to actually talk to animals. It is difficult to even ask if animals have language because there is no consensus on what constitutes a language, Begus said. What is known, he said, is “If you look closely, there are many properties of language that other animals have as well.”

Gruber hopes that as researchers collect more data and as their AI models grow more sophisticated, they may eventually be able to grasp the meaning of animal calls. “We see ourselves as baby whales now,” he said, “and we’re just kind of understanding the basic fundamentals of the communication system.”

Further Reading

  • Sharma, P., Gero, S., Payne, R. et al.
    Contextual and combinatorial structure in sperm whale vocalisations. Nat Commun, 2024, 10.1038/s41467-024-47221-8
  • Pardo, M.A., Fristrup, K., Lolchuragi, D.S. et al.
    African elephants address one another with individually specific name-like calls. Nat Ecol Evol, 2024, 10.1038/s41559-024-02420-w
  • Sharma, P., Gero, S., Rus, D. et al.
    WhaleLM: Finding Structure and Information in Sperm Whale Vocalizations and Behavior with Machine Learning, bioRxiv preprint, 2024, 10.1101/2024.10.31.621071
  • Goldwasser, S., Gruber, D., Kalai, A.T., and Paradise, O.
    A Theory of Unsupervised Translation Motivated by Understanding Animal Communication, 37th Conference on Neural Information Processing Systems (NeurIPS), 2023, https://proceedings.neurips.cc/paper_files/paper/2023/file/7571c9d44179c7988178593c5b62a9b6-Paper-Conference.pdf
  • Modeling speech recognition and synthesis simultaneously https://www.youtube.com/watch?v=BTg6upDjUyw
  • A whale of a tale: How scientists are decoding the language of sperm whales https://www.youtube.com/watch?v=_8DnreYuddE

