MIT News - Artificial intelligence · September 25
A revolution in AI data storage: Cloudian helps enterprises process massive data efficiently

 

Artificial intelligence is changing how enterprises store and access data. Traditional storage systems were designed for a handful of users and struggle to meet AI systems' need for massive parallel data access. Cloudian applies parallel computing to consolidate AI functions and data onto a single platform, with direct high-speed connections between storage and GPUs and CPUs, simplifying the development of AI tools. The platform uses an object storage architecture, supports real-time data processing, and embeds AI functions, letting enterprises bring computation to the data, significantly improving AI performance while reducing cost.

🔍 Cloudian applies parallel computing to rethink data storage, consolidating AI functions and data onto a single platform with direct high-speed connections between storage and GPUs and CPUs, markedly improving AI data-processing efficiency.

📊 The platform uses an object storage architecture to manage massive unstructured datasets, with built-in real-time processing that lets enterprises run AI computations where data is collected and stored, eliminating the transfer latency of traditional storage systems.

🔗 Through a partnership with NVIDIA, Cloudian's storage system works directly with NVIDIA GPUs, enabling high-speed interaction between AI models and storage, further reducing AI computing costs and accelerating AI deployment.

🌐 In the cloud era, data is growing far faster than networks can move it. By embedding compute in the storage system, Cloudian brings "the cloud to the data" rather than the traditional "data to the cloud," addressing the high cost of data movement.

🧬 Cloudian's platform is already used in manufacturing, health care, and scientific research, for example using AI to predict robot maintenance needs and storing DNA sequences, helping enterprises extract AI value from massive datasets.

Artificial intelligence is changing the way businesses store and access their data. That’s because traditional data storage systems were designed to handle simple commands from a handful of users at once, whereas today, AI systems with millions of agents need to continuously access and process large amounts of data in parallel. Traditional data storage systems now have layers of complexity, which slows AI systems down because data must pass through multiple tiers before reaching the graphics processing units (GPUs) that are the brain cells of AI.

Cloudian, co-founded by Michael Tso ’93, SM ’93 and Hiroshi Ohta, is helping storage keep up with the AI revolution. The company has developed a scalable storage system for businesses that helps data flow seamlessly between storage and AI models. The system reduces complexity by applying parallel computing to data storage, consolidating AI functions and data onto a single parallel-processing platform that stores, retrieves, and processes scalable datasets, with direct, high-speed transfers between storage and GPUs and CPUs.

Cloudian’s integrated storage-computing platform simplifies the process of building commercial-scale AI tools and gives businesses a storage foundation that can keep up with the rise of AI.

“One of the things people miss about AI is that it’s all about the data,” Tso says. “You can’t get a 10 percent improvement in AI performance with 10 percent more data or even 10 times more data — you need 1,000 times more data. Being able to store that data in a way that’s easy to manage, and in such a way that you can embed computations into it so you can run operations while the data is coming in without moving the data — that’s where this industry is going.”

From MIT to industry

As an undergraduate at MIT in the 1990s, Tso was introduced by Professor William Dally to parallel computing — a type of computation in which many calculations occur simultaneously. Tso also worked on parallel computing with Associate Professor Greg Papadopoulos.

“It was an incredible time because most schools had one supercomputing project going on — MIT had four,” Tso recalls.

As a graduate student, Tso worked with MIT senior research scientist David Clark, a computing pioneer who contributed to the internet’s early architecture, particularly the transmission control protocol (TCP) that delivers data between systems.

“As a graduate student at MIT, I worked on disconnected and intermittent networking operations for large scale distributed systems,” Tso says. “It’s funny — 30 years on, that’s what I’m still doing today.”

Following his graduation, Tso worked at Intel’s Architecture Lab, where he invented data synchronization algorithms used by Blackberry. He also created specifications for Nokia that ignited the ringtone download industry. He then joined Inktomi, a startup co-founded by Eric Brewer SM ’92, PhD ’94 that pioneered search and web content distribution technologies.

In 2001, Tso started Gemini Mobile Technologies with Joseph Norton ’93, SM ’93 and others. The company went on to build the world’s largest mobile messaging systems to handle the massive data growth from camera phones. Then, in the late 2000s, cloud computing became a powerful way for businesses to rent virtual servers as they grew their operations. Tso noticed the amount of data being collected was growing far faster than the speed of networking, so he decided to pivot the company.

“Data is being created in a lot of different places, and that data has its own gravity: It’s going to cost you money and time to move it,” Tso explains. “That means the end state is a distributed cloud that reaches out to edge devices and servers. You have to bring the cloud to the data, not the data to the cloud.”

Tso officially launched Cloudian out of Gemini Mobile Technologies in 2012, with a new emphasis on helping customers with scalable, distributed, cloud-compatible data storage.

“What we didn’t see when we first started the company was that AI was going to be the ultimate use case for data on the edge,” Tso says.

Although Tso’s research at MIT began more than three decades ago, he sees strong connections between what he worked on and the industry today.

“It’s like my whole life is playing back because David Clark and I were dealing with disconnected and intermittently connected networks, which are part of every edge use case today, and Professor Dally was working on very fast, scalable interconnects,” Tso says, noting that Dally is now the senior vice president and chief scientist at the leading AI company NVIDIA. “Now, when you look at the modern NVIDIA chip architecture and the way they do interchip communication, it’s got Dally’s work all over it. With Professor Papadopoulos, I worked on accelerating application software with parallel computing hardware without having to rewrite the applications, and that’s exactly the problem we are trying to solve with NVIDIA. Coincidentally, all the stuff I was doing at MIT is playing out.”

Today Cloudian’s platform uses an object storage architecture in which all kinds of data — documents, videos, sensor data — are stored as unique objects with metadata. Object storage can manage massive datasets in a flat file structure, making it ideal for unstructured data and AI systems, but it traditionally hasn’t been able to send data directly to AI models without the data first being copied into a computer’s memory system, creating latency and energy bottlenecks for businesses.
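The flat object-plus-metadata model described above can be sketched in a few lines of Python. This is an illustrative in-memory toy, not Cloudian's actual API; every name here is hypothetical:

```python
# Minimal sketch of an object store: a flat key namespace where each
# object carries opaque binary data plus a metadata dictionary.
# There is no directory hierarchy -- just unique keys.
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)

class ObjectStore:
    """Flat namespace: keys map directly to objects, no folders."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        self._objects[key] = StoredObject(data, metadata)

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def search(self, **criteria):
        """Return keys whose metadata matches every given criterion."""
        return [k for k, obj in self._objects.items()
                if all(obj.metadata.get(f) == v for f, v in criteria.items())]

store = ObjectStore()
store.put("scans/robot-17.bin", b"\x00\x01", device="robot-17", kind="sensor")
store.put("docs/manual.pdf", b"%PDF", kind="document")
print(store.search(kind="sensor"))  # ['scans/robot-17.bin']
```

The metadata-driven `search` hints at why the flat model suits unstructured data: objects are found by what they describe, not by where they sit in a directory tree.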

In July, Cloudian announced that it has extended its object storage system with a vector database that stores data in a form which is immediately usable by AI models. As the data are ingested, Cloudian is computing in real-time the vector form of that data to power AI tools like recommender engines, search, and AI assistants. Cloudian also announced a partnership with NVIDIA that allows its storage system to work directly with the AI company’s GPUs. Cloudian says the new system enables even faster AI operations and reduces computing costs.
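The idea of computing vectors at ingest time, so similarity search needs no later bulk export, can be sketched as below. The letter-frequency "embedding" is a toy stand-in for a real model, and the class and method names are hypothetical, not Cloudian's API:

```python
# Sketch of embedding-at-ingest: each object's vector is computed once,
# when it is stored, and queries are answered by cosine similarity.
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized letter-frequency vector (26 dimensions).
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

class VectorStore:
    def __init__(self):
        self._items = {}  # key -> (text, vector)

    def put(self, key: str, text: str) -> None:
        # The vector is computed here, at ingest time, not at query time.
        self._items[key] = (text, embed(text))

    def nearest(self, query: str, k: int = 1):
        # Vectors are unit-length, so the dot product is cosine similarity.
        qv = embed(query)
        scored = sorted(
            self._items.items(),
            key=lambda kv: -sum(a * b for a, b in zip(qv, kv[1][1])),
        )
        return [key for key, _ in scored[:k]]

vs = VectorStore()
vs.put("doc1", "robot arm maintenance schedule")
vs.put("doc2", "quarterly financial report")
print(vs.nearest("when should the robot be serviced"))  # ['doc1']
```

A production system would use a learned embedding model and an approximate nearest-neighbor index rather than a linear scan, but the shape is the same: vectors live next to the data they describe.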

“NVIDIA contacted us about a year and a half ago because GPUs are useful only with data that keeps them busy,” Tso says. “Now people are realizing it’s easier to move the AI to the data than it is to move huge datasets. Our storage systems embed a lot of AI functions, so we’re able to pre- and post-process data for AI near where we collect and store the data.”

AI-first storage

Cloudian is helping about 1,000 companies around the world get more value out of their data, including large manufacturers, financial service providers, health care organizations, and government agencies.

Cloudian’s storage platform is helping one large automaker, for instance, use AI to determine when each of its manufacturing robots needs to be serviced. Cloudian is also working with the National Library of Medicine to store research articles and patents, and the National Cancer Database to store DNA sequences of tumors — rich datasets that AI models could process to help researchers develop new treatments or gain new insights.

“GPUs have been an incredible enabler,” Tso says. “Moore’s Law doubles the amount of compute every two years, but GPUs are able to parallelize operations on chips, so you can network GPUs together and shatter Moore’s Law. That scale is pushing AI to new levels of intelligence, but the only way to make GPUs work hard is to feed them data at the same speed that they compute — and the only way to do that is to get rid of all the layers between them and your data.”
