MIT News - Artificial intelligence · September 25
A revolution in AI data storage: Cloudian helps enterprises process massive data efficiently

 

Artificial intelligence is changing how enterprises store and access data. Traditional storage systems were designed for a handful of users and struggle to meet AI systems' need for massive parallel data access. Cloudian applies parallel computing to consolidate AI functions and data onto a single platform, with direct high-speed connections between storage and GPUs and CPUs, simplifying the development of AI tools. The platform uses an object storage architecture, supports real-time data processing, and embeds AI functions, letting enterprises bring computation to the data, significantly improving AI performance while reducing cost.

🔍 Cloudian applies parallel computing to rethink data storage, consolidating AI functions and data onto a single platform with direct high-speed connections between storage and GPUs and CPUs, markedly improving AI data-processing efficiency.

📊 The platform uses an object storage architecture to manage massive unstructured datasets, with built-in real-time processing that lets enterprises run AI computations where data is collected and stored, eliminating the transfer latency of traditional storage systems.

🔗 Through a partnership with NVIDIA, Cloudian's storage system works directly with NVIDIA GPUs, enabling high-speed interaction between AI models and storage, further reducing AI computing costs and accelerating AI deployment.

🌐 In the cloud era, data is growing far faster than networks can move it. By embedding compute in the storage system, Cloudian brings "the cloud to the data" rather than the traditional "data to the cloud," addressing the high cost of data movement.

🧬 Cloudian's platform is already used in manufacturing, health care, and scientific research, for example using AI to predict robot maintenance needs and storing DNA sequences, helping enterprises extract AI value from massive datasets.

Artificial intelligence is changing the way businesses store and access their data. That’s because traditional data storage systems were designed to handle simple commands from a handful of users at once, whereas today, AI systems with millions of agents need to continuously access and process large amounts of data in parallel. Traditional data storage systems now have layers of complexity, which slows AI systems down because data must pass through multiple tiers before reaching the graphics processing units (GPUs) that are the brain cells of AI.

Cloudian, co-founded by Michael Tso ’93, SM ’93 and Hiroshi Ohta, is helping storage keep up with the AI revolution. The company has developed a scalable storage system for businesses that helps data flow seamlessly between storage and AI models. The system reduces complexity by applying parallel computing to data storage, consolidating AI functions and data onto a single parallel-processing platform that stores, retrieves, and processes scalable datasets, with direct, high-speed transfers between storage and GPUs and CPUs.

Cloudian’s integrated storage-computing platform simplifies the process of building commercial-scale AI tools and gives businesses a storage foundation that can keep up with the rise of AI.

“One of the things people miss about AI is that it’s all about the data,” Tso says. “You can’t get a 10 percent improvement in AI performance with 10 percent more data or even 10 times more data — you need 1,000 times more data. Being able to store that data in a way that’s easy to manage, and in such a way that you can embed computations into it so you can run operations while the data is coming in without moving the data — that’s where this industry is going.”

From MIT to industry

As an undergraduate at MIT in the 1990s, Tso was introduced by Professor William Dally to parallel computing — a type of computation in which many calculations occur simultaneously. Tso also worked on parallel computing with Associate Professor Greg Papadopoulos.

“It was an incredible time because most schools had one supercomputing project going on — MIT had four,” Tso recalls.

As a graduate student, Tso worked with MIT senior research scientist David Clark, a computing pioneer who contributed to the internet’s early architecture, particularly the transmission control protocol (TCP) that delivers data between systems.

“As a graduate student at MIT, I worked on disconnected and intermittent networking operations for large scale distributed systems,” Tso says. “It’s funny — 30 years on, that’s what I’m still doing today.”

Following his graduation, Tso worked at Intel’s Architecture Lab, where he invented data synchronization algorithms used by Blackberry. He also created specifications for Nokia that ignited the ringtone download industry. He then joined Inktomi, a startup co-founded by Eric Brewer SM ’92, PhD ’94 that pioneered search and web content distribution technologies.

In 2001, Tso started Gemini Mobile Technologies with Joseph Norton ’93, SM ’93 and others. The company went on to build the world’s largest mobile messaging systems to handle the massive data growth from camera phones. Then, in the late 2000s, cloud computing became a powerful way for businesses to rent virtual servers as they grew their operations. Tso noticed the amount of data being collected was growing far faster than the speed of networking, so he decided to pivot the company.

“Data is being created in a lot of different places, and that data has its own gravity: It’s going to cost you money and time to move it,” Tso explains. “That means the end state is a distributed cloud that reaches out to edge devices and servers. You have to bring the cloud to the data, not the data to the cloud.”

Tso officially launched Cloudian out of Gemini Mobile Technologies in 2012, with a new emphasis on helping customers with scalable, distributed, cloud-compatible data storage.

“What we didn’t see when we first started the company was that AI was going to be the ultimate use case for data on the edge,” Tso says.

Although Tso’s research at MIT began more than three decades ago, he sees strong connections between what he worked on and the industry today.

“It’s like my whole life is playing back because David Clark and I were dealing with disconnected and intermittently connected networks, which are part of every edge use case today, and Professor Dally was working on very fast, scalable interconnects,” Tso says, noting that Dally is now the senior vice president and chief scientist at the leading AI company NVIDIA. “Now, when you look at the modern NVIDIA chip architecture and the way they do interchip communication, it’s got Dally’s work all over it. With Professor Papadopoulos, I worked on accelerating application software with parallel computing hardware without having to rewrite the applications, and that’s exactly the problem we are trying to solve with NVIDIA. Coincidentally, all the stuff I was doing at MIT is playing out.”

Today Cloudian’s platform uses an object storage architecture in which all kinds of data — documents, videos, sensor data — are stored as unique objects with metadata. Object storage can manage massive datasets in a flat file structure, making it ideal for unstructured data and AI systems, but it traditionally hasn’t been able to send data directly to AI models without the data first being copied into a computer’s memory system, creating latency and energy bottlenecks for businesses.
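The flat object-plus-metadata model described above can be sketched in a few lines of Python. This is an illustrative in-memory toy, not Cloudian's actual API; every name here is hypothetical:

```python
# Minimal sketch of an object store: a flat key namespace where each
# object carries opaque binary data plus a metadata dictionary.
# There is no directory hierarchy -- just unique keys.
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)

class ObjectStore:
    """Flat namespace: keys map directly to objects, no folders."""
    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes, **metadata) -> None:
        self._objects[key] = StoredObject(data, metadata)

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def search(self, **criteria):
        """Return keys whose metadata matches every given criterion."""
        return [k for k, obj in self._objects.items()
                if all(obj.metadata.get(f) == v for f, v in criteria.items())]

store = ObjectStore()
store.put("scans/robot-17.bin", b"\x00\x01", device="robot-17", kind="sensor")
store.put("docs/manual.pdf", b"%PDF", kind="document")
print(store.search(kind="sensor"))  # ['scans/robot-17.bin']
```

The metadata-driven `search` hints at why the flat model suits unstructured data: objects are found by what they describe, not by where they sit in a directory tree.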

In July, Cloudian announced that it has extended its object storage system with a vector database that stores data in a form which is immediately usable by AI models. As the data are ingested, Cloudian is computing in real-time the vector form of that data to power AI tools like recommender engines, search, and AI assistants. Cloudian also announced a partnership with NVIDIA that allows its storage system to work directly with the AI company’s GPUs. Cloudian says the new system enables even faster AI operations and reduces computing costs.
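The idea of computing vectors at ingest time, so similarity search needs no later bulk export, can be sketched as below. The letter-frequency "embedding" is a toy stand-in for a real model, and the class and method names are hypothetical, not Cloudian's API:

```python
# Sketch of embedding-at-ingest: each object's vector is computed once,
# when it is stored, and queries are answered by cosine similarity.
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized letter-frequency vector (26 dimensions).
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

class VectorStore:
    def __init__(self):
        self._items = {}  # key -> (text, vector)

    def put(self, key: str, text: str) -> None:
        # The vector is computed here, at ingest time, not at query time.
        self._items[key] = (text, embed(text))

    def nearest(self, query: str, k: int = 1):
        # Vectors are unit-length, so the dot product is cosine similarity.
        qv = embed(query)
        scored = sorted(
            self._items.items(),
            key=lambda kv: -sum(a * b for a, b in zip(qv, kv[1][1])),
        )
        return [key for key, _ in scored[:k]]

vs = VectorStore()
vs.put("doc1", "robot arm maintenance schedule")
vs.put("doc2", "quarterly financial report")
print(vs.nearest("when should the robot be serviced"))  # ['doc1']
```

A production system would use a learned embedding model and an approximate nearest-neighbor index rather than a linear scan, but the shape is the same: vectors live next to the data they describe.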

“NVIDIA contacted us about a year and a half ago because GPUs are useful only with data that keeps them busy,” Tso says. “Now people are realizing it’s easier to move the AI to the data than it is to move huge datasets. Our storage systems embed a lot of AI functions, so we’re able to pre- and post-process data for AI near where we collect and store the data.”

AI-first storage

Cloudian is helping about 1,000 companies around the world get more value out of their data, including large manufacturers, financial service providers, health care organizations, and government agencies.

Cloudian’s storage platform is helping one large automaker, for instance, use AI to determine when each of its manufacturing robots needs to be serviced. Cloudian is also working with the National Library of Medicine to store research articles and patents, and the National Cancer Database to store DNA sequences of tumors — rich datasets that AI models could process to help researchers develop new treatments or gain new insights.

“GPUs have been an incredible enabler,” Tso says. “Moore’s Law doubles the amount of compute every two years, but GPUs are able to parallelize operations on chips, so you can network GPUs together and shatter Moore’s Law. That scale is pushing AI to new levels of intelligence, but the only way to make GPUs work hard is to feed them data at the same speed that they compute — and the only way to do that is to get rid of all the layers between them and your data.”
