MIT News - Computer Science and Artificial Intelligence Laboratory, September 25, 18:01
MIT develops a new framework to balance AI privacy and performance

MIT researchers have developed a new framework called PAC Privacy that addresses the tradeoff AI models face between protecting user data privacy and preserving accuracy. Through more efficient computation, the framework reduces the performance cost of privacy protection and provides a general template that can be used to privatize virtually any algorithm without requiring knowledge of its inner workings. The new method automatically estimates the minimum amount of noise needed to reach a desired privacy level and can estimate non-uniform (anisotropic) noise, protecting data while improving model accuracy. The research also found that more stable algorithms are easier to privatize, suggesting a possible "win-win" between the two.

🛡️ **The PAC Privacy framework balances privacy and performance**: The framework proposed by MIT researchers, built on a new privacy metric, keeps sensitive user data (such as medical images and financial records) safe from attackers while maintaining high AI model performance, addressing the long-standing problem that privacy protection traditionally comes at the cost of accuracy.

⚙️ **Greater efficiency and generality**: The new version of PAC Privacy is more computationally efficient and provides a four-step template that lets developers privatize virtually any algorithm without touching its internal mechanics. This significantly lowers the barrier to real-world deployment, making data privacy achievable across a wide range of algorithms.

📈 **Synergy between stability and privatization**: The research shows that an algorithm's "stability" (the consistency of its predictions when the training data change slightly) correlates with how easily it can be privatized. More stable algorithms not only make more accurate predictions but can also achieve strong privacy with less noise under the PAC Privacy framework, effectively yielding "privacy for free."

📊 **Quantifying noise and optimizing privacy**: PAC Privacy automatically estimates the minimum amount of noise required to reach a given privacy level. Unlike the earlier algorithm, which could only add isotropic noise, the new method can estimate anisotropic noise tailored to the characteristics of the training data, so less total noise is needed to reach the same privacy level, further improving model accuracy.

Data privacy comes with a cost. There are security techniques that protect sensitive user data, like customer addresses, from attackers who may attempt to extract them from AI models — but they often make those models less accurate.

MIT researchers recently developed a framework, based on a new privacy metric called PAC Privacy, that could maintain the performance of an AI model while ensuring sensitive data, such as medical images or financial records, remain safe from attackers. Now, they’ve taken this work a step further by making their technique more computationally efficient, improving the tradeoff between accuracy and privacy, and creating a formal template that can be used to privatize virtually any algorithm without needing access to that algorithm’s inner workings.

The team utilized their new version of PAC Privacy to privatize several classic algorithms for data analysis and machine-learning tasks.

They also demonstrated that more “stable” algorithms are easier to privatize with their method. A stable algorithm’s predictions remain consistent even when its training data are slightly modified. Greater stability helps an algorithm make more accurate predictions on previously unseen data.

The researchers say the increased efficiency of the new PAC Privacy framework, and the four-step template one can follow to implement it, would make the technique easier to deploy in real-world situations.

“We tend to consider robustness and privacy as unrelated to, or perhaps even in conflict with, constructing a high-performance algorithm. First, we make a working algorithm, then we make it robust, and then private. We’ve shown that is not always the right framing. If you make your algorithm perform better in a variety of settings, you can essentially get privacy for free,” says Mayuri Sridhar, an MIT graduate student and lead author of a paper on this privacy framework.

She is joined in the paper by Hanshen Xiao PhD ’24, who will start as an assistant professor at Purdue University in the fall; and senior author Srini Devadas, the Edwin Sibley Webster Professor of Electrical Engineering at MIT. The research will be presented at the IEEE Symposium on Security and Privacy.

Estimating noise

To protect sensitive data that were used to train an AI model, engineers often add noise, or generic randomness, to the model so it becomes harder for an adversary to guess the original training data. This noise reduces a model’s accuracy, so the less noise one can add, the better.

PAC Privacy automatically estimates the smallest amount of noise one needs to add to an algorithm to achieve a desired level of privacy.

The original PAC Privacy algorithm runs a user’s AI model many times on different samples of a dataset. It measures the variance as well as correlations among these many outputs and uses this information to estimate how much noise needs to be added to protect the data.
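As a rough illustration of this estimation step, the sketch below runs an algorithm on many random subsamples and computes the empirical covariance of its outputs. It is not the authors' released code; `algorithm`, `subsample_frac`, and the toy usage are assumptions made for illustration.

```python
# A rough sketch of the estimation step described above, not the authors'
# implementation. `algorithm` is assumed to map a data subsample to a
# fixed-length output vector (e.g., model weights or summary statistics).
import numpy as np

def estimate_output_covariance(algorithm, data, n_trials=200,
                               subsample_frac=0.5, seed=0):
    """Run `algorithm` on many random subsamples and return the empirical
    mean and covariance of its outputs."""
    rng = np.random.default_rng(seed)
    n = len(data)
    k = int(subsample_frac * n)
    outputs = np.stack([
        np.asarray(algorithm(data[rng.choice(n, size=k, replace=False)]))
        for _ in range(n_trials)
    ])                                        # shape: (n_trials, d)
    return outputs.mean(axis=0), np.cov(outputs, rowvar=False)

# Toy usage: the "algorithm" is just the per-feature mean of the subsample.
if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(1000, 3))
    mean, cov = estimate_output_covariance(lambda d: d.mean(axis=0), data)
    print(cov.shape)  # (3, 3): full covariance across output coordinates
```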

This new variant of PAC Privacy works the same way but does not need to represent the entire matrix of data correlations across the outputs; it just needs the output variances.

“Because the thing you are estimating is much, much smaller than the entire covariance matrix, you can do it much, much faster,” Sridhar explains. This means that one can scale up to much larger datasets.
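A minimal sketch of the cheaper variance-only estimate follows; the function name and interface are assumptions, but it illustrates why keeping d per-coordinate variances scales better than a d-by-d covariance matrix.

```python
# Sketch of the variance-only estimate used by the new variant (the interface
# here is an assumption, not the released code): the same repeated runs, but
# only the d per-coordinate variances are kept, not a d x d matrix.
import numpy as np

def estimate_output_variances(algorithm, data, n_trials=200,
                              subsample_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(data)
    k = int(subsample_frac * n)
    outputs = np.stack([
        np.asarray(algorithm(data[rng.choice(n, size=k, replace=False)]))
        for _ in range(n_trials)
    ])
    return outputs.var(axis=0, ddof=1)  # d numbers: O(d) memory instead of O(d^2)
```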

Adding noise can hurt the utility of the results, and it is important to minimize utility loss. Due to computational cost, the original PAC Privacy algorithm was limited to adding isotropic noise, which is added uniformly in all directions. Because the new variant estimates anisotropic noise, which is tailored to specific characteristics of the training data, a user could add less overall noise to achieve the same level of privacy, boosting the accuracy of the privatized algorithm.
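The contrast can be pictured as follows. This is a sketch of the idea rather than the paper's noise calibration: `variances` stands for the estimated per-coordinate output variances, and `scale` stands in for the factor set by the target privacy level.

```python
# Illustrative contrast between isotropic and anisotropic noise (a sketch of
# the idea, not the paper's calibration).
import numpy as np

def add_isotropic_noise(output, variances, scale, seed=0):
    # One shared noise level, sized to the worst (largest-variance) direction.
    rng = np.random.default_rng(seed)
    sigma = scale * np.sqrt(np.max(variances))
    return output + rng.normal(0.0, sigma, size=np.shape(output))

def add_anisotropic_noise(output, variances, scale, seed=0):
    # Per-coordinate noise: directions where the output barely varies get
    # little noise, so the total perturbation is smaller for the same target.
    rng = np.random.default_rng(seed)
    sigmas = scale * np.sqrt(np.asarray(variances))
    return output + rng.normal(0.0, sigmas)
```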

Privacy and stability

As she studied PAC Privacy, Sridhar hypothesized that more stable algorithms would be easier to privatize with this technique. She used the more efficient variant of PAC Privacy to test this theory on several classical algorithms.

Algorithms that are more stable have less variance in their outputs when their training data change slightly. PAC Privacy breaks a dataset into chunks, runs the algorithm on each chunk of data, and measures the variance among outputs. The greater the variance, the more noise must be added to privatize the algorithm.

Employing stability techniques to decrease the variance in an algorithm’s outputs would also reduce the amount of noise that needs to be added to privatize it, she explains.
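A small sketch of this chunk-and-measure idea (a hypothetical helper, not the authors' implementation) shows how a more stable algorithm yields lower output variance across chunks, and therefore needs less added noise.

```python
# Split the data into disjoint chunks, run the algorithm on each, and treat
# the spread of the outputs as the stability signal. A more stable algorithm
# gives lower variance here, and so needs less added noise.
import numpy as np

def chunked_output_variance(algorithm, data, n_chunks=10):
    chunks = np.array_split(np.asarray(data), n_chunks)
    outputs = np.stack([np.asarray(algorithm(chunk)) for chunk in chunks])
    return outputs.var(axis=0, ddof=1)

# Example: a clipped mean (more stable) vs. a raw maximum (less stable).
if __name__ == "__main__":
    data = np.random.default_rng(2).normal(size=5000)
    print(chunked_output_variance(lambda d: np.clip(d, -1, 1).mean(), data))
    print(chunked_output_variance(lambda d: d.max(), data))
```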

“In the best cases, we can get these win-win scenarios,” she says.

The team showed that these privacy guarantees remained strong regardless of which algorithm they tested, and that the new variant of PAC Privacy required an order of magnitude fewer trials to estimate the noise. They also tested the method in attack simulations, demonstrating that its privacy guarantees could withstand state-of-the-art attacks.

“We want to explore how algorithms could be co-designed with PAC Privacy, so the algorithm is more stable, secure, and robust from the beginning,” Devadas says. The researchers also want to test their method with more complex algorithms and further explore the privacy-utility tradeoff.

“The question now is: When do these win-win situations happen, and how can we make them happen more often?” Sridhar says.

“I think the key advantage PAC Privacy has in this setting over other privacy definitions is that it is a black box — you don’t need to manually analyze each individual query to privatize the results. It can be done completely automatically. We are actively building a PAC-enabled database by extending existing SQL engines to support practical, automated, and efficient private data analytics,” says Xiangyao Yu, an assistant professor in the computer sciences department at the University of Wisconsin at Madison, who was not involved with this study.

This research is supported, in part, by Cisco Systems, Capital One, the U.S. Department of Defense, and a MathWorks Fellowship.
