Tinker API: Unlocking a New Way to Do AI Research Without Direct Model Access

Thinking Machines' Tinker API offers a distinctive approach to fine-tuning and inference on open-source large language models (LLMs). The API lets researchers carry out many kinds of machine learning research without direct access to model weights, which matters for AI safety research, for example by making reinforcement learning experiments easier to run. More importantly, it can help AI companies reduce how many researchers (human or AI) have access to sensitive model weights, lowering the risk of weight exfiltration and rogue deployments. Tinker exposes a low-level interface for initializing a LoRA, running forward/backward passes, taking optimizer steps, sampling from the model, and downloading the LoRA, which is enough to implement SFT, DPO, RL, and other training algorithms, and it allows large efficiency gains by batching requests.

💡 **Tinker API's innovative design**: Tinker provides a low-level interface that lets researchers do machine learning research without direct access to model weights. Users can initialize a LoRA, run forward/backward passes, take optimizer steps, sample from the model, and download the LoRA, which gives them the flexibility to implement SFT, DPO, RL, and other training algorithms, with large efficiency gains from batching requests.

🛡️ **Stronger AI security and control**: By limiting researchers' direct access to model weights, the API reduces the risk of weight exfiltration, rogue deployments, and unmonitored training and inference. This is important for defending against insider threats, whether from humans or AI agents, and makes it possible to concentrate attention and oversight on the small number of dangerous situations that genuinely require higher levels of access.

🔬 **Enabling a broad range of ML research**: With the Tinker API, the vast majority of machine learning research done at frontier AI companies, such as work on reinforcement learning algorithms, RL environments, and how training affects a model's "personality", can be done without access to dangerous model weights. Even research that needs more granular access but not dangerous weights (performance engineering, pretraining, quantization research, optimizer research) can be done using small models or early checkpoints.

Published on October 9, 2025 3:22 PM GMT

Last week, Thinking Machines announced Tinker. It’s an API for running fine-tuning and inference on open-source LLMs that works in a unique way. I think it has some immediate practical implications for AI safety research: I suspect that it will make RL experiments substantially easier, and increase the number of safety papers that involve RL on big models.

But it's also interesting to me for another reason: the design of this API makes it possible to do many types of ML research without direct access to the model you’re working with. APIs like this might allow AI companies to reduce how many of their researchers (either human or AI) have access to sensitive model weights, which is good for reducing the probability of weight exfiltration and other rogue deployments.

(Thinking Machines gave us early access to the product; in exchange, we gave them bug reports and let them mention our name in their announcement blog post. It was helpful to us that they gave us this access, and we expect to use it substantially going forward. I don't have a strong opinion about whether this product is overall good or bad for the world; note that my title claims an evidential rather than causal effect 🙂.)

How the Tinker API is different

Previous LLM training APIs, e.g. the OpenAI SFT and RL APIs, let you submit training data, and then they train the LLM on your data using an algorithm that they implemented themselves. This is great if you want to use exactly the algorithm that they implemented. But it's inflexible.

The Tinker API instead gives you a lower-level interface that is flexible enough to let you implement a wide variety of training algorithms, and Thinking Machines maintains an open-source library that implements many important training algorithms on top of it.

Specifically, their API allows you to:

- initialize a LoRA
- run forward and backward passes
- take optimizer steps
- sample from the model
- download the trained LoRA

This is expressive enough that you can use it to implement most of the algorithms that are typically used for training models: SFT, DPO, RL algorithms, etc.
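
To make that concrete, here is a minimal sketch of what supervised fine-tuning could look like against a low-level API of this shape. Every name in the sketch (the client module, `TrainingClient`, `forward_backward`, `optim_step`, `sample`, `download_lora`) is a hypothetical stand-in rather than the actual Tinker interface; the point is only that these five primitives are enough to express a full training loop.

```python
# A sketch of SFT on top of a Tinker-style low-level training API.
# All names below are hypothetical stand-ins, not the real Tinker SDK.

from hypothetical_tinker import TrainingClient  # hypothetical client library


def supervised_finetune(dataset, base_model: str, num_steps: int = 1000):
    # The service initializes a LoRA on top of the base model; the user
    # never holds the underlying model weights.
    client = TrainingClient(base_model=base_model, lora_rank=32)

    for step in range(num_steps):
        batch = dataset.next_batch()

        # Forward and backward passes happen server-side; only losses and
        # metrics come back to the user.
        metrics = client.forward_backward(
            inputs=batch["tokens"],
            targets=batch["targets"],
            loss_fn="cross_entropy",
        )

        # Optimizer step over the LoRA parameters, also server-side.
        client.optim_step(learning_rate=1e-4)

        if step % 100 == 0:
            print(f"step {step}: loss {metrics['loss']:.4f}")

    # Sampling and downloading the trained LoRA are likewise mediated by
    # the API, so the base weights never leave the provider.
    print(client.sample(prompt="Hello", max_tokens=64))
    client.download_lora("adapter.safetensors")
```

Swapping out the loss computation, or driving the loop with sampled rollouts instead of a fixed dataset, turns the same primitives into DPO or an RL algorithm, which is the flexibility the library relies on.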

One important insight behind this library is that because you’re just training LoRAs rather than full models, you can efficiently batch different users’ requests together. This is a huge efficiency boost. Of course, it only makes sense if LoRA fine-tuning works for your usecase, which is presumably why Thinking Machines put out a blog post the other day arguing that LoRA fine-tuning is just as good as full-model fine-tuning for many common use cases.
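
To illustrate the batching point, here is a toy PyTorch example (my own illustration, not Thinking Machines' implementation) of a single linear layer serving several users' LoRAs at once: the expensive matmul against the shared frozen base weight is computed once for the whole batch, and each request only adds a cheap low-rank correction from its own adapter.

```python
import torch


def batched_lora_linear(x, W, A, B, scale=1.0):
    """Apply a shared base weight plus a per-request LoRA to a batch of inputs.

    x: (batch, d_in)      one input row per user request
    W: (d_out, d_in)      frozen base weight, shared by everyone
    A: (batch, r, d_in)   per-request LoRA "down" factors
    B: (batch, d_out, r)  per-request LoRA "up" factors
    """
    base = x @ W.T                                        # shared heavy compute
    xa = torch.bmm(x.unsqueeze(1), A.transpose(1, 2))     # (batch, 1, r)
    delta = torch.bmm(xa, B.transpose(1, 2)).squeeze(1)   # (batch, d_out), cheap
    return base + scale * delta


if __name__ == "__main__":
    batch, d_in, d_out, r = 8, 4096, 4096, 16
    x = torch.randn(batch, d_in)
    W = torch.randn(d_out, d_in)
    A = torch.randn(batch, r, d_in) * 0.01   # eight different users' adapters
    B = torch.zeros(batch, d_out, r)
    print(batched_lora_linear(x, W, A, B).shape)  # torch.Size([8, 4096])
```

Because the per-request adapter work scales with the LoRA rank rather than the full weight matrix, serving many users' adapters on one base model adds little cost on top of the shared computation.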

Why this is good for AI security and control

The great thing about this API from a security perspective is that it doesn't give users direct access to the models they’re working with.

My informed guess is that 90% of researchers at frontier AI developers whose work involves fine-tuning (which is a large fraction of them) do it by writing scripts that are executed on machines that have access to model weights. This leads to several huge security risks that would be mitigated if they instead used this interface.

Firstly, your researchers normally have access to the model architecture, which is unfortunate if you want to keep it from leaking. It's not clear how important this is; my sense is that changes to model architecture have accounted for a minority of the algorithmic improvement since the invention of the transformer.

Secondly and more importantly, your researchers normally have access to the weights of the model. Because they are running Python programs that load the weights and then do arbitrary computations on them, it's entirely possible for them to save a copy of the weights to the machine's storage, move those copies around with impunity, and then steal them.

(Unfortunately for the attacker, the weights might be larger than the other files they need to work with. But it's actually pretty typical to produce some seriously big files when you're doing ML research, and the researcher might be able to split the weights into multiple fragments and steal them one at a time.)

They're also able to do unmonitored training and inference. For example, if your researchers can run inference directly on model weights on their research servers, you can't ensure that the outputs of your most powerful model are always logged. They can also arbitrarily fine-tune models and run inference on the fine-tuned versions. This could go wrong in several ways.

(See here for more concrete pictures of how model weight access could lead to rogue deployments.)

I phrased all the above in terms of mitigating insider threat from humans, but all of it transfers to mitigating risk from misaligned AI agents. (See here for more on the analogy between insider threat from humans and insider threat from AIs.)

Of course, there can still be security issues with this interface:

I think that forcing researchers to use APIs like this instead of direct model access would be a notable improvement for AI security and control. Currently, for most researchers at AI companies it would be fairly easy to misuse model weights, trivial to learn the model architecture, and plausibly doable to steal the model weights if they really tried and were fairly creative about it. This change would make all those attacks much harder.

We've already seen that researchers are thrilled to run model inference through a centralized inference server instead of hosting inference themselves, which is similarly helpful for security, though much more restrictive. Historically, training APIs have been too inflexible for me to expect that researchers would want to use them when they could train models directly; I think this API is flexible enough that researchers will want to use it even if they have model access. So learning about this API design updated me moderately towards thinking it is feasible to secure model weights and architectures against malevolent insiders.

A lot of ML research can be done without direct access to dangerous model weights

I think that, if you have access to this interface, the majority of ML research that is done at frontier AI companies can be done without the researcher having direct access to dangerous model weights (which we could operationalize as “model weights from models more capable than available today”).

A simple breakdown of research that can be done with these restricted permissions:

- Most of the ML research done at frontier AI companies, such as work on RL algorithms, RL environments, and how training affects a model's "personality", can be done entirely through an interface like this, with no access to dangerous model weights.
- Research that needs more granular access but not dangerous model weights (performance engineering, pretraining, quantization research, optimizer research) can be done using small models or early checkpoints.

The main class of projects that need granular model weight access to frontier models is model internals/interpretability.

So overall, I think it's plausible that the majority, perhaps the vast majority, of ML research and engineering inside AI companies might be doable without model weight access to powerful LLMs. I'm not sure how I expect this to change over time as the quantity of research labor increases.

One reason to be more pessimistic is that this API might be inherently less computationally efficient than a more direct implementation of fine-tuning. I don't know whether this is the case. The main argument that it is less efficient is that it requires a couple of network round trips and is generally less flexible; this concern is exacerbated by my argument here that AI automation of AI R&D will probably lead to ML researchers putting more labor into computational efficiency. There's also a case that this API is more efficient, because of the possibility of batching requests from different users; one argument for this is that I expect future AI researchers to mostly use internal inference APIs rather than deploying inference themselves.
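
One way the round-trip cost could be kept manageable, at least in principle, is to pipeline requests rather than blocking on each one. The sketch below (again using hypothetical client names, not the real SDK) submits a group of forward/backward calls concurrently and only synchronizes before the optimizer step.

```python
import asyncio

from hypothetical_tinker import AsyncTrainingClient  # hypothetical


async def train_epoch(client: AsyncTrainingClient, batches):
    for batch in batches:
        # Fire off all microbatch forward/backward requests at once; the
        # server can batch them (potentially alongside other users' LoRA
        # work) and we pay roughly one round trip per group, not per call.
        await asyncio.gather(*[
            client.forward_backward(inputs=mb["tokens"], targets=mb["targets"])
            for mb in batch["microbatches"]
        ])
        await client.optim_step(learning_rate=1e-4)
```

Whether this fully closes the gap with local fine-tuning presumably depends on the workload; the point is just that round trips need not be paid per call.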

Of course, in both security and AI control, even if you can’t prevent your insiders from ever needing a dangerous permission, you prefer that permission to be required less often, because you can concentrate your attention more on those rare dangerous situations. So even if 10% of your researchers needed to have direct model weight access rather than using this interface, that would still be a big win.

Conclusions

Overall, learning about the design of this API, and observing that Thinking Machines successfully implemented it, has made me moderately more optimistic about securing model weights against both human and AI insider threats when the humans or AIs are doing research. I think it would be beneficial for the security of AI companies if they implemented services like this for internal use, and I hope they do so.

To be clear, there are many other sources of insider threat aside from AI agents stealing model weights while doing ML R&D. The AI agents will probably be used to write security-critical code, to generate huge amounts of synthetic data that might be poisoned, to produce external artifacts like military robot software that might be sabotaged, to work with secrets that we don’t want leaked, and to answer important questions where adversarially wrong answers might be very dangerous. So this isn't a big update for me about the importance of AI control research.

This is also an unusually clear example of me not knowing something important about the future of AI security that many AI company staff already knew. I didn't know that there existed a design as good as this one for allowing researchers to do lots of ML research flexibly without having direct model access.

