MarkTechPost@AI · two days ago, 07:22
SkyRL tx v0.1.0: A Tinker-Compatible Reinforcement Learning Engine for Local GPU Clusters

Anyscale and the NovaSky team at UC Berkeley have released SkyRL tx v0.1.0, a local engine that supports Tinker-style reinforcement learning. The engine lets AI teams run training and inference on their own infrastructure while keeping the same minimal API as the Tinker managed service. v0.1.0 is the first release in the series with full end-to-end reinforcement learning support, and it makes sampling significantly faster. It exposes Tinker's low-level primitives (forward_backward, optim_step, sample, save_state) over a REST API and handles GPU scheduling, distributed execution, batching, and LoRA adapter management locally, giving developers greater flexibility and control.

🚀 SkyRL tx v0.1.0 implements a local, Tinker-compatible engine that unifies training and inference for large language models (LLMs). It exposes Tinker's core primitives (forward_backward, optim_step, sample, save_state) over a REST API and internally handles batching, LoRA adapters, and device placement, so developers can run Tinker-style reinforcement learning on their own hardware.

🔧 The SkyRL tx architecture is cleanly decomposed into a REST API server, a SQL database (SQLite and Postgres are supported), a scheduling engine, and workers that execute forward and backward passes. This design lets a single engine instance serve one base model with many attached LoRA adapters, laying the groundwork for efficient training and inference.

⚡️ v0.1.0 introduces several key improvements, including end-to-end reinforcement learning support, faster JIT compilation and sharded sampling, per-request sampling parameters, gradient checkpointing, and micro batching. These optimizations significantly improve performance and provide the functionality needed to run complex RL experiments.

💡 The release includes example code for running reinforcement learning end to end on a cluster of 8 H100 GPUs. After a simple clone and configuration, users can start a local SkyRL tx backend and verify that the RL loop runs correctly by watching the reward curve, giving developers a convenient hands-on path.

How can AI teams run Tinker style reinforcement learning on large language models using their own infrastructure with a single unified engine? Anyscale and the NovaSky team at UC Berkeley have released SkyRL tx v0.1.0, which gives developers a way to run a Tinker compatible training and inference engine directly on their own hardware, while keeping the same minimal API that Tinker exposes in the managed service.

The research team describes SkyRL tx as a unified training and inference engine that implements the Tinker API and lets people run a Tinker like service on their own infrastructure. This v0.1.0 release is the first in the series to support reinforcement learning end to end, and it also makes sampling significantly faster.

Tinker API in brief

Tinker from Thinking Machines is a training API built around four core functions. forward_backward performs a forward pass and a backward pass and accumulates gradients. optim_step updates model weights based on those gradients. sample generates tokens for interaction, evaluation or RL actions. save_state writes checkpoints for resuming training.

Instead of a full task specific fine tuning abstraction, Tinker exposes these low level primitives so that users can implement their own supervised or reinforcement learning loops in regular Python code, while the service handles GPU scheduling and distributed execution.
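As a rough illustration, a user-written RL loop over these four primitives might look like the sketch below. The `client` object is an in-memory stand-in that only records the call sequence, not the real Tinker SDK, and the method signatures are assumptions for illustration.

```python
# Sketch of a Tinker-style loop: sample, forward_backward, optim_step, save_state.
# StubClient is a stand-in that records calls; it is NOT the real Tinker client.

class StubClient:
    """In-memory stand-in that records the primitive call sequence."""

    def __init__(self):
        self.calls = []
        self.step = 0

    def sample(self, prompt, max_tokens=16):
        # Generates tokens for interaction, evaluation, or RL actions.
        self.calls.append("sample")
        return prompt + " <generated tokens>"

    def forward_backward(self, batch):
        # Forward pass + backward pass; gradients accumulate server-side.
        self.calls.append("forward_backward")
        return {"loss": 1.0 / (self.step + 1)}

    def optim_step(self):
        # Applies the accumulated gradients to the model weights.
        self.calls.append("optim_step")
        self.step += 1

    def save_state(self, path):
        # Writes a checkpoint for resuming training.
        self.calls.append("save_state")
        return path

client = StubClient()
for _ in range(3):
    completion = client.sample("What is 2 + 2?")
    reward = 1.0 if "4" in completion else 0.0   # toy reward function
    client.forward_backward({"completion": completion, "reward": reward})
    client.optim_step()
client.save_state("/tmp/ckpt-final")
```

The point of the low-level API is exactly this shape: the loop body is ordinary Python owned by the user, while the service behind the client handles scheduling and distributed execution.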

SkyRL tx targets this exact API and implements an open backend that users can deploy locally. It keeps the Tinker programming model, while removing the need to rely only on the hosted environment.
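Since the backend speaks REST on a local address (the article's example uses http://localhost:8000), calling it needs nothing beyond an HTTP client. The endpoint path and request-body fields below are illustrative assumptions, not SkyRL tx's documented schema; the block only builds the payload and defines, without executing, a helper that would POST it.

```python
# Hedged sketch of talking to a local SkyRL tx backend over REST.
# Endpoint path and body fields are assumptions for illustration only.
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # from the article's example setup

def build_sample_request(prompt, max_tokens=64, temperature=0.7):
    # Per-request sampling parameters are supported as of v0.1.0.
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def post_json(path, body):
    # Not executed here; requires a running SkyRL tx server.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_sample_request("What is 2 + 2?", max_tokens=8)
```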

Where SkyRL tx fits inside SkyRL

SkyRL is a full stack reinforcement learning library for large language models that includes skyrl-agent for long horizon agents, skyrl-train for training, and skyrl-gym for tool use environments such as math, coding, search and SQL.

Within this stack, skyrl-tx is marked as an experimental cross platform library that exposes a local Tinker like REST API for model post training. SkyRL tx therefore becomes the system layer that connects RL logic, environments and training code to concrete GPU resources through the Tinker interface.

Architecture: an inference engine that also trains

The SkyRL tx architecture is described as an inference engine that also supports backward passes. It has four main components:

    REST API server that processes incoming requests from different users.
    Database that tracks metadata about models, checkpoints, requests and futures, and also acts as a job queue. The current implementation uses SQLite behind an interface that also supports other SQL databases such as Postgres.
    Engine that schedules and batches requests across users. Each engine instance serves a single base model and can attach many LoRA adapters.
    Worker that executes forward and backward passes and holds model definitions and optimizer states. Multiple workers will enable more advanced multi-node sharding in upcoming versions.
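The "database as job queue" idea can be sketched with stdlib sqlite3: the API server inserts pending requests, and the engine claims them for workers. The table and column names here are assumptions for illustration, not SkyRL tx's actual schema.

```python
# Minimal sketch of a SQL-backed job queue, assuming a hypothetical schema.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE requests (
           id INTEGER PRIMARY KEY,
           kind TEXT,                   -- e.g. 'forward_backward', 'sample'
           payload TEXT,                -- JSON-encoded request body
           status TEXT DEFAULT 'pending'
       )"""
)

def enqueue(kind, payload):
    # API-server side: persist the request and return its id (a "future").
    cur = conn.execute(
        "INSERT INTO requests (kind, payload) VALUES (?, ?)",
        (kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid

def claim_next():
    # Engine side: claim the oldest pending request for a worker.
    row = conn.execute(
        "SELECT id, kind, payload FROM requests "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE requests SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return {"id": row[0], "kind": row[1], "payload": json.loads(row[2])}

req_id = enqueue("sample", {"prompt": "hello", "max_tokens": 8})
job = claim_next()
```

Using the database as the queue is what makes the SQLite-to-Postgres swap cheap: the same SQL interface backs both single-node and multi-user deployments.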

What v0.1.0 adds

The v0.1.0 release focuses on reinforcement learning support and performance improvements. The official release highlights several concrete changes:

    End to end reinforcement learning support through the Tinker API.
    Faster JIT compilation and sharded sampling.
    Per request sampling parameters.
    Gradient checkpointing and micro batching for training.
    Postgres support alongside the default SQLite database.
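Micro batching is the one item above with a mechanical core worth spelling out: gradients accumulate across several small forward/backward passes, and the optimizer applies them once per full batch. The sketch below is a pure-Python toy (no real model); the function names echo the Tinker primitives but the "gradient" is just a sum.

```python
# Toy illustration of micro batching via gradient accumulation.
# Stand-in math: each micro batch contributes sum(inputs) as its "gradient".

def forward_backward(micro_batch, grad_accum):
    # Forward + backward over one micro batch; gradients accumulate.
    grad_accum.append(sum(micro_batch))

def optim_step(grad_accum):
    # One weight update using everything accumulated, then reset.
    total = sum(grad_accum)
    grad_accum.clear()
    return total

batch = list(range(8))   # a training batch of 8 examples
micro = 2                # micro batch size (cf. --train-micro-batch-size)
grads = []
for i in range(0, len(batch), micro):
    forward_backward(batch[i:i + micro], grads)
applied = optim_step(grads)
```

The effective batch size stays at 8 while peak activation memory scales with the micro batch of 2, which is the whole appeal on memory-constrained GPUs.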

Running RL end to end on 8 H100 GPUs

The official release contains a specific code recipe for running reinforcement learning end to end on a cluster with 8 H100 GPUs.

First, users clone the SkyRL repository and in the skyrl-tx folder start the engine with:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log

Then they clone the Tinker Cookbook from the Thinking Machines team and in the tinker_cookbook/recipes folder run:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your key>
uv run --with wandb --with tinker rl_loop.py \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100

This produces a reward curve that confirms the RL loop runs correctly through the local SkyRL tx backend.


Editorial Comments

SkyRL tx v0.1.0 is a practical step for dev teams that want Tinker style reinforcement learning on their own clusters with a consistent Tinker API surface. The design that treats the system as an inference engine that also runs backward passes is clean and reduces stack divergence. Support for LoRA, gradient checkpointing, micro batching and Postgres is a concrete systems upgrade. Overall, this release turns Tinker compatibility into an actionable local RL backend for LLMs.



