MarkTechPost@AI · two days ago, 07:22
SkyRL tx v0.1.0: A Tinker-Compatible Reinforcement Learning Engine for Local GPU Clusters

Anyscale and the NovaSky team at UC Berkeley have released SkyRL tx v0.1.0, a local engine that supports Tinker-style reinforcement learning. The engine lets AI teams run training and inference on their own infrastructure while keeping the same minimal API as the Tinker managed service. v0.1.0 is the first release in the series with full end-to-end reinforcement learning support, and it makes sampling significantly faster. It exposes Tinker's low-level primitives (forward_backward, optim_step, sample, save_state) over a REST API and handles GPU scheduling, distributed execution, batching, and LoRA adapter management locally, giving developers greater flexibility and control.

🚀 SkyRL tx v0.1.0 implements a local, Tinker-compatible engine that unifies training and inference for large language models (LLMs). It exposes Tinker's core primitives (forward_backward, optim_step, sample, save_state) over a REST API and internally handles batching, LoRA adapters, and device placement, so developers can run Tinker-style reinforcement learning on their own hardware.

🔧 The SkyRL tx architecture is cleanly decomposed into a REST API server, a SQL database (SQLite and Postgres are supported), a scheduling engine, and workers that execute forward and backward passes. This design lets a single engine instance serve one base model with many attached LoRA adapters, laying the groundwork for efficient training and inference.

⚡️ v0.1.0 introduces several key improvements, including end-to-end reinforcement learning support, faster JIT compilation and sharded sampling, per-request sampling parameters, gradient checkpointing, and micro batching. These optimizations significantly improve performance and provide the functionality needed to run complex RL experiments.

💡 The release includes example code for running reinforcement learning end to end on a cluster of 8 H100 GPUs. After a simple clone and configuration, users can start a local SkyRL tx backend and verify that the RL loop runs correctly by watching the reward curve, giving developers a convenient hands-on path.

How can AI teams run Tinker style reinforcement learning on large language models using their own infrastructure with a single unified engine? Anyscale and the NovaSky team at UC Berkeley have released SkyRL tx v0.1.0, which gives developers a way to run a Tinker compatible training and inference engine directly on their own hardware, while keeping the same minimal API that Tinker exposes in the managed service.

The research team describes SkyRL tx as a unified training and inference engine that implements the Tinker API and lets people run a Tinker like service on their own infrastructure. This v0.1.0 release is the first in the series to support reinforcement learning end to end, and it also makes sampling significantly faster.

Tinker API in brief

Tinker from Thinking Machines is a training API built around four core functions. forward_backward performs a forward pass and a backward pass and accumulates gradients. optim_step updates model weights based on those gradients. sample generates tokens for interaction, evaluation or RL actions. save_state writes checkpoints for resuming training.

Instead of a full task specific fine tuning abstraction, Tinker exposes these low level primitives so that users can implement their own supervised or reinforcement learning loops in regular Python code, while the service handles GPU scheduling and distributed execution.
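As a rough illustration, a user-written RL loop over these four primitives might look like the sketch below. The `client` object is an in-memory stand-in that only records the call sequence, not the real Tinker SDK, and the method signatures are assumptions for illustration.

```python
# Sketch of a Tinker-style loop: sample, forward_backward, optim_step, save_state.
# StubClient is a stand-in that records calls; it is NOT the real Tinker client.

class StubClient:
    """In-memory stand-in that records the primitive call sequence."""

    def __init__(self):
        self.calls = []
        self.step = 0

    def sample(self, prompt, max_tokens=16):
        # Generates tokens for interaction, evaluation, or RL actions.
        self.calls.append("sample")
        return prompt + " <generated tokens>"

    def forward_backward(self, batch):
        # Forward pass + backward pass; gradients accumulate server-side.
        self.calls.append("forward_backward")
        return {"loss": 1.0 / (self.step + 1)}

    def optim_step(self):
        # Applies the accumulated gradients to the model weights.
        self.calls.append("optim_step")
        self.step += 1

    def save_state(self, path):
        # Writes a checkpoint for resuming training.
        self.calls.append("save_state")
        return path

client = StubClient()
for _ in range(3):
    completion = client.sample("What is 2 + 2?")
    reward = 1.0 if "4" in completion else 0.0   # toy reward function
    client.forward_backward({"completion": completion, "reward": reward})
    client.optim_step()
client.save_state("/tmp/ckpt-final")
```

The point of the low-level API is exactly this shape: the loop body is ordinary Python owned by the user, while the service behind the client handles scheduling and distributed execution.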

SkyRL tx targets this exact API and implements an open backend that users can deploy locally. It keeps the Tinker programming model, while removing the need to rely only on the hosted environment.
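Since the backend speaks REST on a local address (the article's example uses http://localhost:8000), calling it needs nothing beyond an HTTP client. The endpoint path and request-body fields below are illustrative assumptions, not SkyRL tx's documented schema; the block only builds the payload and defines, without executing, a helper that would POST it.

```python
# Hedged sketch of talking to a local SkyRL tx backend over REST.
# Endpoint path and body fields are assumptions for illustration only.
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # from the article's example setup

def build_sample_request(prompt, max_tokens=64, temperature=0.7):
    # Per-request sampling parameters are supported as of v0.1.0.
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def post_json(path, body):
    # Not executed here; requires a running SkyRL tx server.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

body = build_sample_request("What is 2 + 2?", max_tokens=8)
```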

Where SkyRL tx fits inside SkyRL

SkyRL is a full stack reinforcement learning library for large language models that includes skyrl-agent for long horizon agents, skyrl-train for training, and skyrl-gym for tool use environments such as math, coding, search and SQL.

Within this stack, skyrl-tx is marked as an experimental cross platform library that exposes a local Tinker like REST API for model post training. SkyRL tx therefore becomes the system layer that connects RL logic, environments and training code to concrete GPU resources through the Tinker interface.

Architecture: an inference engine that also trains

The SkyRL tx architecture is described as an inference engine that also supports backward passes. It has four main components:

    REST API server that processes incoming requests from different users.
    Database that tracks metadata about models, checkpoints, requests and futures, and also acts as a job queue. The current implementation uses SQLite behind an interface that also supports other SQL databases such as Postgres.
    Engine that schedules and batches requests across users. Each engine instance serves a single base model and can attach many LoRA adapters.
    Worker that executes forward and backward passes and holds model definitions and optimizer states. Multiple workers will enable more advanced multi-node sharding in upcoming versions.
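The "database as job queue" idea can be sketched with stdlib sqlite3: the API server inserts pending requests, and the engine claims them for workers. The table and column names here are assumptions for illustration, not SkyRL tx's actual schema.

```python
# Minimal sketch of a SQL-backed job queue, assuming a hypothetical schema.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE requests (
           id INTEGER PRIMARY KEY,
           kind TEXT,                   -- e.g. 'forward_backward', 'sample'
           payload TEXT,                -- JSON-encoded request body
           status TEXT DEFAULT 'pending'
       )"""
)

def enqueue(kind, payload):
    # API-server side: persist the request and return its id (a "future").
    cur = conn.execute(
        "INSERT INTO requests (kind, payload) VALUES (?, ?)",
        (kind, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid

def claim_next():
    # Engine side: claim the oldest pending request for a worker.
    row = conn.execute(
        "SELECT id, kind, payload FROM requests "
        "WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE requests SET status = 'running' WHERE id = ?", (row[0],))
    conn.commit()
    return {"id": row[0], "kind": row[1], "payload": json.loads(row[2])}

req_id = enqueue("sample", {"prompt": "hello", "max_tokens": 8})
job = claim_next()
```

Using the database as the queue is what makes the SQLite-to-Postgres swap cheap: the same SQL interface backs both single-node and multi-user deployments.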

What v0.1.0 adds

The v0.1.0 release focuses on reinforcement learning support and performance improvements. The official release highlights several concrete changes:

    End to end reinforcement learning support through the Tinker API.
    Faster JIT compilation and sharded sampling.
    Per request sampling parameters.
    Gradient checkpointing and micro batching for training.
    Postgres support alongside the default SQLite database.
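Micro batching is the one item above with a mechanical core worth spelling out: gradients accumulate across several small forward/backward passes, and the optimizer applies them once per full batch. The sketch below is a pure-Python toy (no real model); the function names echo the Tinker primitives but the "gradient" is just a sum.

```python
# Toy illustration of micro batching via gradient accumulation.
# Stand-in math: each micro batch contributes sum(inputs) as its "gradient".

def forward_backward(micro_batch, grad_accum):
    # Forward + backward over one micro batch; gradients accumulate.
    grad_accum.append(sum(micro_batch))

def optim_step(grad_accum):
    # One weight update using everything accumulated, then reset.
    total = sum(grad_accum)
    grad_accum.clear()
    return total

batch = list(range(8))   # a training batch of 8 examples
micro = 2                # micro batch size (cf. --train-micro-batch-size)
grads = []
for i in range(0, len(batch), micro):
    forward_backward(batch[i:i + micro], grads)
applied = optim_step(grads)
```

The effective batch size stays at 8 while peak activation memory scales with the micro batch of 2, which is the whole appeal on memory-constrained GPUs.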

Running RL end to end on 8 H100 GPUs

The official release contains a specific code recipe for running reinforcement learning end to end on a cluster with 8 H100 GPUs.

First, users clone the SkyRL repository and in the skyrl-tx folder start the engine with:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log

Then they clone the Tinker Cookbook from the Thinking Machines team and in the tinker_cookbook/recipes folder run:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your key>
uv run --with wandb --with tinker rl_loop.py \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100

This produces a reward curve that confirms the RL loop runs correctly through the local SkyRL tx backend.


Editorial Comments

SkyRL tx v0.1.0 is a practical step for dev teams that want Tinker style reinforcement learning on their own clusters with a consistent Tinker API surface. The design that treats the system as an inference engine that also runs backward passes is clean and reduces stack divergence. Support for LoRA, gradient checkpointing, micro batching and Postgres is a concrete systems upgrade. Overall, this release turns Tinker compatibility into an actionable local RL backend for LLMs.



