MarkTechPost@AI Oct 04, 14:05
A Unified Code Regression Model Predicts Performance

Researchers from Cornell University and Google propose a unified model called the Regression Language Model (RLM), which predicts numeric outcomes directly from code strings, covering GPU kernel latency, program memory usage, and neural network accuracy and latency, without hand-designed features. The model uses a 300M-parameter encoder-decoder initialized from T5-Gemma and emits constrained numbers through a single text-to-number decoder, achieving strong rank correlations across heterogeneous tasks and languages.

💡 **Unified code-to-metric regression:** The newly proposed Regression Language Model (RLM) predicts performance metrics directly from code text (e.g., Python, C/C++, ONNX graphs), including peak memory usage, Triton GPU kernel latency, and neural network accuracy and hardware-specific latency. Its novelty lies in requiring no hand-extracted features, graph encoders, or zero-cost proxies: code is treated as plain-text input, and numeric outputs are decoded directly.

🚀 **Strong predictive performance:** The model achieves excellent results across multiple benchmarks. On APPS LeetCode memory prediction, the Spearman rank correlation (ρ) reaches 0.93; on Triton GPU kernel latency prediction, ρ is about 0.52; across 17 languages in CodeNet, the average ρ exceeds 0.5; and on five classic neural architecture search (NAS) spaces, the Kendall τ coefficient is about 0.46. In many cases these results match or surpass graph-based predictors.
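
These benchmarks score predictions by rank correlation rather than absolute error, since a predictor that orders candidates correctly is already useful for selection. As an illustration only (not the paper's evaluation code), here is a minimal pure-Python Spearman ρ for tie-free data, using the simplified formula ρ = 1 - 6Σd²/(n(n²-1)):

```python
# Illustrative only: a minimal pure-Python Spearman rank correlation,
# the kind of metric used to score predicted vs. measured values.
# Assumes no ties, so the simplified closed-form formula applies.

def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free paired samples."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly monotone predictions give rho = 1.0 even when the
# absolute values are far off -- ordering is all that counts.
predicted_latency = [1.2, 3.4, 2.1, 5.0]      # hypothetical model outputs
measured_latency = [10.0, 31.0, 18.0, 52.0]   # hypothetical ground truth
print(spearman_rho(predicted_latency, measured_latency))  # -> 1.0
```

This is why a ρ of 0.93 on memory prediction is actionable for ranking candidate programs even if the raw byte counts are imperfect.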

🔗 **Multi-objective decoding and Pareto optimization:** The RLM decoder is autoregressive, so the model can condition later metric predictions (e.g., per-device latency) on earlier ones (e.g., accuracy). This lets the model capture real performance trade-offs and optimize along the Pareto frontier, which is essential for multi-objective optimization.
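
The conditioning mechanism can be sketched as follows. This toy is not the regress-lm API; the "model" is a stub returning canned strings, and the metric names are hypothetical. The point is only that each decoded metric is appended to the context, so later metrics can depend on earlier ones:

```python
# Toy sketch (not the regress-lm API): multi-metric autoregressive
# decoding, where the latency prediction is conditioned on the
# already-decoded accuracy. The "model" is a stub with canned outputs.

def fake_decode_metric(context: str, metric: str) -> str:
    """Stand-in for one constrained decoding pass of the model."""
    canned = {
        "accuracy": "0.92",
        # A real model would shift its latency estimate based on the
        # accuracy already present in `context`; the stub only shows
        # that the context grows between passes.
        "latency_ms": "3.4" if "accuracy=0.92" in context else "9.9",
    }
    return canned[metric]

def predict_metrics(code: str, metrics):
    context = code
    out = {}
    for m in metrics:
        value = fake_decode_metric(context, m)
        out[m] = float(value)
        context += f" {m}={value}"   # later metrics see earlier ones
    return out

preds = predict_metrics("def kernel(x): ...", ["accuracy", "latency_ms"])
print(preds)  # {'accuracy': 0.92, 'latency_ms': 3.4}
```

Because the joint distribution over metrics is modeled sequentially rather than independently, correlated trade-offs (high accuracy tending to cost latency) can be expressed directly.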

🛠️ **A standardized stack that lowers maintenance cost:** Traditional performance-prediction methods typically rely on task-specific features, syntax trees, or graph neural network (GNN) encoders, which become brittle when facing new operators or languages. By recasting regression as a text-to-number generation task, the RLM standardizes the whole pipeline: inputs are tokenized as plain text, and outputs are generated by decoding numbers digit by digit. This substantially lowers maintenance cost and improves transfer to new tasks via fine-tuning.

Researchers from Cornell and Google introduce a unified Regression Language Model (RLM) that predicts numeric outcomes directly from code strings—covering GPU kernel latency, program memory usage, and even neural network accuracy and latency—without hand-engineered features. A 300M-parameter encoder–decoder initialized from T5-Gemma achieves strong rank correlations across heterogeneous tasks and languages, using a single text-to-number decoder that emits digits with constrained decoding.

What exactly is new?

https://arxiv.org/abs/2509.26476

Why is this important?

Performance prediction pipelines in compilers, GPU kernel selection, and NAS typically rely on bespoke features, syntax trees, or GNN encoders that are brittle to new ops/languages. Treating regression as next-token prediction over numbers standardizes the stack: tokenize inputs as plain text (source code, Triton IR, ONNX), then decode calibrated numeric strings digit-by-digit with constrained sampling. This reduces maintenance cost and improves transfer to new tasks via fine-tuning.

Data and benchmarks

How does it work?

Stats that matter

Key Takeaways

    Unified code-to-metric regression works. A single ~300M-parameter T5Gemma-initialized model ("RLM") predicts: (a) memory from high-level code, (b) Triton GPU kernel latency, and (c) model accuracy + device latency from ONNX, directly from text, no hand-engineered features.

    The research shows Spearman ρ > 0.9 on APPS memory, ≈0.52 on Triton latency, >0.5 average across 17 CodeNet languages, and Kendall-τ ≈ 0.46 on five NAS spaces.

    Numbers are decoded as text with constraints. Instead of a regression head, RLM emits numeric tokens with constrained decoding, enabling multi-metric, autoregressive outputs (e.g., accuracy followed by multi-device latencies) and uncertainty via sampling.

    The Code-Regression dataset unifies APPS/LeetCode memory, Triton kernel latency, and CodeNet memory; the regress-lm library provides the training/decoding stack.
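
The "uncertainty via sampling" point can be sketched simply: draw several decoded numbers instead of one and summarize their spread. The sampler below is a stub (a Gaussian stand-in with hypothetical parameters), not the model's actual sampling loop, which would draw token by token:

```python
# Sketch: uncertainty from sampled numeric outputs. Rather than one
# point estimate, sample several decoded numbers and report median
# and spread. The sampler is a stub with hypothetical parameters.
import random
import statistics

def sample_prediction(rng: random.Random) -> float:
    """Stand-in for one sampled decode from the model."""
    return round(rng.gauss(3.4, 0.2), 2)   # hypothetical latency in ms

def predict_with_uncertainty(n_samples: int = 32, seed: int = 0):
    rng = random.Random(seed)
    samples = [sample_prediction(rng) for _ in range(n_samples)]
    return statistics.median(samples), statistics.stdev(samples)

med, spread = predict_with_uncertainty()
print(f"latency ~ {med:.2f} ms +/- {spread:.2f}")
```

A wide spread flags inputs the model is unsure about, which is useful when the predictor gates an expensive downstream step like compiling and benchmarking a kernel.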

Our Comments

It is very interesting how this work reframes performance prediction as text-to-number generation: a compact T5Gemma-initialized RLM reads source (Python/C++), Triton kernels, or ONNX graphs and emits calibrated numerics via constrained decoding. The reported correlations—APPS memory (ρ>0.9), Triton latency on RTX A6000 (~0.52), and NAS Kendall-τ ≈0.46—are strong enough to matter for compiler heuristics, kernel pruning, and multi-objective NAS triage without bespoke features or GNNs. The open dataset and library make replication straightforward and lower the barrier to fine-tuning on new hardware or languages.


Check out the Paper, GitHub Page and Dataset Card.

The post Can a Small Language Model Predict Kernel Latency, Memory, and Model Accuracy from Code? A New Regression Language Model (RLM) Says Yes appeared first on MarkTechPost.

