STaMP Quantization: Improving the Accuracy of Low-Bit-Width Activation Quantization

cs.AI updates on arXiv.org, October 31, 12:09

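This paper proposes STaMP, a quantization method that applies linear transformations along the sequence dimension and keeps a small number of tokens in each intermediate activation at higher precision, which improves model accuracy under low-bit-width activation quantization.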

arXiv:2510.26771v1 Announce Type: cross Abstract: Quantization is the key method for reducing inference latency, power and memory footprint of generative AI models. However, accuracy often degrades sharply when activations are quantized below eight bits. Recent work suggests that invertible linear transformations (e.g. rotations) can aid quantization by reparameterizing feature channels and weights. In this paper, we propose Sequence Transformation and Mixed Precision (STaMP) quantization, a novel strategy that applies linear transformations along the sequence dimension to exploit the strong local correlation in language and visual data. By keeping a small number of tokens in each intermediate activation at higher precision, we can maintain model accuracy at lower (average) activation bit-widths. We evaluate STaMP on recent LVM and LLM architectures, demonstrating that it significantly improves low-bit-width activation quantization and complements established activation and weight quantization methods, including recent feature transformations.
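The idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the abstract does not say which sequence transform is used, how the high-precision tokens are chosen, or what quantizer is applied. The sketch assumes an orthonormal DCT-II along the sequence axis, keeps the first `num_hi` transformed tokens at 8 bits and the rest at 4 bits, and uses a per-token symmetric uniform quantizer. All function names and the synthetic, locally correlated activations are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); an assumed choice of sequence transform."""
    rows = []
    for j in range(n):
        norm = math.sqrt((1.0 if j == 0 else 2.0) / n)
        rows.append([norm * math.cos(math.pi * (t + 0.5) * j / n) for t in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fake_quantize(vec, bits):
    """Per-token symmetric uniform quantize-dequantize (assumed quantizer)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in vec)
    if peak == 0.0:
        return list(vec)
    scale = peak / qmax
    return [round(v / scale) * scale for v in vec]


def stamp_like_quantize(x, num_hi, hi_bits=8, lo_bits=4):
    """Transform along the sequence axis, quantize with mixed precision, invert."""
    t = dct_matrix(len(x))
    y = matmul(t, x)  # mixes tokens (rows), not feature channels
    y_q = [fake_quantize(row, hi_bits if j < num_hi else lo_bits)
           for j, row in enumerate(y)]
    return matmul(transpose(t), y_q)  # T is orthonormal, so its inverse is T^T


def average_bits(n, num_hi, hi_bits=8, lo_bits=4):
    """Average activation bit-width over the n tokens."""
    return (num_hi * hi_bits + (n - num_hi) * lo_bits) / n


def mse(a, b):
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def make_activations(n=64, d=8, seed=0):
    """Synthetic activations (n tokens x d channels) with strong local correlation."""
    rng = random.Random(seed)
    return [[math.sin(2 * math.pi * (0.5 + 0.2 * c) * t / n + 0.3 * c)
             + rng.gauss(0.0, 0.02) for c in range(d)] for t in range(n)]


if __name__ == "__main__":
    x = make_activations()
    baseline = [fake_quantize(row, 4) for row in x]  # plain 4-bit, no transform
    mixed = stamp_like_quantize(x, num_hi=8)
    print("average bits:", average_bits(len(x), 8))
    print("plain 4-bit MSE:", mse(x, baseline))
    print("sequence transform + mixed precision MSE:", mse(x, mixed))
```

On smooth synthetic data like this, the transform concentrates most of the energy in a few low-frequency tokens, so spending extra bits on those tokens costs little in average bit-width. How far this carries over to real LLM or LVM activations is what the paper evaluates; the sketch makes no claim about that.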


