Tempo：结合动态执行与编译优化的深度学习系统

arXiv:2501.05408v2 Announce Type: replace-cross Abstract: Deep learning (DL) algorithms are often defined in terms of \emph{temporal relationships}: a tensor at one timestep may depend on tensors from earlier or later timesteps. Such \emph{dynamic} dependencies (and corresponding dynamic tensor shapes) are difficult to express and optimize: while \emph{eager} DL systems support such dynamism, they cannot apply compiler-based optimizations; \emph{graph-based} systems require static tensor shapes, which forces users to pad tensors or break-up programs into multiple static graphs. We describe Tempo, a new DL system that combines the dynamism of eager execution with the whole-program optimizations of graph-based compilation. Tempo achieves this through a declarative programming model with \emph{recurrent tensors}, which include explicit \emph{temporal dimensions}. Temporal dimensions can be indexed using \emph{symbolic expressions} to express dynamic dependencies on past and future tensors. Based on this, Tempo constructs a \emph{symbolic dependence graph}, which concisely encodes dynamic dependencies between operators, and applies whole-program optimizations, such as algebraic simplifications, vectorization, tiling, and fusion. By tiling dynamic dependencies into static-size blocks, Tempo can also reuse existing static code-generators. It then uses a polyhedral model to find a feasible execution schedule, which includes memory management operations. We show that Tempo achieves a 7$\times$ speedup over JAX for Llama-3.2-3B decoding; for reinforcement learning algorithms, Tempo achieves a 54$\times$ speedup, with 16$\times$ lower peak memory usage.

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签