Inference Latency_Fishai

Articles related to "Inference Latency"

Benchmarking On-Device Machine Learning on Apple Silicon with MLX

cs.AI updates on arXiv.org 2025-10-23T04:13:45.000000Z

ZSPAPrune: Zero-Shot Prompt-Aware Token Pruning for Vision-Language Models

cs.AI updates on arXiv.org 2025-10-21T04:27:57.000000Z

BeLLMan: Controlling LLM Congestion

cs.AI updates on arXiv.org 2025-10-20T04:13:29.000000Z

BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation

cs.AI updates on arXiv.org 2025-10-14T04:17:55.000000Z

Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning

cs.AI updates on arXiv.org 2025-10-14T04:08:53.000000Z

Taming Latency-Memory Trade-Off in MoE-Based LLM Serving via Fine-Grained Expert Offloading

cs.AI updates on arXiv.org 2025-10-07T04:19:02.000000Z

Elastic On-Device LLM Service

cs.AI updates on arXiv.org 2025-10-07T04:18:48.000000Z

Nav-EE: Navigation-Guided Early Exiting for Efficient Vision-Language Models in Autonomous Driving

cs.AI updates on arXiv.org 2025-10-03T04:17:59.000000Z

From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement

cs.AI updates on arXiv.org 2025-09-29T04:15:44.000000Z

Learning Primitive Embodied World Models: Towards Scalable Robotic Learning

cs.AI updates on arXiv.org 2025-09-23T06:12:43.000000Z

Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency

cs.AI updates on arXiv.org 2025-09-18T04:47:55.000000Z

Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism

cs.AI updates on arXiv.org 2025-09-11T15:51:36.000000Z

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

cs.AI updates on arXiv.org 2025-07-14T04:08:35.000000Z

Copyright © 2019 FISHAI. All Rights Reserved