Prefix Caching (RadixAttention)
Prefix caching is an optimization built on top of the KV cache that is heavily exploited in multi-turn sessions.
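As a rough illustration of why multi-turn sessions benefit, here is a minimal sketch of prefix matching over token IDs. It assumes a toy trie where each node stands in for one token's cached KV entry; the `PrefixCache` class and its methods are hypothetical names for illustration, not the API of any real framework.

```python
class PrefixCache:
    """Toy token-level trie; each node stands in for one cached KV entry."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        # Record every token of a processed prompt along one trie path.
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match(self, tokens):
        """Return how many leading tokens already have cached entries."""
        node = self.root
        n = 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            n += 1
        return n


cache = PrefixCache()
turn1 = [1, 2, 3, 4]      # token IDs of the first turn (made-up values)
cache.insert(turn1)

turn2 = turn1 + [5, 6]    # second turn repeats the history, then adds new tokens
reused = cache.match(turn2)
to_compute = len(turn2) - reused
```

In this sketch the second turn reuses the entries for the shared history and only the newly appended tokens would need a fresh prefill.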
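Prefill-Decode (PD) Disaggregation in LLM Inference
The KV cache is the key technique that lets large models remember very long contexts, and it is one of the most important optimizations in LLM inference.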
Adopts memory-management ideas from operating systems and applies them to KV cache management during inference, improving performance. This work later gave rise to vLLM, a popular LLM deployment framework.
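To make the operating-system analogy concrete, here is a minimal sketch of paging applied to a KV cache: each sequence gets a block table that maps its tokens onto fixed-size physical blocks, which are allocated on demand. `BLOCK_SIZE`, `BlockAllocator`, and its methods are assumptions made for illustration and do not reflect vLLM's actual implementation.

```python
BLOCK_SIZE = 4  # tokens per KV block (illustrative value)


class BlockAllocator:
    """Toy pager: maps each sequence to a list of physical block IDs."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # pool of free physical blocks
        self.tables = {}                     # sequence ID -> block table
        self.lengths = {}                    # sequence ID -> token count

    def append_token(self, seq_id):
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        # Grab a new block only when the current one is full.
        if n % BLOCK_SIZE == 0:
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        # Return all of a finished sequence's blocks to the free pool.
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = BlockAllocator(num_blocks=8)
for _ in range(5):
    alloc.append_token("a")
blocks_used = len(alloc.tables["a"])
free_after_alloc = len(alloc.free)
alloc.release("a")
free_after_release = len(alloc.free)
```

Because blocks are handed out only as a sequence grows, memory is not reserved up front for the maximum possible length, and the waste is limited to the unfilled tail of the last block.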