Updated on: 2026-05-13
Posted on: 2026-05-12

Prefix Caching (RadixAttention)

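In PagedAttention, the KV cache can only be reused within a single round of inference. In a multi-round session, however, the input to the next round can be regarded as the concatenation of the previous round's prompt and output, followed by the new user message. The KV cache for that shared prefix has already been computed, so prefix caching keeps it and reuses it across rounds instead of recomputing it.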
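The idea of reusing a shared prefix can be illustrated with a small sketch. This is not the RadixAttention implementation: it is a simplified token-level trie, assuming one node per token, whereas a real radix tree compresses runs of tokens into a single edge and also handles eviction. The names `TrieNode`, `PrefixCache`, `match_prefix`, `insert`, and `kv_handle` are hypothetical, and `kv_handle` is only a placeholder integer standing in for a cached KV entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    # Hypothetical layout: one child per token id.
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    # Placeholder for the cached KV entry of this token (assumption).
    kv_handle: Optional[int] = None


class PrefixCache:
    def __init__(self) -> None:
        self.root = TrieNode()
        self._next_handle = 0

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> int:
        """Record KV entries for tokens; return how many were newly created."""
        node, computed = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                child = TrieNode(kv_handle=self._next_handle)
                self._next_handle += 1
                node.children[tok] = child
                computed += 1
            node = child
        return computed


cache = PrefixCache()

# Round 1: after inference, the prompt and the generated output are cached.
round1_prompt = [1, 2, 3]
round1_output = [4, 5]
cache.insert(round1_prompt + round1_output)

# Round 2: the input is the previous prompt + output + the new user tokens.
round2_prompt = round1_prompt + round1_output + [6, 7]
reused = cache.match_prefix(round2_prompt)
print(reused, len(round2_prompt) - reused)  # prints "5 2"
```

In this toy run, five of the seven tokens in the second round hit the cache, so only the two new tokens would need their KV entries computed.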

Zhihu: Prefix Caching Introduction