Updated on: 2025-05-04

KV Cache in Attention

KV Cache

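KV Cache (key-value cache) is a core inference optimization for Transformer models. The idea is to cache the Key and Value matrices that the attention mechanism has already computed for earlier tokens, so that they are not recomputed at each autoregressive decoding step. This reduces the amount of computation and speeds up inference.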
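To make the idea concrete, here is a minimal sketch in pure Python, assuming a single attention head, causal (token-by-token) decoding, and plain nested lists for vectors and matrices. The names `KVCache`, `decode_without_cache`, and `decode_with_cache` are hypothetical and chosen only for illustration; they do not come from any particular library. The sketch compares recomputing K and V for the whole prefix at every step against appending only the newest token's K and V to a cache.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention for one query over all keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


class KVCache:
    # Hypothetical container: stores K and V vectors of all tokens seen so far.
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_without_cache(tokens, Wq, Wk, Wv):
    # Baseline: at step t, recompute K and V for the entire prefix 0..t.
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        q = matvec(Wq, tokens[t])
        outputs.append(attend(q, keys, values))
    return outputs


def decode_with_cache(tokens, Wq, Wk, Wv):
    # With a KV cache: project only the new token, reuse cached K and V.
    cache = KVCache()
    outputs = []
    for x in tokens:
        cache.append(matvec(Wk, x), matvec(Wv, x))
        q = matvec(Wq, x)
        outputs.append(attend(q, cache.keys, cache.values))
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings (assumed)
    W = [[1.0, 0.5], [-0.5, 1.0]]                  # toy projection (assumed)
    print(decode_without_cache(tokens, W, W, W))
    print(decode_with_cache(tokens, W, W, W))
```

Both functions produce the same outputs. For a sequence of n tokens, the baseline performs n(n+1) key/value projections in total, while the cached version performs 2n, at the cost of keeping all past K and V vectors in memory.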