Arca's Blog

Sage Attention v3

Sage Attention v3 goes a step further than the previous two works, proposing FP4 inference and an INT8 training framework.

Sage Attention v2 and v2++

The second version of Sage Attention and its refinements.

Sage Attention v1: INT8 PTQ for Attention

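Applies low-precision methods to Flash Attention. The computation pattern is the same as Flash Attention's, so the overall speedup is roughly the gain from low-precision computation minus the quantization overhead, while accuracy is kept at an acceptable level.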
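
To make the idea concrete, here is a minimal pure-Python sketch of INT8 post-training quantization applied to the QK^T product of attention. It is not Sage Attention's actual kernel: the symmetric per-tensor scaling, the helper names (`quantize_int8`, `attention`), and the matrix sizes are all assumptions chosen for illustration. It only emulates the arithmetic to show the accuracy side, so it does not demonstrate any speedup.

```python
import math
import random

def quantize_int8(mat):
    """Symmetric per-tensor quantization (an assumption): mat ~= scale * q, q in [-127, 127]."""
    max_abs = max(abs(x) for row in mat for x in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(x / scale))) for x in row] for row in mat]
    return q, scale

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v, use_int8=False):
    """Plain attention; with use_int8=True, Q and K are quantized before QK^T."""
    d = len(q[0])
    if use_int8:
        q_mat, q_scale = quantize_int8(q)
        k_mat, k_scale = quantize_int8(k)
        dequant = q_scale * k_scale  # one multiply restores the real-valued scale
    else:
        q_mat, k_mat, dequant = q, k, 1.0
    out = []
    for q_row in q_mat:
        # In the INT8 path these dot products are integer-only, then dequantized.
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) * dequant / math.sqrt(d)
            for k_row in k_mat
        ]
        probs = softmax(scores)
        out.append([
            sum(p * v_row[j] for p, v_row in zip(probs, v))
            for j in range(len(v[0]))
        ])
    return out

def rand_mat(n, d):
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

random.seed(0)
q, k, v = rand_mat(16, 32), rand_mat(16, 32), rand_mat(16, 32)
ref = attention(q, k, v)
out = attention(q, k, v, use_int8=True)
max_abs_err = max(abs(a - b) for r, o in zip(ref, out) for a, b in zip(r, o))
print("max abs error vs. full precision:", max_abs_err)
```

On real hardware the integer dot products would map to INT8 tensor-core instructions, and the quantization step (finding the scale, rounding) is the overhead that has to be subtracted from that gain.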