Arca's Blog

QLoRA Explained: 4-bit Quantization for LLMs and Double Quantization

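Algorithm: 2-Level (Double) Quantization. QLoRA uses a two-stage quantization scheme, so let's first go over how quantization works and which variables need to be stored. First Level Quantization: for the input weights, assume they have size R×C...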

Markov Decision Processes

The Markov Decision Process (MDP) serves as the theoretical foundation of RL.

Formulation of Reinforcement Learning

A brief introduction to the general background of RL.

LoRA Fine-tuning

It seems to have become the industry's standard method for quickly running SFT on downstream tasks (or has it?)