Activation Checkpointing

In essence, activation checkpointing trades computation for memory. During forward, only a few activation checkpoints are ...

Training System
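
The memory-for-compute trade in activation checkpointing can be illustrated with a toy, framework-free sketch. Everything here is an assumption made for illustration: the scalar tanh "layers", the `every` parameter, and all function names are hypothetical, and this is not how any particular library implements it (PyTorch exposes the real feature through `torch.utils.checkpoint`). The forward pass keeps only the input of every fourth layer; the backward pass recomputes the missing inputs one segment at a time.

```python
import math

# Toy "layers": each entry is a (forward, derivative) pair on scalars.
LAYERS = [(math.tanh, lambda x: 1.0 - math.tanh(x) ** 2)] * 8


def forward_checkpointed(x, every=4):
    """Run the chain, saving only the input of every `every`-th layer."""
    checkpoints = {}
    for i, (f, _) in enumerate(LAYERS):
        if i % every == 0:
            checkpoints[i] = x  # the only activations kept in memory
        x = f(x)
    return x, checkpoints


def backward_checkpointed(checkpoints, every=4):
    """Return d(output)/d(input), recomputing activations per segment."""
    grad = 1.0
    for start in sorted(checkpoints, reverse=True):
        segment = LAYERS[start:start + every]
        # Recompute this segment's layer inputs from its checkpoint.
        x = checkpoints[start]
        inputs = []
        for f, _ in segment:
            inputs.append(x)
            x = f(x)
        # Apply the chain rule through the segment in reverse order.
        for (_, df), inp in zip(reversed(segment), reversed(inputs)):
            grad *= df(inp)
    return grad


def backward_full(x):
    """Reference version: store every layer input, recompute nothing."""
    inputs = []
    for f, _ in LAYERS:
        inputs.append(x)
        x = f(x)
    grad = 1.0
    for (_, df), inp in zip(reversed(LAYERS), reversed(inputs)):
        grad *= df(inp)
    return grad


out, ckpts = forward_checkpointed(0.5)
grad_ckpt = backward_checkpointed(ckpts)
grad_full = backward_full(0.5)
```

With 8 layers and `every=4`, the checkpointed run stores 2 activations instead of 8 and pays for it with one extra forward pass over each segment during backward; both versions yield the same gradient.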

ZeRO: Zero Redundancy Optimizer

In distributed training, ZeRO partitions the model weights and optimizer states across devices, reducing GPU memory usage.

Training System/Distributed
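
The partitioning idea behind ZeRO can be sketched without any distributed framework. The sketch below is a single-process simulation under stated assumptions: it shards only the optimizer state (here, SGD momentum), the loop over `rank` stands in for separate devices, and returning the full list stands in for an all-gather. All names (`shard_bounds`, `zero_style_step`) are hypothetical and do not correspond to any real library API.

```python
def shard_bounds(n, world_size, rank):
    """Half-open index range [lo, hi) of the shard owned by `rank`."""
    per_rank = (n + world_size - 1) // world_size  # ceiling division
    lo = min(rank * per_rank, n)
    return lo, min(lo + per_rank, n)


def zero_style_step(params, grads, momenta, world_size, lr=0.1, beta=0.9):
    """One SGD-with-momentum step where rank r holds only momenta[r]."""
    new_params = list(params)
    for rank in range(world_size):  # each iteration simulates one device
        lo, hi = shard_bounds(len(params), world_size, rank)
        for j, i in enumerate(range(lo, hi)):
            # The rank updates only the optimizer state it owns.
            momenta[rank][j] = beta * momenta[rank][j] + grads[i]
            new_params[i] = params[i] - lr * momenta[rank][j]
    return new_params  # stands in for all-gathering the updated shards


params = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [0.5] * len(params)
world_size = 2

# Each simulated rank allocates momentum only for its own shard.
momenta = []
for rank in range(world_size):
    lo, hi = shard_bounds(len(params), world_size, rank)
    momenta.append([0.0] * (hi - lo))

updated = zero_style_step(params, grads, momenta, world_size)
```

Here rank 0 holds momentum for 3 parameters and rank 1 for 2, so the per-device optimizer state shrinks roughly in proportion to the number of devices, while the updated parameters match what an unsharded optimizer would produce.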

AutoGrad: Automatic Differentiation

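When PyTorch code runs a computation, AutoGrad builds a computation graph, a DAG of Function objects, that is used for backpropagation. Each Function object represents one operation: its .apply() computes the forward result and records its backward logic .grad...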

Training System/Principles
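
The graph-of-Function-objects idea behind AutoGrad can be made concrete with a toy scalar implementation. This is a minimal sketch, not PyTorch's code: the `Value` class, its `creator` attribute, and the instance-method `forward`/`backward` layout are hypothetical simplifications. Calling `.apply()` computes the forward result and attaches the producing `Function` to the output, so the graph is built as the computation runs.

```python
class Value:
    """Scalar that remembers which Function produced it (None for leaves)."""

    def __init__(self, data, creator=None):
        self.data = data
        self.grad = 0.0
        self.creator = creator


class Function:
    """One operation node in the graph."""

    @classmethod
    def apply(cls, *inputs):
        fn = cls()
        fn.inputs = inputs  # edges to the parent nodes
        out = fn.forward(*(v.data for v in inputs))
        return Value(out, creator=fn)


class Mul(Function):
    def forward(self, a, b):
        self.saved = (a, b)  # inputs needed later by backward
        return a * b

    def backward(self, grad_out):
        a, b = self.saved
        return grad_out * b, grad_out * a


class Add(Function):
    def forward(self, a, b):
        return a + b

    def backward(self, grad_out):
        return grad_out, grad_out


def backward(root):
    """Visit the DAG in reverse topological order, accumulating gradients."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        if v.creator is not None:
            for parent in v.creator.inputs:
                visit(parent)
        order.append(v)

    visit(root)
    root.grad = 1.0
    for v in reversed(order):
        if v.creator is None:
            continue  # leaf: nothing to propagate further
        grads = v.creator.backward(v.grad)
        for parent, g in zip(v.creator.inputs, grads):
            parent.grad += g


x, y = Value(2.0), Value(3.0)
z = Add.apply(Mul.apply(x, y), x)  # z = x * y + x
backward(z)
# x.grad == 4.0 (y + 1) and y.grad == 2.0 (x)
```

Because `x` feeds two operations, its gradient is the sum of two contributions, which is why gradients are accumulated with `+=` and why nodes are processed in reverse topological order.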