ZeRO: Zero Redundancy Optimizer
For distributed training, ZeRO partitions model weights and optimizer states across devices, reducing per-device GPU memory usage.
2026-05-14 Training System/Distributed