(yet) PyTorch Impl of Distributed Shampoo
11 Oct 2024
< Table of Contents >
- Optimizers for Single Processor
- Distributed Optimizer
- Learning Rate Grafting: Transferability of Optimizer Tuning
- References
Optimizers for Single Processor
Vanilla AdamW
Shampoo
SOAP
Distributed Optimizer
Learning Rate Grafting: Transferability of Optimizer Tuning
References
- Papers
- Distributed Shampoo references
- Megatron distributed optimizer