What are FLOPs?
A measure of the compute a model requires => “Floating Point Operations”.
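A quick worked example (sizes are arbitrary; the factor of 2 is the common one-multiply-plus-one-add convention for a dense matmul):

```python
# Rough FLOP count for one forward pass of a dense (linear) layer.
# Convention: each output element needs d_in multiplies + d_in adds => 2 * d_in.
batch, d_in, d_out = 32, 4096, 4096  # hypothetical sizes

flops = 2 * batch * d_in * d_out
print(f"~{flops:.2e} FLOPs")  # ~1.07e+09, i.e. about 1 GFLOP
```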
What are the key differences between model inference and training?
Training takes roughly 3x more compute per token than inference: the backward pass costs about twice the forward pass.
Training also holds intermediate activations in memory for the backward pass; inference can discard them immediately.
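Where the ~3x comes from, as a minimal sketch (using the common scaling-law approximation of ~2·N FLOPs per token for a forward pass through an N-parameter model; the exact ratio varies by architecture):

```python
# Rule-of-thumb FLOPs per token for an N-parameter model
# (a common approximation; real costs depend on architecture and context length).
n_params = 7e9                      # hypothetical 7B-parameter model

inference_flops = 2 * n_params      # forward pass only
training_flops = 6 * n_params       # forward (2N) + backward (~4N)
print(training_flops / inference_flops)  # => 3.0
```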
What is Activation Checkpointing?
Trade memory for compute in model training.
Drop most activations from memory and re-compute them when the backward pass needs them => keep only some activations as “checkpoints”.
=> Re-computing everything from the start of the network would be slow; checkpoints spread throughout the network keep the average re-computation short.
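A minimal PyTorch sketch (layer sizes are arbitrary; torch.utils.checkpoint is PyTorch's built-in version of this):

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

# A block whose internal activations we choose NOT to keep in memory.
block = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
x = torch.randn(8, 512, requires_grad=True)

# checkpoint() runs the block without storing its intermediate activations;
# they are re-computed from the block's input during the backward pass.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()  # triggers one extra forward pass through `block`
```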
What is Gradient Accumulation?
Fit a large effective batch size into limited memory by splitting it across steps.
Problem: we want to train with a large batch size that does not fit in memory.
Solution: run multiple forward-backward passes on smaller micro-batches before each optimizer step, accumulating the gradients so their mean matches that of the full batch.
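A minimal PyTorch sketch (the model, data, and accum_steps value are placeholders):

```python
import torch
from torch import nn

model = nn.Linear(512, 10)                         # stand-in model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                                    # micro-batches per optimizer step

for step in range(100):
    x = torch.randn(8, 512)                        # one micro-batch
    target = torch.randint(0, 10, (8,))
    loss = loss_fn(model(x), target)

    # Scale the loss so the accumulated gradients equal the mean over the
    # full (8 * accum_steps)-sample batch; .backward() adds into .grad.
    (loss / accum_steps).backward()

    if (step + 1) % accum_steps == 0:
        opt.step()       # one update per accum_steps micro-batches
        opt.zero_grad()  # clear accumulated gradients for the next round
```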