According to https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ we can use:

    with torch.cuda.amp.autocast():
        loss = model(data)

in order to cast operations to mixed precision.
We can also use model.half() to convert all of the model weights to half precision.
If I want to train in FP16 (in order to fit larger models and shorten training time), what do I need? Do I need to use model.half(), or torch.cuda.amp (as in the link above)?

If you convert the entire model to fp16, there is a chance that some of the activation functions and batchnorm layers will cause the fp16 values to underflow, i.e., become zero. So it is generally recommended to use autocast, which keeps the weights in fp32 and only runs numerically safe operations in fp16, falling back to fp32 for the problematic layers.
model.half() will end up storing the weights in fp16, whereas with autocast the weights stay in fp32. Training in pure fp16 will be faster than autocast, but there is a higher chance of instability if you are not careful.
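As a quick illustration of that difference, here is a minimal sketch (a toy torch.nn.Linear stands in for a real model, chosen just for this example) that prints the weight and output dtypes under both approaches:

    import torch

    model = torch.nn.Linear(8, 8).cuda()
    x = torch.randn(4, 8, device="cuda")

    # autocast: weights stay fp32, eligible ops (like this matmul) run in fp16
    with torch.cuda.amp.autocast():
        y = model(x)
    print(model.weight.dtype, y.dtype)  # torch.float32 torch.float16

    # model.half(): the weights themselves become fp16, and inputs must match
    model = model.half()
    y = model(x.half())
    print(model.weight.dtype, y.dtype)  # torch.float16 torch.float16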
When using autocast you also need to scale the loss during backpropagation (e.g., with torch.cuda.amp.GradScaler) so that small fp16 gradients do not underflow.
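Here is a minimal training-loop sketch following the GradScaler pattern from the linked blog post; the toy model, optimizer, and random data are placeholders, not part of the original question:

    import torch

    # Hypothetical toy model and data, just to illustrate the pattern.
    model = torch.nn.Linear(8, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    data = torch.randn(16, 8, device="cuda")
    target = torch.randint(0, 2, (16,), device="cuda")

    scaler = torch.cuda.amp.GradScaler()

    for _ in range(10):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            output = model(data)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()  # scale the loss so small fp16 grads don't underflow
        scaler.step(optimizer)         # unscales grads and skips the step on inf/nan
        scaler.update()                # adjusts the scale factor for the next iteration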
If the fp16 requirement is on the inference side, I recommend training with autocast and then converting the model to fp16 using ONNX and TensorRT.
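A rough sketch of that export path, assuming a toy model in place of the autocast-trained one; the export happens in fp32 and the precision drop to fp16 is done by TensorRT afterwards (shown here only as an example trtexec command):

    import torch

    # Hypothetical toy model standing in for the autocast-trained fp32 model.
    model = torch.nn.Linear(8, 2).cuda().eval()
    dummy = torch.randn(1, 8, device="cuda")

    # Export in fp32; the fp16 conversion happens in TensorRT, not here.
    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["input"], output_names=["output"])

    # Then build an fp16 engine with TensorRT, for example:
    #   trtexec --onnx=model.onnx --fp16 --saveEngine=model.plan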