What is the difference between cuda.amp and model.half()?

Tags:

nvidia

pytorch

According to https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/

We can use:

    with torch.cuda.amp.autocast():
        loss = model(data)

in order to cast operations to mixed precision.

Alternatively, we can call model.half() to convert all of the model weights to half precision.
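For reference, here is a minimal sketch of the model.half() route (the toy nn.Linear model and shapes are just an illustration); note that the inputs have to be cast to fp16 as well:

    import torch
    import torch.nn as nn

    # Toy model, only for illustration
    model = nn.Linear(16, 4).cuda().half()        # all parameters/buffers become fp16

    x = torch.randn(8, 16, device="cuda").half()  # inputs must be fp16 too
    out = model(x)                                # forward pass runs in fp16
    print(out.dtype)                              # torch.float16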

  1. What is the difference between these two approaches?
  2. If I want to take advantage of FP16 (in order to fit larger models and shorten training time), what do I need? Should I use model.half() or torch.cuda.amp (as in the link above)?


1 Answer

If you convert the entire model to fp16, there is a chance that some of the activation functions and batch-norm layers will cause the fp16 values to underflow, i.e., become zero. So it is always recommended to use autocast, which internally keeps the problematic layers in fp32.
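As a concrete illustration of the underflow problem (values below fp16's smallest representable positive number, roughly 6e-8, flush to zero):

    import torch

    x = torch.tensor(1e-8)   # fine as fp32
    print(x.half())          # tensor(0., dtype=torch.float16) -- underflowed to zero
    print(x.float())         # tensor(1.0000e-08)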

model.half() will ultimately store the weights in fp16, whereas with autocast the weights stay in fp32. Training purely in fp16 will be faster than autocast, but there is a higher chance of instability if you are not careful. When using autocast you also need to scale the gradients during backpropagation.
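A minimal sketch of the usual autocast training step with gradient scaling (model, loss_fn, optimizer, and loader are placeholders you would already have):

    import torch

    scaler = torch.cuda.amp.GradScaler()     # scales the loss so fp16 gradients don't underflow

    for data, target in loader:              # placeholder data loader
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # ops run in fp16 where safe, fp32 where not
            output = model(data)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()        # backprop on the scaled loss
        scaler.step(optimizer)               # unscales the gradients, then calls optimizer.step()
        scaler.update()                      # updates the scale factor for the next iteration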

If the fp16 requirement is on the inference side, I recommend training with autocast and then converting the model to fp16 using ONNX and TensorRT.
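For inference, a rough sketch of that export path (the file names, input shape, and trtexec flags are examples; check them against your TensorRT version):

    import torch

    model.eval()                                      # model trained with autocast, weights in fp32
    dummy_input = torch.randn(1, 16, device="cuda")   # placeholder input shape

    torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)

    # Build an fp16 TensorRT engine from the ONNX file, for example:
    #   trtexec --onnx=model.onnx --fp16 --saveEngine=model.plan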


