According to https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ we can use:

    with torch.cuda.amp.autocast():
        loss = model(data)

in order to cast operations to mixed precision.
We can also use model.half() to convert all of the model weights to half precision.
If I want to train in FP16 (in order to fit larger models and shorten training time), what do I need? Do I need to use model.half(), or torch.cuda.amp (as in the link above)?

If you convert the entire model to fp16, there is a chance that some of the activation functions and batchnorm layers will cause the fp16 values to underflow, i.e., become zero. So it is generally recommended to use autocast, which keeps the weights in fp32 and only runs numerically safe operations in fp16, falling back to fp32 for the problematic layers.
model.half() will end up storing the weights in fp16, whereas with autocast the weights stay in fp32. Training in pure fp16 will be faster than autocast, but there is a higher chance of instability if you are not careful.
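As a quick illustration of that difference, here is a minimal sketch (a toy torch.nn.Linear stands in for a real model, chosen just for this example) that prints the weight and output dtypes under both approaches:

    import torch

    model = torch.nn.Linear(8, 8).cuda()
    x = torch.randn(4, 8, device="cuda")

    # autocast: weights stay fp32, eligible ops (like this matmul) run in fp16
    with torch.cuda.amp.autocast():
        y = model(x)
    print(model.weight.dtype, y.dtype)  # torch.float32 torch.float16

    # model.half(): the weights themselves become fp16, and inputs must match
    model = model.half()
    y = model(x.half())
    print(model.weight.dtype, y.dtype)  # torch.float16 torch.float16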
When using autocast you also need to scale the loss during backpropagation (e.g., with torch.cuda.amp.GradScaler) so that small fp16 gradients do not underflow.
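Here is a minimal training-loop sketch following the GradScaler pattern from the linked blog post; the toy model, optimizer, and random data are placeholders, not part of the original question:

    import torch

    # Hypothetical toy model and data, just to illustrate the pattern.
    model = torch.nn.Linear(8, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    data = torch.randn(16, 8, device="cuda")
    target = torch.randint(0, 2, (16,), device="cuda")

    scaler = torch.cuda.amp.GradScaler()

    for _ in range(10):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            output = model(data)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward()  # scale the loss so small fp16 grads don't underflow
        scaler.step(optimizer)         # unscales grads and skips the step on inf/nan
        scaler.update()                # adjusts the scale factor for the next iteration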
If the fp16 requirement is on the inference side, I recommend training with autocast and then converting the model to fp16 using ONNX and TensorRT.
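A rough sketch of that export path, assuming a toy model in place of the autocast-trained one; the export happens in fp32 and the precision drop to fp16 is done by TensorRT afterwards (shown here only as an example trtexec command):

    import torch

    # Hypothetical toy model standing in for the autocast-trained fp32 model.
    model = torch.nn.Linear(8, 2).cuda().eval()
    dummy = torch.randn(1, 8, device="cuda")

    # Export in fp32; the fp16 conversion happens in TensorRT, not here.
    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["input"], output_names=["output"])

    # Then build an fp16 engine with TensorRT, for example:
    #   trtexec --onnx=model.onnx --fp16 --saveEngine=model.plan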