I am running experiments on synthetic data (e.g. fitting a sine curve) and I get errors in PyTorch that are really small. One is about 2.00e-7. I was reading about machine precision, and this seems really close to it. How do I know if this is going to cause problems (or whether it already has, e.g. I can't differentiate between the different errors because they are all "machine zero")?
errors:
import numpy as np

p = np.array([2.3078539778125768e-07,
              1.9997889411762922e-07,
              2.729681222011256e-07,
              3.2532371115080884e-07])
m = np.array([3.309504692539563e-07,
              4.1058904888091606e-06,
              6.8326703386053605e-06,
              7.4616147721799645e-06])
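One way to check the "machine zero" worry is to compare the errors with the spacing between adjacent float32 numbers at their own magnitude, for example with NumPy's `np.spacing`. This is a minimal sketch, assuming the errors are stored in float32 (PyTorch's default dtype). Near 2e-7 the float32 spacing is on the order of 1e-14, so these error values are easily distinguishable from each other. What the precision limit actually affects is the computation that produced the error (e.g. subtracting a prediction of size ~1 from a target of size ~1), not the tiny result itself.

```python
import numpy as np

p = np.array([2.3078539778125768e-07, 1.9997889411762922e-07,
              2.729681222011256e-07, 3.2532371115080884e-07])
m = np.array([3.309504692539563e-07, 4.1058904888091606e-06,
              6.8326703386053605e-06, 7.4616147721799645e-06])

# Cast to float32, the dtype PyTorch uses by default
p32, m32 = p.astype(np.float32), m.astype(np.float32)

# np.spacing(x) is the gap from x to the next representable value of the same dtype
print("float32 spacing near p:", np.spacing(p32))
print("float32 spacing near m:", np.spacing(m32))

# Smallest distance between any two of the errors in p
gaps = np.diff(np.sort(p32))
print("smallest gap between p values:", gaps.min())

# The errors differ by far more than the local spacing, so they are distinguishable
print("distinguishable:", bool(gaps.min() > np.spacing(p32).max()))
```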
What confuses me is that I tried adding a number I thought was too small to make any difference, expecting a + eps == a (using an eps smaller than machine precision), but the result did change:
import torch
x1 = torch.tensor(1e-6)
x2 = torch.tensor(1e-7)
x3 = torch.tensor(1e-8)
x4 = torch.tensor(1e-9)
eps = torch.tensor(1e-11)
print(x1.dtype)
print(x1)
print(x1+eps)
print(x2)
print(x2+eps)
print(x3)
print(x3+eps)
print(x4)
print(x4+eps)
output:
torch.float32
tensor(1.0000e-06)
tensor(1.0000e-06)
tensor(1.0000e-07)
tensor(1.0001e-07)
tensor(1.0000e-08)
tensor(1.0010e-08)
tensor(1.0000e-09)
tensor(1.0100e-09)
I expected no difference in any of these sums, but there was one. Can someone explain what is going on? If I am getting losses close to 1e-7, should I use double rather than float? From googling, it seems that float in PyTorch is single precision (float32).
If I want to use doubles, what are the pros and cons, and what is the least error-prone way to change my code? Is changing the type in one place enough, or is there a global flag?
Useful reminder (machine precision):
Machine precision is (roughly) the smallest number ε such that 1 + ε is distinguishable from 1 in floating point, i.e. the gap between 1 and the next representable number. For IEEE-754 single precision this is 2^-23 (approximately 1.2 × 10^-7), while for IEEE-754 double precision it is 2^-52 (approximately 2.2 × 10^-16).
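PyTorch exposes these constants through `torch.finfo`. A short sketch checking the quoted values and the 1 + ε behavior in float32. Note that with IEEE round-to-nearest-even, adding exactly half of ε to 1 is a tie and rounds back to 1.

```python
import torch

eps32 = torch.finfo(torch.float32).eps  # 2**-23
eps64 = torch.finfo(torch.float64).eps  # 2**-52
print(eps32, eps64)

one = torch.tensor(1.0)  # float32 by default

# 1 + eps is the next float32 after 1, so it differs from 1
next_differs = bool((one + eps32 != one).item())

# 1 + eps/2 lies exactly halfway between 1 and 1 + eps; round-half-to-even gives 1
half_is_lost = bool((one + eps32 / 2 == one).item())
print(next_differs, half_is_lost)
```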
Potential solution:
OK, let's see if this is a good summary of what I think is correct (modulo some float details I don't fully understand yet, like the exponent bias).
But I've concluded that the best thing for me is to make sure my errors/numbers have two properties:
1. They are within about 7 decimal orders of magnitude of each other, because the significand is 24 bits (as you pointed out, log_10(2^24) ≈ 7.225). This avoids adding two numbers whose magnitudes differ too much for the machine to distinguish the sum from the larger one.
2. They are far enough from the edges of the representable range. For this I keep the exponent about 23 away from the lower edge (around -126 + 23, since -126 is the smallest normal float32 exponent) and likewise from the upper edge (around 127 - 23). This avoids overflows and underflows.
Perhaps I am missing a small detail about the bias or some other float detail (like how infinity and NaN are represented), but I believe this is correct.
If anyone can correct the details, that would be fantastic.
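The two conditions above can be checked against the limits reported by `torch.finfo`. A minimal sketch, assuming float32: `tiny` is the smallest positive normal number (2^-126) and `max` is the largest finite one. The example also shows each condition being violated: a small addend lost next to 1 (condition 1), and products that overflow to inf or underflow to 0 (condition 2).

```python
import math
import torch

info = torch.finfo(torch.float32)

# 24-bit significand (23 stored bits + 1 implicit bit) gives about 7.22 decimal digits
digits = math.log10(2 ** 24)
print(f"decimal digits ~ {digits:.3f}")
print(f"eps={info.eps:.3e}  tiny (smallest normal)={info.tiny:.3e}  max={info.max:.3e}")

# Condition 1 violated: magnitudes ~8 orders apart, so the small addend is lost
big, small = torch.tensor(1.0), torch.tensor(1e-8)
small_lost = bool((big + small == big).item())

# Condition 2 violated: results leave the float32 range
overflow = torch.tensor(3e38) * 10                     # exceeds info.max -> inf
underflow = torch.tensor(1e-30) * torch.tensor(1e-30)  # below smallest subnormal -> 0
print(small_lost, overflow.item(), underflow.item())
```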
I think you have misunderstood how floating-point numbers work. There are many good resources about what floating-point numbers are, so I won't go into details here.
The key is that floating-point numbers have a dynamic range: the exponent scales the value, so precision is relative to magnitude. They can represent the sum of two very large values to a certain relative accuracy, or the sum of two very small values to the same relative accuracy, but not the sum of a very large value and a very small value, because the small one falls below the precision of the large one.
So this is why your test result differs from the explanation of "machine precision": you are adding two very small values, but that paragraph explicitly talks about 1 + eps, and 1 is a much larger value than 1e-6. The following therefore works as expected:
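To see that the precision is relative rather than absolute, one can measure the gap to the next representable float32 at several magnitudes with `torch.nextafter`. A small sketch: the absolute gap varies over many orders of magnitude, while gap / x always stays between ε/2 and ε.

```python
import torch

eps32 = torch.finfo(torch.float32).eps
values = torch.tensor([1e-9, 1e-6, 1.0, 1e6, 1e30])  # float32

# Gap from each value to the next representable float32 above it
gap = torch.nextafter(values, torch.full_like(values, float("inf"))) - values
rel_gap = gap / values

for v, g, r in zip(values.tolist(), gap.tolist(), rel_gap.tolist()):
    print(f"x={v:.1e}  gap={g:.3e}  gap/x={r:.3e}")
```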
import torch
x1 = torch.tensor(1).float()
eps = torch.tensor(1e-11)
print(x1.dtype)
print(x1)
print(x1+eps)
Output:
torch.float32
tensor(1.)
tensor(1.)
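One more detail about the test in the question: `x1 + eps` printed identically to `x1` only because PyTorch's default print precision shows 4 digits after the decimal point. Comparing the tensors directly shows that 1e-11 is lost next to 1.0, but changes each of 1e-6, 1e-7, 1e-8 and 1e-9, since relative to those values it is well above float32's ε.

```python
import torch

eps = torch.tensor(1e-11)  # float32, as in the question
changed = {}
for value in [1.0, 1e-6, 1e-7, 1e-8, 1e-9]:
    x = torch.tensor(value)
    # Compare the tensors directly instead of relying on the printed repr
    changed[value] = bool((x + eps != x).item())
    print(f"x={value:.0e}  eps/x={(eps / x).item():.1e}  changed={changed[value]}")

# Printing with more digits reveals the change hidden by the default display
torch.set_printoptions(precision=10)
print(torch.tensor(1e-6) + eps)
torch.set_printoptions(profile="default")  # restore default print options
```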
The second question: when should you use double?
Pros: higher accuracy.
Cons: much slower (hardware, GPUs in particular, is mostly optimized for float32) and twice the memory usage.
That really depends on your application. Most of the time I would say no. As I said, you need double when very large and very small values coexist in the network, and that should not happen anyway with proper normalization of the data.
(Another reason is exponent overflow/underflow, i.e. when you need to represent values outside float32's range: smaller in magnitude than about 1.2e-38, the smallest normal float32, or larger than about 3.4e38.)
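Regarding the "global flag" part of the question: PyTorch has `torch.set_default_dtype`, which changes the dtype of newly created floating-point tensors and module parameters (existing ones are not converted), and existing modules or tensors can be converted with `.double()`. A sketch of both, where the tiny `nn.Linear` model is only a stand-in for a real network. One caveat: NumPy arrays are float64 by default and `torch.from_numpy` keeps that dtype, so mixing them with a float32 model is a common source of dtype mismatch errors.

```python
import torch
import torch.nn as nn

# Memory cost: float64 uses 8 bytes per element vs 4 for float32
bytes32 = torch.zeros(1000).element_size() * 1000
bytes64 = torch.zeros(1000, dtype=torch.float64).element_size() * 1000

# Option 1: convert an existing model and its inputs explicitly
model = nn.Linear(1, 1).double()  # parameters become float64
x = torch.linspace(0, 1, 5, dtype=torch.float64).unsqueeze(1)
y = model(x)  # inputs must match the model's dtype

# Option 2: change the global default for newly created floating-point tensors
old_default = torch.get_default_dtype()
torch.set_default_dtype(torch.float64)
z = torch.tensor([0.5])   # created as float64
model2 = nn.Linear(1, 1)  # parameters created as float64
# In float64, 1 + 1e-11 is no longer rounded back to 1
keeps_small = bool((torch.tensor(1.0) + torch.tensor(1e-11) != torch.tensor(1.0)).item())
torch.set_default_dtype(old_default)  # restore the float32 default
```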