Let's say I have a numpy array of some integer type (say np.int64) and want to cast it to another type (say np.int8). How can I most effectively check if the operation is safe (preserving all values)?
There are two approaches I've come up with:
Approach 1: Use the type information
    def is_safe(data, new_type):
        if np.can_cast(data, new_type):
            return True  # Handle the trivial allowed cases
        type_info = np.iinfo(new_type)
        return np.all((data >= type_info.min) & (data <= type_info.max))
Approach 2: Use np.can_cast on all items
    def is_safe(data, new_type):
        if np.can_cast(data, new_type):
            return True  # Handle the trivial allowed cases
        # Iterate over the elements of data (the original iterated over an undefined name)
        return all(np.can_cast(item, new_type) for item in np.nditer(data))
Both of these approaches seem to be valid (and work for trivial cases) but are they correct and efficient? Is there another, better approach?
P.S. To complicate things further, np.can_cast(np.int8, np.uint64) returns False (naturally) so changing between signed and unsigned integers has to be checked somewhat separately.
If you already know that the array has a NumPy integer dtype, the only check needed is that every value lies within the target type's range, given by the min and max of np.iinfo. This is a much simpler check than the generic can_cast, which has no a priori knowledge of what it is given and consequently takes longer. I tested this by casting the integers 0-99 from np.int64 to np.int8.
So, while both approaches are correct, the first one is preferable if you know that data is a NumPy integer array.
>>> timeit.timeit("np.all((data >= type_info.min) & (data <= type_info.max))", setup="import numpy as np\ndata = np.array(range(100), dtype=np.int64)\ntype_info = np.iinfo(np.int8)")
6.745509549000417
>>> timeit.timeit("all(np.can_cast(item, np.uint8) for item in np.nditer(data))", setup="import numpy as np\ndata = np.array(range(100), dtype=np.int64)")
51.0065170609887
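For anyone who wants to repeat the comparison, here is a rough sketch of the same two measurements as a standalone script. The function names range_check and per_item_check and the repeat count n are my own choices for illustration, and absolute timings will vary by machine and NumPy version. One caveat: as far as I know, from NumPy 2.0 onward np.can_cast no longer applies value-based logic to 0-D arrays, so the per-item expression still runs there but should not be expected to reflect the actual values.

```python
import timeit

import numpy as np

data = np.arange(100, dtype=np.int64)
type_info = np.iinfo(np.int8)


def range_check():
    # Vectorised comparison against the target type's limits.
    return np.all((data >= type_info.min) & (data <= type_info.max))


def per_item_check():
    # One can_cast call per element, as in the second timing above.
    return all(np.can_cast(item, np.uint8) for item in np.nditer(data))


n = 1000  # hypothetical repeat count; the timings above used timeit's default
t_range = timeit.timeit(range_check, number=n)
t_items = timeit.timeit(per_item_check, number=n)
print(f"range check:    {t_range:.4f} s for {n} runs")
print(f"per-item check: {t_items:.4f} s for {n} runs")
```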
It is slightly faster (20% or so) to assign the min and max values to new variables:
    def is_safe(data, new_type):
        type_info = np.iinfo(new_type)
        a = type_info.min
        b = type_info.max
        return np.all((data >= a) & (data <= b))
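Putting the pieces together, a minimal sketch of a complete check might look like the following. It is a variation on the range check rather than the exact code above: it compares the array's minimum and maximum as Python integers, which should also cover the signed/unsigned case from the question. The name is_safe_cast and the decision to treat an empty array as safe are assumptions on my part.

```python
import numpy as np


def is_safe_cast(data, new_type):
    """Return True if every value of the integer array `data` fits in `new_type`."""
    data = np.asarray(data)
    # Dtype-level shortcut: e.g. int8 -> int64 can never lose values.
    if np.can_cast(data.dtype, new_type):
        return True
    # Assumption: an empty array has no values that could be lost.
    if data.size == 0:
        return True
    info = np.iinfo(new_type)
    # Python ints compare exactly, so mixing signed and unsigned types is fine.
    return int(data.min()) >= info.min and int(data.max()) <= info.max


print(is_safe_cast(np.arange(100, dtype=np.int64), np.int8))     # True
print(is_safe_cast(np.array([200], dtype=np.int64), np.int8))    # False
print(is_safe_cast(np.array([-1], dtype=np.int8), np.uint64))    # False
```

Reducing to min and max first means only two scalar comparisons are made after a single pass for each reduction, instead of building two boolean arrays.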