I have an LSTM dataset. Some labels contain NaNs at the end. These can't be backward-filled (because there are no values after them), and forward-filling them would make no sense (since the label's timestamp would be outdated at the 'nearer future' timestamp (= the missing value's location) compared to its actual time index).
So: is there a way to mask NaN values in the label set (output set)? (It seems sample_weights is for input data only.)
You can accomplish data masking via a Keras Masking layer: https://keras.io/api/layers/core_layers/masking/
Layers that follow the Masking layer and support masking (the LSTM layer does) will skip the timesteps where all features equal the masking value.
However, since the Masking layer checks for equality between the mask value and the data, you can't use NaN as a masking value, because NaN does not equal itself (np.nan == np.nan is False).
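The comparison behaviour described above can be checked with NumPy alone. This is a minimal sketch; the array `labels` and the names `eq_mask` and `nan_mask` are made up for illustration:

```python
import numpy as np

# Hypothetical label values with a trailing NaN
labels = np.array([0.123, 0.437, np.nan])

# An equality test against NaN never matches, not even for the NaN entry
eq_mask = labels == np.nan

# np.isnan is the reliable way to locate NaN entries
nan_mask = np.isnan(labels)

print(eq_mask.any())   # False
print(nan_mask)        # [False False  True]
```

This is why an equality-based check cannot find NaN timesteps, while `np.isnan` can.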
Instead, first convert your NaN values to 0 (the default mask value), or to another value that you pass explicitly as mask_value when creating the Masking layer. Pick a value that does not occur in your real data, since genuine timesteps equal to the mask value are masked as well:
from tensorflow.keras import layers
import numpy as np
# your example timesteps
ex_data = np.array([0.123, 0.437, 0.891, np.nan, 1.497, 1.1])
# reshape your example timesteps into a 3D matrix (1 sample x 6 timesteps x 1 feature per timestep)
data = np.reshape(ex_data, (1, 6, 1))
# set NaN values to 0, which is the default masking value
data[np.isnan(data)] = 0
masked_data = layers.Masking()(data)
print(masked_data._keras_mask)
Returns:
tf.Tensor([[ True True True False True True]], shape=(1, 6), dtype=bool)
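The Masking layer above works on inputs. For NaNs in the labels themselves, one option that might fit is a custom loss that ignores NaN positions. The following is only a sketch under assumptions: `masked_mse` is a hypothetical name, mean squared error is assumed as the base loss, and the labels are assumed to have the same shape as the predictions.

```python
import numpy as np
import tensorflow as tf

def masked_mse(y_true, y_pred):
    """Mean squared error over the positions where the label is not NaN."""
    # True where a real label exists, False where the label is NaN
    valid = tf.math.logical_not(tf.math.is_nan(y_true))
    # Replace NaN labels with 0 so that no NaN enters the arithmetic or gradients
    y_true_safe = tf.where(valid, y_true, tf.zeros_like(y_true))
    weights = tf.cast(valid, y_pred.dtype)
    squared_error = tf.square(y_true_safe - y_pred) * weights
    # Divide by the number of valid labels (at least 1 to avoid division by zero)
    n_valid = tf.maximum(tf.reduce_sum(weights), 1.0)
    return tf.reduce_sum(squared_error) / n_valid

# Small check with one NaN label at the end
y_true = tf.constant([[1.0, 2.0, np.nan]], dtype=tf.float32)
y_pred = tf.constant([[1.5, 2.0, 7.0]], dtype=tf.float32)
loss = masked_mse(y_true, y_pred)
print(float(loss))  # 0.125
```

Such a function could then be passed as `loss=masked_mse` to `model.compile`. The NaN labels are zeroed before the subtraction rather than after, so that NaN values do not propagate into the gradients.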