I am building a model to predict geospatial-temporal datasets.
My data has original dimensions (features, lat, lon, time), i.e. for each feature and at each lat/lon point there is a time series.
I have created a CNN-LSTM model using Keras like so (I assume the below needs to be modified, this is just a first attempt):
def define_model_cnn_lstm(features, lats, lons, times):
    """
    Create and return a model with CNN and LSTM layers. Input and output data is
    expected to have shape (lats, lons, times).
    :param lats: latitude dimension of input 3-D array 
    :param lons: longitude dimension of input 3-D array
    :param times: time dimension of input 3-D array
    :return: CNN-LSTM model appropriate to the expected input array
    """
    # define the CNN model layers, wrapping each CNN layer in a TimeDistributed layer
    model = Sequential()
    model.add(TimeDistributed(Conv2D(features, (3, 3), 
                                     activation='relu', 
                                     padding='same', 
                                     input_shape=(lats, lons, times))))
    model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
    model.add(TimeDistributed(Flatten()))
    # add the LSTM layer, and a final Dense layer
    model.add(LSTM(units=times, activation='relu', stateful=True))
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mse')
    return model
My assumption is that this model will take data with shape (features, lats, lons, times), so for example if my geospatial grid is 180 x 360 and there are 100 time steps at each point, and I have 4 features per observation/sample, then the shape will be (4, 180, 360, 100).
I assume that I will want the model to take arrays with shape (features, lats, lons, times) as input and be able to predict labels arrays with shape (labels, lats, lons, times) as output. I am first using a single variable as my label, but it might be interesting later to be able to have multivariate output as well (i.e. labels > 1).
How should I best shape my data for input, and/or how to structure the model layers in a way that's most appropriate for this application?
I think it is better to reorder the axes of your data to (time, lats, lons, features), i.e. to treat it as a timeseries of multi-channel spatial maps, where the channels are your features:
data = np.transpose(data, [3, 1, 2, 0])
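To check that the transpose puts every value where you expect it, here is a minimal sketch on a small made-up array. The grid size (18 x 36), the number of timesteps (10) and the spot-check indices are arbitrary values chosen for illustration, not anything from your dataset.

```python
import numpy as np

# Hypothetical small dataset: 4 features on an 18 x 36 grid with 10 timesteps.
features, lats, lons, times = 4, 18, 36, 10
data = np.arange(features * lats * lons * times, dtype=np.float32)
data = data.reshape(features, lats, lons, times)

# Move the time axis to the front and the feature axis to the back.
reordered = np.transpose(data, [3, 1, 2, 0])
print(reordered.shape)  # (10, 18, 36, 4)

# Spot check: the same value is reachable through the permuted indices.
f, i, j, t = 2, 5, 7, 3
assert reordered[t, i, j, f] == data[f, i, j, t]
```

Note that np.transpose only permutes the axes. A plain reshape to the new shape would scramble the values instead.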
Then you can easily wrap Conv2D and MaxPooling2D layers inside a TimeDistributed layer to process the (multi-channel) maps at each timestep:
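from keras.models import Sequential
from keras.layers import (Conv2D, Flatten, LSTM, MaxPooling2D, Reshape,
                          TimeDistributed, UpSampling2D)

num_steps = 50   # timesteps per sample (window length)
lats = 128       # grid height
lons = 128       # grid width
features = 4     # input channels per grid point
out_feats = 3    # number of labels predicted per grid point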
model = Sequential()
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same'), 
                          input_shape=(num_steps, lats, lons, features)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
So far we have a tensor of shape (50, 16, 16, 32) per sample, i.e. excluding the batch axis. Next we use a Flatten layer (wrapped in a TimeDistributed layer so the time axis is not lost) and feed the result to one or more LSTM layers, with return_sequences=True to get the output at each timestep:
model.add(TimeDistributed(Flatten()))
# you may stack multiple LSTM layers on top of each other here
model.add(LSTM(units=64, return_sequences=True))
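Then we need to go back up. First we reshape the LSTM output at each timestep into a small 2D map: the 64 units become an 8 x 8 map with one channel. Then a combination of UpSampling2D and Conv2D layers restores the original map size. Four upsampling steps are needed to go from 8 x 8 back to 128 x 128: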
model.add(TimeDistributed(Reshape((8, 8, 1))))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(32, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(16, (3,3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2,2))))
model.add(TimeDistributed(Conv2D(out_feats, (3,3), padding='same')))
Here is the model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_132 (TimeDi (None, 50, 128, 128, 16)  592       
_________________________________________________________________
time_distributed_133 (TimeDi (None, 50, 64, 64, 16)    0         
_________________________________________________________________
time_distributed_134 (TimeDi (None, 50, 64, 64, 32)    4640      
_________________________________________________________________
time_distributed_135 (TimeDi (None, 50, 32, 32, 32)    0         
_________________________________________________________________
time_distributed_136 (TimeDi (None, 50, 32, 32, 32)    9248      
_________________________________________________________________
time_distributed_137 (TimeDi (None, 50, 16, 16, 32)    0         
_________________________________________________________________
time_distributed_138 (TimeDi (None, 50, 8192)          0         
_________________________________________________________________
lstm_13 (LSTM)               (None, 50, 64)            2113792   
_________________________________________________________________
time_distributed_139 (TimeDi (None, 50, 8, 8, 1)       0         
_________________________________________________________________
time_distributed_140 (TimeDi (None, 50, 16, 16, 1)     0         
_________________________________________________________________
time_distributed_141 (TimeDi (None, 50, 16, 16, 32)    320       
_________________________________________________________________
time_distributed_142 (TimeDi (None, 50, 32, 32, 32)    0         
_________________________________________________________________
time_distributed_143 (TimeDi (None, 50, 32, 32, 32)    9248      
_________________________________________________________________
time_distributed_144 (TimeDi (None, 50, 64, 64, 32)    0         
_________________________________________________________________
time_distributed_145 (TimeDi (None, 50, 64, 64, 16)    4624      
_________________________________________________________________
time_distributed_146 (TimeDi (None, 50, 128, 128, 16)  0         
_________________________________________________________________
time_distributed_147 (TimeDi (None, 50, 128, 128, 3)   435       
=================================================================
Total params: 2,142,899
Trainable params: 2,142,899
Non-trainable params: 0
_________________________________________________________________
As you can see, the output tensor has shape (50, 128, 128, 3) per sample, where 3 is the number of labels we want to predict for each location at each timestep.
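The parameter counts in the summary can be reproduced by hand, which is a useful sanity check when you change the architecture. The sketch below uses the usual counts for a Conv2D layer with bias and for an LSTM layer with bias. The helper names are made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units) * units + units)


# Encoder: 4 -> 16 -> 32 -> 32 channels, all with 3x3 kernels.
encoder = (conv2d_params(3, 3, 4, 16)
           + conv2d_params(3, 3, 16, 32)
           + conv2d_params(3, 3, 32, 32))

# The LSTM sees the flattened 16 x 16 x 32 map at each timestep.
recurrent = lstm_params(16 * 16 * 32, 64)

# Decoder: 1 -> 32 -> 32 -> 16 -> 3 channels, all with 3x3 kernels.
decoder = (conv2d_params(3, 3, 1, 32)
           + conv2d_params(3, 3, 32, 32)
           + conv2d_params(3, 3, 32, 16)
           + conv2d_params(3, 3, 16, 3))

total = encoder + recurrent + decoder
print(recurrent)  # 2113792
print(total)      # 2142899
```

Almost all of the parameters sit in the LSTM layer, because its input is the flattened 8192-value map. Reducing the spatial size or the channel count before Flatten is the most direct way to shrink the model.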
Further notes:
As the number of layers and parameters increases (i.e. the model becomes deeper), you may need to deal with problems such as vanishing gradients and overfitting. One remedy for the former is to put a BatchNormalization layer right after each trainable layer, so that the data fed to the next layer is normalized. To reduce overfitting you could use Dropout layers and/or set the dropout and recurrent_dropout arguments of the LSTM layer.
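As a rough sketch of where these pieces could go, here is a shortened encoder with one convolutional block. The function name, the small dimensions and the 0.2 dropout rates are placeholder assumptions for illustration, not tuned values, and the imports assume TensorFlow's bundled Keras. BatchNormalization is applied directly to the 5D tensor, where it normalizes over the last (channel) axis.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Flatten, LSTM,
                                     MaxPooling2D, TimeDistributed)


def build_regularized_encoder(num_steps=10, lats=16, lons=16, features=4):
    """Hypothetical shortened encoder showing BatchNormalization and dropout."""
    model = Sequential()
    model.add(Input(shape=(num_steps, lats, lons, features)))
    model.add(TimeDistributed(Conv2D(8, (3, 3), activation='relu', padding='same')))
    # Normalize the activations per channel before the next layer.
    model.add(BatchNormalization())
    model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
    model.add(TimeDistributed(Flatten()))
    # dropout acts on the LSTM inputs, recurrent_dropout on the recurrent state.
    model.add(LSTM(units=16, return_sequences=True,
                   dropout=0.2, recurrent_dropout=0.2))
    return model


model = build_regularized_encoder()
print(model.output_shape)  # (None, 10, 16)
```

Whether normalization works better before or after the activation, and which dropout rates help, is something to settle by experiment on your own data.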
As you can see above, I have assumed that we feed the model timeseries of length 50. This is part of the data preprocessing step: you need to cut your whole (long) timeseries into windowed training (and test) samples and feed them to the model in batches.
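One possible way to build such windows with NumPy is sketched below. The helper name make_windows, the stride of 10 and the small 8 x 8 grid are assumptions made for this example. The labels array would be windowed the same way, so that inputs and targets stay aligned.

```python
import numpy as np


def make_windows(series, window_len, stride=1):
    """Cut (total_steps, lats, lons, channels) into overlapping windows.

    Returns an array of shape (num_windows, window_len, lats, lons, channels).
    """
    last_start = series.shape[0] - window_len
    starts = range(0, last_start + 1, stride)
    return np.stack([series[s:s + window_len] for s in starts])


# Hypothetical long series: 120 timesteps on an 8 x 8 grid with 4 features.
series = np.arange(120 * 8 * 8 * 4, dtype=np.float32).reshape(120, 8, 8, 4)

windows = make_windows(series, window_len=50, stride=10)
print(windows.shape)  # (8, 50, 8, 8, 4)
```

A smaller stride gives more, heavily overlapping samples. If you split into training and test sets, split along the time axis before windowing so that test windows do not overlap training windows.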
As I have commented in the code, you can stack multiple LSTM layers on top of each other to increase the representational capacity of the network. Be aware, though, that this may increase the training time and make your model (much more) prone to overfitting, so do it only if you have a good reason (e.g. you have experimented with one LSTM layer and have not gotten good results). Alternatively, you can use GRU layers instead; they have fewer parameters than LSTM layers, so there may be a tradeoff between representational capacity and computational cost (i.e. training time).
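To make the output shape of the network compatible with the shape of your data, you could use a Dense layer after the LSTM layer(s) or adjust the number of units of the last LSTM layer. Either way, the number of values produced per timestep must equal the number of elements in the target shape of the Reshape layer.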
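To make that constraint concrete, here is a small helper that works out the Reshape target and the matching number of units for a given grid, assuming each UpSampling2D step doubles both spatial dimensions as in the model above. The function name and the padded 192 x 368 grid are made up for illustration.

```python
def bottleneck_units(lats, lons, num_upsamplings, channels=1):
    """Return the Reshape target shape and the number of units it requires."""
    factor = 2 ** num_upsamplings
    if lats % factor or lons % factor:
        raise ValueError(
            f"grid {lats} x {lons} is not divisible by {factor}; "
            "pad or crop it first"
        )
    shape = (lats // factor, lons // factor, channels)
    return shape, shape[0] * shape[1] * shape[2]


# The model above: 128 x 128 grid, four upsampling steps.
print(bottleneck_units(128, 128, 4))  # ((8, 8, 1), 64)

# The 180 x 360 grid from the question does not divide evenly by 16.
try:
    bottleneck_units(180, 360, 4)
except ValueError as err:
    print(err)

# A hypothetical grid padded to 192 x 368 would work.
print(bottleneck_units(192, 368, 4))  # ((12, 23, 1), 276)
```

The same divisibility issue applies on the way down: MaxPooling2D with its default padding drops the last row or column of an odd-sized map, so the decoder would not land back on the original grid size.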
Obviously, the above code is just for demonstration. You may need to tune its hyperparameters (e.g. number of layers, number of filters, kernel size, optimizer, activation functions) and experiment (a lot!) to arrive at a working model with good accuracy.
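If you are training on a GPU, you can use the CuDNNLSTM (CuDNNGRU) layer instead of LSTM (GRU) to increase training speed, as it has been optimized for GPUs. In TensorFlow 2.x the separate CuDNN layers are not needed: the standard LSTM and GRU layers pick the cuDNN implementation automatically on a GPU, provided you keep their default activation settings and do not use recurrent_dropout.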
And don't forget to normalize the training data (it's very important and helps training process a lot).
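A minimal sketch of one common approach, per-feature standardization, follows. It assumes the (time, lats, lons, features) layout used above, and the random arrays stand in for real training and test data. The statistics are computed on the training data only and then reused for the test data.

```python
import numpy as np

# Hypothetical data in (time, lats, lons, features) layout.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 8, 8, 4)).astype(np.float32)
test = rng.normal(loc=5.0, scale=3.0, size=(20, 8, 8, 4)).astype(np.float32)

# One mean and one standard deviation per feature, from the training set only.
mean = train.mean(axis=(0, 1, 2), keepdims=True)
std = train.std(axis=(0, 1, 2), keepdims=True)
std = np.where(std == 0, 1.0, std)  # guard against constant features

train_norm = (train - mean) / std
test_norm = (test - mean) / std  # reuse the training statistics

print(train_norm.mean(axis=(0, 1, 2)))  # close to 0 for every feature
print(train_norm.std(axis=(0, 1, 2)))   # close to 1 for every feature
```

If the labels are on a very different scale, they can be standardized the same way, in which case the predictions need to be transformed back with the stored mean and standard deviation.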