 

How to update Keras LSTM weights to avoid concept drift

I'm trying to update a Keras LSTM to avoid concept drift. To do so, I'm following the approach proposed in this paper [1], in which an anomaly score is computed and used to update the network weights. In the paper they use the L2 norm to compute the anomaly score and then update the model weights. As stated in the paper:

RNN Update: The anomaly score a_t is then used to update the network W_{t-1} to obtain W_t using backpropagation through time (BPTT):

W๐‘ก = W๐‘กโˆ’1 โˆ’ ๐œ‚โˆ‡๐‘Ž๐‘ก(W๐‘กโˆ’1) where ๐œ‚ is the learning rate

I'm updating the LSTM network weights this way and, although I have seen some improvement in model performance when forecasting multi-step-ahead multi-sensor data, I'm not sure whether the improvement is because the updates deal with concept drift or just because the model is refitted with the newest data.

Here is an example model:

import tensorflow as tf
from tensorflow.keras import layers

# Multi-step-ahead, multi-sensor forecasting model
model = tf.keras.Sequential()
model.add(layers.LSTM(n_neurons, input_shape=(n_seq, n_features)))
model.add(layers.Dense(n_pred_seq * n_features))
model.add(layers.Reshape((n_pred_seq, n_features)))
model.compile(optimizer='adam', loss='mse')

And here is how I'm updating the model:

from math import sqrt
from sklearn.metrics import mean_squared_error

y_pred = model.predict_on_batch(x_batch)
up_y = data_y[i,]
# Anomaly score: RMSE between the newest observation and the prediction
a_score = sqrt(mean_squared_error(data_y[i,].flatten(), y_pred[0, :].flatten()))
w = model.layers[0].get_weights()  # Only get weights for the LSTM layer
for l in range(len(w)):
    w[l] = w[l] - (w[l]*0.001*a_score)  # 0.001 = learning rate
model.layers[0].set_weights(w)
model.fit(x_batch, up_y, epochs=1, verbose=1)
model.reset_states()

I'm wondering whether this is the correct way to update the LSTM network weights, and how BPTT is applied after the weights are updated.

P.S.: I have also seen other methods to detect concept drift, such as the ADWIN method from the skmultiflow package, but I found this one especially interesting because it also deals with anomalies: the model is updated slightly when new data with concept drift arrives, and the updates are almost ignored when anomalous data arrives.
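
For reference, here is a minimal ADWIN sketch with skmultiflow (the error stream is synthetic, just to make the example runnable):

import numpy as np
from skmultiflow.drift_detection import ADWIN

adwin = ADWIN()
# Synthetic stream of per-step prediction errors (illustrative only)
error_stream = np.abs(np.random.randn(1000))
for error in error_stream:
    adwin.add_element(error)
    if adwin.detected_change():
        print('Possible concept drift detected')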

[1] Saurav, S., Malhotra, P., TV, V., Gugulothu, N., Vig, L., Agarwal, P., & Shroff, G. (2018, January). Online anomaly detection with concept drift adaptation using recurrent neural networks. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 78-87). ACM.

asked Feb 01 '26 by kevin

2 Answers

I personally think it's a valid method. How you update the network weights depends on what you're trying to achieve, and doing it the way you describe is fine.

Another way to do it might be to implement your own loss function and embed the anti-drift parameter into it, though that could be a little more complicated; a sketch of this idea follows.
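
A minimal sketch of that idea, assuming the anomaly score is folded into a scalar weight (anomaly_weight, drift_aware_mse, and the weighting scheme are my own illustrative names, not from the paper):

import tensorflow as tf

# Non-trainable scalar holding the current anomaly-based weight; a tf.Variable
# can be reassigned between batches without recompiling the model.
anomaly_weight = tf.Variable(1.0, trainable=False, dtype=tf.float32)

def drift_aware_mse(y_true, y_pred):
    # Standard MSE scaled by the externally updated anomaly weight, so that
    # highly anomalous batches contribute less to the update.
    return anomaly_weight * tf.reduce_mean(tf.square(y_true - y_pred))

# model.compile(optimizer='adam', loss=drift_aware_mse)
# ...then, before each online fit:
# anomaly_weight.assign(new_weight)  # new_weight derived from a_score (hypothetical)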

Regarding BPTT, I think it's applied as normal, but with different "starting points": the weights you've just updated.

answered Feb 03 '26 by Federico Andreoli


Looking at the second block of your code, I believe you are not calculating the gradient properly. Specifically, the weight update w[l] = w[l] - (w[l]*0.001*a_score) seems wrong to me.

Here you are multiplying the weights by the anomaly score. However, the original gradient update equation

W_t = W_{t-1} - η ∇a_t(W_{t-1})

means computing the gradient of the loss a_t with respect to W_{t-1}; it does not mean multiplying a_t by W_{t-1}.

To apply the online update correctly, you just need to sample your stream sequentially and call model.fit() as usual.
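
If you want the update to mirror the paper's equation literally, here is a minimal sketch using tf.GradientTape (eta, x_batch and y_batch are illustrative names, and using the batch RMSE as the loss is my assumption, not code from the paper):

import tensorflow as tf

eta = 0.001  # learning rate

def online_update(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        y_pred = model(x_batch, training=True)
        # a_t: anomaly score used as the loss (here the batch RMSE)
        a_t = tf.sqrt(tf.reduce_mean(tf.square(y_batch - y_pred)))
    # Gradient of a_t with respect to the current weights W_{t-1}
    grads = tape.gradient(a_t, model.trainable_variables)
    # W_t = W_{t-1} - eta * grad(a_t)(W_{t-1})
    for w, g in zip(model.trainable_variables, grads):
        if g is not None:
            w.assign_sub(eta * g)
    return a_t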

Hope this helps.

answered Feb 03 '26 by Hammer. Wang