Character LSTM keeps generating same character sequence

I'm training a 2-layer character LSTM with Keras to generate sequences of characters similar to the corpus I'm training on. When I train the LSTM, however, it generates the same sequence over and over again.

I've seen suggestions for similar problems to increase the LSTM input sequence length, increase the batch size, add dropout layers, and increase the dropout amount. I've tried all of these and none of them seems to have fixed the issue. The one thing that has yielded some success is adding a random noise vector to each vector the LSTM outputs during generation. This makes sense, since the LSTM uses the previous step's output to generate the next output. However, if I add enough noise to break the LSTM out of its repetitive generation, the quality of the output degrades a great deal.
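In isolation, the noise trick amounts to perturbing the model input at each generation step, roughly like this (a minimal sketch; pattern is the current window of character indices and n_vocab is the vocabulary size, exactly as in the full generation code below):

x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
# small random perturbation so the model doesn't settle back into the same loop
x = x + numpy.random.rand(1, len(pattern), 1) * 0.01
prediction = model.predict(x, verbose=0)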

My LSTM training code is as follows:

import sys
import numpy
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from sklearn.model_selection import train_test_split

# [load data from file]
raw_text = collected_statements.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text + '\b')))
char_to_int = dict((c, i) for i, c in enumerate(chars))
n_chars = len(raw_text)
n_vocab = len(chars)

# build overlapping windows of seq_length chars, each predicting the next char
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)

# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]),
               return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, 
save_best_only=True, mode='min')
callbacks_list = [checkpoint]

# fix random seed for reproducibility
seed = 8
numpy.random.seed(seed)
# split into 80% for train and 20% for test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=seed)

# train the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=18,
          batch_size=256, callbacks=callbacks_list)

My generation code is as follows:

filename = "weights-improvement-18-1.5283.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
int_to_char = dict((i, c) for i, c in enumerate(chars))
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = unpadded_patterns[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
x = numpy.reshape(pattern, (1, len(pattern), 1))
x = (x / float(n_vocab)) + (numpy.random.rand(1, len(pattern), 1) * 0.01)
prediction = model.predict(x, verbose=0)
index = numpy.argmax(prediction)
#print(index)
result = int_to_char[index]
seq_in = [int_to_char[value] for value in pattern]
sys.stdout.write(result)
pattern.append(index)
pattern = pattern[1:len(pattern)]
print("\nDone.")

When I run the generation code, I get the same sequence over and over again:

we have the best economy in the history of our country." "we have the best 
economy in the history of our country." "we have the best economy in the 
history of our country." "we have the best economy in the history of our 
country." "we have the best economy in the history of our country." "we 
have the best economy in the history of our country." "we have the best 
economy in the history of our country." "we have the best economy in the 
history of our country." "we have the best economy in the history of our 
country."

Is there anything else I could try that could help to generate something besides the same sequence over and over?

asked Oct 28 '25 by gautamh
1 Answer

In your character generation, I would suggest sampling from the probabilities your model outputs instead of taking the argmax directly. This is what the Keras char-rnn example does to get diversity in its generated text.

This is the code they use for sampling in their example:

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    # rescale the log-probabilities by the temperature and re-normalize
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # draw a single sample from the resulting distribution
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

In your code you've got index = numpy.argmax(prediction).

I'd suggest replacing that with index = sample(prediction[0]) and experimenting with temperatures of your choice. (Note the [0]: model.predict returns a batch of predictions, and np.random.multinomial needs a 1-D probability array.) Keep in mind that higher temperatures make your output more random and lower temperatures make it less random.
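Plugged into the generation loop from your question, the change would look roughly like this (a sketch reusing your pattern, n_vocab, and int_to_char; the temperature of 0.5 is just an arbitrary starting point to tune):

for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # sample instead of always taking the single most likely character
    index = sample(prediction[0], temperature=0.5)
    sys.stdout.write(int_to_char[index])
    pattern.append(index)
    pattern = pattern[1:len(pattern)]

Since sampling already injects randomness at every step, the noise you were adding to the input should no longer be necessary.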

answered Oct 31 '25 by Primusa