How to train a Keras model with a very big dataset?

I am trying to train an autoencoder using TensorFlow and Keras. My training data consists of more than 200K unlabeled 512x128 images. If I loaded all of the data into a single matrix, its shape would be (200000, 512, 128, 3), which takes a few hundred GB of RAM. I know I can reduce the batch size during training, but that only limits memory usage on the GPU/CPU, not the memory needed to hold the whole dataset.

Is there a workaround to this problem?

asked Jan 26 '26 by Nirmal Baishnab

1 Answer

You can use the tf.data API to lazily load the images from disk instead of holding them all in memory. The tutorial below goes into the details; a minimal sketch follows the link.

  • https://www.tensorflow.org/tutorials/load_data/images
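
For instance, a minimal sketch of such a lazy-loading pipeline for an unlabeled autoencoder dataset; the data/images/*.png glob, the PNG format, and the batch size of 32 are assumptions for illustration:

    import tensorflow as tf

    IMG_HEIGHT, IMG_WIDTH = 512, 128  # image size from the question

    def load_image(path):
        # Read and decode a single image file, scaled to [0, 1].
        raw = tf.io.read_file(path)
        img = tf.io.decode_png(raw, channels=3)      # assumes PNG input
        img = tf.image.resize(img, [IMG_HEIGHT, IMG_WIDTH])
        return tf.cast(img, tf.float32) / 255.0

    # Build a dataset of file paths only; pixels are read lazily per batch.
    paths = tf.data.Dataset.list_files("data/images/*.png", shuffle=True)
    images = paths.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)

    # For an autoencoder the input is also the target.
    train_ds = images.map(lambda x: (x, x)).batch(32)

    # model.fit(train_ds, epochs=10)  # Keras streams batches from disk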

Also look into the tf.data.Dataset.prefetch, tf.data.Dataset.batch and tf.data.Dataset.cache methods to optimize performance; see the sketch after the links below.

  • https://www.tensorflow.org/guide/data
  • https://www.tensorflow.org/guide/data_performance
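
A rough sketch of how those methods combine, reusing the hypothetical images dataset from the previous snippet. Note that an in-memory cache() would defeat the purpose with 200K images, so a file-backed cache is shown instead:

    AUTOTUNE = tf.data.AUTOTUNE

    train_ds = (
        images
        .cache("image_cache")          # cache decoded images to a local file,
                                       # not RAM, since the dataset is too big
        .shuffle(buffer_size=1000)     # shuffle within a bounded buffer
        .map(lambda x: (x, x), num_parallel_calls=AUTOTUNE)
        .batch(32)
        .prefetch(AUTOTUNE)            # overlap input loading with training
    )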

You can also preprocess the data into TFRecords ahead of time so they can be read more efficiently in your training pipeline; a sketch follows the link below.

  • https://www.tensorflow.org/tutorials/load_data/tfrecord#tfrecord_files_using_tfdata
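
A sketch of both the one-time conversion and the reading side; the file names and the single-shard layout are illustrative only (in practice you would split 200K images across many shards):

    import tensorflow as tf

    def _bytes_feature(value):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    # One-time conversion: store the encoded image bytes in a TFRecord file.
    with tf.io.TFRecordWriter("images-000.tfrecord") as writer:
        for path in tf.io.gfile.glob("data/images/*.png"):
            raw = tf.io.read_file(path).numpy()
            example = tf.train.Example(features=tf.train.Features(
                feature={"image_raw": _bytes_feature(raw)}))
            writer.write(example.SerializeToString())

    # Reading side: parse and decode lazily inside the input pipeline.
    feature_spec = {"image_raw": tf.io.FixedLenFeature([], tf.string)}

    def parse_example(serialized):
        parsed = tf.io.parse_single_example(serialized, feature_spec)
        img = tf.io.decode_png(parsed["image_raw"], channels=3)
        return tf.cast(img, tf.float32) / 255.0

    dataset = (
        tf.data.TFRecordDataset(["images-000.tfrecord"])
        .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE)
    )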
answered Jan 28 '26 by Deepak Sadulla