Keras ImageDataGenerator for Cloud ML Engine

I need to train a neural net on raw images that I store on Google Cloud Storage. To do that, I'm using the flow_from_directory method of my Keras image generator to find all the images and their related labels on the storage.

from tensorflow import keras

training_data_directory = args.train_dir
testing_data_directory = args.eval_dir

# Keras image generator used for both training and validation data.
basic_datagen = keras.preprocessing.image.ImageDataGenerator()

training_gen = basic_datagen.flow_from_directory(
                    training_data_directory,
                    target_size=(img_width, img_height),
                    batch_size=32)

validation_gen = basic_datagen.flow_from_directory(
                    testing_data_directory,
                    target_size=(img_width, img_height),
                    batch_size=32)

My GCloud Storage layout is the following:

brad-bucket/data/train
brad-bucket/data/eval

The gsutil command confirms that my folders exist.

brad$ gsutil ls gs://brad-bucket/data/
gs://brad-bucket/data/eval/
gs://brad-bucket/data/train/

So here is the script I run to launch the training on ML Engine, with the strings I use for my directory paths (train_dir, eval_dir).

BUCKET="gs://brad-bucket"
JOB_ID="training_"$(date +%s)
JOB_DIR="gs://brad-bucket/jobs/train_keras_"$(date +%s)
TRAIN_DIR="gs://brad-bucket/data/train/"
EVAL_DIR="gs://brad-bucket/data/eval/"
CONFIG_PATH="config/config.yaml"
PACKAGE="trainer"

gcloud ml-engine jobs submit training $JOB_ID \
                                    --stream-logs \
                                    --verbosity debug \
                                    --module-name trainer.task \
                                    --staging-bucket $BUCKET \
                                    --package-path $PACKAGE \
                                    --config $CONFIG_PATH \
                                    --region europe-west1 \
                                    -- \
                                    --job_dir $JOB_DIR \
                                    --train_dir $TRAIN_DIR \
                                    --eval_dir $EVAL_DIR \
                                    --dropout_one 0.2 \
                                    --dropout_two 0.2

However, this throws an OSError.

ERROR   2018-01-10 09:41:47 +0100   service       File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/preprocessing/image.py", line 1086, in __init__
ERROR   2018-01-10 09:41:47 +0100   service         for subdir in sorted(os.listdir(directory)):
ERROR   2018-01-10 09:41:47 +0100   service     OSError: [Errno 2] No such file or directory: 'gs://brad-bucket/data/train/'

When I use another data structure (reading the data in another way), everything works fine, but when I use flow_from_directory to read from directories and subdirectories, I always get this same error. Is it possible to use this method to retrieve data from Cloud Storage, or do I have to feed the data in a different way?

1 Answer

If you check the source code, you see that the error arises when Keras (or TF) tries to build the class list from your directories with os.listdir. Since you are giving it a GCS directory (gs://), this will not work. You can bypass this error by providing the classes argument yourself, e.g. in the following way:

import os

from google.cloud import storage


def get_classes(file_dir):
    if not file_dir.startswith("gs://"):
        # Local directory: class names are simply the sub-folder names.
        classes = [c.replace('/', '') for c in os.listdir(file_dir)]
    else:
        # GCS path: list the "sub-folders" (prefixes) under the given prefix.
        bucket_name = file_dir.replace('gs://', '').split('/')[0]
        prefix = file_dir.replace("gs://" + bucket_name + '/', '')
        if not prefix.endswith("/"):
            prefix += "/"

        client = storage.Client()
        bucket = client.get_bucket(bucket_name)

        iterator = bucket.list_blobs(delimiter="/", prefix=prefix)
        response = iterator.get_next_page_response()
        classes = [c.replace('/', '') for c in response['prefixes']]

    return classes

Passing these classes to flow_from_directory will solve your error, but it will not recognize the files themselves (I now get e.g. Found 0 images belonging to 2 classes.).
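
As an illustration, calling it could look like this (a minimal sketch; basic_datagen, training_data_directory, img_width and img_height are the names from the question):

# Build the class list from the GCS prefix ourselves, so Keras does not
# have to call os.listdir() on a gs:// path.
train_classes = get_classes(training_data_directory)

training_gen = basic_datagen.flow_from_directory(
    training_data_directory,
    target_size=(img_width, img_height),
    batch_size=32,
    classes=train_classes)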

The only 'direct' workaround I found is to copy your files to local disk and read them from there, as sketched below. It would be great to have another solution, since (e.g. in the case of images) copying can take a long time.
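
A rough sketch of that workaround, assuming gsutil is available on the training worker and using /tmp/data/train as a hypothetical local staging directory (basic_datagen, args.train_dir, img_width and img_height are again the names from the question):

import os
import subprocess

# Hypothetical local staging directory on the training worker.
local_train_dir = '/tmp/data/train'
if not os.path.isdir(local_train_dir):
    os.makedirs(local_train_dir)

# Mirror the GCS prefix to local disk, then point Keras at the local copy.
subprocess.check_call(
    ['gsutil', '-m', 'rsync', '-r', args.train_dir, local_train_dir])

training_gen = basic_datagen.flow_from_directory(
    local_train_dir,
    target_size=(img_width, img_height),
    batch_size=32)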

Other resources also suggest using TensorFlow's file_io module when interacting with GCS from Cloud ML Engine, but in this case that would require you to fully rewrite flow_from_directory yourself.
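
For reference, file_io accepts gs:// paths directly; a minimal sketch of listing and reading files (the .jpg file name is a placeholder purely for illustration):

from tensorflow.python.lib.io import file_io

# list_directory and FileIO both understand gs:// paths.
print(file_io.list_directory('gs://brad-bucket/data/train/'))

# 'some_image.jpg' is a placeholder name for illustration.
with file_io.FileIO('gs://brad-bucket/data/train/some_image.jpg', mode='rb') as f:
    image_bytes = f.read()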
