I need to train a neural network on raw images that I store in Google Cloud Storage. To do that I'm using the flow_from_directory method of my Keras image generator to find all the images and their labels in the bucket.
training_data_directory = args.train_dir
testing_data_directory = args.eval_dir

training_gen = datagenerator.flow_from_directory(
    training_data_directory,
    target_size=(img_width, img_height),
    batch_size=32)

validation_gen = basic_datagen.flow_from_directory(
    testing_data_directory,
    target_size=(img_width, img_height),
    batch_size=32)
My Cloud Storage bucket is laid out as follows:
brad-bucket / data / train
brad-bucket / data / eval
The gsutil command confirms that these folders exist:
brad$ gsutil ls gs://brad-bucket/data/
gs://brad-bucket/data/eval/
gs://brad-bucket/data/train/
Here is the script I run to launch the training job on ML Engine, including the strings I use for the directory paths (train_dir, eval_dir):
BUCKET="gs://brad-bucket"
JOB_ID="training_"$(date +%s)
JOB_DIR="gs://brad-bucket/jobs/train_keras_"$(date +%s)
TRAIN_DIR="gs://brad-bucket/data/train/"
EVAL_DIR="gs://brad-bucket/data/eval/"
CONFIG_PATH="config/config.yaml"
PACKAGE="trainer"
gcloud ml-engine jobs submit training $JOB_ID \
--stream-logs \
--verbosity debug \
--module-name trainer.task \
--staging-bucket $BUCKET \
--package-path $PACKAGE \
--config $CONFIG_PATH \
--region europe-west1 \
-- \
--job_dir $JOB_DIR \
--train_dir $TRAIN_DIR \
--eval_dir $EVAL_DIR \
--dropout_one 0.2 \
--dropout_two 0.2
However, this throws an OSError:
ERROR 2018-01-10 09:41:47 +0100 service File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/_impl/keras/preprocessing/image.py", line 1086, in __init__
ERROR 2018-01-10 09:41:47 +0100 service for subdir in sorted(os.listdir(directory)):
ERROR 2018-01-10 09:41:47 +0100 service OSError: [Errno 2] No such file or directory: 'gs://brad-bucket/data/train/'
When I read the data in another way (using a different data structure), everything works fine, but whenever I use flow_from_directory to read from directories and subdirectories I get this same error. Is it possible to use this method to retrieve data from Cloud Storage, or do I have to feed the data in a different way?
If you check the source code, you can see that the error arises when Keras (or TensorFlow) tries to construct the class list from your directories by calling os.listdir, which only understands local filesystem paths. Since you are giving it a GCS directory (gs://...), this cannot work. You can bypass the error by providing the classes argument yourself, e.g. with a helper like this:
import os

from google.cloud import storage


def get_classes(file_dir):
    """Return the class names, i.e. the sub-directory names under file_dir."""
    if not file_dir.startswith("gs://"):
        # Local path: list the sub-directories directly.
        classes = [c.replace('/', '') for c in os.listdir(file_dir)]
    else:
        # GCS path: list the "sub-directories" through the storage client.
        bucket_name = file_dir.replace('gs://', '').split('/')[0]
        prefix = file_dir.replace("gs://" + bucket_name + '/', '')
        if not prefix.endswith("/"):
            prefix += "/"
        client = storage.Client()
        bucket = client.get_bucket(bucket_name)
        iterator = bucket.list_blobs(delimiter="/", prefix=prefix)
        response = iterator.get_next_page_response()
        classes = [c.replace('/', '') for c in response['prefixes']]
    return classes
Passing these classes to flow_from_directory will solve your error, but the generator still will not find the files themselves (I now get e.g. Found 0 images belonging to 2 classes.).
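For instance, a minimal sketch that wires the helper above into the question's generator setup (datagenerator, img_width and img_height are the names from the question):

# Build the class list from the GCS prefix and hand it to Keras explicitly.
train_classes = get_classes(training_data_directory)

training_gen = datagenerator.flow_from_directory(
    training_data_directory,
    target_size=(img_width, img_height),
    batch_size=32,
    classes=train_classes)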
The only 'direct' workaround I have found is to copy your files to local disk and read them from there. It would be great to have another solution, since copying images can take a long time.
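Here is a rough sketch of that copy step, reusing the google.cloud.storage client from above; download_to_local is a hypothetical helper and /tmp/train is just an illustrative destination. The idea is to mirror the gs://.../train/<class>/<image> layout on local disk so that flow_from_directory can walk it:

import os

from google.cloud import storage


def download_to_local(gcs_dir, local_dir):
    """Copy every object under a gs:// prefix to local_dir, keeping sub-folders."""
    bucket_name = gcs_dir.replace('gs://', '').split('/')[0]
    prefix = gcs_dir.replace('gs://' + bucket_name + '/', '')
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith('/'):
            continue  # skip zero-byte "directory" placeholder objects
        local_path = os.path.join(local_dir, os.path.relpath(blob.name, prefix))
        local_subdir = os.path.dirname(local_path)
        if not os.path.isdir(local_subdir):
            os.makedirs(local_subdir)
        blob.download_to_filename(local_path)
    return local_dir


# The generator can then walk the local copy as usual.
training_gen = datagenerator.flow_from_directory(
    download_to_local(training_data_directory, '/tmp/train'),
    target_size=(img_width, img_height),
    batch_size=32)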
Other resources also suggest using TensorFlow's file_io module when interacting with GCS from Cloud ML Engine, but in this case that would require you to fully rewrite flow_from_directory yourself.
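For reference, this is roughly what one building block of such a rewrite could look like: file_io (tensorflow.python.lib.io.file_io) understands gs:// paths, so it can list and read the images that os.listdir cannot. The helper below is only an illustrative sketch (the class sub-folder name is made up), and the batching, shuffling and label handling that flow_from_directory provides would still have to be reimplemented on top of it:

import numpy as np
from PIL import Image
from tensorflow.python.lib.io import file_io


def load_gcs_image(gcs_path, target_size):
    """Read a single image straight from GCS and return it as a float array."""
    # target_size is (width, height), as expected by PIL's resize.
    with file_io.FileIO(gcs_path, mode='rb') as f:
        img = Image.open(f).convert('RGB').resize(target_size)
    return np.asarray(img, dtype=np.float32) / 255.0


# file_io can also list a gs:// "directory", unlike os.listdir
# ('class_a' is a placeholder for one of your class sub-folders).
image_names = file_io.list_directory('gs://brad-bucket/data/train/class_a/')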