I have a Google Compute Engine VM and am trying to read data from Cloud Storage, where it is stored as a blob, and turn it back into a NumPy array with the same shape it had when it was stored.
Currently the only way I can get this working is to download the blob to a file and then load that file into a NumPy array, which seems sub-optimal.
I have tried downloading the blob's contents directly and converting them into a NumPy array, but the dimensions are not maintained (the array comes back flattened).
I could move all the files to the VM instead, but I would rather read them 'on the fly' if possible.
Current code:
def __getitem__(self, index):
    # Map the requested index to the stored case number
    index = int(self.indexes[int(index)])
    blob = bucket.blob(self.data_path + 'case_' + str(index) + '_volume.npy')
    # Round-trip through a local file just to recover the array
    blob.download_to_filename('im.npy')
    image = np.load('im.npy')
    return image
If you have enough RAM to hold the entire file in memory (while it is also loaded into NumPy), you can download into an io.BytesIO object, seek back to the beginning of the buffer, and then hand it to numpy.load(). Adapt this as necessary to your particular function:
import io

import numpy as np
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')
blob = bucket.blob('my-file.npy')

with io.BytesIO() as in_memory_file:
    # Write the blob's bytes into the in-memory buffer
    blob.download_to_file(in_memory_file)
    # Rewind so numpy.load() reads from the start of the buffer
    in_memory_file.seek(0)
    image = np.load(in_memory_file)
    # then, for example:
    print(image)
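Adapted to the __getitem__ from your question, that might look something like the sketch below (bucket, self.data_path, self.indexes, and the filename pattern are taken from your original code):

import io

import numpy as np

def __getitem__(self, index):
    index = int(self.indexes[int(index)])
    blob = bucket.blob(self.data_path + 'case_' + str(index) + '_volume.npy')
    with io.BytesIO() as in_memory_file:
        # Download into RAM instead of onto disk
        blob.download_to_file(in_memory_file)
        in_memory_file.seek(0)
        # np.load() reads the .npy header, so the original shape is preserved
        return np.load(in_memory_file)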
At least for now there doesn't appear to be a way to actually stream the read out of GCS without writing the necessary client library yourself.
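As a closely related alternative (still a full download into memory, not a true stream), more recent versions of the google-cloud-storage client also expose Blob.download_as_bytes(), which lets you skip managing the buffer and the seek yourself. A minimal sketch, assuming the same 'my-bucket' and 'my-file.npy' placeholders as above:

import io

import numpy as np
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')
blob = bucket.blob('my-file.npy')

# download_as_bytes() returns the whole object as a bytes value;
# wrapping it in BytesIO gives np.load() a seekable file object
image = np.load(io.BytesIO(blob.download_as_bytes()))
print(image.shape)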