
Can I download from Google storage blobs into a VM as an n-d array?

I have a Google Compute Engine VM and am trying to read data from my Cloud Storage bucket - it is stored as a blob, and I want to turn it into a numpy array with the same shape it had when stored.

Currently the only way I can get this working is by downloading to a file and then loading that into a numpy array, which seems sub-optimal.

I have tried downloading the blob as a string directly and converting it into a numpy array, but the dimensions are not maintained (the array comes back flattened).
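For context on why the shape gets lost: the `.npy` format stores shape and dtype in a small header, which only `np.load` knows how to parse; reinterpreting the raw downloaded bytes (e.g. with `np.frombuffer`) just yields a flat 1-D buffer. A minimal local sketch of the difference, with no GCS involved:

```python
import io
import numpy as np

arr = np.random.rand(4, 5)

# Serialize the array the same way it is stored in the bucket (.npy format).
buf = io.BytesIO()
np.save(buf, arr)
data = buf.getvalue()

# Reinterpreting the raw bytes directly loses the shape:
flat = np.frombuffer(data, dtype=np.uint8)
print(flat.ndim)  # 1 -- the (4, 5) shape is gone

# np.load parses the .npy header, so the shape survives:
restored = np.load(io.BytesIO(data))
print(restored.shape)  # (4, 5)
```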

I could move all the files to the VM instead, but would rather read them 'on-the-fly' if possible.

Current code:

def __getitem__(self, index):
    index = int(self.indexes[int(index)])
    blob = bucket.blob(self.data_path + 'case_' + str(index) + '_volume.npy')
    # Round-trips through the local disk, which is the step I'd like to avoid:
    blob.download_to_filename('im.npy')
    image = np.load('im.npy')
    return image
Asked Sep 06 '25 by Chris Culley
1 Answer

If you have enough RAM to store the entire file in memory (while it is also loaded into numpy), you can do the read into a BytesIO object, seek back to the beginning of the buffer, then hand it to numpy.load(). Adapt this as necessary to your particular function:

import io
import numpy as np
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('my-bucket')

blob = bucket.blob('my-file.npy')

with io.BytesIO() as in_memory_file:
    blob.download_to_file(in_memory_file)
    in_memory_file.seek(0)  # rewind so np.load reads from the start of the buffer
    image = np.load(in_memory_file)

# then, for example:
print(image)

At least for now, there doesn't appear to be a way to truly stream the read out of GCS without writing the necessary client-library plumbing yourself.
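As a variant on the same idea, the decoding step can be factored into a small helper that takes raw bytes, which keeps `__getitem__` free of temp files. The GCS call shown in the comment assumes the `Blob.download_as_bytes()` method of the `google-cloud-storage` library; the helper itself needs no GCS and is exercised here with locally fabricated bytes:

```python
import io
import numpy as np

def load_npy_bytes(data):
    """Decode the raw bytes of a .npy file into an array, preserving shape."""
    return np.load(io.BytesIO(data))

# Inside the question's __getitem__ this would be used roughly as (not run here):
#   image = load_npy_bytes(blob.download_as_bytes())

# Local round-trip check, with np.save standing in for the stored blob:
original = np.arange(24).reshape(2, 3, 4)
buf = io.BytesIO()
np.save(buf, original)
image = load_npy_bytes(buf.getvalue())
print(image.shape)  # (2, 3, 4)
```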

Answered Sep 08 '25 by robsiemb