I have a file containing a 50,000 x 5,000 matrix of floats. When I use x = np.genfromtxt(readFrom, dtype=float) to load the file into memory, I get the following error message:
File "C:\Python27\lib\site-packages\numpy\lib\npyio.py", line 1583, in genfromtxt for (i, converter) in enumerate(converters)])
MemoryError
I want to load the whole file into memory because I am calculating the Euclidean distance between pairs of row vectors using SciPy: dis = scipy.spatial.distance.euclidean(x[row1], x[row2])
Is there an efficient way to load a huge matrix file into memory?
Thank you.
Update:
I managed to solve the problem. Here is my solution. I am not sure whether it's efficient or logically correct, but it works fine for me:
x = open(readFrom, 'r').readlines()
y = np.asarray([np.array(s.split()).astype('float32') for s in x], dtype=np.float32)
....
dis = scipy.spatial.distance.euclidean(y[row1], y[row2])
Please help me to improve my solution.
Depending on your OS and Python version, it's quite likely that you'll never be able to allocate a 1GB array (mgilson's answer is spot on here). The problem is not that you're running out of memory, but that you're running out of contiguous memory. If you're on a 32-bit machine (especially running Windows), it will not help to add more memory. Moving to a 64-bit architecture would probably help.
Using smaller data types can certainly help; depending on the operations you use, a 16-bit float or even an 8-bit int might be sufficient.
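To make the effect of the data type concrete, here is a rough sketch that computes the raw size of a 50,000 x 5,000 array for a few dtypes. The variable names are only illustrative, and the figures ignore any temporary copies made while parsing the file.

```python
import numpy as np

# Dimensions taken from the question.
n_rows, n_cols = 50_000, 5_000

# Raw array size in GiB for each candidate dtype (no parsing overhead included).
footprint_gib = {}
for name in ("float64", "float32", "float16", "int8"):
    nbytes = n_rows * n_cols * np.dtype(name).itemsize
    footprint_gib[name] = nbytes / 2**30
    print(f"{name}: {footprint_gib[name]:.2f} GiB")
```

Keep in mind that float16 carries only about three significant decimal digits and int8 only covers -128 to 127, so whether they are acceptable depends on your data and on how accurate the distances need to be.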
If none of this works, then you're forced to admit that the data just doesn't fit in memory. You'll have to process it piecewise (in this case, storing the data as an HDF5 array might be very useful).
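As a minimal sketch of piecewise processing, the example below uses np.memmap on a raw binary file instead of HDF5, simply because it needs nothing beyond NumPy. The tiny matrix, the file name matrix.f32, and the helper distances_to_row are all made up for illustration; with real data you would write the binary file once and then map it.

```python
import os
import tempfile
import numpy as np

# Hypothetical small stand-in for the 50,000 x 5,000 matrix.
n_rows, n_cols = 8, 3
data = np.arange(n_rows * n_cols, dtype=np.float32).reshape(n_rows, n_cols)

# Store the matrix once as raw float32 values on disk.
bin_path = os.path.join(tempfile.mkdtemp(), "matrix.f32")
data.tofile(bin_path)

# Map the file; rows are read from disk only when they are indexed.
mm = np.memmap(bin_path, dtype=np.float32, mode="r", shape=(n_rows, n_cols))

def distances_to_row(mm, ref_index, block_rows=4):
    """Euclidean distance from one row to every row, one block of rows at a time."""
    ref = np.asarray(mm[ref_index], dtype=np.float64)
    out = np.empty(mm.shape[0], dtype=np.float64)
    for start in range(0, mm.shape[0], block_rows):
        # Only this block is pulled into memory.
        block = np.asarray(mm[start:start + block_rows], dtype=np.float64)
        out[start:start + block_rows] = np.sqrt(((block - ref) ** 2).sum(axis=1))
    return out

dists = distances_to_row(mm, ref_index=0)
```

Only block_rows rows are held in memory at a time, so the block size can be tuned to whatever the machine can comfortably allocate.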
You're actually using 8-byte floats, since Python's float corresponds to C's double (at least on most systems):
import numpy as np
a = np.arange(10, dtype=float)
print(a.dtype)  # float64
You should specify your data type as np.float32, which halves the memory needed. Depending on your OS, whether it is 32-bit or 64-bit, and whether you're running 32-bit or 64-bit Python, the address space available to NumPy could be smaller than your 4 GB, which could also be an issue here.
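One possible way to get float32 data without building large intermediate lists is to preallocate the array and fill it one line at a time. This is only a sketch: load_matrix_float32 is a hypothetical helper, and it assumes the matrix dimensions are known in advance and that the file is whitespace-separated with no blank lines.

```python
import os
import tempfile
import numpy as np

def load_matrix_float32(path, n_rows, n_cols):
    """Parse a whitespace-separated text matrix into a preallocated float32 array."""
    out = np.empty((n_rows, n_cols), dtype=np.float32)
    with open(path) as fh:
        # Iterating over the file keeps only one line of text in memory at a time.
        for i, line in enumerate(fh):
            out[i] = [float(v) for v in line.split()]
    return out

# Tiny demo file standing in for the real matrix.
demo = np.array([[0.0, 3.0], [4.0, 0.0], [1.0, 1.0]])
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
np.savetxt(demo_path, demo)

y = load_matrix_float32(demo_path, n_rows=3, n_cols=2)
# Euclidean distance between rows 0 and 1, computed with NumPy alone.
dis = np.linalg.norm(y[0] - y[1])
```

Compared with readlines(), this never holds the whole file as Python strings, so peak memory stays close to the size of the final array.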