I am using numpy.savetxt() to write a numpy array to a csv file, but the file that is generated is VERY large. For example, if I create a zeros array:
import numpy
test = numpy.zeros((10000,10000), dtype=numpy.float32)
numpy.savetxt('C:/datatest.csv',test,delimiter=',')
I would expect the file to be around 10,000*10,000*4 bytes (400 MB), which is also what test.nbytes returns. However, the file is 2.3 GB. Is there a reason for the large file size? I looked through the numpy documentation, and there doesn't seem to be a way to specify the variable type when writing to a file. I tried other file types and delimiters, but got the same results.
The size of the native datatype differs from the size of the string representation of the datatype.
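numpy.savetxt has a fmt argument that defaults to '%.18e', which formats each of your zeros as 0.000000000000000000e+00. That is 24 characters per item plus one for the delimiter, so 25 bytes per value. 10,000*10,000*25 bytes is 2.5 billion bytes, or about 2.3 GiB, which matches the file size you see.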
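As a rough check of that arithmetic, here is a small sketch that formats one zero with the default format string and estimates the file size without writing anything. It assumes a one-byte newline per row (on Windows the line ending may take two bytes), and the variable names are only illustrative.

```python
# Format a single zero the way numpy.savetxt does with its default fmt.
default_fmt = '%.18e'
cell = default_fmt % 0.0
print(cell)       # 0.000000000000000000e+00
print(len(cell))  # 24

rows, cols = 10000, 10000

# Each row holds `cols` formatted values, `cols - 1` delimiters,
# and one newline (assumed to be a single byte here).
estimated_bytes = rows * (cols * len(cell) + (cols - 1) + 1)
print(estimated_bytes)          # 2500000000
print(estimated_bytes / 2**30)  # size in GiB, roughly 2.3
```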
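To get a smaller file, you can change the format with fmt (beware of losing significant digits), use numpy.save to save in the binary .npy format, or use numpy.savez_compressed to save as a compressed .npz archive. Note that plain numpy.savez writes an uncompressed archive.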
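The options can be compared side by side with a sketch like the following. It uses a 100x100 array as a stand-in for the 10000x10000 one so it runs quickly, writes to a temporary directory rather than C:/, and picks '%.9g' as one possible format (nine significant digits are enough to round-trip float32 values). The file names are made up for the example.

```python
import os
import tempfile

import numpy as np

# Small stand-in for the 10000x10000 array from the question.
test = np.zeros((100, 100), dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    paths = {
        'default_csv': os.path.join(tmp, 'default.csv'),
        'short_csv': os.path.join(tmp, 'short.csv'),
        'binary_npy': os.path.join(tmp, 'data.npy'),
        'compressed_npz': os.path.join(tmp, 'data.npz'),
    }

    # Text output with the default '%.18e' format.
    np.savetxt(paths['default_csv'], test, delimiter=',')
    # Text output with a shorter format; '%.9g' writes a zero as just "0".
    np.savetxt(paths['short_csv'], test, delimiter=',', fmt='%.9g')
    # Raw binary: about test.nbytes plus a small header.
    np.save(paths['binary_npy'], test)
    # Compressed archive; the keyword name becomes the key inside the file.
    np.savez_compressed(paths['compressed_npz'], test=test)

    sizes = {name: os.path.getsize(path) for name, path in paths.items()}
    # The binary format also preserves the dtype, unlike CSV.
    restored = np.load(paths['binary_npy'])

for name, size in sizes.items():
    print(name, size)
print(restored.dtype)
```

For an all-zeros array the compressed archive will be tiny; real data usually compresses far less, so the .npy size (close to test.nbytes) is the more representative figure.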