If I have a large data file compressed with gzip, say dat.gz, which is more memory-efficient:
mydat <- fread("gunzip -c dat.gz")
or first uncompressing the file to, say, dat, and then doing
mydat <- fread("dat")
I'm concerned about memory rather than speed, since I want to avoid R crashing.
I wrote a 5000x5000 matrix to temp.csv, gzipped it, and profiled the memory usage of the two approaches with profvis.
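The setup was along these lines (a minimal sketch; the matrix contents are arbitrary, and rnorm is used here only as a placeholder):

library(data.table)
library(profvis)

# Build a 5000 x 5000 numeric matrix (~200 MB in memory), write it to CSV,
# then compress it so that temp.csv.gz exists for both tests
mat <- matrix(rnorm(5000 * 5000), nrow = 5000)
fwrite(as.data.table(mat), "temp.csv")
system("gzip -f temp.csv")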
Decompressing to disk first, then reading the plain file:

profvis({system("gunzip -c temp.csv.gz > temp.csv"); mat <- fread("temp.csv")})

Memory usage: 190.9 MB
Letting fread read through gunzip directly:

profvis({fread("gunzip -c temp.csv.gz")})

Memory usage: 190.8 MB
I ran both versions several times, and the memory usage fluctuated between 190 and 191 MB in each case, so I conclude that the two approaches use essentially the same amount of memory. (Note that profvis tracks allocations made by R itself, so this measures fread's memory use, not that of the external gunzip process.)