 

fread with gunzip: which way is more memory efficient?

If I have a large data file compressed with gzip, say dat.gz, which approach is more memory efficient: reading it through gunzip,

mydat <- fread("gunzip -c dat.gz")

or first decompressing the file to, say, dat, and then running

mydat <- fread("dat")

I'm concerned with memory rather than speed, because I want to keep R from crashing.

asked Dec 03 '25 at 14:12 by ved


1 Answer

I wrote a 5000x5000 matrix to temp.csv, gzipped it to temp.csv.gz, and profiled the memory usage of the two approaches with profvis:

profvis({system("gunzip -c temp.csv.gz > temp.csv"); mat <- fread("temp.csv")})

Memory usage: 190.9 MB

profvis({fread("gunzip -c temp.csv.gz")})

Memory usage: 190.8 MB

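I ran each command several times, and the reported memory usage fluctuated between 190 and 191 MB for both. So I conclude that the two approaches use essentially the same amount of memory. This is consistent with how fread handles a shell command: it redirects the command's output to a temporary file and then reads that file, so both approaches end up parsing an uncompressed copy on disk.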
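To repeat the comparison without profvis, here is a minimal sketch that records peak R heap usage with gc(reset = TRUE) and checks that both approaches return the same table. It assumes a data.table version that has the cmd argument and a Unix-like system with gunzip on the PATH; the file names and the helper with_peak_mb are hypothetical, and the matrix is shrunk to 500x500 so the sketch runs quickly. Like profvis, gc() only sees memory allocated by R, not memory used by the separate gunzip process.

```r
library(data.table)

# Hypothetical file names in a temporary directory.
csv_path <- file.path(tempdir(), "temp.csv")
gz_path  <- file.path(tempdir(), "temp.csv.gz")

# Build a small test file (500 x 500 instead of 5000 x 5000 so this runs
# quickly) and write it gzip-compressed through a base R gzfile connection.
set.seed(1)
mat <- matrix(runif(500 * 500), nrow = 500)
con <- gzfile(gz_path, "w")
write.csv(mat, con, row.names = FALSE)
close(con)

# Hypothetical helper: evaluate `expr` and report the peak R heap usage (Mb)
# reached while it ran.
with_peak_mb <- function(expr) {
  gc(reset = TRUE)   # reset the "max used" statistics to current usage
  value <- expr      # lazy argument: the read is evaluated here
  g <- gc()
  # The last column of gc()'s result is "max used (Mb)"; sum Ncells + Vcells.
  list(value = value, peak_mb = sum(g[, ncol(g)]))
}

# Approach 1: decompress to disk first, then read the plain CSV.
a <- with_peak_mb({
  system2("gunzip", c("-c", shQuote(gz_path)), stdout = csv_path)
  fread(csv_path)
})

# Approach 2: let fread run the decompression command itself.
b <- with_peak_mb(fread(cmd = paste("gunzip -c", shQuote(gz_path))))

cat("decompress first: ", a$peak_mb, "Mb\n")
cat("fread(cmd = ...): ", b$peak_mb, "Mb\n")

# Both approaches should produce the same table.
stopifnot(isTRUE(all.equal(a$value, b$value)))
```

I would expect the two peak figures to be close, in line with the profvis measurements; if they are, the choice between the approaches can be made on convenience rather than memory. Recent data.table versions can also read the compressed file directly with fread("dat.gz") when the R.utils package is installed; that variant is not measured here.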

answered Dec 06 '25 at 06:12 by thc


