I'm reading a file into R with fread, using the two calls below:

    fread("file:///C:/Users/Desktop/ads.csv")
    fread("C:/Users/Desktop/ads.csv")       # Just omitted "file:///"

I've observed the runtimes to be very different:

    microbenchmark(
      fread("file:///C:/Users/Desktop/ads.csv"),
      fread("C:/Users/Desktop/ads.csv")
    )

    Unit: microseconds
                                          expr      min        lq      mean    median       uq       max neval cld
     fread("file:///C:/Users/Desktop/ads.csv") 5755.975 6027.4735 6696.7807 6235.3365 6506.652 41257.476   100   b
             fread("C:/Users/Desktop/ads.csv")  525.492  584.0215  673.7166  647.4745  727.703  1476.191   100   a

Why does the runtime vary so much? There isn't a noticeable difference between the two variants when I use read.csv(), though.
For files beyond 100 MB in size, fread() and read_csv() can be expected to be around 5 times faster than read.csv().
The data.table package is extremely useful and easy to use. Its fread() function is meant to import data from regular delimited files directly into R, without any detours or nonsense. Note that "regular" in this case means that every row of your data needs to have the same number of columns.
The data.table package comes with a function called fread, which is a very efficient and fast function for reading data from files. It is similar to read.table but faster and more convenient.
The following has been added to ?fread:
When input begins with http://, https://, ftp://, ftps://, or file://, fread detects this and downloads the target to a temporary file (at tempfile()) before proceeding to read the file as usual. Secure URLs (ftps:// and https://) are downloaded with curl::curl_download; ftp:// and http:// paths are downloaded with download.file and method set to getOption("download.file.method"), defaulting to "auto"; and file:// is downloaded with download.file with method="internal". NB: this implies that for file://, even files found on the current machine will be "downloaded" (i.e., hard-copied) to a temporary file. See ?download.file for more details.
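If a path sometimes arrives with a file:// prefix, one way to avoid the extra copy is to strip the prefix before passing the path to fread. The helper below is a minimal base-R sketch; strip_file_scheme is a hypothetical name, not part of data.table, and it assumes the URL is not percent-encoded.

```r
# Hypothetical helper: turn a file:// URL into a plain local path.
# Assumes no percent-encoding (e.g. "%20") in the URL.
strip_file_scheme <- function(x) {
  # Windows drive letters: "file:///C:/dir/f.csv" -> "C:/dir/f.csv"
  x <- sub("^file:///(?=[A-Za-z]:)", "", x, perl = TRUE)
  # Unix-style paths: "file:///home/u/f.csv" -> "/home/u/f.csv"
  sub("^file://", "", x)
}

win_path  <- strip_file_scheme("file:///C:/Users/Desktop/ads.csv")
unix_path <- strip_file_scheme("file:///home/user/ads.csv")
plain     <- strip_file_scheme("C:/Users/Desktop/ads.csv")  # left unchanged
```

The result can then be passed to fread as an ordinary path, which skips the download step described above.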
From the source of fread:
    if (str6 == "ftp://" || str7 == "http://" || str7 == "file://") {
      method = if (str7 == "file://") "auto"
               else getOption("download.file.method", default = "auto")
      download.file(input, tmpFile, method = method, mode = "wb", quiet = !showProgress)
    }

That is, your file is being "downloaded" to a temporary file, which should consist of deep-copying the contents of the file to a temporary location. file:// is not really intended for use on local files, but on files in a network that need to be downloaded locally before being read (IIUC; FWIW, this is what fread's testing regime uses to imitate file download while testing on CRAN, where external file download is impossible).
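To see the copy step in isolation, the sketch below imitates it with base R only. It is not fread's actual code path: the sample data, the variable names, and the way the file:// URL is built are assumptions made for illustration.

```r
# Write a small CSV so the sketch is self-contained (made-up data).
src <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(0.5, 1.5, 2.5)), src, row.names = FALSE)

# Build a file:// URL for that file (Windows paths need a third slash).
path <- normalizePath(src, winslash = "/")
url <- if (.Platform$OS.type == "windows") {
  paste0("file:///", path)
} else {
  paste0("file://", path)
}

# Imitate the step taken for file:// input: copy to a temporary file first.
tmp <- tempfile()
status <- download.file(url, tmp, method = "auto", mode = "wb", quiet = TRUE)

# The temporary file is a separate copy with the same contents.
same_content <- identical(readLines(src), readLines(tmp))
```

Any reader called on tmp afterwards pays for this copy on top of its own parsing time.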
I also notice that your timings are on the order of microseconds, which could explain the discrepancy vs. read.csv. Imagine read.csv takes 1 second to read the file, while fread takes 0.01 seconds; file copying takes 0.05 seconds. Then read.csv will look about the same in both cases (1 vs. 1.05 seconds), while fread looks substantially slower in the file:// case (0.01 vs. 0.06 seconds).
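The same arithmetic can be written out in R. The numbers are the illustrative ones from the paragraph above, not measurements.

```r
# Illustrative timings in seconds (assumed, not measured).
copy_time  <- 0.05
read_times <- c(read.csv = 1, fread = 0.01)

# Total time when the file is first copied to a temporary location.
with_copy <- read_times + copy_time

# Relative slowdown caused by the copy: small for read.csv, large for fread.
slowdown <- with_copy / read_times
print(round(slowdown, 2))
```

A fixed copy cost barely registers against a slow reader but dominates a fast one, which matches the pattern in the benchmark.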