I have a huge .csv file (~1.4 GB), and reading it with read.csv takes a long time. The file contains several variables, and all I want is to extract the rows for a few values in one particular column.
For example, suppose ABC.csv is my file and it looks something like this:
ABC.csv
Date Variables Val
2017-11-01 X 23
2017-11-01 A 2
2017-11-01 B 0.5
............................
2017-11-02 X 20
2017-11-02 C 40
............................
2017-11-03 D 33
2017-11-03 X 22
............................
............................
So here the variable of interest is X, and while reading this file I want the Variables column to be scanned so that only the rows containing the string X are read. My new data frame would then look something like this:
> df
Date Variables Val
2017-11-01 X 23
2017-11-02 X 20
.........................
.........................
Any help will be appreciated. Thank you in advance.
Check out the LaF package; it lets you read very large text files in blocks, so you don't have to load the entire file into memory.
library(LaF)
data_model <- detect_dm_csv("yourFile.csv", header = TRUE)  # detect the file structure (column names and types)
dat <- laf_open(data_model)                                 # open a connection to the file
block_list <- lapply(seq(1, 100000, 1000), function(row_num) {  # adjust 100000 to cover the number of rows in your file
  goto(dat, row_num)                                        # jump to the start of the next block
  data_block <- next_block(dat, nrows = 1000)               # read a block of 1000 rows
  data_block <- data_block[data_block$Variables == "X", ]   # keep only the rows of interest
  return(data_block)
})
your_df <- do.call("rbind", block_list)
Admittedly, the package sometimes feels a bit bulky, and in some situations I had to find small hacks to get my results (you might have to adapt my solution to your data). Nevertheless, I found it an immensely useful solution for dealing with files that exceeded my RAM.
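If hard-coding the block positions feels fragile, here is a minimal sketch of one possible adaptation (assuming the ABC.csv layout from the question, with a header row) that keeps calling next_block() until the file is exhausted, so you never need to know the number of rows in advance:
library(LaF)
data_model <- detect_dm_csv("ABC.csv", header = TRUE)   # detect column names and types
dat <- laf_open(data_model)                             # open a connection to the file
chunks <- list()
repeat {
  block <- next_block(dat, nrows = 10000)               # read the next block of 10000 rows
  if (nrow(block) == 0) break                           # stop when no rows are left
  chunks[[length(chunks) + 1]] <- block[block$Variables == "X", ]
}
close(dat)
your_df <- do.call(rbind, chunks)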
Just wondering if doing this works. It worked for my code, but I am not sure whether it first reads in the entire data and then subsets, or whether it only reads the part of the file where Variables == 'X'.
library(data.table)
temp <- fread('dat.csv')[Variables == 'X']
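For what it's worth, fread('dat.csv') does read the whole file into memory first; the [Variables == 'X'] part subsets the resulting data.table afterwards. If you want to avoid parsing the unwanted rows at all, one hedged alternative (assuming a Unix-like system where grep is available, a comma-separated file, and that ",X," can only ever match the Variables column) is to let fread read from a shell command:
library(data.table)
# Pre-filter the rows with grep so fread only parses lines containing ",X,".
# grep drops the header line, so the column names are supplied manually.
temp <- fread(cmd = "grep ',X,' dat.csv",
              col.names = c("Date", "Variables", "Val"))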