
R data.table problem when reading a file with an inconsistent number of columns

When I use fread() from R's data.table package to read a 3 GB .dat file, I get the following warning:

Stopped early on line 3169933. Expected 136 fields but found 138. Consider fill=TRUE and comment.char=. First discarded non-empty line:


My code:

library(data.table)
file_path = 'data.dat' # 3GB
fread(file_path,fill=TRUE)

My file has about 5 million rows, and the number of columns changes partway through. In detail:

  • From row 1 to row 3169933 it has 136 columns
  • From row 3169933 to row 5000000 it has 138 columns
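
One way to confirm a layout like the one listed above, before involving fread() at all, is to count the fields on every line with base R's count.fields(). The sketch below runs on a tiny temporary file; the comma separator and the toy contents are assumptions made for illustration, not details of the real data.dat.

```r
# Toy stand-in for the real file: two rows with 3 fields, then two with 5.
# The comma separator is an assumption; use the real file's delimiter.
tmp <- tempfile(fileext = ".dat")
writeLines(c("1,2,3", "4,5,6", "7,8,9,10,11", "12,13,14,15,16"), tmp)

# Number of fields on every line (quote and comment handling switched off).
n_fields <- count.fields(tmp, sep = ",", quote = "", comment.char = "")

# Run-length encoding summarises the layout as
# (number of consecutive lines, fields per line).
runs <- rle(n_fields)
field_layout <- data.frame(lines = runs$lengths, fields = runs$values)
print(field_layout)

# First line whose field count differs from that of line 1.
first_change <- which(n_fields != n_fields[1])[1]
print(first_change)
```

On a multi-gigabyte file this scan is likely to be slow, but it only needs to run once to locate the line where the column count changes.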

Because of this, fread() only reads my file up to row 3169933. Setting fill=TRUE did not help in this case. Could anyone help me?

R version: 3.6.3; data.table version: 1.13.2

Note about fill=TRUE in this case:

[Case 1, not my case] If the first part of the file (50% of the rows) has 138 columns and the second part has 136 columns, then fill=TRUE helps: it fills the two missing columns in the second part with NA.

[Case 2, my case] If the first part of the file (50% of the rows) has 136 columns and the second part has 138 columns, then fill=TRUE does not help.

duy ngọc asked Mar 01 '26 06:03


1 Answer

I'm not sure why the problem persists even with fill=TRUE. If nothing else helps, you could try something like this:

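dt1 <- tryCatch(
  fread(file_path),
  warning = function(w) {
    msg <- conditionMessage(w)
    cat('Warning: ', msg, '\n\n')
    # Pull the line number out of "Stopped early on line N. ..."
    n_line <- as.numeric(gsub('Stopped early on line (\\d+)\\..*', '\\1', msg))
    if (is.na(n_line)) return(NULL)  # a different warning: nothing to recover here
    cat('Found ', n_line, '\n')
    # Part 1: fread() stops at line n_line again, so this ends just before it
    dt1_part1 <- fread(file_path, nrows = n_line)
    # Part 2: start at line n_line itself (skip = n_line would drop that row)
    dt1_part2 <- fread(file_path, skip = n_line - 1, header = FALSE)
    # Columns are matched by name, so this assumes the file has no header row
    # (both parts are then named V1, V2, ...); otherwise rename part 2 first
    rbind(dt1_part1, dt1_part2, fill = TRUE)
  },
  finally = cat("\nFinished. \n")
)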

The tryCatch() construct catches the warning, so you can extract the line number from its message and then read the two parts of the file separately.
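
The line-number extraction can be tried in isolation, without data.table or the real file. The following is a minimal sketch in base R: the warning text is copied from the question (with the discarded line omitted), the warning itself is simulated with warning(), and parse_stop_line() is a hypothetical helper name introduced only for this example.

```r
# Warning text as quoted in the question (discarded line omitted).
msg <- paste(
  "Stopped early on line 3169933. Expected 136 fields but found 138.",
  "Consider fill=TRUE and comment.char=. First discarded non-empty line:"
)

# Hypothetical helper: returns the line number from a "Stopped early" message,
# or NA if the message has a different shape.
parse_stop_line <- function(message_text) {
  m <- regmatches(
    message_text,
    regexec("Stopped early on line ([0-9]+)\\.", message_text)
  )[[1]]
  if (length(m) < 2) NA_integer_ else as.integer(m[2])
}

# Simulate the warning and recover the line number inside the handler.
stop_line <- tryCatch(
  {
    warning(msg)
    NA_integer_  # not reached: the warning transfers control to the handler
  },
  warning = function(w) parse_stop_line(conditionMessage(w))
)
print(stop_line)
```

Returning NA for unrecognised messages lets the caller tell this specific warning apart from any other warning that fread() might raise.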

Vasily A answered Mar 02 '26 21:03