Readtable() with differing number of columns - Julia

Question

I'm trying to read a CSV file into a DataFrame using readtable(). The CSV file has an unfortunate quirk: if the last x columns of a given row are blank, the line simply ends instead of containing the corresponding number of commas. For example, I can have:

Col1,Col2,Col3,Col4
item1,item2,,item4
item5

Notice that the third line has only one entry. Ideally, I would like readtable() to fill the values for Col2, Col3, and Col4 with NA. However, because the trailing commas (and therefore the empty strings) are missing, readtable() simply sees a row whose field count doesn't match the number of columns. If I run readtable() in Julia on the sample CSV above, I get the error "Saw 2 Rows, 2 columns, and 5 fields, * Line 1 has 6 columns". If I add 3 commas after item5, it works.

Is there any way around this, or do I have to fix the CSV file?

Dan Getz · Accepted Answer

If the CSV parsing doesn't need much quote logic (no quoted fields containing commas or newlines), it is easy to write a special-purpose parser that handles the missing columns. Like so:

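using DataFrames, DataStructures  # OrderedDict comes from DataStructures

# s is an open IO stream; fields are split on ',' with no quote handling
function bespokeread(s)
  headers = split(strip(readline(s)),',')
  ncols = length(headers)
  data = [String[] for i=1:ncols]   # one String vector per column
  while !eof(s)
    newline = split(strip(readline(s)),',')
    # pad a row that ends early with empty strings up to ncols fields
    length(newline)<ncols && append!(newline,["" for i=1:ncols-length(newline)])
    # any fields beyond ncols are ignored
    for i=1:ncols
      push!(data[i],newline[i])
    end
  end
  # OrderedDict keeps the columns in header order
  return DataFrame(;OrderedDict(Symbol(headers[i])=>data[i] for i=1:ncols)...)
end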

Then the file:

Col1,Col2,Col3,Col4
item1,item2,,item4
item5

Would give the following, where f is an open IO stream on that file:

julia> df = bespokeread(f)
2×4 DataFrames.DataFrame
│ Row │ Col1    │ Col2    │ Col3 │ Col4    │
├─────┼─────────┼─────────┼──────┼─────────┤
│ 1   │ "item1" │ "item2" │ ""   │ "item4" │
│ 2   │ "item5" │ ""      │ ""   │ ""      │
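
The result above holds empty strings rather than the NA values the question asks for. What follows is a minimal sketch of the same padding idea that stores `missing` (the Julia 1.x successor to NA) instead. It assumes only Base Julia, makes the same no-quoting assumption as the parser above, and uses a hypothetical helper name, `padrow`, that is not part of any library. It returns plain vectors rather than a DataFrame so that it has no package dependencies:

```julia
# Hypothetical helper: split one CSV line on ',' and pad it to `ncols` fields.
# Fields that are empty or absent become `missing`. No quote handling.
function padrow(line::AbstractString, ncols::Integer)
    fields = split(strip(line), ',')
    # Start with every column missing, then fill in the fields that are present.
    row = Vector{Union{Missing,String}}(missing, ncols)
    for i in 1:min(ncols, length(fields))
        isempty(fields[i]) || (row[i] = String(fields[i]))
    end
    return row
end

# The two data lines from the sample file, padded to the four header columns.
rows = [padrow(l, 4) for l in ["item1,item2,,item4", "item5"]]
```

Here `rows[2]` holds "item5" followed by three `missing` values, and the blank Col3 entry in `rows[1]` is also `missing`. Whether a blank field should be treated the same as an absent one is a design choice; this sketch treats them alike.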

Tags: dataframe, csv, julia

Brandon Edwards
