Processing TSV Files in Lua

I have a very large TSV file. The first line contains the headers. Each following line contains data fields separated by tabs; a blank field shows up as two consecutive tabs. Non-blank fields contain alphanumerics, sometimes with punctuation marks.

For example:

Field1<tab>Field2<tab>FieldN<newline>

The fields may contain spaces, punctuation or alphanumerics. The only things that always hold are:

  1. each field is followed by a tab except the last one
  2. the last field is followed by a newline
  3. blank fields are empty but, like all other fields, are still followed by a tab, so they show up as two consecutive tabs

I've tried many combinations of pattern matching in Lua and never get it quite right. Typically the fields with punctuation (time and date fields) are the ones that trip me up.

I need the blank fields (the ones with double-tab) preserved so that the rest of the fields are always at the same index value.

Thanks in advance!

Argh Tastic asked Mar 20 '26 15:03


1 Answer

Try the code below:

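-- Print every tab-separated field of s, including empty ones.
function test(s)
    local n=0
    s=s..'\t'                       -- terminate the last field with a tab too
    for w in s:gmatch("(.-)\t") do  -- shortest run of characters up to each tab
        n=n+1
        print(n,"["..w.."]")
    end
end

test("10\t20\t30\t\t50")            -- field 4 is empty
test("100\t200\t300\t\t500\t")      -- the trailing tab yields an empty 6th field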

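It adds a tab to the end of the string so that all fields are followed by a tab, even the last one. The pattern "(.-)\t" then captures everything up to each tab, so an empty field between two consecutive tabs comes out as an empty string and the later fields keep their index.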
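If you need the fields in a table rather than printed, the same pattern can be wrapped in a function. What follows is only a sketch: `split_tsv` and `each_record` are names made up for illustration, and it assumes one record per line with no tabs or newlines embedded inside a field. A trailing carriage return is stripped in case the file has Windows line endings, which is also an assumption about your data.

```lua
-- Split one TSV line into a table of fields, keeping empty fields
-- so that every column stays at the same index.
function split_tsv(line)
    local fields = {}
    -- Append a tab so the last field is terminated like all the others.
    for field in (line .. "\t"):gmatch("(.-)\t") do
        fields[#fields + 1] = field
    end
    return fields
end

-- Hypothetical helper: read the file at path one line at a time and
-- call fn with a table that maps each header name to the field value.
function each_record(path, fn)
    local headers
    for line in io.lines(path) do
        line = line:gsub("\r$", "")   -- drop a trailing CR, if any
        local fields = split_tsv(line)
        if not headers then
            headers = fields          -- the first line holds the headers
        else
            local record = {}
            for i, name in ipairs(headers) do
                record[name] = fields[i]
            end
            fn(record)
        end
    end
end
```

With this, `split_tsv("10\t20\t30\t\t50")` returns a table of five strings whose fourth entry is the empty string. Because `io.lines` reads one line at a time, the whole file never has to be held in memory, which should matter for a very large input.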

lhf answered Mar 23 '26 03:03