I am using awk to add a column to a CSV file. Each line of the file starts with a date & time. I want the new column to be that time as a Unix epoch timestamp.
I'm feeding the first three fields to 'date' which does give me the timestamp, but there are some stray commas "," that I don't want. Below is a simplified version printing just that one timestamp column as output. Input file:
08/17/2020 21:46:04 -700 , 1 , 2 , 3
08/17/2020 21:47:03 -700 , 1 , 2 , 3
08/17/2020 21:48:03 -700 , 1 , 2 
08/17/2020 21:49:04 -700 , 1 , 2 
Here is my program:
cat input.csv | awk '{
       tmp=$(system("date +%s -d \""$1" "$2" "$3"\""));
       printf("%s", $tmp );
       }'
and here is the output. This is what I expect, except for the leading commas on lines 2 and 3. I think the 'tmp' variable gets the result from 'date' including a trailing newline, but also sometimes with a comma after the newline. It seems to depend on how many fields in the input file follow the first three, but I only reference the first three, as $1 $2 $3 so the rest of the input line plays no role (?) Why do those comma show up in the output, and how could I remove them? Is there a better way to do this?
1597725964
,1597726023
,1597726083
1597726144
Try:
$ while read -r line; do date +%s -d "${line%%,*}"; done < input.csv
1597725964
1597726023
1597726083
1597726144
while read -r line; do starts a while loop and reads a line from stdin.
"${line%%,*}" strips the commas and everything after them from the line.
date +%s -d "${line%%,*}" prints the date as epoch.
done completes the while loop.
<input.csv provides the stdin to the loop.
This prints the full line and adds the epoch as the final column:
$ while read line; do printf "%s, %s\n" "$line" $(date +%s -d "${line%%,*}"); done < input.csv
08/17/2020 21:46:04 -700 , 1 , 2 , 3, 1597725964
08/17/2020 21:47:03 -700 , 1 , 2 , 3, 1597726023
08/17/2020 21:48:03 -700 , 1 , 2, 1597726083
08/17/2020 21:49:04 -700 , 1 , 2, 1597726144
In awk you can use a coprocess with getline instead of system():
< input.csv awk -F' , ' '{
    "date +%s -d \047"$1"\047\n" | getline date
    print date
}'
1597725964
1597726023
1597726083
1597726144
With the help of Inian and oguz ismail in comments, and gawk, we came up with a better solution, which writes into date's stdin, instead of passing the arguments via command line to it. That's better because interpolating variables into a command line always comes with the risk of shell command injection (via input.csv).
< input.csv gawk -F' , ' '{
    cmd = "date +%s -f-";
    print $1 |& cmd;
    close(cmd, "to");
    if ((cmd |& getline line) > 0)
        print line; close(cmd)
}'
1597725964
1597726023
1597726083
1597726144
Thanks to both!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With