I'm looking for a relatively simple method for truncating CSV header names to a given maximum length. For example, a file like:
one,two,three,four,five,six,seven
data,more data,words,,,data,the end
could have all header names limited to a maximum of 3 characters and become:
one,two,thr,fou,fiv,six,sev
data,more data,words,,,data,the end
Requirements:
I tried a few things with awk and sed, but am not proficient at either. The closest I found was this snippet:
csvcut -c 3 file.csv |
sed -r 's/^"|"$//g' |
awk -F';' -vOFS=';' '{ for (i=1; i<=NF; ++i) $i = substr($i, 0, 2) } { printf("\"%s\"\n", $0) }' >tmp-3rd
But it focuses on columns, and using csvcut also feels more complicated than necessary.
Any help is appreciated.
With GNU sed:
sed -E '1s/([^,]{1,3})[^,]*/\1/g' file
Output:
one,two,thr,fou,fiv,six,sev
data,more data,words,,,data,the end
See: man sed and The Stack Overflow Regular Expressions FAQ
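If the maximum length needs to be configurable, the same substitution can be wrapped in a small shell function. This is only a sketch: the function name truncate_header_sed is made up for illustration, and it assumes that no header contains a quoted comma. The pattern captures up to max non-comma characters, matches the rest of the field, and replaces the whole field with just the captured part.

```shell
# Hypothetical helper (name is illustrative): truncate_header_sed MAX [FILE...]
# Assumes plain comma-separated headers with no quoted commas.
truncate_header_sed() {
    max=$1
    shift
    # Double quotes let the shell expand $max inside the sed script;
    # "1" restricts the substitution to the first (header) line.
    sed -E "1s/([^,]{1,$max})[^,]*/\\1/g" "$@"
}
```

Usage would be something like: truncate_header_sed 3 file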
With your shown samples, please try the following awk program. In short: it sets both the field separator and the output field separator to a comma. For the first line, it prints only the first 3 characters of each field, followed by the output separator or, after the last field, a newline. All remaining lines are printed as they are.
awk '
BEGIN { FS=OFS="," }
FNR==1{
  for(i=1; i<=NF; i++){
    printf("%s%s",substr($i, 1, 3),(i==NF?ORS:OFS))
  }
  next
}
1
' Input_file
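The length does not have to be hard-coded. A minimal sketch, assuming the limit is passed in as an awk variable (the function name truncate_header_awk and the variable max are illustrative, not part of the answer above), and again assuming headers without quoted commas:

```shell
# Hypothetical helper (name is illustrative): truncate_header_awk MAX [FILE...]
# Assumes plain comma-separated headers with no quoted commas.
truncate_header_awk() {
    max=$1
    shift
    awk -v max="$max" '
    BEGIN { FS = OFS = "," }
    FNR == 1 {
        # Assigning to a field rebuilds the line using OFS
        for (i = 1; i <= NF; i++) $i = substr($i, 1, max)
    }
    { print }   # print the (possibly modified) line
    ' "$@"
}
```

Usage would be something like: truncate_header_awk 3 Input_file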