In the context of the bash shell and command output:
I'm mostly after a way to do this in bash shell/script, but I'm not averse to a programming language approach.
Sample Worst Case Data:
Name                   value 1    empty_col   simpleHeader   complex multi-header
foo                    bar                    -someVal1      1someOtherVal
monty python circus                           -someVal2      2someOtherVal
exactly the field_widthNextVal                -someVal3      3someOtherVal
My current approach: The best I have come up with is redirecting the output to a file, then using a ruler/index type of feature in the editor to manually work out field widths. I'm hoping there is a smarter/faster way...
What I'm thinking:
With Headers:
Perhaps an approach that measures from the first character 'to the next character that is encountered, after having already encountered multiple spaces'?
Without Headers:
Drawing a bit of a blank on this one....?
This strikes me as the kind of problem that was cracked about 40 years ago though, so I'm guessing there are better solutions than mine to this stuff...
Column Widths
fieldwidths=$(head -n 1 file | grep -Po '\S+\s*' | awk '{printf "%d ", length($0)}')
This is proving to be helpful for determining column widths. I don't fully understand how it works yet, so I can't give a complete explanation, but it might be helpful to someone else in the future. Source: https://unix.stackexchange.com/questions/465170/parse-output-with-dynamic-col-widths-and-empty-fields
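As a rough sketch of what each stage seems to contribute (as I currently understand it):
# head -n 1 file                      -> keep just the header line
# grep -Po '\S+\s*'                   -> print each header word together with the spaces that follow it, one match per line
# awk '{printf "%d ", length($0)}'    -> print the length of each of those chunks, i.e. that column's width
head -n 1 file | grep -Po '\S+\s*' | awk '{printf "%d ", length($0)}'
Note that a header containing spaces (like 'value 1' or 'complex multi-header' in the sample above) gets split into separate chunks, so the measured widths may need manual adjustment for that kind of header.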
File Examination
Redirect output to a file:
command > file.data
Use hexdump or xxd against file.data to look at its raw information. See links for some basics on those tools:
hexdump output vs xxd output
https://nwsmith.blogspot.com/2012/07/hexdump-and-xxd-output-compared.html?m=1
hexdump
https://man7.org/linux/man-pages/man1/hexdump.1.html
https://linoxide.com/linux-how-to/linux-hexdump-command-examples/
https://www.geeksforgeeks.org/hexdump-command-in-linux-with-examples/
xxd
https://linux.die.net/man/1/xxd
https://www.howtoforge.com/linux-xxd-command/
tl;dr:
# Determine Column Widths
# Source for this voodoo:
# https://unix.stackexchange.com/a/465178/266125
fieldwidths=$(echo "$(appropriate-command)" | head -n 1 | grep -Po '\S+\s*' | awk '{printf "%d ", length($0)}' | sed 's/^[ ]*//;s/[ ]*$//')
# Iterate
while IFS= read -r line
do
# You can put the awk command on a separate line if that is clearer to you
awkcmd="BEGIN {FIELDWIDTHS=\"$fieldwidths\"}{print \$1}"
field1="$(echo "$line" | awk "$awkcmd" | sed 's/^[ ]*//;s/[ ]*$//')"
# Or do it all in one line if you prefer:
field2="$(echo "$line" | awk "BEGIN {FIELDWIDTHS=\"$fieldwidths\"}{print \$2}" | sed 's/^[ ]*//;s/[ ]*$//')"
*** Code Stuff Here ***
done <<< $(appropriate-command)
Some explanation of the above - for newbies (like me)
Okay, so I'm a complete newbie, but this is my answer, based on a grand total of about two days of clawing around in the dark. This answer is relevant to those who are also new and trying to process data in the bash shell and bash scripts.
Unlike the *nix wizards and warlocks that have presented many of the solutions you will find to specific problems (some impressively complex), this is just a simple outline to help people understand what it is that they probably don't know that they don't know. You will have to go and look this stuff up separately; it's way too big to cover it all here.
I would strongly suggest just buying a book/video/course for shell scripting. You do learn a lot doing it the school-of-hard-knocks way, as I have for the last couple of days, but it's proving to be painfully slow. The devil is very much in the details with this stuff. A good structured course probably instils good habits from the get-go too, rather than letting you develop your own habits/shorthand that 'seems to work' but will likely, and unwittingly, bite you later on.
Bash references:
https://linux.die.net/man/1/bash
https://tldp.org/LDP/Bash-Beginners-Guide
https://www.gnu.org/software/bash/manual/html_node
Common Bash Mistakes, Traps and Pitfalls:
https://mywiki.wooledge.org/BashPitfalls
http://www.softpanorama.org/Scripting/Shellorama/Bash_debugging/typical_mistakes_in_bash_scripts.shtml
https://wiki.bash-hackers.org/scripting/newbie_traps
My take is that there is no 'one right way that works for everything' to achieve this particular task of processing fixed-width command output. Notably, the fixed widths are dynamic and might change each time the command is run. It can be done somewhat haphazardly using standard bash tools (it depends on the types of values in each field, particularly if they contain whitespace or unusual/control characters). That said, expect any fringe cases to trip up the 'one bash pipeline to parse them all' approach, unless you have really looked at your data and it's quite well sanitised.
Pre-reqs:
To get much out of all this, understand the basics of:
How IFS= read -r line (and its variants) work; it's one way of processing multiple lines of data, one line at a time. When doing this, you need to be aware of how things are expanded by the shell, which can differ from what you would expect amongst the hieroglyphics.
Redirection with > (overwrites without prompting) and >> (which appends to any existing data).
How if [ cond ] is not necessarily the same as if [[ cond ]].
How bash -x script.sh is useful for debugging. Targeted debugging of specific lines is done by putting set -x before the lines of code to debug and set +x after them, within the script (a short sketch follows below).
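A minimal sketch tying a couple of those together (file.data is just a placeholder file name):
while IFS= read -r line
do
    set -x                        # trace only this block
    printf '%s\n' "$line"         # quoting keeps the line's whitespace intact
    set +x                        # stop tracing again
done < file.data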
As for the fixed width data:
If it's delimited:
Use the delimiter. Most *nix tools use a single white space as a default delimiter, but you can typically also set a specific delimiter (google how to do it for the specific tool).
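For instance, if the output happened to use commas (a hypothetical here, not the sample data above), you could hand the tool that delimiter directly:
# cut: -d sets the delimiter, -f picks the field number
cut -d ',' -f 2 file.data
# awk: -F sets the field separator
awk -F ',' '{print $2}' file.data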
Optional Step:
If there is no obvious delimiter, you can check to see if there is some secret hidden delimiter to take advantage of. There probably isn't, but you can feel good about yourself for checking. This is done by looking at the hex data in the file. Redirect the output of a command to a file (if you don't have the data in a file already) using command > file.data, and then explore file.data using hexdump -Cv file.data (another tool is xxd).
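A sketch of that check, with appropriate-command standing in for whatever produces the output:
appropriate-command > file.data
# -C shows hex and printable characters side by side, -v prints every line without squeezing repeats
hexdump -Cv file.data | head
# or the same idea with xxd:
xxd file.data | head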
If you're stuck with fixed width:
Basically, to do something useful, you need to keep a few things in mind.
Starting Guidelines:
Feed the pipe/script an entire line at a time, then chop up fields (unless you really know what you are doing). Doing the field separation inside any loop such as while IFS= read -r line; do stuff; done is less error prone, in terms of the 'what is my pipe actually seeing' problem. When I did it outside the loop, it tended to produce more scenarios where the data was being modified without me understanding that it was being altered (let alone why) before it even reached the pipe/script. This obviously meant I got extremely confused as to why a pipe that worked in one setting on the command line fell over when I 'fed the same data' in a script or by some other method (the pipe really wasn't getting the same data). This comes back to preserving whitespace with fixed-width data, particularly during expansion, redirection, process substitution and command substitution. Typically it amounts to liberal use of double quotes when calling a variable, i.e. not $someData but "$someData". Use braces to make clear which variable you mean, i.e. ${var}bar rather than $varbar. Do the same when capturing the entire output of a command.
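A small sketch of the difference the quoting makes (someData and var are just illustrative names):
someData='  foo   bar  '
echo $someData        # unquoted: word splitting collapses the spacing, printing 'foo bar'
echo "$someData"      # quoted: the original spacing is preserved
var='foo'
echo "${var}bar"      # braces make it clear the variable is var, not a variable called varbar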
If there is nothing to leverage as a delimiter, you have some choices. Hack away directly at the fixed width data using tools like:
cut -c n1-n2
directly cuts things out, from character position n1 through to n2.
awk '{print $1}'
splits each line on runs of whitespace by default and prints the first field.
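Against the sample data above, that might look something like this (the character positions are guesses, since they depend on the real widths):
# pull out the 'Name' column, assuming it spans characters 1 through 23
cut -c 1-23 file.data
# print the first whitespace-separated word of each line
awk '{print $1}' file.data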
Or, you can try to be a bit more scientific and 'measure twice, cut once'.
fieldwidths=$(head -n 1 file | grep -Po '\S+\s*' | awk '{printf "%d ", length($0)}')
echo $fieldwidths
You can also look at all the data to see what length of data you are seeing in each field, and if you are actually getting the number of fields you expect (Thanks to David C. Rankin for this one!):
awk '{ for (i=1; i<=NF; i++) printf "%d\t",length($i) } {print ""}' file.data
To split on fixed widths directly, there is GNU awk's FIELDWIDTHS, e.g. (the widths are just an example):
awk 'BEGIN {FIELDWIDTHS="10 20 30 10"}{print $1}' file.data
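For example, a self-contained run of that idea (GNU awk only; the widths and data here are made up):
# build a fixed-width line with printf, then pick the second (20-character) field
printf '%-10s%-20s%-10s\n' foo bar baz | awk 'BEGIN {FIELDWIDTHS="10 20 10"} {print $2}'
# prints 'bar' followed by the padding spaces of its field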
For command output with dynamic field widths, if you feed it into a while IFS= read -r line; do; done
loop, you will need to parse the output using the awk above, as each time the field widths might have changed. Since I originally couldn't get the expansion right, I built the awk command on a separate line and stored it in a variable, which I then called in the pipe. Once you have it figured out though, you can just shove it all back into one line if you want:
# Determine Column Widths:
# Source for this voodoo:
# https://unix.stackexchange.com/a/465178/266125
fieldwidths=$(echo "$(appropriate-command)" | head -n 1 | grep -Po '\S+\s*' | awk '{printf "%d ", length($0)}' | sed 's/^[ ]*//;s/[ ]*$//')
# Iterate
while IFS= read -r line
do
# Separate the awk command if you want:
# This uses GNU awk's FIELDWIDTHS to split the line into fixed-width fields, then pipes the result to sed to strip leading and trailing spaces.
awkcmd="BEGIN {FIELDWIDTHS=\"$fieldwidths\"}{print \$1}"
field1="$(echo "$line" | awk "$awkcmd" | sed 's/^[ ]*//;s/[ ]*$//')"
# Or do it all in one line, rather than two:
field2="$(echo "$line" | awk "BEGIN {FIELDWIDTHS=\"$fieldwidths\"}{print \$2}" | sed 's/^[ ]*//;s/[ ]*$//')"
if [ "${DELETIONS[0]}" == 'all' ] && [ "${#DELETIONS[@]}" -eq 1 ] && [ "$field1" != 'UUID' ]; then
*** Code Stuff ***
fi
*** More Code Stuff ***
done <<< $(appropriate-command)
Remove excess whitespace using various approaches:
tr -d '[:blank:]'
and/or
tr -d '[:space:]'
(the latter eliminates newlines and vertical whitespace, not just the horizontal whitespace that [:blank:] covers; both also remove internal whitespace).
sed 's/^[ ]*//;s/[ ]*$//'
cleans up only leading and trailing whitespace.
Now you should basically have clean, separated fields to work with, one at a time, having started from multi-field, multi-line command output.
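Before moving on, a quick sketch of how those trimming approaches behave on a made-up string:
s='  a  b  '
printf '%s\n' "$s" | tr -d '[:blank:]'            # removes all spaces and tabs, leaving 'ab'
printf '%s\n' "$s" | sed 's/^[ ]*//;s/[ ]*$//'    # trims only leading/trailing spaces, leaving 'a  b'
# tr -d '[:space:]' would additionally strip newlines and other vertical whitespace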
Once you get what is going on fairly well with the above, you can start to look into other more elegant approaches as presented in these answers:
Finding Dynamic Field Widths:
https://unix.stackexchange.com/a/465178/266125
Using perl's unpack:
https://unix.stackexchange.com/a/465204/266125
Awk and other good answers:
https://unix.stackexchange.com/questions/352185/awk-fixed-width-columns