I have a text file named file that contains the following:
Australia AU 10
New Zealand NZ 1
...
If I use the following command to extract the country names from the first column:
awk '{print $1}' file
I get the following:
Australia
New
...
Only the first word of each country name is output.
How can I get the entire country name?
Try this:
$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
To complement Raymond Hettinger's helpful POSIX-compliant answer:
It looks like your country-name column is 23 characters wide.
In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:
# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia
New Zealand
Caveat: GNU cut is not UTF-8 aware (its -c option counts bytes rather than characters), so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.
To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:
# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.
However, your Linux distro may come with Mawk rather than GNU Awk; use awk -W version to determine which one it is.
For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:
# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
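To try the POSIX approach without the original data, here is a minimal, self-contained sketch. It assumes a 23-character country-name column (the width guessed above); the file name sample.txt and the helper name first_column are hypothetical and only used for illustration.

```shell
#!/bin/sh
# Build a small fixed-width sample file. The column widths (23 and 4) are
# assumptions for illustration; adjust them to match your real data.
printf '%-23s%-4s%s\n' 'Australia' 'AU' '10' 'New Zealand' 'NZ' '1' > sample.txt

# first_column WIDTH FILE (hypothetical helper):
# print the first WIDTH characters of each line with trailing spaces removed.
first_column() {
  awk -v w="$1" '{ c = substr($0, 1, w); sub(/ +$/, "", c); print c }' "$2"
}

first_column 23 sample.txt
```

With the sample above this prints Australia and New Zealand, each on its own line and without trailing spaces. Passing the width in with -v keeps the awk program itself unchanged if the column turns out to be narrower or wider.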