Extract values from a fixed-width column

Question

I have a text file named file that contains the following:

Australia              AU 10
New Zealand            NZ  1
...

If I use the following command to extract the country names from the first column:

awk '{print $1}' file

I get the following:

Australia
New
...

Only the first word of each country name is output.

How can I get the entire country name?

Raymond Hettinger · Accepted Answer

Try this:

$ awk '{print substr($0,1,15)}' file
Australia
New Zealand

mklement0 · Answer

To complement Raymond Hettinger's helpful POSIX-compliant answer:

It looks like your country-name column is 23 characters wide.

In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:

# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia              
New Zealand

Caveat: GNU cut is not UTF-8 aware, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will not work correctly.

To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:

# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand

FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.
sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.

However, FIELDWIDTHS is a GNU Awk feature, and your Linux distro may come with Mawk instead; use awk -W version to determine which one you have.

For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:

# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
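
If you need more than one column, the same substr-and-trim approach can be wrapped in a small shell function. This is only a sketch: fixed_col is a hypothetical helper name, and the start position and width you pass are assumptions about your file's layout (the sample in the question looks like 23 + 3 + 2 characters per line).

```shell
# fixed_col START WIDTH [FILE...]
# Hypothetical helper: print WIDTH characters starting at the 1-based
# position START of every input line, with trailing spaces removed.
# Uses only POSIX awk features, so it should also work with Mawk.
fixed_col() {
    s=$1
    w=$2
    shift 2
    awk -v s="$s" -v w="$w" '{
        c = substr($0, s, w)   # slice out the fixed-width column
        sub(/ +$/, "", c)      # trim trailing spaces only
        print c
    }' "$@"                    # reads stdin when no FILE is given
}
```

With the layout from the question, fixed_col 1 23 file should print the country names and fixed_col 24 2 file the two-letter codes. Leading spaces are left alone, so a right-aligned column such as the numbers would keep its padding.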
