
Extract values from a fixed-width column

Tags: linux, awk

I have a text file named file that contains the following:

Australia              AU 10
New Zealand            NZ  1
...

If I use the following command to extract the country names from the first column:

awk '{print $1}' file

I get the following:

Australia
New
...

Only the first word of each country name is output.

How can I get the entire country name?

asked Sep 06 '25 03:09 by p. Trinx


2 Answers

Try this:

$ awk '{print substr($0,1,15)}' file
Australia
New Zealand
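
For context, here is a minimal sketch of why `$1` falls short while `substr` does not. By default awk splits each line on runs of blanks, so `New Zealand` becomes two fields, whereas `substr($0, 1, 15)` slices the raw line by character position. The file name `countries.txt` is hypothetical, and the two rows are recreated from the sample in the question.

```shell
# Recreate the question's two sample rows; the name column is padded to 23 characters.
# (countries.txt is a hypothetical file name used only for this sketch.)
printf '%-23s%s %2d\n' 'Australia' 'AU' 10 'New Zealand' 'NZ' 1 > countries.txt

# Default splitting: awk breaks each line on runs of blanks, so the second
# row has four fields and $1 holds only "New".
awk '{ print NF " fields, $1 = " $1 }' countries.txt

# Positional slicing: substr() works on the raw line ($0), so embedded spaces
# survive. The brackets make the remaining padding visible.
awk '{ print "[" substr($0, 1, 15) "]" }' countries.txt
```

The first command should report 3 fields for the Australia row and 4 for the New Zealand row. The second prints each name followed by its padding inside the brackets, which shows that the output of `substr` alone still carries trailing spaces.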

answered Sep 07 '25 19:09 by Raymond Hettinger


To complement Raymond Hettinger's helpful POSIX-compliant answer:

It looks like your country-name column is 23 characters wide.

In the simplest case, if you don't need to trim trailing whitespace, you can just use cut:

# Works, but has trailing whitespace.
$ cut -c 1-23 file
Australia              
New Zealand            

Caveat: GNU cut is not UTF-8 aware; in practice -c counts bytes rather than characters, so if the input is UTF-8-encoded and contains non-ASCII characters, the above will cut at the wrong positions.


To trim trailing whitespace, you can take advantage of GNU awk's nonstandard FIELDWIDTHS variable:

# Trailing whitespace is trimmed.
$ awk -v FIELDWIDTHS=23 '{ sub(" +$", "", $1); print $1 }' file
Australia
New Zealand
  • FIELDWIDTHS=23 declares the first field (reflected in $1) to be 23 characters wide.

  • sub(" +$", "", $1) then removes trailing whitespace from $1 by replacing any nonempty run of spaces (" +") at the end of the field ($1) with the empty string.

However, FIELDWIDTHS is specific to GNU Awk, and your Linux distro may come with mawk instead; run awk -W version to determine which one you have.
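
If GNU Awk is available, the same FIELDWIDTHS idea extends to every column of the sample. The following is only a sketch: the widths 23, 3 and 2 are an assumption read off the two visible rows, and `countries.txt` and `split_columns` are hypothetical names. It calls `gawk` explicitly so that it does not depend on which implementation `awk` points to.

```shell
# Recreate the question's two sample rows (hypothetical file name).
printf '%-23s%s %2d\n' 'Australia' 'AU' 10 'New Zealand' 'NZ' 1 > countries.txt

# split_columns FILE: split each line into three fixed-width fields and trim them.
# Widths 23, 3 and 2 are an assumption based on the two visible sample rows.
split_columns() {
  gawk -v FIELDWIDTHS='23 3 2' -v OFS='|' '{
    # Trim leading and trailing spaces from every field.
    for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
    print $1, $2, $3
  }' "$1"
}

# FIELDWIDTHS is a gawk extension, so only run the demo when gawk is installed.
if command -v gawk >/dev/null 2>&1; then
  split_columns countries.txt
else
  echo 'gawk not found; FIELDWIDTHS is unavailable' >&2
fi
```

With gawk installed, this should print `Australia|AU|10` and `New Zealand|NZ|1`. The `|` separator is chosen only to make the field boundaries visible.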


For a POSIX-compliant solution that trims trailing whitespace, extend Raymond's answer:

# Trailing whitespace is trimmed.
$ awk '{ c=substr($0, 1, 23); sub(" +$", "", c); print c}' file
Australia
New Zealand
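
The POSIX variant can also be wrapped in a small shell function so that the column width is not hard-coded. This is a sketch rather than a definitive implementation: `first_column` and `countries.txt` are hypothetical names, and the pattern `[ \t]+$` is a small extension that also removes trailing tabs.

```shell
# Recreate the question's two sample rows (hypothetical file name).
printf '%-23s%s %2d\n' 'Australia' 'AU' 10 'New Zealand' 'NZ' 1 > countries.txt

# first_column WIDTH FILE: print the first WIDTH characters of each line,
# with trailing spaces and tabs removed.
first_column() {
  awk -v w="$1" '{
    c = substr($0, 1, w)     # slice the fixed-width column from the raw line
    sub(/[ \t]+$/, "", c)    # drop trailing spaces and tabs
    print c
  }' "$2"
}

first_column 23 countries.txt
```

Passing the width through `-v` keeps the awk program itself unchanged when the layout of the file differs from the sample.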

answered Sep 07 '25 21:09 by mklement0