
Awk to read file as a whole

Consider a file with the following content:

abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq

In general, awk processes its input line by line and performs the given action on each line.

For example:

awk '{print substr($0,8,10)}' file

Output:

hijklmn
wxyzabc
klmnopq

I would like to know an approach in which the entire content of the file is treated as a single variable, so that awk prints just one output.

Example of the desired output:

hijklmnpqr

I am not after this specific output. In general, I would appreciate it if anyone could suggest a way to pass the content of a file to awk as a whole.

Ashish K asked Oct 21 '25 04:10



1 Answer

This is a gawk solution

From the docs:

There are times when you might want to treat an entire data file as a single record. The only way to make this happen is to give RS a value that you know doesn’t occur in the input file. This is hard to do in a general way, such that a program always works for arbitrary input files.


$ cat file
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq

RS must be set to a pattern that does not occur in the file. Following Denis Shirokov's suggestion in the docs (thanks @EdMorton):

$ gawk '{print ">>>"$0"<<<<"}' RS='^$' file
>>>abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
<<<<

The docs explain why this works:

It works by setting RS to ^$, a regular expression that will never match if the file has contents. gawk reads data from the file into tmp, attempting to match RS. The match fails after each read, but fails quickly, such that gawk fills tmp with the entire contents of the file


So:

$ gawk '{gsub(/\n/,"");print substr($0,8,10)}' RS='^$' file

Returns:

hijklmnpqr
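
If gawk is not available, a common alternative (a different technique from the RS trick above) is to let awk read line by line, append each line to a variable, and do the work once in the END block. A minimal sketch, assuming any POSIX awk; the variable name whole and the temporary file are illustrative:

```shell
# Throwaway sample file (path comes from mktemp, purely illustrative).
sample=$(mktemp)
printf 'abcdefghijklmn\npqrstuvwxyzabc\ndefghijklmnopq\n' > "$sample"

# Append every line to "whole" (the default RS already strips the newlines),
# then operate on the accumulated string after all input has been read.
result=$(awk '{ whole = whole $0 } END { print substr(whole, 8, 10) }' "$sample")

echo "$result"
rm -f "$sample"
```

To keep the line breaks in the accumulated string, append $0 "\n" instead of $0.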
Juan Diego Godoy Robles answered Oct 23 '25 19:10
