Metaprogramming in awk: converting a file to an HTML table

I have the following files:

table.txt (comma-separated)

1,Example Title
COL1,COL2,COL3,COL4,COL5
BRCC,ACGC,15869,105A,1
BCAS,GAAG,73345,369T,2

template.awk

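# Turn one line of template text into the body of an awk string literal
function parse_print(s){
    s = gensub(/^\s+|\s+$/,"","g",s)          # trim surrounding whitespace
    s = gensub(/[\42]/,"\\\\\042","g",s)      # escape double quotes (\42 is ")
    s = gensub(/\$[0-9]+/,"\" & \"","g",s)    # close the string around $1, $2, ...
    s = gensub(/\$e/,"\" & \"","g",s)         # ... and around the loop field $e
    return s;
}

# Plain text lines (not starting with %) become print statements
/^[^%]/{print "print \"" parse_print($0) "\""; next}
# %BEGIN and %END open the matching special block
/^%BEGIN$|^%END$/{print substr($1,2) "{"; next}
# %ENDBEGIN and %ENDEND close it
/^%END.+$/{print "}"; next}
# Any other %PATTERN line becomes PATTERN{ ... }
{print substr($1,2) "{"}
{
    if($2 == "%FOREACH"){
        # %FOREACH repeats the text for every field, exposed as $e
        pprint = gensub(/(\S+\s+){2}(.*)/,"\\2","g")
        print "for(e=1; e<=NF; ++e) print \"" parse_print(pprint) "\""
    }else{
        pprint = gensub(/\S+\s+(.*)/,"\\1","g")
        print "print \"" parse_print(pprint) "\""
    }
}
{print "}"}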

table.tawk

%BEGIN
                    <style>
                        .my_table {border-bottom:3px double black; border-collapse: collapse; }
                        .my_table tr.header{border-bottom:3px double black;}
                        .my_table td{text-align: center;}
                    </style>
                    <table class="my_table">
%ENDBEGIN
%NR==1              <caption>Table $1. $2</caption>
%NR==2              <tr class="header">
%NR>2               <tr>
%NR==2  %FOREACH    <th>$e</th>
%NR>2   %FOREACH    <td>$e</td>
%NR!=1              </tr>
%END
                    </table>
%ENDEND

metaprogramming.sh

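#!/bin/sh
# metaprogram: generate an awk program from the template, then run it on the data
# (@include, gensub and \s are GNU awk extensions, so the generation step needs gawk)
gawk '@include "template"' "$1" > .table.awk
awk -v FS="," -f .table.awk "$2"
rm .table.awk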
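
For reference, this is the .table.awk that template.awk should generate from table.tawk, expanded by hand from the rules above rather than captured from a real run (formatting may differ slightly). It is wrapped in a hypothetical render_table shell function so it can be tried directly on the sample rows. The generated program itself uses only POSIX awk, so only the generation step needs gawk.

```shell
#!/bin/sh
# render_table: hypothetical wrapper around a hand-expanded copy of the
# program that template.awk is expected to generate from table.tawk.
# Each %PATTERN line of the template became one PATTERN{...} block.
render_table() {
    awk -F, '
BEGIN{
print "<style>"
print ".my_table {border-bottom:3px double black; border-collapse: collapse; }"
print ".my_table tr.header{border-bottom:3px double black;}"
print ".my_table td{text-align: center;}"
print "</style>"
print "<table class=\"my_table\">"
}
NR==1{
print "<caption>Table " $1 ". " $2 "</caption>"
}
NR==2{
print "<tr class=\"header\">"
}
NR>2{
print "<tr>"
}
NR==2{
for(e=1; e<=NF; ++e) print "<th>" $e "</th>"
}
NR>2{
for(e=1; e<=NF; ++e) print "<td>" $e "</td>"
}
NR!=1{
print "</tr>"
}
END{
print "</table>"
}' "$@"
}

# Usage: the sample rows from table.txt, fed on standard input.
printf '%s\n' '1,Example Title' 'COL1,COL2,COL3,COL4,COL5' \
    'BRCC,ACGC,15869,105A,1' 'BCAS,GAAG,73345,369T,2' | render_table
```

Its output should match the table.html output in the question.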

The idea is to use metaprogramming to separate the logic from the presentation. It is based on a comment by @kent on the question "How to format text in html using awk", which is about converting a text file to an HTML table.

./metaprogramming.sh table.tawk table.txt > table.html

This produces:

<style>
.my_table {border-bottom:3px double black; border-collapse: collapse; }
.my_table tr.header{border-bottom:3px double black;}
.my_table td{text-align: center;}
</style>
<table class="my_table">
<caption>Table 1. Example Title</caption>
<tr class="header">
<th>COL1</th>
<th>COL2</th>
<th>COL3</th>
<th>COL4</th>
<th>COL5</th>
</tr>
<tr>
<td>BRCC</td>
<td>ACGC</td>
<td>15869</td>
<td>105A</td>
<td>1</td>
</tr>
<tr>
<td>BCAS</td>
<td>GAAG</td>
<td>73345</td>
<td>369T</td>
<td>2</td>
</tr>
</table>

Question 1

Is there a way to make the call without creating the temporary file .table.awk, or even without a shell script at all (calling awk directly)?

Bonus question

Is there a better way to do this? Is there an awk library that already does this?

asked May 12 '26 at 08:05 by Jose Ricardo Bustos M.



1 Answer

TXR is a tool that provides a language for template-based extraction and formatting of data, combined with an original Lisp dialect.

In format.txr we have:

@num,@title
@(coll)@{heading /[^,]+/}@(end)
@(collect)
@  (coll)@{data /[^,]+/}@(end)
@(end)
@(output :filter :tohtml)
<style>
.my_table {border-bottom:3px double black; border-collapse: collapse; }
.my_table tr.header{border-bottom:3px double black;}
.my_table td{text-align: center;}
</style>
<table class="my_table">
<caption>Table @num. @title</caption>
<tr class="header">
@  (repeat)
<th>@heading</th>
@  (end)
</tr>
@  (repeat)
<tr>
@    (repeat)
<td>@data</td>
@    (end)
</tr>
@  (end)
</table>
@(end)

We apply it to the data file like this:

$ txr format.txr data 
<style>
.my_table {border-bottom:3px double black; border-collapse: collapse; }
.my_table tr.header{border-bottom:3px double black;}
.my_table td{text-align: center;}
</style>
<table class="my_table">
<caption>Table 1. Example Title</caption>
<tr class="header">
<th>COL1</th>
<th>COL2</th>
<th>COL3</th>
<th>COL4</th>
<th>COL5</th>
</tr>
<tr>
<td>BRCC</td>
<td>ACGC</td>
<td>15869</td>
<td>105A</td>
<td>1</td>
</tr>
<tr>
<td>BCAS</td>
<td>GAAG</td>
<td>73345</td>
<td>369T</td>
<td>2</td>
</tr>
</table>

Note that :filter :tohtml takes care of escaping characters for HTML: if the data contains &, for instance, the output has &amp;, and so on.
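
The awk pipeline in the question has no equivalent step: the generated program prints each $e as is, so a field such as R&D would reach the HTML unescaped. A minimal escaping step could be sketched in awk as below. Which characters TXR's filter escapes exactly is not spelled out here, so the set used (&, <, > and ") is an assumption, and html_escape is a hypothetical name.

```shell
#!/bin/sh
# html_escape: hypothetical helper, a rough awk counterpart of TXR's
# :tohtml filter. Assumption: escaping &, <, > and " is enough here.
# In a gsub replacement, "\\&" yields a literal & (a bare & means "the match").
html_escape() {
    awk '{
        gsub(/&/, "\\&amp;")   # must run first, otherwise the & in &lt; etc. is escaped twice
        gsub(/</, "\\&lt;")
        gsub(/>/, "\\&gt;")
        gsub(/"/, "\\&quot;")
        print
    }' "$@"
}

# Usage: escape a data row before its fields are spliced into <td> cells.
printf '%s\n' 'R&D,<b>,5"' | html_escape
```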

The vertical collect and horizontal coll directives implicitly gather the matched pattern variables into nested lists; repeat implicitly unwraps those lists, so plain variable references such as @data can appear both in the input-matching section and in the output.
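
Plain awk has no nested lists, but the same two-phase shape (gather all rows first, then produce the output from the gathered data, as collect and repeat do) can be approximated with arrays keyed by row and column. The sketch below uses the hypothetical name render_collected, leaves out the <style> block for brevity, and is only an analogy in awk, not a description of how TXR works internally.

```shell
#!/bin/sh
# render_collected: hypothetical helper. Phase 1 gathers the input into
# arrays (rows[r, c] stands in for a list of lists); phase 2, in END,
# walks them, roughly like the nested (repeat) blocks in format.txr.
render_collected() {
    awk -F, '
    NR == 1 { num = $1; title = $2; next }
    NR == 2 { nh = NF; for (c = 1; c <= NF; c++) head[c] = $c; next }
            { nr++; nc[nr] = NF; for (c = 1; c <= NF; c++) rows[nr, c] = $c }
    END {
        print "<table class=\"my_table\">"
        print "<caption>Table " num ". " title "</caption>"
        print "<tr class=\"header\">"
        for (c = 1; c <= nh; c++) print "<th>" head[c] "</th>"
        print "</tr>"
        for (r = 1; r <= nr; r++) {
            print "<tr>"
            for (c = 1; c <= nc[r]; c++) print "<td>" rows[r, c] "</td>"
            print "</tr>"
        }
        print "</table>"
    }' "$@"
}

# Usage: the same sample rows as table.txt.
printf '%s\n' '1,Example Title' 'COL1,COL2,COL3,COL4,COL5' \
    'BRCC,ACGC,15869,105A,1' 'BCAS,GAAG,73345,369T,2' | render_collected
```

Keeping the parsed data in arrays until END is what lets the output section be written as one block, at the cost of holding the whole table in memory.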

Here is how it looks with syntax highlighting in Vim, which makes it very clear what is template material and what is TXR syntax:

[Screenshot of format.txr with syntax coloring]

answered May 14 '26 at 22:05 by Kaz
