Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWK - Processing multiple file through for loop and conditional check

Tags:

bash

awk

File 1: myfilename_WEEK.csv

w27_2018,257,1,26.20,0.00,24.26
w28_2018,257,1,7.97,0.00,24.26
w29_2018,257,1,34.86,0.00,24.26
w30_2018,257,1,3.29,0.00,24.26

File 2: myfilename_MONTH.csv

m07_2018,257,1,94.78,0.00,121.31
m08_2018,257,1,719.60,0.00,262.47
m09_2018,257,1,14925.60,0.00,13903.24
m10_2018,257,1,51099.66,0.00,81600.69

File 3: myfilename_HALF.csv

h02_2018,257,1,155345.19,480029.21,235802.91
h01_2019,257,1,273961.84,552545.36,140706.27
h02_2018,258,1,3250552.06,1299785.91,3697749.57
h01_2019,258,1,3582585.66,2670427.72,4009391.28

calendar_file:

20180805,08/05/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,01,SUNDAY
20180806,08/06/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,02,MONDAY
...
20180811,08/11/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,07,SATURDAY
20180812,08/12/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,01,SUNDAY
..
20180816,08/16/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,05,THURSDAY

Expected output (newlines added for readability):

2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w27_2018,WK27 2018,258,1,97192.07,9028.38,52130.32
2018,w27_2018,WK27 2018,300,1,181.44,0.00,-69.72

2018,m07_2018,AUG 2018,257,1,94.78,0.00,121.31
2018,m07_2018,AUG 2018,258,1,509253.46,45141.91,399648.71
2018,m07_2018,AUG 2018,300,1,409.10,0.00,-348.60

2018,h02_2018,H02 2018,257,1,155345.19,480029.21,235802.91
2018,h02_2018,H02 2018,258,1,3250552.06,1299785.91,3697749.57
2018,h02_2018,H02 2018,300,1,1112.93,0.00,-1164.35

I would like to join all myfilename_* to add a label and Fiscal Year using calendar_file:

Individual commands are:

awk -F, 'NR==FNR {a[$3]=substr($12,3,4) FS $3 FS $4; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_WEEK.csv >> my_report.csv

awk -F, 'NR==FNR {a[$5]=substr($12,3,4) FS $5 FS $6; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_MONTH.csv >> my_report.csv

awk -F, 'NR==FNR {a[$9]=substr($12,3,4) FS $9 FS $10; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_HALF.csv >> my_report.csv

I am trying to join all of these into a single loop:

I have tried the following but it doesn't work:

    for exp_file in `ls myfilename_*.csv`
     do
     awk -F, '\
     { \
        if(NR==FNR && FILENAME ~ /WEEK/) {a[$3]=substr($12,3,4) FS $3 FS $4; next} ;\
        if(NR==FNR && FILENAME ~ /MONTH/) {a[$5]=substr($12,3,4) FS $5 FS $6; next} ;\
        if(NR==FNR && FILENAME ~ /HALF/) {a[$9]=substr($12,3,4) FS $9 FS $10; next} ;\
       {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6} \
     }' calendar_file $exp_file >> my_report.csv
     done

How can I achieve this? Thanks for your help in advance!

like image 956
acs005 Avatar asked May 09 '26 16:05

acs005


2 Answers

Firt way(GNU awk, if you don't have GNU awk please leave comment):

awk -F, 'NR==FNR{y=substr($12,3,4); a[$3]=y FS $3 FS $4; b[$5]=y FS $5 FS $6; c[$9]=y FS $9 FS $10; next} FNR==1{printf nl;nl="\n"} match(FILENAME, /myfilename_([A-Z]*)/, f){NF=6;switch(f[1]){case "WEEK": $1=a[$1];break; case "MONTH": $1=b[$1];break; case "HALF": $1=c[$1];}}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv

Multiple lines for readability:

awk -F, '
NR==FNR{
    y=substr($12,3,4); 
    a[$3]=y FS $3 FS $4; 
    b[$5]=y FS $5 FS $6; 
    c[$9]=y FS $9 FS $10; 
    next
} 
FNR==1{printf nl;nl=ORS} ## The newlines between sectors, if you do not need those newlines then remove this line.
match(FILENAME, /myfilename_([A-Z]*)/, f){
    NF=6;  ## To limit results for 6 columns only, can remove it here.
    switch(f[1]){
    case "WEEK": 
        $1=a[$1];
        break; 
    case "MONTH": 
        $1=b[$1];
        break; 
    case "HALF": 
        $1=c[$1];
    }
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv

An update to it:

awk -F, '
NR==FNR{
    y=substr($12,3,4); 
    a[$3]=y FS $3 FS $4; 
    b[$5]=y FS $5 FS $6; 
    c[$9]=y FS $9 FS $10; 
    next
} 
FNR==1{printf nl;nl=ORS} ## The newlines between sectors, if you do not need those newlines then remove this line.
match(FILENAME, /myfilename_([A-Z]*)/, f){
    NF=6;  ## To limit results for 6 columns only, can remove in your case.
    $1 = f[1]=="WEEK" ? a[$1] : ( f[1]=="MONTH" ? b[$1] : (f[1]=="HALF" ? c[$1] : $1) )
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv

Second Way, more concise and without using switch (also GNU awk):

awk -F, '
NR==FNR{
    y=substr($12,3,4); 
    a[$3 "WEEK"]=y FS $3 FS $4; 
    a[$5 "MONTH"]=y FS $5 FS $6; 
    a[$9 "HALF"]=y FS $9 FS $10; 
    next
} 
FNR==1{printf nl;nl=ORS} ## The newlines between sectors, if you do not need those newlines then remove this line.
match(FILENAME, /myfilename_([A-Z]*)/, f){
    $1=a[$1 f[1]];
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv

Third way: If your data are all corresponding to their filenames, like you showed in your samples, there's a third way which removes the need of match, so it can work on other awks:

awk -F, '
NR==FNR{
    y=substr($12,3,4); 
    a[$3 "w"]=y FS $3 FS $4; 
    a[$5 "m"]=y FS $5 FS $6; 
    a[$9 "h"]=y FS $9 FS $10; 
    next
} 
FNR==1{printf nl;nl=ORS} ## The newlines between sectors, if you do not need those newlines then remove this line.
$1~/^([wmh])[0-9]{2}_[0-9]{4}/{   ## Check first fields to make sure it matches, the checking is optional if your data is all like you showed.
    $1=a[$1 substr($1,1,1)]
    print
}' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv

Given a second thought, based on your data-filename relations, there's actually no need to check the first letter (nor the filename):

awk -F, '
NR==FNR{
    y=substr($12,3,4); 
    a[$3]=y FS $3 FS $4; 
    a[$5]=y FS $5 FS $6; 
    a[$9]=y FS $9 FS $10; 
    next
} 
FNR==1{printf nl;nl=ORS} ## The newlines between sectors, if you do not need those newlines then remove this line.
{   ## Add  $1~/^([wmh])[0-9]{2}_[0-9]{4}/  to the beginning of this line if  you want to check and make sure first column.
    $1=a[$1]
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
like image 61
Tiw Avatar answered May 11 '26 08:05

Tiw


Here is another awk solution which is portable, efficient and doesn't rely on input filenames but the order they are given on command line.

awk -F ',' -v OFS=',' '
NR==FNR {
  y=substr($12,3,4)
  a[ARGV[2],$3]=y OFS $3 OFS $4  # week
  a[ARGV[3],$5]=y OFS $5 OFS $6  # month
  a[ARGV[4],$9]=y OFS $9 OFS $10 # half
  next
}
{
  $1=a[FILENAME,$1]
} 1' calendar.csv week.csv month.csv half.csv

Note that if your calendar file is sorted, there is no need to parse fiscal year field again and again for each line. Something like this would be way more efficient in that case:

if(p!=$12) y=substr(p=$12,3,4)
like image 39
oguz ismail Avatar answered May 11 '26 07:05

oguz ismail



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!