Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I properly iterate through two separate files of unequal length in bash?

Tags:

bash

I am relatively new to bash scripting.

I am attempting to create a bash script that will generate a list of SQL statements; I have a list of 'available hosts' & 'jobs' which I am pointing the script to (THE FILES ARE NOT THE SAME LENGTH).

$ cat avail_hosts.log
computer1
computer2
computer3
computer4
computer5

$ cat archiv_jobs.log
job1
job2
job3
job4
job5

The desired result of the script is to output a list of SQL statements that each contain a (unique) job and a (unique) host (see desired result below). For clarification, all jobs from 'archiv_jobs.log' must be assigned to a host; a single job should not be assigned to more than 1 host & a single host should not have more than 1 job assigned; any left-over hosts in 'avail_hosts.log' can be ignored after all jobs in 'archiv_jobs.log' are assigned. Lastly, there will never be an issue of more jobs than available hosts.

Currently, I am unable to make the script increment the host value so that it matches the 'desired output' below:

#! /bin/bash

rm ~/logs/assign_hosts.sql

sql_script=~/logs/assign_hosts.sql
host=~/logs/avail_hosts.log

for n in $(cat ~/logs/archiv_jobs.log)
do
    for h in $(cat ~/logs/avail_hosts.log)
    do
    host=$h
    done    
    printf "UPDATE pw_batch_app.job_control SET host_name = $host WHERE job_name='$n';\n" >> $sql_script
done

Current output: (same host)

UPDATE pw_batch_app.job_control SET host_name = computer1 WHERE job_name='Job1';
UPDATE pw_batch_app.job_control SET host_name = computer1 WHERE job_name='Job2';
UPDATE pw_batch_app.job_control SET host_name = computer1 WHERE job_name='job3....

Desired output: (unique host)

UPDATE pw_batch_app.job_control SET host_name = computer1 WHERE job_name='Job1';
UPDATE pw_batch_app.job_control SET host_name = computer2 WHERE job_name='Job2';
UPDATE pw_batch_app.job_control SET host_name = computer3 WHERE job_name='job3....

I have attempted to add ((h++)) to no avail:

for n in $(cat ~/logs/archiv_jobs.log)
do
    for h in $(cat ~/logs/avail_hosts.log)
    do
    host=$h
    ((h++))
    done    
    printf "UPDATE pw_batch_app.job_control SET host_name = $host WHERE job_name='$n';\n" >> $sql_script
done

Thank you to everyone that contributed & gave feedback! This was a case of beginner logic fail

SOLUTION: (courtesy of @merof)

i=0
for host in $(cat avail_hosts.log)
do
  n=$(sed -n "$((i+1))p" archiv_jobs.log | sed 's/ *$//')

  # If there are no more jobs, break out of the loop
  if [ -z "$n" ]; then
      break
  fi
  printf "UPDATE pw_batch_app.job_control SET host_name = $host WHERE job_name='$n';\n" >> str.tql
  ((i++))
done

Detailed explanation in comments section
like image 488
RyanAcevedo Avatar asked Nov 17 '25 11:11

RyanAcevedo


1 Answers

Assumptions:

  • we want to match lines from the input files based on the line number
  • per OP's latest description update (any left-over hosts can be ignored), we'll halt processing if we find we're missing a job

To demonstrate processing when there's an extra host (or job) we'll use the following inputs:

$ head avail_hosts.log archiv_jobs.log

==> avail_hosts.log <==
computer1
computer2
computer3
computer4                         # no matching entry from archiv_jobs.log

==> archiv_jobs.log <==
job1
job2
job3

While it's possible to write a bash shell script to step through both input files at the same time, I think we can simplify the code by first making use of paste to match the lines for us, eg:

$ paste avail_hosts.log archiv_jobs.log
computer1       job1
computer2       job2
computer3       job3
computer4                         # a leftover will show up as a single entry in the line

We can then feed this to a single while/read loop, eg:

while read -r host job            # if there's a leftover then the second variable (job) will be empty
do
    [[ -z "${job}" ]] && break    # if $job is empty then break out of loop (ie, halt generation of UPDATE statements)

    printf "UPDATE pw_batch_app.job_control SET host_name = '${host}' WHERE job_name='${job}';\n" > $sql_script

done < <(paste avail_hosts.log archiv_jobs.log)

###########
# or, to reduce the number of times we need to open/close $sql_script
# we can move the write operator to the end of the while loop:

while read -r host job
do
    [[ -z "${job}" ]] && break
    printf "UPDATE pw_batch_app.job_control SET host_name = '${host}' WHERE job_name='${job}';\n"

done < <(paste avail_hosts.log archiv_jobs.log) > $sql_script

This generates:

$ cat $sql_script
UPDATE pw_batch_app.job_control SET host_name = 'computer1' WHERE job_name='job1';
UPDATE pw_batch_app.job_control SET host_name = 'computer2' WHERE job_name='job2';
UPDATE pw_batch_app.job_control SET host_name = 'computer3' WHERE job_name='job3';

NOTES:

  • this eliminates the need for this line from the current script: host=~/logs/avail_hosts.log
  • I've gone ahead and wrapped the $host value in quotes (in the UPDATE statement) since I'm assuming the host_name column's datatype is a string/(var)char
  • if OP prefers to read from both files at the same time I'd suggest a web search on bash read two files concurrently site:stackoverflow.com (brings up many Q&As that address this approach)
like image 167
markp-fuso Avatar answered Nov 19 '25 23:11

markp-fuso



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!