I have a bash script that takes a simple properties file and substitutes the values into another file. (Property file is just lines of 'foo=bar' type properties)
INPUT=`cat $INPUT_FILE`
while read line; do
    PROP_NAME=`echo $line | cut -f1 -d'='`
    PROP_VALUE=`echo $line | cut -f2- -d'=' | sed 's/\$/\\\$/g'`
    time INPUT="$(echo "$INPUT" | sed "s\`${PROP_NAME}\b\`${PROP_VALUE}\`g")"
done <<<$(cat "$PROPERTIES_FILE")
# Do more stuff with INPUT
However, when my machine is under high load (upper forties), each of my sed calls loses a lot of time:
real 0m0.169s
user 0m0.001s
sys 0m0.006s
Low load:
real 0m0.011s
user 0m0.002s
sys 0m0.004s
Normally losing 0.1 seconds isn't a huge deal but both the properties file and the input files are hundreds/thousands of lines long and those .1 seconds add up to over an hour of wasted time.
What can I do to fix this? Do I just need more CPUs?
Sample properties (each line starts with a special character, which marks the places in the input that refer to a property)
$foo=bar
$hello=world
^hello=goodbye
Sample input
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
Expected result
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
One approach using bash and sed; you could try something like:
#!/usr/bin/env bash
while IFS='=' read -r prop_name prop_value; do
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"
  fi
  input_value+=("s/${prop_name}\\b/${prop_value}/g")
done < properties.txt
sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
sed "$sed_input" sample_input.txt
One way to check the value of sed_input is
declare -p sed_input
Or
printf '%s\n' "$sed_input"
Calling external utilities such as cut and sed inside a shell loop should be avoided. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice
The sed invocation above runs only once, even if the file that needs to be edited has 500+ lines.
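To make the single-invocation point concrete, here is a minimal end-to-end sketch. It recreates the question's sample files in a temporary directory (the mktemp location and the result variable are assumptions for illustration, not part of the original script), builds the sed script with the same loop as above, and runs sed once. Note that \b is a GNU sed extension.

```shell
#!/usr/bin/env bash
# Sketch only: sample data copied from the question, written to a temp dir.
tmpdir="$(mktemp -d)"

cat > "$tmpdir/properties.txt" <<'EOF'
$foo=bar
$hello=world
^hello=goodbye
EOF

cat > "$tmpdir/sample_input.txt" <<'EOF'
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
EOF

# Build one "s/name\b/value/g" command per property.
input_value=()
while IFS='=' read -r prop_name prop_value; do
  if [[ "$prop_name" == "^"* ]]; then
    prop_name="\\${prop_name}"      # escape a leading ^ so it is not an anchor
  fi
  input_value+=("s/${prop_name}\\b/${prop_value}/g")
done < "$tmpdir/properties.txt"

# Join all commands with ';' into a single sed script.
sed_input="$(IFS=';'; printf '%s' "${input_value[*]}")"
printf '%s\n' "$sed_input"

# One sed process handles the whole input file.
result="$(sed "$sed_input" "$tmpdir/sample_input.txt")"
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With the sample data, the printed script should contain three substitute commands joined by semicolons. A property value containing / or & would need extra escaping before being placed in the replacement part; this sketch does not handle that.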
See How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
See How can I use array variables in bash?
See Parameter Expansion
See Howto_Parameter_Expansion
See How_do_I_do_string_manipulation_in_bash
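As a small illustration of the parameter expansion references above, the two cut calls from the question can be replaced by expansions that run inside the shell itself, with no extra processes. This is a sketch; the sample line (with an extra = in the value) is a hypothetical example, not taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical property line whose value itself contains '='.
line='$foo=bar=baz'

# Equivalent of: cut -f1 -d'='   (everything before the first '=')
prop_name="${line%%=*}"

# Equivalent of: cut -f2- -d'='  (everything after the first '=')
prop_value="${line#*=}"

printf 'name=%s value=%s\n' "$prop_name" "$prop_value"
```

Here %%=* removes the longest suffix that starts with =, and #*= removes the shortest prefix that ends with =, so only the first = acts as the separator.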
Adding additional lines to OP's input file to demonstrate word boundary matching and a property name occurring more than once in a line:
$ cat input.txt
This is a story about $hello. It starts at a $foo and ends in a park.
Bob said to Sally "^hello, see you soon"
Leave first 2 matches alone: $foobar $hellow ^hello
^hello $foo $hello ^hello $foo $hello
Assumptions:
the first character of a property name is not an alphabetic character ([a-zA-Z]); otherwise we can expand the next_char testing (see awk code, below)
General idea:
read properties.txt entries into an array (map[name]=value)
for each line of input.txt, loop through all names, checking for any word boundary matches to replace
One idea using awk:
$ cat replace.awk
FNR==NR { split($0,arr,"=")                 # 1st file: split on "=" delimiter
          map[arr[1]]=arr[2]                # build map[name]=value array, eg: map[$foo]=bar
          len[arr[1]]=length(arr[1])        # save length of "name" so we do not have to repeatedly calculate later
          next
        }
NF      { newline = $0                      # 2nd file: if we have at least one non white space field then make copy of current input line
          for (name in map) {               # loop through all "names" to search for
              line = newline                # start over copy of current line
              newline = ""
              while ( pos = index(line,name) ) {   # while we have a match ...
                  # find next_character after "name"; if it is an
                  # alpha/numeric character we do not have a word
                  # boundary otherwise we do have a word boundary
                  # and we need to make the replacement with
                  # map[name]=value
                  next_char = substr(line,pos+len[name],1)
                  if (next_char ~ /[[:alnum:]]/)
                      newline = newline substr(line,1,pos+len[name]-1)
                  else
                      newline = newline substr(line,1,pos-1) map[name]
                  line = substr(line,pos+len[name])   # strip off rest of line to test for additional matches of "name"
              }
              newline = newline line        # append remaining contents of line
          }
          $0 = newline                      # overwrite current input line with "newline"
        }
1                                           # print current line
NOTES:
awk string matching functions (eg, sub(), gsub(), match()) treat the search pattern as a regex
special characters (eg, $, ^) will need to be escaped before trying to use sub() / gsub() / match()
the index() function treats search patterns as literal text (so no need to escape special characters)
Taking for a test drive:
$ awk -f replace.awk properties.txt input.txt
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
Leave first 2 matches alone: $foobar $hellow goodbye
goodbye bar world goodbye bar world
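The difference between literal and regex matching mentioned in the notes can be checked with a short sketch. The test string and the variable names (literal_pos, regex_hit, escaped_hit) are made up for illustration.

```shell
#!/usr/bin/env bash
# index() searches for literal text, so "^hello" needs no escaping.
literal_pos="$(awk 'BEGIN { print index("say ^hello", "^hello") }')"

# Used as a regex, the leading ^ is a start-of-string anchor, so this
# asks whether the string begins with "hello" (it does not).
regex_hit="$(awk 'BEGIN { print ("say ^hello" ~ "^hello") }')"

# Escaping the ^ makes the regex match the literal character.
escaped_hit="$(awk 'BEGIN { print ("say ^hello" ~ "\\^hello") }')"

printf 'index=%s regex=%s escaped=%s\n' "$literal_pos" "$regex_hit" "$escaped_hit"
```

This is why replace.awk can pass property names straight to index() without preprocessing them.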
For timing purposes I created a couple larger files from OP's properties file and my input.txt file (see above):
$ awk 'BEGIN {FS=OFS="="} {map[$1]=$2} END {for (i=1;i<=300;i++) {for (name in map) {nn=name x;print nn,map[name]};x++}}' properties.txt > properties.900.txt
$ for ((i=1;i<=250;i++)); do cat input.txt; done > input.1500.txt
$ wc -l properties.900.txt input.1500.txt
900 properties.900.txt
1500 input.1500.txt
Timing for the larger data files:
$ time awk -f replace.awk properties.900.txt input.1500.txt > output
real 0m0.126s
user 0m0.122s
sys 0m0.004s
$ head -12 output
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
Leave first 2 matches alone: $foobar $hellow goodbye
goodbye bar world goodbye bar world
This is a story about world. It starts at a bar and ends in a park.
Bob said to Sally "goodbye, see you soon"
Leave first 2 matches alone: $foobar $hellow goodbye
goodbye bar world goodbye bar world
NOTE: timing is from an Ubuntu 22.04 system (metal, vm) running on an Intel i7-1260P