
Improve performance of bash script

Tags: bash, shell

I'm looping over hundreds of thousands of CSV files to generate more files from them. The requirement is to extract the previous 1 month, 3 months, 6 months, 1 year and 2 years of data from every file and generate new files from them.

I've written the script below, which gets the job done but is super slow. It will need to be run quite frequently, which makes my life cumbersome. Is there a better way to achieve this outcome, or a way to improve the performance of this script?

for k in *.csv; do
    sed -n '/'"$(date -d "2 year ago" '+%Y-%m')"'/,$p' ${k} > temp_data_store/${k}.2years.csv
    sed -n '/'"$(date -d "1 year ago" '+%Y-%m')"'/,$p' ${k} > temp_data_store/${k}.1year.csv
    sed -n '/'"$(date -d "6 month ago" '+%Y-%m')"'/,$p' ${k} > temp_data_store/${k}.6months.csv
    sed -n '/'"$(date -d "3 month ago" '+%Y-%m')"'/,$p' ${k} > temp_data_store/${k}.3months.csv
    sed -n '/'"$(date -d "1 month ago" '+%Y-%m')"'/,$p' ${k} > temp_data_store/${k}.1month.csv
done
usert4jju7 asked Jan 23 '26 07:01



1 Answer

You read each CSV five times. It would be better to read each CSV only once.

You also extract the same data multiple times. Every range except the longest one is a subset of the longer ranges.

  • 1 month ago is a subset of 3 months ago, 6 months ago, 1 year ago and 2 years ago.
  • 3 months ago is a subset of 6 months ago, 1 year ago and 2 years ago.
  • 6 months ago is a subset of 1 year ago and 2 years ago.
  • 1 year ago is a subset of 2 years ago.

This means every line in "1year.csv" is also in "2years.csv", so it is sufficient to extract "1year.csv" from "2years.csv". You can cascade the different searches with tee, starting with the longest range and narrowing it down step by step.

The following assumes that the contents of your files are ordered chronologically. (I simplified the quoting a bit.)

sed -n "/$(date -d '2 year ago' '+%Y-%m')/,\$p" "${k}" |
tee "temp_data_store/${k}.2years.csv" |
sed -n "/$(date -d '1 year ago' '+%Y-%m')/,\$p" |
tee "temp_data_store/${k}.1year.csv" |
sed -n "/$(date -d '6 month ago' '+%Y-%m')/,\$p" |
tee "temp_data_store/${k}.6months.csv" |
sed -n "/$(date -d '3 month ago' '+%Y-%m')/,\$p" |
tee "temp_data_store/${k}.3months.csv" |
sed -n "/$(date -d '1 month ago' '+%Y-%m')/,\$p" > "temp_data_store/${k}.1month.csv"
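
Another way to read each CSV only once, and to avoid calling date five times per file, might look like the sketch below. It is not taken from the question: the helper names make_cutoffs and split_csvs are hypothetical, and it assumes GNU date, an available awk, and chronologically ordered rows. Like the sed commands, it starts copying at the first line that contains the YYYY-MM string and continues to the end of the file. Unlike them, it leaves no output file behind when a pattern never matches.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build the five "label=YYYY-MM" pairs once (GNU date assumed).
make_cutoffs() {
    printf '2years=%s 1year=%s 6months=%s 3months=%s 1month=%s\n' \
        "$(date -d '2 year ago' '+%Y-%m')" \
        "$(date -d '1 year ago' '+%Y-%m')" \
        "$(date -d '6 month ago' '+%Y-%m')" \
        "$(date -d '3 month ago' '+%Y-%m')" \
        "$(date -d '1 month ago' '+%Y-%m')"
}

# Hypothetical helper: split_csvs OUTDIR CUTOFFS FILE...
# One awk process reads every FILE once and writes OUTDIR/FILE.<label>.csv
# for each cutoff, starting at the first line that contains the pattern.
split_csvs() {
    local outdir=$1 cuts=$2
    shift 2
    awk -v outdir="$outdir" -v cuts="$cuts" '
        BEGIN {
            n = split(cuts, pair, " ")
            for (i = 1; i <= n; i++) {
                split(pair[i], kv, "=")
                label[i] = kv[1]
                pat[i] = kv[2]
            }
        }
        FNR == 1 {
            # New input file: close the previous outputs and reset the flags.
            base = FILENAME
            sub(/.*\//, "", base)
            for (i = 1; i <= n; i++) {
                if (on[i]) close(out[i])
                on[i] = 0
                out[i] = outdir "/" base "." label[i] ".csv"
            }
        }
        {
            for (i = 1; i <= n; i++) {
                if (!on[i] && index($0, pat[i])) on[i] = 1
                if (on[i]) print > (out[i])
            }
        }
    ' "$@"
}

# Usage (batches of 1000 keep the awk command line below the OS argument limit):
#   cuts=$(make_cutoffs)
#   files=(*.csv)
#   for ((i = 0; i < ${#files[@]}; i += 1000)); do
#       split_csvs temp_data_store "$cuts" "${files[@]:i:1000}"
#   done
```

With hundreds of thousands of files, most of the run time is likely spent starting processes, so handing many files to one awk call should help more than tuning the individual sed commands. The batch size of 1000 is an arbitrary choice.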
ceving answered Jan 25 '26 23:01



