Bulk rename PDF files using a specific line of their content in Linux

I have multiple PDF files that I want to rename. Each new name should be taken from a specific line (say, the 5th) of the file's content. For example, if the 5th line of a file is "some string", then "some string" should become the name of that file. The same goes for the rest of the files: each one should be renamed after the 5th line of its own content. I tried this in the terminal:

for pdf in *.pdf
do
   filename=`basename -s .pdf "${pdf}"`
   newname=`awk 'NR==5' "${filename}.pdf"`
   mv "${pdf}" "${newname}"
done

This renames the files, but the new name is an invalid string. I know the system doesn't see the file as plain text and images; there is metadata, XML tags and so on. But is there a way to take the content from that line?


1 Answer

Out of the box, bash and its usual utilities cannot read PDF files. However, less can recover the text from a PDF file, provided its input preprocessor (typically lesspipe, set through LESSOPEN) is configured to handle PDFs. You could change your script as follows:

for pdf in *.pdf
do
    mv "$pdf" "$(less "$pdf" | sed '5q;d').pdf"
done

Explanation:

  • less "$pdf": prints the text part of the PDF file. The output keeps the spacing of the original layout
    • run a few tests first to check that less returns the output you expect
  • sed '5q;d': extracts the 5th line of its input
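
To run those tests without touching any file, a dry run along these lines may help. It is only a sketch: nth_line is a made-up helper name, and it assumes less is able to decode PDFs on your system.

```shell
#!/usr/bin/env bash
# Dry run: print "old -> new" for each PDF without renaming anything.
# nth_line is an illustrative helper, not part of the original script.

# Print line number $1 of stdin: lines before it are deleted,
# then sed prints that line and quits.
nth_line() {
    sed "${1}q;d"
}

for pdf in ./*.pdf; do
    [ -e "$pdf" ] || continue    # no PDF here: the glob stayed literal
    printf '%s -> %s.pdf\n' "$pdf" "$(less "$pdf" | nth_line 5)"
done
```

If the proposed names look wrong or empty, less is probably not extracting the text, and the mv loop should not be run as is.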

Optionally, you could use the following mv line in the loop instead, to remove blank lines and collapse repeated spaces before the 5th line is picked:

mv "$pdf" "$(less "$pdf" | sed -e '/^\s*$/d' -e 's/ \+/ /g' | sed '5q;d').pdf"
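
If less does not decode PDFs on your machine, or if the extracted line is empty or contains a slash, the rename can fail or overwrite another file. Here is a sketch of a more defensive variant, under some assumptions: it swaps less for pdftotext (from poppler-utils, which must be installed), and clean_title and LINE_NO are illustrative names, not something from the original script.

```shell
#!/usr/bin/env bash
# Sketch: rename each PDF after the Nth non-blank line of its extracted text.
# Assumption: pdftotext (poppler-utils) is installed.
# LINE_NO and clean_title are hypothetical names chosen for this example.

LINE_NO=5

# Read text on stdin, drop blank lines, squeeze runs of spaces, keep line $1,
# then replace "/" (not allowed in a file name) and trim edge whitespace.
clean_title() {
    sed -e '/^[[:space:]]*$/d' -e 's/  */ /g' |
        sed -n "${1}p" |
        sed -e 's|/|-|g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

for pdf in ./*.pdf; do
    [ -e "$pdf" ] || continue    # no PDF here: the glob stayed literal
    title=$(pdftotext -layout "$pdf" - 2>/dev/null | clean_title "$LINE_NO")
    if [ -z "$title" ]; then
        echo "skipping $pdf: no text found on line $LINE_NO" >&2
        continue
    fi
    mv -n -- "$pdf" "./$title.pdf"    # -n: never overwrite an existing file
done
```

The sed expressions use POSIX character classes instead of \s and \+, which are GNU extensions. Two PDFs that yield the same line are not merged: with mv -n the second one simply keeps its old name.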
Aserre answered Oct 17 '25 17:10