
How to process each page of a multi-page PDF in place with imagemagick?

I have a multi-page PDF with photographed book pages. I want to remove gradients from every page to prepare for optical character recognition.

This command works fine on a PNG of a single page:

convert page.png \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate page_deblurred.png

However, as soon as I try this on a multi-page PDF by using this command...

convert full.pdf \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate full_deblurred.pdf

...I get a single-page PDF with inverted colors, overlaid with text from several pages.

How do I tell ImageMagick to process every page the way it processes the PNG and return a multi-page PDF?

303 asked Dec 09 '25 08:12



1 Answer

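As ImageMagick does not seem to be capable of doing this in one shot, I put together a script based on the suggestion Mark Setchell made in a comment on his answer. It splits the PDF into one PNG per page, applies the single-page command to each PNG, and reassembles the results into a PDF.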

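#!/usr/bin/env bash
# Usage: pass the input PDF as the first argument.
# Writes <input without .pdf>_gradient_removed.pdf next to the input.

set -e

tmpdir=$(mktemp -d)
# Remove the temporary directory on exit, even if a command fails under set -e
trap 'rm -rf "$tmpdir"' EXIT

echo "Splitting PDF into single pages"
# Rasterize at 288 dpi; the zero-padded %03d keeps pages in order when globbed
convert -density 288 "$1" "${tmpdir}/page-%03d.png"
for f in "$tmpdir"/page-*.png
do
    echo "Processing ${f##*/}"
    # Same per-page command as for the single PNG
    convert "$f" \( +clone -blur 0x64 \) -compose minus -composite -channel RGB -negate "${f}_gradient_removed.png"
done
pdf_file_name_without_suffix="${1%.pdf}"
echo "Reassembling PDF"
convert "$tmpdir"/*_gradient_removed.png -quality 100 "${pdf_file_name_without_suffix}_gradient_removed.pdf"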

It works fine with my material. Your mileage may vary.
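
The file names in the script all come from Bash parameter expansion, which is easy to misread. The following is a small sketch that isolates those expansions so they can be checked without running ImageMagick. The function names and the sample paths are made up for illustration and are not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the names and sample paths are illustrative only.

# Name of the final PDF: strip a trailing ".pdf", then append the suffix.
output_pdf_name() {
    printf '%s_gradient_removed.pdf\n' "${1%.pdf}"
}

# Name of a processed page: the suffix is appended after the existing
# ".png" extension, so the file still matches *_gradient_removed.png.
page_output_name() {
    printf '%s_gradient_removed.png\n' "$1"
}

# Label used in the progress message: the path with its directory removed.
page_label() {
    printf '%s\n' "${1##*/}"
}

output_pdf_name "scans/full.pdf"         # scans/full_gradient_removed.pdf
page_output_name "/tmp/x/page-007.png"   # /tmp/x/page-007.png_gradient_removed.png
page_label "/tmp/x/page-007.png"         # page-007.png
```

Note that `${1%.pdf}` only strips a lowercase `.pdf`; an input named `full.PDF` would keep its extension in the output name.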

303 answered Dec 11 '25 00:12
