How to convert Text Searchable PDF to Image PDF with CUPS?

I'm struggling to find a way to convert a text-searchable PDF to an image PDF. Typically you can do this manually in Adobe Reader by printing the PDF to a file with the option "print as image" selected. That way, all pages of the selected PDF are converted to images.

I need a Linux command-line procedure that gets the same result quickly, because I need to process a huge number of PDF files.

The common call:

lp -d PRINTER_NAME "$FILENAME"

doesn't convert a text PDF to an image PDF, and I couldn't find any option for doing that. Do you have any idea what I can do, or can you suggest a better tool? Thanks

Spatz asked Oct 25 '25 04:10


2 Answers

Ghostscript (Debian/Ubuntu apt package: ghostscript, tested version 10.01.1) provides PDF output devices that produce a "rasterized" PDF (image PDF), at a resolution of 720 DPI by default:

gs -sDEVICE=pdfimage24 -o output.pdf input.pdf

The output resolution can be configured with the -r... option. For example, to create a low-resolution 150 DPI rasterized image PDF:

gs -sDEVICE=pdfimage24 -r150 -o output-dpi-150.pdf input.pdf
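
Since the question is about a large number of files, the same command can be wrapped in a loop. This is only a minimal sketch: the function name rasterize_all, the GS override variable and the 300 DPI default are my own choices for illustration, not something Ghostscript provides. It writes the results into a separate directory so the originals are never overwritten.

```shell
#!/bin/sh
# Hypothetical helper: rasterize every *.pdf in src_dir into out_dir.
# Usage: rasterize_all ./searchable ./rasterized 150
# GS defaults to "gs"; set GS=echo for a dry run that only prints the arguments.
rasterize_all() {
    src_dir=$1
    out_dir=$2
    dpi=${3:-300}                  # assumed default, pick what suits your files
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.pdf; do
        [ -e "$f" ] || continue    # the glob matched nothing
        name=$(basename "$f")
        "${GS:-gs}" -sDEVICE=pdfimage24 -r"$dpi" -o "$out_dir/$name" "$f" \
            || echo "failed: $f" >&2
    done
}
```

File names are quoted throughout, so names containing spaces should be handled. A failed conversion is reported on stderr and the loop moves on to the next file.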

Note that at low resolutions Ghostscript produces pixelated results for vector-based fonts, because it doesn't use internal upscaling / antialiasing: black text, for example, is output as pixels that are either black or white, with no grey shades in between.

To get antialiasing and better quality in low-resolution PDFs, let Ghostscript render at a high DPI value and then have it downscale to the desired resolution using -dDownScaleFactor=... (here 1200 DPI / 8 = 150 DPI):

gs -sDEVICE=pdfimage24 -r1200 -dDownScaleFactor=8 -o output-internal-1200-dpi-final-150-dpi.pdf input.pdf
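
The relationship between the two numbers is simply render DPI = final DPI × downscale factor. A small sketch that does the arithmetic for you; rasterize_antialiased and the GS override variable are hypothetical names of mine, and the defaults (150 DPI, factor 8) just mirror the example above:

```shell
#!/bin/sh
# Hypothetical helper: render at target_dpi * factor, then downscale by factor.
# Usage: rasterize_antialiased input.pdf output.pdf 150 8
# GS defaults to "gs"; set GS=echo for a dry run that only prints the arguments.
rasterize_antialiased() {
    src=$1
    dst=$2
    target_dpi=${3:-150}
    factor=${4:-8}
    render_dpi=$((target_dpi * factor))    # e.g. 150 * 8 = 1200
    "${GS:-gs}" -sDEVICE=pdfimage24 -r"$render_dpi" \
        -dDownScaleFactor="$factor" -o "$dst" "$src"
}
```

I have not checked which factor values every Ghostscript version accepts, so treat anything other than the 8 used above as something to verify against your installed version.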

Abdull answered Oct 26 '25 17:10


I had the same problem and found only one solution: a program called Okular (https://okular.kde.org/; at the time of writing, version 17.12.2 on Debian).

Unfortunately, it's not a command-line tool.

To convert a text PDF to an image PDF (or similar), follow these steps:

  • open Okular,
  • open the PDF document in Okular, and
  • choose the menu option "File | Print...".

A print dialog will open. Choose the printer "Print to File (PDF)", then click the "Options" button. Now open the "PDF Options" tab and check the "Force rasterization" option.

To finish, click the "Print" button.

eddiesaliba answered Oct 26 '25 18:10