
How to remove duplicate objects in PDF using ghostscript?

Using command-line ghostscript, is it possible to remove duplicate embedded objects (images) in the PDF and replace them with a single instance?

I have a 200+ page PDF with a background image and some smaller logos on each page. The file is very large because the very same background image and logo binaries are embedded in each individual page, instead of being embedded once and then referenced from each page. I am not the creator of the PDF, so I cannot solve the problem at its source.

(I do not want to shrink the images or reduce their quality, and I do not want to delete them completely.)

asked Oct 16 '25 00:10 by TeXter


1 Answer

As a supplement to Ghostscript, pdfsizeopt does a very good job of eliminating duplicate embedded objects (including background images) in the PDF, and it can be run either before or after the file is processed by Ghostscript. It is a bit tricky to include in a workflow because of its dependencies, however, and it creates a lot of temporary files. It can be found at https://github.com/pts/pdfsizeopt (formerly https://code.google.com/p/pdfsizeopt/).

My 200+ page document went from 150 MB to 40 MB just by removing duplicate images.
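
To illustrate why removing duplicates saves so much space, here is a minimal Python sketch of the general idea: hash each embedded object's bytes, keep one copy per distinct hash, and point every page at the surviving copy. This is only a toy model, not pdfsizeopt's actual implementation. The `dedupe_objects` function, the object numbers, and the fake image bytes are all made up for illustration, and a real PDF would need a proper parser plus comparison of each stream's dictionary, not just its data.

```python
import hashlib


def dedupe_objects(objects, page_refs):
    """Collapse byte-identical objects into a single shared instance.

    objects:   dict mapping a (hypothetical) object number to its raw bytes
    page_refs: list of pages, each a list of the object numbers it uses
    Returns (kept_objects, new_page_refs).
    """
    canonical = {}  # content digest -> first object number seen with that content
    remap = {}      # every object number -> the object number that survives
    for obj_id in sorted(objects):
        digest = hashlib.sha256(objects[obj_id]).hexdigest()
        remap[obj_id] = canonical.setdefault(digest, obj_id)

    kept = {obj_id: objects[obj_id] for obj_id in set(remap.values())}
    new_page_refs = [[remap[obj_id] for obj_id in refs] for refs in page_refs]
    return kept, new_page_refs


# Toy document: three pages, each embedding its own copy of the same
# background and logo (placeholder bytes, not real image data).
background = b"fake-background-image-bytes" * 1000
logo = b"fake-logo-bytes" * 100
objects = {1: background, 2: logo, 3: background, 4: logo, 5: background, 6: logo}
page_refs = [[1, 2], [3, 4], [5, 6]]

kept, new_refs = dedupe_objects(objects, page_refs)
size_before = sum(len(data) for data in objects.values())
size_after = sum(len(data) for data in kept.values())
print(size_before, size_after, new_refs)
```

In this toy case every page ends up referencing the same two objects, so the embedded data shrinks by the number of pages. For the real tool, the project README is the reference for invocation and options.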

answered Oct 19 '25 05:10 by TeXter


