A simple example:
from matplotlib.pyplot import plot, savefig
from numpy.random import randn
plot(randn(100),randn(100,500),"k",alpha=0.03,rasterized=True)
savefig("test.pdf",dpi=90)
This produces the plot, but the file size comes out to be ~8 MB. Any ideas what's going wrong? Could this be a bug? I'm on Python 3.5.1 and Matplotlib 2.1.2.
It looks like the full answer is in a comment on this answer: https://stackoverflow.com/a/12102852/1078529
The trick is to use set_rasterization_zorder to rasterize everything below a certain zorder together into a single bitmap:
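from matplotlib.pyplot import gca, plot, savefig
from numpy.random import randn
# Rasterize every artist with zorder < 1 together into one bitmap
gca().set_rasterization_zorder(1)
plot(randn(100), randn(100, 500), "k", alpha=0.03, zorder=0)
savefig("test.pdf", dpi=90)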
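If you want to check the effect on your own setup, here is a rough sketch that saves the same plot three ways and prints the resulting file sizes. The helper name save_variant, the mode labels, and the temporary output directory are my own choices for illustration, not anything from Matplotlib. I'd expect the exact numbers (and possibly which variant comes out largest) to depend on your Matplotlib version, so treat the output as something to measure rather than a guaranteed result.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def save_variant(path, rasterize_mode):
    """Save the same 500-line plot using one of three rasterization modes.

    rasterize_mode is a hypothetical label used only in this sketch:
    "vector", "per_artist", or "zorder".
    """
    rng = np.random.default_rng(0)  # fixed seed so every variant draws the same data
    x = rng.standard_normal(100)
    y = rng.standard_normal((100, 500))
    fig, ax = plt.subplots()
    if rasterize_mode == "per_artist":
        # The approach from the question: rasterized=True on each line
        ax.plot(x, y, "k", alpha=0.03, rasterized=True)
    elif rasterize_mode == "zorder":
        # The approach from this answer: rasterize everything with zorder < 1
        ax.set_rasterization_zorder(1)
        ax.plot(x, y, "k", alpha=0.03, zorder=0)
    else:
        # Plain vector output, no rasterization
        ax.plot(x, y, "k", alpha=0.03)
    fig.savefig(path, dpi=90)
    plt.close(fig)
    return os.path.getsize(path)


out_dir = tempfile.mkdtemp()
sizes = {
    mode: save_variant(os.path.join(out_dir, mode + ".pdf"), mode)
    for mode in ("vector", "per_artist", "zorder")
}
for mode, size in sizes.items():
    print(f"{mode:>10}: {size / 1024:.0f} KiB")
```

Seeding the generator keeps the three files comparable, since each one draws identical data.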
With rasterized=True, you get a PDF with an embedded bitmap (which can be big).
With rasterized=False, you get a PDF with tons of embedded line-drawing instructions (which aren't big, but can take a while to render).
With rasterized=False, I get a 374 KiB document.
EDIT: Digging a little deeper, in the rasterized=True document (which clocks in at about 7 megabytes), it looks like every line gets its own bitmap, and they are overlaid:
$ pdfimages -list -all test.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image     408   177  rgb     3   8  image  no        12  0    90    90 4192B 1.9%
   1     1 smask     408   177  gray    1   8  image  no        12  0    90    90 7511B  10%
   1     2 image     408   170  rgb     3   8  image  no        13  0    90    90 4472B 2.1%
   1     3 smask     408   170  gray    1   8  image  no        13  0    90    90 7942B  11%
   1     4 image     408   180  rgb     3   8  image  no        14  0    90    90 5454B 2.5%
   1     5 smask     408   180  gray    1   8  image  no        14  0    90    90 9559B  13%
   1     6 image     408   180  rgb     3   8  image  no        15  0    90    90 4554B 2.1%
   1     7 smask     408   180  gray    1   8  image  no        15  0    90    90 8077B  11%
[... 993 more images ...]
For the nonrasterized document, there are no images at all.
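As a sanity check that the per-line bitmaps really account for the file size, here is a back-of-the-envelope estimate using only the four image/smask pairs shown in the listing. It assumes those four are representative of all 500 lines, which is a guess on my part since the rest of the listing is elided.

```python
# (image bytes, smask bytes) for the four line bitmaps shown in the pdfimages listing
pairs = [(4192, 7511), (4472, 7942), (5454, 9559), (4554, 8077)]
n_lines = 500  # number of lines drawn by plot(randn(100), randn(100, 500), ...)

# Average embedded size of one line: its RGB image plus its soft mask
avg_per_line = sum(img + mask for img, mask in pairs) / len(pairs)

# Assumption: the remaining lines are about the same size as these four
estimated_total = avg_per_line * n_lines

print(f"average per line: {avg_per_line:.0f} bytes")
print(f"estimated total:  {estimated_total / 1e6:.1f} MB")  # -> 6.5 MB
```

That lands in the same ballpark as the roughly 7 megabyte file, so the overlaid bitmaps explain nearly all of its size.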