Thursday, September 7, 2017

linux - Batch resize and compress PDF files


I need a way to size down and compress batches of PDF files. I'd prefer to do this on Windows, but Linux would be fine if it meant a smoother workflow.


I know that there are programs such as NitroPDF and Acrobat that can do this, but I'm afraid it would have to be done file by file. These programs also aren't cheap, and I'd prefer not to buy them just to use one or two features.


Background info: I use CamScanner to digitize receipts and invoices for entry into accounts (FreeAgent). CamScanner PDFs are all A4-sized, and multi-page ones often exceed the 2MB attachment limit.


Answer



I'm suggesting a command line tool here, which can be easily batched with loops in built-in scripting languages in Windows, Linux, OS X, etc.




ImageMagick supports PDFs, and its convert tool has a -resize option. I haven't used it for this myself, but it is worth experimenting with.


You can also use the compress option (there's an example here):



Rotate a PDF


$ convert -rotate 270 -density 300x300 -compress lzw in.pdf out.pdf

This assumes a TIFF-backed PDF. The density parameter is important because otherwise ImageMagick down-samples the image (for some reason). Adding in the compression option helps keep the overall size of the PDF smaller, with no loss in quality.



For multipage PDFs, you may want to split the file into single pages with pdftk, use mogrify from ImageMagick to convert each page in place, then join the pages again:



$ pdftk in.pdf burst
$ mogrify -rotate 270 -density 300x300 -compress lzw pg_*.pdf
$ pdftk pg*.pdf cat output out.pdf
$ rm pg*.pdf




To convert PDF files with ImageMagick, you need to have GhostScript installed.




ImageMagick can convert multipage PDFs. While mogrify will convert in place, I recommend you use convert so you can keep the originals in case of accident.




I've done some testing on your provided sample PDF. This worked quite well for me:


convert -density 200 -compress jpeg -quality 20 test.pdf test2.pdf

Density defaults to 72 DPI. By setting it higher we can get a higher resolution and therefore acceptable quality. It looked alright at 150, and was a little smaller, but if you want to cater for a range of PDFs 200 should work.


The JPEG quality should either be chosen automatically or default to 92, on a scale of 1 to 100 where 100 is best. With it set to 20, the result looks almost as good as the original (a little fuzzier, and the small text at the bottom is a little hard to read, but it was hard to read in the original too).


These options bring your 1.7MB sample down to 0.5MB, while keeping it readable. You can experiment a little.
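To apply the same settings to a whole folder, a loop along these lines might work. This is only a sketch: the function name compress_pdfs, the CONVERT override variable, and the directory names are made up for illustration, and it assumes ImageMagick and GhostScript are installed. It writes to a separate output folder so the originals stay untouched.

```shell
#!/bin/sh
# Sketch: batch-compress every PDF in one directory into another.
# CONVERT is a hypothetical override for this example; it defaults to
# ImageMagick's convert command.
CONVERT="${CONVERT:-convert}"

compress_pdfs() {
    src_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory holds no PDFs.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Same options as the single-file command; input files are not modified.
        "$CONVERT" -density 200 -compress jpeg -quality 20 "$f" "$out_dir/$name"
    done
}
```

For example, compress_pdfs ./scans ./compressed would read ./scans/*.pdf and write the smaller copies to ./compressed under the same file names.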


If you want a smaller size (both of the file and of the image/PDF), you can use -resize #%, e.g. -resize 75%. On your example PDF, this makes the small print at the bottom pretty much unreadable, though.


If you're still tight for space, especially for the multipage PDFs, you could compress further by adding the files to a ZIP (or other) archive. This brought the file size down to 0.43MB on that test PDF (reducing the JPEG compression quality has a much more drastic effect). You could also split the PDF file into pages with pdftk, as @glallen suggested in his edit, or split the archive and recombine at the other end.
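To see which files are still over the 2MB limit after compressing, something like the following could help. It is a rough sketch: the function name list_oversized is invented here, and it treats 2MB as 2048 KiB, which may not match exactly how the attachment limit is measured.

```shell
#!/bin/sh
# Sketch: list PDFs in a directory tree that are larger than 2 MiB.
list_oversized() {
    dir="${1:-.}"
    # -size +2048k matches files bigger than 2048 blocks of 1 KiB.
    find "$dir" -type f -name '*.pdf' -size +2048k
}
```

Running list_oversized ./compressed prints one path per line for each file that still needs a lower -quality or -density setting.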


2MB is also a rather small attachment limit, so you may want to look into other email providers. From memory, Gmail allows over 10MB per email.


These options, and more, are fully documented on the ImageMagick website.

