Sunday, February 24, 2019

Zip/gzip compression ratio much less in Linux than in Windows


I have a number of very large files on a Linux machine that I would like to compress to save space. I tried the tar/gzip combination and found that the compression ratio is poor: a 1.2 GB file was compressed to 1.1 GB. I tried increasing the compression level as suggested in "How to specify level of compression when using tar -zcvf?", but the result was no better. I then copied the same file to a Windows machine and ran WinRAR on it. The resulting compressed file was only 0.45 GB.


Is there a reason for such a large discrepancy? Is there a better compression tool for Linux?


UPDATE: I have also tried lzma, and the result is still not much better.


Answer



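Gzip generally compresses less well than RAR. It uses the DEFLATE algorithm, which only looks back 32 KB for repeated data, whereas RAR can use a considerably larger dictionary.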


A common alternative on Linux these days is bzip2, which is installed by default on almost all Linux distributions.


You can switch tar to bzip2 compression by changing your command line to tar -cvjf rather than tar -cvzf; the key is replacing the z with j in the options.


This should yield a better compression ratio, although how much better depends on the data.
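The z-to-j switch can also be tried from a script. The following is a minimal sketch using Python's standard tarfile module, where the modes "w:gz" and "w:bz2" play roughly the roles of tar's z and j options. The helper name make_archive and the sample file are made up for illustration, and the sizes you see will depend entirely on your own data.

```python
import os
import tarfile
import tempfile


def make_archive(source_path, archive_path, mode):
    """Pack source_path into a tar archive and return the archive size in bytes.

    mode is a tarfile mode string: "w:gz" (gzip, like tar -z)
    or "w:bz2" (bzip2, like tar -j).
    """
    with tarfile.open(archive_path, mode) as tar:
        # Store only the file name, not the full temporary path.
        tar.add(source_path, arcname=os.path.basename(source_path))
    return os.path.getsize(archive_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical sample input: a repetitive text log.
        sample = os.path.join(workdir, "sample.log")
        with open(sample, "w") as fh:
            for i in range(200_000):
                fh.write(f"line {i}: status=ok\n")

        gz_size = make_archive(sample, os.path.join(workdir, "sample.tar.gz"), "w:gz")
        bz_size = make_archive(sample, os.path.join(workdir, "sample.tar.bz2"), "w:bz2")

        print("original:", os.path.getsize(sample), "bytes")
        print("tar + gzip:", gz_size, "bytes")
        print("tar + bzip2:", bz_size, "bytes")
```

Running both modes against a representative file is a quick way to check whether bzip2 actually helps before recompressing everything.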


The discrepancy arises because these are fundamentally different compression algorithms. Gzip is an older algorithm, and older algorithms tend to be less computationally intensive so that they could finish in a reasonable time on the hardware of their day. As processing power has become more readily available, better and more computationally intensive algorithms can be used that finish in about the time an older algorithm took on an older computer. Conversely, the older algorithms complete much faster on a newer computer.
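To see how much the choice of algorithm can matter, here is a small sketch that compresses the same bytes with three codecs from Python's standard library. RAR is not available there, so lzma stands in as an example of a newer, larger-dictionary algorithm. The input is synthetic and chosen on purpose: a 1 MiB random block repeated twice, so the only redundancy lies 1 MiB back, well outside DEFLATE's 32 KB window. This illustrates one possible mechanism and is not a diagnosis of the file in the question; the function name compressed_sizes is made up for this example.

```python
import bz2
import lzma
import os
import zlib


def compressed_sizes(data):
    """Return the compressed size in bytes of data under three stdlib codecs."""
    return {
        "deflate": len(zlib.compress(data, 9)),   # same algorithm gzip uses
        "bzip2": len(bz2.compress(data, 9)),      # works on blocks of up to 900 kB
        "lzma": len(lzma.compress(data)),         # default preset, multi-megabyte dictionary
    }


if __name__ == "__main__":
    # Assumption for the demo: random bytes are incompressible on their own,
    # so any saving must come from spotting the repeat 1 MiB earlier.
    block = os.urandom(1024 * 1024)
    data = block * 2

    print("original:", len(data), "bytes")
    for name, size in compressed_sizes(data).items():
        print(f"{name}: {size} bytes ({size / len(data):.0%} of original)")
```

On input like this, DEFLATE cannot see the repetition at all, while a codec with a large enough dictionary can remove roughly half of the data. Real files sit somewhere in between, which is why testing on a sample of your own data is more informative than any general ranking.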


Almost any Windows archiver has an equivalent on Linux. 7-Zip is a nice archiver that gets good results on Windows and has an unofficial Linux version (p7zip).

