Friday, November 10, 2017

linux - Does specifying the compression engine to tar directly actually use less intermediate disk space than tarring first and then compressing?




When I use tar to archive a directory and then compress it separately using e.g. xz, there will be a point where I have three files on my system - dir, dir.tar and dir.tar.xz. As soon as the compression is completed, dir.tar is deleted, but it seems like I must still make sure I have enough free disk space to accommodate all three files in this setup.



When using the compression flag with tar directly, the compressed file is created without an observable .tar intermediate, and it appears I only need free space equal to the directory and the compressed file.



I was initially hypothesizing that maybe the tar archive was created and deleted bit by bit as it was compressed, but at the same time, I remember reading somewhere that the entire tar archive needs to be created before compression. I can't observe any temporary tar file, hidden or not.



Does using tar with a compression flag actually need less free disk space than first using tar and then a compression utility? Why/why not (maybe a step by step of what tar+compression flag does)?


Answer



Yes. Using the compression flags in the tar command directly (e.g., tar czf) reduces intermediate disk usage, because tar does not create a temporary uncompressed .tar file. Instead, it uses a pipe to pass its stdout directly to the stdin of the compression utility.
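To make the pipe explicit, here is a minimal sketch that builds the same archive both ways: once with the z flag and once by piping tar's stdout into gzip by hand. The scratch directory and the file names (dir, a.txt, b.txt, flag.tgz, pipe.tgz) are made up for illustration, and it assumes a tar that accepts -z and -C, such as GNU tar or bsdtar.

```shell
# Hypothetical scratch directory and file names, for illustration only.
work=$(mktemp -d)
mkdir "$work/dir"
printf 'hello\n' > "$work/dir/a.txt"
printf 'world\n' > "$work/dir/b.txt"

# One step: tar starts gzip itself and feeds it through a pipe.
tar -czf "$work/flag.tgz" -C "$work" dir

# The same thing spelled out: tar writes the archive to stdout (-f -)
# and gzip reads it from stdin. No dir.tar is ever written to disk.
tar -cf - -C "$work" dir | gzip > "$work/pipe.tgz"

# List the members of each archive so they can be compared.
flag_list=$(tar -tzf "$work/flag.tgz")
pipe_list=$(tar -tzf "$work/pipe.tgz")
printf '%s\n' "$flag_list"
```

In both cases the only new file on disk is the compressed archive; the uncompressed tar stream exists only in the pipe buffer between the two processes.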




Depending on how pipes are implemented on your particular system, tar might appear to be writing a file, but that file will actually be a FIFO queue with no appreciable space consumption.



Without the flag:
Files > tar: disk holds the original files + a .tar of about the same size
.tar > gzip: disk holds the original files + .tar + .tgz
Peak disk usage, just before the .tar is deleted, is 2-3x the size of the original files, depending on the compression ratio.



With the flag:
Files > tar > gzip: disk holds the original files + .tgz
Worst-case usage is about 2x the size of the original files, which happens when the data barely compresses.
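
As a rough worked example of the two cases, the arithmetic can be sketched as below. The numbers are invented: a 1000 MB directory that compresses to 40% of its size, ignoring tar's small header and padding overhead.

```shell
orig=1000   # hypothetical size of the directory, in MB
ratio=40    # hypothetical compressed size, as a percent of the original

compressed=$(( orig * ratio / 100 ))

# Two-step: original files + .tar (about the same size) + .tgz
two_step_peak=$(( orig + orig + compressed ))

# With the flag: original files + .tgz only
one_step_peak=$(( orig + compressed ))

echo "two-step peak: ${two_step_peak} MB"
echo "one-step peak: ${one_step_peak} MB"
```

With these assumed numbers the two-step approach peaks at 2400 MB and the single tar command at 1400 MB; the difference is always the size of the intermediate .tar.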

