When I use tar to archive a directory and then compress it separately using e.g. xz, there will be a point where I have three files on my system: dir, dir.tar and dir.tar.xz. As soon as the compression is completed, dir.tar is deleted, but it seems like I must still make sure I have enough free disk space to accommodate all three files in this setup.
When using the compression flag with tar directly, the compressed file is created without an observable .tar intermediate, and it appears I only need free space equal to the directory and the compressed file.
I was initially hypothesizing that maybe the tar archive was created and deleted bit by bit as it was compressed, but at the same time, I remember reading somewhere that the entire tar archive needs to be created before compression. I can't observe any temporary tar file, hidden or not.
Does using tar with a compression flag actually need less free disk space than first using tar and then a compression utility? Why or why not (maybe a step-by-step of what tar with a compression flag does)?
Answer
Yes, using the compression flags in the tar command directly (e.g., tar czf) will reduce intermediate disk usage: it does not create any temporary uncompressed tar file, but rather streams the archive to the compressor as it is produced, typically through a pipe from tar's output to the compression utility's stdin.
Depending on how pipes are implemented on your particular system, tar might appear to be writing a file, but that file will actually be a FIFO queue with no appreciable space consumption.
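To make the streaming behaviour concrete, here is a minimal sketch that spells the pipe out by hand and compares it with the flag form. The directory and file names (workdir, dir, a.txt, piped.tar.gz, flag.tar.gz) are made up for illustration, and gzip stands in for whichever compressor you use. How a given tar implementation wires up the compressor internally may differ, so treat the explicit pipeline as an equivalent, not as a description of tar's internals.

```shell
# Create a small sample directory to archive (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/dir"
printf 'hello\n' > "$workdir/dir/a.txt"

# Explicit pipeline: tar writes the archive to stdout (-f -),
# and gzip compresses the stream as it arrives. No dir.tar is written to disk.
tar -C "$workdir" -cf - dir | gzip > "$workdir/piped.tar.gz"

# Flag form: tar handles the compression step itself.
tar -C "$workdir" -czf "$workdir/flag.tar.gz" dir

# Both archives should list the same members.
tar -tzf "$workdir/piped.tar.gz"
tar -tzf "$workdir/flag.tar.gz"
```

In both cases the only new file on disk is the compressed archive; the uncompressed tar stream exists only in the pipe buffer.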
Without the flag:
Files > tar: original files + a .tar of about the same size
.tar > gzip: original files + .tar + .tgz
Total disk usage just before deleting the .tar is 2-3x the size of the original files, depending on the compression ratio.
With the flag:
Files > tar > gzip: original files + .tgz
Worst-case usage is 2x the size of the original files (when the data barely compresses).
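The accounting above can be checked on a throwaway sample. This is only a rough sketch: the payload (about 1 MiB of zeros) and all names (work, dir, data.bin, flag.tgz) are assumptions chosen for illustration, and zeros compress far better than typical data, so the gap will look larger here than in practice. gzip -c is used so the intermediate .tar stays on disk and the peak can be measured.

```shell
work=$(mktemp -d)
mkdir "$work/dir"
# Sample payload: about 1 MiB of zeros (highly compressible, illustration only).
head -c 1048576 /dev/zero > "$work/dir/data.bin"

# Two-step approach: the uncompressed .tar sits on disk next to the originals
# while the compressed copy is being written.
tar -C "$work" -cf "$work/dir.tar" dir
gzip -c "$work/dir.tar" > "$work/dir.tar.gz"

# Flag approach: only the compressed archive is written.
tar -C "$work" -czf "$work/flag.tgz" dir

tar_bytes=$(wc -c < "$work/dir.tar")
gz_bytes=$(wc -c < "$work/dir.tar.gz")
flag_bytes=$(wc -c < "$work/flag.tgz")

# Extra space needed on top of the original files at the worst moment.
echo "two-step peak extra: $((tar_bytes + gz_bytes)) bytes"
echo "with flag extra:     $((flag_bytes)) bytes"
```

The difference between the two figures is roughly the size of the uncompressed .tar, which is the space the flag form saves.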