Tuesday, June 26, 2018

linux - BTRFS filesystem, compression and copy on write


I'm planning to move from ext4 to BTRFS on my PC and backup server.
I'm very interested in the BTRFS snapshot feature (which, as I understand it, relies on copy-on-write) and in the compression feature, but I have some questions:


How do compression and COW interact?
Does filesystem compression penalize the efficiency of COW?
Has anyone here used both features, COW and compression, with BTRFS and can confirm that they work fine together?


Update: I have found this answer in the btrfs wiki:


"How does compression interact with direct IO or COW?
Compression does not work with DIO, does work with COW and does not work for NOCOW files. If a file is opened in DIO mode, it will fall back to buffered IO."
It looks like compression and COW do work together.


Has anyone used COW and compression together in production, e.g. multiple snapshots of compressed folders?


Answer



BTRFS compression is designed to fit in efficiently with the whole CopyOnWrite design. I use both together and I can confirm they haven't caused me any problems.


How they work together: file data in BTRFS is stored in extents, which are long runs of sequential blocks. Blocks are all the same size, usually 4K, while extents vary in size depending on the file and the available free space. So for example, a file that is 1M in size (256 blocks of 4K) could be one extent of 256 blocks, or two extents of 113 blocks and 143 blocks, or dozens of extents of different sizes, in any combination. If you change one byte in the middle of the file, BTRFS does not overwrite it in place; it copies the data that contains the changed byte to a new location. It might create a whole new extent, or it might split that extent into three: two on either side of the change, which still point at the original unchanged blocks, and one holding the new data.
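The three-way split described above can be sketched as a toy model. This is only an illustration, not BTRFS's actual on-disk logic: the `Extent` tuple, the `cow_write` helper, and the disk block numbers (1000, 5000) are all hypothetical.

```python
from typing import List, NamedTuple


class Extent(NamedTuple):
    file_block: int  # first file block covered by this extent
    length: int      # number of 4K blocks in the extent
    disk_block: int  # first on-disk block (made-up address)


def cow_write(extents: List[Extent], target: int, new_disk_block: int) -> List[Extent]:
    """Return a new extent map after rewriting one file block copy-on-write."""
    result = []
    for ext in extents:
        end = ext.file_block + ext.length
        if not (ext.file_block <= target < end):
            result.append(ext)  # untouched extent, reused as-is
            continue
        before = target - ext.file_block  # blocks left of the change
        after = end - target - 1          # blocks right of the change
        if before:
            result.append(Extent(ext.file_block, before, ext.disk_block))
        # The changed block goes to a new location; the old block is not overwritten.
        result.append(Extent(target, 1, new_disk_block))
        if after:
            result.append(Extent(target + 1, after, ext.disk_block + before + 1))
    return result


# A 1M file stored as one extent of 256 blocks; rewrite block 128.
original = [Extent(0, 256, 1000)]
updated = cow_write(original, 128, 5000)
for ext in updated:
    print(ext)
```

The old map in `original` is left intact, which is roughly what lets a snapshot keep pointing at the previous version of the file.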


The way compression fits in, according to the btrfs wiki, is that compression is done on a block-by-block (4K size) basis, in chunks of up to 128K. So the file is not stored as one long compressed stream; it is stored as a series of independently compressed chunks. When you change one byte in the middle of the file, the majority of the compressed chunks are untouched. The compressed block, and possibly a few blocks around it up to 128K, are copied and recompressed, and the extent list is updated like any other COW write. On today's systems, compressing 4K or even 128K is trivial, so the performance penalty is negligible.
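A rough way to see why a one-byte change is cheap: compress a buffer in independent 128K chunks and recompress only the chunk that contains the change. This is a sketch in plain Python with `zlib`, not how BTRFS is implemented; the `compress_chunks` and `cow_update` helpers are hypothetical names.

```python
import zlib

CHUNK = 128 * 1024  # compression unit size, taken from the description above


def compress_chunks(data: bytes) -> list:
    """Compress each 128K chunk independently."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]


def cow_update(data: bytes, chunks: list, offset: int, value: int):
    """Change one byte and recompress only the chunk that contains it."""
    new_data = data[:offset] + bytes([value]) + data[offset + 1:]
    idx = offset // CHUNK
    new_chunks = list(chunks)  # shallow copy: unchanged chunks are shared, not copied
    new_chunks[idx] = zlib.compress(new_data[idx * CHUNK:(idx + 1) * CHUNK])
    return new_data, new_chunks


data = bytes(range(256)) * 4096  # 1 MiB of sample data
old_chunks = compress_chunks(data)
new_data, new_chunks = cow_update(data, old_chunks, 512 * 1024 + 7, 0xFF)

# Indices of chunks that are no longer the same object as in the old list.
changed = [i for i, (a, b) in enumerate(zip(old_chunks, new_chunks)) if a is not b]
print(len(old_chunks), changed)
```

Only one of the eight chunks is replaced, and `old_chunks` still decompresses to the original data, which mirrors how an older snapshot keeps referring to the old compressed extents.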


Since adjusting the extent map of the file is a normal part of COW functionality, it makes no significant difference whether some of the 4K blocks are compressed or uncompressed. (In fact, in BTRFS, a file can include any combination of uncompressed blocks, ZLIB compressed blocks, and LZO compressed blocks, depending on which compression option was active in the filesystem when each part of the file was updated.)
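The idea of mixing compression types inside one file can be sketched by tagging each chunk with the codec that wrote it. This is a toy model with hypothetical names (`CODECS`, `write_chunk`, `read_file`); Python's standard library has no LZO, so `bz2` is used purely as a stand-in for it.

```python
import bz2
import zlib

# codec name -> (compress, decompress)
CODECS = {
    "none": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
    "lzo-standin": (bz2.compress, bz2.decompress),  # bz2 substitutes for LZO here
}


def write_chunk(payload: bytes, codec: str):
    """Store a chunk together with the name of the codec active at write time."""
    compress, _ = CODECS[codec]
    return (codec, compress(payload))


def read_file(chunks) -> bytes:
    """Decompress each chunk with its own codec and concatenate the results."""
    return b"".join(CODECS[codec][1](blob) for codec, blob in chunks)


# Three parts of one file, each written under a different compression setting.
file_chunks = [
    write_chunk(b"A" * 4096, "none"),
    write_chunk(b"B" * 4096, "zlib"),
    write_chunk(b"C" * 4096, "lzo-standin"),
]
print([codec for codec, _ in file_chunks], len(read_file(file_chunks)))
```

Because each chunk carries its own tag, reading does not care which setting was in effect when a given part was written.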


I haven't done any exhaustive study or measurements; it "just works" like I expected it to.

