author | David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 2019-07-27 00:19:13 +0200 |
---|---|---|
committer | David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 2019-07-28 16:33:57 +0200 |
commit | 256c2458a4fa298c876d8e4a4450cb9a0834b877 (patch) | |
tree | b8e619b55d0bd497010effce5a475b960d5bb845 /lib/tar/cleanup.c | |
parent | cce36f459ddb5698fd1a40061c466996482146eb (diff) |
Implement data block deduplication
The strategy is as follows:
- At the beginning of every file, remember the current position in the image
- Once a file is done, scan the list of existing files for the following:
  - Look for an existing file that has a block with the same size and
    checksum as the first non-sparse block of the current file
  - After that, every block in the current file has to match, in size and
    checksum, the blocks of the file that we found, from that point onward
  - Sparse blocks in either file are skipped
- If we found a match, we update the current file to point to the first
  matching block and rewind the squashfs image to remove the newly written
  data
This strategy should in theory be able to find an existing file where the
on-disk data *contains* the on-disk data of the current file.
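The matching step described above can be sketched roughly as follows. This is
an illustration only, not the code from this commit: the `blk_t` type, the
convention that `size == 0` marks a sparse block, and the function name
`find_dedup_start` are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block descriptor; size == 0 is assumed to mark a sparse block. */
typedef struct {
    uint32_t size;
    uint32_t checksum;
} blk_t;

static int is_sparse(const blk_t *b)
{
    return b->size == 0;
}

static int same_block(const blk_t *a, const blk_t *b)
{
    return a->size == b->size && a->checksum == b->checksum;
}

/*
 * Search the block list of an existing file (old) for a run that matches
 * every non-sparse block of the current file (cur), skipping sparse blocks
 * on both sides. Returns the index in old[] of the first matching block,
 * or -1 if no such run exists.
 */
long find_dedup_start(const blk_t *old, size_t old_n,
                      const blk_t *cur, size_t cur_n)
{
    size_t first = 0, start, i, j;

    /* Locate the first non-sparse block of the current file. */
    while (first < cur_n && is_sparse(&cur[first]))
        ++first;
    if (first == cur_n)
        return -1; /* only sparse blocks, nothing on disk to deduplicate */

    for (start = 0; start < old_n; ++start) {
        if (is_sparse(&old[start]) || !same_block(&old[start], &cur[first]))
            continue;

        /* Candidate found; compare the remaining blocks from here onward. */
        i = first + 1;
        j = start + 1;
        for (;;) {
            while (i < cur_n && is_sparse(&cur[i]))
                ++i;
            if (i == cur_n)
                return (long)start; /* every block of cur matched */
            while (j < old_n && is_sparse(&old[j]))
                ++j;
            if (j == old_n || !same_block(&old[j], &cur[i]))
                break; /* mismatch, try the next candidate */
            ++i;
            ++j;
        }
    }
    return -1;
}
```

On a non-negative result, a caller would point the current file at the
location of `old[start]` and rewind the image to the position remembered when
the file was started. The check for `i == cur_n` comes before the check on
`old`, so a match is also reported when the current file ends before the
existing one does, which is the "contains" case mentioned above.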
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>