path: root/lib/sqfs/data_writer.c
Age    Commit message    Author
2019-08-23Some simple search/replace cases for allocationDavid Oberhollenzer
This commit exchanges some malloc(x + y * z) patterns that can be found with a simple git grep and are obvious candidates for the new allocation wrappers. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
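A minimal sketch of the kind of wrapper such calls move to, assuming an overflow-checked flex-array allocator; the helper name and signature here are assumptions, not the actual identifiers in the tree:

    #include <stdlib.h>
    #include <stdint.h>

    /* hypothetical overflow-checked replacement for malloc(x + y * z) */
    static void *alloc_flex(size_t base, size_t item_size, size_t count)
    {
        /* reject base + item_size * count overflowing size_t */
        if (item_size > 0 && count > (SIZE_MAX - base) / item_size)
            return NULL;

        return calloc(1, base + item_size * count);
    }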
2019-08-19Fix memory leak in data writer fragment deduplicationDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-19Fix memory leak in data writer error code pathsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Replace update_crc32 helper function with crc32 from zlibDavid Oberhollenzer
The zlib implementation is heavily optimized, and since we already use zlib anyway, why not use zlib crc32? This also makes zlib a hard dependency, which means the whole "do we have a compressor" sanity check in the build system can be removed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
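For reference, computing a checksum with zlib's crc32 looks like this; the seed is obtained by calling crc32 with a NULL buffer:

    #include <zlib.h>
    #include <stddef.h>

    /* CRC32 of a buffer using zlib: seed, then feed the data */
    static unsigned long checksum_buffer(const unsigned char *data, size_t size)
    {
        unsigned long crc = crc32(0L, Z_NULL, 0);

        return crc32(crc, data, (uInt)size);
    }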
2019-08-18Make data writer use block processorDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Restructure data writer around passing block_t structuresDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Minor interface change to data writerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18cleanup: internalize deduplication list in data_writerDavid Oberhollenzer
This change removes the need for passing a list of files around for deduplication. The deduplication code also no longer needs to worry about ordering, since the file being deduplicated is only added to the list after deduplication is done. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Add proper copyright headers to all source filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29Fix order of data block deduplicationDavid Oberhollenzer
Data blocks need to be deduplicated before attempting to write a fragment. In the current implementation, if the data blocks are found to be duplicates but the fragment isn't, the flushed fragments are purged as well, possibly damaging other files. Also, when deduplication happens, the HAS_FRAGMENT flag needs to be set, otherwise the deduplication code thinks that there is one more block than there actually is. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29Cleanup: move deduplication code from data writer to fstreeDavid Oberhollenzer
Since it is actually completely independent of libsqfs and only works on file_info_t lists, it can be safely moved over to libfstree, and the data writer becomes less cluttered as a result. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix used bytes accounting when deduplicating file blocksDavid Oberhollenzer
If an entire file is eliminated, we need to reset the "used_bytes" counter, otherwise all the table positions are way off. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Add general purpose flags field to file_info_tDavid Oberhollenzer
Simplifies some tasks if we can just add a flag indicating that a file has a fragment or that it has already been detected as a duplicate. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
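A minimal sketch of what such a flags field could look like; the flag names and struct layout are assumptions, not the actual identifiers:

    #include <stdint.h>

    /* hypothetical flag bits for file_info_t */
    #define FILE_FLAG_HAS_FRAGMENT (1 << 0)
    #define FILE_FLAG_IS_DUPLICATE (1 << 1)

    typedef struct {
        /* ...existing fields... */
        uint32_t flags;  /* bitwise OR of FILE_FLAG_* values */
    } file_info_t;

Setting and testing is then a simple bit operation, e.g. info->flags |= FILE_FLAG_HAS_FRAGMENT.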
2019-07-28Implement data block deduplicationDavid Oberhollenzer
The strategy is as follows (a simplified sketch of the matching step follows below):

- At the beginning of every file, remember the current position.
- Once a file is done, scan the list of existing files for the following:
  - Look for an existing file that has a block with the same size and checksum as the first non-sparse block of the current file.
  - From that point onward, every block in the current file has to match the corresponding block of the file we found in size and checksum.
  - Sparse blocks in either file are skipped.
- If we found a match, update the current file to point to the first matching block and rewind the squashfs image to remove the newly written data.

This strategy should in theory be able to find an existing file where the on-disk data *contains* the on-disk data of the current file.

Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
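The block matching could look roughly like this; the block metadata type and all names are assumptions for illustration:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* hypothetical per-block metadata; the real layout differs */
    typedef struct {
        uint32_t size;   /* on-disk size, 0 for sparse blocks */
        uint32_t crc32;
    } blk_t;

    /* check whether "target", starting at index "start", contains the
       blocks of "cur", skipping sparse (size == 0) blocks on both sides */
    static bool blocks_match(const blk_t *target, size_t tcount, size_t start,
                             const blk_t *cur, size_t ccount)
    {
        size_t i = start, j = 0;

        while (j < ccount) {
            if (cur[j].size == 0) { ++j; continue; }
            if (i < tcount && target[i].size == 0) { ++i; continue; }
            if (i >= tcount)
                return false;
            if (target[i].size != cur[j].size ||
                target[i].crc32 != cur[j].crc32)
                return false;
            ++i;
            ++j;
        }
        return true;
    }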
2019-07-28Implement fragment deduplication in data writerDavid Oberhollenzer
The strategy is simple:

- The data writer functions that write data/fragment blocks get access to the list of files.
- When writing a fragment, we look for an already written file that has a fragment with the same size and checksum.
- If we find one, we throw away the fragment and reuse the existing one.

Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
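The lookup itself boils down to a linear scan like the following; the struct fields and function name are assumptions:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct file_info file_info_t;

    /* hypothetical subset of file_info_t */
    struct file_info {
        file_info_t *next;
        uint32_t frag_size;
        uint32_t frag_crc32;
    };

    /* find an already written file with a matching fragment */
    static file_info_t *find_fragment(file_info_t *list,
                                      uint32_t size, uint32_t crc)
    {
        for (; list != NULL; list = list->next) {
            if (list->frag_size == size && list->frag_crc32 == crc)
                return list;
        }
        return NULL;
    }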
2019-07-28Unify common file start/end code from data writer in helper functionsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Compute per-block and per-fragment checksums in data writerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Add fragment and block checksum fields to file_info_tDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25Cleanup sqfs_write_tableDavid Oberhollenzer
This commit attempts to make the generic table writer more readable. A few changes are made, including heap allocation of the block list. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24Enable largefile supportMatt Turner
Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
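For illustration, the resulting include order in a header looks like this (the header name is hypothetical):

    /* example.h -- config.h comes first so _FILE_OFFSET_BITS is set
       before any system header defines off_t or struct stat */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H

    #include "config.h"

    #include <sys/types.h>
    #include <sys/stat.h>

    /* declarations that use off_t / struct stat ... */

    #endif /* EXAMPLE_H */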
2019-07-16cleanup: move error handling into read_retryDavid Oberhollenzer
If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
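A sketch of what the combined helper could look like, assuming POSIX read(2); the error reporting details are illustrative, not the actual implementation:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* read exactly "size" bytes or report an error; EOF is an error too */
    static int read_data(const char *errstr, int fd, void *buffer, size_t size)
    {
        char *ptr = buffer;
        ssize_t ret;

        while (size > 0) {
            ret = read(fd, ptr, size);
            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                perror(errstr);
                return -1;
            }
            if (ret == 0) {
                fprintf(stderr, "%s: unexpected end of file\n", errstr);
                return -1;
            }
            ptr += ret;
            size -= (size_t)ret;
        }
        return 0;
    }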
2019-07-16cleanup: move error handling into write_retryDavid Oberhollenzer
If write_retry fails to write everything, it is *always* an error. This commit renames write_retry to write_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
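The write side is analogous; again a sketch under the same assumptions, not the actual implementation:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* write exactly "size" bytes; a short write is always an error */
    static int write_data(const char *errstr, int fd,
                          const void *buffer, size_t size)
    {
        const char *ptr = buffer;
        ssize_t ret;

        while (size > 0) {
            ret = write(fd, ptr, size);
            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                perror(errstr);
                return -1;
            }
            if (ret == 0) {
                fprintf(stderr, "%s: truncated write\n", errstr);
                return -1;
            }
            ptr += ret;
            size -= (size_t)ret;
        }
        return 0;
    }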
2019-07-15Add flags to data writer to micro manage behaviourDavid Oberhollenzer
The added flags allow controlling the following on a per file level:

- forcing a file to be written uncompressed
- forcing a file to not have a fragment, i.e. the last truncated block actually being written as a block
- padding a file to be aligned to device block size

The flags are not yet exposed to anything user controllable (such as command line flags).

Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
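A sketch of how such flags could be declared; the names and values are illustrative assumptions, not the actual identifiers:

    /* hypothetical per-file data writer flags */
    #define DW_DONT_COMPRESS  0x01  /* store all blocks of the file uncompressed */
    #define DW_DONT_FRAGMENT  0x02  /* write the truncated tail as a block */
    #define DW_ALIGN_DEVBLK   0x04  /* pad so the file is aligned to device blocks */

Callers would then OR the desired flags together when handing a file to the writer.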
2019-07-12Simplify writer code for files without a fragmentDavid Oberhollenzer
The data writer sparse block code can take advantage of the fact that it can add a block size instead of a fragment and doesn't have to initialize the fragment location. In return, the tree node to inode serialization code doesn't need a special case for sparse files anymore and can now also handle files that are forced to not have a fragment. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-30Add support for repacking condensed sparse filesDavid Oberhollenzer
This commit broadly does the following things:

- Rename and move the sparse mapping structure to libutil.
- Add a function to the data writer for writing condensed versions of sparse files, given the mapping. This shares code with the already existing function for regular files; the shared code is moved to a common helper function.
- Add support to tar2sqfs for repacking sparse files.

Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
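The mapping structure could look like this; the field names are assumptions based on the description above:

    #include <stdint.h>

    /* hypothetical sparse mapping: one entry per chunk of actual data
       in the logically expanded file */
    typedef struct sparse_map {
        struct sparse_map *next;
        uint64_t offset;  /* logical offset of the data chunk */
        uint64_t count;   /* number of bytes of data at that offset */
    } sparse_map_t;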
2019-06-28Add support for packing sparse filesDavid Oberhollenzer
This commit adds support for packing sparse files into squashfs images as follows:

- In the data writer: simply detect zero blocks and write a zero to the block size field and don't emit any data. Record the number of bytes saved this way. For fragments, set the fragment offset to invalid.
- In the inode writer: write out the number of bytes saved for sparse files. If there should be a fragment but there is none, append a block count of 0.
- In the data reader: if the block size is 0, read nothing from disk and emit an empty block. Do the same if the fragment is missing.
- In the inode reader: restore the number of bytes saved for sparse files.

The sparse files can be packed and unpacked, but the unpacking will not create sparse files for now.

Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
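The zero-block detection is conceptually as simple as the following; the function name is an assumption:

    #include <stdbool.h>
    #include <stddef.h>

    /* a block of all zero bytes is stored with size 0 and no data */
    static bool is_zero_block(const unsigned char *blk, size_t size)
    {
        while (size--) {
            if (*(blk++) != 0)
                return false;
        }
        return true;
    }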
2019-06-28Omit fragment table if there really are no fragmentsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-11Move data writer to libsqfs.aDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>