summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2019-08-19Minor improvements for parallel block processorDavid Oberhollenzer
- Fewer lock aquires in worker function - There is no point in locking/unlocking for inserting the completed block if we are going to lock again immediately in the next iteration -> Merge those two critical sections into one - Constant time queue insertion - Bypass queue entirely if there is nothing to do for a block Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-19Fix memory leak in data writer fragment deduplicationDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-19Fix memory leak in data writer error code pathsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-19Fix memory leak in dir-scan error code pathDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Replace update_crc32 helper function with crc32 from zlibDavid Oberhollenzer
It is optimized to the maximum and if we already use zlib anyway, why not use zlib crc32? This also makes zlib a hard dependency which also means the whole "do we have a compressor" sanity check in the build system can be removed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Make data writer use block processorDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Restructure data writer around passing block_t structuresDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Minor interface change to data writerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18cleanup: internalize deduplication list in data_writerDavid Oberhollenzer
This change removes the need for passing a list of files around for deduplication. Also the deduplication code no longer needs to worry about order, since the file being deduplicated is only added after deduplication is done. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Add pthread based, parallel block processor implementationDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Add block processor data structureDavid Oberhollenzer
The interface is designed for parallel, asynchronuous processing of data blocks with an I/O callback that handles the serialized result. The underlying implementation is currently still synchronuous. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-16Add deep-copy function to compressor interfaceDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-16Fix: don't try to read xattrs if there are noneDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-11Add gensquashfs option to read xattrs from input filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-11Add --one-file-system option to gensquashfsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-11Replace fstree_from_dir boolean with flag fieldDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07Add pread(2) like function to data_readerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07Fix forward seek when unpacking sparse filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07Fix zero padding of extracted data blocksDavid Oberhollenzer
Only padd it if the *extracted* size is less then block size. Doing it with the compressed size results in garbled blocks. Especially because most of them are less than block size when compressed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-05cleanup data readerDavid Oberhollenzer
- Split block reading code out from "dump_blocks" into precache_data_block, similar to precache_fragment_block - Merge the code paths for fragment/data block reading and uncompression Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-05cleanup: unify all the code that reads squashfs imagesDavid Oberhollenzer
This commit creates a new data structure called 'sqfs_reader_t' that takes care of all the repetetive tasks like opening the file, reading the super block, creating the compressor, deserializing an fstree and creating a data reader. This in turn makes it possible to remove all the duplicate code from rdsquashfs and sqfs2tar. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-04Improve file unpacking orderDavid Oberhollenzer
This commit moves the file unpacking order & job scheduling to a libfstree function. The ordering is improved by making sure fragment blocks are not extracted more than once and files with data blocks are extracted in order. This way, serial unpacking of a 2GiB Debian live image could be reduced from ~5' on my test machine to ~3.5', whereas parallel unpacking stays roughly the same (~3' for -j 4). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-04Fix functions with side effect being used inside assertsDavid Oberhollenzer
If -DNDEBUG is set, the entire thing is omitted from the output. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-03Fix tar header error reporting on 32 bit systemsDavid Oberhollenzer
If an extension header is rejected because its too big, the error path would print the size as size_t, altough it is an uint64_t. On 64 bit systems, this works because size_t is a 64 bit unsigned integer, on 32 bit systems, not so much. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-03cleanup: remove left over atime/ctime codeDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02Fix explicit NULL dereference in deserialize_fstree failure pathDavid Oberhollenzer
If we failed to create the root node, we don't need to cleanup the fstree_t which would attempt to recursively cleanup the root node. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02cleanup: merge error paths in xattr reader restore_kv_pairsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02Fix potential double free of xattr reader id_block_startsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02Implement support for SOURCE_DATE_EPOCH environment variableDavid Oberhollenzer
reproducible-builds.org suggests the use of an environment variable as a source for time stamps: https://reproducible-builds.org/specs/source-date-epoch/ This commit adds support for setting the default mtime from the variable, if it is set and only defaulting to 0 if not. The timestamp given by the command line switch takes precedence. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Add ability to write_tar_header to embedd extended attributesDavid Oberhollenzer
This commit patches the tar writer to generate a PAX header with SCHILY xattr key/value pairs if requested. The Schily format is used for two reasons: - It is simple - It is apparently more widely supported than the libarchive format Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Add option to restore xattrs to deserialize_fstreeDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Add xattr reader implementation to recover xattrs from squashfsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Fix xattr writer size accountingDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Fix super block flags: clear "no xattr" flag when writing xattrsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Fix xattr OOL positionDavid Oberhollenzer
We need to get the position _before_ writing the header, otherwise the reader has no way to know the length of the value. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Update print_version textDavid Oberhollenzer
This commit updates the text issued by print_version() to reflect in some way that the software contains contributions from co-authors. The original text was based on the sterotypical --version output of GNU coreutils programs. It may have to be rewritten eventually. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Add propper copyright headers to all source filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29Fix order of data block deduplicationDavid Oberhollenzer
Data blocks need to be deduplicated before attempting to write a fragment. In the current attempt if the data blocks are found to be duplicates but the fragment isn't, the flushed fragments are purged as well, possibly damaging other files. Also, when the deduplication happens, the HAS_FRAGMENT flag needs to be set, otherwise the deduplication code thinks that there is one more block than there actually is. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29Cleanup: move deduplication code from data writer to fstreeDavid Oberhollenzer
Since it is actually completely independend of libsqfs and only works on file_info_t lists, it can be safely moved over to libfstree and the data writer becomes less cluttered as a result. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29Simplify fstree sortingZachary Dremann
For merging, the use of a pointer to a pointer can simplify linked list operations For sorting, find the half-way point of the list in a single iteration over the list Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix missing initialization of file fragment fieldsDavid Oberhollenzer
Despite having a flag for that now, they still need to be initialized because they are written straight to disk. Fixes: d4d1854aaed867d28ebfc97afb3518254ab6fd4b Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix duplicate file accountingDavid Oberhollenzer
A file is a complete duplicate if: - It has no blocks, only a single fragment and that is a duplicate - It has blocks but no fragment and the blocks are duplicate - It has blocks and a fragment and both are duplicate The previous version only counted the last one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix used bytes accounting when deduplicating file blocksDavid Oberhollenzer
If an entire file is eliminated, we need to reset the "used_bytes" counter, otherwise, ALL the table positions are way off. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix free() of stack pointer in id_table_read error pathDavid Oberhollenzer
We didn't allocate the ID table, so we don't need to free() it when reading from disk fails. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Fix: return the correct value from data_reader_createDavid Oberhollenzer
Cut & paste misshap after mergining with fragment reader: If there are no fragments, data_reader_create should return the data reader, not 0! Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Add some nice statistics output to tar2sqfs and gensquashfsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Add general purpose flags field to file_info_tDavid Oberhollenzer
Simplifies some task if we can just add a flag that a file has a framgent or that it has already been detected as a duplicate. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Implement data block deduplicationDavid Oberhollenzer
The strategy is as follows: - At the beginning of every file, remember the current position - Once a file is done scan the list of existing files for the following: - Look for an existing file that has a block with the same size and checksum as the first non-sparse block of the current file - After that, every block in the current file has to match in size and checksum the ones in the file that we found, from that point onward - sparse blocks in either file are skipped - If we found a match, we update the current file to point to the first matching block and rewind the squashfs image to remove the newly written data This strategy should in theory be able to find an existing file where the on-disk data *contains* the on-disk data of the current file. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Implement fragment deduplication in data writerDavid Oberhollenzer
The strategy is simple: - The data writer function that write data/fragment blocks get access to the list files. - When writing a fragment, we look for an already written file that has a fragment with the same size and checksum. - If we find one, we throw away the fragment and reuse the existing one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Unify common file start/end code from data writer in helper functionsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>