squashfs-tools-ng.git - A new set of tools and libraries for working with SquashFS images

Age	Commit message (Collapse)	Author
2020-04-22	Import and use Mesa's hash table	Matt Turner
	With `perf record`/`perf report` I saw that 30% of the time was spent in `sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing the Gentoo ebuild repository (many thousands of small files). The reason was the bucketing hash table in frag_table.c: too many elements in too few buckets meant lots of walking over the linked lists. This patch replaces that hash table with the hash table implementation from Mesa. Its implementation is more complex (is is an open-addressing, linear-reprobing) hash table, but it is much better suited for the task. On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less than 3s. CPU usage increases from ~207% to ~356%, presumably indicating an increase in available parallelism due to the removal of the hash table as a bottleneck. The `perf report` profile with this patch shows that the time spent in `sqfs_frag_table_find_tail_end` has dropped from ~30% to 0.01%. Output from ministat: x before + after N Min Max Median Avg Stddev x 20 7.476 7.685 7.5725 7.5615 0.051254268 + 20 2.79 2.901 2.846 2.84475 0.03543842 Difference at 95.0% confidence -4.71675 +/- 0.0282015 -62.3785% +/- 0.241477% (Student's t, pooled s = 0.0440618) I imported only the bits of the hash table implementation that were needed for frag_table.c. Among the changes I made after importing are - removed usage of ralloc, Mesa's recursive memory allocator - Replaced ralloc -> malloc ralloc_free -> free rzalloc_array -> calloc - Removed mem_ctx parameters - Added free()s to the appropriate places (valgrind confirms there are no leaks) - removed _mesa_-prefix from function names Fixes: #40 Signed-off-by: Matt Turner <mattst88@gmail.com>
2020-03-19	Fix destruction of NULL pointer in xattr reader cleanup	David Oberhollenzer
	This fixes a copy and paste error in the cleanup path, destroying a previously destroyed object again instead of the one being tested for. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-19	Fix pthread_join check for valid thread handles	David Oberhollenzer
	On Linux, checking for > 0 worked because pthread_t is internally an integer type. On other platforms (caugh Mac OS X caugh), it is typedefed to an opaque pointer, causing a warning if used in an integer relational comparison. The intended use is to allow the generic cleanup function to be used in the error path of the block processor creation function, while preventing pthread_join being called on threads that haven't been created at all. Since they are calloc'ed to 0, testing for non-zero values should suffice in both cases. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18	Fix build of lz4 compressor with older versions of liblz4	David Oberhollenzer
	Older versions of liblz4 don't define LZ4HC_CLEVEL_MAX. This commit adds a definition if liblz4 doesn't provide one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18	Cleanup: Move xxhash32 code to libutil	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Get rid of sqfs_compressor_exists	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Change the signature of sqfs_compressor_create to return an error code	David Oberhollenzer
	Make sure the function has a way of telling the caller why it failed. This way, the function can convey whether it had an internal error, an allocation failure, whether the arguments are totaly nonsensical, or simply that the compressor or specific configuration is not supported. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Cleanup: Remove the E_ prefix from all libsquashfs enumerators	David Oberhollenzer
	Avoid namespace polution. Make sure all exportet symbols are prefixed with either sqfs_ or SQFS_. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Fix block writer inheritance of sqfs_object_t	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Cleanup: match xattr reader API closer to id table API	David Oberhollenzer
	Instead of creating everything in the "create" function, cleanup and create/initialize stuff in a "load" function. This allows the xattr reader to be reset/re-used and adds the benefit of not having to lug around references to the super block, compressor and file (altough the later two are hidden inside the meta reader). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Add a generic copying mechanism to sqfs_object_t	David Oberhollenzer
	This patch adds a deep-copy callback to sqfs_object_t and removes the copying mechanism from sqfs_compressor_t. This is also interesting for other types. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Add a "do not deduplicate" block flag	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Fix alloca in write_inode.c for windows build	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-27	Fix: strictly verify compressor settings in config initialization	David Oberhollenzer
	Make sure the function throws an error if a given compressor ID or flag is not known. This way, libsquahfs supports exactly and only what the on-disk format specifies. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-27	Add a function to the compressor interface to get the configuration	David Oberhollenzer
	This allows getting the compressor configuration back after creating it, for various purposes. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Remove the sqfs_inode_copy function	David Oberhollenzer
	With unified payload size counters, copying an inode is now trivial. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Turn file inode management completely over to the block processor	David Oberhollenzer
	If the block processor allocates and dynamically resizes inodes on the fly, we can add data indefinitely without knowing the size of the file ahead of time. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Unify the payload counters in the sqfs_inode_generic_t	David Oberhollenzer
	Instead of having seperate counters for blocks, dir index bytes and having to fiddle out the link target size, simply use a single value that stores the number of payload bytes used. A seperate "payload bytes available" is used for dynamically growing inodes during processing. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Move inode size accounting completely to the block processor	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Cleanup block processor: merge common initialization code	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Cleanup block processor: Merge destructors for Windows & pthreads	David Oberhollenzer
	Since the merged destructor checks if the objects it destroys were actually initialized, the pthread implementation can also replace its error path cleanup with simply calling the destructor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Add a seperate sqfs_block_processor_sync function	David Oberhollenzer
	This function waits for all pending blocks to be written to disk, but doesn't flush the fragment block, so processing can continue afterwards as if nothing happened. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-21	Cleanup: move utilities back out of libsquashfs	David Oberhollenzer
	This commit removes the allocation helpers and string table functions out of libsquashfs back into a "libutil.a". The problem of libsquashfs exporting stuff that it shouldn't is resolved by retaining the internal attributes and directly adding the source to libsquashfs instead of trying to somehow link against libutil.la. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-20	Thread pool block processor: Cleanup after restructuring	David Oberhollenzer
	- Merge duplicated code from append_to_work_queue and sqfs_block_processor_finish Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-20	Restructure thread pool block processor	David Oberhollenzer
	Implement the io-queue based design as outline in doc/parallelism.txt Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-18	Simplify the thread pool block processor somewhat	David Oberhollenzer
	- Split the worker function up into smaller functions that are a little more readable. - Only dequeue one block at a time. Makes the dequeueing a lot more readable and understandable. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	block processor: move the internals to the respective implementations	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	block processor: merge rest of fileapi.c into common.c	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	Replace crc32 with xxhash32	David Oberhollenzer
	On the one hand, benchmarking and profiling determined xxhash32 to be faster than the zlib implementation of crc32, on the other hand profiling determined that crc32 computation contributed signifficantly to the overall runtime. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	Move all the queue-waiting logic to the thread pool implemenation	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	Minor cleanup	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-16	block processor: move sparse block detection into worker thread	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Move block block accounting to the other end of the block pipeline	David Oberhollenzer
	This commit moves all of the fragment/block accounting in the block processor over to the writing end of the pipeline. In order to do this, the sparse blocks are allowed to bubble through the pipeline. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Cleanup: block processor: move init/cleanup functions into implemenations	David Oberhollenzer
	Again, the generic init/cleanup functions do way too many things that are specific to the thread pool implementation. Thanks to the splitting up of the block processor, they also have become quite trivial. This commit moves those functions into their respective implementations, allowing even further simplificiation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Cleanup: block processor: move finish function back into implementations	David Oberhollenzer
	The "generic" version actually uses specific internals of the thread pool implementation. Move it back into the thread pool based implementation and simplify the serial processor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Cleanup: block processor: remove test_and_set_status	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Don't set error status on block processor for non-fatal errors	David Oberhollenzer
	If the API is mis-used or allocation fails, return an appropriate error code, but don't permanently break the block processor. It's easy to recover from that. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-15	Cleanup: block processor: remove delayed thread notification	David Oberhollenzer
	Under the assumption that block processing is CPU bound and not I/O bound, this is entirely useless. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Use a hash table for fragment lookup instead of linear search	David Oberhollenzer
	Profiling on a sample filesystem determined that fragment deduplication lookups rank third place directly after crc32 and the actual compression. By using a hash table instead of linear search, this time can be reduced drastically. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Implement a more explicit object system	David Oberhollenzer
	Make every dynamically allocated, opaque data structure inherit from a common sqfs_object_t structure with common entry points (e.g. destroy). This removes tons of public API functions and replaces them with a simple sqfs_destroy instead. If semantics of the (until now implicit) object system need to be extended, it can be much more conveniantely done this way. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Cleanup: Move sqfs_block_t to block processor internals	David Oberhollenzer
	It was basically built around the block processor and exposed way too many internals. Removing it from other places was mostly trivial. This commit completely removes it from the public API and even most of the libsquashfs internals. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Remove usage of sqfs_block_t from block reader	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Clenaup: remove useage of sqfs_block_t from block writer	David Oberhollenzer
	The sqfs_block_t structure has been written for the block processor and exposes way too many internals. This commit removes its usage from the block writer, cutting it down to the bare essentials, so the structure can be removed from the public API later on. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-12	Fix data reader return codes	David Oberhollenzer
	Return an error number as document instead of throwing -1. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-10	Cleanup: remove block hooks entirely from block processor	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-10	Cleanup: remove the fragment store/discard and block discard hooks	David Oberhollenzer
	There is no obvious non-footgun use for those other than tallying statistics, which is now done by the data structures themselves. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-10	Add run time statistics to the block writer and processor	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-09	block processor: merge left overs of block.c/fragment.c into common.c	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-09	Move block writer and fragment table management out of block processor	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-01-31	Split the block writing/deduplication away from the block processor	David Oberhollenzer
	This commit moves the entire block writing and deduplication of data blocks over to a different data type named "block writer". For simplicity, the interfaces of the block processor are left as is and are turned into warppers. Likewise, most of the code in the block writer is just verbatim from the block processor, to be cleaned up in subsequent commits. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>