squashfs-tools-ng.git - A new set of tools and libraries for working with SquashFS images

Age	Commit message (Collapse)	Author
2020-05-23	block processor: don't zero initialize the block payload area	David Oberhollenzer
	In the block processor, the payload area is only accessed up to the indicated size. Even the part that is accessed is initialized by copying data into the block before increasing the size, so there is no real point in zero-initializing hundres of kilobytes if not megabytes of payload area, especially since this is done in the locked, serial path of the block processor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21	Fix: zstd: actually set the compression level from the options	David Oberhollenzer
	In the zstd compressor, the compression level from the configuration structure wasn't used at all. Instead, the zstd compressor was told to use level 0 and compressor options with that parameter were written to disk. This commit makes sure the level parameter is propperly initialized. Reported-by: Sébastien Gross Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21	hash table: switch to sqfs_* types, mark functions as hidden	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21	Fix the semantics of the super block deduplication	David Oberhollenzer
	Its purely informational, but make sure other programs don't print out scary messages that imply the data has been ineficiently. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-19	Cleanup: move hash table header to include directory	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-18	libtar: fix size computation of PAX line length	David Oberhollenzer
	This commit attempts to fix the following two problems: - The number of digits computation returning an off-by-one result if the number is 10, or the resulting digit string starts with "10". This results in one-too-many padding bytes, corrupting the rest of the archive since the headers now don't start at multiples of 512 anymore. - Adding the line length prefix affects the line length (duh). If it grows far enough to require more digits, the result is a similar problem. This is a converging series that we need to compute the limit of. Unit tests for this still need to be added. Or maybe I can convince a bored undergrad student to provide an induction proof. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-04	Expose more fine grained control values & flags on the XZ compressor	David Oberhollenzer
	This patch allows external users to fiddle with the XZ compressors compression strength, alignment and other values. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-04	Fix: propperly set the last block flag if fragments are disabled	David Oberhollenzer
	If a file consisting of multiple blocks is produced, the last block is short and the don't fragment flag is set, the last block flag has to be set on the block when we flush it, so the processing pipeline does it's job correctly. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-03	Fix: use 0644 as default permissions when creating files	David Oberhollenzer
	Until now, when packing or unpacking a SquashFS image, files where created with paranoid permissions (i.e. 0600). The rational behind this was that otherwise, the tools may inadvertently expose secrets, e.g. if a root user packs files that that aren't world readable, such as the /etc/shadows file, but the packed SquashFS image is, we have accidentally leaked this file to other users that can access the newly created SquashFS image. The same line of reasoning also applies when unpacking files. Unfortunately, this breaks a list of other, more common standard use cases (e.g. a build server where the an image is built by a deamon running as user X but then has to be accessed by another deamon running as Y). This commit changes to a more standard approach of using permissive file permissions by default and asking paranoid users to simply use a paranoid umask. For tar2sqfs & gensquashfs this simply means chaning the default permissions in the libsquashfs file implementation. For rdsquashfs on the other hand there is still the use case where the unpacked files get the permissions from the [secret] image, so setting a strict umask is not applicable and changing to permissive file mode leaks something. For this case a second code path needs to be added that derives the permissions from the ones in the image. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-27	Enable uint128_t path	Matt Turner
	I forgot to enable this when I copied it over from Mesa. Mesa's meson configuration system checks that a C program using the uint128_t type compiles, but I think this is likely unnecessary. Simply check the macro that clang and gcc define. This cuts the .text size of hash_table.o by 160 bytes or about 4% on my system. Signed-off-by: Matt Turner <mattst88@gmail.com>
2020-04-27	Add hash table code to libutil.a	David Oberhollenzer
	Not only does this build the hashtable into libutil.a, it also makes sure the headers end up in the distribution tarball. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-22	Import and use Mesa's hash table	Matt Turner
	With `perf record`/`perf report` I saw that 30% of the time was spent in `sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing the Gentoo ebuild repository (many thousands of small files). The reason was the bucketing hash table in frag_table.c: too many elements in too few buckets meant lots of walking over the linked lists. This patch replaces that hash table with the hash table implementation from Mesa. Its implementation is more complex (is is an open-addressing, linear-reprobing) hash table, but it is much better suited for the task. On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less than 3s. CPU usage increases from ~207% to ~356%, presumably indicating an increase in available parallelism due to the removal of the hash table as a bottleneck. The `perf report` profile with this patch shows that the time spent in `sqfs_frag_table_find_tail_end` has dropped from ~30% to 0.01%. Output from ministat: x before + after N Min Max Median Avg Stddev x 20 7.476 7.685 7.5725 7.5615 0.051254268 + 20 2.79 2.901 2.846 2.84475 0.03543842 Difference at 95.0% confidence -4.71675 +/- 0.0282015 -62.3785% +/- 0.241477% (Student's t, pooled s = 0.0440618) I imported only the bits of the hash table implementation that were needed for frag_table.c. Among the changes I made after importing are - removed usage of ralloc, Mesa's recursive memory allocator - Replaced ralloc -> malloc ralloc_free -> free rzalloc_array -> calloc - Removed mem_ctx parameters - Added free()s to the appropriate places (valgrind confirms there are no leaks) - removed _mesa_-prefix from function names Fixes: #40 Signed-off-by: Matt Turner <mattst88@gmail.com>
2020-04-22	Skip PAX global headers	David Oberhollenzer
	Tar archives can contain set two kinds of PAX headers: - local headers that modify the attributes of the next file - global headers that set defaults for all files The later is used "... not widely used", according to tar(5) and has been deliberately not implemented. Some programs (e.g. git-archive) do generate them (in the case of git, it stores the commit hash). This commit adds a code path that skips a PAX global header entirely and resumes tar parsing, instead of erroneusly reporting it as an entry. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-17	Remove some configure time sizeof checks	David Oberhollenzer
	In libtar, the sizeof time_t checked when trying to store a time value. It is pointless using the preprocessor here, as we can simply do an if (sizeof(time_t) < ...) check and the compiler will take care optimizing away one or the other branch. After changing the libtar check and the corresponding unit tests, the sizeof check can be removed from configure.ac, along with other unused sizeof checks. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-17	Cleanup: split read_header.c in libtar.a	David Oberhollenzer
	Simply moving the pax header decoding to a separate file and splitting out the common helper functions should be a good start. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-16	tar2sqfs & gensquashfs: Delete the output file on failure	David Oberhollenzer
	This commit changes the tar2sqfs & gensquashfs code to pass the exit status on to sqfs_writer_cleanup in libcommon. The function sqfs writer code in libcommon is changed to retain the output file name and delete it if the status passed to the cleanup function is anything other than EXIT_SUCCESS. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-01	Fix missing header without LZO	Alyssa Ross
	lib/common/compress.c: In function 'compressor_get_default': lib/common/compress.c:39:2: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration] 39 \| assert(0); \| ^~~~~~ lib/common/compress.c:8:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'? 7 \| #include "common.h" +++ \|+#include <assert.h> 8 \|
2020-03-19	Fix compressor availability check in libcommon	David Oberhollenzer
	Initialize have_compressor to false before testing, to make the check work. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-19	Fix destruction of NULL pointer in xattr reader cleanup	David Oberhollenzer
	This fixes a copy and paste error in the cleanup path, destroying a previously destroyed object again instead of the one being tested for. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-19	Fix pthread_join check for valid thread handles	David Oberhollenzer
	On Linux, checking for > 0 worked because pthread_t is internally an integer type. On other platforms (caugh Mac OS X caugh), it is typedefed to an opaque pointer, causing a warning if used in an integer relational comparison. The intended use is to allow the generic cleanup function to be used in the error path of the block processor creation function, while preventing pthread_join being called on threads that haven't been created at all. Since they are calloc'ed to 0, testing for non-zero values should suffice in both cases. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18	Fix build of lz4 compressor with older versions of liblz4	David Oberhollenzer
	Older versions of liblz4 don't define LZ4HC_CLEVEL_MAX. This commit adds a definition if liblz4 doesn't provide one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18	Restore workaround for unaligned reads in xxhash	David Oberhollenzer
	The code was originally used inside the block processor, where 32 bit aligned data could be guaranteed. If it is available in libutil, I cannot possibly guarantee for alignment in future use elsewhere. Even for the block processor it was rather risky "remember this detail very well" buisness. This commit restores the unaligned read treatment of the original. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18	Cleanup: Move xxhash32 code to libutil	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Get rid of sqfs_compressor_exists	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Change the signature of sqfs_compressor_create to return an error code	David Oberhollenzer
	Make sure the function has a way of telling the caller why it failed. This way, the function can convey whether it had an internal error, an allocation failure, whether the arguments are totaly nonsensical, or simply that the compressor or specific configuration is not supported. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-05	Cleanup: Remove the E_ prefix from all libsquashfs enumerators	David Oberhollenzer
	Avoid namespace polution. Make sure all exportet symbols are prefixed with either sqfs_ or SQFS_. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Fix block writer inheritance of sqfs_object_t	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Cleanup: match xattr reader API closer to id table API	David Oberhollenzer
	Instead of creating everything in the "create" function, cleanup and create/initialize stuff in a "load" function. This allows the xattr reader to be reset/re-used and adds the benefit of not having to lug around references to the super block, compressor and file (altough the later two are hidden inside the meta reader). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Add a generic copying mechanism to sqfs_object_t	David Oberhollenzer
	This patch adds a deep-copy callback to sqfs_object_t and removes the copying mechanism from sqfs_compressor_t. This is also interesting for other types. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04	Add a deep copy function for the str_table_t helper	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Add a "do not deduplicate" block flag	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Fix printf format specifies for sqfs_u64	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Fix: Replace bit shifts in parse_size with SZ_MUL_OV	David Oberhollenzer
	On 32 bit systems, size_t is a 32 bit integer and doing 64 bit shifts won't do us any good. So instead, use the existing size_t multiply and catch overflow macros. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-01	Fix alloca in write_inode.c for windows build	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-28	Cleanup pax header parser a little	David Oberhollenzer
	This commit tries to untangle the logic of parsing and sanitizing the pax header length field and the associated bounds checks. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-27	Fix: strictly verify compressor settings in config initialization	David Oberhollenzer
	Make sure the function throws an error if a given compressor ID or flag is not known. This way, libsquahfs supports exactly and only what the on-disk format specifies. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-27	Add a function to the compressor interface to get the configuration	David Oberhollenzer
	This allows getting the compressor configuration back after creating it, for various purposes. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Remove the sqfs_inode_copy function	David Oberhollenzer
	With unified payload size counters, copying an inode is now trivial. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Turn file inode management completely over to the block processor	David Oberhollenzer
	If the block processor allocates and dynamically resizes inodes on the fly, we can add data indefinitely without knowing the size of the file ahead of time. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-23	Unify the payload counters in the sqfs_inode_generic_t	David Oberhollenzer
	Instead of having seperate counters for blocks, dir index bytes and having to fiddle out the link target size, simply use a single value that stores the number of payload bytes used. A seperate "payload bytes available" is used for dynamically growing inodes during processing. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	libcommon: stdin file: Fix size accounting for sparse files	David Oberhollenzer
	The file has to report the "apparent size" for sparse files, but internally work with the actual size in the tar ball. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Move inode size accounting completely to the block processor	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Cleanup block processor: merge common initialization code	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Cleanup block processor: Merge destructors for Windows & pthreads	David Oberhollenzer
	Since the merged destructor checks if the objects it destroys were actually initialized, the pthread implementation can also replace its error path cleanup with simply calling the destructor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Add a seperate sqfs_block_processor_sync function	David Oberhollenzer
	This function waits for all pending blocks to be written to disk, but doesn't flush the fragment block, so processing can continue afterwards as if nothing happened. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Make hard link detection scale better	David Oberhollenzer
	Instead of doing a linear search, which scales quadratically, use a red-black tree with inode numbers as key for finding hard links. To reiterate: we can't just use a flat table, because the SquashFS file is potentially untrusted and the inode numbers can be anything, no matter what the super block says. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22	Add a very simple red-black tree implementation	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-21	Cleanup: move utilities back out of libsquashfs	David Oberhollenzer
	This commit removes the allocation helpers and string table functions out of libsquashfs back into a "libutil.a". The problem of libsquashfs exporting stuff that it shouldn't is resolved by retaining the internal attributes and directly adding the source to libsquashfs instead of trying to somehow link against libutil.la. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-20	Thread pool block processor: Cleanup after restructuring	David Oberhollenzer
	- Merge duplicated code from append_to_work_queue and sqfs_block_processor_finish Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-20	Restructure thread pool block processor	David Oberhollenzer
	Implement the io-queue based design as outline in doc/parallelism.txt Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>