path: root/lib
Age  Commit message  Author
2020-08-16  Fix libtar treatment of link targets that fill the header field  David Oberhollenzer
The tar header has a 100 byte field for symlink and hard link targets. If the target is longer than 100 bytes, an extension header has to be used. However, it is perfectly valid to fill all 100 bytes to the brim without adding a null terminator. In case of a symlink, this can result in garbage link targets, while for hard links it results in an immediate error since the target cannot be resolved later on. This commit attempts to fix the problem by replacing the strdup of the link target with an strndup that copies at most the size of the target header field. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
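The strdup to strndup change can be illustrated with a minimal sketch. The helper name `copy_link_target` and the macro `TAR_LINK_FIELD_SIZE` are hypothetical and not taken from libtar; only the 100 byte field size comes from the commit message.

```c
#define _POSIX_C_SOURCE 200809L /* for strndup */
#include <stdlib.h>
#include <string.h>

/* Size of the link target field in a classic tar header. */
#define TAR_LINK_FIELD_SIZE 100

/*
 * Hypothetical helper: copy the link target out of the raw header field.
 * The field may use all 100 bytes without a terminating null byte, so
 * strdup() could read past its end. strndup() stops at the first null
 * byte or after TAR_LINK_FIELD_SIZE bytes, whichever comes first, and
 * always null-terminates the copy. Returns NULL on allocation failure.
 */
char *copy_link_target(const char field[TAR_LINK_FIELD_SIZE])
{
	return strndup(field, TAR_LINK_FIELD_SIZE);
}
```

For a field that is filled to the brim, the returned string is exactly 100 characters long; shorter targets are copied up to their null terminator as before.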
2020-08-12  Fix block processor single block with don't fragment flag bug  David Oberhollenzer
This commit fixes a bug where the block processor state machine would not add the "last block" flag if there is only one not entirely filled block and the "don't fragment" flag is set. If the flag isn't set, the inode start block position is not updated and points to the beginning of the image instead. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-08-04  Cleanup: move zlib/lz4 code from lib/sqfs/comp/ to lib/  David Oberhollenzer
The source code of a modified liblz4 and zlib is included with the option to compile them into libsquashfs if they are not available on the system. So far, the source code was included directly in the compressor sub directory within libsquashfs. This commit moves the libraries out into the lib directory. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-07-29  Fix: xattr reader: read the header after seeking to an OOL value  David Oberhollenzer
If an xattr value is stored OOL (out of line), the value actually holds an 8 byte reference to another, previously stored value. This reference points to the header that we need to read to know the actual size of the value before reading it, not the value itself, so after reading the reference and seeking to it, the xattr reader needs to read the actual header. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-20  Fix block bounds checking in libsquashfs data reader  David Oberhollenzer
Instead of doing the fragile size comparison in both loops, simply bail from the function if offset is out of bounds, clamp the size to the available range of the file and bail if it is zero. As a result, a lot of checks can be removed and the function will not return data beyond EOF. This problem occurred with files that have a short last block instead of a fragment. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
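The "check the offset, then clamp the size" approach might look roughly like the following sketch. The function `clamp_read_size` and its parameters are hypothetical and do not reflect the actual libsquashfs data reader code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: clamp a read request to the extent of the file.
 * Returns the number of bytes that may actually be read, or 0 if the
 * offset is at or beyond the end of the file (the caller bails out).
 */
size_t clamp_read_size(uint64_t file_size, uint64_t offset, size_t size)
{
	uint64_t available;

	if (offset >= file_size)
		return 0;

	available = file_size - offset;

	if ((uint64_t)size > available)
		size = (size_t)available;

	return size;
}
```

With the size clamped once up front, the loops that walk the blocks no longer need their own end-of-file comparisons.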
2020-06-13  Fix: don't include alloca.h on systems that don't provide this header (tag: v1.0.0)  David Oberhollenzer
This commit fixes a build issue on BSD based systems, where alloca is defined in stdlib.h and there is no such thing as "alloca.h". Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-13  Bump the so version number for libsquashfs  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-12  Add an explicit definition for the libsquashfs so version  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-11  Add flags to functions that might logically be expanded in the future  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-09  Cleanup: mark sqfs_xattr_writer_flush writer argument as const  David Oberhollenzer
It does not make any changes to the writer itself, so mark it as const. This also requires some similar changes to the string table. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-09  Cleanup: remove refcount adjusting in sqfs_xattr_writer_end  David Oberhollenzer
After finding a match, reducing the reference count of the matched elements and increasing them afterwards leaves the reference count identical, because they refer to the same entries. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-09  Cleanup: split libsquashfs xattr writer code  David Oberhollenzer
This commit moves the libsquashfs xattr related code into a sub directory and splits the xattr writer code up into several files. No actual code is changed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-07  Fix uninitialized error code in block processor error path  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-07  Move the fragment deduplication hash table back into the block processor  David Oberhollenzer
Fragment deduplication really doesn't belong in the public API of the fragment table. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-07  block processor: add an internal common cleanup function  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-04  Cleanup: libcommon: use global LUTs for compressor options  David Oberhollenzer
Instead of the convoluted logic, simply use a small number of LUTs that point to the available compressor flags for each compressor, the available options and their ranges. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-04  Cleanup: Pull compression level parameter out into compressor config  David Oberhollenzer
Every compressor (except LZ4) has a compression level parameter. This commit pulls the compression level field out into the generic configuration structure and applies some code cleanups as a result. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-04  Strictly enforce min/max dictionary size in XZ & LZMA compressors  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-04  lzma compressor: support micro management options  David Oberhollenzer
The LZMA compressor (through the xz-utils library) supports basically the same options for micro management as the XZ compressor. This commit enables support for those options in the compressor, the option parser and adds an option field to the configuration structure. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-04  lzma compressor: add support for the "extreme" flag  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-06-03  Cleanup: Add defines for minimum and maximum block size  David Oberhollenzer
This commit adds proper defines in the super block header and removes some of the hard coded constants. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-30  Cleanup: sqfs2tar: break up and simplify the repacking code  David Oberhollenzer
 - Move the xattr extraction and repacking to xattr.c
 - Don't on-the-fly delete the tar xattr list, use the function from libtar.a
 - Split minor tasks into static helper functions
   - creating a libtar xattr struct from libsqfs xattr data
   - finding a hard link entry from current path and inode number
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-30  Block processor: cleanup macros, merge windows & pthread initialization  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Block processor: merge finish & sync functions  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Block processor: allow operation without a fragment table  David Oberhollenzer
This commit modifies the block processor to support operating without a fragment table. If that is the case, fragment deduplication is essentially disabled and fragment blocks aren't indexed anymore. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Block processor: Add a raw block submission function  David Oberhollenzer
This function allows submission of raw blocks to the block processor, completely bypassing the file API. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Block processor: add flags to manage hashing & sparse block detection  David Oberhollenzer
This commit adds 2 new user settable flags to the block processor:
 - A flag to ignore sparse blocks and treat them like normal data blocks.
 - A flag to disable checksum computation altogether.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Support associating a user pointer with data blocks  David Oberhollenzer
This commit modifies the block processor to support associating a user data pointer with data blocks that it forwards to the block writer, which is modified to accept an optional user data pointer. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Block processor: turn internal functions into interface entry points  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Make the block processor inode management optional  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  Turn the sqfs_block_writer_t into an interface  David Oberhollenzer
This way, everything that could be done through the hooks (and more) can be done by simply providing a custom implementation. The result is a lot cleaner than the previous hook based version. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  cleanup: libsqfs: eliminate block writer statistics  David Oberhollenzer
 - the "bytes submitted" can be moved over to the block processor
 - the number of blocks submitted is already there (implicitly, by adding the data block count to the fragment block count)
 - actual data bytes written can be computed from the super block
 - the remaining block count can be changed to a simple counter that can be obtained through a function.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-29  cleanup: libsqfs: remove hooks from sqfs_block_writer_t  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-24  Minor fixes/cleanups in the block processor  David Oberhollenzer
 - Move the inode modifications out of do_block. The inode may be reallocated in parallel by the process_completed_block function, so it is not safe to store the fragment location in the do_block function which is used from the worker threads.
 - Move the accounting of fragment blocks to the process_completed_block function.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-24  Cleanup: split the block processor common.c again  David Oberhollenzer
This commit breaks the common code up again by moving the data submission code to a separate file, making both a little bit more readable. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-24  block processor: promote fragments to fragment blocks  David Oberhollenzer
Instead of [potentially] allocating a new fragment block, take an existing fragment and promote it to the fragment block. This saves us a potential block allocation and a memcpy of the initial data. Also it *definitely* removes block allocation from the backend path of the block processor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-23  block processor: move the block consolidation to the worker thread  David Oberhollenzer
Instead of merging fragments into the fragment block inside the process_completed_fragment function, store a linked list of fragments in the fragment block and do the actual merging (several memcpy calls totaling up to 1M of data in the worst case) in the worker thread instead of the locked, serial path. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-23  block processor: recycle blocks to reduce allocation pressure  David Oberhollenzer
Instead of freeing/allocating blocks all the time in the locked, serial path, use a free list to "recycle" blocks. Once a block is no longer used, throw it onto the free list. If a new block is needed, try to get one from the free list before calling malloc. After a few iterations, the block processor should stop allocating new blocks and only re-use the ones it already has. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
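A free list of this kind can be sketched as follows. The types `block_t` and `block_pool_t` and the `pool_*` functions are hypothetical illustrations, not the block processor's actual data structures, and locking is assumed to be handled by the caller.

```c
#include <stdlib.h>

/* Hypothetical block with a payload area of up to max_block_size bytes. */
typedef struct block_t {
	struct block_t *next;
	size_t size;
	unsigned char data[];
} block_t;

typedef struct {
	block_t *free_list;
	size_t max_block_size;
} block_pool_t;

/* Take a block from the free list, fall back to malloc if it is empty. */
block_t *pool_get_block(block_pool_t *pool)
{
	block_t *blk = pool->free_list;

	if (blk != NULL) {
		pool->free_list = blk->next;
	} else {
		/* The payload is deliberately left uninitialized. */
		blk = malloc(sizeof(*blk) + pool->max_block_size);
		if (blk == NULL)
			return NULL;
	}

	blk->next = NULL;
	blk->size = 0;
	return blk;
}

/* Put a block that is no longer used back onto the free list. */
void pool_release_block(block_pool_t *pool, block_t *blk)
{
	blk->next = pool->free_list;
	pool->free_list = blk;
}

/* Actually free all recycled blocks, e.g. on shutdown. */
void pool_cleanup(block_pool_t *pool)
{
	while (pool->free_list != NULL) {
		block_t *blk = pool->free_list;

		pool->free_list = blk->next;
		free(blk);
	}
}
```

Once the pool holds as many blocks as are in flight at any one time, `pool_get_block` is served entirely from the list and malloc is no longer called.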
2020-05-23  block processor: don't zero initialize the block payload area  David Oberhollenzer
In the block processor, the payload area is only accessed up to the indicated size. Even the part that is accessed is initialized by copying data into the block before increasing the size, so there is no real point in zero-initializing hundreds of kilobytes if not megabytes of payload area, especially since this is done in the locked, serial path of the block processor. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21  Fix: zstd: actually set the compression level from the options  David Oberhollenzer
In the zstd compressor, the compression level from the configuration structure wasn't used at all. Instead, the zstd compressor was told to use level 0 and compressor options with that parameter were written to disk. This commit makes sure the level parameter is properly initialized. Reported-by: Sébastien Gross Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21  hash table: switch to sqfs_* types, mark functions as hidden  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-21  Fix the semantics of the super block deduplication  David Oberhollenzer
It's purely informational, but make sure other programs don't print out scary messages that imply the data has been stored inefficiently. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-19  Cleanup: move hash table header to include directory  David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-18  libtar: fix size computation of PAX line length  David Oberhollenzer
This commit attempts to fix the following two problems:
 - The number of digits computation returning an off-by-one result if the number is 10, or the resulting digit string starts with "10". This results in one-too-many padding bytes, corrupting the rest of the archive since the headers now don't start at multiples of 512 anymore.
 - Adding the line length prefix affects the line length (duh). If it grows far enough to require more digits, the result is a similar problem. This is a converging series that we need to compute the limit of.
Unit tests for this still need to be added. Or maybe I can convince a bored undergrad student to provide an induction proof.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
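Both problems can be illustrated with a small sketch. A PAX extended header record has the form `<length> <key>=<value>\n`, where the decimal length counts the whole record including its own digits. The functions `num_digits` and `pax_record_length` are hypothetical names for illustration and not the libtar implementation; the loop simply iterates until the length prefix is consistent with itself.

```c
#include <stddef.h>

/* Number of decimal digits needed to print x (1 for 0..9, 2 for 10..99). */
size_t num_digits(size_t x)
{
	size_t n = 1;

	while (x >= 10) {
		x /= 10;
		++n;
	}

	return n;
}

/*
 * Total length of the record "<len> <key>=<value>\n", where <len> is the
 * decimal length of the entire record, including <len> itself.
 */
size_t pax_record_length(size_t key_len, size_t value_len)
{
	/* Everything except the digits: space, key, '=', value, newline. */
	size_t base = key_len + value_len + 3;
	size_t len = base + num_digits(base);

	/* Adding the digits may itself add a digit; iterate to a fixed point. */
	while (base + num_digits(len) != len)
		len = base + num_digits(len);

	return len;
}
```

For example, `path=foo` has a base of 10 bytes and yields the 12 byte record `12 path=foo\n`. A base of 99 bytes first suggests 101, which needs three digits, so the result settles at 102.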
2020-05-04  Expose more fine grained control values & flags on the XZ compressor  David Oberhollenzer
This patch allows external users to fiddle with the XZ compressor's compression strength, alignment and other values. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-04  Fix: properly set the last block flag if fragments are disabled  David Oberhollenzer
If a file consisting of multiple blocks is produced, the last block is short and the don't fragment flag is set, the last block flag has to be set on the block when we flush it, so the processing pipeline does its job correctly. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-03  Fix: use 0644 as default permissions when creating files  David Oberhollenzer
Until now, when packing or unpacking a SquashFS image, files were created with paranoid permissions (i.e. 0600). The rationale behind this was that otherwise, the tools may inadvertently expose secrets, e.g. if a root user packs files that aren't world readable, such as the /etc/shadow file, but the packed SquashFS image is, we have accidentally leaked this file to other users that can access the newly created SquashFS image. The same line of reasoning also applies when unpacking files.
Unfortunately, this breaks a list of other, more common standard use cases (e.g. a build server where an image is built by a daemon running as user X but then has to be accessed by another daemon running as Y). This commit changes to a more standard approach of using permissive file permissions by default and asking paranoid users to simply use a paranoid umask.
For tar2sqfs & gensquashfs this simply means changing the default permissions in the libsquashfs file implementation. For rdsquashfs on the other hand there is still the use case where the unpacked files get the permissions from the [secret] image, so setting a strict umask is not applicable and changing to permissive file mode leaks something. For this case a second code path needs to be added that derives the permissions from the ones in the image.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
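The interaction between the permissive default and a paranoid umask can be sketched like this. The function `create_output_file` is a hypothetical stand-in, not the libsquashfs file implementation; it only relies on the standard behaviour of open(), where the requested mode is masked by the process umask.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: create an output file with permissive default
 * permissions. The kernel applies the process umask to the requested
 * mode, so a user who wants private files can run the tool with a
 * umask of 077 and gets 0600 instead of 0644.
 */
int create_output_file(const char *path)
{
	return open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
}
```

With the common default umask of 022 the resulting file mode is 0644, with a umask of 077 it is 0600.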
2020-04-27  Enable uint128_t path  Matt Turner
I forgot to enable this when I copied it over from Mesa. Mesa's meson configuration system checks that a C program using the uint128_t type compiles, but I think this is likely unnecessary. Simply check the macro that clang and gcc define. This cuts the .text size of hash_table.o by 160 bytes or about 4% on my system. Signed-off-by: Matt Turner <mattst88@gmail.com>
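As an illustration of checking the compiler macro instead of running a configure test, the sketch below selects a 128 bit code path when `__SIZEOF_INT128__` is defined (gcc and clang define it on targets that support `__int128`). The function `mul_high_u64` is a hypothetical example and is not taken from the hash table code.

```c
#include <stdint.h>

/* Returns the upper 64 bits of the 128 bit product a * b. */
uint64_t mul_high_u64(uint64_t a, uint64_t b)
{
#ifdef __SIZEOF_INT128__
	/* Fast path: the compiler provides a native 128 bit integer type. */
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
#else
	/* Portable fallback: schoolbook multiplication on 32 bit halves. */
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t lo_lo = a_lo * b_lo;
	uint64_t hi_lo = a_hi * b_lo;
	uint64_t lo_hi = a_lo * b_hi;
	uint64_t hi_hi = a_hi * b_hi;
	uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

	return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```

Both branches compute the same value, so the macro only decides which one the compiler gets to see.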
2020-04-27  Add hash table code to libutil.a  David Oberhollenzer
Not only does this build the hashtable into libutil.a, it also makes sure the headers end up in the distribution tarball. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-22  Import and use Mesa's hash table  Matt Turner
With `perf record`/`perf report` I saw that 30% of the time was spent in `sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing the Gentoo ebuild repository (many thousands of small files). The reason was the bucketing hash table in frag_table.c: too many elements in too few buckets meant lots of walking over the linked lists.

This patch replaces that hash table with the hash table implementation from Mesa. Its implementation is more complex (it is an open-addressing, linear-reprobing hash table), but it is much better suited for the task.

On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less than 3s. CPU usage increases from ~207% to ~356%, presumably indicating an increase in available parallelism due to the removal of the hash table as a bottleneck. The `perf report` profile with this patch shows that the time spent in `sqfs_frag_table_find_tail_end` has dropped from ~30% to 0.01%.

Output from ministat:

    x before
    + after
        N      Min      Max   Median       Avg       Stddev
    x  20    7.476    7.685   7.5725    7.5615  0.051254268
    +  20     2.79    2.901    2.846   2.84475   0.03543842
    Difference at 95.0% confidence
        -4.71675 +/- 0.0282015
        -62.3785% +/- 0.241477%
        (Student's t, pooled s = 0.0440618)

I imported only the bits of the hash table implementation that were needed for frag_table.c. Among the changes I made after importing are
 - removed usage of ralloc, Mesa's recursive memory allocator
   - Replaced
       ralloc -> malloc
       ralloc_free -> free
       rzalloc_array -> calloc
   - Removed mem_ctx parameters
   - Added free()s to the appropriate places (valgrind confirms there are no leaks)
 - removed _mesa_-prefix from function names

Fixes: #40
Signed-off-by: Matt Turner <mattst88@gmail.com>
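To show why open addressing avoids the linked list walks described above, here is a minimal sketch of a linear probing table. It is not the Mesa implementation: the names `table_t`, `slot_t` and the `table_*` functions are hypothetical, the capacity is fixed and assumed to be a power of two, and deletion and resizing are left out.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
	uint32_t key;
	uint32_t value;
	int used;
} slot_t;

typedef struct {
	slot_t *slots;
	size_t capacity; /* assumed to be a power of two */
} table_t;

/* Allocate an empty table; returns 0 on success, -1 on failure. */
int table_init(table_t *t, size_t capacity)
{
	t->slots = calloc(capacity, sizeof(t->slots[0]));
	t->capacity = capacity;
	return t->slots == NULL ? -1 : 0;
}

void table_cleanup(table_t *t)
{
	free(t->slots);
	t->slots = NULL;
	t->capacity = 0;
}

/* Simple integer mixing step to spread keys over the slots. */
static size_t start_slot(const table_t *t, uint32_t key)
{
	key ^= key >> 16;
	key *= 0x45d9f3bU;
	key ^= key >> 16;
	return (size_t)key & (t->capacity - 1);
}

/* Insert or overwrite; returns -1 if every slot holds another key. */
int table_insert(table_t *t, uint32_t key, uint32_t value)
{
	size_t i, idx = start_slot(t, key);

	for (i = 0; i < t->capacity; ++i) {
		slot_t *s = &t->slots[(idx + i) & (t->capacity - 1)];

		if (s->used && s->key != key)
			continue; /* occupied by another key, probe the next slot */

		s->key = key;
		s->value = value;
		s->used = 1;
		return 0;
	}

	return -1;
}

/* Returns a pointer to the stored value or NULL if the key is absent. */
const uint32_t *table_find(const table_t *t, uint32_t key)
{
	size_t i, idx = start_slot(t, key);

	for (i = 0; i < t->capacity; ++i) {
		const slot_t *s = &t->slots[(idx + i) & (t->capacity - 1)];

		if (!s->used)
			return NULL; /* an empty slot ends the probe sequence */
		if (s->key == key)
			return &s->value;
	}

	return NULL;
}
```

All entries live in one contiguous array, so a lookup touches neighbouring slots instead of chasing pointers through a bucket list. A production table additionally has to grow before it fills up, which this sketch omits.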