Age | Commit message (Collapse) | Author |
|
This commit modifies sqfsdiff to extract the compressor options from
the squashfs image and store them in a compressor configuration if
possible. The failure path is modified to *not* burst into flames on
error, because those options are not required by any compressor to
read data from the disk and pretty much every vendor modifed SquashFS
has messed with those to the point that they cannot be propperly
decoded (or the flag is set and there are no options).
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
- Move the inode modifications out of do_block. The inode may be
reallocated in parallel by the process_completed_block function, so
it is not safe to store the fragment location in the do_block
function which is used from the worker threads.
- Move the accounting of fragment blocks to the
process_completed_block function.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit breaks the common code up again by moving the data submission
code to a separate file, making both a little bit more readable.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of [potentially] allocating a new fragment block, take an
existing fragment and promote it to the fragmenet block. This saves
as a potential block allocation and a memcpy of the initial data.
Also it *definitely* removes block allocation from the backend path
of the block processor.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of merging fragments into the fragment block inside the
process_completed_fragment function, store a linked list of fragments
in the fragment block and do the actual merging (several memcpy calls
totaling of up to 1M of data in worst case) in the worker thread
instead of the locked, serial path.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of freeing/allocating blocks all the time in the locked,
serial path, use a free list to "recycle" blocks. Once a block is
no longer used, throw it onto the free list. If a new block is,
needed try to get one from the free list before calling malloc.
After a few iterations, the block processor should stop allocating
new blocks and only re-use the ones it already has.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
In the block processor, the payload area is only accessed up to
the indicated size. Even the part that is accessed is initialized
by copying data into the block before increasing the size, so there
is no real point in zero-initializing hundres of kilobytes if not
megabytes of payload area, especially since this is done in the
locked, serial path of the block processor.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
In the zstd compressor, the compression level from the configuration
structure wasn't used at all. Instead, the zstd compressor was told
to use level 0 and compressor options with that parameter were written
to disk.
This commit makes sure the level parameter is propperly initialized.
Reported-by: Sébastien Gross
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Its purely informational, but make sure other programs don't print
out scary messages that imply the data has been ineficiently.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit attempts to fix the following two problems:
- The number of digits computation returning an off-by-one result
if the number is 10, or the resulting digit string starts
with "10". This results in one-too-many padding bytes, corrupting
the rest of the archive since the headers now don't start at
multiples of 512 anymore.
- Adding the line length prefix affects the line length (duh). If it
grows far enough to require more digits, the result is a similar
problem. This is a converging series that we need to compute the
limit of.
Unit tests for this still need to be added. Or maybe I can convince a
bored undergrad student to provide an induction proof.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
- Some clarifications
- Some typo fixes
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When compiling with GCC4 the following error occurs.
> lib/util/rbtree.c:140: undefined reference to `__builtin_uaddl_overflow'
This is because __builtin_uaddl_overflow() and the other
__builtin_u{add,mul}{,l,ll}_overflow() functions are only defined in
GNUC < 5 for Clang. When using GCC4 and below they are not defined.
Since the SZ_ADD_OV and SZ_MUL_OV are only used to check 'size_t' type
values. And overflow on add and multiply of unsigned types is defined
behaviour (C Standard 6.2.5 paragraph 9). It's simple to write overflow
functions for this specific case. These are based on the overflow
wrappers from the SEI CERT C Standard INT30-C.
[1] https://gcc.gnu.org/gcc-5/changes.html
Signed-off-by: Brandon Maier <brandon.maier@rockwellcollins.com>
|
|
This patch allows external users to fiddle with the XZ compressors
compression strength, alignment and other values.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If a file consisting of multiple blocks is produced, the last block is
short and the don't fragment flag is set, the last block flag has to
be set on the block when we flush it, so the processing pipeline does
it's job correctly.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Add missing options, rephrase some things to be a bit more clear and
fix a bunch of typos.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Since this is a fairly common use case, it deserves a simple test case
to check out that e.g. option processing hasn't been botched up (again).
As input directory, the licenses directory is used as it contains no
intermediate build output and should change fairly infrequently.
The test is enabled irregardless of the corpora-test option.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Change the "required_argument" to the correct "no_argument".
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Until now, when packing or unpacking a SquashFS image, files where
created with paranoid permissions (i.e. 0600). The rational behind
this was that otherwise, the tools may inadvertently expose secrets,
e.g. if a root user packs files that that aren't world readable,
such as the /etc/shadows file, but the packed SquashFS image is, we
have accidentally leaked this file to other users that can access
the newly created SquashFS image. The same line of reasoning also
applies when unpacking files.
Unfortunately, this breaks a list of other, more common standard use
cases (e.g. a build server where the an image is built by a deamon
running as user X but then has to be accessed by another deamon
running as Y).
This commit changes to a more standard approach of using permissive
file permissions by default and asking paranoid users to simply use
a paranoid umask.
For tar2sqfs & gensquashfs this simply means chaning the default
permissions in the libsquashfs file implementation.
For rdsquashfs on the other hand there is still the use case where
the unpacked files get the permissions from the [secret] image, so
setting a strict umask is not applicable and changing to permissive
file mode leaks something. For this case a second code path needs to
be added that derives the permissions from the ones in the image.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Only ignre them if they are in the top most directory, i.e. built
in the source tree. Do not ignore directories named after the
binaries!
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
On the one hand, this commit cleanes the code a bit by splitting
the "scan directory contents" code from the "scan xattrs from
directory contents" and moving the later in a seperate file.
On the other hand, the xattr scanning is now done *after* the fstree
is post processed, which includes sorting it. This way, the xattrs
are always added in a deterministic, reproducible order.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
I forgot to enable this when I copied it over from Mesa. Mesa's
meson configuration system checks that a C program using the uint128_t
type compiles, but I think this is likely unnecessary. Simply check the
macro that clang and gcc define.
This cuts the .text size of hash_table.o by 160 bytes or about 4% on my
system.
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
A common use case for mksquashfs is to simply pack a directory and set
a magic option to force all user/group IDs to root.
This commit adds similar options to gensquashfs to maek it better
suited as a direct replacement for packing an input directory.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Not only does this build the hashtable into libutil.a, it also makes
sure the headers end up in the distribution tarball.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of having the binary programs in randomly named subdirectories,
move all of them to a "bin" subdirectory, similar to the utility
libraries that have subdirectories within "lib" and give the
subdirectories the propper names (e.g. have gensquashfs source in a
directory *actually* named "gensquashfs").
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This fixes a bug in tar2sqfs where the -T option has no effect, because
the block processor flags were propperly generated, but not passed on.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
With `perf record`/`perf report` I saw that 30% of the time was spent in
`sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing
the Gentoo ebuild repository (many thousands of small files).
The reason was the bucketing hash table in frag_table.c: too many
elements in too few buckets meant lots of walking over the linked lists.
This patch replaces that hash table with the hash table implementation
from Mesa. Its implementation is more complex (is is an open-addressing,
linear-reprobing) hash table, but it is much better suited for the task.
On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less
than 3s. CPU usage increases from ~207% to ~356%, presumably indicating
an increase in available parallelism due to the removal of the hash
table as a bottleneck. The `perf report` profile with this patch shows
that the time spent in `sqfs_frag_table_find_tail_end` has dropped from
~30% to 0.01%.
Output from ministat:
x before
+ after
N Min Max Median Avg Stddev
x 20 7.476 7.685 7.5725 7.5615 0.051254268
+ 20 2.79 2.901 2.846 2.84475 0.03543842
Difference at 95.0% confidence
-4.71675 +/- 0.0282015
-62.3785% +/- 0.241477%
(Student's t, pooled s = 0.0440618)
I imported only the bits of the hash table implementation that were
needed for frag_table.c. Among the changes I made after importing are
- removed usage of ralloc, Mesa's recursive memory allocator
- Replaced ralloc -> malloc
ralloc_free -> free
rzalloc_array -> calloc
- Removed mem_ctx parameters
- Added free()s to the appropriate places (valgrind confirms there
are no leaks)
- removed _mesa_-prefix from function names
Fixes: #40
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
Tar archives can contain set two kinds of PAX headers:
- local headers that modify the attributes of the next file
- global headers that set defaults for all files
The later is used "... not widely used", according to tar(5)
and has been deliberately not implemented.
Some programs (e.g. git-archive) *do* generate them (in the case
of git, it stores the commit hash).
This commit adds a code path that skips a PAX global header entirely
and resumes tar parsing, instead of erroneusly reporting it as an
entry.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
In libtar, the sizeof time_t checked when trying to store a time value.
It is pointless using the preprocessor here, as we can simply do an
if (sizeof(time_t) < ...) check and the compiler will take care
optimizing away one or the other branch.
After changing the libtar check and the corresponding unit tests, the
sizeof check can be removed from configure.ac, along with other unused
sizeof checks.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Simply moving the pax header decoding to a separate file and splitting
out the common helper functions should be a good start.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit adds a few macros and helper functions for the unit test
programs. Those are used instead of asserts to provide more fine
grained diagnostics on the one hand and on the other hand because
they also work if NDEBUG is defined, unlike asserts that get eliminated
in that case.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit changes the tar2sqfs & gensquashfs code to pass the exit
status on to sqfs_writer_cleanup in libcommon.
The function sqfs writer code in libcommon is changed to retain the
output file name and delete it if the status passed to the cleanup
function is anything other than EXIT_SUCCESS.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Otherwise, C++ compilers will scream.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
sqfs2tar is supposed to append slashes to directory names. Until now,
it assumed a tree node to be a directory if it has children. This
simple check obviously fails for empty directories. This commit fixes
the check by actually testing the inode mode.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
lib/common/compress.c: In function 'compressor_get_default':
lib/common/compress.c:39:2: warning: implicit declaration of function 'assert' [-Wimplicit-function-declaration]
39 | assert(0);
| ^~~~~~
lib/common/compress.c:8:1: note: 'assert' is defined in header '<assert.h>'; did you forget to '#include <assert.h>'?
7 | #include "common.h"
+++ |+#include <assert.h>
8 |
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|