Age | Commit message (Collapse) | Author |
|
With `perf record`/`perf report` I saw that 30% of the time was spent in
`sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing
the Gentoo ebuild repository (many thousands of small files).
The reason was the bucketing hash table in frag_table.c: too many
elements in too few buckets meant lots of walking over the linked lists.
This patch replaces that hash table with the hash table implementation
from Mesa. Its implementation is more complex (is is an open-addressing,
linear-reprobing) hash table, but it is much better suited for the task.
On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less
than 3s. CPU usage increases from ~207% to ~356%, presumably indicating
an increase in available parallelism due to the removal of the hash
table as a bottleneck. The `perf report` profile with this patch shows
that the time spent in `sqfs_frag_table_find_tail_end` has dropped from
~30% to 0.01%.
Output from ministat:
x before
+ after
N Min Max Median Avg Stddev
x 20 7.476 7.685 7.5725 7.5615 0.051254268
+ 20 2.79 2.901 2.846 2.84475 0.03543842
Difference at 95.0% confidence
-4.71675 +/- 0.0282015
-62.3785% +/- 0.241477%
(Student's t, pooled s = 0.0440618)
I imported only the bits of the hash table implementation that were
needed for frag_table.c. Among the changes I made after importing are
- removed usage of ralloc, Mesa's recursive memory allocator
- Replaced ralloc -> malloc
ralloc_free -> free
rzalloc_array -> calloc
- Removed mem_ctx parameters
- Added free()s to the appropriate places (valgrind confirms there
are no leaks)
- removed _mesa_-prefix from function names
Fixes: #40
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
This fixes a copy and paste error in the cleanup path, destroying a
previously destroyed object again instead of the one being tested for.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
On Linux, checking for > 0 worked because pthread_t is internally an
integer type. On other platforms (*caugh* Mac OS X *caugh*), it is
typedefed to an opaque pointer, causing a warning if used in an
integer relational comparison.
The intended use is to allow the generic cleanup function to be used
in the error path of the block processor creation function, while
preventing pthread_join being called on threads that haven't been
created at all. Since they are calloc'ed to 0, testing for non-zero
values should suffice in both cases.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Older versions of liblz4 don't define LZ4HC_CLEVEL_MAX. This commit
adds a definition if liblz4 doesn't provide one.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Make sure the function has a way of telling the caller *why* it failed.
This way, the function can convey whether it had an internal error, an
allocation failure, whether the arguments are totaly nonsensical, or
simply that the compressor *or specific configuration* is not supported.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Avoid namespace polution. Make sure all exportet symbols are prefixed
with either sqfs_ or SQFS_.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of creating everything in the "create" function, cleanup and
create/initialize stuff in a "load" function. This allows the xattr
reader to be reset/re-used and adds the benefit of not having to
lug around references to the super block, compressor and file (altough
the later two are hidden inside the meta reader).
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This patch adds a deep-copy callback to sqfs_object_t and removes the
copying mechanism from sqfs_compressor_t. This is also interesting for
other types.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Make sure the function throws an error if a given compressor ID or
flag is not known. This way, libsquahfs supports *exactly* and *only*
what the on-disk format specifies.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This allows getting the compressor configuration back after
creating it, for various purposes.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
With unified payload size counters, copying an inode is now trivial.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If the block processor allocates and dynamically resizes inodes on
the fly, we can add data indefinitely without knowing the size of
the file ahead of time.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of having seperate counters for blocks, dir index bytes
and having to fiddle out the link target size, simply use a single
value that stores the number of payload bytes used.
A seperate "payload bytes available" is used for dynamically
growing inodes during processing.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Since the merged destructor checks if the objects it destroys were
actually initialized, the pthread implementation can also replace
its error path cleanup with simply calling the destructor.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This function waits for all pending blocks to be written to disk, but
doesn't flush the fragment block, so processing can continue afterwards
as if nothing happened.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit removes the allocation helpers and string table functions
out of libsquashfs back into a "libutil.a". The problem of libsquashfs
exporting stuff that it shouldn't is resolved by retaining the internal
attributes and directly adding the source to libsquashfs instead of
trying to somehow link against libutil.la.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
- Merge duplicated code from append_to_work_queue and
sqfs_block_processor_finish
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Implement the io-queue based design as outline in doc/parallelism.txt
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
- Split the worker function up into smaller functions that are
a little more readable.
- Only dequeue one block at a time. Makes the dequeueing a lot
more readable and understandable.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
On the one hand, benchmarking and profiling determined xxhash32 to be
faster than the zlib implementation of crc32, on the other hand
profiling determined that crc32 computation contributed signifficantly
to the overall runtime.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit moves all of the fragment/block accounting in the block
processor over to the writing end of the pipeline. In order to do
this, the sparse blocks are allowed to bubble through the pipeline.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Again, the generic init/cleanup functions do way too many things that
are specific to the thread pool implementation. Thanks to the splitting
up of the block processor, they also have become quite trivial. This
commit moves those functions into their respective implementations,
allowing even further simplificiation.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The "generic" version actually uses specific internals of the thread
pool implementation. Move it back into the thread pool based
implementation and simplify the serial processor.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If the API is mis-used or allocation fails, return an appropriate
error code, but don't permanently break the block processor. It's
easy to recover from that.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Under the assumption that block processing is CPU bound and not I/O
bound, this is entirely useless.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Profiling on a sample filesystem determined that fragment
deduplication lookups rank third place directly after crc32 and the
actual compression. By using a hash table instead of linear search,
this time can be reduced drastically.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Make every dynamically allocated, opaque data structure inherit from
a common sqfs_object_t structure with common entry points (e.g. destroy).
This removes tons of public API functions and replaces them with a
simple sqfs_destroy instead. If semantics of the (until now implicit)
object system need to be extended, it can be much more conveniantely
done this way.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
It was basically built around the block processor and exposed way too
many internals. Removing it from other places was mostly trivial. This
commit completely removes it from the public API and even most of the
libsquashfs internals.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The sqfs_block_t structure has been written for the block processor
and exposes way too many internals. This commit removes its usage
from the block writer, cutting it down to the bare essentials, so
the structure can be removed from the public API later on.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Return an error number as document instead of throwing -1.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
There is no obvious non-footgun use for those other than tallying
statistics, which is now done by the data structures themselves.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit moves the entire block writing and deduplication of data
blocks over to a different data type named "block writer".
For simplicity, the interfaces of the block processor are left as is
and are turned into warppers. Likewise, most of the code in the block
writer is just verbatim from the block processor, to be cleaned up in
subsequent commits.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|