Age | Commit message (Collapse) | Author |
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Move all the libutil stuff from the toplevel include/ to a util/
sub directory and fix up the includes that make use of them.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This commit mainly serves the static analysis tooling.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Fragment deduplication really doesn't belong into the public API of
the fragment table.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
With `perf record`/`perf report` I saw that 30% of the time was spent in
`sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing
the Gentoo ebuild repository (many thousands of small files).
The reason was the bucketing hash table in frag_table.c: too many
elements in too few buckets meant lots of walking over the linked lists.
This patch replaces that hash table with the hash table implementation
from Mesa. Its implementation is more complex (is is an open-addressing,
linear-reprobing) hash table, but it is much better suited for the task.
On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less
than 3s. CPU usage increases from ~207% to ~356%, presumably indicating
an increase in available parallelism due to the removal of the hash
table as a bottleneck. The `perf report` profile with this patch shows
that the time spent in `sqfs_frag_table_find_tail_end` has dropped from
~30% to 0.01%.
Output from ministat:
x before
+ after
N Min Max Median Avg Stddev
x 20 7.476 7.685 7.5725 7.5615 0.051254268
+ 20 2.79 2.901 2.846 2.84475 0.03543842
Difference at 95.0% confidence
-4.71675 +/- 0.0282015
-62.3785% +/- 0.241477%
(Student's t, pooled s = 0.0440618)
I imported only the bits of the hash table implementation that were
needed for frag_table.c. Among the changes I made after importing are
- removed usage of ralloc, Mesa's recursive memory allocator
- Replaced ralloc -> malloc
ralloc_free -> free
rzalloc_array -> calloc
- Removed mem_ctx parameters
- Added free()s to the appropriate places (valgrind confirms there
are no leaks)
- removed _mesa_-prefix from function names
Fixes: #40
Signed-off-by: Matt Turner <mattst88@gmail.com>
|
|
This patch adds a deep-copy callback to sqfs_object_t and removes the
copying mechanism from sqfs_compressor_t. This is also interesting for
other types.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Profiling on a sample filesystem determined that fragment
deduplication lookups rank third place directly after crc32 and the
actual compression. By using a hash table instead of linear search,
this time can be reduced drastically.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Make every dynamically allocated, opaque data structure inherit from
a common sqfs_object_t structure with common entry points (e.g. destroy).
This removes tons of public API functions and replaces them with a
simple sqfs_destroy instead. If semantics of the (until now implicit)
object system need to be extended, it can be much more conveniantely
done this way.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This removes further clutter from the data writer. Any future efforts
on making fragment by hash lookup faster can focus on that area only
and don't clutter the block processor.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|