aboutsummaryrefslogtreecommitdiff
path: root/lib/util
AgeCommit message (Collapse)Author
2020-04-27Add hash table code to libutil.aDavid Oberhollenzer
Not only does this build the hashtable into libutil.a, it also makes sure the headers end up in the distribution tarball. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-22Import and use Mesa's hash tableMatt Turner
With `perf record`/`perf report` I saw that 30% of the time was spent in `sqfs_frag_table_find_tail_end` with tar2sqfs for a tarball containing the Gentoo ebuild repository (many thousands of small files). The reason was the bucketing hash table in frag_table.c: too many elements in too few buckets meant lots of walking over the linked lists. This patch replaces that hash table with the hash table implementation from Mesa. Its implementation is more complex (is is an open-addressing, linear-reprobing) hash table, but it is much better suited for the task. On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less than 3s. CPU usage increases from ~207% to ~356%, presumably indicating an increase in available parallelism due to the removal of the hash table as a bottleneck. The `perf report` profile with this patch shows that the time spent in `sqfs_frag_table_find_tail_end` has dropped from ~30% to 0.01%. Output from ministat: x before + after N Min Max Median Avg Stddev x 20 7.476 7.685 7.5725 7.5615 0.051254268 + 20 2.79 2.901 2.846 2.84475 0.03543842 Difference at 95.0% confidence -4.71675 +/- 0.0282015 -62.3785% +/- 0.241477% (Student's t, pooled s = 0.0440618) I imported only the bits of the hash table implementation that were needed for frag_table.c. Among the changes I made after importing are - removed usage of ralloc, Mesa's recursive memory allocator - Replaced ralloc -> malloc ralloc_free -> free rzalloc_array -> calloc - Removed mem_ctx parameters - Added free()s to the appropriate places (valgrind confirms there are no leaks) - removed _mesa_-prefix from function names Fixes: #40 Signed-off-by: Matt Turner <mattst88@gmail.com>
2020-03-18Restore workaround for unaligned reads in xxhashDavid Oberhollenzer
The code was originally used inside the block processor, where 32 bit aligned data could be guaranteed. If it is available in libutil, I cannot possibly guarantee for alignment in future use elsewhere. Even for the block processor it was rather risky "remember this detail very well" buisness. This commit restores the unaligned read treatment of the original. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-18Cleanup: Move xxhash32 code to libutilDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-03-04Add a deep copy function for the str_table_t helperDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-22Add a very simple red-black tree implementationDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-21Cleanup: move utilities back out of libsquashfsDavid Oberhollenzer
This commit removes the allocation helpers and string table functions out of libsquashfs back into a "libutil.a". The problem of libsquashfs exporting stuff that it shouldn't is resolved by retaining the internal attributes and directly adding the source to libsquashfs instead of trying to somehow link against libutil.la. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-25Cleanup: remove what is left of libutilDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-25Cleanup: move overflow safe alloc code into libsquashfsDavid Oberhollenzer
There were only a hand full of instances outside libsquashfs that used the alloc code. In most cases, the thing allocated hat its size derived from something already in memory anyway, so it is safe to assume its size fits into a size_t. At the same time, the opencoded Windows path conversion functions are all unified into a single helper function. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-24Cleanup: completely move str_table into libsquashfs internalsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-24Cleanup: remove unused str_table functionsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-24Cleanup: move canonicalize_name back to libfstree.aDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-22Cleanup: move all the compatibillity fluff to a dedicated "libcompat"David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-28Add fallback implementation for getsubopt()David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-28Add a minimal fallback implementation for getline()David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-28Add libutil implementation for strndup in case it isn't availableDavid Oberhollenzer
"Some" "non-POSIX systems" don't have that. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-23For Windows builds, remove the so-version, link libgcc staticallyDavid Oberhollenzer
The -static-libgcc flag has to be passed through the compiler with a "-Wc," prefix, because libtool tries to be clever about linker flags. If added directly to LDFLAGS, libtool removes it. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-11Add the required libtool magic to build a Windows DLLDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move libutil related headers to "util" sub directoryDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move write_data to libtarDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move directory handling code to libcommonDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move read_data function to libtarDavid Oberhollenzer
Its the only user. The other code doesn't touch raw file descriptors anymore. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: Move padd_file function to libtarDavid Oberhollenzer
It's only ever used for padding tarballs. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-06Cleanup: move padd_sqfs to helper libraryDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-29Fix str_table_t error behaviour, commentsDavid Oberhollenzer
- str_table_t is used by libsquashfs; Don't write to stderr, report an error code instead. - Fix the comments about that and fix the SPDX license identifier while we're at it. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-28Cleanup: remove string allocation helper functionDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-28Add recoding implementation for the xattr writerDavid Oberhollenzer
This commit implements the part of the API responsible for recoding and deduplicating xattr key-value blocks. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-27Add a header for platform compatibillity fluffDavid Oberhollenzer
- We don't have "endian.h" everywhere. On some BSDs its in sys and on some BSDs the macros have different names. - We definitely don't have sysmacros.h on non-Unix-like systems. - Likewise for sys/types.h, sys/stat.h and their contents. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-27Cleanup: replace fixed with data types with typedefsDavid Oberhollenzer
This is a fully automated search and replace, i.e. I ran this: git grep -l uint8_t | xargs sed -i 's/uint8_t/sqfs_u8/g' git grep -l uint16_t | xargs sed -i 's/uint16_t/sqfs_u16/g' git grep -l uint32_t | xargs sed -i 's/uint32_t/sqfs_u32/g' git grep -l uint64_t | xargs sed -i 's/uint64_t/sqfs_u64/g' git grep -l int8_t | xargs sed -i 's/int8_t/sqfs_s8/g' git grep -l int16_t | xargs sed -i 's/int16_t/sqfs_s16/g' git grep -l int32_t | xargs sed -i 's/int32_t/sqfs_s32/g' git grep -l int64_t | xargs sed -i 's/int64_t/sqfs_s64/g' and than added the appropriate definitions to sqfs/predef.h The whole point being better compatibillity with platforms that may not have an stdint.h with the propper definitions. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-20Large round of dead code removalDavid Oberhollenzer
Remove all the library functions that no longer have any users. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-20Move canonicalize_name back to libutilDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-11Cleanup Automake file for librariesDavid Oberhollenzer
Split the signel file up into several small ones and use a variable for the public headers instead of duplicating them. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-09Change license of libsquashfs.soDavid Oberhollenzer
This this commit, the code that is compiled and linked to produce libsquashfs.so (and *only* libsquashfs.so) is released under the terms and conditions of the GNU Lesser General Public license, allowing 3rd party programs to dynamically link against it and still being able to release their programs under a license of their choosing. The libutil library is also relicensed, because it is statically linked into libsquashfs.so. This does not affect the command line programs that remain under the GPLv3. At the current time, the two libraries do not contain any 3rd party contributions and thus do not require anybodys consent other than mine to change the licensing terms (unlike e.g. libfstree.a). The libfstree.a and libsqfshelper.a utility libraries will also remain under the GPLv3 license. Going forward, some parts of libsqfshelper.a will be moved to libsquashfs.so. The parts absorbed by libsquashfs.so will then also have their licensing conditions changed *at that time*. For the public headers, this commit adds the full boiler-plate comment about the license, for the source files and internal headers, only the SPDX identifier is changed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-08Replace direct file I/O with abstraction layerDavid Oberhollenzer
This should make it easier to use libsquashfs with custom setups that embedd a squashfs image inside something else. Also, it should make it easier to port to non unix-like platforms. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-05Fix "safe" string allocation wrapperDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-01Move some application specific stuff out of libutilDavid Oberhollenzer
This commit does the following: - canonicalize_name is moved to libfstree - source_date_epoch is only used inside libfstree, so it's also moved over and can later be completely internalized - print_version is moved over to sqfshelper. Mainly so it doesn't end up in libsquashfs.so for no sane reason. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-27Merge alloc_flex conditionals into oneDavid Oberhollenzer
It is shorter and less confusing for coverity. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-23Some simple search/replace cases for allocationDavid Oberhollenzer
This commit exchanges some malloc(x + y * z) patterns that can be found with a simple git grep and are obvious for the new wrappers. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-23Add wrappers for calloc style functions with size overflow checkingDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-18Replace update_crc32 helper function with crc32 from zlibDavid Oberhollenzer
It is optimized to the maximum and if we already use zlib anyway, why not use zlib crc32? This also makes zlib a hard dependency which also means the whole "do we have a compressor" sanity check in the build system can be removed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02Implement support for SOURCE_DATE_EPOCH environment variableDavid Oberhollenzer
reproducible-builds.org suggests the use of an environment variable as a source for time stamps: https://reproducible-builds.org/specs/source-date-epoch/ This commit adds support for setting the default mtime from the variable, if it is set and only defaulting to 0 if not. The timestamp given by the command line switch takes precedence. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Update print_version textDavid Oberhollenzer
This commit updates the text issued by print_version() to reflect in some way that the software contains contributions from co-authors. The original text was based on the sterotypical --version output of GNU coreutils programs. It may have to be rewritten eventually. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Add propper copyright headers to all source filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28Add utility function to compute crc32 check sumsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25libutil: add read_data style wrapper around pread()David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24Enable largefile supportMatt Turner
Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and at immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21Keep track of xattr key & value references AFTER deduplicationDavid Oberhollenzer
This commit adds a reference count functionality to the string table implementation and uses this functionality in the fstree code to count how often each key and value is referenced by the deduplicated Xattr blocks. This is needed to support deduplication through out-of-band storage of xattrs. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-19libutil.a: make sure str_table_init initializes all fieldsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16cleanup: move error handling into read_retryDavid Oberhollenzer
If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16cleanup: move error handling into write_retryDavid Oberhollenzer
If write_retry fails to write everything, it is *always* an error. This commit renames write_retry to write_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>