path: root/lib/sqfs
Age | Commit message | Author
2023-01-19 | Add a helper function to initialize libsquashfs objects | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2023-01-19 | libsqfs: add a threshold for extended directory inodes with index | David Oberhollenzer
mksquashfs generates extended inodes if a directory contains 256 entries. So far, libsquashfs only generated an extended inode if there was no other way to encode the directory. Mimic the behaviour of mksquashfs by adding a threshold. For this to work, the "sqfs_inode_set_xattr_index" function has to be changed to not immediately try to demote inodes to basic types. Instead, the fstree serialization does that itself if the index is 0xFFFFFFFF and the target is not a directory inode. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
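A minimal C sketch of the kind of decision involved. The function and constant names and the exact comparison against the 256-entry threshold are hypothetical, not libsquashfs code; the 16 bit size field of the basic directory inode and the 0xFFFFFFFF "no xattrs" sentinel come from the SquashFS on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant mirroring the mksquashfs behaviour described
   above; the exact value and comparison are assumptions. */
#define DIR_INDEX_THRESHOLD 256u

/* SquashFS sentinel meaning "this inode has no xattrs". */
#define NO_XATTR 0xFFFFFFFFu

/* Decide whether a directory should use the extended inode type. */
static bool dir_needs_extended_inode(uint64_t listing_size,
                                     uint32_t xattr_idx,
                                     uint32_t entry_count)
{
	/* The basic directory inode only has a 16 bit size field. */
	if (listing_size > 0xFFFF)
		return true;

	/* Only the extended inode can reference xattrs. */
	if (xattr_idx != NO_XATTR)
		return true;

	/* Large directories get an extended inode so an index is emitted. */
	return entry_count >= DIR_INDEX_THRESHOLD;
}
```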
2022-11-22 | Get rid of the built-in copy of LZ4 | David Oberhollenzer
On Linux or BSD distributions, a native version is installed via the package manager. On Windows, we can simply build it from source like the other libraries. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-11-21 | Make some string functions from libcompat available to libsquashfs | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-11-18 | libsqfs: Fix an overzealous bounds check in the block processor | David Oberhollenzer
When (during fragment deduplication) a fragment block is read back from disk and unpacked, it can happen that it is _exactly_ the given block size. The bounds check did '>=' instead of '>' and failed in that case with a "data corruption" error. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
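The off-by-one can be shown with a small sketch (the helper name is hypothetical): an unpacked block may be exactly as large as the block size, so only a strictly larger size signals corruption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Validate the size of an unpacked fragment block.  A block that is
   exactly block_size bytes long is valid, so only strictly larger
   sizes are treated as corruption (the buggy check used '>='). */
static bool unpacked_size_ok(size_t unpacked, size_t block_size)
{
	return unpacked <= block_size;
}
```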
2022-11-18 | libsqfs: Initialize the return value in sqfs_compressor_create | David Oberhollenzer
Initialize the output compressor pointer to NULL, so that if the function fails, the value is properly set to a NULL pointer instead of relying on the caller to initialize it. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
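A sketch of the pattern, using a hypothetical compressor_t type and compressor_create() function rather than the real sqfs_compressor_create() signature: the output pointer is cleared first, so every failure path leaves it NULL.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a compressor object. */
typedef struct {
	int level;
} compressor_t;

/* *out is set to NULL before anything can fail, so callers always
   get a defined value back. */
static int compressor_create(int level, compressor_t **out)
{
	compressor_t *cmp;

	*out = NULL;

	if (level < 1 || level > 9)
		return -1;            /* invalid configuration */

	cmp = calloc(1, sizeof(*cmp));
	if (cmp == NULL)
		return -1;            /* out of memory */

	cmp->level = level;
	*out = cmp;
	return 0;
}
```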
2022-10-10 | block writer: further cleanup of the block writer logic | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-09-20 | block writer: move block comparison to utility function | David Oberhollenzer
Slightly modify the byte-for-byte comparison function to compare an arbitrary range in a file and move it to libutil. Instead of calling it for each block, let the block writer check an entire range at once, computing the position and size of the reference range up front, before looking for potential matches. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-09-20 | block writer: remove open coded array | David Oberhollenzer
Instead of open coding it, use the array_t type from libutil. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-07-08 | Make sqfs_tree_node_get_path more robust | David Oberhollenzer
Test against various invariants:
 - Every non-root node must have a name
 - The root node must not have a name
 - The name must not be ".." or "."
 - The name must not contain '/'
 - The loop that chases parent pointers must terminate, i.e. we must never reach the starting state again (link loop).
Furthermore, make sure the sum of all path components plus separators does not overflow.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
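A self-contained sketch of such a path builder, using a hypothetical node_t type instead of sqfs_tree_node_t. It applies the invariants listed in the commit message and checks the length sum for overflow. For loop detection, it uses Floyd's cycle-finding algorithm, which is an assumption and not necessarily what libsquashfs does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, minimal stand-in for a tree node. */
typedef struct node_t {
	const char *name;             /* NULL for the root */
	const struct node_t *parent;  /* NULL for the root */
} node_t;

static int name_is_valid(const char *name)
{
	return name != NULL && name[0] != '\0' &&
	       strcmp(name, ".") != 0 && strcmp(name, "..") != 0 &&
	       strchr(name, '/') == NULL;
}

/* Build "/a/b/c" for a node.  Returns a malloc'd string, or NULL if an
   invariant is violated, the parent chain loops or the size overflows. */
static char *node_get_path(const node_t *n)
{
	const node_t *it, *fast = n;
	size_t len = 0, clen;
	char *str, *ptr;

	/* Pass 1: validate names, detect loops and sum up the length. */
	for (it = n; it->parent != NULL; it = it->parent) {
		if (!name_is_valid(it->name))
			return NULL;

		clen = strlen(it->name);
		if (clen >= SIZE_MAX - len)   /* len + clen + 1 would overflow */
			return NULL;
		len += clen + 1;              /* component plus '/' */

		/* Floyd's cycle check: 'fast' advances two steps per turn. */
		if (fast != NULL && fast->parent != NULL)
			fast = fast->parent->parent;
		else
			fast = NULL;

		if (fast != NULL && fast == it->parent)
			return NULL;          /* link loop */
	}

	if (it->name != NULL)                 /* root must not have a name */
		return NULL;

	if (len == 0)
		len = 1;                      /* the root itself is "/" */
	if (len == SIZE_MAX)
		return NULL;

	str = malloc(len + 1);
	if (str == NULL)
		return NULL;

	/* Pass 2: fill in the components back to front. */
	str[0] = '/';
	str[len] = '\0';
	ptr = str + len;

	for (it = n; it->parent != NULL; it = it->parent) {
		clen = strlen(it->name);
		ptr -= clen;
		memcpy(ptr, it->name, clen);
		*(--ptr) = '/';
	}

	return str;
}
```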
2022-07-08 | Move sqfs_tree_node_get_path to libsquashfs | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-07-08 | Cleanup: move libutil headers to subdirectory | David Oberhollenzer
Move all the libutil headers from the top-level include/ directory to a util/ subdirectory and fix up the includes that use them. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-06-02 | Cleanup: libsqfs: simplify state handling in dir reader | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-06-02 | Cleanup: libsqfs: sqfs_dir_reader_find_by_path | David Oberhollenzer
Split out several repeated patterns into helper functions and move the rest of the code back into dir_reader.c Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-06-02 | Cleanup: libsqfs: merge dir cache code back into dir_reader.c | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-06-02 | Cleanup: libsqfs: move directory iteration out of the directory reader | David Oberhollenzer
Add a simple directory state object to the meta data reader and use that to iterate directory entries. The code for reading the directory listing is moved to readdir.c Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-06-01 | Fix: libsqfs: do not report out of bounds positions from meta reader | David Oberhollenzer
When the meta data reader is asked for its current position right after it has *just* read to the end of a block, report the start of the next block as the current location. Otherwise, trying to *seek* to the reported position immediately afterwards throws an out-of-bounds error. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
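A sketch of the reporting rule, assuming a hypothetical reader struct that knows where the next on-disk block starts. The field and function names are illustrative, not the libsquashfs API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical meta data reader state, reduced to what tell() needs. */
typedef struct {
	uint64_t block_offset; /* on-disk start of the current block */
	uint64_t next_block;   /* on-disk start of the following block */
	size_t offset;         /* read position inside the unpacked data */
	size_t data_used;      /* number of unpacked bytes in the block */
} meta_reader_t;

/* Report a position that can be passed straight back to a seek. */
static void meta_reader_tell(const meta_reader_t *m,
                             uint64_t *block_start, size_t *offset)
{
	if (m->offset >= m->data_used) {
		/* The whole block was consumed: point at the next one. */
		*block_start = m->next_block;
		*offset = 0;
	} else {
		*block_start = m->block_offset;
		*offset = m->offset;
	}
}
```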
2022-04-10 | Remove builtin copy of zlib | David Oberhollenzer
On GNU/Linux, *BSD or macOS we can simply use the system default library. The copy was mainly there for the Windows build. The Windows build script has now been adapted to download a tarball and compile a shared library from it. This removes a huge chunk of code from the git tree as well as from the release tarballs. Additionally, it gets rid of iffy things like removing the zlib copyright/version strings so the libsquashfs DLL doesn't export them. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-04-09 | Add support for '.' and '..' entries in sqfs_dir_reader_t | David Oberhollenzer
Two flags are added to the dir reader API: one for the create function, telling the dir reader to report those entries, and one for the open function, to suppress them if reporting was enabled. To implement the feature, a mapping of visited directory inodes is maintained internally that maps inode numbers to inode references. When opening a directory, state is maintained to generate the fake entries for '.' and '..'. Since all the other functions are based on the open/read/rewind API, they need no alterations. The tree scan function is modified to use the suppress flag, so it does not accidentally catch those entries. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-04-05 | libsqfs: move dir reader code to subdirectory, add internal header | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2022-03-30 | sqfs_dir_tree_destroy/sqfs_destroy: allow NULL input | Luca Boccassi
Many library destructor functions (like free()) accept a NULL pointer as input and do nothing in that case. This allows easier cleanup patterns: initialize pointers to NULL and then always pass them to the destructor functions, with no need for verbose goto/if-else patterns. Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
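A sketch of the cleanup style this enables, with a hypothetical object_t type: because the destructor tolerates NULL, a function can create several objects and release all of them unconditionally at the end.

```c
#include <stdlib.h>

/* Hypothetical object type used to illustrate the pattern. */
typedef struct {
	char *buffer;
} object_t;

/* Like free(): a NULL pointer is accepted and ignored. */
static void object_destroy(object_t *obj)
{
	if (obj == NULL)
		return;

	free(obj->buffer);
	free(obj);
}

static object_t *object_create(size_t size)
{
	object_t *obj = calloc(1, sizeof(*obj));

	if (obj == NULL)
		return NULL;

	obj->buffer = malloc(size);
	if (obj->buffer == NULL) {
		object_destroy(obj);  /* buffer is NULL, free(NULL) is fine */
		return NULL;
	}

	return obj;
}

/* No per-object branches: whatever was (or was not) created is handed
   to the NULL-tolerant destructor. */
static int use_two_objects(size_t size_a, size_t size_b)
{
	object_t *a = object_create(size_a);
	object_t *b = object_create(size_b);
	int ret = (a != NULL && b != NULL) ? 0 : -1;

	/* ... work with a and b here if ret == 0 ... */

	object_destroy(b);
	object_destroy(a);
	return ret;
}
```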
2022-03-10 | Fix: guard against potential overflow in file size calculation | David Oberhollenzer
The block_count is a size_t, so on 32 bit platforms the multiplication might be truncated before the comparison with filesz. On 64 bit platforms, it could potentially also overflow the 64 bit bounds of the data type. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
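A sketch of an overflow-safe version of such a check. The function name and the exact comparison are assumptions; the point is widening to 64 bits before multiplying and rejecting products that would not fit into 64 bits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: do block_count full blocks of block_size bytes
   fit into a file of filesz bytes?  The product is computed in 64 bits
   (a 32 bit size_t would truncate it) and is tested for 64 bit
   overflow before multiplying. */
static bool blocks_fit_in_file(size_t block_count, uint32_t block_size,
                               uint64_t filesz)
{
	if (block_size != 0 &&
	    (uint64_t)block_count > UINT64_MAX / block_size)
		return false;

	return (uint64_t)block_count * block_size <= filesz;
}
```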
2021-12-05 | Fix: consistently use the widechar file API on Windows | David Oberhollenzer
When opening files on Windows, use the widechar versions and convert from (assumed) UTF-8 to UTF-16 as needed. Since the broken, code-page-random API may actually be intended in some use cases, that option is kept available through an additional flag. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-08-22 | Tighten bounds checks in sqfs_dir_reader_read | David Oberhollenzer
Use the same size check as sqfs_dir_reader_open_dir and report EOF if the header itself can be read, but nothing beyond it. Also check whether it should be possible to read an entry header before attempting to do so, and report EOF if not. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-08-22 | Fix half-done initialization in sqfs_dir_reader_open_dir | David Oberhollenzer
The sqfs_dir_reader_open_dir function tried to take a shortcut by returning early if the target directory is empty. However, this left some fields unchanged from the previous directory. When iterating over a directory and then entering a sub-directory that happens to be empty, the directory reader keeps the settings of the parent directory. After calling sqfs_dir_reader_rewind, the sub-directory suddenly reports the contents of the parent. A similar check is added to the rewind function, so it does not seek back on the meta data reader in that case. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-07-21 | Fix libsquashfs directory writer size accounting | David Oberhollenzer
The squashfs readdir() implementation in the Linux kernel returns non-existing "." and ".." entries for offsets 0 and 1, and only after that reads from disk. For convenience, it was decided to store an off-by-3 value on disk instead of doing complex primary school math to adjust for this. This didn't show up until now, because the kernel implementation trusts the value from the directory header more than the actual size in the inode and happily reads 3 bytes more than the inode would allow. It only showed up with 7-zip, which subtracts 3 from the size, expects the result to be exact, and bails out if the directory headers suggest otherwise. And yes, I did consider making a "Holy Hand Grenade of Antioch" reference, but consciously decided not to. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
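A sketch of the size convention described in this commit, with hypothetical helper names: the value stored in the directory inode is the listing size plus 3, and a reader such as 7-zip subtracts 3 again.

```c
#include <stdint.h>

/* The kernel's readdir() synthesizes "." and ".." at offsets 0 and 1,
   so the directory size stored in the inode is the real listing size
   plus 3. */
#define DIR_SIZE_BIAS 3u

static uint32_t dir_size_to_disk(uint32_t listing_bytes)
{
	return listing_bytes + DIR_SIZE_BIAS;
}

static uint32_t dir_size_from_disk(uint32_t inode_size)
{
	/* Guard against corrupt images with a size below the bias. */
	return inode_size < DIR_SIZE_BIAS ? 0 : inode_size - DIR_SIZE_BIAS;
}
```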
2021-06-25 | Add default cases for every switch block | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-06-25 | Remove casual un-const casting in various places | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-06-25 | libsquashfs: get rid of potentially unaligned access and VLAs | David Oberhollenzer
The same problem as before with the meta data header: a 16 bit value read from a buffer. Copy the buffer data into a 16 bit variable instead of casting to a potentially unaligned pointer. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
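A sketch of decoding a meta data block header from a possibly unaligned buffer position. The bit layout (little endian, bit 15 set means uncompressed, low 15 bits are the on-disk size) is the SquashFS format; the function names are hypothetical, and the runtime byte order probe is just one portable option among several.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert a little endian 16 bit value to host byte order. */
static uint16_t le16_to_host(uint16_t x)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);  /* 1 on little endian hosts */
	return first == 1 ? x : (uint16_t)((x >> 8) | (x << 8));
}

/* Decode a meta data block header; buf may point anywhere, including
   odd addresses. */
static void decode_meta_header(const uint8_t *buf, size_t *size,
                               bool *compressed)
{
	uint16_t raw;

	/* Copy instead of reading through (const uint16_t *)buf, which may
	   be an unaligned access that faults on some architectures. */
	memcpy(&raw, buf, sizeof(raw));
	raw = le16_to_host(raw);

	*compressed = (raw & 0x8000) == 0;
	*size = raw & 0x7FFF;
}
```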
2021-06-07 | libsquashfs: fix: also preserve alignment flag in block processor | David Oberhollenzer
Currently, when the block processor aggregates fragments into a fragment block, it applies the "don't compress" flag if any of the original fragments has it set, but the "align to device block" flag is lost. This commit ensures that both flags get applied to the fragment block if set. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
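A sketch of flag propagation when fragments are packed into a fragment block. The flag names and values are hypothetical placeholders, not the libsquashfs constants; the idea is that every flag that must carry over is masked in and OR-ed together.

```c
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define BLK_DONT_COMPRESS 0x01u
#define BLK_ALIGN         0x02u
#define BLK_OTHER         0x80u  /* a flag that must not propagate */

/* Fragment flags that carry over to the enclosing fragment block. */
#define BLK_FRAGMENT_INHERIT (BLK_DONT_COMPRESS | BLK_ALIGN)

static uint32_t merge_fragment_flags(uint32_t block_flags,
                                     uint32_t frag_flags)
{
	return block_flags | (frag_flags & BLK_FRAGMENT_INHERIT);
}
```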
2021-06-07 | libsquashfs: fix block alignment if requested | David Oberhollenzer
1) If the block alignment flag is set, the padding bytes must be inserted _before_ recording the start position, otherwise the resulting image is not readable.
2) Also perform alignment if the flag is set on a fragment block.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
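A sketch of the ordering issue from point 1, with hypothetical names: padding is added first, and only then is the block's start position recorded.

```c
#include <stdint.h>

/* Padding needed to move pos up to the next multiple of devblksz
   (assumed to be non-zero). */
static uint64_t align_padding(uint64_t pos, uint32_t devblksz)
{
	uint64_t rem = pos % devblksz;

	return rem == 0 ? 0 : devblksz - rem;
}

/* Hypothetical writer step: pad first, then record where the block
   starts.  Recording the position before padding would point the
   block table at the padding bytes. */
static uint64_t place_block(uint64_t *file_end, uint64_t size,
                            int align, uint32_t devblksz)
{
	uint64_t start;

	if (align)
		*file_end += align_padding(*file_end, devblksz);

	start = *file_end;   /* recorded AFTER padding */
	*file_end += size;
	return start;
}
```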
2021-04-08 | Fix: libsquashfs: add sqfs_free() function | David Oberhollenzer
On systems like Windows, the dynamic library and applications can easily end up being linked against different runtime libraries, so applications cannot be expected to be able to free() any malloc'd pointer that the library returns. This commit adds an sqfs_free function so the application can pass pointers back to the library to call the correct free() implementation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
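A minimal sketch of the pattern, with hypothetical lib_get_name() and lib_free() functions standing in for a library API: the release function lives in the library, so its free() call is bound to the same C runtime as the library's malloc(). In a single-file example both runtimes coincide, so this only shows the shape of the API.

```c
#include <stdlib.h>
#include <string.h>

/* Inside the library (hypothetical): memory handed to the application
   is allocated by the library's own C runtime ... */
static char *lib_get_name(void)
{
	static const char name[] = "example";
	char *copy = malloc(sizeof(name));

	if (copy != NULL)
		memcpy(copy, name, sizeof(name));
	return copy;
}

/* ... so the library also exports the matching release function.
   This free() is compiled into the library and therefore uses the
   same runtime as the malloc() above. */
static void lib_free(void *ptr)
{
	free(ptr);
}
```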
2021-03-30 | libsqfs: block processor: Fix accounting for manually submitted blocks | David Oberhollenzer
This was already in the original block processor but got dropped by accident when restructuring it. The problem manifests itself when manually submitting fragment blocks: they no longer get correct I/O queue tickets, clog up the queue, and the processor eventually throws an internal error. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-25 | Fix fail branch in block processor fragment backend | David Oberhollenzer
Only clean up the fragment if it hasn't been re-assigned to the fragment block. The NULL check is definitely wrong, because we no longer re-assign it to NULL. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-24 | Fix block processor queue accounting | David Oberhollenzer
Dequeuing won't work if we have a backlog of 1 or 2 and the blocks are used for internal buffering. Take that into account, similar to the sync code. Also bump the minimum backlog to 3, just to make absolutely sure we cannot run into a dequeue loop trying to allocate a block. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-23 | Fix Windows build of the thread pool in libsquashfs | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-23 | block processor: Re-implement exact fragment matching | David Oberhollenzer
In the hash-table equals callback, if the hash and size match, do an exact, byte-for-byte comparison of the fragment in question. The fragment can either be in a fragment block that is in flight (for which we have the in-flight list), in the current, unfinished fragment block, or on disk. In the latter case, the fragment block is resolved through the fragment table, read back from disk into a scratch buffer and decompressed. After that, the fragment is checked for byte-for-byte equality with the one we resolved through the hash table. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
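A sketch of such an equals callback over a hypothetical fragment_t descriptor. Resolving the fragment bytes (in-flight block, current block, or read back from disk) is left out and assumed to have happened already.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment descriptor with already-resolved data. */
typedef struct {
	uint32_t hash;
	uint32_t size;
	const uint8_t *data;
} fragment_t;

/* Cheap checks first, then an exact byte-for-byte comparison, so that
   hash collisions are never treated as duplicates. */
static bool fragment_equals(const fragment_t *a, const fragment_t *b)
{
	if (a->hash != b->hash || a->size != b->size)
		return false;

	return memcmp(a->data, b->data, a->size) == 0;
}
```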
2021-03-23 | block processor: keep duplicate copies of in-flight fragment blocks | David Oberhollenzer
If we want full, byte-for-byte verification of fragments during de-duplication, we need to check back with the blocks already written to disk, or with the ones that are in flight. The previous, extremely hacky approach simply locked up the thread pool and inspected the queues. The new approach treats the thread pool as completely opaque and does not touch it. This commit modifies the block processor to keep duplicate copies of each submitted fragment block around, which are cleaned up once the block is dequeued and written to disk. So instead of touching the thread pool, we can simply inspect the in-flight block list and the current block, before resorting to reading fragment blocks back from the file. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-22 | block processor: simplify backlog accounting | David Oberhollenzer
Simply count the number of blocks we hand out (malloc'ed or recycled) and decrement the counter when we put blocks back for recycling. The sync() part becomes a little more complicated, because we can get stuck with a backlog of 1 or 2 when a fragment or current block buffer is in use. We also need to account for this when creating the processor, because we need to be able to request at least 2 blocks without stalling. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-22 | Cleanup the block processor file structure | David Oberhollenzer
A cleaner separation between common code, frontend code and backend code is made. The "is this byte blob zero" function is moved out to libutil (with test case and everything) with a more optimized implementation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-21 | Fix missing error code initialization | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-21 | Cleanup: Rewrite block processor to use the libutil thread_pool_t | David Oberhollenzer
Throw out the messy thread pool implementation and temporarily also remove the exact fragment matching for simplicity. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-07 | Optionally use a pool allocator for rb-tree nodes | David Oberhollenzer
This commit restructures the rbtree code to optionally use a pool allocator for the nodes. The option depends on the presence of a preprocessor flag. An option is added to the configure script to enable/disable the use of custom allocators. It still makes sense to allow the malloc/free based route for better ASAN instrumentation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
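A minimal fixed-size pool allocator of the kind that could back rb-tree nodes. This is a generic slab and free-list design chosen for illustration, not the project's actual allocator; all names are hypothetical and the code assumes C11 (for max_align_t and alignof). Freed objects are recycled LIFO, and all memory is released at once by pool_destroy().

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Slab header; the max_align_t member keeps the objects that follow
   the header suitably aligned. */
typedef union slab_t {
	union slab_t *next;
	max_align_t align;
} slab_t;

typedef struct {
	size_t obj_size;   /* rounded up, large enough for a pointer */
	size_t per_slab;   /* objects carved out of each slab */
	slab_t *slabs;     /* all slabs, released in pool_destroy() */
	void *free_list;   /* singly linked list of unused objects */
} pool_t;

static void pool_init(pool_t *p, size_t obj_size, size_t per_slab)
{
	const size_t a = alignof(max_align_t);

	if (obj_size < sizeof(void *))
		obj_size = sizeof(void *);

	p->obj_size = (obj_size + a - 1) / a * a;
	p->per_slab = per_slab > 0 ? per_slab : 1;
	p->slabs = NULL;
	p->free_list = NULL;
}

static void *pool_alloc(pool_t *p)
{
	void *obj;

	if (p->free_list == NULL) {
		unsigned char *base;
		slab_t *s;
		size_t i;

		if (p->per_slab > (SIZE_MAX - sizeof(slab_t)) / p->obj_size)
			return NULL;  /* slab size would overflow */

		s = malloc(sizeof(slab_t) + p->obj_size * p->per_slab);
		if (s == NULL)
			return NULL;

		s->next = p->slabs;
		p->slabs = s;

		/* Thread every object of the new slab onto the free list. */
		base = (unsigned char *)(s + 1);
		for (i = 0; i < p->per_slab; ++i) {
			void *o = base + i * p->obj_size;
			*(void **)o = p->free_list;
			p->free_list = o;
		}
	}

	obj = p->free_list;
	p->free_list = *(void **)obj;
	return obj;
}

static void pool_free(pool_t *p, void *obj)
{
	if (obj == NULL)
		return;
	*(void **)obj = p->free_list;
	p->free_list = obj;
}

static void pool_destroy(pool_t *p)
{
	while (p->slabs != NULL) {
		slab_t *next = p->slabs->next;
		free(p->slabs);
		p->slabs = next;
	}
	p->free_list = NULL;
}
```

Keeping a plain malloc()/free() build next to such a pool, as the commit does, lets ASAN see every node allocation individually.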
2021-03-07 | Rewrite the str_table to internally use the more optimized hash_table | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-06 | Fix: meta reader behaviour if accessing block at location 0 | David Oberhollenzer
Technically, this should *never* **ever** happen, because a SquashFS file always starts with a super block, which isn't wrapped in a meta data block, so a valid SquashFS file will never have a reason to read from offset 0. However, this does bite us in unit tests where the meta reader and writer are used on an otherwise empty file. When trying to read from offset 0, the caching code assumes that we already have that block, since the block_offset got initialized to 0. This commit changes the initialization to set the current block location to the maximum 64 bit integer, a location we are never going to read from, since it will always be beyond the limit. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
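A sketch of the initialization fix with a hypothetical one-block cache: using UINT64_MAX as the "nothing cached" marker guarantees that the first lookup, even one for offset 0, is a miss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-block cache of a meta data reader. */
typedef struct {
	uint64_t block_offset;  /* on-disk location of the cached block */
} meta_cache_t;

static void meta_cache_init(meta_cache_t *c)
{
	/* Not 0: an offset that is never read, so the first request
	   (even for offset 0) is a cache miss. */
	c->block_offset = UINT64_MAX;
}

static bool meta_cache_hit(const meta_cache_t *c, uint64_t location)
{
	return c->block_offset == location;
}
```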
2021-03-06 | Cleanup: replace ad-hoc dynamic array in sqfs_xattr_writer_t | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-06 | Cleanup: replace ad-hoc dynamic array used for export table | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-06 | Cleanup: replace ad-hoc dynamic array in sqfs_id_table_t | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-06 | Cleanup: replace ad-hoc dynamic array in sqfs_frag_table_t | David Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2021-03-06 | Store xattr writer block description in a red-black tree | David Oberhollenzer
By storing the blocks in a tree, the de-duplication can look up existing blocks in logarithmic instead of linear time. The linked list is still maintained, because we need to iterate over the blocks in creation order during serialization. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>