Age | Commit message (Collapse) | Author |
|
A `flags` field and `priority` are added to all file information
structs. A news fstree function is introduced for parsing a "sort-file".
Each line in the file is space separated, and has the following format:
priority [flags] filename
Priority is a 64 bit number, flags are optional and filename can be put
in quotes if it is supposed to start or end with spaces. Single line
comments can be used.
The flags can be used to set block-processor flags (e.g. don't fragment,
or don't compress), as well as instructing the parser to use file
globbing to match the filename.
After parsing the file, the list of file info structure is sorted
according to the priority (default is 0) using a stable sort algorithm.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Always insert the tree nodes in the correct oder and remove the
post-process sorting step.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Instead of having a long if-else-if chain, replace the PAX header
field parsing with a table driven approach.
Altough it is more code, it is hopefully more readable, maintainable,
extensible and it dedupliates some of the value parsing code.
The GNU.sparse parsing is left as is, because it requires
maintaining state.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Split the key/value pairs right in the header and terminate the key
name. This way, some of the magic numbers can be eliminated.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Many library destructor functions (like free()) allow a NULL
pointer as input, and do nothing in that case.
This allows easier cleanup patterns: initialize pointers to NULL
and then always pass them to the destroyer functions, no need for
verbose goto/if-else patterns.
Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Abort and retry in situations that should logically _never_
_ever_ happen.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The block_count is a size_t, so on 32 bit platforms the multiplication
might be truncated before the comparison with filesz.
On 64 bit platforms, it could potentially also overflow the 64 bit
bounds of the data type.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Preprocessor magic is used to redirect putc/fputc/fputs/printf/fprintf
to custom implementations.
The custom implementations try to figure out if we are printing to the
console and, if so, convert the resulting strings to UTF-16 and print
them through ConsoleWriteW. If the output is redirected to a file or
a pipe, the original (presummed) UTF-8 is kept.
Simply setting the console output codepage to UTF-8 does not work,
because the standard I/O facilities of MSVCRT either does not support
unicode (in non-wchar mode), or has half-broken support through fputs,
which can still break up multi-byte sequences through its internal
buffering.
Likewise, changing the codepage and using ConsoleWriteA, or trying to
use fputws did not work in a test VM either.
This approach is the one that worked most consistently among the
ones tried, but also has problems. E.g. it breaks when setting the
codepage to UTF-8 manually (using `chcp 65001`).
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When piping the output of another program into tar2sqfs.exe, and
the source program terminates, tar2sqfs.exe gets an ERROR_BROKEN_PIPE
when the end is reached and it trys to pre-cache more data. This
commit adds a work around, to propperly handle this as and end-of-file
condition.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Apparently, mingw implicitly included stdlib.h indirectly from either
windows.h or shellapi.h. After an upgrade, the windows build now
fails with EXIT_FAILURE being undefined.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When opening files on windows, use the widechar versions and convert
from (assumed) UTF-8 to UTF-16 as needed.
Since the broken, code-page-random API may acutall be intended in some
use cases, leave that option in through an additional flag.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
A macro and forward declaration are added to compat.h that rename
the main() function programs using compat.h into sqfs_tools_main.
An actual main() function is added to libcompat.a, that uses the
shell API to get the UTF-16 command line arguments, convert them
to UTF-8 and call sqfs_tools_main.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The Windows port uses FlushFileBuffers in libfstream for the
implmentation of the file flush method. Unlike other winapi functions,
this function returns a boolean and not an error code.
Previously, the error code path was executed on success, printing a
rather confusing error message, that this file already exists.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Use the same size check as sqfs_dir_reader_open_dir and report EOF,
even if it is possible to read the header itself, but nothing beyond
that.
Also check if it should be possible to read an entry header before
attempting and report EOF if not.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The sqfs_dir_reader_open_dir function tried to take a short-cut by
returning early if the target directory is empty. However, this left
some field unchanged from the previous directory.
If iterating over a directory and then deciding to enter a sub-directory
that happens to be empty, the directory reader will keep the settings
for the current directory. After calling sqfs_dir_reader_rewind, the
sub-directory will suddenly report the contents of the parent.
A similar check is added to the rewind function to not track back on
the meta data reader in that case.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The squashfs readdir() implementation in the Linux kernel returns
non-existing "." and ".." entries for offsets 0 and 1, and after
that reads from disk. For convenience, it was decided to store an
off-by-3 value on disk instead of doing complex primary school math
to adjust for this. This didn't show up until now, because the kernel
implementation trusts the value from the directory header more than
the actual size in the inode and happily reads 3 more than the inode
would allow it to. This only showed up with 7-zip which subtracts 3
from the size and expects the result to be exact and bails if the
directory headers suggest otherwise.
And yes, I did consider making a "Holy Hand Granade of Antioch"
reference, but consciously decided not to.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When processing files > 4G, using "%o" truncates the result and the
tarball is not readable. This should have been discovered when
auto-patching the printf format specifiers, but a cast was added
instead and the issue was overlooked.
This commit replaces the down-cast and printf format specifiers.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
- Store the return value of the page allocation directly into the
pool variable instead of an intermediate unsigned char pointer.
- Make the blob[] array the same type as the bitmap, this saves us
manual alignment trickery.
- Cleanup the pointer arithmetic, let the compiler do the
sizeof() multiplication.
- Use uintptr_t for the manual alignment of the data pointer, so we
don't run into signdness problems there.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The same problem with the meta data header again, 16 bit read from
a buffer: copy the buffer data into a 16 bit variable instead of
casting to something potentially unaligned.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When accessing the 16 bit header, don't cast the buffer pointer to an
uint16_t pointer, the result might not be aligned propperly. Instead
memcpy to and from an uint16_t.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
*in theory*, say on a 32 bit system, we could have a 32 bit size_t and
a 64 bit off_t. If the filesystem permitted this, we *could* then have
a symlink with a target > 4G. Or the target is exacetely 4G, but
adding a null-terminator could exceed addressable memory.
This commit adds a check to guard against such an overflow and throw
an error, instead of silently wrapping around.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If the hard link counter or the inode number counter overflow the
maximum representable value (for SquashFS 16 bit and 32 bit
respecitively), abort with an error message.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
The differen compressor libraries use differnt integer types to tally
the buffer sizes. The libfstream library uses size_t, which may be
bigger than the actualy types, potentially causing an overflow if
trying to compress to much at once.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Currently, when the block processor aggreagtes fragments into a
fragment block, it applies the "don't compress" flag if any of the
original framgnets has it set, but the "align to device block" flag
is lost.
This commit ensures that both flags get applied to the fragment block
if set.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
1) If the block alignment flag is set, the padding bytes must be
inserted _before_ recording the start position, otherwise the
resulting image is not readable.
2) Also perform alignment if the flag is set on a fragment block.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This is a followup to dd4e6ead142e58568aec89d76b0b2e867ee983f2.
Basically the same problem occours with Bzip2, but it so far it wasn't
possible to find a sampel that reproduces it.
Unlike libxz, the libbz2 API does not support concatenated streams by
itself and will choke when trying to decompress after the stream end,
so this commit adds a workaround to simply initialize the decompressor
on-the-fly and tear it down again when and end-of-stream is returned.
The end-of-file condition is only set when there actually is no more
data to read. Otherwise, the decompressor will be re-initialized in
the next round.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Some xz compressed tarballs (e.g. from kernel.org) are not made up of
a single xz stream, but rather contain several, independendly
compressed streams. In that case, the xz decompressor hits
an LZMA_STREAM_END early on and reports EOF. If you are lucky, the tar
reader bails (premature end-of-file). If you are unlucky, it happens
exactely between two records and is interpeted as regular end-of-file.
As this seems to be a normal use case for xz, it has a flag to just
read across the seams and only report end-of-stream if the action
is set to finish.
This commit adds the flag to the initialization propperly sets the
lzma_action depending on whether the underlying stream hit EOF or not.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
On systems like Windows, the dynamic library and applications can
easily end up being linked against different runtime libraries, so
applications cannot be expected to be able to free() any malloc'd
pointer that the library returns.
This commit adds an sqfs_free function so the application can pass
pointers back to the library to call the correct free() implementation.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This indicates that sync isn't possible on the underlying file
descriptor (e.g. a pipe), which currently causes sqfs2tar to err if
the output isn't written directly to a file.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This was already in the original block processor but got dropped by
accident when restructuring it.
The problem manifests itself when manually submitting fragment blocks.
They no longer get correct I/O queue tickets, clog up the queue and
the processor eventually throws an internal error.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If the path argument is "", we assume that referes to root and set
the *existing* target node to the root node and skip ahead across
the tree search. This leaves "name" uninitialized, which makes
coverity panic, because fs->root could be NULL, going down the wrong
path.
Obviously, this should never, *ever* happen and there is no reasonable
recovery strategy if it suddenly does, so simply add an assertion.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Only clean up the fragment if it hasn't been re-assigned to the
fragment block. The NULL check is definitely wrong, because we
no longer re-assign it as NULL.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
This allows putting globbed files & directories into the filesystem
root, as well as explicitly setting attributes of the root directory
from the file lisiting.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Dequeuing won't work if we have a backlog of 1 or 2 and the blocks
are used for internal buffering. Take that into account, similar to
the sync code. Also bump the minimum backlog to 3, just to make
absolutely sure we cannot run into a dequeue loop trying to allocate
a block.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
It's rather simplistic and doesn't account for junction/reparse
points, which is the closest thing Windows has to symlinks, hard
links and mount points, but it's consistent with the unpacking code
that assumes Windows only has files and directories.
Using the 32 bit mingw toolchain, this seems to satisfy the unit
tests on wine.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
In the hash-table equals callback, if the hash and size match, do an
exact, byte-for-byte comparison of the fragment in question. The
fragment can either be in a fragment block that is in-flight (for which
we have the in-flight list), in the current, unfinished fragment block,
or it can be on disk.
In the later case, the fragment block is resolved through the fragment
table and read back from disk into a scratch buffer and decompressed.
After that, the fragment is checked for byte-for-byte equality with
the one we resolved through the hash table.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
If we want full, byte-for byte, verification of fragments during
de-duplication we need to check back with the blocks already written
to disk, or with the ones that are in flight.
The previous, extremely hacky approach simply locked up the thread
pool and investigated the queues. For the new approach, we treat the
thread pool as completely opaque and don't try to touch it.
This commit modifies the block processor to keep duplicate copies of
each submitted fragment block around, that are cleaned up once the
block is dequeued and written to disk. So instead of touching the
thread pool, we can simply investigate the in-fligth-block list and
the current block, before resorting to reading back fragment blocks
from the file.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|
|
When we already hold the mutex, try to pre-emtively dequeue items into
a "safe queue". When actually asked to dequeue, take blocks from there
first and avoid having to enter the critical section if possible.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
|