| Age | Commit message (Collapse) | Author | 
|---|
|  | If the linked list pointer was already used before, break up the
connection so we don't risk running into a loop or something when
regenerating the list.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | - Fewer lock aquires in worker function
   - There is no point in locking/unlocking for inserting the completed
     block if we are going to lock again immediately in the next iteration
   -> Merge those two critical sections into one
 - Constant time queue insertion
 - Bypass queue entirely if there is nothing to do for a block
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | It is optimized to the maximum and if we already use zlib anyway,
why not use zlib crc32? This also makes zlib a hard dependency which
also means the whole "do we have a compressor" sanity check in the
build system can be removed.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | This change removes the need for passing a list of files around for
deduplication. Also the deduplication code no longer needs to worry
about order, since the file being deduplicated is only added after
deduplication is done.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | The interface is designed for parallel, asynchronuous processing of data
blocks with an I/O callback that handles the serialized result.
The underlying implementation is currently still synchronuous.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Only padd it if the *extracted* size is less then block size. Doing it
with the compressed size results in garbled blocks. Especially because
most of them are less than block size when compressed.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | - Split block reading code out from "dump_blocks" into precache_data_block,
   similar to precache_fragment_block
 - Merge the code paths for fragment/data block reading and uncompression
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | This commit creates a new data structure called 'sqfs_reader_t' that
takes care of all the repetetive tasks like opening the file, reading
the super block, creating the compressor, deserializing an fstree and
creating a data reader.
This in turn makes it possible to remove all the duplicate code from
rdsquashfs and sqfs2tar.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | This commit moves the file unpacking order & job scheduling to a libfstree
function. The ordering is improved by making sure fragment blocks are not
extracted more than once and files with data blocks are extracted in order.
This way, serial unpacking of a 2GiB Debian live image could be reduced
from ~5' on my test machine to ~3.5', whereas parallel unpacking stays
roughly the same (~3' for -j 4).
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | If -DNDEBUG is set, the entire thing is omitted from the output.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | If an extension header is rejected because its too big, the error path
would print the size as size_t, altough it is an uint64_t. On 64 bit
systems, this works because size_t is a 64 bit unsigned integer, on
32 bit systems, not so much.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | If we failed to create the root node, we don't need to cleanup the
fstree_t which would attempt to recursively cleanup the root node.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | reproducible-builds.org suggests the use of an environment variable
as a source for time stamps:
https://reproducible-builds.org/specs/source-date-epoch/
This commit adds support for setting the default mtime from the variable,
if it is set and only defaulting to 0 if not. The timestamp given by the
command line switch takes precedence.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | This commit patches the tar writer to generate a PAX header with SCHILY
xattr key/value pairs if requested.
The Schily format is used for two reasons:
 - It is simple
 - It is apparently more widely supported than the libarchive format
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | We need to get the position _before_ writing the header, otherwise the
reader has no way to know the length of the value.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | This commit updates the text issued by print_version() to reflect in
some way that the software contains contributions from co-authors.
The original text was based on the sterotypical --version output of
GNU coreutils programs. It may have to be rewritten eventually.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Data blocks need to be deduplicated before attempting to write a fragment.
In the current attempt if the data blocks are found to be duplicates but
the fragment isn't, the flushed fragments are purged as well, possibly
damaging other files.
Also, when the deduplication happens, the HAS_FRAGMENT flag needs to be
set, otherwise the deduplication code thinks that there is one more block
than there actually is.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Since it is actually completely independend of libsqfs and only works
on file_info_t lists, it can be safely moved over to libfstree and
the data writer becomes less cluttered as a result.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | For merging, the use of a pointer to a pointer can simplify linked list
operations
For sorting, find the half-way point of the list in a single iteration
over the list
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Despite having a flag for that now, they still need to be initialized
because they are  written straight to disk.
Fixes: d4d1854aaed867d28ebfc97afb3518254ab6fd4b
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | A file is a complete duplicate if:
 - It has no blocks, only a single fragment and that is a duplicate
 - It has blocks but no fragment and the blocks are duplicate
 - It has blocks and a fragment and both are duplicate
The previous version only counted the last one.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | If an entire file is eliminated, we need to reset the "used_bytes" counter,
otherwise, ALL the table positions are way off.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | We didn't allocate the ID table, so we don't need to free() it when
reading from disk fails.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Cut & paste misshap after mergining with fragment reader: If there are
no fragments, data_reader_create should return the data reader, not 0!
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | Simplifies some task if we can just add a flag that a file has a framgent
or that it has already been detected as a duplicate.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | The strategy is as follows:
 - At the beginning of every file, remember the current position
 - Once a file is done scan the list of existing files for the following:
   - Look for an existing file that has a block with the same size and
     checksum as the first non-sparse block of the current file
   - After that, every block in the current file has to match in size and
     checksum the ones in the file that we found, from that point onward
   - sparse blocks in either file are skipped
 - If we found a match, we update the current file to point to the first
   matching block and rewind the squashfs image to remove the newly written
   data
This strategy should in theory be able to find an existing file where the
on-disk data *contains* the on-disk data of the current file.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 
|  | The strategy is simple:
 - The data writer function that write data/fragment blocks get
   access to the list files.
 - When writing a fragment, we look for an already written file that has
   a fragment with the same size and checksum.
 - If we find one, we throw away the fragment and reuse the existing one.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> |