squashfs-tools-ng.git - A new set of tools and libraries for working with SquashFS images

Age	Commit message (Collapse)	Author
2019-07-28	Fix used bytes accounting when deduplicating file blocks	David Oberhollenzer
	If an entire file is eliminated, we need to reset the "used_bytes" counter, otherwise, ALL the table positions are way off. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Fix free() of stack pointer in id_table_read error path	David Oberhollenzer
	We didn't allocate the ID table, so we don't need to free() it when reading from disk fails. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Fix: return the correct value from data_reader_create	David Oberhollenzer
	Cut & paste misshap after mergining with fragment reader: If there are no fragments, data_reader_create should return the data reader, not 0! Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add some nice statistics output to tar2sqfs and gensquashfs	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add general purpose flags field to file_info_t	David Oberhollenzer
	Simplifies some task if we can just add a flag that a file has a framgent or that it has already been detected as a duplicate. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Implement data block deduplication	David Oberhollenzer
	The strategy is as follows: - At the beginning of every file, remember the current position - Once a file is done scan the list of existing files for the following: - Look for an existing file that has a block with the same size and checksum as the first non-sparse block of the current file - After that, every block in the current file has to match in size and checksum the ones in the file that we found, from that point onward - sparse blocks in either file are skipped - If we found a match, we update the current file to point to the first matching block and rewind the squashfs image to remove the newly written data This strategy should in theory be able to find an existing file where the on-disk data contains the on-disk data of the current file. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Implement fragment deduplication in data writer	David Oberhollenzer
	The strategy is simple: - The data writer function that write data/fragment blocks get access to the list files. - When writing a fragment, we look for an already written file that has a fragment with the same size and checksum. - If we find one, we throw away the fragment and reuse the existing one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Unify common file start/end code from data writer in helper functions	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Compute per-block and per-fragment checksums in data wrtier	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add fragment and block checksum fields to file_info_t	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add utility function to compute crc32 check sums	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Merge remaining code of fragment reader into data reader	David Oberhollenzer
	After the table read unification, there wasn't much left of the fragment reader and the remains could easily be moved over to the data reader. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Split data_reader_dump_file into smaller functions	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Use safer string copy function to fill tar header	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Make sure symlink in fstree_mknode is always set when creating a symlink	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Enforce reasonable upper and low bounds on the size of tar headers	David Oberhollenzer
	Since they are read directly into memory, blindly allocating the size from the tar ball is probably a bad idea. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Fix potential resource leak in deserialize_tree	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Fix checks of super block block size	David Oberhollenzer
	Make sure range is checked when reading a block and that the check is made correctly. Also make the block log check a little more strict. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Fix acciedental usage of left over local variable instead struct member	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Generate linear file list in fstree	David Oberhollenzer
	Instead of doing DFS on the fly in gensquashfs, churn out a linked list of all files in an archive. Future improvements in packing strategies can go into this file. This can also be usefull for other purposes in the future, such as file deduplication or as a work queue for the unpacker. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Add generic read_table function similar to write_table	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Cleanup sqfs_write_table	David Oberhollenzer
	This commit attempts to make the generic table writer more readable. A few changes are made, including heap allocation of the block list. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Rename table.c to write_table.c in accordance to function it contains	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Fix fragment reader out of bounds read when loading table	David Oberhollenzer
	This commit fixes a bug in the fragment table reader where the reader tries to read data into an out of bounds location due to an oversight in size calculation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Replace reads in squashfs with positional reads	David Oberhollenzer
	In most cases, we know exactely where the data that we want to read is on disk, so instead of using read() on the squashfs (or lseek + read), the code can in many places be cleaned up to use the pread wrapper read_data_at instead. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	libutil: add read_data style wrapper around pread()	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	libfstree: fix signed/unsigned comparisons	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	Fix processing of tar mtime on 32 bit systems	David Oberhollenzer
	struct stat uses time_t to store time values. On some 32 bit systems, this may be a 32 bit integer. This patch adds a broken-out 64 bit time value to tar_header_decoded_t and makes sure to clamp the value to +/- (2^32 - 1) if required when writing it back to a struct stat. Reported-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	cleanup: remove atime/ctime processing code	David Oberhollenzer
	This commit removes all the code for parsing and processing atime/ctime and values and related test code. Caring about those is kind of pointless because squashfs can only store mtime in inodes. The only relevant place is when generating a struct stat from a squashfs inode or an fstree node. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	Include the reason ZSTD gave us for the error	Matt Turner
	Without it you're left guessing or using a debugger to figure out what's wrong. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	Enable largefile support	Matt Turner
	Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and at immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	libtar: more lenient tar checksum verification	David Oberhollenzer
	Until now, the tar checksum verification simply copied the header, recomputed the checksum and compared it byte-for-byte with the original. However, not all implementations store the checksum the same way. For instance, git-archive generates tar balls that use the same format as for other octal numbers. This patch makes the checksum verification more lenient by parsing the checksum from the header and comparing it with the computed value instead of copying the entire block and insisting on byte-for-byte equivalence. The result is better interoperabillity with existing tools and perhaps slightly faster processing since the block doesn't have to be copied. Reported-by: Matt Turner <mattst88@gmail.com> Suggested-by: René Scharfe <l.s.r@web.de> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-23	Fix tree node scanning	David Oberhollenzer
	- Bail early on empty directories without touching the meta readers. - Aport the directory read loop if we can't even read a header anymore, no matter if there are bytes remaining. - Also add that same condition to the inner loop. The later two actually caused a numeric overflow on some particularly malformed squashfs images, going into a RAM filling infinite loop. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-22	Add a way to optionally keep the original time stamps	David Oberhollenzer
	First of all, this commit adds a mod_time field to a tree node. When creating the tree node, the field is set from the struct stat. When scanning a directory, the time stamps from the input are used if set. Second, the libsqfs code that reads inodes is modified to store the mod_time from the inode in the fstree node and to write the tree node into a generated inode. Finally, tar2sqfs is modified to optionally keep the timestamps from the tar archive instead of setting defaults. gensquashfs is similarly modified to keep the input timestamps if specified. The result is as follows: - sqfs2tar will always carry the timestamps from the squashfs over to the tar ball. - tar2sqfs will set defaults, unless explicitly asked to preserve the mtime from the tar ball. - gensquashfs can optionally preserve the mtime from the input hierarchy it processes if only --pack-dir is specified. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Fix indexing into export table	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Implement generating an inode table for NFS export	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Add support for storing xattr values out-of-line	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Cleanup xattr handling	David Oberhollenzer
	- Store them in a struct instead of a hacky uint64_t with magic shifts - Split up key/value pair write function to write_key and write_value - Move the size accounting into those functions respectively Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	cleanup: remove left over, unused assignment	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Keep track of xattr key & value references AFTER deduplication	David Oberhollenzer
	This commit adds a reference count functionality to the string table implementation and uses this functionality in the fstree code to count how often each key and value is referenced by the deduplicated Xattr blocks. This is needed to support deduplication through out-of-band storage of xattrs. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-20	Make raw fragment table accessible through fragment/data readers	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-19	libutil.a: make sure str_table_init initializes all fields	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-17	fstree: add support for spaces in filenames	David Oberhollenzer
	This commit adds a mechanism to fstree_from_file to support filenames with spaces in them by quoting the entire string. Quote marks can still be used inside file names by escaping them with a backslash. Back slashes (if that is your thing) can also be escaped. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	cleanup: move error handling into read_retry	David Oberhollenzer
	If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	cleanup: move error handling into write_retry	David Oberhollenzer
	If write_retry fails to write everything, it is always an error. This commit renames write_retry to write_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	Fix directory index creation	David Oberhollenzer
	Digging around in kernel internals and mksquashfs reveals that it is actually a buffer offset into the raw directory buffer. The error hasn't been noted until now because of the bug fixed by a5428e0. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-15	Add flags to data writer to micro manage behaviour	David Oberhollenzer
	The added flags allow controlling the following on a per file level: - forcing a file to be written uncompressed - forcing a file to not have a fragment, i.e. the last truncated block actually being written as a block - padding a file to be alligned to device block size The flags are not yet exposed to anything user controllable (such as command line flags). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-12	Add generic support for reading files without fragments	David Oberhollenzer
	This commit extends the special case handling for sparse files to generically support reading files that don't have a fragment but instead have a trunkated final block. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-12	Simplify writer code for files without a fragment	David Oberhollenzer
	The data writer sparse block code can take advantage that it can add a block size instead of a fragment and doesn't have to initialize the framgent location. In return, the tree node to inode serialization code doesn't need a special case for sparse file anymore and can now also handle files that are forced to not have a fragment. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-12	fstree: mknode: initialize fragment data, add extra blocksize slot	David Oberhollenzer
	This commit makes sure that regular file tree nodes have one more slot than necessary to support files that don't have fragments. Also, it initializes the fragment data, which should help to deduplicate some code ahead. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>