squashfs-tools-ng.git - A new set of tools and libraries for working with SquashFS images

Age	Commit message (Collapse)	Author
2019-08-05	cleanup: unify all the code that reads squashfs images	David Oberhollenzer
	This commit creates a new data structure called 'sqfs_reader_t' that takes care of all the repetetive tasks like opening the file, reading the super block, creating the compressor, deserializing an fstree and creating a data reader. This in turn makes it possible to remove all the duplicate code from rdsquashfs and sqfs2tar. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-04	Improve file unpacking order	David Oberhollenzer
	This commit moves the file unpacking order & job scheduling to a libfstree function. The ordering is improved by making sure fragment blocks are not extracted more than once and files with data blocks are extracted in order. This way, serial unpacking of a 2GiB Debian live image could be reduced from ~5' on my test machine to ~3.5', whereas parallel unpacking stays roughly the same (~3' for -j 4). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02	Implement support for SOURCE_DATE_EPOCH environment variable	David Oberhollenzer
	reproducible-builds.org suggests the use of an environment variable as a source for time stamps: https://reproducible-builds.org/specs/source-date-epoch/ This commit adds support for setting the default mtime from the variable, if it is set and only defaulting to 0 if not. The timestamp given by the command line switch takes precedence. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01	Add ability to write_tar_header to embedd extended attributes	David Oberhollenzer
	This commit patches the tar writer to generate a PAX header with SCHILY xattr key/value pairs if requested. The Schily format is used for two reasons: - It is simple - It is apparently more widely supported than the libarchive format Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01	Add option to restore xattrs to deserialize_fstree	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01	Add xattr reader implementation to recover xattrs from squashfs	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30	Add propper copyright headers to all source files	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29	Cleanup: move deduplication code from data writer to fstree	David Oberhollenzer
	Since it is actually completely independend of libsqfs and only works on file_info_t lists, it can be safely moved over to libfstree and the data writer becomes less cluttered as a result. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add some nice statistics output to tar2sqfs and gensquashfs	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add general purpose flags field to file_info_t	David Oberhollenzer
	Simplifies some task if we can just add a flag that a file has a framgent or that it has already been detected as a duplicate. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Implement fragment deduplication in data writer	David Oberhollenzer
	The strategy is simple: - The data writer function that write data/fragment blocks get access to the list files. - When writing a fragment, we look for an already written file that has a fragment with the same size and checksum. - If we find one, we throw away the fragment and reuse the existing one. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add fragment and block checksum fields to file_info_t	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Add utility function to compute crc32 check sums	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28	Merge remaining code of fragment reader into data reader	David Oberhollenzer
	After the table read unification, there wasn't much left of the fragment reader and the remains could easily be moved over to the data reader. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Generate linear file list in fstree	David Oberhollenzer
	Instead of doing DFS on the fly in gensquashfs, churn out a linked list of all files in an archive. Future improvements in packing strategies can go into this file. This can also be usefull for other purposes in the future, such as file deduplication or as a work queue for the unpacker. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Add generic read_table function similar to write_table	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	Cleanup sqfs_write_table	David Oberhollenzer
	This commit attempts to make the generic table writer more readable. A few changes are made, including heap allocation of the block list. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25	libutil: add read_data style wrapper around pread()	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	Fix processing of tar mtime on 32 bit systems	David Oberhollenzer
	struct stat uses time_t to store time values. On some 32 bit systems, this may be a 32 bit integer. This patch adds a broken-out 64 bit time value to tar_header_decoded_t and makes sure to clamp the value to +/- (2^32 - 1) if required when writing it back to a struct stat. Reported-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24	Enable largefile support	Matt Turner
	Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and at immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-22	Add a way to optionally keep the original time stamps	David Oberhollenzer
	First of all, this commit adds a mod_time field to a tree node. When creating the tree node, the field is set from the struct stat. When scanning a directory, the time stamps from the input are used if set. Second, the libsqfs code that reads inodes is modified to store the mod_time from the inode in the fstree node and to write the tree node into a generated inode. Finally, tar2sqfs is modified to optionally keep the timestamps from the tar archive instead of setting defaults. gensquashfs is similarly modified to keep the input timestamps if specified. The result is as follows: - sqfs2tar will always carry the timestamps from the squashfs over to the tar ball. - tar2sqfs will set defaults, unless explicitly asked to preserve the mtime from the tar ball. - gensquashfs can optionally preserve the mtime from the input hierarchy it processes if only --pack-dir is specified. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Implement generating an inode table for NFS export	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Cleanup xattr handling	David Oberhollenzer
	- Store them in a struct instead of a hacky uint64_t with magic shifts - Split up key/value pair write function to write_key and write_value - Move the size accounting into those functions respectively Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21	Keep track of xattr key & value references AFTER deduplication	David Oberhollenzer
	This commit adds a reference count functionality to the string table implementation and uses this functionality in the fstree code to count how often each key and value is referenced by the deduplicated Xattr blocks. This is needed to support deduplication through out-of-band storage of xattrs. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-20	Make raw fragment table accessible through fragment/data readers	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	cleanup: move error handling into read_retry	David Oberhollenzer
	If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	cleanup: move error handling into write_retry	David Oberhollenzer
	If write_retry fails to write everything, it is always an error. This commit renames write_retry to write_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16	Fix directory index creation	David Oberhollenzer
	Digging around in kernel internals and mksquashfs reveals that it is actually a buffer offset into the raw directory buffer. The error hasn't been noted until now because of the bug fixed by a5428e0. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-15	Add flags to data writer to micro manage behaviour	David Oberhollenzer
	The added flags allow controlling the following on a per file level: - forcing a file to be written uncompressed - forcing a file to not have a fragment, i.e. the last truncated block actually being written as a block - padding a file to be alligned to device block size The flags are not yet exposed to anything user controllable (such as command line flags). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-10	Add a way to keep meta data blocks in memory	David Oberhollenzer
	Instead of writing meta data blocks directly to disk, the writer can now alternatively keep the blocks in memory until explicitly told to write to disk. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-07	Actually encode/decode directory inode difference as signed	David Oberhollenzer
	The directory listing stores a signed difference of the inode number. Actually treating it as signed saves emitting extra headers if hard links or file deduplication are finally implemented. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-04	Fix: simplify deduction logic for squashfs inode type	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-04	tar2sqfs: repack extended attributes into squashfs filesystem	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-04	libtar: add support for xattr extensions	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-03	cleanup: move tree node from path function to libfstree.a	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-03	tar writer: replace PAX headers with GNU extensions	David Oberhollenzer
	Some experiments seem to indicate that the various GNU extensions are more widely supported than their POSIX equivalents[1]. Possibly because they are easier to implement and possibly because of the wide spread use of GNU tar. This commit replaces the PAX writer in the write_tar_header implementation with a GNU extension based writer. The writer is also cleaned up by removing all global state. The record counter is moved outside into the tar2sqfs program and passed in as function argument. [1] https://dev.gentoo.org/~mgorny/articles/portability-of-tar-features.html Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-01	cleanup: split tar code up, remove some duplications	David Oberhollenzer
	This commit attempts to split some of the monolitic tar parsing code up into multiple functions in seperate files. Also, some code duplication (like reading a record into memory which was implemented twice) is removed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-30	libtar: clarify actual size vs on-disk record size	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-30	Add support for repacking condensed sparse files	David Oberhollenzer
	This commit broadly does the following things: - Rename and move the sparse mapping structure to libutil - Add a function to the data writer for writing condensed versions of sparse files, given the mapping. - This shares code with the already existing function for regular files. The shared code is moved to a common helper function. - Add support to tar2sqfs for repacking sparse files. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-30	tar reader: also store condensed size of sparse files	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-29	Add support for reading old style GNU sparse tar file format	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-28	Add support for unpacking sparse files as sparse files	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-28	Add support for packing sparse files	David Oberhollenzer
	This commit adds support for packing sparse files into squashfs images as follows: - In the data writer: simply detect zero blocks and write a zero to the block size field and don't emit any data. Record the number of bytes saved this way. For fragments, set the fragment offset to invalid. - In the inode writer: write out the number of bytes saved for sparse files. If there should be a fragment but there is none, append a block count of 0. - In the data reader: if the block size is 0, read nothing from disk and emit an empty block. Do the same if the fragment is missing. - In the inode reader: restore the number of bytes saved for sparse files. The sparse files can be packed and unpacked, but the unpacking will not create sparse files for now. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-28	Add basic support for the GNU tar format	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-27	Relax tar header parser to accept pre-posix formats	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-24	Add missing stdint.h header to tar.h	David Oberhollenzer
	Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-23	Move fstree default option processing to fstree code	David Oberhollenzer
	Instead of decomposing a default string in gensquashfs option processing, move that to fstree_init instead and pass the option string directly to fstree_init. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-23	Move all handling of compressor names to libcompress.a	David Oberhollenzer
	This commit removes handling of compressor names from gensquashfs. Instead, functions are added to libcompress to obtain name from ID, ID from name and to print out defaults. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-22	Cleanup: unify packdir/packfile based directory changes in gensquashfs	David Oberhollenzer
	This commit removes the packdir/packfile based directory setup magic from fstree_from_file and moves it to gensquashfs. Over there, the common parts are deduplicated. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-06-22	Cleanup: split fstree sort into 2 fstree independend functions	David Oberhollenzer
	Make tree node list sort and recursive variant available and independend of the fstree_t. This is considered cleaner, since the fstree_t actually isn't needed for any of this and we can just call the recusvie sort on the root instead, and we can use the sort implementation directly for things like the upcoming unit test. Also this commit splits up the merge/sort implementation into a seperate split and merge functions to make the code somewhat more readable. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>