path: root/lib/sqfs
2019-08-18  cleanup: internalize deduplication list in data_writer  (David Oberhollenzer)
This change removes the need for passing a list of files around for deduplication. Also the deduplication code no longer needs to worry about order, since the file being deduplicated is only added after deduplication is done. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-16  Fix: don't try to read xattrs if there are none  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07  Add pread(2)-like function to data_reader  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07  Fix forward seek when unpacking sparse files  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-07  Fix zero padding of extracted data blocks  (David Oberhollenzer)
Only pad it if the *extracted* size is less than the block size. Doing it based on the compressed size results in garbled blocks, especially because most blocks are smaller than the block size when compressed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
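As an illustration only, a small C sketch of the corrected rule; the buffer and size names are hypothetical, not the library's API:

    #include <stddef.h>
    #include <string.h>

    /* Zero-fill the tail of an extracted block, but only when the
     * uncompressed payload is shorter than the block size. Keying this
     * on the compressed size would pad (and garble) nearly every block. */
    static void pad_extracted_block(unsigned char *block, size_t extracted_size,
                                    size_t block_size)
    {
        if (extracted_size < block_size)
            memset(block + extracted_size, 0, block_size - extracted_size);
    }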
2019-08-05  cleanup data reader  (David Oberhollenzer)
- Split block reading code out from "dump_blocks" into precache_data_block, similar to precache_fragment_block
- Merge the code paths for fragment/data block reading and uncompression
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-05  cleanup: unify all the code that reads squashfs images  (David Oberhollenzer)
This commit creates a new data structure called 'sqfs_reader_t' that takes care of all the repetitive tasks like opening the file, reading the super block, creating the compressor, deserializing an fstree and creating a data reader. This in turn makes it possible to remove all the duplicate code from rdsquashfs and sqfs2tar. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02  Fix explicit NULL dereference in deserialize_fstree failure path  (David Oberhollenzer)
If we failed to create the root node, we don't need to clean up the fstree_t, which would attempt to recursively clean up the root node. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02  cleanup: merge error paths in xattr reader restore_kv_pairs  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-02  Fix potential double free of xattr reader id_block_starts  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01  Add option to restore xattrs to deserialize_fstree  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01  Add xattr reader implementation to recover xattrs from squashfs  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01  Fix xattr writer size accounting  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01  Fix super block flags: clear "no xattr" flag when writing xattrs  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01  Fix xattr OOL position  (David Oberhollenzer)
We need to get the position _before_ writing the header, otherwise the reader has no way to know the length of the value. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30  Add proper copyright headers to all source files  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29  Fix order of data block deduplication  (David Oberhollenzer)
Data blocks need to be deduplicated before attempting to write a fragment. In the current implementation, if the data blocks are found to be duplicates but the fragment isn't, the flushed fragments are purged as well, possibly damaging other files. Also, when the deduplication happens, the HAS_FRAGMENT flag needs to be set, otherwise the deduplication code thinks that there is one more block than there actually is. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-29  Cleanup: move deduplication code from data writer to fstree  (David Oberhollenzer)
Since it is actually completely independent of libsqfs and only works on file_info_t lists, it can be safely moved over to libfstree, and the data writer becomes less cluttered as a result. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Fix duplicate file accounting  (David Oberhollenzer)
A file is a complete duplicate if:
- It has no blocks, only a single fragment, and that is a duplicate
- It has blocks but no fragment, and the blocks are duplicates
- It has blocks and a fragment, and both are duplicates
The previous version only counted the last case. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
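A minimal C sketch of that completeness check, using a hypothetical file record; the field names are made up and the real file_info_t layout may differ:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-in for the relevant parts of file_info_t. */
    struct file_record {
        size_t block_count;      /* number of full data blocks */
        bool has_fragment;       /* file ends in a fragment */
        bool blocks_duplicate;   /* all blocks matched an earlier file */
        bool fragment_duplicate; /* fragment matched an earlier fragment */
    };

    /* A file counts as a complete duplicate in all three cases listed
     * above, not just the last one. */
    static bool is_complete_duplicate(const struct file_record *f)
    {
        if (f->block_count == 0)
            return f->has_fragment && f->fragment_duplicate;

        if (!f->has_fragment)
            return f->blocks_duplicate;

        return f->blocks_duplicate && f->fragment_duplicate;
    }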
2019-07-28  Fix used bytes accounting when deduplicating file blocks  (David Oberhollenzer)
If an entire file is eliminated, we need to reset the "used_bytes" counter, otherwise, ALL the table positions are way off. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Fix free() of stack pointer in id_table_read error path  (David Oberhollenzer)
We didn't allocate the ID table, so we don't need to free() it when reading from disk fails. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Fix: return the correct value from data_reader_create  (David Oberhollenzer)
Cut & paste mishap after merging with the fragment reader: if there are no fragments, data_reader_create should return the data reader, not 0! Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Add some nice statistics output to tar2sqfs and gensquashfs  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Add general purpose flags field to file_info_t  (David Oberhollenzer)
Simplifies some tasks if we can just set a flag indicating that a file has a fragment or that it has already been detected as a duplicate. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Implement data block deduplication  (David Oberhollenzer)
The strategy is as follows:
- At the beginning of every file, remember the current position.
- Once a file is done, scan the list of existing files for the following:
  - Look for an existing file that has a block with the same size and checksum as the first non-sparse block of the current file.
  - After that, every block in the current file has to match in size and checksum the ones in the file that we found, from that point onward.
  - Sparse blocks in either file are skipped.
- If we found a match, we update the current file to point to the first matching block and rewind the squashfs image to remove the newly written data.
This strategy should in theory be able to find an existing file where the on-disk data *contains* the on-disk data of the current file. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
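A rough, self-contained C sketch of the core matching scan (size plus checksum over a run of blocks); the structures are hypothetical and the sparse-block skipping mentioned above is omitted for brevity:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified per-block record: on-disk size and checksum. */
    struct block_desc {
        uint32_t size;
        uint32_t checksum;
    };

    struct packed_file {
        struct block_desc *blocks;
        size_t num_blocks;
    };

    /* Return the index inside 'other' where a run of blocks matches every
     * block of 'file' in both size and checksum, or -1 if there is none.
     * On a hit, the caller can point 'file' at the existing run and rewind
     * the image to drop the newly written data. */
    static long find_matching_run(const struct packed_file *other,
                                  const struct packed_file *file)
    {
        size_t i, j;

        if (file->num_blocks == 0 || other->num_blocks < file->num_blocks)
            return -1;

        for (i = 0; i <= other->num_blocks - file->num_blocks; ++i) {
            for (j = 0; j < file->num_blocks; ++j) {
                if (other->blocks[i + j].size != file->blocks[j].size ||
                    other->blocks[i + j].checksum != file->blocks[j].checksum)
                    break;
            }

            if (j == file->num_blocks)
                return (long)i;
        }

        return -1;
    }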
2019-07-28  Implement fragment deduplication in data writer  (David Oberhollenzer)
The strategy is simple:
- The data writer functions that write data/fragment blocks get access to the list of files.
- When writing a fragment, we look for an already written file that has a fragment with the same size and checksum.
- If we find one, we throw away the fragment and reuse the existing one.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
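A hedged C sketch of that lookup, with made-up bookkeeping fields rather than the actual file_info_t members:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical per-file fragment bookkeeping. */
    struct frag_info {
        uint32_t fragment_index;   /* fragment block the tail end was packed into */
        uint32_t fragment_offset;  /* byte offset inside that fragment block */
        uint32_t fragment_size;    /* size of the tail end in bytes */
        uint32_t fragment_chksum;  /* checksum of the tail end */
    };

    /* Scan already written files for a fragment with the same size and
     * checksum; if one is found, the new fragment is thrown away and the
     * current file is pointed at the existing one. */
    static const struct frag_info *
    find_duplicate_fragment(const struct frag_info *files, size_t count,
                            uint32_t size, uint32_t chksum)
    {
        size_t i;

        for (i = 0; i < count; ++i) {
            if (files[i].fragment_size == size &&
                files[i].fragment_chksum == chksum)
                return &files[i];
        }

        return NULL;
    }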
2019-07-28  Unify common file start/end code from data writer in helper functions  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Compute per-block and per-fragment checksums in data writer  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Add fragment and block checksum fields to file_info_t  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Merge remaining code of fragment reader into data reader  (David Oberhollenzer)
After the table read unification, there wasn't much left of the fragment reader and the remains could easily be moved over to the data reader. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-28  Split data_reader_dump_file into smaller functions  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Fix potential resource leak in deserialize_tree  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Fix checks of super block block size  (David Oberhollenzer)
Make sure the range is checked when reading a block and that the check is done correctly. Also make the block log check a little more strict. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
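For illustration, a sketch of the kind of validation this implies; the 4 KiB to 1 MiB power-of-two range and the block_log relation come from the squashfs on-disk format, but the exact checks in the code may differ:

    #include <stdbool.h>
    #include <stdint.h>

    /* SquashFS block sizes are powers of two between 4 KiB and 1 MiB, and
     * the super block's block_log must be the matching base-2 logarithm. */
    static bool block_size_valid(uint32_t block_size, uint16_t block_log)
    {
        if (block_log < 12 || block_log > 20)
            return false;

        if (block_size & (block_size - 1))
            return false;    /* not a power of two */

        return block_size == (1u << block_log);
    }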
2019-07-25  Fix accidental usage of leftover local variable instead of struct member  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Add generic read_table function similar to write_table  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Cleanup sqfs_write_table  (David Oberhollenzer)
This commit attempts to make the generic table writer more readable. A few changes are made, including heap allocation of the block list. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Rename table.c to write_table.c to match the function it contains  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Fix fragment reader out of bounds read when loading table  (David Oberhollenzer)
This commit fixes a bug in the fragment table reader where the reader tries to read data into an out of bounds location due to an oversight in size calculation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25  Replace reads in squashfs with positional reads  (David Oberhollenzer)
In most cases, we know exactly where the data that we want to read is on disk, so instead of using read() on the squashfs image (or lseek + read), the code can in many places be cleaned up to use the pread wrapper read_data_at. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
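A minimal sketch of what such a positional read helper can look like; the real read_data_at may have a different signature and error reporting:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read exactly 'size' bytes starting at 'offset', retrying short reads
     * and EINTR, and treating end-of-file as an error. pread(2) never moves
     * the file offset, so no lseek bookkeeping is needed. */
    static int read_exactly_at(int fd, off_t offset, void *buffer, size_t size)
    {
        char *ptr = buffer;

        while (size > 0) {
            ssize_t ret = pread(fd, ptr, size, offset);

            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                perror("pread");
                return -1;
            }

            if (ret == 0) {
                fputs("unexpected end of file\n", stderr);
                return -1;
            }

            ptr += ret;
            offset += ret;
            size -= (size_t)ret;
        }

        return 0;
    }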
2019-07-24  cleanup: remove atime/ctime processing code  (David Oberhollenzer)
This commit removes all the code for parsing and processing atime/ctime values, along with the related test code. Caring about those is kind of pointless because squashfs can only store mtime in inodes. The only relevant place is when generating a struct stat from a squashfs inode or an fstree node. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24  Enable largefile support  (Matt Turner)
Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
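A sketch of the resulting include ordering convention; config.h here is the build-system-generated header, and the guard name EXAMPLE_H is hypothetical:

    /* In a .c file: config.h first, so _FILE_OFFSET_BITS=64 is already set
     * when the system headers define off_t, struct stat, lseek, ... */
    #include "config.h"

    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* In a header: immediately after the double-inclusion guard. */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H

    #include "config.h"

    /* ... declarations that use off_t or struct stat ... */

    #endif /* EXAMPLE_H */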
2019-07-23  Fix tree node scanning  (David Oberhollenzer)
- Bail out early on empty directories without touching the meta readers.
- Abort the directory read loop if we can't even read a header anymore, no matter how many bytes are claimed to remain.
- Also add that same condition to the inner loop.
The latter two issues actually caused a numeric overflow on some particularly malformed squashfs images, resulting in a RAM-filling infinite loop. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-22  Add a way to optionally keep the original time stamps  (David Oberhollenzer)
First of all, this commit adds a mod_time field to a tree node. When creating the tree node, the field is set from the struct stat. When scanning a directory, the time stamps from the input are used if set. Second, the libsqfs code that reads inodes is modified to store the mod_time from the inode in the fstree node and to write the tree node's mod_time into a generated inode. Finally, tar2sqfs is modified to optionally keep the timestamps from the tar archive instead of setting defaults. gensquashfs is similarly modified to keep the input timestamps if specified. The result is as follows:
- sqfs2tar will always carry the timestamps from the squashfs over to the tar ball.
- tar2sqfs will set defaults, unless explicitly asked to preserve the mtime from the tar ball.
- gensquashfs can optionally preserve the mtime from the input hierarchy it processes if only --pack-dir is specified.
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21  Fix indexing into export table  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21  Implement generating an inode table for NFS export  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21  Add support for storing xattr values out-of-line  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-21  Cleanup xattr handling  (David Oberhollenzer)
- Store them in a struct instead of a hacky uint64_t with magic shifts
- Split up the key/value pair write function into write_key and write_value
- Move the size accounting into those functions respectively
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
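Purely as an illustration of the first point, a sketch of replacing a packed 64-bit value with an explicit struct; squashfs metadata references conventionally keep the byte offset in the low 16 bits, but whether this mirrors the writer's internal bookkeeping is an assumption:

    #include <stdint.h>

    /* Explicit struct instead of a uint64_t with magic shifts. */
    typedef struct {
        uint64_t block_start;   /* metadata block the entry starts in */
        uint16_t offset;        /* byte offset inside that block */
    } meta_ref_t;

    static uint64_t meta_ref_pack(const meta_ref_t *ref)
    {
        return (ref->block_start << 16) | ref->offset;
    }

    static meta_ref_t meta_ref_unpack(uint64_t packed)
    {
        meta_ref_t ref;

        ref.block_start = packed >> 16;
        ref.offset = (uint16_t)(packed & 0xffff);
        return ref;
    }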
2019-07-21  cleanup: remove left over, unused assignment  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-20  Make raw fragment table accessible through fragment/data readers  (David Oberhollenzer)
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16  cleanup: move error handling into read_retry  (David Oberhollenzer)
If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
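A hedged sketch of the resulting helper shape, not the library's actual read_data signature: the retry loop reports short reads and EOF itself, so call sites no longer need their own error handling:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Retrying read that treats EOF or failure as an error and prints the
     * message itself; 'errstr' names the thing being read for diagnostics. */
    static int read_data_sketch(const char *errstr, int fd, void *buffer,
                                size_t size)
    {
        char *ptr = buffer;

        while (size > 0) {
            ssize_t ret = read(fd, ptr, size);

            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                perror(errstr);
                return -1;
            }

            if (ret == 0) {
                fprintf(stderr, "%s: unexpected end of file\n", errstr);
                return -1;
            }

            ptr += ret;
            size -= (size_t)ret;
        }

        return 0;
    }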