aboutsummaryrefslogtreecommitdiff
path: root/lib/tar
AgeCommit message (Collapse)Author
2021-06-25Remove casual un-const casting in various placesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-09-16Remodel libtar/tar2sqfs to read data from an istream_tDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-09-16Remodel file extraction tools to use libfstreamDavid Oberhollenzer
This commit rewrites the libtar write paths to use libfstream insead of a FILE pointer. Also, the libcommon file extraction function is remodeled to use libfstream. In accordance, rdsquashfs, sqfs2tar and sqfsdiff have some minor adjustments made to work with the ported libtar and libcommon. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-09-03Fix integer bounds checking in GNU tar sparse format 1.0 parserDavid Oberhollenzer
- Make sure the file actually has that many records before trying to read one and fail if not. - Use the helper macros for size_t overflow checking instead of assuming size_t == uint64_t. - Impose a "reasonable" upper bound on the number of data segments and insist that there is at least one entry. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-09-02Fix nonexistant gnu tar sparse format 1.0 supportDavid Oberhollenzer
Contrary to previous claims, support for the GNU tar sparse format 1.0 was missing entirely (the newest of their 3 different sparse mapping formats). This oversight wasn't caught, because the unit test was compiling the wrong source file and tar2sqfs had no problem processing the test file because it is still a valid POSIX-ish tar archive (but the sparse part was missing and the mapping embedded in the file). Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-08-16Fix libtar treatment of link targets that fill the header fieldDavid Oberhollenzer
The tar header has a 100 byte field for symlink and hard link targets. If the target is longer than 100 bytes, an extension header has to be used. However, it is perfectly valid to fill all 100 bytes to the brim without adding a null terminator. In case of a symlink, this can result in garbage link targets, while for hard links it results in an immediate error since the target cannot be resolved later on. This commit attempts to fix the problem by replacing the strdup of the link target with an strndup that copies at most the size of the target header field. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-30Cleanup: sqfs2tar: break up and simplify the repacking codeDavid Oberhollenzer
- Move the xattr extraction and repacking to xattr.c - Don't on-the-fly delete the tar xattr list, use the function from libtar.a - Split minor tasks into static helper functions - creating a libtar xattr struct from libsqfs xattr data - finding a hard link entry from current path and inode number Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-05-18libtar: fix size computation of PAX line lengthDavid Oberhollenzer
This commit attempts to fix the following two problems: - The number of digits computation returning an off-by-one result if the number is 10, or the resulting digit string starts with "10". This results in one-too-many padding bytes, corrupting the rest of the archive since the headers now don't start at multiples of 512 anymore. - Adding the line length prefix affects the line length (duh). If it grows far enough to require more digits, the result is a similar problem. This is a converging series that we need to compute the limit of. Unit tests for this still need to be added. Or maybe I can convince a bored undergrad student to provide an induction proof. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-22Skip PAX global headersDavid Oberhollenzer
Tar archives can contain set two kinds of PAX headers: - local headers that modify the attributes of the next file - global headers that set defaults for all files The later is used "... not widely used", according to tar(5) and has been deliberately not implemented. Some programs (e.g. git-archive) *do* generate them (in the case of git, it stores the commit hash). This commit adds a code path that skips a PAX global header entirely and resumes tar parsing, instead of erroneusly reporting it as an entry. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-17Remove some configure time sizeof checksDavid Oberhollenzer
In libtar, the sizeof time_t checked when trying to store a time value. It is pointless using the preprocessor here, as we can simply do an if (sizeof(time_t) < ...) check and the compiler will take care optimizing away one or the other branch. After changing the libtar check and the corresponding unit tests, the sizeof check can be removed from configure.ac, along with other unused sizeof checks. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-04-17Cleanup: split read_header.c in libtar.aDavid Oberhollenzer
Simply moving the pax header decoding to a separate file and splitting out the common helper functions should be a good start. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2020-02-28Cleanup pax header parser a littleDavid Oberhollenzer
This commit tries to untangle the logic of parsing and sanitizing the pax header length field and the associated bounds checks. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-12-23Add libtar.a function to create hardlink recordsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-12-22Add hard link support to gensquashfs and tar2sqfsDavid Oberhollenzer
In libtar, set a special flag if the header is actually a hard link. In tar2sqfs, create a hard link node and skip the rest for hard links. Also refues to set the root attributes from a hard link, it may refere to a node that we have missed earlier, there is nothing else that we can do here. In fstree_from_file, add a "link" command for adding hard links. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-12-15Fix tar GNU sparse header parsingDavid Oberhollenzer
Extract the filename correctly. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-12-13Better support for reading/writing non-ASCII xattr values from/to tarDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-12-13Make the PAX header parser more strictDavid Oberhollenzer
Actually parse the length field and insist it matches the line length, then use the length field to advance to the next line. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-25Cleanup: remove what is left of libutilDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-23Move some unix header inclusions to compat.hDavid Oberhollenzer
In most cases, including unistd.h and fcntl.h was a left over anyway. In the cases where it was not, move it to compat.h. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-06Fix: use size_t printf macro in tar PAX header writerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-06Remove raw file descriptors from tar read pathDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-11-06Remove raw file descriptors from unpack write pathsDavid Oberhollenzer
Instead, use stdio FILE pointers. On POSIX systems, use fileno to get the file descriptor and hopefully create sparase files. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move libutil related headers to "util" sub directoryDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move write_data to libtarDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: move read_data function to libtarDavid Oberhollenzer
Its the only user. The other code doesn't touch raw file descriptors anymore. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-10-07Cleanup: Move padd_file function to libtarDavid Oberhollenzer
It's only ever used for padding tarballs. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-27Add a header for platform compatibillity fluffDavid Oberhollenzer
- We don't have "endian.h" everywhere. On some BSDs its in sys and on some BSDs the macros have different names. - We definitely don't have sysmacros.h on non-Unix-like systems. - Likewise for sys/types.h, sys/stat.h and their contents. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-27Cleanup: replace fixed with data types with typedefsDavid Oberhollenzer
This is a fully automated search and replace, i.e. I ran this: git grep -l uint8_t | xargs sed -i 's/uint8_t/sqfs_u8/g' git grep -l uint16_t | xargs sed -i 's/uint16_t/sqfs_u16/g' git grep -l uint32_t | xargs sed -i 's/uint32_t/sqfs_u32/g' git grep -l uint64_t | xargs sed -i 's/uint64_t/sqfs_u64/g' git grep -l int8_t | xargs sed -i 's/int8_t/sqfs_s8/g' git grep -l int16_t | xargs sed -i 's/int16_t/sqfs_s16/g' git grep -l int32_t | xargs sed -i 's/int32_t/sqfs_s32/g' git grep -l int64_t | xargs sed -i 's/int64_t/sqfs_s64/g' and than added the appropriate definitions to sqfs/predef.h The whole point being better compatibillity with platforms that may not have an stdint.h with the propper definitions. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-25Remove condensed sparse file handling from libsquashfsDavid Oberhollenzer
This only exists for tar2sqfs. Move the sparse file map to libtar and add the ability to do this into the stind sqfs_file_t abstraction, so it acts like a normal file but internally stitches the data together from the sparse implementation. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-15Move sparse_map_t to libsquashfs headers, rename it to sqfs_sparse_map_tDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-09-11Cleanup Automake file for librariesDavid Oberhollenzer
Split the signel file up into several small ones and use a variable for the public headers instead of duplicating them. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-03Fix tar header error reporting on 32 bit systemsDavid Oberhollenzer
If an extension header is rejected because its too big, the error path would print the size as size_t, altough it is an uint64_t. On 64 bit systems, this works because size_t is a 64 bit unsigned integer, on 32 bit systems, not so much. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-08-01Add ability to write_tar_header to embedd extended attributesDavid Oberhollenzer
This commit patches the tar writer to generate a PAX header with SCHILY xattr key/value pairs if requested. The Schily format is used for two reasons: - It is simple - It is apparently more widely supported than the libarchive format Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-30Add propper copyright headers to all source filesDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25Use safer string copy function to fill tar headerDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-25Enforce reasonable upper and low bounds on the size of tar headersDavid Oberhollenzer
Since they are read directly into memory, blindly allocating the size from the tar ball is probably a bad idea. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24Fix processing of tar mtime on 32 bit systemsDavid Oberhollenzer
struct stat uses time_t to store time values. On some 32 bit systems, this may be a 32 bit integer. This patch adds a broken-out 64 bit time value to tar_header_decoded_t and makes sure to clamp the value to +/- (2^32 - 1) if required when writing it back to a struct stat. Reported-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24cleanup: remove atime/ctime processing codeDavid Oberhollenzer
This commit removes all the code for parsing and processing atime/ctime and values and related test code. Caring about those is kind of pointless because squashfs can only store mtime in inodes. The only relevant place is when generating a struct stat from a squashfs inode or an fstree node. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24Enable largefile supportMatt Turner
Requires that config.h be included before other headers, since the macro _FILE_OFFSET_BITS changes the definitions of things like 'struct stat'. I chose to simply include it at the top of every C file and at immediately after the double-inclusion guards of every header. Signed-off-by: Matt Turner <mattst88@gmail.com> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-24libtar: more lenient tar checksum verificationDavid Oberhollenzer
Until now, the tar checksum verification simply copied the header, recomputed the checksum and compared it byte-for-byte with the original. However, not all implementations store the checksum the same way. For instance, git-archive generates tar balls that use the same format as for other octal numbers. This patch makes the checksum verification more lenient by parsing the checksum from the header and comparing it with the computed value instead of copying the entire block and insisting on byte-for-byte equivalence. The result is better interoperabillity with existing tools and perhaps slightly faster processing since the block doesn't have to be copied. Reported-by: Matt Turner <mattst88@gmail.com> Suggested-by: René Scharfe <l.s.r@web.de> Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16cleanup: move error handling into read_retryDavid Oberhollenzer
If read_retry fails to read the expected amount of data (EOF or otherwise), it is almost always an error. This commit renames read_retry to read_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-16cleanup: move error handling into write_retryDavid Oberhollenzer
If write_retry fails to write everything, it is *always* an error. This commit renames write_retry to write_data and moves error handling into the function, making a lot of error handling code redundant. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-04libtar: add support for xattr extensionsDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-04Fix composition order of prefix + name for ustarDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-03Add no-skip option to sqfs2tarDavid Oberhollenzer
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-03tar writer: replace PAX headers with GNU extensionsDavid Oberhollenzer
Some experiments seem to indicate that the various GNU extensions are more widely supported than their POSIX equivalents[1]. Possibly because they are easier to implement and possibly because of the wide spread use of GNU tar. This commit replaces the PAX writer in the write_tar_header implementation with a GNU extension based writer. The writer is also cleaned up by removing all global state. The record counter is moved outside into the tar2sqfs program and passed in as function argument. [1] https://dev.gentoo.org/~mgorny/articles/portability-of-tar-features.html Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-03Fix: don't blindly strcpy() the tar header nameDavid Oberhollenzer
If the tar header name is exactely 100 bytes long, it does not have a trailing null byte. Using strlen/strcpy on it effectively concatenates the following fields contents to it. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-01cleanup: split tar code up, remove some duplicationsDavid Oberhollenzer
This commit attempts to split some of the monolitic tar parsing code up into multiple functions in seperate files. Also, some code duplication (like reading a record into memory which was implemented twice) is removed. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-01Remove unused variable assignment from tar header writerDavid Oberhollenzer
Mainly to make scan-build happy. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
2019-07-01Fix double free in GNU tar sparse file parserDavid Oberhollenzer
The intention was to free the temporary object in addition to the list. Since the temp pointer is also used for iterating the list, this resulted in a double free instead of the intended. This commit fixes the double free by replacing the recycled variable based list free with the helper function intended to do this. Bug found using scan-build. Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>