summaryrefslogtreecommitdiff
path: root/doc
diff options
context:
space:
mode:
authorZachary Dremann <dremann@gmail.com>2021-08-01 11:50:38 -0400
committerDavid Oberhollenzer <david.oberhollenzer@sigma-star.at>2021-08-12 18:24:12 +0200
commit23b81d2554973fed261f51bfe878a9d983caf53d (patch)
tree29df2ac99aa8ec38dd785334c411414f05cb4b4f /doc
parent915808aaf0660b03fc2e3fa5e95a2f2e2aaa6daf (diff)
Replace format.txt with an asciidoc version
Diffstat (limited to 'doc')
-rw-r--r--doc/format.adoc1067
-rw-r--r--doc/format.txt1216
2 files changed, 1067 insertions, 1216 deletions
diff --git a/doc/format.adoc b/doc/format.adoc
new file mode 100644
index 0000000..f81b6f8
--- /dev/null
+++ b/doc/format.adoc
@@ -0,0 +1,1067 @@
+= Squashfs Binary Format
+:toc: left
+:toclevels: 4
+:sectnums:
+
+== About
+
+SquashFS is a compressed, read-only filesystem for Linux that can also be used
+as a flexible, general purpose, compressed archive format, optimized for fast
+random access with support for Unix permissions, sparse files and extended
+attributes.
+
+SquashFS supports data and metadata compression through zlib, lz4, lzo, lzma,
+xz or zstd.
+
+For fast random access, compressed files are split up in fixed size blocks
+that are compressed separately.
+The block size can be set between 4k and 1M (default for squashfs-tools and
+squashfs-tools-ng is 128K).
+
+This document attempts to specify the on-disk format in detail.
+It is based on a previous on-line version that was originally written by
+Zachary Dremann and subsequently expanded by David Oberhollenzer during
+reverse engineering attempts and available here: https://dr-emann.github.io/squashfs/.
+
+== Overview
+
+SquashFS always stores integers in little endian format.
+The data blocks that make up the SquashFS archive are byte aligned,
+i.e. they typically do not care for alignment.
+The implementation in the Linux kernel requires the archive itself to
+be a multiple of either 1k or 4k in size (called the device block size)
+and user space tools typically use 4k to be compatible with both.
+
+A SquashFS archive consists of a maximum of nine parts:
+
+[%nowrap]
+----
+ _______________
+| | Important information about the archive, including
+| Superblock | locations of other sections.
+|_______________|
+| | If non-default compression options have been used,
+| Compression | they can optionally be stored here, to facilitate
+| options | later, offline editing of the archive.
+|_______________|
+| |
+| Data blocks | The contents of the files in the archive,
+| & fragments | split into separately compressed blocks.
+|_______________|
+| | Metadata (ownership, permissions, etc) for
+| Inode table | items in the archive.
+|_______________|
+| |
+| Directory | Directory listings, including file names, and
+| table | references to inodes.
+|_______________|
+| |
+| Fragment | Description of fragment locations within the
+| table | Datablocks & Fragments section.
+|_______________|
+| | A mapping from inode numbers to disk locations,
+| Export table | required for NFS export.
+|_______________|
+| |
+| UID/GID | A list of unique UID/GIDs. Inodes use an index into
+| lookup table | this table to save memory.
+|_______________|
+| |
+| Xattr | Extended attributes for items in the archive.
+| table |
+|_______________|
+----
+
+Although the super block details the exact positions of each section, most
+implementations, including the one in the Linux kernel, insist on this exact
+order.
+
+=== Packing File Data
+
+The file data is packed into the archive after the super block (and optional
+compressor options).
+
+Files are divided into fixed size blocks that are separately compressed and
+stored in order. SquashFS supports optional tail-end-packing of files that
+are not an exact multiple of the block size. The remaining ends can either
+be treated as a short block, or can be packed together with the tail ends of
+other files in a single "fragment block". Files that are less than block size
+are treated the same way.
+
+If the size of a data or fragment block would exceed the input size after
+compression, the original, uncompressed data is stored, so that the size of a
+block after compression never exceeds the input block size.
+
+=== Packing Metadata
+
+Metadata (e.g. inodes, directory listings, etc...) is treated as a continuous
+stream of records that is chopped up into 8KiB blocks that are separately
+compressed into special metadata blocks.
+
+The input size of 8KiB is fixed and independent of the data block size.
+Similar to data blocks, if the compressed size would exceed 8KiB, the
+uncompressed block is stored instead, so the on-disk size of a metadata
+block never exceeds 8KiB.
+
+Individual entries are allowed to cross the block boundary, so e.g. an inode
+may be located at the end of a metadata block with some part of it located at
+the start of the next block. Both have to be read and decompressed when
+reading this inode. If an entry is written across block boundaries, there
+*MUST NOT* be any gap between the compressed metadata blocks on-disk.
+
+
+In contrast to data blocks, every metadata block is preceded by a single,
+16 bit unsigned integer. This integer holds the on-disk size of the block
+that follows. The MSB is set if the block is stored uncompressed. Whenever
+a metadata block is referenced, the position of this integer is given.
+
+To read a metadata block, seek to the indicated position and read the 16 bit
+header. Sanity check that the lower 15 bit are less than 8KiB and proceed
+to read that many bytes. If the highest bit of the header is cleared,
+uncompress the data into an 8KiB buffer that *MUST NOT* overflow.
+
+
+In the SquashFS archive format, metadata entries (e.g. inodes) are often
+referenced using a 64 bit integer. The lower 16 bit hold an offset into the
+uncompressed block and the upper 48 bit point to the on-disk location of the
+block.
+
+The on-disk location is relative to the type of metadata entry, e.g. for
+inodes it is relative to the start of the inode table given by the
+super block.
+
+=== Storing Lookup Tables
+
+Lookup tables are arrays (i.e. sequences of identical sized records) that are
+addressed by an index.
+
+Such tables are stored in the SquashFS format as metadata blocks, i.e. by
+dividing the table data into 8KiB chunks that are separately compressed and
+stored in sequence.
+
+To allow constant time lookup, a list of 64 bit unsigned integers is stored,
+holding the on-disk locations of each metadata block.
+
+This list itself is stored uncompressed and not preceded by a header.
+
+When referring to a lookup table, the superblock gives the number of table
+entries and points to this location list.
+
+Since the table entry size is a known, fixed value, the required number of
+metadata blocks can be computed:
+
+ block_count = ceil(table_count * entry_size / 8192)
+
+Which is also the number of 64 bit integers in the location list.
+
+When resolving a lookup table index, first work out the index of the
+metadata block:
+
+ meta_index = floor(index * entry_size / 8192)
+
+Using this index on the location list yields the on-disk location of
+the metadata block containing the entry.
+
+After reading this metadata block, the byte offset into the block can
+be computed to get the entry:
+
+ offset = index * entry_size % 8192
+
+The location list can be cached in memory. Resolving an index requires at
+worst a single metadata block read (at most 8194 bytes fetched from an
+unaligned on-disk location).
+
+=== Supported Compressors
+
+The SquashFS format supports the following compressors:
+
+* zlib deflate (referred to as "gzip" but only uses raw zlib streams)
+* lzo
+* lzma 1 (considered deprecated)
+* lzma 2 (referred to as "xz")
+* lz4
+* zstd
+
+The archive can only specify one compressor in the super block and has to use
+it for both file data and metadata compression. Using one compressor for data
+and switching to a different compressor for e.g. inodes is not supported.
+
+While it is technically not possible to pick a "null" compressor in the super
+block, an implementation can still deliberately write only uncompressed blocks
+to a SquashFS archive, or choose to store certain metadata blocks without
+compression.
+
+The lzma 2 aka xz compressor *MUST* use `CRC32` checksums only. Using `SHA-256` is
+not supported.
+
+== The Superblock
+
+The superblock is the first section of a SquashFS archive. It is always
+96 bytes in size and contains important information about the archive,
+including the locations of other sections.
+
+[cols="1,3,13a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | magic | Must be set to `0x73717368` ("hsqs" on disk).
+| u32 | inode count | The number of inodes stored in the archive.
+| u32 | mod time | Last modification time of the archive. Count seconds
+ since 00:00, Jan 1st 1970 UTC (not counting leap
+ seconds). This is unsigned, so it expires in the
+ year 2106 (as opposed to 2038).
+| u32 | block size | The size of a data block in bytes. Must be a power
+ of two between 4096 (4k) and 1048576 (1 MiB).
+| u32 | frag count | The number of entries in the fragment table.
+| u16 | compressor | An ID designating the compressor used for both data
+ and meta data blocks.
+
+[cols=">1,2,8",frame="none",grid="none",options="header"]
+!===
+! Value ! Name ! Comment
+! 1 ! GZIP ! just zlib streams (no gzip headers\!)
+! 2 ! LZO !
+! 3 ! LZMA ! LZMA version 1
+! 4 ! XZ ! LZMA version 2 as used by xz-utils
+! 5 ! LZ4 !
+! 6 ! ZSTD !
+!===
+
+| u16 | block log | The log~2~ of the block size. If the two fields do not
+ agree, the archive is considered corrupted.
+| u16 | flags | Bit wise *OR* of the flag bits below.
+
+
+[cols=">1m,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Meaning
+! 0x0001 ! Inodes are stored uncompressed.
+! 0x0002 ! Data blocks are stored uncompressed.
+! 0x0004 ! Unused, should always be unset.
+! 0x0008 ! Fragments are stored uncompressed.
+! 0x0010 ! Fragments are not used.
+! 0x0020 ! Fragments are always generated.
+! 0x0040 ! Data has been deduplicated.
+! 0x0080 ! NFS export table exists.
+! 0x0100 ! Xattrs are stored uncompressed.
+! 0x0200 ! There are no Xattrs in the archive.
+! 0x0400 ! Compressor options are present.
+! 0x0800 ! The ID table is uncompressed.
+!===
+
+| u16 | id count | The number of entries in the ID lookup table.
+| u16 | version major | Major version of the format. Must be set to 4.
+| u16 | version minor | Minor version of the format. Must be set to 0.
+| u64 | root inode | A reference to the inode of the root directory.
+| u64 | bytes used | The number of bytes used by the archive. Because
+ SquashFS archives must be padded to a multiple of the underlying
+ device block size, this can be less than the actual file size.
+| u64 | ID table | The byte offset at which the id table starts.
+| u64 | Xattr table | The byte offset at which the xattr id table starts.
+| u64 | Inode table | The byte offset at which the inode table starts.
+| u64 | Dir. table | The byte offset at which the directory table starts.
+| u64 | Frag table | The byte offset at which the fragment table starts.
+| u64 | Export table | The byte offset at which the export table starts.
+|===
+
+
+The Xattr table, fragment table and export table are optional. If they are
+omitted from the archive, the respective fields indicating their position
+must be set to `0xFFFFFFFFFFFFFFFF` (i.e. all bits set).
+
+Most of the flags only serve an informational purpose and are only useful
+when editing the archive to convey the original packer settings.
+
+The only flag that actually carries information is the "Compressor options are
+present" flag. In fact, this is the only flag that the Linux kernel
+implementation actually tests for.
+
+The compressor options, however, are also only there for informal purpose, as
+most compression libraries understand their own stream format irregardless of
+the options used to compress and in fact don't provide any options for the
+decompressor. In the Linux kernel, the XZ decompressor is currently the only
+one that processes those options to pre-allocate the LZMA dictionary if a
+non-default size was used.
+
+=== Compression Options
+
+If the compressor options flag is set in the superblock, the superblock is
+immediately followed by a single metadata block, which is always uncompressed.
+
+The data stored in this block is compressor dependent.
+
+There are two special cases:
+
+* For LZ4, the compressor options always have to be present.
+* The LZMA compressor does not support compressor options, so this section
+ must never be present.
+
+For the compressors currently implemented, a 4 to 8 byte payload follows.
+
+The following sub sections outline the contents for each compressor that
+supports options. The default values if the options are missing are outlined
+as well.
+
+==== GZIP
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | compression level | In the range 1 to 9 (inclusive). Defaults to 9.
+| u16 | window size | In the rage 8 to 15 (inclusive). Defaults to 15.
+| u16 | strategies | A bit field describing the enabled strategies.
+ If no flags are set, the default strategy is
+ implicitly used. Please consult the ZLIB manual
+ for details on specific strategies.
+
+[cols=">1m,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 0x0001 ! Default strategy.
+! 0x0002 ! Filtered.
+! 0x0004 ! Huffman Only.
+! 0x0008 ! Run Length Encoded.
+! 0x0010 ! Fixed.
+!===
+|===
+
+NOTE: The SquashFS writer typically tries all selected strategies (including
+not setting any and letting zlib work with defaults) and stores the result
+with the smallest size.
+
+==== XZ
+
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | dictionary size | *SHOULD* be >= 8KiB, and must be either a power of
+ 2, or the sum of two consecutive powers of 2.
+| u32 | Filters | A bit field describing the additional enabled
+ filters attempted to better compress executable
+ code.
+
+[cols=">1m,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 0x0001 ! x86
+! 0x0002 ! PowerPC
+! 0x0004 ! IA64
+! 0x0008 ! ARM
+! 0x0010 ! ARM thumb
+! 0x0020 ! SPARC
+!===
+|===
+
+NOTE: A SquashFS writer typically tries all selected VLI filters (including
+not setting any and letting libxz work with defaults) and stores the resulting
+block that has the smallest size.
+
+Also note that further options, such as XZ presets, are not included. The
+compressor typically uses the libxz defaults, i.e. level 6 and not using the
+extreme flag. Likewise for `lc`, `lp` and `pb` (defaults are 3, 0 and 2
+respectively).
+
+If the encoder chooses to change those values, the decoder will still be
+able to read the data, but there is currently no way to convey that those
+values were changed.
+
+This is specifically problematic for the compression level, since increasing
+the level can result in drastically increasing the decoders memory consumption.
+
+==== LZ4
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | Version | *MUST* be set to 1.
+| u32 | Flags | A bit field describing the enabled LZ4 flags.
+ There is currently only one possible flag:
+
+
+[cols=">1m,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 0x0001 ! Use LZ4 High Compression(HC) mode.
+!===
+|===
+
+==== ZSTD
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | compression level | Should be in range 1 to 22 (inclusive). The real
+ maximum is the zstd defined ZSTD_maxCLevel().
+ +
+ +
+ The default value is 15.
+|===
+
+==== LZO
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | algorithm | Which variant of LZO to use.
+
+[cols=">1m,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 0 ! lzo1x_1
+! 1 ! lzo1x_1_11
+! 2 ! lzo1x_1_12
+! 3 ! lzo1x_1_15
+! 4 ! lzo1x_999 (default)
+!===
+
+| u32 | compression level | For lzo1x_999, this can be a value between 0
+ and 9 inclusive (defaults to 8). *MUST* be 0
+ for all other algorithms.
+|===
+
+== Data and Fragment Blocks
+
+As outlined in 2.1, file data is packed by dividing the input files into fixed
+size chunks (the block size from the super block) that are stored in sequence.
+
+The picture below tries to illustrate this concept:
+
+.Packing of File Data
+[%nowrap]
+....
+ _____ _____ _____ _ _____ _____ _ _
+File A: |__A__|__A__|__A__|A| File B: |__B__|__B__|B| File C: |C|
+ | | | | | | | |
+ | +---+ | | | | | |
+ | | +------+ | | | | |
+ | | | | | | | |
+ | | | +------|---------------+ | | |
+ | | | | +--|---------------------+ | |
+ | | | | | | | |
+ | | | | | +-----------------------+ | +------------+
+ | | | | | | | |
+ V V V V V V V V
+ __ _ ___ ___ ___ __ Fragment block: |A|B|C|
+ Output: |_A|A|_A_|_B_|_B_|_F| |
+ __V__
+ A |__F__|
+ | |
+ +------------------------+
+....
+
+In the above diagram, file A consists of 3 blocks and a single tail end, file B has
+2 blocks and one tail end while file C is smaller than block size.
+
+
+For each file, the blocks are individually compressed and stored on disk
+in order.
+
+The tail ends of A and B, together with the entire contents of C are packed
+together into a fragment block F, that is compressed and stored on disk once
+it is full.
+
+This tail-end-packing is completely optional. The tail ends (or in case of C
+the entire file) can also be treated as truncated blocks that expand to less
+than block size when uncompressed.
+
+
+There are no headers in front of data or fragment blocks and there *MUST NOT* be
+any gaps between data blocks from a single file, but a SquashFS packer is free
+to leave gaps between two different files or fragment blocks. The packer is
+also free to decide how to arrange fragments within a fragment block and what
+fragments to pack together.
+
+To locate file data, the file inodes store the following information:
+
+* The uncompressed size of the file. From this, the number of blocks can
+ be computed:
+
+ block_count = floor(file_size / block_size) # if tail end packing is used
+ block_count = ceil(file_size / block_size) # otherwise
+
+* The exact location of the first block, if one exists.
+* For each consecutive block, the on-disk size.
++
+A 32 bit integer is used with bit 24 (i.e. `1 << 24`) set if the block
+is stored uncompressed.
+
+* If tail-end-packing was done, the location of the fragment block and a
+ byte offset into the uncompressed fragment block. The size of the tail
+ end can be computed easily:
+
+ tail_end_size = file_size % block_size
+
+Since a fragment block will likely be referred to by multiple files, inodes
+don't store its on-disk location and size directly, but instead use a 32 bit
+index into a fragment block lookup table (see the <<Fragment Table>>).
+
+If a data block other than the last one unpacks to less than block size, the
+rest of the buffer is filled with 0 bytes. This way, sparse files are
+implemented. Specifically if a block has an on-disk size of 0 this translates
+to an entire block filled with 0 bytes without having to retrieve any data
+from disk.
+
+The on-disk locations of file blocks *MAY* overlap and different file inodes are
+free to refer to the same fragment. Typical SquashFS packers would explicitly
+use this to for files that are duplicates of others. Doing so is NOT counted
+as a hard link.
+
+If an inode references on-disk locations outside the data area, the result is
+undefined.
+
+== Inode Table
+
+Inodes are packed into metadata blocks and are not aligned, i.e. they can span
+the boundary between metadata blocks. To save space, there are different
+inodes for each type (regular file, directory, device, etc.) of varying
+contents and size.
+
+To further save more space, inodes come in two flavors: simple inode types
+optimized for a simple, standard use case, and extended inode types where
+extra information has to be stored.
+
+SquashFS more or less supports 32 bit UIDs and GIDs. As an optimization, those
+IDs are stored in a lookup table (see <<ID Table>>) and the inodes themselves
+hold a 16 bit index into this table. This allows to 32 bit UIDs/GIDs, but only
+among 2^16^ unique values.
+
+The location of the first metadata block is indicated by the inode table start
+in the superblock. The inode table ends at the start of the directory table.
+
+=== Common Inode Header
+
+All Inodes share a common header, which contains some common information,
+as well as describing the type of Inode which follows. This header has the
+following structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u16 | type | The type of item described by the inode which follows this header
+
+[cols=">1,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 1 ! Basic Directory
+! 2 ! Basic File
+! 3 ! Basic Symlink
+! 4 ! Basic Block Device
+! 5 ! Basic Character Device
+! 6 ! Basic Named Pipe (FIFO)
+! 7 ! Basic Socked
+! 8 ! Extended Directory
+! 9 ! Extended File
+! 10 ! Extended Symlink
+! 11 ! Extended Block Device
+! 12 ! Extended Character Device
+! 13 ! Extended Named Pipe (FIFO)
+! 14 ! Extended Socked
+!===
+
+| u16 | permissions | A bit mask representing Unix file system permissions
+ for the inode. This only stores permissions, not the
+ type. The type is reconstructed from the field above.
+| u16 | uid | An index into the <<ID Table>>, giving the user ID of the owner.
+| u16 | gid | An index into the <<ID Table>>, giving the group ID of the owner.
+| u32 | mtime | The unsigned number of seconds (not counting leap
+ seconds) since 00:00, Jan 1st, 1970 UTC when the item
+ described by the inode was last modified.
+| u32 | inode number | Unique node number. Must be at least 1 and at most
+ the inode count from the super block.
+|===
+
+=== Directory Inodes
+
+Directory inodes mainly contain a reference into the directory table where
+the listing of entries is stored.
+
+A basic directory has an entry listing of at most 64k (uncompressed) and
+no extended attributes. The layout of the inode data is as follows:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | block index | The location of the metadata block in the directory
+ table where the entry information starts. This is
+ relative to the directory table location.
+| u32 | link count | The number of hard links to this directory.
+| u16 | file size | Total (uncompressed) size in bytes of the entry
+ listing in the directory table, including headers.
+ +
+ +
+ This value is 3 bytes larger than the real listing.
+ The Linux kernel creates "." and ".." entries for
+ offsets 0 and 1, and only after 3 looks into the
+ listing, subtracting 3 from the size.
+| u16 | block offset | The (uncompressed) offset within the metadata block
+ in the directory table where the directory listing
+ starts.
+| u32 | parent inode | The inode number of the parent of this directory. If
+ this is the root directory, this *SHOULD* be 0.
+|===
+
+NOTE: For historical reasons, the hard link count of a directory includes
+the number of entries in the directory and is initialized to 2 for an empty
+directory. I.e. a directory with N entries has at least N + 2 link count.
+
+If the "file size" is set to a value < 4, the directory is empty and there is
+no corresponding listing in the directory table.
+
+An extended directory can have a listing that is at most 4GiB in size, may
+have extended attributes and can have an optional index for faster lookup:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | Same as above.
+| u32 | file size | Same as above.
+| u32 | block index | Same as above.
+| u32 | parent inode | Same as above.
+| u16 | index count | The number of directory index entries following the
+ inode structure.
+| u16 | block offset | Same as above.
+| u32 | xattr index | An index into the <<Xattr Table>> or `0xFFFFFFFF`
+ if the inode has no extended attributes.
+|===
+
+
+The index follows directly after the inode. See <<Directory Index>> for details on
+how the directory index is structured.
+
+=== File Inodes
+
+Basic files can be at most 4 GiB in size (uncompressed), must be located
+within the first 4 GiB of the SquashFS image, cannot have any extended
+attributes and don't support hard-link or sparse file accounting:
+
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | blocks start | The offset from the start of the archive to the first
+ data block.
+| u32 | frag index | An index into the <<Fragment Table>> which describes the fragment
+ block that the tail end of this file is stored in. If not used,
+ this is set to `0xFFFFFFFF`.
+| u32 | block offset | The (uncompressed) offset within the fragment block
+ where the tail end of this file is. See <<Data and Fragment Blocks>>
+ for details.
+| u32 | file size | The (uncompressed) size of this file.
+| u32[] | block sizes | An array of consecutive block sizes. See <<Data and Fragment Blocks>> for details.
+|===
+
+If 'frag index' is set to `0xFFFFFFFF`, the number of blocks is computed as
+
+ ceil(file_size / block_size)
+
+otherwise, if 'frag index' is a valid fragment index, the block count is
+computed as
+
+ floor(file_size / block_size)
+
+and the size of the tail end is
+
+ file_size % block_size
+
+
+To access a data block, first compute the block index as
+
+ index = floor(offset / block_size)
+
+then compute the on-disk location of the block by summing up the sizes of the
+blocks that come before it:
+
+ location = block_start
+
+ for i = 0; i < index; i++
+ location += block_sizes[i] & 0x00FFFFFF
+
+
+The tail end, if present, is accessed by resolving the fragment index through
+the fragment lookup table (see the <<Fragment Table>>), loading the fragment block and
+using the given 'block offset' into the fragment block.
+
+Extended files have a 64 bit location and size, have additional counters for
+sparse file accounting and hard links, and can have extended attributes:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u64 | blocks start | Same as above (but larger).
+| u64 | file size | Same as above (but larger).
+| u64 | sparse | The number of bytes saved by omitting zero bytes.
+ Used in the kernel for sparse file accounting.
+| u32 | link count | The number of hard links to this node.
+| u32 | frag index | Same as above.
+| u32 | block offset | Same as above.
+| u32 | xattr index | An index into the <<Xattr Table>> or `0xFFFFFFFF`
+ if the inode has no extended attributes.
+| u32[] | block sizes | Same as above.
+|===
+
+=== Symbolic Links
+
+Symbolic links mainly have a target path stored directly after the inode
+header, as well as a hard-link counter (yes, you can have hard links to
+symlinks):
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | The number of hard links to this symlink.
+| u32 | target size | The size in bytes of the target path this symlink
+ points to.
+| u8[] | target path | An array of bytes holding the target path this
+ symlink points to. The path is 'target size' bytes
+ long and NOT null-terminated.
+|===
+
+The extended symlink type adds an additional extended attribute index:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | Same as above.
+| u32 | target size | Same as above.
+| u8[] | target path | Same as above.
+| u32 | xattr index | An index into the <<Xattr Table>>
+|===
+
+=== Device Special Files
+
+Basic device special files only store a hard-link counter and a device number.
+The layout is identical for both character and block devices:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | The number of hard links to this entry.
+| u32 | device number | The system specific device number.
+ +
+ +
+ On Linux, this consists of major and minor device
+ numbers that can be extracted as follows:
+
+ major = (dev & 0xFFF00) >> 8.
+ minor = (dev & 0x000FF) | ((dev >> 12) & 0xFFF00)
+|===
+
+The extended device file inode adds an additional extended attribute index:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | Same as above.
+| u32 | device number | Same as above.
+| u32 | xattr index | An index into the <<Xattr Table>>
+|===
+
+=== IPC Inodes (FIFO or Socket)
+
+Named pipe (FIFO) and socket special files only add a hard-link counter
+after the inode header:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | The number of hard links to this entry.
+|===
+
+The extended versions add an additional extended attribute index:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | link count | Same as above.
+| u32 | xattr index | An index into the <<Xattr Table>>
+|===
+
+== Directory Table
+
+For each directory inode, the directory table stores a linear list of all
+entries, with references back to the inodes that describe those entries.
+
+The entry list itself is sorted ASCIIbetically by entry name and split into
+multiple runs, each preceded by a short header.
+
+The directory inodes store the total, uncompressed size of the entire listing,
+including headers. Using this size, a SquashFS reader can determine if another
+header with further entries should be following once it reaches the end of a
+run.
+
+To save space, the header indicates a metadata block and a reference inode
+number. The entries that follow simply store a difference to that inode number
+and an offset into the specified metadata block.
+
+Every time, the inode block changes or the difference of the inode number
+to the reference in the header cannot be encoded in 16 bits anymore, a new
+header is emitted.
+
+A header must be followed by *AT MOST* 256 entries. If there are more entries,
+a new header *MUST* be emitted.
+
+Typically, inode allocation strategies would sort the children of a directory
+and then allocate inode numbers incrementally, to optimize directory entry
+listings.
+
+Since hard links might be further further away than ±32k of the reference
+number, they might require a new header to be emitted. Inode number allocation
+and picking of the reference could of course be optimized to prevent this.
+
+The directory header has the following structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | count | Number of entries following the header.
+| u32 | start | The location of the metadata block in the inode table
+ where the inodes are stored. This is relative to the
+ inode table start from the super block.
+| s32 | inode number | An arbitrary inode number. The entries that follow
+ store their inode number as a difference to this.
+|===
+
+The counter is stored off-by-one, i.e. a value of 0 indicates 1 entry follows.
+This also makes it impossible to encode a size of 0, which wouldn't make any
+sense. Empty directories simply have their size set to 0 in the inode instead,
+so no extra dummy header has to be stored or looked up.
+
+The header is followed by multiple entries that each have this structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u16 | offset | An offset into the uncompressed inode metadata block.
+| s16 | inode offset | The difference of this inode's number to the reference
+ stored in the header.
+| u16 | type | The inode type. For extended inodes, the basic type is stored
+ here instead.
+| u16 | name size | One less than the size of the entry name.
+| u8[] | name | The file name of the entry without a trailing null byte. Has
+ `name size` + 1 bytes.
+|===
+
+In the entry structure itself, the file names are stored without trailing null
+bytes. Since a zero length name makes no sense, the name length is stored
+off-by-one, i.e. the value 0 cannot be encoded.
+
+The inode type is stored in the entry, but always as the corresponding
+basic type.
+
+While the field is technically 16 bits, the kernel implementation currently
+imposes an arbitrary limit of 255 on the name size field. Since the field is
+off-by-one, this means that a file name in SquashFS can be at most 256
+characters long.
+
+=== Directory Index
+
+To speed up lookups on directories with lots of entries, the extended
+directory inode can store an index, holding the locations of all directory
+headers and the name of the first entry after the header.
+
+When searching for an entry, the reader can then iterate over the index to
+find a range of metadata blocks that should contain a given entry and then
+only scan over the given range.
+
+To allow for even faster lookups, a new header should be emitted every time
+the entry list crosses a metadata block boundary. This narrows the boundary
+down to a single metadata block lookup in most cases.
+
+The index entries have the following structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | index | This stores a byte offset from the first directory
+ header to the current header, as if the uncompressed
+ directory metadata blocks were laid out in memory
+ consecutively.
+| u32 | start | Start offset of a directory table metadata block,
+ relative to the directory table start.
+| u32 | name size | One less than the size of the entry name.
+| u8[] | name | The name of the first entry following the header
+ without a trailing null byte.
+|===
+
+== Fragment Table
+
+Tail-ends and smaller than block size files can be combined into fragment
+blocks that are at most 'block size' bytes long.
+
+The fragment table describes the location and size of the fragment blocks
+(not the tail-ends within them).
+
+This is a lookup table which stores entries of the following shape:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u64 | start | The offset within the archive where the fragment block starts
+| u32 | size | The on-disk size of the fragment block. If the block is
+ uncompressed, bit 24 (i.e. `1 << 24`) is set.
+| u32 | unused | *SHOULD* be set to 0.
+|===
+
+The table is stored on-disk as described in <<Storing Lookup Tables>>.
+
+The fragment table location in the superblock points to an array of 64 bit
+integers that store the on-disk locations of the metadata blocks containing
+the lookup table.
+
+Each metadata block can store up to 512 entries (`8129 / 16`).
+
+The "unused" field is there for alignment and *SHOULD* be set to 0, however the
+Linux kernel currently ignores this field completely, making it impossible for
+Linux to ever re-purpose this field.
+
+== Export Table
+
+To support NFS exports, SquashFS needs a fast way to resolve an inode number
+to an inode structure.
+
+For this purpose, a SquashFS archive can optionally contain an export table,
+which is basically a flat array of 64 bit inode references, with the inode
+number being used as an index into the array.
+
+Because the inode number 0 is not used (reserved as a sentinel value), the
+array actually starts at inode number 1 and the index is thus
+inode_number - 1.
+
+The array itself is stored in a series of metadata blocks, as outlined in
+<<Storing Lookup Tables>>.
+
+Since each block can store 1024 references (`8192 / 8`), there will be
+`ceil(inode_count / 1024)` metadata blocks for the entire array.
+
+== ID Table
+
+As outlined in <<Common Inode Header>>, SquashFS supports 32 bit user and group IDs. To
+compact the inode table, the unique UID/GID values are collected in a lookup
+table and a 16 bit table index is stored in the inode instead.
+
+This lookup table is stored as outlined in <<Storing Lookup Tables>>.
+
+Each metadata block can store up to 2048 IDs (`8192 / 4`).
+
+[[_xattr_table,Xattr Table]]
+== Extended Attribute Table
+
+Extended attributes are arbitrary key value pairs attached to inodes. The key
+names use dots as separators to create a hierarchy of name spaces.
+
+The key value pairs of all inodes are stored consecutively in a series of
+metadata blocks.
+
+The values can either be stored inline, i.e. a key entry is directly followed
+by a value, or out-of-line to deduplicate identical values and use a reference
+instead. Typically, the first occurrence of a value is stored in line and
+every consecutive use of the same value uses an out-of-line reference back to
+the first one.
+
+The keys are stored using the following data structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u16 | type | A prefix ID for the key name. If the value that follows
+ is stored out-of-line, the flag `0x0100` is **OR**ed to the
+ type ID.
+
+[cols=">1,10",frame="none",grid="none",options="header"]
+!===
+! Value ! Comment
+! 0 ! Prefix the name with `"user."`
+! 1 ! Prefix the name with `"trusted."`
+! 2 ! Prefix the name with `"security."`
+!===
+
+| u16 | name size | The number of key bytes the follows.
+| u8[] | name | The remainder of the key without the prefix and without a
+ trailing null byte.
+|===
+
+Following the key, this structure is used to store the value:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u32 | value size | The size of the value string. If the value is stored
+ out of line, this is always 8, i.e. the size of an
+ unsigned 64 bit integer.
+| u8[] | value | This is 'value size' bytes of arbitrary binary data.
+ If the value is stored out-of-line, this is a 64 bit
+ reference, i.e. a location of a metadata block,
+ shifted left by 16 and **OR**ed with an offset into the
+ uncompressed block, giving the location of another
+ value structure.
+|===
+
+The metadata block location given by an out-of-line reference is relative to
+the location of the first block.
+
+To actually address a block of key value pairs associated with an inode, a
+lookup table is used that specifies the start and size of a sequence of key
+value pairs.
+
+All an inode needs to store is a 32 bit index into this table. If two inodes
+have an identical attribute sets, the key/value sequence is only written once,
+there is only one lookup table entry and both inodes have the same index.
+
+Each lookup table entry has the following structure:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u64 | xattr ref | A reference to the start of the key value block, i.e.
+ the metadata block location shifted left by 16, **OR**ed
+ with an offset into the uncompressed block.
+| u32 | count | The number of key value pairs.
+| u32 | size | The exact, uncompressed size in bytes of the entire
+ block of key value pairs, counting what has been
+ written to disk and including the key/value entry
+ structures.
+|===
+
+This lookup table is stored as outlined in <<Storing Lookup Tables>>
+
+Each metadata block can hold 512 (`8192 / 16`) entries.
+
+However, in contrast to <<Storing Lookup Tables>>, additional data is given before
+the list of metadata block locations, to locate the key-value pairs, as well as the
+actual number of lookup table entries that are not specified in the super
+block.
+
+The 'Xattr table' entry in the superblock gives the absolute location of the
+following data structure which is stored on-disk as is, uncompressed:
+
+[cols="1,4,20a",frame="none",grid="none",options="header"]
+|===
+| Type | Name | Description
+| u64 | kv start | The absolute position of the first metadata block holding the
+ key/value pairs.
+| u32 | count | The number of entries in the lookup table.
+| u32 | unused | *SHOULD* be set to 0, however Linux currently ignores
+ this field completely and squashfs-tools used to leak
+ stack data here, making it impossible for Linux to
+ ever re-purpose this field.
+| u64[] | locations | An array holding the absolute on-disk location of each
+ metadata block of the lookup table.
+|===
+
+If an inode has a a valid xattr index (i.e. not `0xFFFFFFFF`), the metadata
+block index is computed as
+
+ block_idx = floor(index / 512)
+
+which is then used to retrieve the metadata block index from the locations
+array.
+
+Once the block has been read from disk and uncompressed, the byte offset into
+the metadata block can be computed as
+
+ offset = (index * 16) % 8192
+
+From this position, the structure can be read that holds a reference to the
+metadata block that contains the key/value pairs (and byte offset into the
+uncompressed block where the pairs start), as well as the number of key/value
+pairs and their total, uncompressed size.
diff --git a/doc/format.txt b/doc/format.txt
deleted file mode 100644
index 3eb4932..0000000
--- a/doc/format.txt
+++ /dev/null
@@ -1,1216 +0,0 @@
-
- Squashfs Binary Format
- **********************
-
-0) Index
-********
-
- 0............Index
- 1............About
- 2............Overview
- 2.1........Packing File Data
- 2.2........Packing Metadata
- 2.3........Storing Lookup Tables
- 3............The Superblock
- 3.1........Compression Options
- 3.1.1....GZIP
- 3.1.2....XZ
- 3.1.3....LZ4
- 3.1.4....ZSTD
- 3.1.5....LZO
- 4............Data and Fragment Blocks
- 5............Inode Table
- 5.1........Common Inode Header
- 5.2........Directory inodes
- 5.3........File Inodes
- 5.4........Symbolic Links
- 5.5........Device Special Files
- 5.6........IPC inodes (FIFO or Socket)
- 6............Directory Table
- 6.1........Directory Index
- 7............Fragment Table
- 8............Export Table
- 9............ID Table
- 10...........Extended Attribute Table
-
-1) About
-********
-
-SquashFS is a compressed, read-only filesystem for Linux that can also be used
-as a flexible, general purpose, compressed archive format, optimized for fast
-random access with support for Unix permissions, sparse files and extended
-attributes.
-
-SquashFS supports data and metadata compression through zlib, lz4, lzo, lzma,
-xz or zstd.
-
-For fast random access, compressed files are split up in fixed size blocks
-that are compressed separately. The block size can be set between 4k and 1M
-(default for squashfs-tools and squashfs-tools-ng is 128K).
-
-
-This document attempts to specify the on-disk format in detail.
-
-It is based on a previous on-line version that was originally written by
-Zachary Dremann and subsequently expanded by David Oberhollenzer during
-reverse engineering attempts and available here:
-
- https://dr-emann.github.io/squashfs/
-
-
-2) Overview
-***********
-
-SquashFS always stores integers in little endian format. The data blocks that
-make up the SquashFS archive are byte aligned, i.e. they typically do not care
-for alignment. The implementation in the Linux kernel requires the archive
-itself to be a multiple of either 1k or 4k in size (called the device block
-size) and user space tools typically use 4k to be compatible with both.
-
-A SquashFS archive consists of a maximum of nine parts:
-
- _______________
- | | Important information about the archive, including
- | Superblock | locations of other sections.
- |_______________|
- | | If non-default compression options have been used,
- | Compression | they can optionally be stored here, to facilitate
- | options | later, offline editing of the archive.
- |_______________|
- | |
- | Data blocks | The contents of the files in the archive,
- | & fragments | split into separately compressed blocks.
- |_______________|
- | | Metadata (ownership, permissions, etc) for
- | Inode table | items in the archive.
- |_______________|
- | |
- | Directory | Directory listings, including file names, and
- | table | references to inodes.
- |_______________|
- | |
- | Fragment | Description of fragment locations within the
- | table | Datablocks & Fragments section.
- |_______________|
- | | A mapping from inode numbers to disk locations,
- | Export table | required for NFS export.
- |_______________|
- | |
- | UID/GID | A list of unique UID/GIDs. Inodes use an index into
- | lookup table | this table to save memory.
- |_______________|
- | |
- | Xattr | Extended attributes for items in the archive.
- | table |
- |_______________|
-
-
-Although the super block details the exact positions of each section, most
-implementations, including the one in the Linux kernel, insist on this exact
-order.
-
-
-2.1) Packing File Data
-
-The file data is packed into the archive after the super block (and optional
-compressor options).
-
-Files are divided into fixed size blocks that are separately compressed and
-stored in order. SquashFS supports optional tail-end-packing of files that
-are not an exact multiple of the block size. The remaining ends can either
-be treated as a short block, or can be packed together with the tail ends of
-other files in a single "fragment block". Files that are less than block size
-are treated the same way.
-
-If the size of a data or fragment block would exceed the input size after
-compression, the original, uncompressed data is stored, so that the size of a
-block after compression never exceeds the input block size.
-
-
-2.2) Packing Metadata
-
-Metadata (e.g. inodes, directory listings, etc...) is treated as a continuous
-stream of records that is chopped up into 8KiB blocks that are separately
-compressed into special metadata blocks.
-
-The input size of 8KiB is fixed and independent of the data block size.
-Similar to data blocks, if the compressed size would exceed 8KiB, the
-uncompressed block is stored instead, so the on-disk size of a metadata
-block never exceeds 8KiB.
-
-Individual entries are allowed to cross the block boundary, so e.g. an inode
-may be located at the end of a metadata block with some part of it located at
-the start of the next block. Both have to be read and decompressed when
-reading this inode. If an entry is written across block boundaries, there
-MUST NOT be any gap between the compressed metadata blocks on-disk.
-
-
-In contrast to data blocks, every metadata block is preceded by a single,
-16 bit unsigned integer. This integer holds the on-disk size of the block
-that follows. The MSB is set if the block is stored uncompressed. Whenever
-a metadata block is referenced, the position of this integer is given.
-
-To read a metadata block, seek to the indicated position and read the 16 bit
-header. Sanity check that the lower 15 bit are less than 8KiB and proceed
-to read that many bytes. If the highest bit of the header is cleared,
-uncompress the data into an 8KiB buffer that MUST NOT overflow.
-
-
-In the SquashFS archive format, metadata entries (e.g. inodes) are often
-referenced using a 64 bit integer. The lower 16 bit hold an offset into the
-uncompressed block and the upper 48 bit point to the on-disk location of the
-block.
-
-The on-disk location is relative to the type of metadata entry, e.g. for
-inodes it is relative to the start of the inode table given by the
-super block.
-
-
-2.3) Storing Lookup Tables
-
-Lookup tables are arrays (i.e. sequences of identical sized records) that are
-addressed by an index.
-
-Such tables are stored in the SquashFS format as metadata blocks, i.e. by
-dividing the table data into 8KiB chunks that are separately compressed and
-stored in sequence.
-
-To allow constant time lookup, a list of 64 bit unsigned integers is stored,
-holding the on-disk locations of each metadata block.
-
-This list itself is stored uncompressed and not preceded by a header.
-
-When referring to a lookup table, the superblock gives the number of table
-entries and points to this location list.
-
-Since the table entry size is a known, fixed value, the required number of
-metadata blocks can be computed:
-
- block_count = ceil(table_count * entry_size / 8192)
-
-Which is also the number of 64 bit integers in the location list.
-
-
-When resolving a lookup table index, first work out the index of the
-metadata block:
-
- meta_index = floor(index * entry_size / 8192)
-
-Using this index on the location list yields the on-disk location of
-the metadata block containing the entry.
-
-After reading this metadata block, the byte offset into the block can
-be computed to get the entry:
-
- offset = index * entry_size % 8192
-
-
-The location list can be cached in memory. Resolving an index requires at
-worst a single metadata block read (at most 8194 bytes fetched from an
-unaligned on-disk location).
-
-
-2.4) Supported Compressors
-
-The SquashFS format supports the following compressors:
-
- - zlib deflate (referred to as "gzip" but only uses raw zlib streams)
- - lzo
- - lzma 1 (considered deprecated)
- - lzma 2 (referred to as "xz")
- - lz4
- - zstd
-
-The archive can only specify one compressor in the super block and has to use
-it for both file data and metadata compression. Using one compressor for data
-and switching to a different compressor for e.g. inodes is not supported.
-
-While it is technically not possible to pick a "null" compressor in the super
-block, an implementation can still deliberately write only uncompressed blocks
-to a SquashFS archive, or choose to store certain metadata blocks without
-compression.
-
-The lzma 2 aka xz compressor MUST use CRC32 checksums only. Using SHA-256 is
-not supported.
-
-
-3) The superblock
-*****************
-
-The superblock is the first section of a SquashFS archive. It is always
-96 bytes in size and contains important information about the archive,
-including the locations of other sections.
-
-+======+===============+=====================================================+
-| Type | Name | Description |
-+======+===============+=====================================================+
-| u32 | magic | Must be set to 0x73717368 ("hsqs" on disk). |
-+------+---------------+-----------------------------------------------------+
-| u32 | inode count | The number of inodes stored in the archive. |
-+------+---------------+-----------------------------------------------------+
-| u32 | mod time | Last modification time of the archive. Count seconds|
-| | | since 00:00, Jan 1st 1970 UTC (not counting leap |
-| | | seconds). This is unsigned, so it expires in the |
-| | | year 2106 (as opposed to 2038). |
-+------+---------------+-----------------------------------------------------+
-| u32 | block size | The size of a data block in bytes. Must be a power |
-| | | of two between 4096 (4k) and 1048576 (1 MiB) |
-+------+---------------+-----------------------------------------------------+
-| u32 | frag count | The number of entries in the fragment table |
-+------+---------------+-----------------------------------------------------+
-| u16 | compressor | An ID designating the compressor used for both data |
-| | | and meta data blocks. |
-| | | |
-| | +-------+------+--------------------------------------+
-| | | Value | Name | Comment |
-| | +-------+------+--------------------------------------+
-| | | 1 | GZIP | just zlib streams (no gzip headers!) |
-| | | 2 | LZO | |
-| | | 3 | LZMA | LZMA version 1 |
-| | | 4 | XZ | LZMA version 2 as used by xz-utils |
-| | | 5 | LZ4 | |
-| | | 6 | ZSTD | |
-+------+---------------+-------+------+--------------------------------------+
-| u16 | block log | The log2 of the block size. If the two fields do not|
-| | | agree, the archive is considered corrupted. |
-+------+---------------+-----------------------------------------------------+
-| u16 | flags | Bit wise OR of the flag bits below. |
-| | | |
-| | +--------+--------------------------------------------+
-| | | Value | Meaing |
-| | +--------+--------------------------------------------+
-| | | 0x0001 | Inodes are stored uncompressed. |
-| | | 0x0002 | Data blocks are stored uncompressed. |
-| | | 0x0008 | Fragments are stored uncompressed. |
-| | | 0x0010 | Fragments are not used. |
-| | | 0x0020 | Fragments are always generated. |
-| | | 0x0040 | Data has been deduplicated. |
-| | | 0x0080 | NFS export table exists. |
-| | | 0x0100 | Xattrs are stored uncompressed. |
-| | | 0x0200 | There are no Xattrs in the archive. |
-| | | 0x0400 | Compressor options are present. |
-| | | 0x0800 | The ID table is uncompressed. |
-+------+---------------+--------+--------------------------------------------+
-| u16 | id count | The number of entries in the ID lookup table. |
-+------+---------------+-----------------------------------------------------+
-| u16 | version major | Major version of the format. Must be set to 4. |
-+------+---------------+-----------------------------------------------------+
-| u16 | version minor | Minor version of the format. Must be set to 0. |
-+------+---------------+-----------------------------------------------------+
-| u64 | root inode | A reference to the inode of the root directory. |
-+------+---------------+-----------------------------------------------------+
-| u64 | bytes used | The number of bytes used by the archive. Because |
-| | | SquashFS archives must be padded to a multiple of |
-| | | the underlying device block size, this can be less |
-| | | than the actual file size. |
-+------+---------------+-----------------------------------------------------+
-| u64 | ID table | The byte offset at which the id table starts. |
-+------+---------------+-----------------------------------------------------+
-| u64 | Xattr table | The byte offset at which the xattr id table starts. |
-+------+---------------+-----------------------------------------------------+
-| u64 | Inode table | The byte offset at which the inode table starts. |
-+------+---------------+-----------------------------------------------------+
-| u64 | Dir. table | The byte offset at which the directory table starts.|
-+------+---------------+-----------------------------------------------------+
-| u64 | Frag table | The byte offset at which the fragment table starts. |
-+------+---------------+-----------------------------------------------------+
-| u64 | Export table | The byte offset at which the export table starts. |
-+------+---------------+-----------------------------------------------------+
-
-The Xattr table, fragment table and export table are optional. If they are
-omitted from the archive, the respective fields indicating their position
-must be set to 0xFFFFFFFFFFFFFFFF (i.e. all bits set).
-
-Most of the flags only serve an informational purpose and are only useful
-when editing the archive to convey the original packer settings.
-
-The only flag that actually carries information is the "Compressor options are
-present" flag. In fact, this is the only flag that the Linux kernel
-implementation actually tests for.
-
-The compressor options, however, are also only there for informal purpose, as
-most compression libraries understand their own stream format irregardless of
-the options used to compress and in fact don't provide any options for the
-decompressor. In the Linux kernel, the XZ decompressor is currently the only
-one that processes those options to pre-allocate the LZMA dictionary if a
-non-default size was used.
-
-
-3.1) Compression Options
-
-If the compressor options flag is set in the superblock, the superblock is
-immediately followed by a single metadata block, which is always uncompressed.
-
-The data stored in this block is compressor dependent.
-
-There are two special cases:
- - For LZ4, the compressor options always have to be present.
- - The LZMA compressor does not support compressor options, so this section
- must never be present.
-
-For the compressors currently implemented, a 4 to 8 byte payload follows.
-
-The following sub sections outline the contents for each compressor that
-supports options. The default values if the options are missing are outlined
-as well.
-
-
-3.1.1) GZIP
-
-+======+===================+=================================================+
-| Type | Name | Description |
-+======+===================+=================================================+
-| u32 | compression level | In the range 1 to 9 (inclusive). Defaults to 9. |
-+------+-------------------+-------------------------------------------------+
-| u16 | window size | In the range 8 to 15 (inclusive) Defaults to 15.|
-+------+-------------------+-------------------------------------------------+
-| u16 | strategies | A bit field describing the enabled strategies. |
-| | | If no flags are set, the default strategy is |
-| | | implicitly used. Please consult the ZLIB manual |
-| | | for details on specific strategies. |
-| | | |
-| | +--------+----------------------------------------+
-| | | Value | Comment |
-| | +--------+----------------------------------------+
-| | | 0x0001 | Default strategy. |
-| | | 0x0002 | Filtered. |
-| | | 0x0004 | Huffman Only. |
-| | | 0x0008 | Run Length Encoded. |
-| | | 0x0010 | Fixed. |
-+------+-------------------+--------+----------------------------------------+
-
-Note: The SquashFS writer typically tries all selected strategies (including
-not setting any and letting zlib work with defaults) and stores the result
-with the smallest size.
-
-
-3.1.2) XZ
-
-+======+===================+=================================================+
-| Type | Name | Description |
-+======+===================+=================================================+
-| u32 | dictionary size | SHOULD be >= 8KiB, and must be either a power of|
-| | | 2, or the sum of two consecutive powers of 2. |
-+------+-------------------+-------------------------------------------------+
-| u32 | Filters | A bit field describing the additional enabled |
-| | | filters attempted to better compress executable |
-| | | code. |
-| | | |
-| | +--------+----------------------------------------+
-| | | Value | Comment |
-| | +--------+----------------------------------------+
-| | | 0x0001 | x86 |
-| | | 0x0002 | PowerPC |
-| | | 0x0004 | IA64 |
-| | | 0x0008 | ARM |
-| | | 0x0010 | ARM thumb |
-| | | 0x0020 | SPARC |
-+------+-------------------+--------+----------------------------------------+
-
-Note: A SquashFS writer typically tries all selected VLI filters (including
-not setting any and letting libxz work with defaults) and stores the resulting
-block that has the smallest size.
-
-Also note that further options, such as XZ presets, are not included. The
-compressor typically uses the libxz defaults, i.e. level 6 and not using the
-extreme flag. Likewise for lc, lp and pb (defaults are 3, 0 and 2
-respectively).
-
-If the encoder chooses to change those values, the decoder will still be
-able to read the data, but there is currently no way to convey that those
-values were changed.
-
-This is specifically problematic for the compression level, since increasing
-the level can result in drastically increasing the decoders memory consumption.
-
-
-3.1.3) LZ4
-
-+======+===================+=================================================+
-| Type | Name | Description |
-+======+===================+=================================================+
-| u32 | Version | MUST be set to 1. |
-+------+-------------------+-------------------------------------------------+
-| u32 | Flags | A bit field describing the enabled LZ4 flags. |
-| | | There is currently only one possible flag: |
-| | | |
-| | +--------+----------------------------------------+
-| | | Value | Comment |
-| | +--------+----------------------------------------+
-| | | 0x0001 | Use LZ4 High Compression(HC) mode. |
-+------+-------------------+--------+----------------------------------------+
-
-3.1.4) ZSTD
-
-+======+===================+=================================================+
-| Type | Name | Description |
-+======+===================+=================================================+
-| u32 | compression level | Should be in range 1 to 22 (inclusive). The real|
-| | | maximum is the zstd defined ZSTD_maxCLevel(). |
-| | | |
-| | | The default value is 15. |
-+------+-------------------+-------------------------------------------------+
-
-3.1.5) LZO
-
-+======+===================+=================================================+
-| Type | Name | Description |
-+======+===================+=================================================+
-| u32 | algorithm | Which variant of LZO to use. |
-| | | |
-| | +--------+----------------------------------------+
-| | | Value | Comment |
-| | +--------+----------------------------------------+
-| | | 0 | lzo1x_1 |
-| | | 1 | lzo1x_1_11 |
-| | | 2 | lzo1x_1_12 |
-| | | 3 | lzo1x_1_15 |
-| | | 4 | lzo1x_999 (default) |
-+------+-------------------+--------+----------------------------------------+
-| u32 | compression level | For lzo1x_999, this can be a value between 0 |
-| | | and 9 inclusive (defaults to 8). MUST be 0 |
-| | | for all other algorithms. |
-+------+-------------------+-------------------------------------------------+
-
-
-
-4) Data and Fragment Blocks
-***************************
-
-As outlined in 2.1, file data is packed by dividing the input files into fixed
-size chunks (the block size from the super block) that are stored in sequence.
-
-The picture below tries to illustrate this concept:
-
-
- _____ _____ _____ _ _____ _____ _ _
- File A: |__A__|__A__|__A__|A| File B: |__B__|__B__|B| File C: |C|
- | | | | | | | |
- | +---+ | | | | | |
- | | +------+ | | | | |
- | | | | | | | |
- | | | +------|---------------+ | | |
- | | | | +--|---------------------+ | |
- | | | | | | | |
- | | | | | +-----------------------+ | +------------+
- | | | | | | | |
- V V V V V V V V
- __ _ ___ ___ ___ __ Fragment block: |A|B|C|
- Output: |_A|A|_A_|_B_|_B_|_F| |
- __V__
- A |__F__|
- | |
- +------------------------+
-
-Figure 2.1: Packing of File Data.
-
-
-In Figure 1, file A consists of 3 blocks and a single tail end, file B has
-2 blocks and one tail end while file C is smaller than block size.
-
-For each file, the blocks are individually compressed and stored on disk
-in order.
-
-The tail ends of A and B, together with the entire contents of C are packed
-together into a fragment block F, that is compressed and stored on disk once
-it is full.
-
-This tail-end-packing is completely optional. The tail ends (or in case of C
-the entire file) can also be treated as truncated blocks that expand to less
-than block size when uncompressed.
-
-
-There are no headers in front of data or fragment blocks and there MUST NOT be
-any gaps between data blocks from a single file, but a SquashFS packer is free
-to leave gaps between two different files or fragment blocks. The packer is
-also free to decide how to arrange fragments within a fragment block and what
-fragments to pack together.
-
-
-
-To locate file data, the file inodes store the following information:
-
- - The uncompressed size of the file. From this, the number of blocks can
- be computed:
-
- block_count = floor(file_size / block_size) if tail end packing is used
- block_count = ceil(file_size / block_size) otherwise
-
- - The exact location of the first block, if one exists.
- - For each consecutive block, the on-disk size.
-
- A 32 bit integer is used with bit 24 (i.e. 1 << 24) set if the block
- is stored uncompressed.
-
- - If tail-end-packing was done, the location of the fragment block and a
- byte offset into the uncompressed fragment block. The size of the tail
- end can be computed easily:
-
- tail_end_size = file_size % block_size
-
-
-Since a fragment block will likely be referred to by multiple files, inodes
-don't store its on-disk location and size directly, but instead use a 32 bit
-index into a fragment block lookup table (see section 7).
-
-
-If a data block other than the last one unpacks to less than block size, the
-rest of the buffer is filled with 0 bytes. This way, sparse files are
-implemented. Specifically if a block has an on-disk size of 0 this translates
-to an entire block filled with 0 bytes without having to retrieve any data
-from disk.
-
-
-The on-disk locations of file blocks MAY overlap and different file inodes are
-free to refer to the same fragment. Typical SquashFS packers would explicitly
-use this to for files that are duplicates of others. Doing so is NOT counted
-as a hard link.
-
-If an inode references on-disk locations outside the data area, the result is
-undefined.
-
-
-5) Inode Table
-**************
-
-Inodes are packed into metadata blocks and are not aligned, i.e. they can span
-the boundary between metadata blocks. To save space, there are different
-inodes for each type (regular file, directory, device, etc.) of varying
-contents and size.
-
-To further save more space, inodes come in two flavors: simple inode types
-optimized for a simple, standard use case, and extended inode types where
-extra information has to be stored.
-
-SquashFS more or less supports 32 bit UIDs and GIDs. As an optimization, those
-IDs are stored in a lookup table (see section 9) and the inodes themselves
-hold a 16 bit index into this table. This allows to 32 bit UIDs/GIDs, but only
-among 2^16 unique values.
-
-
-The location of the first metadata block is indicated by the inode table start
-in the superblock. The inode table ends at the start of the directory table.
-
-
-5.1) Common Inode Header
-
-All Inodes share a common header, which contains some common information,
-as well as describing the type of Inode which follows. This header has the
-following structure:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u16 | type | The type of item described by the inode which follows|
-| | | this header. |
-| | | |
-| | +-------+----------------------------------------------+
-| | | Value | Comment |
-| | +-------+----------------------------------------------+
-| | | 1 | Basic Directory |
-| | | 2 | Basic File |
-| | | 3 | Basic Symlink |
-| | | 4 | Basic Block Device |
-| | | 5 | Basic Character Device |
-| | | 6 | Basic Named Pipe (FIFO) |
-| | | 7 | Basic Socket |
-| | | 8 | Extended Directory |
-| | | 9 | Extended File |
-| | | 10 | Extended Symlink |
-| | | 11 | Extended Block Device |
-| | | 12 | Extended Character Device |
-| | | 13 | Extended Named Pipe (FIFO) |
-| | | 14 | Extended Socket |
-+------+--------------+-------+----------------------------------------------+
-| u16 | permissions | A bit mask representing Unix file system permissions |
-| | | for the inode. This only stores permissions, not the |
-| | | type. The type is reconstructed from the field above.|
-+------+--------------+------------------------------------------------------+
-| u16 | uid | An index into the ID table, giving the user ID |
-| | | of the owner. |
-+------+--------------+------------------------------------------------------+
-| u16 | gid | An index into the ID table, giving the group ID |
-| | | of the owner. |
-+------+--------------+------------------------------------------------------+
-| u32 | mtime | The unsigned number of seconds (not counting leap |
-| | | seconds) since 00:00, Jan 1st, 1970 UTC when the item|
-| | | described by the inode was last modified. |
-+------+--------------+------------------------------------------------------+
-| u32 | inode number | Unique node number. Must be at least 1 and at most |
-| | | the inode count from the super block. |
-+------+--------------+------------------------------------------------------+
-
-Depending on the type, additional data follows, outlined in sections 5.2
-to 5.6.
-
-
-
-5.2) Directory inodes
-
-Directory inodes mainly contain a reference into the directory table where
-the listing of entries is stored.
-
-A basic directory has an entry listing of at most 64k (uncompressed) and
-no extended attributes. The layout of the inode data is as follows:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | block index | The location of the metadata block in the directory |
-| | | table where the entry information starts. This is |
-| | | relative to the directory table location. |
-+------+--------------+------------------------------------------------------+
-| u32 | link count | The number of hard links to this directory. |
-+------+--------------+------------------------------------------------------+
-| u16 | file size | Total (uncompressed) size in bytes of the entry |
-| | | listing in the directory table, including headers. |
-| | | |
-| | | This value is 3 bytes larger than the real listing. |
-| | | The Linux kernel creates "." and ".." entries for |
-| | | offsets 0 and 1, and only after 3 looks into the |
-| | | listing, subtracting 3 from the size. |
-+------+--------------+------------------------------------------------------+
-| u16 | block offset | The (uncompressed) offset within the metadata block |
-| | | in the directory table where the directory listing |
-| | | starts. |
-+------+--------------+------------------------------------------------------+
-| u32 | parent inode | The inode number of the parent of this directory. If |
-| | | this is the root directory, this SHOULD be 0. |
-+------+--------------+------------------------------------------------------+
-
-
-Note that for historical reasons, the hard link count of a directory includes
-the number of entries in the directory and is initialized to 2 for an empty
-directory. I.e. a directory with N entries has at least N + 2 link count.
-
-
-If the "file size" is set to a value < 4, the directory is empty and there is
-no corresponding listing in the directory table.
-
-
-An extended directory can have a listing that is at most 4GiB in size, may
-have extended attributes and can have an optional index for faster lookup:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | link count | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | file size | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | block index | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | parent inode | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u16 | index count | The number of directory index entries following the |
-| | | inode structure. |
-+------+--------------+------------------------------------------------------+
-| u16 | block offset | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | xattr index | An index into the xattr lookup table or 0xFFFFFFFF |
-| | | if the inode has no extended attributes. |
-+------+--------------+------------------------------------------------------+
-
-The index follows directly after the inode. See section 6.1 for details on
-how the directory index is structured.
-
-
-5.3) File Inodes
-
-Basic files can be at most 4 GiB in size (uncompressed), must be located
-within the first 4 GiB of the SquashFS image, cannot have any extended
-attributes and don't support hard-link or sparse file accounting:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | blocks start | The offset from the start of the archive to the first|
-| | | data block. |
-+------+--------------+------------------------------------------------------+
-| u32 | frag index | An index into the fragment table which describes the |
-| | | fragment block that the tail end of this file is |
-| | | stored in. If not used, this is set to 0xFFFFFFFF. |
-+------+--------------+------------------------------------------------------+
-| u32 | block offset | The (uncompressed) offset within the fragment block |
-| | | where the tail end of this file is. See section 4 |
-| | | for details. |
-+------+--------------+------------------------------------------------------+
-| u32 | file size | The (uncompressed) size of this file. |
-+------+--------------+------------------------------------------------------+
-| u32[]| block sizes | An array of consecutive block sizes. See section 4 |
-| | | for details. |
-+------+--------------+------------------------------------------------------+
-
-If 'frag index' is set to 0xFFFFFFFF, the number of blocks is computed as
-
- ceil(file_size / block_size)
-
-otherwise, if 'frag index' is a valid fragment index, the block count is
-computed as
-
- floor(file_size / block_size)
-
-and the size of the tail end is
-
- file_size % block_size
-
-
-To access a data block, first compute the block index as
-
- index = floor(offset / block_size)
-
-then compute the on-disk location of the block by summing up the sizes of the
-blocks that come before it:
-
- location = block_start
-
- for i = 0; i < index; i++
- location += block_sizes[i] & 0x00FFFFFF
-
-
-The tail end, if present, is accessed by resolving the fragment index through
-the fragment lookup table (see section 7), loading the fragment block and
-using the given 'block offset' into the fragment block.
-
-
-
-Extended files have a 64 bit location and size, have additional counters for
-sparse file accounting and hard links, and can have extended attributes:
-
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u64 | blocks start | Same as above (but larger). |
-+------+--------------+------------------------------------------------------+
-| u64 | file size | Same as above (but larger). |
-+------+--------------+------------------------------------------------------+
-| u64 | sparse | The number of bytes saved by omitting zero bytes. |
-| | | Used in the kernel for sparse file accounting. |
-+------+--------------+------------------------------------------------------+
-| u32 | link count | The number of hard links to this node. |
-+------+--------------+------------------------------------------------------+
-| u32 | frag index | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | block offset | Same as above. |
-+------+--------------+------------------------------------------------------+
-| u32 | xattr index | An index into the xattr lookup table or 0xFFFFFFFF |
-| | | if the inode has no extended attributes. |
-+------+--------------+------------------------------------------------------+
-| u32[]| block sizes | Same as above. |
-+------+--------------+------------------------------------------------------+
-
-
-5.4) Symbolic Links
-
-Symbolic links mainly have a target path stored directly after the inode
-header, as well as a hard-link counter (yes, you can have hard links to
-symlinks):
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | link count | The number of hard links to this symlink. |
-+------+--------------+------------------------------------------------------+
-| u32 | target size | The size in bytes of the target path this symlink |
-| | | points to. |
-+------+--------------+------------------------------------------------------+
-| u8[] | target path | An array of bytes holding the target path this |
-| | | symlink points to. The path is 'target size' bytes |
-| | | long and NOT null-terminated. |
-+------+--------------+------------------------------------------------------+
-
-The extended symlink type adds an additional extended attribute index:
-
-+======+==============+=======================================+
-| Type | Name | Description |
-+======+==============+=======================================+
-| u32 | link count | Same as above. |
-+------+--------------+---------------------------------------+
-| u32 | target size | Same as above. |
-+------+--------------+---------------------------------------+
-| u8[] | target path | Same as above. |
-+------+--------------+---------------------------------------+
-| u32 | xattr index | An index into the xattr lookup table. |
-+------+--------------+---------------------------------------+
-
-
-5.5) Device Special Files
-
-Basic device special files only store a hard-link counter and a device number.
-The layout is identical for both character and block devices:
-
-+======+===============+=====================================================+
-| Type | Name | Description |
-+======+===============+=====================================================+
-| u32 | link count | The number of hard links to this entry. |
-+------+---------------+-----------------------------------------------------+
-| u32 | device number | The system specific device number. |
-| | | |
-| | | On Linux, this consists of major and minor device |
-| | | numbers that can be extracted as follows: |
-| | | major = (dev & 0xFFF00) >> 8. |
-| | | minor = (dev & 0x000FF) | ((dev >> 12) & 0xFFF00) |
-+------+---------------+-----------------------------------------------------+
-
-The extended device file inode adds an additional extended attribute index:
-
-+======+===============+=========================================+
-| Type | Name | Description |
-+======+===============+=========================================+
-| u32 | link count | Same as above. |
-+------+---------------+-----------------------------------------+
-| u32 | device number | Same as above. |
-+------+---------------+-----------------------------------------+
-| u32 | xattr index | An index into the xattr lookup table. |
-+------+---------------+-----------------------------------------+
-
-
-5.6) IPC inodes (FIFO or Socket)
-
-Named pipe (FIFO) and socket special files only add a hard-link counter
-after the inode header:
-
-+======+=============+=========================================+
-| Type | Name | Description |
-+======+=============+=========================================+
-| u32 | link count | The number of hard links to this entry. |
-+------+-------------+-----------------------------------------+
-
-The extended versions add an additional extended attribute index:
-
-+======+=============+=========================================+
-| Type | Name | Description |
-+======+=============+=========================================+
-| u32 | link count | Same as above. |
-+------+-------------+-----------------------------------------+
-| u32 | xattr index | An index into the xattr lookup table. |
-+------+-------------+-----------------------------------------+
-
-
-
-6) Directory Table
-******************
-
-For each directory inode, the directory table stores a linear list of all
-entries, with references back to the inodes that describe those entries.
-
-The entry list itself is sorted ASCIIbetically by entry name and split into
-multiple runs, each preceded by a short header.
-
-The directory inodes store the total, uncompressed size of the entire listing,
-including headers. Using this size, a SquashFS reader can determine if another
-header with further entries should be following once it reaches the end of a
-run.
-
-To save space, the header indicates a metadata block and a reference inode
-number. The entries that follow simply store a difference to that inode number
-and an offset into the specified metadata block.
-
-Every time, the inode block changes or the difference of the inode number
-to the reference in the header cannot be encoded in 16 bits anymore, a new
-header is emitted.
-
-A header must be followed by AT MOST 256 entries. If there are more entries,
-a new header MUST be emitted.
-
-Typically, inode allocation strategies would sort the children of a directory
-and then allocate inode numbers incrementally, to optimize directory entry
-listings.
-
-Since hard links might be further further away than +/- 32k of the reference
-number, they might require a new header to be emitted. Inode number allocation
-and picking of the reference could of course be optimized to prevent this.
-
-The directory header has the following structure:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | count | Number of entries following the header |
-+------+--------------+------------------------------------------------------+
-| u32 | start | The location of the metadata block in the inode table|
-| | | where the inodes are stored. This is relative to the |
-| | | inode table start from the super block. |
-+------+--------------+------------------------------------------------------+
-| s32 | inode number | An arbitrary inode number. The entries that follow |
-| | | store their inode number as a difference to this. |
-+======+==============+======================================================+
-
-The counter is stored off-by-one, i.e. a value of 0 indicates 1 entry follows.
-This also makes it impossible to encode a size of 0, which wouldn't make any
-sense. Empty directories simply have their size set to 0 in the inode instead,
-so no extra dummy header has to be stored or looked up.
-
-
-The header is followed by multiple entries that each have this structure:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u16 | offset | An offset into the uncompressed inode metadata block.|
-+------+--------------+------------------------------------------------------+
-| s16 | inode offset | The difference of this inodes number to the reference|
-| | | stored in the header. |
-+------+--------------+------------------------------------------------------+
-| u16 | type | The inode type. For extended inodes, the basic type |
-| | | is stored here instead. |
-+------+--------------+------------------------------------------------------+
-| u16 | name size | One less than the size of the entry name. |
-+------+--------------+------------------------------------------------------+
-| u8[] | name | The file name of the entry without a trailing null |
-| | | byte. Has 'name size' + 1 bytes. |
-+------+--------------+------------------------------------------------------+
-
-In the entry structure itself, the file names are stored without trailing null
-bytes. Since a zero length name makes no sense, the name length is stored
-off-by-one, i.e. the value 0 cannot be encoded.
-
-The inode type is stored in the entry, but always as the corresponding
-basic type.
-
-While the field is technically 16 bits, the kernel implementation currently
-imposes an arbitrary limit of 255 on the name size field. Since the field is
-off-by-one, this means that a file name in SquashFS can be at most 256
-characters long.
-
-
-6.1) Directory Index
-
-To speed up lookups on directories with lots of entries, the extended
-directory inode can store an index, holding the locations of all directory
-headers and the name of the first entry after the header.
-
-When searching for an entry, the reader can then iterate over the index to
-find a range of metadata blocks that should contain a given entry and then
-only scan over the given range.
-
-To allow for even faster lookups, a new header should be emitted every time
-the entry list crosses a metadata block boundary. This narrows the boundary
-down to a single metadata block lookup in most cases.
-
-The index entries have the following structure:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u32 | index | This stores a byte offset from the first directory |
-| | | header to the current header, as if the uncompressed |
-| | | directory metadata blocks were laid out in memory |
-| | | consecutively. |
-+------+--------------+------------------------------------------------------+
-| u32 | start | Start offset of a directory table metadata block, |
-| | | relative to the directory table start. |
-+------+--------------+------------------------------------------------------+
-| u32 | name size | One less than the size of the entry name. |
-+------+--------------+------------------------------------------------------+
-| u8[] | name | The name of the first entry following the header |
-| | | without a trailing null byte. |
-+------+--------------+------------------------------------------------------+
-
-
-7) Fragment Table
-*****************
-
-Tail-ends and smaller than block size files can be combined into fragment
-blocks that are at most 'block size' bytes long.
-
-The fragment table describes the location and size of the fragment blocks
-(not the tail-ends within them).
-
-
-This is a lookup table which stores entries of the following shape:
-
-+======+==============+======================================================+
-| Type | Name | Description |
-+======+==============+======================================================+
-| u64 | start | The offset within the archive where the fragment |
-| | | block starts. |
-+------+--------------+------------------------------------------------------+
-| u32 | size | The on-disk size of the fragment block. If the block |
-| | | is uncompressed, bit 24 (i.e. 1 << 24) is set. |
-+------+--------------+------------------------------------------------------+
-| u32 | unused | SHOULD be set to 0. |
-+------+--------------+------------------------------------------------------+
-
-
-The table is stored on-disk as described in section 2.3.
-
-The fragment table location in the superblock points to an array of 64 bit
-integers that store the on-disk locations of the metadata blocks containing
-the lookup table.
-
-Each metadata block can store up to 512 entries (= 8129 / 16).
-
-The "unused" field is there for alignment and SHOULD be set to 0, however the
-Linux kernel currently ignores this field completely, making it impossible for
-Linux to ever re-purpose this field.
-
-
-8) Export Table
-***************
-
-To support NFS exports, SquashFS needs a fast way to resolve an inode number
-to an inode structure.
-
-For this purpose, a SquashFS archive can optionally contain an export table,
-which is basically a flat array of 64 bit inode references, with the inode
-number being used as an index into the array.
-
-Because the inode number 0 is not used (reserved as a sentinel value), the
-array actually starts at inode number 1 and the index is thus
-inode_number - 1.
-
-The array itself is stored in a series of metadata blocks, as outlined in
-section 2.3.
-
-Since each block can store 1024 references (= 8192 / 8), there will be
-ceil(inode_count / 1024) metadata blocks for the entire array.
-
-
-9) ID Table
-***********
-
-As outlined in section 5.1, SquashFS supports 32 bit user and group IDs. To
-compact the inode table, the unique UID/GID values are collected in a lookup
-table and a 16 bit table index is stored in the inode instead.
-
-This lookup table is stored as outlined in section 2.3.
-
-Each metadata block can store up to 2048 IDs (=8192 / 4).
-
-
-10) Extended Attribute Table
-****************************
-
-Extended attributes are arbitrary key value pairs attached to inodes. The key
-names use dots as separators to create a hierarchy of name spaces.
-
-The key value pairs of all inodes are stored consecutively in a series of
-metadata blocks.
-
-The values can either be stored inline, i.e. a key entry is directly followed
-by a value, or out-of-line to deduplicate identical values and use a reference
-instead. Typically, the first occurrence of a value is stored in line and
-every consecutive use of the same value uses an out-of-line reference back to
-the first one.
-
-
-The keys are stored using the following data structure:
-
-+======+===========+=========================================================+
-| Type | Name | Description |
-+======+===========+=========================================================+
-| u16 | type | A prefix ID for the key name. If the value that follows |
-| | | is stored out-of-line, the flag 0x0100 is ORed to the |
-| | | type ID. |
-| | | |
-| | +-------+-------------------------------------------------+
-| | | Value | Comment |
-| | +-------+-------------------------------------------------+
-| | | 0 | Prefix the name with "user." |
-| | | 1 | Prefix the name with "trusted." |
-| | | 2 | Prefix the name with "security." |
-+------+-----------+-------+-------------------------------------------------+
-| u16 | name size | The number of key bytes that follows. |
-+------+-----------+---------------------------------------------------------+
-| u8[] | name | The remainder of the key without the prefix and without |
-| | | trailing null byte. |
-+------+-----------+---------------------------------------------------------+
-
-
-Following the key, this structure is used to store the value:
-
-+======+============+========================================================+
-| Type | Name | Description |
-+======+============+========================================================+
-| u32 | value size | The size of the value string. If the value is stored |
-| | | out of line, this is always 8, i.e. the size of an |
-| | | unsigned 64 bit integer. |
-+------+------------+--------------------------------------------------------+
-| u8[] | value | This is 'value size' bytes of arbitrary binary data. |
-| | | If the value is stored out-of-line, this is a 64 bit |
-| | | reference, i.e. a location of a metadata block, |
-| | | shifted left by 16 and OR-ed with an offset into the |
-| | | uncompressed block, giving the location of another |
-| | | value structure. |
-+------+------------+--------------------------------------------------------+
-
-
-The metadata block location given by an out-of-line reference is relative to
-the location of the first block.
-
-
-To actually address a block of key value pairs associated with an inode, a
-lookup table is used that specifies the start and size of a sequence of key
-value pairs.
-
-All an inode needs to store is a 32 bit index into this table. If two inodes
-have an identical attribute sets, the key/value sequence is only written once,
-there is only one lookup table entry and both inodes have the same index.
-
-Each lookup table entry has the following structure:
-
-+======+============+========================================================+
-| Type | Name | Description |
-+======+============+========================================================+
-| u64 | xattr ref | A reference to the start of the key value block, i.e. |
-| | | the metadata block location shifted left by 16, OR-ed |
-| | | with am offset into the uncompressed block. |
-+------+------------+--------------------------------------------------------+
-| u32 | count | The number of key value pairs. |
-+------+------------+--------------------------------------------------------+
-| u32 | size | The exact, uncompressed size in bytes of the entire |
-| | | block of key value pairs, counting what has been |
-| | | written to disk and including the key/value entry |
-| | | structures. |
-+------+------------+--------------------------------------------------------+
-
-This lookup table is stored as outlined in section 2.3.
-
-Each metadata block can hold 512 (= 8192 / 16) entries.
-
-However, in contrast to section 2.3, additional data is given before the list
-of metdata block locations, to locate the key-value pairs, as well as the
-actual number of lookup table entries that are not specified in the super
-block.
-
-
-The 'Xattr table' entry in the superblock gives the absolute location of the
-following data structure which is stored on-disk as is, uncompressed:
-
-+=======+===========+========================================================+
-| Type | Name | Description |
-+=======+===========+========================================================+
-| u64 | kv start | The absolute position of the first metadata block |
-| | | holding the key/value pairs. |
-+-------+-----------+--------------------------------------------------------+
-| u32 | count | The number of entries in the lookup table. |
-+-------+-----------+--------------------------------------------------------+
-| u32 | unused | SHOULD be set to 0, however Linux currently ignores |
-| | | this field completely and squashfs-tools used to leak |
-| | | stack data here, making it impossible for Linux to |
-| | | ever re-purpose this field. |
-+-------+-----------+--------------------------------------------------------+
-| u64[] | locations | An array holding the absolute on-disk location of each |
-| | | metadata block of the lookup table. |
-+-------+-----------+--------------------------------------------------------+
-
-If an inode has a a valid xattr index (i.e. not 0xFFFFFFFF), the metadata
-block index is computed as
-
- block_idx = floor(index / 512)
-
-which is then used to retrieve the metadata block index from the locations
-array.
-
-Once the block has been read from disk and uncompressed, the byte offset into
-the metadata block can be computed as
-
- offset = (index * 16) % 8192
-
-From this position, the structure can be read that holds a reference to the
-metadata block that contains the key/value pairs (and byte offset into the
-uncompressed block where the pairs start), as well as the number of key/value
-pairs and their total, uncompressed size.
-