Squashfs-tools-ng Software Architecture

Generally speaking, the package tries to somewhat imitate a typical Unix filesystem structure.

The source code for an executable program is located in bin/<program-name>/, while the source code for a library is located in lib/<library-name>/, without the typical lib prefix.

Shared header files for the libraries are in include/. So far, a header sub-directory is only used for libsquashfs, since those headers are somewhat more numerous and are installed on the system in the same sub-directory.

If a binary program comes with a man page, the man page is located at the same location as the program source (i.e. bin/<program-name>/<program-name>.1).

Extra documentation (like this file) is located in the doc directory, and source code for example programs which are not installed is in extras.

Unit tests for the libraries are in tests/<library-name>, this time with the lib prefix, and tests for programs are in tests/<program-name>.

Library Stacking

To achieve loose coupling, core functionality is implemented by libraries in a reasonably generic way, and those libraries may in turn make use of other libraries to implement their functionality.

To the extent possible, the actual application programs are merely frontends for the underlying libraries.

The following diagram tries to illustrate how the libraries are stacked:

 _______________________________________
|                                       |
|         Application Programs          |
|_______________________________________|
            ____________________________
           |                            |
           |         libcommon          |
 __________|____________________________|
|          |              |             |
|  libtar  |  libfstree   |             |
|__________|_______       |   libsqfs   |
|                  |      |             |
|    libfstream    |      |             |
|__________________|______|_____________|
|                         |             |
|        libcompat        |   libutil   |
|_________________________|_____________|

At the bottom, libutil contains common helper functions and container data structures (dynamic array, hash table, rb-tree, et cetera) used by both libsqfs and the application programs.

The libcompat library contains fallback implementations for OS library functions that might not be available everywhere (e.g. POSIX stuff missing on Windows or GNU stuff missing on BSD).

The libfstream library implements a stream-based I/O abstraction, i.e. it has abstract data structures for non-seekable, read-only input streams and write-only output streams. It has concrete implementations wrapping the underlying OS functionality, as well as stream-compressor based implementations that wrap an existing interface instance.
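As a rough illustration of the idea, the following sketch shows an abstract non-seekable input stream, a concrete implementation backed by a memory buffer, and a wrapper implementation that sits on top of an existing stream instance. All names (demo_istream_t and friends) are hypothetical and are not the libfstream API. The wrapper merely upper-cases the data where a real one would decompress it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstract input stream: read-only, no seeking. */
typedef struct demo_istream_t {
    /* Read up to size bytes; returns the byte count, 0 on end-of-file. */
    size_t (*read)(struct demo_istream_t *strm, void *buf, size_t size);
} demo_istream_t;

/* Concrete implementation that reads from a memory buffer. */
typedef struct {
    demo_istream_t base; /* must be the first member */
    const char *data;
    size_t size;
    size_t offset;
} demo_mem_istream_t;

static size_t mem_read(demo_istream_t *strm, void *buf, size_t size)
{
    demo_mem_istream_t *mem = (demo_mem_istream_t *)strm;
    size_t avail = mem->size - mem->offset;

    if (size > avail)
        size = avail;

    memcpy(buf, mem->data + mem->offset, size);
    mem->offset += size;
    return size;
}

void demo_mem_istream_init(demo_mem_istream_t *mem,
                           const char *data, size_t size)
{
    assert(data != NULL);
    mem->base.read = mem_read;
    mem->data = data;
    mem->size = size;
    mem->offset = 0;
}

/* Wrapper implementation: transforms what another stream delivers. */
typedef struct {
    demo_istream_t base;
    demo_istream_t *wrapped;
} demo_upper_istream_t;

static size_t upper_read(demo_istream_t *strm, void *buf, size_t size)
{
    demo_upper_istream_t *up = (demo_upper_istream_t *)strm;
    size_t i, count = up->wrapped->read(up->wrapped, buf, size);
    unsigned char *ptr = buf;

    for (i = 0; i < count; ++i)
        ptr[i] = (unsigned char)toupper(ptr[i]);

    return count;
}

void demo_upper_istream_init(demo_upper_istream_t *up,
                             demo_istream_t *wrapped)
{
    assert(wrapped != NULL);
    up->base.read = upper_read;
    up->wrapped = wrapped;
}
```

Code that consumes a demo_istream_t pointer cannot tell whether it is reading from the buffer directly or through the wrapper, which is what allows a tar reader to handle compressed and uncompressed input with the same code path.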

The libtar library is implemented on top of libfstream. It supports simple reading and decoding of tar headers from an input stream, as well as generating and writing tar headers to an output stream, including various extensions (e.g. GNU, SCHILY). Thanks to the libfstream compressor wrappers, it supports transparent compression/decompression of tar headers and data.

The libfstree library contains functionality related to managing and manipulating a hierarchical filesystem tree. It can directly parse the description format for gensquashfs or scan a directory.

The libsqfs (actually libsquashfs) library implements the bulk of the squashfs reading/writing functionality. It is built as a shared library and is installed on the target system along with the application programs and a bunch of public headers.

Finally, libcommon contains miscellaneous code shared between the application programs, such as common CLI handling code, some higher level utilities, and higher level wrappers around libsqfs for some tool-specific tasks.

Licensing Implications

The application programs and the static libraries are GPL licensed, while libsquashfs is licensed under the LGPL. Because the code of libutil is compiled into libsquashfs, it also needs to be under the LGPL and can only contain 3rd party code under a compatible license.

Furthermore, since the LZO compressor library is GPL licensed, libsquashfs cannot use it directly and thus does not support LZO compression. Instead, the libcommon library contains an implementation of the libsquashfs compressor interface that uses the LZO library, so the application programs do support LZO, but the library doesn't.

Managing Symbols and Visibility

All symbols exported from libsquashfs must start with a sqfs_ prefix. Likewise, all data structures and typedefs in the public header use this prefix and macros use an SQFS_ prefix, in order to prevent namespace pollution.

The sqfs/predef.h header contains a macro called SQFS_API for marking exported symbols. Whether the symbols are imported or exported depends on the presence of the SQFS_BUILDING_DLL macro.

To mark symbols as explicitly not exported (required on some platforms), the macro SQFS_INTERNAL is used (e.g. on all libutil functions to keep them internal).

An additional SQFS_INLINE macro is provided for inline functions declared in headers.
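The following sketch shows the commonly used pattern behind such macros. It is not a copy of sqfs/predef.h: the DEMO_ names are hypothetical, and the exact conditions and attributes used by the real header may differ.

```c
#include <assert.h>

/*
 * Normally passed by the build system while compiling the library itself
 * (e.g. -DDEMO_BUILDING_DLL); defined here so the sketch is self-contained.
 */
#define DEMO_BUILDING_DLL

#if defined(_WIN32) || defined(__CYGWIN__)
    /* Windows: export while building the DLL, import when using it. */
    #if defined(DEMO_BUILDING_DLL)
        #define DEMO_API __declspec(dllexport)
    #else
        #define DEMO_API __declspec(dllimport)
    #endif
    #define DEMO_INTERNAL
#elif defined(__GNUC__) || defined(__clang__)
    /* ELF platforms: symbols must be hidden explicitly. */
    #define DEMO_API __attribute__((visibility("default")))
    #define DEMO_INTERNAL __attribute__((visibility("hidden")))
#else
    #define DEMO_API
    #define DEMO_INTERNAL
#endif

#define DEMO_INLINE static inline

/* Exported: part of the public interface, carries the library prefix. */
DEMO_API int demo_add(int a, int b);

/* Not exported: helper shared between the library's own source files. */
DEMO_INTERNAL int internal_clamp(int value, int lo, int hi);

/* Inline helper as it would appear in a public header. */
DEMO_INLINE int demo_is_even(int value)
{
    return (value % 2) == 0;
}

int internal_clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    return value < lo ? lo : (value > hi ? hi : value);
}

int demo_add(int a, int b)
{
    return internal_clamp(a + b, -1000, 1000);
}
```

With this arrangement, only demo_add would show up in the dynamic symbol table of the resulting shared library, while internal_clamp stays usable across the library's translation units without leaking into the namespace of programs linking against it.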

The public headers of libsquashfs also must not include any headers of the other libraries, as they are not installed on the target system.

However, somewhat contradictory to the diagram, a number of the libraries outlined above need declarations, typedefs and macros from sqfs/predef.h and simply include that header.

Object Oriented Design

Anybody who has done C programming to a reasonable degree should be familiar with the technique. An interface is basically a struct with function pointers where the first argument is a pointer to the instance (this pointer).

Single inheritance basically means making the base struct the first member of the extended struct. A pointer to the extended object can be cast to a pointer to the base struct and used as such.

To the extent possible, concrete implementations are made completely opaque and only have a factory function to instantiate them, for looser coupling.
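For readers less familiar with the technique, here is a minimal sketch. The names (demo_shape_t, demo_rect_create) are made up for illustration and do not exist in the package. It shows an interface struct, an implementation that embeds the base struct as its first member, and a factory function as the only entry point.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface: function pointers that take the instance as first argument. */
typedef struct demo_shape_t {
    int (*area)(const struct demo_shape_t *shape);
    void (*destroy)(struct demo_shape_t *shape);
} demo_shape_t;

/*
 * Concrete implementation. In a real library this struct would only be
 * defined in the .c file, so users never see its layout (opaque type).
 */
typedef struct {
    demo_shape_t base; /* base struct is the first member */
    int width;
    int height;
} rect_t;

static int rect_area(const demo_shape_t *shape)
{
    /* Valid because base is the first member of rect_t. */
    const rect_t *rect = (const rect_t *)shape;

    return rect->width * rect->height;
}

static void rect_destroy(demo_shape_t *shape)
{
    free(shape);
}

/* Factory function: the only way for a user to obtain a rectangle. */
demo_shape_t *demo_rect_create(int width, int height)
{
    rect_t *rect;

    assert(width >= 0 && height >= 0);

    rect = calloc(1, sizeof(*rect));
    if (rect == NULL)
        return NULL;

    rect->base.area = rect_area;
    rect->base.destroy = rect_destroy;
    rect->width = width;
    rect->height = height;
    return (demo_shape_t *)rect;
}
```

A caller only ever works with the demo_shape_t pointer, e.g. shape->area(shape) followed by shape->destroy(shape), so a different implementation can be substituted without touching the calling code.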

The sqfs/predef.h header defines and typedefs a sqfs_object_t structure, which is at the bottom of the inheritance hierarchy.

It contains two function pointers, destroy and copy. The former destroys and frees the object itself, the latter creates an exact copy of the object. The copy callback may be NULL if creating copies is not applicable for a particular type of object.

For convenience, two inline helpers, sqfs_destroy and sqfs_copy, are provided that cast a void pointer into an object pointer and call the respective callbacks. The latter also checks if the callback is NULL.
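The arrangement described above can be sketched roughly as follows. This is an illustration with hypothetical demo_ names, not the actual contents of sqfs/predef.h, and the real callback signatures may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base object at the bottom of the hierarchy. */
typedef struct demo_object_t {
    void (*destroy)(struct demo_object_t *obj);

    /* May be NULL if copying makes no sense for a given type. */
    struct demo_object_t *(*copy)(const struct demo_object_t *obj);
} demo_object_t;

/* Helper: accepts any derived object through a void pointer. */
static inline void demo_destroy(void *obj)
{
    ((demo_object_t *)obj)->destroy((demo_object_t *)obj);
}

/* Helper: returns NULL if the type does not support copying. */
static inline void *demo_copy(const void *obj)
{
    const demo_object_t *base = obj;

    return base->copy != NULL ? base->copy(base) : NULL;
}

/* A derived type, only to demonstrate the helpers. */
typedef struct {
    demo_object_t base; /* must be the first member */
    int value;
} demo_counter_t;

static void counter_destroy(demo_object_t *obj)
{
    free(obj);
}

static demo_object_t *counter_copy(const demo_object_t *obj)
{
    demo_counter_t *copy;

    assert(obj != NULL);

    copy = malloc(sizeof(*copy));
    if (copy != NULL)
        memcpy(copy, obj, sizeof(*copy));

    return (demo_object_t *)copy;
}

demo_counter_t *demo_counter_create(int value)
{
    demo_counter_t *cnt = calloc(1, sizeof(*cnt));

    if (cnt == NULL)
        return NULL;

    cnt->base.destroy = counter_destroy;
    cnt->base.copy = counter_copy;
    cnt->value = value;
    return cnt;
}
```

Because the helpers take void pointers, calling code can write demo_destroy(cnt) on a demo_counter_t pointer without an explicit cast to the base type.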

The libsquashfs malloc/free Issue

While most code in libsquashfs works with objects that have a destroy hook, some functions return pointers to data blobs or dumb structures that have been allocated with malloc and expect the caller to free them again.

This turned out to be a design issue, since the shared library could in theory end up being linked against a different C runtime than the application using it. On Unix-like systems this would require rather freakish circumstances, but on Windows this actually happens fairly easily.

As a result, a sqfs_free function was added to libsquashfs to expose access to the free function of the library's run-time. All new code using libsquashfs should use that function, but to maintain backward compatibility with existing code, the library has to continue using regular malloc at those places, so programs that currently work with a simple free continue to work in the future.
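The pattern can be sketched as follows. Both demo_lib_ functions are hypothetical stand-ins for a library function that returns a malloc'ed blob and for the matching free wrapper. In a real setup they would live inside the shared library, while demo_app_name_length would be application code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Library side: returns a blob allocated with the library's malloc. */
char *demo_lib_dup_string(const char *str)
{
    size_t size;
    char *copy;

    assert(str != NULL);

    size = strlen(str) + 1;
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, str, size);

    return copy;
}

/* Library side: releases memory using the same run-time that allocated it. */
void demo_lib_free(void *ptr)
{
    free(ptr);
}

/* Application side: hands the blob back instead of calling free directly. */
int demo_app_name_length(const char *name)
{
    char *copy = demo_lib_dup_string(name);
    int length;

    if (copy == NULL)
        return -1;

    length = (int)strlen(copy);
    demo_lib_free(copy); /* not free(copy) */
    return length;
}
```

When both sides share a single C run-time, the two calls are equivalent. The wrapper only makes a difference when they do not, which is exactly the case where a plain free would corrupt the heap.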