From ca476afbebda985e777ebf5009c980aee923b4da Mon Sep 17 00:00:00 2001
From: David Oberhollenzer
Date: Thu, 15 Jul 2021 13:58:36 +0200
Subject: Add a separate architecture/structure writeup

Signed-off-by: David Oberhollenzer
---
 README.md           |  44 +-------------
 doc/architecture.md | 168 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+), 42 deletions(-)
 create mode 100644 doc/architecture.md

diff --git a/README.md b/README.md
index fcb69e2..c5066f2 100644
--- a/README.md
+++ b/README.md
@@ -111,48 +111,8 @@ required dependencies:
 
     ./mkwinbins.sh
 
-## Structure of the Source Code
-
-The main functionality of the package is split up into a number of libraries.
-The actual tools are mainly wrappers around the libraries that combine their
-functionality in a useful way.
-
-The headers of all the libraries can be found in the `include` directory,
-whereas the source code is in a per-library sub-directory within `lib`. The
-tools themselves are in sub-directories within `bin`.
-
-The `include` directory has a sub-directory `sqfs` which contains the public
-headers of `libsquashfs.so` which are installed along with the library. All
-other headers are private to this package.
-
-The following components exist:
- - `libfstree.a` built from files in `lib/fstree` contains functions for
-   manipulating a file system tree.
- - `libtar.a` built from files in `lib/tar` contains data structures and
-   functions for parsing and creating tar files.
- - `libsquashfs.so` built from files in `lib/sqfs` contains all kinds of
-   data structures for reading and writing SquashFS archives. Abstractions
-   for data compression and so on. It contains the actual brains of this
-   package.
- - `libcommon.a` built from files in `lib/common` contains a bunch
-   of commonly used code shared across the utilities.
- - `libcompat.a` built from files in `lib/compat` contains minimal
-   implementations of POSIX or GNU functions that are not available on some
-   platforms.
- - `libutil.a` contains common utilities that are used internally by both the
-   programs and `libsquashfs.so`.
-
-Optionally, `libsquashfs` can be compiled with builtin, custom versions of zlib
-and lz4. The configure options `--with-builtin-zlib` and `--with-builtin-lz4`
-can be used. The respective library sources are also in the `lib` directory.
-
-The `tests` sub-directory contains unit tests for the libraries.
-
-The `extras` sub-directory contains a few demo programs that use `libsquashfs`.
-
-To allow 3rd party applications to use `libsquashfs.so` without restricting
-their choice of license, the code in the `lib/sqfs` and `lib/util` directories
-is licensed under the LGPLv3, in contrast to the rest of this package.
+A high-level overview of the source code and
+architecture [can be found here](doc/architecture.md).
 
 ## A Note on LZO Support
 
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..8acf641
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,168 @@
+# Squashfs-tools-ng Software Architecture
+
+Generally speaking, the package tries to somewhat imitate a typical Unix
+filesystem structure.
+
+The source code for an executable program is located in `bin//`,
+while the source code for a library is located in `lib//`,
+without the typical `lib` prefix.
+
+Shared header files for the libraries are in `include/`. So far, a header
+sub-directory is only used for `libsquashfs`, since those headers are somewhat
+more numerous and are installed on the system in the same sub-directory.
+
+If a binary program comes with a man page, the man page is located at the same
+location as the program source (i.e. `bin//.1`).
+
+Extra documentation (like this file) is located in the `doc` directory, and
+source code for example programs which are not installed is in `extras`.
+
+Unit tests for the libraries are in `tests/`, with a `lib` prefix
+and tests for programs are in `tests/`.
+
+## Library Stacking
+
+To achieve loose coupling, core functionality is implemented by libraries in a
+reasonably generic way, and may in turn make use of other libraries to implement
+their functionality.
+
+To the extent possible, the actual application programs are merely frontends
+for the underlying libraries.
+
+The following diagram tries to illustrate how the libraries are stacked:
+
+     _______________________________________
+    |                                       |
+    |         Application Programs          |
+    |_______________________________________|
+                ____________________________
+               |                            |
+               |         libcommon          |
+     __________|____________________________|
+    |          |              |             |
+    |  libtar  |  libfstree   |             |
+    |__________|_______       |   libsqfs   |
+    |                  |      |             |
+    |    libfstream    |      |             |
+    |__________________|______|_____________|
+    |                         |             |
+    |        libcompat        |   libutil   |
+    |_________________________|_____________|
+
+
+At the bottom, `libutil` contains common helper functions and container
+data structures (dynamic array, hash table, rb-tree, et cetera) used by
+both `libsqfs` and the application programs.
+
+The `libcompat` library contains fallback implementations for OS library
+functions that might not be available everywhere (e.g. POSIX stuff missing
+on Windows or GNU stuff missing on BSD).
+
+The `libfstream` library implements a stream-based I/O abstraction, i.e. it has
+an abstract data structure for non-seekable read-only input streams and
+write-only output streams. It has concrete implementations wrapping the
+underlying OS functionality, as well as stream-compressor based implementations
+that wrap an existing interface instance.
+
+On top of `libfstream`, the `libtar` library is implemented. It supports
+simple reading & decoding of tar headers from an input stream, as well as
+generating and writing tar headers to an output stream and supports various
+extensions (e.g. GNU, SCHILY). Thanks to the `libfstream` compressor wrappers,
+it supports transparent compression/decompression of tar headers and data.
+
+The `libfstree` library contains functionality related to managing
+and manipulating a hierarchical filesystem tree. It can directly parse the
+description format for `gensquashfs` or scan a directory.
+
+The `libsqfs` (actually `libsquashfs`) library implements the bulk of the
+squashfs reading/writing functionality. It is built as a shared library and
+is installed on the target system along with the application programs and a
+bunch of public headers.
+
+Finally, `libcommon` contains miscellaneous stuff shared between the
+application programs, such as common CLI handling code, some higher level
+utilities, and higher level wrappers around `libsqfs` for some tool specific
+tasks.
+
+### Licensing Implications
+
+The application programs and the static libraries are GPL licensed,
+while `libsquashfs` is licensed under the LGPL. Because the code
+of `libutil` is compiled into `libsquashfs`, it also needs to be under
+the LGPL and only contain 3rd party code under a compatible license.
+
+Furthermore, since the LZO compressor library is GPL licensed, `libsquashfs`
+cannot use it directly and thus does not support LZO compression. Instead,
+the `libcommon` library contains an implementation of the `libsquashfs`
+compressor interface that uses the LZO library, so the application
+programs *do support* LZO, but the library doesn't.
+
+
+### Managing Symbols and Visibility
+
+All symbols exported from `libsquashfs` must start with a `sqfs_` prefix.
+Likewise, all data structures and typedefs in the public header use this prefix
+and macros use an `SQFS_` prefix, in order to prevent namespace pollution.
+
+The `sqfs/predef.h` header contains a macro called `SQFS_API` for marking
+exported symbols. Whether the symbols are imported or exported depends on
+the presence of the `SQFS_BUILDING_DLL` macro.
+
+To mark symbols as explicitly not exported (required on some platforms), the
+macro `SQFS_INTERNAL` is used (e.g. on all `libutil` functions to keep
+them internal).
+
+An additional `SQFS_INLINE` macro is provided for inline functions declared
+in headers.
+
+The public headers of `libsquashfs` also must not include any headers of the
+other libraries, as they are not installed on the target system.
+
+However, somewhat contradictory to the diagram, a number of the libraries
+outlined above need declarations, typedefs and macros from `sqfs/predef.h`
+and simply include that header.
+
+
+## Object Oriented Design
+
+Anybody who has done C programming to a reasonable degree should be familiar
+with the technique. An interface is basically a `struct` with function pointers
+whose first argument is a pointer to the instance (`this` pointer).
+
+Single inheritance basically means making the base struct the first member of
+the extended struct. A pointer to the extended object can be cast to a
+pointer to the base struct and used as such.
+
+To the extent possible, concrete implementations are made completely opaque and
+only have a factory function to instantiate them, for a more loose coupling.
+
+The `sqfs/predef.h` header defines and typedefs a `sqfs_object_t` structure,
+which is at the bottom of the inheritance hierarchy.
+
+It contains two function pointers `delete` and `copy`. The former destroys and
+frees the object itself, the latter creates an exact copy of the object.
+The `copy` callback may be `NULL`, if creating copies is not applicable for a
+particular type of object.
+
+For convenience, two inline helpers `sqfs_destroy` and `sqfs_copy` are provided
+that cast a `void` pointer into an object pointer and call the respective
+callbacks. The latter also checks if the callback is `NULL`.
+
+
+## The libsquashfs malloc/free Issue
+
+While most code in `libsquashfs` works with objects that have a `destroy` hook,
+some functions return pointers to data blobs or dumb structures that have been
+allocated with `malloc` and expect the caller to free them again.
+
+This turned out to be a design issue, since the shared library could in theory
+end up being linked against a different C runtime than an application using it.
+On Unix-like systems this would require rather freakish circumstances, but
+on Windows this actually happens fairly easily.
+
+As a result, a `sqfs_free` function was added to `libsquashfs` to expose access
+to the `free` function of the library's run-time. All new code
+using `libsquashfs` should use that function, but to maintain backward
+compatibility with existing code, the library has to continue using regular
+malloc at those places, so programs that currently work with a simple `free`
+also continue to work.
--
cgit v1.2.3
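
The object model from the "Object Oriented Design" section and the library-owned `free` from the "malloc/free Issue" section of the patch can be illustrated with a small, self-contained C sketch. Every name here (`demo_object_t`, `demo_counter_t`, `demo_destroy`, `demo_copy`, `demo_counter_snapshot`, `demo_free`) is hypothetical and chosen for illustration only. None of it is taken from the `libsquashfs` headers, and the real definitions in `sqfs/predef.h` may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base type: every object starts with these two hooks. */
typedef struct demo_object_t {
	void (*destroy)(struct demo_object_t *obj);
	struct demo_object_t *(*copy)(const struct demo_object_t *obj);
} demo_object_t;

/* Derived type: the base struct is the first member, so a pointer to a
 * demo_counter_t is also a valid pointer to its demo_object_t base. */
typedef struct {
	demo_object_t base;
	int value;
} demo_counter_t;

static void counter_destroy(demo_object_t *obj)
{
	free(obj);
}

static demo_object_t *counter_copy(const demo_object_t *obj)
{
	demo_counter_t *copy = malloc(sizeof(*copy));

	/* obj points at a complete demo_counter_t, so copy all of it. */
	if (copy != NULL)
		memcpy(copy, obj, sizeof(*copy));
	return (demo_object_t *)copy;
}

/* Factory function: callers only ever see the base pointer. */
demo_object_t *demo_counter_create(int value)
{
	demo_counter_t *counter = calloc(1, sizeof(*counter));

	if (counter == NULL)
		return NULL;
	counter->base.destroy = counter_destroy;
	counter->base.copy = counter_copy;
	counter->value = value;
	return (demo_object_t *)counter;
}

/* Generic helpers that accept a void pointer to any object. */
void demo_destroy(void *obj)
{
	demo_object_t *base = obj;

	assert(base != NULL);
	base->destroy(base);
}

void *demo_copy(const void *obj)
{
	const demo_object_t *base = obj;

	/* The copy hook is optional, so check it before calling. */
	return base->copy != NULL ? base->copy(base) : NULL;
}

/* Returns a plain malloc'ed blob instead of an object with hooks. */
int *demo_counter_snapshot(const demo_object_t *obj)
{
	int *blob = malloc(sizeof(*blob));

	if (blob != NULL)
		*blob = ((const demo_counter_t *)obj)->value;
	return blob;
}

/* Releases a blob with the allocator of the code that created it. */
void demo_free(void *ptr)
{
	free(ptr);
}
```

A caller would create an object with `demo_counter_create`, duplicate it with `demo_copy`, and release both with `demo_destroy`. A blob obtained from `demo_counter_snapshot` would go to `demo_free` rather than the caller's own `free`, so allocation and release always happen in the same C runtime.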