author | David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 2021-07-15 13:58:36 +0200 |
---|---|---|
committer | David Oberhollenzer <david.oberhollenzer@sigma-star.at> | 2021-07-21 10:00:03 +0200 |
commit | ca476afbebda985e777ebf5009c980aee923b4da (patch) | |
tree | f64aba8f9b4186b7cc5c264465e19c21d7b80ffd /doc | |
parent | 270fac93b9bdc2da16126817d48ce109ef9073bb (diff) |
Add a separate architecture/structure writeup
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/architecture.md | 168 |
1 files changed, 168 insertions, 0 deletions
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..8acf641
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,168 @@
+# Squashfs-tools-ng Software Architecture
+
+Generally speaking, the package tries to somewhat imitate a typical Unix
+filesystem structure.
+
+The source code for an executable program is located in `bin/<program-name>/`,
+while the source code for a library is located in `lib/<library-name>/`,
+without the typical `lib` prefix.
+
+Shared header files for the libraries are in `include/`. So far, a header
+sub-directory is only used for `libsquashfs`, since those headers are somewhat
+more numerous and are installed on the system in the same sub-directory.
+
+If a binary program comes with a man page, the man page is located at the same
+location as the program source (i.e. `bin/<program-name>/<program-name>.1`).
+
+Extra documentation (like this file) is located in the `doc` directory, and
+source code for example programs which are not installed is in `extras`.
+
+Unit tests for the libraries are in `tests/<library-name>`, with a `lib` prefix
+and tests for programs are in `tests/<program-name>`.
+
+## Library Stacking
+
+To achieve loose coupling, core functionality is implemented by libraries in a
+reasonably generic way, and may in turn make use of other libraries to implement
+their functionality.
+
+To the extent possible, the actual application programs are merely frontends
+for the underlying libraries.
+
+The following diagram tries to illustrate how the libraries are stacked:
+
+     _______________________________________
+    |                                       |
+    |         Application Programs          |
+    |_______________________________________|
+                ____________________________
+               |                            |
+               |         libcommon          |
+     __________|____________________________|
+    |          |              |             |
+    |  libtar  |  libfstree   |             |
+    |__________|_______       |   libsqfs   |
+    |                  |      |             |
+    |    libfstream    |      |             |
+    |__________________|______|_____________|
+    |                         |             |
+    |        libcompat        |   libutil   |
+    |_________________________|_____________|
+
+
+At the bottom, `libutil` contains common helper functions and container
+data structures (dynamic array, hash table, rb-tree, et cetera) used by
+both `libsqfs` and the application programs.
+
+The `libcompat` library contains fallback implementations for OS library
+functions that might not be available everywhere (e.g. POSIX stuff missing
+on Windows or GNU stuff missing on BSD).
+
+The `libfstream` library implements stream based I/O abstraction, i.e. it has
+an abstract data structure for non-seekable read-only input streams and
+write-only output streams. It has concrete implementations wrapping the
+underlying OS functionality, as well as stream-compressor based implementations
+that wrap an existing interface instance.
+
+On top of `libfstream`, the `libtar` library is implemented. It supports
+simple reading & decoding of tar headers from an input stream, as well as
+generating and writing tar headers to an output stream and supports various
+extensions (e.g. GNU, SCHILY). Thanks to the `libfstream` compressor wrappers,
+it supports transparent compression/decompression of tar headers and data.
+
+The `libfstree` library contains functionality related to managing
+and manipulating a hierarchical filesystem tree. It can directly parse the
+description format for `gensquashfs` or scan a directory.
+
+The `libsqfs` (actually `libsquashfs`) library implements the bulk of the
+squashfs reading/writing functionality. It is built as a shared library and
+is installed on the target system along with the application programs and a
+bunch of public headers.
+
+Finally, `libcommon` contains miscellaneous stuff shared between the
+application programs, such as common CLI handling code, some higher level
+utilities and higher level wrappers around `libsqfs` for some tool specific
+tasks.
+
+### Licensing Implications
+
+The application programs and the static libraries are GPL licensed,
+while `libsquashfs` is licensed under the LGPL. Because the code
+of `libutil` is compiled into `libsquashfs`, it also needs to be under
+the LGPL and only contain 3rd party code under a compatible license.
+
+Furthermore, since the LZO compressor library is GPL licensed, `libsquashfs`
+cannot use it directly and thus does not support LZO compression. Instead,
+the `libcommon` library contains an implementation of the `libsquashfs`
+compressor interface that uses the LZO library, so the application
+programs *do support* LZO, but the library doesn't.
+
+
+### Managing Symbols and Visibility
+
+All symbols exported from `libsquashfs` must start with a `sqfs_` prefix.
+Likewise, all data structures and typedefs in the public header use this prefix
+and macros use an `SQFS_` prefix, in order to prevent namespace pollution.
+
+The `sqfs/predef.h` header contains a macro called `SQFS_API` for marking
+exported symbols. Whether the symbols are imported or exported depends on
+the presence of the `SQFS_BUILDING_DLL` macro.
+
+To mark symbols as explicitly not exported (required on some platforms), the
+macro `SQFS_INTERNAL` is used (e.g. on all `libutil` functions to keep
+them internal).
+
+An additional `SQFS_INLINE` macro is provided for inline functions declared
+in headers.
+
+The public headers of `libsquashfs` also must not include any headers of the
+other libraries, as they are not installed on the target system.
+
+However, somewhat contradictory to the diagram, a number of the libraries
+outlined above need declarations, typedefs and macros from `sqfs/predef.h`
+and simply include that header.
+
+
+## Object Oriented Design
+
+Anybody who has done C programming to a reasonable degree should be familiar
+with the technique. An interface is basically a `struct` with function pointers
+where the first argument is a pointer to the instance (`this` pointer).
+
+Single inheritance basically means making the base struct the first member of
+the extended struct. A pointer to the extended object can be cast to a
+pointer to the base struct and used as such.
+
+To the extent possible, concrete implementations are made completely opaque and
+only have a factory function to instantiate them, for a more loose coupling.
+
+The `sqfs/predef.h` header defines and typedefs a `sqfs_object_t` structure,
+which is at the bottom of the inheritance hierarchy.
+
+It contains two function pointers `destroy` and `copy`. The former destroys and
+frees the object itself, the latter creates an exact copy of the object.
+The `copy` callback may be `NULL`, if creating copies is not applicable for a
+particular type of object.
+
+For convenience, two inline helpers `sqfs_destroy` and `sqfs_copy` are provided
+that cast a `void` pointer into an object pointer and call the respective
+callbacks. The latter also checks if the callback is `NULL`.
+
+
+## The libsquashfs malloc/free Issue
+
+While most code in `libsquashfs` works with objects that have a `destroy` hook,
+some functions return pointers to data blobs or dumb structures that have been
+allocated with `malloc` and expect the caller to free them again.
+
+This turned out to be a design issue, since the shared library could in theory
+end up being linked against a different C runtime than an application using it.
+On Unix-like systems this would require rather freakish circumstances, but
+on Windows this actually happens fairly easily.
+
+As a result, a `sqfs_free` function was added to `libsquashfs` to expose access
+to the `free` function of the library's run-time. All new code
+using `libsquashfs` should use that function, but to maintain backward
+compatibility with existing code, the library has to continue using regular
+malloc at those places, so programs that currently work with a simple `free`
+also continue to work.
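
The convention described under "Object Oriented Design" in the added file can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the actual definitions from `sqfs/predef.h`: the names `object_t`, `object_destroy`, `object_copy`, `counter_t` and `counter_create` are hypothetical and only mirror the described pattern (a base struct with `destroy` and `copy` callbacks, a derived struct that embeds the base as its first member, and a factory function).

```c
#include <stdlib.h>

/* Hypothetical base type, modelled on the description of sqfs_object_t:
 * two callbacks that receive the instance as their first argument. */
typedef struct object_t {
	void (*destroy)(struct object_t *obj);
	struct object_t *(*copy)(const struct object_t *obj);
} object_t;

/* Hypothetical helpers in the spirit of sqfs_destroy / sqfs_copy: they
 * accept a void pointer and dispatch through the base struct. */
static void object_destroy(void *obj)
{
	((object_t *)obj)->destroy(obj);
}

static void *object_copy(const void *obj)
{
	const object_t *base = obj;

	/* The copy callback is optional, so check it before calling. */
	return base->copy != NULL ? base->copy(base) : NULL;
}

/* "Derived" type: the base struct is the first member, so a pointer to
 * a counter_t is also a valid pointer to its object_t. */
typedef struct {
	object_t base;
	int value;
} counter_t;

static void counter_destroy(object_t *obj)
{
	free(obj);
}

static object_t *counter_copy(const object_t *obj)
{
	counter_t *copy = malloc(sizeof(*copy));

	if (copy != NULL)
		*copy = *(const counter_t *)obj;

	return (object_t *)copy;
}

/* Factory function: the only way callers obtain an instance. */
static counter_t *counter_create(int value)
{
	counter_t *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return NULL;

	c->base.destroy = counter_destroy;
	c->base.copy = counter_copy;
	c->value = value;
	return c;
}
```

A caller would obtain an instance with `counter_create(42)`, duplicate it with `object_copy`, and release both through `object_destroy`, without ever calling `free` on the objects directly. In a real library the `counter_t` layout would normally live in a private source file, so that only the factory function and the base type appear in the public header.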