author David Oberhollenzer <david.oberhollenzer@sigma-star.at> 2021-07-15 13:58:36 +0200
committer David Oberhollenzer <david.oberhollenzer@sigma-star.at> 2021-07-21 09:56:25 +0200
commit 5333fbe46bcbf70b4888bcc6655681f2cd0f161b
tree 33af95a299d8d7102537473825cdec62b7ff5b60 /doc
parent d2458bf40383d8e89772727c64ff83322bcb53d3
Add a separate architecture/structure writeup
Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at>
Diffstat (limited to 'doc')
-rw-r--r-- doc/architecture.md 168
1 files changed, 168 insertions, 0 deletions
diff --git a/doc/architecture.md b/doc/architecture.md
new file mode 100644
index 0000000..8acf641
--- /dev/null
+++ b/doc/architecture.md
@@ -0,0 +1,168 @@
+# Squashfs-tools-ng Software Architecture
+
+Generally speaking, the package tries to somewhat imitate a typical Unix
+filesystem structure.
+
+The source code for an executable program is located in `bin/<program-name>/`,
+while the source code for a library is located in `lib/<library-name>/`,
+without the typical `lib` prefix.
+
+Shared header files for the libraries are in `include/`. So far, a header
+sub-directory is only used for `libsquashfs`, since those headers are somewhat
+more numerous and are installed on the system in the same sub-directory.
+
+If a binary program comes with a man page, the man page is located at the same
+location as the program source (i.e. `bin/<program-name>/<program-name>.1`).
+
+Extra documentation (like this file) is located in the `doc` directory, and
+source code for example programs which are not installed is in `extras`.
+
+Unit tests for the libraries are in `tests/<library-name>` (here with the `lib`
+prefix), and tests for programs are in `tests/<program-name>`.
+
+## Library Stacking
+
+To achieve loose coupling, core functionality is implemented by libraries in a
+reasonably generic way, and may in turn make use of other libraries to implement
+their functionality.
+
+To the extent possible, the actual application programs are merely frontends
+for the underlying libraries.
+
+The following diagram tries to illustrate how the libraries are stacked:
+
+     _______________________________________
+    |                                       |
+    |         Application Programs          |
+    |_______________________________________|
+                ____________________________
+               |                            |
+               |         libcommon          |
+     __________|____________________________|
+    |          |              |             |
+    |  libtar  |  libfstree   |             |
+    |__________|_______       |   libsqfs   |
+    |                  |      |             |
+    |    libfstream    |      |             |
+    |__________________|______|_____________|
+    |                         |             |
+    |        libcompat        |   libutil   |
+    |_________________________|_____________|
+
+
+At the bottom, `libutil` contains common helper functions and container
+data structures (dynamic array, hash table, rb-tree, et cetera) used by
+both `libsqfs` and the application programs.
+
+The `libcompat` library contains fallback implementations for OS library
+functions that might not be available everywhere (e.g. POSIX stuff missing
+on Windows or GNU stuff missing on BSD).
+
+The `libfstream` library implements a stream based I/O abstraction, i.e. it has
+abstract data structures for non-seekable, read-only input streams and
+write-only output streams. It has concrete implementations wrapping the
+underlying OS functionality, as well as stream-compressor based implementations
+that wrap an existing interface instance.
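
The stream abstraction described above can be illustrated with a minimal sketch. All names here (`istream_t`, `mem_istream_t`, `upper_istream_t`, `read_wrapped`) are hypothetical and are not the actual `libfstream` API; the upper-casing wrapper merely stands in for a stream compressor that wraps an existing instance.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstract input stream: forward-only reads, no seeking. */
typedef struct istream {
	/* Returns the number of bytes read, 0 on end-of-stream. */
	size_t (*read)(struct istream *strm, void *buf, size_t size);
} istream_t;

/* Concrete implementation that reads from a memory buffer. */
typedef struct {
	istream_t base;
	const char *data;
	size_t size, pos;
} mem_istream_t;

static size_t mem_read(istream_t *strm, void *buf, size_t size)
{
	mem_istream_t *mem = (mem_istream_t *)strm;
	size_t avail = mem->size - mem->pos;

	if (size > avail)
		size = avail;
	memcpy(buf, mem->data + mem->pos, size);
	mem->pos += size;
	return size;
}

/* Wrapper that transforms the data of another stream instance.
 * It stands in for a decompressor in this sketch. */
typedef struct {
	istream_t base;
	istream_t *wrapped;
} upper_istream_t;

static size_t upper_read(istream_t *strm, void *buf, size_t size)
{
	upper_istream_t *up = (upper_istream_t *)strm;
	size_t i, n = up->wrapped->read(up->wrapped, buf, size);
	unsigned char *p = buf;

	for (i = 0; i < n; ++i)
		p[i] = (unsigned char)toupper(p[i]);
	return n;
}

/* Reads through the wrapper; the consumer only deals with istream_t. */
size_t read_wrapped(const char *text, char *out, size_t outsz)
{
	mem_istream_t mem = { { mem_read }, text, strlen(text), 0 };
	upper_istream_t up = { { upper_read }, &mem.base };

	return up.base.read(&up.base, out, outsz);
}
```

Code consuming the stream only ever calls `read` on an `istream_t` pointer, so it does not need to know whether the data comes straight from the source or passes through a wrapper first.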
+
+On top of `libfstream`, the `libtar` library is implemented. It supports
+simple reading & decoding of tar headers from an input stream, as well as
+generating and writing tar headers to an output stream, and supports various
+extensions (e.g. GNU, SCHILY). Thanks to the `libfstream` compressor wrappers,
+it supports transparent compression/decompression of tar headers and data.
+
+The `libfstree` library contains functionality related to managing and
+manipulating a hierarchical filesystem tree. It can directly parse the
+description format for `gensquashfs` or scan a directory.
+
+The `libsqfs` (actually `libsquashfs`) library implements the bulk of the
+squashfs reading/writing functionality. It is built as a shared library and
+is installed on the target system along with the application programs and a
+bunch of public headers.
+
+Finally, `libcommon` contains miscellaneous stuff shared between the
+application programs, such as common CLI handling code, some higher level
+utilities, and higher level wrappers around `libsqfs` for some tool specific
+tasks.
+
+### Licensing Implications
+
+The application programs and the static libraries are GPL licensed,
+while `libsquashfs` is licensed under the LGPL. Because the code
+of `libutil` is compiled into `libsquashfs`, it also needs to be under
+the LGPL and only contain 3rd party code under a compatible license.
+
+Furthermore, since the LZO compressor library is GPL licensed, `libsquashfs`
+cannot use it directly and thus does not support LZO compression. Instead,
+the `libcommon` library contains an implementation of the `libsquashfs`
+compressor interface that uses the LZO library, so the application
+programs *do support* LZO, but the library doesn't.
+
+
+### Managing Symbols and Visibility
+
+All symbols exported from `libsquashfs` must start with a `sqfs_` prefix.
+Likewise, all data structures and typedefs in the public header use this prefix
+and macros use an `SQFS_` prefix, in order to prevent namespace pollution.
+
+The `sqfs/predef.h` header contains a macro called `SQFS_API` for marking
+exported symbols. Whether the symbols are imported or exported depends on
+the presence of the `SQFS_BUILDING_DLL` macro.
+
+To mark symbols as explicitly not exported (required on some platforms), the
+macro `SQFS_INTERNAL` is used (e.g. on all `libutil` functions to keep
+them internal).
+
+An additional `SQFS_INLINE` macro is provided for inline functions declared
+in headers.
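
As a rough illustration of how such visibility macros are commonly written, here is a sketch using the hypothetical names `MYLIB_API`, `MYLIB_INTERNAL` and `MYLIB_BUILDING_DLL`. It is an assumption about the general pattern, not a copy of `sqfs/predef.h`, whose actual definitions may differ.

```c
#include <assert.h>

/* Normally set by the library's build system, never by users. */
#define MYLIB_BUILDING_DLL 1

#if defined(_WIN32) || defined(__CYGWIN__)
	#ifdef MYLIB_BUILDING_DLL
		/* Compiling the library itself: export the symbol. */
		#define MYLIB_API __declspec(dllexport)
	#else
		/* Compiling a program using the library: import it. */
		#define MYLIB_API __declspec(dllimport)
	#endif
	/* DLL symbols are not exported unless explicitly marked. */
	#define MYLIB_INTERNAL
#elif defined(__GNUC__)
	#define MYLIB_API __attribute__((visibility("default")))
	#define MYLIB_INTERNAL __attribute__((visibility("hidden")))
#else
	#define MYLIB_API
	#define MYLIB_INTERNAL
#endif

/* Helper that must not leak out of the shared library. */
MYLIB_INTERNAL int mylib_helper_add(int a, int b)
{
	return a + b;
}

/* Public entry point, carrying the library prefix. */
MYLIB_API int mylib_sum3(int a, int b, int c)
{
	return mylib_helper_add(mylib_helper_add(a, b), c);
}
```

With ELF toolchains, symbols are visible by default, which is why an explicit "hidden" marker is needed there for helpers that are compiled into the shared library.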
+
+The public headers of `libsquashfs` also must not include any headers of the
+other libraries, as they are not installed on the target system.
+
+However, somewhat contradictory to the diagram, a number of the libraries
+outlined above need declarations, typedefs and macros from `sqfs/predef.h`
+and simply include that header.
+
+
+## Object Oriented Design
+
+Anybody who has done C programming to a reasonable degree should be familiar
+with the technique of object oriented programming in C. An interface is
+basically a `struct` with function pointers where the first argument is a
+pointer to the instance (`this` pointer).
+
+Single inheritance basically means making the base struct the first member of
+the extended struct. A pointer to the extended object can be upcast to a
+pointer to the base struct and used as such.
+
+To the extent possible, concrete implementations are made completely opaque and
+only have a factory function to instantiate them, for looser coupling.
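
The three ideas above (interface struct, single inheritance, opaque implementation with a factory function) can be sketched as follows. The `counter_t` interface and `step_counter_create` are made-up names for illustration only and do not exist in the package.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interface: a struct of function pointers whose first
 * argument is the instance. */
typedef struct counter {
	int (*next)(struct counter *self);
	void (*destroy)(struct counter *self);
} counter_t;

/* Concrete implementation. In a real project this struct would live
 * in a .c file, so users only ever see counter_t. */
typedef struct {
	counter_t base;	/* must be the first member */
	int value;
	int step;
} step_counter_t;

static int step_next(counter_t *self)
{
	/* Cast back from the base to the extended type. */
	step_counter_t *sc = (step_counter_t *)self;

	sc->value += sc->step;
	return sc->value;
}

static void step_destroy(counter_t *self)
{
	free(self);
}

/* Factory function: the only way to obtain an instance. */
counter_t *step_counter_create(int start, int step)
{
	step_counter_t *sc = calloc(1, sizeof(*sc));

	if (sc == NULL)
		return NULL;

	sc->base.next = step_next;
	sc->base.destroy = step_destroy;
	sc->value = start;
	sc->step = step;
	return &sc->base; /* same address as sc, typed as the base */
}
```

A caller would write `counter_t *c = step_counter_create(10, 5);`, use `c->next(c)` and finally `c->destroy(c)`, without ever seeing the layout of `step_counter_t`.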
+
+The `sqfs/predef.h` header defines and typedefs a `sqfs_object_t` structure,
+which is at the bottom of the inheritance hierarchy.
+
+It contains two function pointers `destroy` and `copy`. The former destroys and
+frees the object itself, the latter creates an exact copy of the object.
+The `copy` callback may be `NULL`, if creating copies is not applicable for a
+particular type of object.
+
+For convenience, two inline helpers `sqfs_destroy` and `sqfs_copy` are provided
+that cast a `void` pointer into an object pointer and call the respective
+callbacks. The latter also checks if the callback is `NULL`.
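
A minimal sketch of such a base object and its helpers might look like the following. The names `object_t`, `obj_destroy`, `obj_copy` and `point_t` are hypothetical stand-ins, and the callbacks are named `destroy` and `copy` here; the real definitions in `sqfs/predef.h` may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base of the inheritance hierarchy. */
typedef struct object {
	void (*destroy)(struct object *obj);
	/* May be NULL if the type cannot be copied. */
	struct object *(*copy)(const struct object *obj);
} object_t;

/* Helpers taking void pointers, so callers need no casts. */
static inline void obj_destroy(void *obj)
{
	((object_t *)obj)->destroy(obj);
}

static inline void *obj_copy(const void *obj)
{
	const object_t *base = obj;

	return base->copy != NULL ? base->copy(base) : NULL;
}

/* A derived type with the base as its first member. */
typedef struct {
	object_t base;
	int x, y;
} point_t;

static void point_destroy(object_t *obj)
{
	free(obj);
}

static object_t *point_copy(const object_t *obj)
{
	point_t *dup = malloc(sizeof(*dup));

	if (dup != NULL)
		memcpy(dup, obj, sizeof(*dup));
	return (object_t *)dup;
}

point_t *point_create(int x, int y)
{
	point_t *pt = calloc(1, sizeof(*pt));

	if (pt == NULL)
		return NULL;

	pt->base.destroy = point_destroy;
	pt->base.copy = point_copy;
	pt->x = x;
	pt->y = y;
	return pt;
}
```

Because the helpers accept `void` pointers, `obj_copy(pt)` and `obj_destroy(pt)` work on any derived type, and `obj_copy` simply returns `NULL` for types that do not provide a `copy` callback.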
+
+
+## The libsquashfs malloc/free Issue
+
+While most code in `libsquashfs` works with objects that have a `destroy` hook,
+some functions return pointers to data blobs or dumb structures that have been
+allocated with `malloc` and expect the caller to free them again.
+
+This turned out to be a design issue, since the shared library could in theory
+end up being linked against a different C runtime than an application using it.
+On Unix like systems this would require rather freakish circumstances, but
+on Windows this actually happens fairly easily.
+
+As a result, a `sqfs_free` function was added to `libsquashfs` to expose access
+to the `free` function of the library's run-time. All new code
+using `libsquashfs` should use that function, but to maintain backward
+compatibility with existing code, the library has to continue using regular
+`malloc` at those places, so programs that currently work with a simple `free`
+also continue to work.
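
The idea behind a `sqfs_free` style wrapper can be sketched with a made-up library. The functions `mylib_get_name` and `mylib_free` are hypothetical and only illustrate the pattern: the library hands out a `malloc` allocated blob and also exports the matching release function, so both calls end up in the same C runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Library side: returns a blob allocated with the library's malloc. */
char *mylib_get_name(void)
{
	const char *name = "squashfs";
	char *copy = malloc(strlen(name) + 1);

	if (copy != NULL)
		strcpy(copy, name);
	return copy;
}

/* Library side: releases memory using the library's own C runtime,
 * whichever one it was linked against. */
void mylib_free(void *ptr)
{
	free(ptr);
}
```

An application would then call `mylib_free(name)` instead of `free(name)` on the returned pointer, which stays correct even if the application itself is linked against a different C runtime.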