# Squashfs-tools-ng Software Architecture

Generally speaking, the package tries to somewhat imitate a typical Unix
filesystem structure.

The source code for an executable program is located in `bin/<program-name>/`,
while the source code for a library is located in `lib/<library-name>/`,
without the typical `lib` prefix.

Shared header files for the libraries are in `include/`. So far, a header
sub-directory is only used for `libsquashfs`, since those headers are somewhat
more numerous and are installed on the system in the same sub-directory.

If a binary program comes with a man page, the man page is located at the same
location as the program source (i.e. `bin/<program-name>/<program-name>.1`).

Extra documentation (like this file) is located in the `doc` directory, and
source code for example programs which are not installed is in `extras`.

Unit tests for the libraries are in `tests/<library-name>`, with a `lib` prefix,
and tests for programs are in `tests/<program-name>`.

## Library Stacking

To achieve loose coupling, core functionality is implemented by libraries in a
reasonably generic way, and may in turn make use of other libraries to implement
their functionality.

To the extent possible, the actual application programs are merely frontends
for the underlying libraries.

The following diagram tries to illustrate how the libraries are stacked:

     _______________________________________
    |                                       |
    |         Application Programs          |
    |_______________________________________|
                ____________________________
               |                            |
               |         libcommon          |
     __________|____________________________|
    |          |              |             |
    |  libtar  |  libfstree   |             |
    |__________|_______       |   libsqfs   |
    |                  |      |             |
    |    libfstream    |      |             |
    |__________________|______|_____________|
    |                         |             |
    |        libcompat        |   libutil   |
    |_________________________|_____________|


At the bottom, `libutil` contains common helper functions and container
data structures (dynamic array, hash table, rb-tree, et cetera) used by
both `libsqfs` and the application programs.

The `libcompat` library contains fallback implementations for OS library
functions that might not be available everywhere (e.g. POSIX stuff missing
on Windows or GNU stuff missing on BSD).

The `libfstream` library implements a stream based I/O abstraction, i.e. it has
abstract data structures for non-seekable, read-only input streams and
write-only output streams. It has concrete implementations wrapping the
underlying OS functionality, as well as stream-compressor based implementations
that wrap an existing interface instance.
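
The wrapping idea can be illustrated with a minimal sketch. All names in it
(`demo_istream_t`, `demo_mem_stream_create`, `demo_upper_stream_create`) are
hypothetical and are not the `libfstream` API; the upper-casing filter merely
stands in for a decompressor and the memory buffer for an OS file handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical abstract, non-seekable, read-only input stream. */
typedef struct demo_istream_t {
	/* Reads up to size bytes, returns the number of bytes read (0 on EOF). */
	size_t (*read)(struct demo_istream_t *strm, void *buf, size_t size);
	void (*destroy)(struct demo_istream_t *strm);
} demo_istream_t;

/* Concrete implementation reading from a caller-owned memory buffer. */
typedef struct {
	demo_istream_t base;	/* must be the first member */
	const char *data;
	size_t remaining;
} mem_stream_t;

static size_t mem_read(demo_istream_t *strm, void *buf, size_t size)
{
	mem_stream_t *mem = (mem_stream_t *)strm;

	assert(buf != NULL);
	if (size > mem->remaining)
		size = mem->remaining;

	memcpy(buf, mem->data, size);
	mem->data += size;
	mem->remaining -= size;
	return size;
}

static void mem_destroy(demo_istream_t *strm)
{
	free(strm);
}

demo_istream_t *demo_mem_stream_create(const char *data, size_t size)
{
	mem_stream_t *mem = calloc(1, sizeof(*mem));

	if (mem == NULL)
		return NULL;

	mem->base.read = mem_read;
	mem->base.destroy = mem_destroy;
	mem->data = data;
	mem->remaining = size;
	return &mem->base;
}

/* Filter implementation that wraps an existing stream instance. */
typedef struct {
	demo_istream_t base;
	demo_istream_t *wrapped;
} upper_stream_t;

static size_t upper_read(demo_istream_t *strm, void *buf, size_t size)
{
	upper_stream_t *up = (upper_stream_t *)strm;
	size_t i, count = up->wrapped->read(up->wrapped, buf, size);
	unsigned char *ptr = buf;

	/* Transform the data on its way through, like a decompressor would. */
	for (i = 0; i < count; ++i)
		ptr[i] = (unsigned char)toupper(ptr[i]);

	return count;
}

static void upper_destroy(demo_istream_t *strm)
{
	upper_stream_t *up = (upper_stream_t *)strm;

	up->wrapped->destroy(up->wrapped);	/* the wrapper owns the inner stream */
	free(up);
}

demo_istream_t *demo_upper_stream_create(demo_istream_t *wrapped)
{
	upper_stream_t *up;

	if (wrapped == NULL)
		return NULL;

	up = calloc(1, sizeof(*up));
	if (up == NULL)
		return NULL;

	up->base.read = upper_read;
	up->base.destroy = upper_destroy;
	up->wrapped = wrapped;
	return &up->base;
}
```

Code that only sees a `demo_istream_t` pointer cannot tell whether it reads
from the plain source or through the filter, which is what makes this kind of
wrapping transparent to the consumer.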

On top of `libfstream`, the `libtar` library is implemented. It supports
simple reading and decoding of tar headers from an input stream, as well as
generating and writing tar headers to an output stream, including various
extensions (e.g. GNU, SCHILY). Thanks to the `libfstream` compressor wrappers,
it supports transparent compression/decompression of tar headers and data.

The `libfstree` library contains functionality related to managing
and manipulating a hierarchical filesystem tree. It can directly parse the
description format for `gensquashfs` or scan a directory.

The `libsqfs` (actually `libsquashfs`) library implements the bulk of the
squashfs reading/writing functionality. It is built as a shared library and
is installed on the target system along with the application programs and a
bunch of public headers.

Finally, `libcommon` contains miscellaneous code shared between the
application programs, such as common CLI handling code, some higher level
utilities, and higher level wrappers around `libsqfs` for some tool specific
tasks.

### Licensing Implications

The application programs and the static libraries are GPL licensed,
while `libsquashfs` is licensed under the LGPL. Because the code
of `libutil` is compiled into `libsquashfs`, it also needs to be under
the LGPL and only contain 3rd party code under a compatible license.

Furthermore, since the LZO compressor library is GPL licensed, `libsquashfs`
cannot use it directly and thus does not support LZO compression. Instead,
the `libcommon` library contains an implementation of the `libsquashfs`
compressor interface that uses the LZO library, so the application
programs *do support* LZO, but the library doesn't.


### Managing Symbols and Visibility

All symbols exported from `libsquashfs` must start with a `sqfs_` prefix.
Likewise, all data structures and typedefs in the public header use this prefix
and macros use an `SQFS_` prefix, in order to prevent namespace pollution.

The `sqfs/predef.h` header contains a macro called `SQFS_API` for marking
exported symbols. Whether the symbols are imported or exported depends on
the presence of the `SQFS_BUILDING_DLL` macro.

To mark symbols as explicitly not exported (required on some platforms), the
macro `SQFS_INTERNAL` is used (e.g. on all `libutil` functions to keep
them internal).

An additional `SQFS_INLINE` macro is provided for inline functions declared
in headers.
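
As a rough illustration, such macros are commonly defined along the following
lines. This is a sketch based on general compiler conventions, not a copy of
`sqfs/predef.h`; the `DEMO_` and `demo_` names are hypothetical stand-ins and
the actual definitions in the library may differ.

```c
#include <assert.h>

/* Normally passed by the build system while compiling the library itself. */
#define DEMO_BUILDING_DLL

/* Hypothetical macros modelled on common practice. */
#if defined(_WIN32) || defined(__CYGWIN__)
	#ifdef DEMO_BUILDING_DLL
		#define DEMO_API __declspec(dllexport)	/* building the library */
	#else
		#define DEMO_API __declspec(dllimport)	/* using the library */
	#endif
	#define DEMO_INTERNAL
#elif defined(__GNUC__)
	#define DEMO_API __attribute__((visibility("default")))
	#define DEMO_INTERNAL __attribute__((visibility("hidden")))
#else
	#define DEMO_API
	#define DEMO_INTERNAL
#endif

#define DEMO_INLINE static inline

/* Exported: part of the public interface, carries the library prefix. */
DEMO_API int demo_add(int a, int b);

/* Hidden: shared between the library's source files, but not exported. */
DEMO_INTERNAL int helper_clamp(int value, int lo, int hi);

/* Small helper that is defined directly in a header. */
DEMO_INLINE int demo_is_even(int value)
{
	return (value % 2) == 0;
}

int helper_clamp(int value, int lo, int hi)
{
	assert(lo <= hi);
	return value < lo ? lo : (value > hi ? hi : value);
}

int demo_add(int a, int b)
{
	/* The exported function is free to use the hidden helper. */
	return helper_clamp(a + b, -1000, 1000);
}
```

An application including the same header without `DEMO_BUILDING_DLL` defined
would see `demo_add` marked as imported on Windows, while `helper_clamp` would
not be reachable from outside the shared library on platforms that support
hidden visibility.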

The public headers of `libsquashfs` also must not include any headers of the
other libraries, as they are not installed on the target system.

However, somewhat contradictory to the diagram, a number of the libraries
outlined above need declarations, typedefs and macros from `sqfs/predef.h`
and simply include that header.


## Object Oriented Design

Anybody who has done C programming to a reasonable degree should be familiar
with the technique of emulating object oriented design in C. An interface is
basically a `struct` with function pointers where the first argument is a
pointer to the instance (`this` pointer).

Single inheritance basically means making the base struct the first member of
the extended struct. A pointer to the extended object can be cast to a
pointer to the base struct and used as such.

To the extent possible, concrete implementations are made completely opaque and
only have a factory function to instantiate them, for a more loose coupling.
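
The following minimal sketch shows the three ideas together: an interface
struct, an extended interface that embeds it as its first member, and an opaque
implementation behind a factory function. The counter types and function names
are made up for illustration and do not exist in the code base.

```c
#include <assert.h>
#include <stdlib.h>

/* Interface: function pointers that take the instance as first argument. */
typedef struct demo_counter_t {
	int (*next)(struct demo_counter_t *self);
	void (*destroy)(struct demo_counter_t *self);
} demo_counter_t;

/* Extended interface: the base struct is the first member. */
typedef struct demo_resettable_t {
	demo_counter_t base;
	void (*reset)(struct demo_resettable_t *self);
} demo_resettable_t;

/* Factory function, the only public way to obtain an instance. */
demo_resettable_t *demo_step_counter_create(int step);

/* ---- private part, would normally be hidden in a .c file ---- */

typedef struct {
	demo_resettable_t base;
	int value;
	int step;
} step_counter_t;

static int step_next(demo_counter_t *self)
{
	step_counter_t *ctr = (step_counter_t *)self;

	ctr->value += ctr->step;
	return ctr->value;
}

static void step_reset(demo_resettable_t *self)
{
	((step_counter_t *)self)->value = 0;
}

static void step_destroy(demo_counter_t *self)
{
	/* The base struct sits at offset 0, so this is the malloc'd pointer. */
	free(self);
}

demo_resettable_t *demo_step_counter_create(int step)
{
	step_counter_t *ctr = calloc(1, sizeof(*ctr));

	if (ctr == NULL)
		return NULL;

	ctr->base.base.next = step_next;
	ctr->base.base.destroy = step_destroy;
	ctr->base.reset = step_reset;
	ctr->step = step;
	return &ctr->base;
}

/* Code written against the base interface accepts any derived object. */
int demo_sum_next(demo_counter_t *ctr, int count)
{
	int sum = 0;

	assert(ctr != NULL);
	while (count-- > 0)
		sum += ctr->next(ctr);

	return sum;
}
```

A caller holding a `demo_resettable_t` pointer can cast it to
`demo_counter_t *` and hand it to `demo_sum_next`, without ever seeing the
layout of `step_counter_t`.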

The `sqfs/predef.h` defines and typedefs a `sqfs_object_t` structure, which
is at the bottom of the inheritance hierarchy.

It contains two function pointers `destroy` and `copy`. The former destroys and
frees the object itself, the latter creates an exact copy of the object.
The `copy` callback may be `NULL`, if creating copies is not applicable for a
particular type of object.

For convenience, two inline helpers `sqfs_destroy` and `sqfs_copy` are provided
that cast a `void` pointer into an object pointer and call the respective
callbacks. The latter also checks if the callback is `NULL`.
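
A simplified sketch of such a base object and its helpers could look as
follows. It uses hypothetical `demo_` names and assumed member names and
signatures, so it should be read as an illustration of the pattern rather than
as the definitions from `sqfs/predef.h`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical base of the inheritance hierarchy. */
typedef struct demo_object_t {
	void (*destroy)(struct demo_object_t *obj);
	struct demo_object_t *(*copy)(const struct demo_object_t *obj);
} demo_object_t;

/* Helper: accepts any derived object through a void pointer. */
static inline void demo_destroy(void *obj)
{
	if (obj != NULL)
		((demo_object_t *)obj)->destroy(obj);
}

/* Helper: returns NULL if the object type does not support copying. */
static inline void *demo_copy(const void *obj)
{
	const demo_object_t *base = obj;

	assert(base != NULL);
	return base->copy != NULL ? base->copy(base) : NULL;
}

/* A derived object type: the base struct is the first member. */
typedef struct {
	demo_object_t base;
	char text[32];
} demo_label_t;

static void label_destroy(demo_object_t *obj)
{
	free(obj);
}

static demo_object_t *label_copy(const demo_object_t *obj)
{
	demo_label_t *dup = malloc(sizeof(*dup));

	/* A flat copy is enough here, the object owns no other memory. */
	if (dup != NULL)
		memcpy(dup, obj, sizeof(*dup));

	return (demo_object_t *)dup;
}

demo_label_t *demo_label_create(const char *text, int copyable)
{
	demo_label_t *lbl = calloc(1, sizeof(*lbl));

	if (lbl == NULL)
		return NULL;

	lbl->base.destroy = label_destroy;
	lbl->base.copy = copyable ? label_copy : NULL;
	strncpy(lbl->text, text, sizeof(lbl->text) - 1);
	return lbl;
}
```

Because the helpers take `void` pointers, callers do not need a cast at every
call site: `demo_destroy(lbl)` works for a `demo_label_t *` just as it would
for any other type derived from `demo_object_t`.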


## The libsquashfs malloc/free Issue

While most code in `libsquashfs` works with objects that have a `destroy` hook,
some functions return pointers to data blobs or dumb structures that have been
allocated with `malloc` and expect the caller to free them again.

This turned out to be a design issue, since the shared library could in theory
end up being linked against a different C runtime than an application using it.
On Unix-like systems this would require rather freakish circumstances, but
on Windows this actually happens fairly easily.

As a result, a `sqfs_free` function was added to `libsquashfs` to expose access
to the `free` function of the library's run-time. All new code
using `libsquashfs` should use that function, but to maintain backward
compatibility with existing code, the library has to continue using regular
malloc at those places, so programs that currently work with a simple `free`
also continue to work.
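
The pattern can be sketched as follows, with the library side and the
application side shown in one file for brevity. The functions `demo_get_name`,
`demo_free` and `demo_name_length` are hypothetical and only mirror the idea
behind `sqfs_free`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- library side ---- */

/* Returns a malloc'd copy of the name, the caller has to release it. */
char *demo_get_name(const char *name)
{
	size_t len;
	char *copy;

	assert(name != NULL);
	len = strlen(name) + 1;
	copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, name, len);

	return copy;
}

/* Releases memory with the free() of the runtime the library links against. */
void demo_free(void *ptr)
{
	free(ptr);
}

/* ---- application side ---- */

size_t demo_name_length(const char *name)
{
	char *copy = demo_get_name(name);
	size_t len;

	if (copy == NULL)
		return 0;

	len = strlen(copy);

	/* Not free(): the allocation belongs to the library's runtime. */
	demo_free(copy);
	return len;
}
```

When both sides share one C runtime, `demo_free` and `free` are
interchangeable, which is why existing programs calling `free` keep working.
Only the wrapper is safe when the runtimes differ.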