Diffstat (limited to 'elfstartup.md')
-rw-r--r-- elfstartup.md 20
1 files changed, 10 insertions, 10 deletions
diff --git a/elfstartup.md b/elfstartup.md
index 37ee0db..f88bce4 100644
--- a/elfstartup.md
+++ b/elfstartup.md
@@ -3,9 +3,9 @@
This section provides a high-level overview of the startup process of a
dynamically linked program on Linux.
-When using the `exec` system call to run a program, the kernel maps it into
-memory and tries to determine what kind of executable it is by looking at
-the magic number. Based on the type of executable, some data structures are
+When using the `exec` system call to run a program, the kernel looks at the
+first few bytes of the target file and tries to determine what kind of
+executable it is. Based on the type of executable, some data structures are
parsed and the program is run. For a statically linked ELF program, this means
extracting the entry point address from the header and jumping to it (with
a kernel to user space transition of course).
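
The magic number check and the entry point lookup described above can be approximated from user space. The following is only a rough sketch, not what the kernel actually runs: the function names `classify` and `elf_entry_point` are made up for illustration, and the field offsets are taken from the standard ELF header layout.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def classify(header: bytes) -> str:
    """Guess the executable type from the first few bytes of a file."""
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header.startswith(b"#!"):
        return "script"
    return "unknown"


def elf_entry_point(header: bytes) -> int:
    """Read e_entry from an ELF header (pass at least the first 32 bytes)."""
    if classify(header) != "elf":
        raise ValueError("not an ELF file")
    is_64bit = header[4] == 2                  # EI_CLASS: 1 = 32 bit, 2 = 64 bit
    endian = "<" if header[5] == 1 else ">"    # EI_DATA: 1 = little, 2 = big endian
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2), e_version (4),
    # so it sits at offset 24 in both the 32 bit and the 64 bit layout.
    (entry,) = struct.unpack_from(endian + ("Q" if is_64bit else "I"), header, 24)
    return entry


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        head = f.read(64)
    kind = classify(head)
    print(kind)
    if kind == "elf":
        print(hex(elf_entry_point(head)))
```

Run against a statically linked binary, the printed address is the one the kernel would jump to. For a dynamically linked one, it is the address the loader jumps to after it has finished its own work.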
@@ -16,16 +16,16 @@ run. This mechanism is also used for implementing dynamically linked programs.
Similar to how scripts have an interpreter field (`#!/bin/sh`
or `#!/usr/bin/perl`), ELF files can also have an interpreter section. For
dynamically linked ELF executables, the compiler sets the interpreter field
-to the loader (`ld-linux.so` or similar).
+to the run time linker (`ld-linux.so` or similar), also known as the "loader".
The `ld-linux.so` loader is typically provided by the `libc` implementation
-(i.e. Musl, glibc, ...) then maps the actual executable into memory
+(e.g. Musl, glibc, ...). It maps the actual executable into memory
with `mmap(2)`, parses the dynamic section and mmaps the used libraries
(possibly recursively since libraries may need other libraries), does
some relocations if applicable and then jumps to the entry point address.
The kernel itself actually has no concept of libraries. Thanks to this
-mechanism, it doesn't have to.
+mechanism, it doesn't even have to.
The whole process of using an interpreter is actually done recursively. An
interpreter can in turn also have an interpreter. For instance, if you exec
@@ -39,9 +39,9 @@ no interpreter field set, so the kernel maps it into memory, extracts the
entry point address and runs it.
If `/bin/sh` were statically linked, the last step would be missing and the
-kernel would start executing right there. It should also be noted that Linux
-has a hard limit for interpreter recursion depth, typically set to 3 to
-support this exact standard case (script, interpreter, loader).
+kernel would start executing right there. Linux actually has a hard limit for
+interpreter recursion depth, typically set to 3 to support this exact standard
+case (script, interpreter, loader).
The entry point of the ELF file that the loader jumps to is of course NOT
the `main` function of the C program. It points to setup code provided by
@@ -56,7 +56,7 @@ against this object file and expects it to have a symbol called `_start`. The
entry point address of the ELF file is set to the location of `_start` and the
interpreter is set to the path of the loader.
-Finally, somewhere inside the `main` function of `/bin/sh` is run, it opens
+Finally, somewhere inside the `main` function of `/bin/sh`, it eventually opens
the file it has been provided on the command line and starts interpreting your
shell script.
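
The interpreter chain described in this section (script, interpreter, loader) can be traced from user space with a short script. This is a minimal sketch under stated assumptions: the helper names (`script_interpreter`, `elf_interpreter`, `interpreter_chain`) are hypothetical, the `max_depth` parameter only mirrors the limit of 3 mentioned above and is not the kernel's own constant, and the ELF parsing handles just the program header fields needed to find a `PT_INTERP` entry.

```python
import struct

PT_INTERP = 3  # program header type that carries the interpreter path


def script_interpreter(data: bytes):
    """Return the interpreter named on a `#!` line, or None if it is empty."""
    line = data[2:].split(b"\n", 1)[0].strip()
    return line.split(None, 1)[0].decode() if line else None


def elf_interpreter(data: bytes):
    """Return the PT_INTERP path of an ELF image, or None if it has none."""
    is_64bit = data[4] == 2
    endian = "<" if data[5] == 1 else ">"
    if is_64bit:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is_64bit:
            (p_offset,) = struct.unpack_from(endian + "Q", data, base + 8)
            (p_filesz,) = struct.unpack_from(endian + "Q", data, base + 32)
        else:
            (p_offset,) = struct.unpack_from(endian + "I", data, base + 4)
            (p_filesz,) = struct.unpack_from(endian + "I", data, base + 16)
        return data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None


def _read_file(path):
    with open(path, "rb") as f:
        return f.read()


def interpreter_chain(path, read=_read_file, max_depth=3):
    """Follow interpreter references until a file without one is found."""
    chain = [path]
    for _ in range(max_depth):
        data = read(chain[-1])
        if data.startswith(b"#!"):
            nxt = script_interpreter(data)
        elif data.startswith(b"\x7fELF"):
            nxt = elf_interpreter(data)
        else:
            nxt = None
        if nxt is None:
            return chain
        chain.append(nxt)
    raise RuntimeError("interpreter chain deeper than max_depth")


if __name__ == "__main__":
    import sys

    print(" -> ".join(interpreter_chain(sys.argv[1])))
```

Given the path of a shell script, the script prints the script itself, then the shell, then the loader named in the shell's interpreter field, provided the shell is dynamically linked. The `read` parameter exists only so the function can be exercised without touching the file system.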