The author's blog is a FANTASTIC source of information. I recommend checking out some of their other posts:
- https://mcyoung.xyz/2021/06/01/linker-script/
Given TFA's bias against GCC, I'm not so sure. e.g. looking at the linker script article… it's also missing the __start_XYZ and __stop_XYZ symbols automatically created by the linker.
It also focuses exclusively on sections. I wish it had at least mentioned segments, also known as program headers. Linux kernel's ELF loader does not care about sections, it only cares about segments.
Sections and segments are more or less the same concept: metadata that tells the loader how to map each part of the file into the correct memory regions with the correct memory protection attributes. Biggest difference is segments don't have names. Also they aren't neatly organized into logical blocks like sections are, they're just big file extents. The segments table is essentially a table of arguments for the mmap system call.
Learning this stuff from scratch was pretty tough. Linker script has commands to manipulate the program header table but I couldn't figure those out. In the end I asked developers to add command line options instead and the maintainer of mold actually obliged.
Looks like very few people know about stuff like this. One can use it to do some heavy wizardry though. I leveraged this machinery into a cool mechanism for embedding arbitrary data into ELF files. The kernel just memory maps the data in before the program has even begun execution. Typical solutions involve the program finding its own executable on the file system, reading it into memory and then finding some embedded data section. I made the kernel do almost all of that automatically.
https://www.matheusmoreira.com/articles/self-contained-lone-...
I wouldn't call them "same concept" at all. Segments (program headers) are all about the runtime (executables and shared libraries) and are low-cost. Sections are all about development (.o files) and are detailed.
Generally there are many sections combined into a single segment, other than special-purpose ones. Unless you are reimplementing ld.so, you almost certainly don't want to touch segments; sections are far easier to work with.
Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
> I wouldn't call them "same concept" at all.
They are both metadata about file extents and their memory images.
> sections are far easier to work with
Yes. They are not, however, loaded into memory by default. Linkers do not generate LOAD segments for section metadata since they are not needed for execution. Thus it's impossible for a program to introspect its own sections without additional logic and I/O to read them into memory.
> Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
True. I didn't use it because it was not available. I wrote my article in the context of a freestanding nolibc program.
Right, but you can just use the section start/end symbols for a section that already goes into a mapped segment.
Can you show me how that would work?
It's trivial to put arbitrary files into sections:
objcopy --add-section program.files.1=file.1.dat \
--add-section program.files.2=file.2.dat \
program program+files
The problem is the program.files.* sections do not get mapped in by a LOAD segment. I ended up having to write my own tool to patch a LOAD segment into the segments table because objcopy does not have the ability to do it. I even asked a Stack Overflow question about this two years ago:
https://stackoverflow.com/q/77468641
The only answer I got told me to simply read the sections into memory via /proc/self/exe or edit the segments table and make it so that the LOAD segments cover the whole file. I eventually figured out ways to add LOAD segments to the table. By that point I didn't need sections anymore, just a custom segment type.
The whole point of section names is that they mean something. If you give it a name that matches `.rodata.*` it will be part of the existing read-only LOADed segments, or `.data.*` for (private) read-write.
Use `ld --verbose` to see what sections are mapped by default (it is impossible for a linker to work without having such a linker script; we're just lucky that GNU ld exposes it in a sane form rather than hard-coding it as C code). In modern versions of the linker (there is still old documentation found by search engines), you can specify multiple SECTIONS commands (likely from multiple scripts, i.e. just files passed on the command line), but why would you when you can conform to the default one?
You should pick a section name that won't collide with the section names generated by `-fdata-sections` (or `-ffunction-sections` if that's ever relevant for you).
That requires relinking the executable. That is not always desirable or possible. Unless the dynamic linker ignores the segments table in favor of doing this on the fly... Even if that's the case, it won't work for statically linked executables. Only the dynamic linker can assign meaning to section names at runtime and the dynamic linker isn't involved at all in the case of statically linked programs.
Absolutely agree. Had my own fun dealings with ELF, and to be clear, on plain mainline shipping products (amd64 Linux), not toys/exercise/funky embedded. (Wouldn't have known about section start/stop symbols otherwise)
I was really struck by the antipathy toward GCC. I'm not sure I quite understand where it's coming from.