To me this whole thing is interesting since it essentially requires ELF loading to be duplicated between the kernel and libc, and then possibly duplicated again for libdl vs ld-linux. Seems less than ideal. (Though nothing new. Pretty sure it's been like that for decades by this point.)
> Yeah it turns out the kernel doesn't care about sections at all. It only ever cares about the PT_LOAD segments in the program header table, which is essentially a table of arguments for the mmap system call. Sections are just dynamic linker metadata and are never covered by PT_LOAD segments. [0]
The simplicity of the ELF loader in Linux can be exploited to make extremely small executables [1], since most of the data in the ELF header is stuff that the kernel doesn't care about.
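To make the "table of arguments for mmap" point concrete, here is a rough Python sketch (not the kernel's code) that walks only the program header table of a little-endian ELF64 image and reports the PT_LOAD entries. The function names `load_segments` and `make_demo_image` are made up for illustration, and the demo image is a hand-built header with no section headers at all (e_shoff and e_shnum are zero), which is the same trick the tiny-executable write-up leans on.

```python
import struct

# Layouts follow the ELF64 spec; "<" means little-endian with no padding.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # file header, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # program header, 56 bytes
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4


def load_segments(image: bytes):
    """Return one dict per PT_LOAD entry. Section headers are never read."""
    fields = EHDR.unpack_from(image, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr,
         _p_paddr, p_filesz, p_memsz, _p_align) = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue  # PT_DYNAMIC, PT_INTERP, etc. are not mapped by this rule
        prot = "".join(ch if p_flags & bit else "-"
                       for ch, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
        # These fields are roughly what ends up as mmap's addr/length/prot/offset.
        segments.append({"vaddr": p_vaddr, "offset": p_offset,
                         "filesz": p_filesz, "memsz": p_memsz, "prot": prot})
    return segments


def make_demo_image() -> bytes:
    """Hypothetical 128-byte image: one ELF header, one PT_LOAD, no sections."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELFCLASS64, LSB, version 1
    ehdr = EHDR.pack(ident, 2, 62, 1, 0x400078,       # ET_EXEC, EM_X86_64, entry
                     EHDR.size, 0, 0,                 # e_phoff, e_shoff=0, e_flags
                     EHDR.size, PHDR.size, 1,         # e_ehsize, e_phentsize, e_phnum
                     0, 0, 0)                         # no section header table
    phdr = PHDR.pack(PT_LOAD, PF_R | PF_X, 0, 0x400000, 0x400000,
                     0x80, 0x80, 0x1000)
    return ehdr + phdr + bytes(8)                     # 8 placeholder "code" bytes


if __name__ == "__main__":
    for seg in load_segments(make_demo_image()):
        print(seg)
```

Pointing `load_segments` at the bytes of a real 64-bit little-endian binary should work the same way, though a real loader also has to deal with page alignment, the zero-filled tail where memsz exceeds filesz, and PT_INTERP, none of which this sketch attempts.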
[0] https://news.ycombinator.com/item?id=45706380#45709203
[1] https://www.muppetlabs.com/~breadbox/software/tiny/teensy.ht...
Oh.
I liked the way QNX did it. Loading was done by a .so file, entirely by userspace. When you built a kernel boot image, you could include whatever userspace programs and .so files were needed to get started, as raw memory images. They were all loaded by the boot loader. That included the .so file with the code for loading programs. All loading and preprocessing of executable images was done entirely in user space.
It looks like Linux now has similar capabilities, but the old cruft remains. This is typical of how Linux migrates machinery to user space: the kernel doesn't seem to shrink.
I understand why they went this route. While it is unfortunate to need duplicate code for parsing and loading ELF files, the ELF binfmt in the kernel is at least relatively simple, since it does not need to worry about dynamic linking. Doing what QNX did would be possible, but it would also add moving parts and change the relationship Linux has with userland, which is something they do not like to do. They could probably come up with a middle ground, like shipping a pre-baked raw memory image of an ELF loader with the kernel and sticking it into the new process when exec'ing an ELF binary, but I'm sure there would be observable side effects with regard to performance and maybe locking. I can see it being more impactful to focus on ensuring the existing implementation is correct. (AFAIK it is still "only" a few thousand lines.)