The Journey Before main()

(amit.prasad.me)

311 points

by amitprasad 2 days ago

by fweimer 2 days ago

> The ELF file contains a dynamic section which tells the kernel which shared libraries to load, and another section which tells the kernel to dynamically “relocate” pointers to those functions, so everything checks out.

This is not how dynamic linking works on GNU/Linux. The kernel processes the program headers for the main program (mapping the PT_LOAD segments, without relocating them) and notices the PT_INTERP program interpreter (the path to the dynamic linker) among the program headers. The kernel then loads the dynamic linker in much the same way as the main program (again without relocation) and transfers control to its entry point. It's up to the dynamic linker to self-relocate, load the referenced shared objects (this time using plain mmap and mprotect; the kernel ELF loader is not used for that), relocate them and the main program, and then transfer control to the main program.

The scheme is not that dissimilar to the #! shebang lines, with the dynamic linker taking the role of the script interpreter, except that ELF is a binary format.
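To make the kernel's first step concrete, here is a minimal Python sketch that walks the program headers of a 64-bit little-endian ELF image and pulls out the PT_INTERP path. The helper names (`program_headers`, `interpreter`) are made up for illustration; the field offsets and the PT_LOAD/PT_INTERP values follow the ELF64 layout. It only inspects bytes and does not load or relocate anything.

```python
import struct

PT_LOAD, PT_INTERP = 1, 3  # segment type values from the ELF specification


def program_headers(image):
    """Yield (p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz) tuples."""
    # Only 64-bit (EI_CLASS == 2), little-endian (EI_DATA == 1) images are handled.
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF image")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        yield p_type, p_flags, p_offset, p_vaddr, p_filesz, p_memsz


def interpreter(image):
    """Return the PT_INTERP path stored in the image, or None if there is none."""
    for p_type, _flags, p_offset, _vaddr, p_filesz, _memsz in program_headers(image):
        if p_type == PT_INTERP:
            return image[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

Feeding it the bytes of a dynamically linked binary, e.g. `interpreter(open("/bin/ls", "rb").read())`, should return the dynamic linker path recorded in that file; a statically linked binary would give `None`.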

by matheusmoreira 1 day ago

Yeah it turns out the kernel doesn't care about sections at all. It only ever cares about the PT_LOAD segments in the program header table, which is essentially a table of arguments for the mmap system call. Sections are just linker metadata: the kernel never reads the section header table, and a section's contents reach memory only if a PT_LOAD segment happens to cover them.
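The "table of arguments for mmap" view can be sketched roughly like this. It is a simplified illustration in Python, not the kernel's code: `mmap_args` is a hypothetical helper, the page size is assumed to be 4096, and it ignores the zero-filled tail (p_memsz larger than p_filesz) that real loaders also have to handle. The PF_* and PROT_* values are the standard ELF and Linux constants.

```python
PF_X, PF_W, PF_R = 1, 2, 4                  # ELF segment permission bits
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4  # Linux mmap protection bits


def mmap_args(p_offset, p_vaddr, p_filesz, p_flags, page_size=4096):
    """Translate one PT_LOAD entry into page-aligned, mmap-style arguments."""
    # mmap needs page-aligned addresses and file offsets, so round both down
    # by the same amount and grow the length to compensate.
    page_off = p_vaddr % page_size
    prot = 0
    if p_flags & PF_R:
        prot |= PROT_READ
    if p_flags & PF_W:
        prot |= PROT_WRITE
    if p_flags & PF_X:
        prot |= PROT_EXEC
    return {
        "addr": p_vaddr - page_off,
        "length": p_filesz + page_off,
        "prot": prot,
        "offset": p_offset - page_off,
    }
```

Nothing in this translation looks at section headers, which is why a section added with objcopy but not covered by any PT_LOAD entry never shows up in memory.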

This seems to be a common misconception. I too suffered from it once... Tried to embed arbitrary files into ELF files using objcopy. The tool could easily create new sections with the file contents just fine, but the kernel wouldn't load them into memory. It was really confusing at first.

https://stackoverflow.com/q/77468641

There were no tools for patching the program header table, I ended up making them! The mold linker even added a feature just to make this patching easy!

https://www.matheusmoreira.com/articles/self-contained-lone-...

by mkoubaa 2 days ago

I've always wondered why there weren't more popular loaders to choose from given that on Linux loaders are user-space

by fweimer 1 day ago

With containers, you usually get incompatible dynamic loaders in the containers (see mananaysiempre's comment; the glibc dynamic linker sees rather active development in some LTS distributions). This wouldn't be possible if the loader were part of the kernel.

Non-ELF loaders are fairly common, too. It's how Wine works, and how Microsoft reuses PE/COFF SQL Server binaries on Linux.

by ksherlock 1 day ago

There's also binfmt support, which can check a supposedly executable file against some magic and auto-launch an interpreter (like wine or java or dosemu). I looked into it for something once but in my case the magic wasn't good enough.

https://www.kernel.org/doc/html/latest/admin-guide/binfmt-mi...
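For a feel of what such a rule looks like, here is a small Python sketch. It builds a registration string in the `:name:type:offset:magic:mask:interpreter:flags` layout described in the kernel's binfmt_misc documentation, and mimics the magic comparison. Both helper names (`binfmt_rule`, `matches`) are hypothetical, and the matching logic is an approximation of the kernel's behaviour, not a copy of it.

```python
def binfmt_rule(name, magic, interpreter, offset=0, mask=b"", flags=""):
    """Build a ':name:type:offset:magic:mask:interpreter:flags' rule of type M."""
    def esc(data):
        # The rule string carries raw bytes as \xNN escapes.
        return "".join(f"\\x{b:02x}" for b in data)

    off = str(offset) if offset else ""  # an empty offset field means 0
    return f":{name}:M:{off}:{esc(magic)}:{esc(mask)}:{interpreter}:{flags}"


def matches(header, magic, mask=b"", offset=0):
    """Approximate the magic test: compare masked bytes at the given offset."""
    window = header[offset:offset + len(magic)]
    if len(window) < len(magic):
        return False
    mask = mask or b"\xff" * len(magic)  # no mask means every bit is compared
    return all((w & m) == (g & m) for w, g, m in zip(window, magic, mask))
```

A rule built this way would be written to `/proc/sys/fs/binfmt_misc/register` by root. The weak spot mentioned above shows up in `matches`: if two formats share the same leading bytes, magic alone cannot tell them apart.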

by delusional 1 day ago

It's super awesome for qemu. It lets you chroot into an ARM root (full of ARM binaries) on your x86 machine and just run it like normal. No VM required.

by mananaysiempre 1 day ago

Part of it is the Glibc loader’s carnal knowledge of Glibc proper; there’s essentially no module boundary there. (That’s not completely unjustified, but Glibc is especially hostile there, like in its many other architectural choices.) Musl outright merges the two into a single binary. So if you want to do a loader then you’re also doing a libc.

Part of it for desktop Linux specifically is that a lot of the graphics stack is very unfriendly to alternative libcs or loaders. For example, Wayland is nominally a protocol admitting multiple implementations, but if you want to not be dumb[1] and do GPU-accelerated graphics, then the ABI ties you to libwayland.so specifically (event-loop opinions and all) in order to load vendor-specific userspace drivers, which entails your distro’s preferred libc (probably Glibc).

[1] There can of course be good engineering reasons to be dumb.

by codedokode 1 day ago

Why do you need "vendor-specific userspace drivers"? I thought graphic acceleration uses OpenGL/Vulkan, and non-accelerated graphics uses DRM? And there are no "drivers" for Wayland compositors?

by matheusmoreira 1 day ago

OpenGL and Vulkan are implemented as libraries in user space, for example by the Mesa project.

by BobbyTables2 2 days ago

I suspect it is because they get really hairy.

Loading ELFs and processing relocations is actually not too bad. It’s fun after the initial learning curve.

Then one has to worry about handling of “dlopen” and the loader creating the data structures it cares about. Yuck!!!

It’s kinda a shame because the glibc loader is a bit bloated with all the audit and preload handling. Great for flexibility, not for security.

by matheusmoreira 1 day ago

It's hard to describe how complex this stuff is. Shared object loaders are essentially primitive package managers, topologically sorting dependencies and everything...

https://blogs.oracle.com/solaris/post/init-and-fini-processi...
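The topological-sort part can be illustrated with a few lines of Python. This is only a toy model: `init_order` is a hypothetical helper, the dependency map stands in for the DT_NEEDED entries a loader would read, and real loaders also have to cope with cycles, dlopen at run time, and finalizer ordering.

```python
from graphlib import TopologicalSorter


def init_order(needed):
    """Return object names so that every dependency precedes its dependents.

    `needed` maps an object name to the names it depends on.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields dependencies before the objects that need them.
    return list(TopologicalSorter(needed).static_order())
```

For `{"app": ["libfoo.so", "libc.so"], "libfoo.so": ["libc.so"], "libc.so": []}` the initializers would run for `libc.so`, then `libfoo.so`, then `app`. A dependency cycle makes `static_order()` raise `graphlib.CycleError`, which is one of the cases a real loader has to resolve by policy.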

by amitprasad 2 days ago

You’re right, and I knew this back in February when I wrote most of this post. I must have revised it down incorrectly before posting; will correct. Bit of a facepalm from my side.

by mmsc 2 days ago

It's also possible to pack a whole codebase into "before main()" - or with no main() at all. I was recently experimenting doing this, as well as a whole codebase that only uses main() and calls itself over and over. Good fun: https://joshua.hu/packing-codebase-into-single-function-disr...

by 1718627440 2 days ago

That is a really fun read and honestly doesn't even seem to be complicated and brittle. Just rename every function to main(100+n, ...).

by thatxliner 1 day ago

Just wondering, how did you get that domain name? I’ve been looking for registrars offering .hu

by hashstring 1 day ago

whois data points to https://www.domain.hu.

by slater 1 day ago

https://nic.hu/index_en.html ?

by archmaster 2 days ago

This is awesome! To anyone interested in learning more about this, I wrote https://cpu.land/ a couple years ago. It doesn't go as in-depth into e.g. memory layout as OP does but does cover multitasking and how the code is loaded in the first place.

by fuzzy_biscuit 2 days ago

I love cpu.land! Thanks for creating such a fun resource.

by khaledh 2 days ago

> A note on interpreters: If the executable file starts with a shebang (#!), the kernel will use the shebang-specified interpreter to run the program. For example, #!/usr/bin/python3 will run the program using the Python interpreter, #!/bin/bash will run the program using the Bash shell, etc.

This caused me a lot of pain while trying to debug a 3rd party Java application that was trying to launch an executable script and throwing an IO error: "java.io.IOException: error=2, No such file or directory". I was puzzled because I knew the script was right there (I was using its full path) and it had the executable bit set. It turns out that the shebang in the script was wrong, so the OS was complaining about the interpreter (the actual error from a shell would be "The file specified the interpreter '/foo/bar', which is not an executable command."), but the Java error was completely misleading :|

Note: If you wonder why I didn't see this error by running the script myself: I did, and it ran fine locally. But the application was running on a remote host that had a different path for the interpreter.
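A quick way to tell the two ENOENT cases apart is to read the shebang line yourself and check whether the interpreter exists on the host in question. The Python sketch below does just that; `shebang_interpreter` and `explain_enoent` are hypothetical helper names, and the parsing is deliberately simpler than what the kernel does (for instance, it ignores the kernel's length limit on the shebang line).

```python
import os


def shebang_interpreter(script_path):
    """Return the interpreter path named on the '#!' line, or None."""
    with open(script_path, "rb") as f:
        first = f.readline(256)
    if not first.startswith(b"#!"):
        return None
    parts = first[2:].strip().split(None, 1)  # interpreter, then optional argument
    return os.fsdecode(parts[0]) if parts else None


def explain_enoent(script_path):
    """Say which file is actually missing when exec reports ENOENT."""
    if not os.path.exists(script_path):
        return f"{script_path}: script itself is missing"
    interp = shebang_interpreter(script_path)
    if interp is not None and not os.path.exists(interp):
        return f"{script_path}: interpreter {interp} named in the shebang is missing"
    return f"{script_path}: script and interpreter both exist"
```

Run on the remote host rather than locally, a check like this would have pointed at the interpreter path straight away.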

by 1718627440 2 days ago

Note that this is not a Java-specific problem; it can occur with other programs as well. "No such file or directory" is just the friendly description for ENOENT, which can be returned by a lot of syscalls. I typically just run the program through strace; then you will quickly see what the program did.

by gjf 2 days ago

For those interested, I did a breakdown of the hashbang: https://blog.foletta.net/post/2021-04-19-what-the/

by mscdex 2 days ago

Also be aware that kernel support for shebangs depends on CONFIG_BINFMT_SCRIPT=y being in the kernel config.

by vbezhenar 2 days ago

I wonder how many C projects prefer to avoid standard library, just invoking Linux syscalls directly. Much more fun to write software this way, IMO.

by electroly 2 days ago

Not exactly the same, but on Windows if you use entirely Win32 calls you can avoid linking any C runtime library. Win32 is below the C standard library on Windows and the C runtime is optional.

by okanat 2 days ago

This is one of the cornerstones that guarantee Windows can easily upgrade the C runtime and make performance and security upgrades. Win32 APIs have a different function calling ABI too.

So the only part that gets "bloated" is the Win32 API itself (which is spread across multiple DLLs and doesn't actually bloat RAM usage). Most of the time even those functions and structures are carefully designed to be somewhat future-proof, but it is usual to see APIs like CreateFile, CreateFile2, CreateFile3. Internally the earlier versions are upgraded to call the latest version, so there is not much bloat there either.

When the C runtime and the OS system calls are combined into the single binary like POSIX, it creates the ABI hell we're in with the modern Unix-likes. Either the OSes have to regularly break the C ABI compatibility for the updates or we have to live with terrible implementations.

The GNU libc and Linux combo is particularly bad. On GNU/Linux (or with any of the current libc replacements), dynamic loading is also provided by the C library. This makes "forever" binary compatibility particularly tricky to achieve. Glibc broke certain games / Steam by removing some parts of its ELF implementation: https://sourceware.org/bugzilla/show_bug.cgi?id=32653 . They backed down due to huge backlash from the community.

If "the year of Linux desktop" would ever happen, they need to either do an Android and change the definition of what a software package is, or split Glibc into 3 parts: syscalls, dynamic loader and the actual C library.

PS: There is actually a catch to your "C runtime is optional" argument. Microsoft still intentionally holds back the ability to compile native-ABI Windows programs without Visual Studio.

The structured exception handlers (the Windows equivalent of SIGILL, SIGBUS, etc., though not of SIGINT or SIGTERM) are populated by the object files from the C runtime libraries (called VCRuntime/VCStartup). So it is actually not possible to have official Windows binaries without MSVC or some other C runtime, like Mingw-64, that provides those symbols. It looks like some developers at Microsoft wanted to open-source VCRuntime / VCStartup but it was ~vetoed~ not fully approved by some people: https://github.com/microsoft/STL/issues/4560#issuecomment-23... , https://www.reddit.com/r/cpp/comments/1l8mqlv/is_msvc_ever_g...

by 1718627440 2 days ago

> split Glibc into 3 parts: syscalls, dynamic loader and the actual C library.

What is left of the C standard library, if you remove syscall wrappers?

> ABI hell

Is that really the case? From my understanding the problem is more, that Linux isn't an OS, so you can't rely on any *.so being there.

by okanat 2 days ago

> > split Glibc into 3 parts: syscalls, dynamic loader and the actual C library.

> What is left of the C standard library, if you remove syscall wrappers?

Still quite a bit actually. Stuff like malloc, realloc, free, fopen, FILE, getaddrinfo, getlogin, math functions like cos, sin, tan, stdatomic implementations, and some string functions are all defined in the C library. They are not direct system calls, unlike open, read, write, ioctl, setsockopt, capget, capset ....

> > ABI hell

> Is that really the case? From my understanding the problem is more, that Linux isn't an OS, so you can't rely on any *.so being there.

That's why I used the more specific term GNU/Linux at the start. There is no guarantee that any .so file can be successfully loaded even if it is there. Glibc can break anything. With the Steam bug I linked, this is exactly what happened. The shared object files were there; Glibc stopped supporting a certain ELF file field.

There is only and only one guarantee with Linux-based systems: syscalls (and other similar ways to talk with kernel like ioctl struct memory layouts etc) always keep working.

There is so much invisible dependence on Glibc behavior. Glibc also controls how the DNS works for the programs for example. That also needs to be split into a different library. Same for managing user info like `getlogin`. Moreover all this functionality is actually implemented as dynamic library plugins in Glibc (NSSwitch) that rely on ld.so that's also shipped by Glibc. It is literally a Medusa head of snakes that bite multiple tails. It is extremely hard to test ABI breakages like this.

by 1718627440 2 days ago

> malloc, realloc, free

Wrapper around sbrk, mmap, etc. whatever the modern variant is.

> fopen, FILE

Wrapper around open, write, read, close.

> stdatomic implementations

You can argue, these are wrappers around thread syscalls.

> math functions like cos, sin, tan, some string functions are all defined in C library

True for these, but they are so small, they could just be inlined directly, on their own they wouldn't necessarily deserve a library.

> That's why I used more specific term GNU/Linux at the start.

While GNU/Linux does describe a complete OS, it doesn't describe any specific OS. Every distro does its own thing, so I think a distro is what you actually need to call an OS. But everything is built so that the user can take control over the architecture and over which components the OS consists of, so every installation can be a snowflake, and then it is technically its own OS.

I personally consider libc and the compiler (which both make a C implementation) to be part of the OS. I think this is both grounded in theory and in practice. Only in some weird middle ground between theory and practice you can consider them to not be.

by vbezhenar 1 day ago

> malloc, realloc, free

> Wrapper around sbrk, mmap, etc. whatever the modern variant is.

I don't think that's correct. While `malloc` uses the `brk` syscall to obtain large memory areas, it uses non-trivial algorithms and data structures to divide those areas into the smaller chunks that are actually returned. Making a syscall for every `malloc`/`free` would be quite an overhead.

> fopen, FILE

> Wrapper around open, write, read, close.

They're not just wrappers. They implement internal buffering and some transformations (for example, see "binary" mode vs. "text" mode).
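The buffering point can be made visible with a small experiment. The Python sketch below is only an analogy for what a stdio implementation does: Python's `io.BufferedWriter` plays the role of `FILE`, and `CountingRaw` is a hypothetical stand-in for the file descriptor that counts how often the "syscall-level" write is actually reached.

```python
import io


class CountingRaw(io.RawIOBase):
    """Stand-in for a raw file descriptor that counts low-level writes."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1      # one call here models one write(2)
        self.data += b
        return len(b)


def buffered_write_calls(n_writes, chunk=b"0123456789", buffer_size=4096):
    """Issue n_writes small writes through a buffer; return (raw calls, bytes)."""
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for _ in range(n_writes):
        buffered.write(chunk)
    buffered.flush()
    return raw.calls, len(raw.data)
```

With a thousand 10-byte writes, all 10000 bytes arrive, but the raw layer is called far fewer than a thousand times. That batching is the part a bare `write` wrapper would not give you.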

> stdatomic implementations

> You can argue, these are wrappers around thread syscalls.

No, they're wrappers around compiler intrinsics which emit specific assembly instructions. At least for any sane architecture.

> I personally consider libc and the compiler (which both make a C implementation) to be part of the OS. I think this is both grounded in theory and in practice. Only in some weird middle ground between theory and practice you can consider them to not be.

C is used a lot in embedded projects. I even think that's the majority of C code nowadays. These projects usually don't use libc (as there's no operating system, so concept of file or process just doesn't make sense). So it's very important to separate C compiler and libc and C compiler must be able to emit code with zero dependencies.

by 1718627440 1 day ago

Yeah sure they do more things than only doing the syscall, that's the point of an abstraction. But they still provide the functionality of the syscalls, just in the abstraction that you want it to be exposed as in the programming language. That's what I would consider a wrapper.

> C is used a lot in embedded projects.

Sure, that's a freestanding implementation, whose primary distinction is that it doesn't rely on the libc. The notion of the libc being part of the OS in the wider sense still holds water here, since no OS corresponds to no libc.

by saagarjha 1 day ago

mmap and sbrk would be very poor implementations of malloc.

by 1718627440 1 day ago

We are talking of wrappers on top of mmap and sbrk. Of course you wouldn't use mmap and sbrk instead of the abstraction. It's really the same as the difference between fread and read.

by codedokode 1 day ago

> Glibc broke certain games / Steam by removing some parts of their ELF implementation: https://sourceware.org/bugzilla/show_bug.cgi?id=32653 . They backed due to huge backlash from the community.

It would be better if you specified which part was removed: support for executable code on the stack. In 99% of cases this is used by malware, so it is better to break the 1% of broken programs and have the other 99% run safer.

by josefx 1 day ago

The comments on that bug report mention several language runtimes getting broken. Preventing languages that are generally safer than C from working seems rather counterproductive to overall security.

by saagarjha 1 day ago

> If "the year of Linux desktop" would ever happen, they need to either do an Android and change the definition of what a software package is, or split Glibc into 3 parts: syscalls, dynamic loader and the actual C library.

The dynamic loader used to be its own library, FWIW. It got merged into the main one recently.

by liqilin1567 1 day ago

I'm sick of glibc compatibility problems. Are there any recommended replacements?

by electroly 1 day ago

For non-graphical apps, you can link statically against musl to produce a binary that only depends on the Linux kernel version and not the version or type of libc on the system. You may take a performance hit as musl isn't optimized for speed, and a size hit for shipping your own libc, and a feature hit because musl is designed to be minimal, but for many command line tools all of these downsides are acceptable.

by emidln 1 day ago

.interp to a glibc/libc you ship or static linking. These days it’s probably faster (in dev time) to just run a container than setting up a bespoke interp and a parallel set of libraries (and the associated toolchain changes or binary patching needed to support it).

by liqilin1567 1 day ago

Running a container is exactly my current solution as well.

Are there any other solutions that don't depend on glibc?

by retatop 1 day ago

Guix (and I assume Nix as well, but I only know Guix) can create a package that is completely self contained including glibc. You can even have it be an AppImage https://guix.gnu.org/manual/devel/en/html_node/Invoking-guix...

by 1718627440 1 day ago

Glibc is half of GNU/Linux. You can of course use another libc, but it will be a different OS.

by liqilin1567 1 day ago

Yeah, even library loading relies on glibc, so we can't really escape glibc on GNU/Linux.

by 1718627440 15 hours ago

I don't really know why people expect to be able to bypass the OS and not have problems. It seems to come from people who think a "Linux OS" only consists of the Linux kernel.

by vbezhenar 19 hours ago

I wonder if anyone implemented loading shared libraries without glibc? It shouldn't be that hard, just need to implement ELF parser and glibc-compatible relocation mechanism.

by 1718627440 15 hours ago

I doubt that nobody has done that. It is just that vendoring your own OS comes with a lot of work.

by codedokode 1 day ago

I think using syscalls directly is a worse idea than loading shared libraries, and newer kernel features like ALSA (audio playback), DRM (graphics rendering) and others use libraries instead of documenting syscalls and ioctls. This is better because it allows intercepting and subverting the calls, adding support for features even if the kernel doesn't support them, makes it easier to port code to other OSes and to support different architectures (32-bit code on a 64-bit kernel), and allows changing the kernel interface without breaking anything. So the Windows-style approach with system libraries is better in every aspect.

by matheusmoreira 1 day ago

I once wrote a liblinux project just for this!! It was indeed extremely fun. Details in my other comment:

https://news.ycombinator.com/item?id=45709141

I abandoned it because Linux itself now has a rich set of nolibc headers.

Now I'm working on a whole programming language based around this concept. A freestanding lisp interpreter targeting Linux directly with builtin system call support. The idea is to complete the interpreter and then write the standard library and Linux user space in lisp using the system calls.

It's been an amazing journey. It's incredible how far one can take this.

by 1718627440 2 days ago

I generally try to stay portable, but file descriptors are just too nice not to use.

by Retr0id 2 days ago

File descriptors are part of the linux syscall API, not libc. Are you thinking of FILE?

by ajross 2 days ago

The "syscall API" is part of libc too. The read syscall is a trap, you put arguments in the right registers and issue the correct instruction[1] to enter the kernel. That's not something that can be expressed in C. The read() function that your C code actually uses is a C function provided by the C library.

[1] "svc 0" on ARM, "int 0x80" on i386, etc...

by matheusmoreira 1 day ago

> That's not something that can be expressed in C.

I've often made the argument that compilers should add builtins for Linux system calls. Just emit code in the right calling convention and the system call instruction, and return the result. Even high level dynamic languages could have their JIT compilers generate this code.

I actually tried to hack a linux_system_call builtin into GCC at some point. Lost that work in a hard drive crash, sadly. The maintainers didn't seem too convinced in the mailing list so I didn't bother rewriting it.

> The read() function that your C code actually uses is a C function provided by the C library.

These are just magic wrapper functions. The actual Linux system call entry point is language agnostic, specified at the instruction set architecture level, and is considered stable.

https://www.matheusmoreira.com/articles/linux-system-calls

This is different from other systems which force people to use the C library to interface with the kernel.

One of the most annoying things in the Linux manuals is they conflate the glibc wrappers with the actual system calls in Linux. The C library does a lot more than just wrap these things, they dynamically choose the best variants and even implement cancellation/interruption mechanisms. Separating the Linux behavior from libc behavior can be difficult, and in my experience requires reading kernel source code.

by 1718627440 1 day ago

> I've often made the argument that compilers should add builtins for Linux system calls. Just emit code in the right calling convention and the system call instruction, and return the result. Even high level dynamic languages could have their JIT compilers generate this code.

You can only do that when you compile for a specific machine. In general you are compiling for some abstract notion of an OS. JITs always compile for the machine they are running on, so they don't have that problem. There is code that is compiled directly to the syscalls specific to your machine, so that abstract code can use it. It's called libc for the C language.

> One of the most annoying things in the Linux manuals is they conflate the glibc wrappers with the actual system calls in Linux. The C library does a lot more than just wrap these things, they dynamically choose the best variants and even implement cancellation/interruption mechanisms. Separating the Linux behavior from libc behavior can be difficult, and in my experience requires reading kernel source code.

In my experience there are often detailed explanation in the notes section. From readv(2):

  NOTES
       POSIX.1 allows an implementation to place a limit on the number of
       items that can be passed in iov. An implementation can advertise its
       limit by defining IOV_MAX in <limits.h> or at run time via the return
       value from sysconf(_SC_IOV_MAX). On modern Linux systems, the limit is
       1024. Back in Linux 2.0 days, this limit was 16.

   C library/kernel differences
       The raw preadv() and pwritev() system calls have call signatures that
       differ slightly from that of the corresponding GNU C library wrapper
       functions shown in the SYNOPSIS. The final argument, offset, is
       unpacked by the wrapper functions into two arguments in the system
       calls:

           unsigned long pos_l, unsigned long pos

       These arguments contain, respectively, the low order and high order 32
       bits of offset.

   Historical C library/kernel differences
       To deal with the fact that IOV_MAX was so low on early versions of
       Linux, the glibc wrapper functions for readv() and writev() did some
       extra work if they detected that the underlying kernel system call
       failed because this limit was exceeded. In the case of readv(), the
       wrapper function allocated a temporary buffer large enough for all of
       the items specified by iov, passed that buffer in a call to read(2),
       copied data from the buffer to the locations specified by the iov_base
       fields of the elements of iov, and then freed the buffer. The wrapper
       function for writev() performed the analogous task using a temporary
       buffer and a call to write(2).

       The need for this extra effort in the glibc wrapper functions went away
       with Linux 2.2 and later. However, glibc continued to provide this
       behavior until version 2.10. Starting with glibc version 2.9, the
       wrapper functions provide this behavior only if the library detects
       that the system is running a Linux kernel older than version 2.6.18 (an
       arbitrarily selected kernel version). And since glibc 2.20 (which
       requires a minimum Linux kernel version of 2.6.32), the glibc wrapper
       functions always just directly invoke the system calls.

by matheusmoreira 1 day ago

> You can only do that, when you compile for a specific machine.

You always compile for a specific machine. There is always a target instruction set architecture. It decides the calling convention used for Linux system calls. The compiler can even produce an error in case the target is not supported by Linux.

> In general you are compiling for some abstract notion of an OS.

This "abstract notion of an OS" boils down to the libc. Freestanding C gets rid of most of it. Making system calls is also perfectly valid in hosted C. Modern languages like Rust also have freestanding modes.

> In my experience there are often detailed explanation in the notes section.

That's the problem. Why is the Linux stuff just a bunch of footnotes in the Linux manual? It should be in the main section. The glibc specifics should be footnotes.

by 1718627440 1 day ago

Specific machine meaning defined set of installed software, versions in install locations.

Abstract notion of OS meaning Debian 12. Not Linux kernel commit ####, GNU libc commit ####, dpkg commit ####, apt commit ####, Apache httpd commit #### with patch ### to ### from Debian 4 version ### and Ubuntu 21 version ###, SQLite3 with special patches ### installed in /opt/bin/foo, ... (you get the idea).

> That's the problem. Why is the Linux stuff just a bunch of footnotes in the Linux manual? It should be in the main section. The glibc specifics should be footnotes.

Because you look at the OS manual, not at the documentation of the kernel. Notes and Bugs are also not footnotes in man pages. They are pretty important and are basically the first free-form section where you can tell about the ideas, ideals and history. The first part is a pretty strict, formal description of the calling semantics.

by matheusmoreira 1 day ago

Let's systematize this.

Compilers build for target triples such as x86_64-linux-gnu. It is of the form isa-kernel-userspace. If kernel is linux, the builtin can be used. The isa determines the code generated by the compiler, both in general and for the builtin. The userspace can be anything at all, including none. Sometimes compilers build for target quadruples which also include a vendor, and that information is also irrelevant.
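As a rough sketch of that rule in Python: the parser below follows the simplified isa-kernel-userspace (or isa-vendor-kernel-userspace) form described in this comment, not the full set of triples real toolchains accept. Both function names are hypothetical, and the builtin they refer to is the proposed one, which does not exist in GCC.

```python
def parse_target(target):
    """Split a target triple or quadruple into named parts (simplified form)."""
    parts = target.split("-")
    if len(parts) == 3:
        isa, kernel, userspace = parts
        vendor = None
    elif len(parts) == 4:
        isa, vendor, kernel, userspace = parts
    else:
        raise ValueError(f"unsupported target form: {target!r}")
    return {"isa": isa, "vendor": vendor, "kernel": kernel, "userspace": userspace}


def syscall_builtin_allowed(target):
    """The proposed builtin depends only on the kernel field being 'linux'."""
    return parse_target(target)["kernel"] == "linux"
```

Under this rule `x86_64-linux-gnu`, `x86_64-linux-musl` and `x86_64-pc-linux-gnu` would all qualify, since the userspace and vendor fields are ignored, while `x86_64-w64-windows-gnu` would not.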

by 1718627440 1 day ago

I am not sure you understand my point. Inlining libc definitions for syscalls is fine when you only care about Debian 12 commit hash ####. It will break as soon as you think your machine is running Debian 12 and you updated it, so surely it includes the latest userspace-patches. It will also break when a user uses the OS configuration to change the behaviour of some OS functionality, but your code is oblivious to that matter, because your code bypasses the OS version of libc.

Modifying the OS is fine, if this is what you want to do, but it comes with tradeoffs.

----

You wrote earlier:

> I actually tried to hack a linux_system_call builtin into GCC at some point. [...] The maintainers didn't seem too convinced in the mailing list so I didn't bother rewriting it.

I am not sure what exactly this means. There is syscall(2) in the libc, if you want to do this. If you want to inline the wrappers you can pass -static to the compiler invocation.
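For readers who have not used it, the libc `syscall()` wrapper takes the syscall number followed by its arguments. Here is a minimal Python sketch that calls it through ctypes. The assumptions are loud: it only handles 64-bit Linux on x86_64 or aarch64, the getpid numbers (39 and 172) come from the kernel's syscall tables for those architectures, and `getpid_via_syscall` is a made-up helper name. On anything else it returns `None` instead of guessing.

```python
import ctypes
import os
import platform
import sys

# Syscall numbers are per-architecture; only these two are covered here.
SYS_GETPID = {"x86_64": 39, "aarch64": 172}


def getpid_via_syscall():
    """Call getpid through libc's generic syscall() entry point."""
    nr = SYS_GETPID.get(platform.machine())
    is_64bit = ctypes.sizeof(ctypes.c_void_p) == 8
    if not sys.platform.startswith("linux") or nr is None or not is_64bit:
        return None  # unknown platform: do not guess a syscall number
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(nr))
```

On a supported machine the result should equal `os.getpid()`. Note that this still goes through libc; a compiler builtin as proposed above would emit the trap instruction directly instead.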

by matheusmoreira 1 day ago

> It will break

If it ever breaks, it's a bug in the Linux kernel.

> It will also break when a user uses the OS configuration to change the behaviour of some OS functionality

Can you give concrete examples of this?

> There is syscall(2) in the libc, if you want to do this.

I know. I've written my own syscall(), as well. The idea is to put it in the compiler as a builtin so there's no need to even write it.

by 1718627440 14 hours ago

> If it ever breaks, it's a bug in the Linux kernel.

No, your program will still instruct the kernel to do the same. It will just cause conflicts with the other OS internals.

> Can you give concrete examples of this?

Adding another encoding as a gconv module. The DNS issues everyone is talking about.

I don't know what that gets you compared to using syscall(2) and -static. When you want your program to depend on the kernel API instead of the OS API, then you should really link libc statically.

by matheusmoreira 12 hours ago

> It will just cause conflicts with the other OS internals.

But not with the kernel.

"Other OS internals" are just replaceable components. The idea is to depend on Linux only, not on Linux+glibc.

> Adding another encoding as a gconv module. The DNS issues everyone is talking about.

Those are glibc problems, not Linux problems. Linux does not perform name resolution or character encoding conversion.

by cyphar 1 day ago

The libc syscall wrappers are part of the libc API, but on Linux, syscalls are part of the stable ABI and so you can freely do __asm__(...) to write your own version of syscall(2) and it is fully supported. Yeah, __asm__ is probably not in the C spec, but every compiler implements it...

For instance, Go directly calls Linux system calls without going through libc (which has led to lots of workarounds to emulate some glibc-specific behaviour -- swings and roundabouts I guess...).

Other operating systems do not provide this kind of compatibility guarantee and instead require you to always go through libc as the syscall ABI is not stable (though ultimately, you can still use __asm__ if you so choose).

In any case, file descriptors are definitely not a libc construct on Linux.

reply

upvote

by 17186274401 days ago|

[-]

Yes, you can. Then you don't write against the OS, but against the kernel. It sometimes works, because the kernel is a separate project, it sometimes doesn't, you gave an example yourself.

> In any case, file descriptors are definitely not a libc construct on Linux.

File descriptors come definitely from the kernel, but they do also exist as a concept in libc, and I was referring to them as such. I was saying that I depend on non-portable libc functions, even though I value portability, because the API is just so nice. I did not want to indicate, that I am doing syscalls directly.

reply

upvote

by Retr0id2 days ago|

[-]

syscalls are an implementation detail of some libc impls on some platforms, but the C spec does not mention syscalls.

reply

upvote

by 17186274402 days ago|

[-]

I did mean file descriptors.

reply

upvote

by Retr0id2 days ago|

[-]

Then I'm confused by what you meant, because you can use fds with or without libc.

reply

upvote

by 17186274401 days ago|

[-]

I don't want to bypass libc in general, because I care about portability, but fds are just a nice interface, so I still use them instead of FILE, which would be the portable choice. My calls are still subject to OS choices, that differ from the kernel, since I don't bypass libc.

reply

upvote

by jjmarr2 days ago|

[-]

Tons of driver code does this.

reply

upvote

by forrestthewoods2 days ago|

[-]

You had me with “avoid C standard library” but lost me at “incoming Linux syscalls directly”.

Windows support is a requirement, and no WSL2 doesn’t count.

C standard library is pretty bad and it’d be great if not using it was a little easier and more common.

reply

upvote

by WJW2 days ago|

[-]

Obviously only a requirement if you intend your software to run under windows. But if you don't, why bother. Not all software is intended to be distributed to users far and wide. Some of it is just for yourself, and some of it will only ever run on linux servers.

reply

upvote

by forrestthewoods2 days ago|

[-]

> some of it will only ever run on linux servers.

I’ve spent quite a lot of time dealing with code that “will only ever run on Linux” which did not in fact only ever run on Linux!

Obviously for hobby projects anyone can do what they want. But adult projects should support Windows imho and consider Windows support from the start. Cross-platform is super easy unless you choose to make it hard.

reply

upvote

by matheusmoreira1 days ago|

[-]

> But adult projects should support Windows imho and consider Windows support from the start.

Hope whatever "adult" is working on this project is getting paid handsomely. They'd certainly need to pay me big bucks to care about Windows support.

In any case, Linux system call ABI is becoming a lingua franca of systems programming. BSDs have implemented Linux system calls. Windows has straight up included Linux in the system. It looks like simply targeting Linux can easily result in a binary that actually does run anywhere.

reply

upvote

by codedokode1 days ago|

[-]

Try playing audio or displaying image on the screen using only documented syscalls. And make it work on all platforms you mentioned.

reply

upvote

by matheusmoreira1 days ago|

[-]

Displaying an image on the screen is not that difficult a task. Linux has framebuffer device files. You open them, issue an ioctl to get metadata like screen geometry and color depth, then mmap the framebuffer as an array of pixels you can CPU render to. It's eerily similar to the way terminal applications work.
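The open/ioctl/mmap sequence described above could look roughly like this. It is only a sketch: `struct fb_view`, `fb_map` and `fb_offset` are hypothetical names, and it assumes a packed-pixel framebuffer and ignores panning offsets (`xoffset`/`yoffset`). It uses libc wrappers for brevity; a freestanding program would issue the same three system calls itself.

```c
#include <fcntl.h>
#include <linux/fb.h>    /* struct fb_var_screeninfo, FBIOGET_* ioctls */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper type describing a mapped framebuffer. */
struct fb_view {
    uint8_t *pixels;          /* start of the mapped pixel memory */
    size_t   size;            /* length of the mapping in bytes   */
    uint32_t width, height;   /* visible resolution in pixels     */
    uint32_t bytes_per_pixel;
    uint32_t stride;          /* bytes per scanline               */
};

/* Byte offset of pixel (x, y) inside the mapping. */
static size_t fb_offset(const struct fb_view *fb, uint32_t x, uint32_t y)
{
    return (size_t)y * fb->stride + (size_t)x * fb->bytes_per_pixel;
}

/* Open a framebuffer device (typically /dev/fb0), query its geometry,
 * and map it. Returns 0 on success, -1 on any failure. */
static int fb_map(const char *path, struct fb_view *out)
{
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0) {
        close(fd);
        return -1;
    }

    void *mem = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    close(fd);                /* the mapping stays valid after close */
    if (mem == MAP_FAILED)
        return -1;

    out->pixels = mem;
    out->size = fix.smem_len;
    out->width = var.xres;
    out->height = var.yres;
    out->bytes_per_pixel = var.bits_per_pixel / 8;
    out->stride = fix.line_length;
    return 0;
}
```

After a successful `fb_map("/dev/fb0", &fb)`, writing to `fb.pixels + fb_offset(&fb, x, y)` changes the pixel at (x, y); the byte order of the colour channels depends on the device.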

It's also possible to use Linux KMS/DRM without any user space libraries.

https://github.com/laxyyza/drmlist/

The problem with hardware accelerated rendering is much of the associated functionality is actually implemented in user space and therefore not part of the kernel. They unfortunately force the libc on us. One would have to reimplement things like Mesa in order to do this. Not impossible, just incredibly time consuming.

Things could have been organized in a way that makes this feasible. Example: SQLite. You can plug in your own memory allocation functions and VFS layer. I've been slowly porting the SQLite Unix VFS to freestanding Linux in order to use it in my freestanding applications.

reply

upvote

by forrestthewoods1 days ago|

[-]

> Windows has straight up included Linux in the system. It looks like simply targeting Linux can easily result in a binary that actually does run anywhere.

Kind of. But not really. WSL2 is a thing. But most code isn’t running in WSL2 so if your thing “runs on windows” but requires running in a WSL2 context then oftentimes it might as well not exist.

> They'd certainly need to pay me big bucks to care about Windows support.

The great irony is that Windows is a much much much better and more pleasant dev environment. Linux is utterly miserable and it’s all modern programmers know. :(

reply

upvote

by 17186274401 days ago|

[-]

There is also WSL1 and Cygwin and MinGW/MSYS2.

And no WSL2 is not a newer version of WSL1, they are entirely different products.

reply

upvote

by forrestthewoods1 days ago|

[-]

MinGW is awful. Avoid. Cygwin is honestly not really something that has come up in my career.

I don’t know why Linux people are so adamant to break their backs - and the backs of everyone around them - to try and do things TheLinuxWay. It’s weird. IMHO it’s far far far better to take a “when in Rome” approach.

My experience is that Linux people are MUCH worse at refusing to take a When in Rome approach than the other way. The great tragedy is that the Linux way is not always the best way.

reply

upvote

by 17186274401 days ago|

[-]

I found MinGW to be quite nice, but ymmv.

> to try and do things TheLinuxWay

It's not really about TheLinuxWay. It's more that Microsoft completely lacks POSIX tools at all and the compiler needs to have a complete IDE installed, which I would need a license for, and the compiler invocation also doesn't really correspond to any other compiler.

reply

upvote

by forrestthewoods1 days ago|

[-]

> Microsoft completely lacks POSIX tools

True!

> compiler needs to have a complete IDE installed

Not true. You can download just MSVC the toolchain sans IDE. Works great. https://stackoverflow.com/questions/76792904/how-to-install-...

> compiler invocation also doesn't really correspond to any other compiler

True. But you don’t have to use MSVC. You can just use Clang for everything.

Clang on Windows does typically use the Microsoft C++ standard library implementation. But that’s totally fine and won’t impact your invocation.

reply

upvote

by 171862744015 hours ago|

[-]

But then I don't understand your complaints against MSYS2/MinGW. MSYS2 UCRT (the default environment) is a collection of POSIX tools and GCC to compile against the Microsoft C++ standard library. The only difference from what you say is completely fine is that it uses GCC instead of Clang. Other MSYS2 environments use Clang instead of GCC.

MinGW is the open-source implementation of the Windows API, so that you can use the Microsoft C++ standard library, without needing to use the MS toolchain.

reply

upvote

by forrestthewoods7 hours ago|

[-]

Using MinGW and POSIX tools is trying to force a square Linux peg through a round Windows hole. You can try and force it if you want.

If you started with a native Windows-only project you would never use MinGW. Probably 0.01% of Windows projects use GCC.

Over the years I have come to associate “project uses MinGW” with “this will probably take two days of my life to get running and I’m just going to hit hurdle after hurdle after hurdle”.

The whole Linux concept of a “dev environment” is kind of really bad and broken and is why everyone uses Docker or Linux or one of a dozen different mutually incompatible environments.

The actually correct thing to do is for projects to include their fucking dependencies so they JustWork without jumping through all these hoops.

reply

upvote

by 171862744015 hours ago|

[-]

> Not true. You can download just MSVC the toolchain sans IDE. Works great.

What is the standalone MS build system called?

reply

upvote

by forrestthewoods7 hours ago|

[-]

The standalone IDE-less build tools come with MSBuild.exe. So you just use that.

reply

upvote

by WJW2 days ago|

[-]

I don't think we are talking about the same type of software? The type I was talking about will only ever run on Linux because it's a (HTTP-ish) server that will only ever run on Linux.

Probably a server that is only ever run by a single company on a single CPU type. That company will have complete control of the OS stack, so if it says no Windows, then no Windows has to be supported.

reply

upvote

by forrestthewoods1 days ago|

[-]

cool

reply

upvote

by vidarh1 days ago|

[-]

I've worked on dozens of "adult" projects for 30 years, only 2 of which ever needed to run against the Win32 API, and only one of which ever ran on Windows. There's a whole world of people out there who don't care about Windows compatibility because it's usually not relevant to the work we do.

reply

upvote

by rfl8902 days ago|

[-]

You can make CRT-free Win32 programs, read this guide[1] and you're all set. I've written a couple CLI utilities which are completely CRT-free and weigh just under a few kilobytes.

[1]: https://nullprogram.com/blog/2023/02/15/

reply

upvote

by matheusmoreira1 days ago|

[-]

Almost freestanding. It still requires you to link against kernel32 and use the functions it provides. This is because issuing system calls directly to the Windows kernel is not supported. The kernel developers reserve the right to change things like system call numbers, so they can't be hardcoded into the application.

reply

upvote

by rfl8901 days ago|

[-]

Kernel32.dll is loaded into all Windows processes by default, so you actually can have a valid, working Windows binary with 0 entries in the import table. See here[1] for a "Hello world" program written as such.

[1]: https://gist.github.com/rfl890/195307136c7216cf243f7594832f4...

reply

upvote

by matheusmoreira14 hours ago|

[-]

That's interesting. How does it work?

  PEB *peb = (PEB *)__readgsqword(0x60);
    
  LIST_ENTRY *current_entry = peb->Ldr->InMemoryOrderModuleList.Flink->Flink;

It just obtains a pointer to the loader's data structures out of nowhere?

Is this actually supported by Microsoft or are people going to end up in a Raymond Chen article if they use this?

reply

upvote

by forrestthewoods1 days ago|

[-]

> Almost freestanding. It still requires you to link against kernel32

Nitpick: the phrase “link against kernel32” feels like a Linux-ism. If you’re only calling a few functions you need to load kernel32.dll and call some functions in it. But that’s a slightly different operation than linking against it. At least how I’ve always used the term link.

You’re not wrong in principle. But Linux and Windows do a lot of things differently wrt linking and loading libs. (I think Windows does it waaay better but ymmv)

reply

upvote

by 17186274401 days ago|

[-]

> (I think Windows does it waaay better but ymmv)

Can you elaborate on that?

Btw., I don't want to bash Windows here, I think the Windows core OS developers are (one of) the only good developers at Microsoft. The NT kernel is widely praised for its quality and the actual OS seems to be really solid. They just happen to also have lots of shitty company sections that release crappy software and bundle malware, ads and telemetry with the actual OS.

reply

upvote

by forrestthewoods1 days ago|

[-]

Windows 11 Pro with O&O Shutup is perfectly fine. You’re not wrong and the trend is concerning.

But on the actual topic. I think “Linux” does a few things way worse. (Technically not Linux but GCC/Clang blah blah blah).

Linux does at least three dumb things. 1) Treat static/dynamic linking the same 2) No import line 3) global system shared libraries.

All three are bad. Shared/dynamic libraries should be black boxes. Import libs are just objectively superior to the pure hell that is linking an old version of glibc. And a big ball of global shared libraries is such a catastrophic failure that Docker was invented to hack around it.

reply

upvote

by 17186274401 days ago|

[-]

Can you write that so, that people who are dumb and don't know the Windows way also get it?

reply

upvote

by forrestthewoods20 hours ago|

[-]

> Treat static/dynamic linking the same

Imagine you have an executable with a random library that has a global variable. Now you have a shared/dynamic library that just so happens to use that library deep in its bowels. It's not in the public API, it's an implementation detail. Is the global variable shared across the exe and shared lib or not? On Linux it's shared, on Windows it's not.

I think the Windows way is better. Things randomly breaking because different DLLs randomly used the same symbol under the hood is super dumb imho. Treating them as black boxes is better. IMHO. YMMV.

> No import lib (typo! lib, not line)

In Linux (not the kernel blah blah blah) when you link a shared library - like glibc - you typically link the actual shared library. So on your build machine you pass /path/to/glibc.so as an argument. Then when your program runs it dynamically loads whatever version of glibc.so is on that machine.

On Windows you don't link against foo.dll. Instead you link against a thin, small import lib called (ideally) foo.imp.lib.

This is better for a few reasons. For one, when you're building a program that intends to use a shared library you shouldn't actually require a full copy of that lib. It's strictly unnecessary by definition.

Linux (gcc/clang blah blah blah) makes it really hard to cross-compile and really hard to link against older versions of a library than is on your system. It should be trivial to link against glibc2.15 even if your system is on glibc2.40.

> global system shared libraries

The Linux Way is to install shared libraries into the global path. This way when openssl has a security vuln you only need to update one library instead of recompile all programs.

This architecture has proven - imho objectively - to be an abject and catastrophic failure. It's so bad that the world invented Docker so that a big complicated expensive slow packaging step has to be performed just to reliably run a program with all its dependencies.

Linux Dependency Hell is 100x worse than Windows DLL Hell. In Windows the Microsoft system libraries are ultra stable. And virtually nothing gets installed into the global path. Computer programs then simply include the DLLs and dependencies they need. Which is roughly what Docker does. But Docker comes with a lot of other baggage and complexity that honestly just isn't needed.

These are my opinions. They are not held by the majority of HN commenters. But I stand by all of them! Not mentioned is that Windows has significantly better profilers and debuggers than Linux. That may change in the next two years.

Also, super duper unpopular opinion, but bash sucks and any script longer than 10 lines should be written in a real language with a debugger.

reply

upvote

by 171862744016 hours ago|

[-]

> On Linux it's shared, on Windows it's not.

Yes, the default compiler invocation makes all symbols exported. But leaving it like that is super lazy, it will likely break things (like you wrote). You can change the default with -fvisibility=[default|internal|hidden|protected] and it's kind of expected that you do. Oh, and I just found out that GCC has -fvisibility-ms-compat, to make it work like the MS compiler.
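To make the visibility point concrete, here is a small sketch of a library source file that keeps its internals out of the dynamic symbol table. The `foo_*` names and the `FOO_API` macro are invented for illustration; the attribute syntax is the GCC/Clang extension that `-fvisibility=hidden` changes the default for.

```c
/* Imagine this file is built into a shared library, for example with:
 *   cc -shared -fPIC -fvisibility=hidden foo.c -o libfoo.so
 * With that flag, everything is hidden unless marked otherwise. The
 * explicit "hidden" attributes below are redundant under the flag and
 * are spelled out only to make the intent visible. */

/* Marks the functions that make up the library's public interface. */
#define FOO_API __attribute__((visibility("default")))

/* Implementation detail: not exported, so an executable or another
 * library defining the same name cannot interpose on it. */
__attribute__((visibility("hidden"))) int foo_internal_counter = 0;

__attribute__((visibility("hidden"))) int foo_internal_bump(void)
{
    return ++foo_internal_counter;
}

/* Public entry point: the only symbol other modules can link to. */
FOO_API int foo_next_id(void)
{
    return foo_internal_bump();
}
```

Running `nm -D libfoo.so` on the result should list `foo_next_id` but not the two internal names, which is roughly the black-box behaviour a DLL has by default.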

> Instead you link against a thin, small import lib called (ideally) foo.imp.lib.

Interesting. How is that file created? Is it created automatically, when you build foo.dll? How is it shipped? Is it generally distributed with foo.dll, because then I don't really see the benefit of linking against foo2.15.imp.lib compared to foo2.15.dll.

> It should be trivial to link against glibc2.15 even if your system is on glibc2.40.

I don't know if you know that, but on Linux glibc2.40 is not really only version 2.40. It includes all the versions up to 2.40. When you link against a symbol that was last changed in 2.15, you link against glibc2.15, not against glibc2.40. If you only use symbols from glibc2.15, then you have effectively linked the complete program against glibc2.15.

But yes, enforcing this should be trivial. I think this is a common complaint.

> The Linux Way is to install shared libraries into the global path.

Only in so far, as on Windows you put the libraries into 'C:\Program Files\PROGRAM\' and on Linux into '/usr/lib/PROGRAM/'. You of course shouldn't dump all your libraries into '/usr/lib'. That's different when you install a library by itself. I don't know how common that is on Windows?

I don't really know what problems you have in mind, but it seems like you think a program would have a dependency on 'libfoo.so', so at runtime it could randomly break by getting linked against another libfoo, that happens to be in the library path. But that is not the case, you link against '/usr/lib/foo.so.6'. Relying on runtime environment paths for linking is as bad as calling execve("bash foo") and this is a security bug. Paths are for the user, so that he doesn't need to specify the full path, not for programs to use for dependency management. Also when you don't want updates to minor versions, then you can link to '/usr/lib/foo.so.6.2'. And when you don't want bugfixes, you can link against '/usr/lib/foo.so.6.2.15', but that would be super dumb in my opinion. On Linux, ABIs have their own versions, separate from the library versions; I agree that this can be confusing for newcomers.

A fundamental difference is also that there is a single entity controlling installation on Linux. It is the responsibility of the OS to install programs, bypassing that just creates a huge mess. I think that is the better way and both Apple and Microsoft are moving to that way, but likely for other reasons (corporate control). This doesn't mean, that the user can't install his own programs which aren't included in the OS repository. OS repository != OS package manager. I think when you can bother to create foo-installer.exe, you should also create foo.deb . Extracting foo.zip into C:\ is also a dumb idea, yet some people think it suddenly isn't dumb anymore when doing it on Linux.

PIP and similar projects are a bad idea, in my opinion. When someone wants to create their own package system breaking the OS, they should have at least the decency to roll it in /opt. Actually that is not a problem in Python proper. They have essentially solved that for decades and all that dance with venv, uv and what else is completely unnecessary. You can install different Python installations into the OS path. Python installs into /usr/bin/python3.x and creates /usr/lib/python3.x/ by default. Each python version will only use the appropriate libraries. That's my unpopular opinion. That mess is why Docker was created, but in my opinion that does not come from following the Linux way, but by actively sabotaging it.

> Also, super duper unpopular opinion, but bash sucks and any script longer than 10 lines should be written in a real language with a debugger.

Bash's purpose is to cobble programs together and setup pipes and process hierarchies and job control. It excels at this task. Using it for anything else sucks, but I don't think that is widely disputed.

reply

upvote

by matheusmoreira1 days ago|

[-]

Linux does none of those things. That's user space stuff. Linux loads your ELF and jumps to its entry point. That's it.

Linux is so great you're actually free to remake the entire user space in your image if you want. It's the only kernel that lets you do it, all the others force you to go through C library nonsense, including Windows.

The glibc madness you described is just a convention, kept in place by inertia. You absolutely can trash glibc if you want to. I too have a vision for Linux user space and am working towards realizing it. Nothing will happen unless someone puts the work in.

reply

upvote

by forrestthewoods1 days ago|

[-]

Yes that’s all filed under blah blah blah.

Some people use “Linux” to exclusively refer to the Linux kernel. Most people do not.

reply

upvote

by 171862744015 hours ago|

[-]

Linux by default does mean Linux kernel, but in my reply I didn't care about that either. When everyone knows what is meant, that is fine in my opinion.

I think it is important to have GNU/Linux in mind, because there are OSs that don't use glibc and work totally different, so none of your complaints apply. But yes, most people think of GNU/Linux, when you tell them about Linux.

It is also relevant to consider that there is no OS called GNU/Linux. The OSs are called Debian, Arch, OpenSuSE, Fedora, ... . It is fine for different OS to have differently working runtime linkers and installation methods, but some people act surprised when they find out ignoring that doesn't work.

reply

upvote

by matheusmoreira1 days ago|

[-]

Loading means creating a memory image of the library. Linking means resolving the symbols to addresses within that memory image.

Loading a library and calling some functions from it is linking. The function pointer you receive is your link to the library function.
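The "function pointer is your link" idea can be sketched with the POSIX counterparts of LoadLibrary and GetProcAddress. The helper name `dynamic_strlen` is hypothetical, and the sketch assumes a dynamically linked program on a system with `dlopen` (on glibc older than 2.34 this also needs `-ldl`).

```c
#include <dlfcn.h>    /* dlopen, dlsym, dlclose */
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: resolve "strlen" by name at run time and call it
 * through the resulting pointer. Returns (size_t)-1 if lookup fails. */
static size_t dynamic_strlen(const char *s)
{
    /* Passing NULL yields a handle for the main program; symbol lookups
     * through it also search the libraries loaded at startup (libc here). */
    void *self = dlopen(NULL, RTLD_NOW);
    if (!self)
        return (size_t)-1;

    /* "Linking": turn a symbol name into an address in the loaded image. */
    void *sym = dlsym(self, "strlen");
    if (!sym) {
        dlclose(self);
        return (size_t)-1;
    }

    /* POSIX requires this object-to-function pointer conversion to work,
     * even though ISO C leaves it undefined. */
    strlen_fn fn = (strlen_fn)sym;
    size_t n = fn(s);

    dlclose(self);
    return n;
}
```

Nothing in the executable's own import metadata mentions this lookup; the link is made entirely at run time, which is the same situation as the GetProcAddress case discussed in this subthread.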

reply

upvote

by forrestthewoods1 days ago|

[-]

You’re not wrong per se. But it was phrased in a very linuxy way imho.

> Linking means resolving the symbols to addresses within that memory image.

Well, you can call LoadLibrary and GetProcAddress. Which is arguably linking. But does not use the linker at link time. Although LoadLibrary is in kernel32!

reply

upvote

by 17186274401 days ago|

[-]

Linker is short for Link Loader, so I don't know what your definition of linking is, if it doesn't include loading.

reply

upvote

by forrestthewoods2 days ago|

[-]

Great post!

reply

upvote

by antihero2 days ago|

[-]

> Windows support is a requirement

Why, exactly?

reply

upvote

by AnimalMuppet2 days ago|

[-]

> Windows support is a requirement...

For what?

There is some software for which Windows support is required. There are others for which it is not, and never will be. (And for an article about running ELF files on RISC-V with a Linux OS, the "Windows support" complaint seems a bit odd...)

reply

upvote

by throwawaysoxjje2 days ago|

[-]

A requirement from whom? To do what?

reply

upvote

by pmc002 days ago|

[-]

You can do this in Windows too, useful if you want tiny executables that use minimum resources.

I wrote this little systemwide mute utility for Windows that way, annoying to be missing some parts of the CRT but not bad, code here: https://github.com/pablocastro/minimute

reply

upvote

by gpm2 days ago|

[-]

I thought windows had an unstable syscall interface?

reply

upvote

by Dwedit2 days ago|

[-]

Pretty much yeah.

You have your usual Win32 API functions found in libraries like Kernel32, User32, and GDI32, but since after Windows XP, those don't actually make system calls. The actual system calls are found in NTDLL and Win32U. Lots of functions you can import, and they're basically one instruction long. Just SYSENTER for the native version, or a switch back to 64-bit mode for a WOW64 DLL. The names of the function always begin with Nt, like NtCreateFile. There's a corresponding Kernel mode call that starts with Zw instead, so in Kernel mode you have ZwCreateFile.

But the system call numbers used with SYSENTER are indeed reordered every time there's a major version change to Windows, so you just call into NTDLL or Win32U instead if you want to directly make a system call.

reply

upvote

by LegionMammal9782 days ago|

[-]

It looks like that project does link against the usual Windows DLLs, it just doesn't use a static or dynamic C runtime.

reply

upvote

by pmc002 days ago|

[-]

Windows isn’t quite like Linux in that typically apps don’t make syscalls directly. Maybe you could say what’s in ntdll is the system call contract, but in practice you call the subsystem specific API, typically the Win32 API, which is huge compared to the Linux syscall list because it includes all sorts of things like UI, COM (!), etc.

The project has some of the properties discussed above such as not having a typical main() (or winmain), because there’s no CRT to call it.

reply

upvote

by turbert2 days ago|

[-]

It's been a while since I've touched this stuff but my recollection is the ELF interpreter (ldso, not the kernel) is responsible for everything after mapping the initial ELF's segments.

iirc execve maps pt_load segments from the program header, populates the aux vector on the stack, and jumps straight to the ELF interpreter's entry point. Any linked objects are loaded in userspace by the elf interpreter. The kernel has no knowledge of the PLT/GOT.

reply

upvote

by matheusmoreira1 days ago|

[-]

That's right!

https://lwn.net/Articles/631631/

https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....

Especially relevant for dynamic linkers are the AT_PHDR and AT_BASE auxiliary vector entries, which provide the address of the executable's program header table and the address of the interpreter, respectively.
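A program can inspect those same entries itself. The following sketch, assuming glibc 2.16 or later (or another libc that provides `getauxval`), walks the executable's program header table using only what the kernel passed in the auxiliary vector; the helper names are made up for the example.

```c
#include <elf.h>        /* PT_LOAD, PT_INTERP, AT_* constants */
#include <link.h>       /* ElfW() picks Elf32_* or Elf64_* for this target */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval() */

/* Hypothetical helper: count the program headers of a given type in the
 * running executable. AT_PHDR is the table's address, AT_PHNUM its length. */
static size_t count_phdrs(unsigned int type)
{
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t n = getauxval(AT_PHNUM);
    size_t count = 0;

    if (!phdr)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (phdr[i].p_type == type)
            count++;
    return count;
}

/* A PT_INTERP header means the kernel was asked to load an interpreter
 * (the dynamic linker) alongside the program. */
static int has_interpreter(void)
{
    return count_phdrs(PT_INTERP) > 0;
}

/* Base address at which the kernel mapped that interpreter;
 * expected to be 0 when no interpreter was loaded. */
static unsigned long interpreter_base(void)
{
    return getauxval(AT_BASE);
}
```

A dynamic linker does essentially this at startup, except that it has to find the auxiliary vector on the initial stack by hand because no libc has been set up yet.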

https://lwn.net/Articles/519085/

reply

upvote

by nneonneo1 days ago|

[-]

For a fun example of a crash that can occur before main() even starts: https://stackoverflow.com/questions/12570374/floating-point-...

The poster was receiving a SIGFPE (floating point exception) on a C program that is simply “int main() { return 0; }”. A fun little mystery to dive into!

reply

upvote

by Animats1 days ago|

[-]

From the title, I thought this was going to be about the parts of a program that run before the main function is entered. Static objects have to be constructed. Quite a bit of code can run. Order of initialization can be a problem. What happens if you try to do I/O from a static constructor? Does that even work?

reply

upvote

by amitprasad1 days ago|

[-]

This is heavily language runtime dependent — there’s nothing that fundamentally stops you from doing anything during the phase between jumping to an entry point and the main()
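In C with GCC or Clang, one way to see this is the `constructor` attribute, a compiler extension rather than standard C. The sketch below records the order in which two such functions run and does some I/O from one of them; the names are invented, and the I/O working this early is an assumption that holds on glibc, where libc is initialised before the executable's own constructors.

```c
#include <stdio.h>

/* Zero-initialised statics are already in place before any constructor runs. */
static int init_order[2];
static int init_count;

/* Priorities 0..100 are reserved for the implementation; among user
 * constructors, lower numbers run first. */
__attribute__((constructor(101))) static void early_init(void)
{
    init_order[init_count++] = 101;
}

__attribute__((constructor(102))) static void late_init(void)
{
    init_order[init_count++] = 102;
    /* I/O before main(): fine here, but not something to rely on in
     * every runtime or language. */
    fputs("running before main()\n", stderr);
}
```

By the time `main()` starts, `init_count` is 2 and `init_order` holds 101 then 102. Across separate translation units without explicit priorities, the relative order is much less predictable, which is the ordering problem mentioned above.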

reply

upvote

by abnercoimbre1 days ago|

[-]

Indeed the craziest among us occasionally abuse this fact, so long as the compiler implementation lets us.

reply

upvote

by Animats1 days ago|

[-]

Right. This tends to come up with packages, which just by virtue of being loaded, set up to do something such as log, print, catch errors, or phone home to something.

reply

upvote

by bignerd_952 days ago|

[-]

As someone who teaches this stuff at university, I see students getting confused every single year by how textbooks draw memory. The problem is mostly visual, not conceptual.

Most diagrams in books and slides use an old hardware-centric convention: they draw higher addresses at the top of the page and lower addresses at the bottom. People sometimes justify this with an analogy like “floors in a building go up,” so address 0x7fffffffe000 is drawn “higher” than 0x400000.

But this is backwards from how humans read almost everything today. When you look at code in VS Code or any other IDE, line 1 is at the top, then line 2 is below it, then 3, 4, etc. Numbers go up as you go down. Your brain learns: “down = bigger index.”

Memory in a real Linux process actually matches the VS Code model much more closely than the textbook diagrams suggest.

You can see it yourself with:

cat /proc/$$/maps

(pick any PID instead of $$).

...

[0x00000000] lower addresses

...

[0x00620000] HEAP start

[0x00643000] HEAP extended ↓ (more allocations => higher addresses)

...

[0x7ffd8c3f7000] STACK top (<- stack pointer)

                  ↑ the stack pointer starts here and moves upward

                  (toward lower addresses) when you push

[0x7ffd8c418000] STACK start

...

[0xffffffffff600000] higher addresses

...

The output is printed from low addresses to high addresses. At the top of the output you'll usually see the binary, shared libs, heap, etc. Those all live at lower virtual addresses. Farther down in the output you'll eventually see the stack, which lives at a higher virtual address. In other words: as you scroll down, the addresses get bigger. Exactly like scrolling down in an editor gives you bigger line numbers.

The phrases “the heap grows up” and “the stack grows down” aren't wrong. They're just describing what happens to the numeric addresses: the heap expands toward higher addresses, and the stack moves into lower addresses.

The real problem is how we draw it. We label “up” on the page as “higher address,” which is the opposite of how people read code or even how /proc/<pid>/maps is printed. So students have to mentally flip the diagram before they can even think about what the stack and heap are doing.

If we just drew memory like an editor (low addresses at the top, high addresses further down) it would click instantly. Scroll down, addresses go up, and the stack sits at the bottom. At that point it’s no longer “the stack grows down”: it’s just the stack pointer being decremented, moving to lower addresses (which, in the diagram, means moving upward).
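The claim that the maps listing runs from low to high addresses is easy to check from inside a process. This is a small sketch, assuming Linux with /proc mounted; `count_ascending_mappings` is a hypothetical helper name.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read /proc/self/maps and verify that each mapping
 * starts at or after the previous one. Returns the number of mappings
 * seen, or -1 if the file cannot be read or the order is violated. */
static long count_ascending_mappings(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char chunk[512];
    uint64_t prev_start = 0, start, end;
    long count = 0;
    int at_line_start = 1;   /* guards against very long lines split by fgets */

    while (fgets(chunk, sizeof chunk, f)) {
        size_t len = strlen(chunk);
        /* Each line begins with "start-end" in hexadecimal. */
        if (at_line_start &&
            sscanf(chunk, "%" SCNx64 "-%" SCNx64, &start, &end) == 2) {
            if (count > 0 && start < prev_start) {
                fclose(f);
                return -1;
            }
            prev_start = start;
            count++;
        }
        at_line_start = (len > 0 && chunk[len - 1] == '\n');
    }

    fclose(f);
    return count;
}
```

Printing `start` for each line next to its pathname column gives the same top-to-bottom picture as the listing in this comment: binary and heap first, stack near the end.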

reply

upvote

by krackers1 days ago|

[-]

The stack does grow down though no matter what, in the sense that the pushing decrements the stack pointer. You can represent this as "up" in your diagram, but I don't think this makes it any easier conceptually because by analogy to a simple push/pop on an array, you'd naively expect higher addresses to contain more recent stack contents.

The core of the issue is that the direction of stack growth differs from "usual" memory access patterns, which usually allocate from lower to higher addresses (consider array access, or how strings are laid out in memory. And little-endian systems are the majority).

But if we're going with visualization options I prefer to visualize it horizontally, with lower addresses on left. This has a natural correspondence with how you access an array or lay out strings in memory.

reply

upvote

by bignerd_951 days ago|

[-]

Please try to draw, step by step, a process where lower addresses are at the top and higher addresses are at the bottom. You’ll see that this makes everything much easier to understand.

Do not confuse this with push and pop on an abstract stack data structure. That is not the same as the process stack. On a real process stack, newer data is stored at LOWER addresses. In fact, every push decrements the stack pointer (the address is decreased).

If you want an example, think about how a string is placed and accessed on the stack. First, the stack pointer is decremented to reserve space (so in my diagram this “moves up” visually). Then the string can be read byte by byte by incrementing an index from the lower address toward the higher address. This is exactly like reading a book: left to right, top to bottom. If you flip memory upside down, everything becomes unnatural to understand: you would have to read the string from the bottom to the top.

Try decompiling a program with Ghidra. Open the disassembly view and look at the addresses on the left. Lower addresses are shown at the top. Higher addresses are shown at the bottom. In my diagram this matches perfectly. Everything is consistent and you never have to mentally flip the memory layout.

Years of practice led me to this, not just theory.

by amitprasad 2 days ago

I think that while drawing that diagram I got stuck in the same rut I was in when I first learned about address spaces. I tend to agree that your model makes much more sense to the student.

Related: In notation, one thing that I used to struggle with is how addresses (e.g. 0xAB_CD) actually have the byte representation [0xCD, 0xAB] in memory. I wonder if there's a common way to address that?

by bignerd_95 2 days ago

If you're referring to little-endianness, it means the CPU stores multi-byte values in memory with the least significant byte first (at the lowest address).

On x86, this convention goes back to the early Intel chips and was kept for backward compatibility. It also has a practical benefit: it makes basic arithmetic and type widening cheaper in hardware. The "low" part of the value is always at the base address, so the CPU can load 8 bits, then 16 bits, then 32 bits, etc. starting from the same address without extra offset math.

So when you say an address like 0xABCD shows up in memory as [0xCD, 0xAB] byte-by-byte, that's not the address being "reversed". That's just the little-endian in-memory layout of that numeric value.

There are also big-endian architectures, where the most significant byte is stored at the lowest address. That matches how humans usually write numbers (0xABCD in memory as [0xAB, 0xCD]). But most mainstream desktop/server CPUs today are little-endian, so you mostly see the little-endian view.
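
A small sketch of both points, using Python's standard `struct` module. The values 0xABCD and 0x2A are just illustrative; the second half assumes a small value stored as a 32-bit little-endian integer and reads it back at narrower widths from the same offset.

```python
import struct

value = 0xABCD

little = struct.pack("<H", value)  # least significant byte at the lowest address
big = struct.pack(">H", value)     # most significant byte at the lowest address

print([hex(b) for b in little])  # ['0xcd', '0xab']
print([hex(b) for b in big])     # ['0xab', '0xcd']

# Widening: a small value stored as a 32-bit little-endian integer can be
# read as 8, 16 or 32 bits starting at the same offset (0).
small = struct.pack("<I", 0x2A)
as_u8 = struct.unpack_from("<B", small, 0)[0]
as_u16 = struct.unpack_from("<H", small, 0)[0]
as_u32 = struct.unpack_from("<I", small, 0)[0]
print(as_u8, as_u16, as_u32)  # 42 42 42

# In the big-endian layout the low byte sits at the highest offset instead.
small_be = struct.pack(">I", 0x2A)
print(small_be[0], small_be[3])  # 0 42
```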

by amitprasad 2 days ago

Not so much the confusion of what little endian is, but how we tend to represent it in notation. Of course this confusion was back when I was first learning things in high school, but I imagine I’m not alone in it

by bignerd_95 2 days ago

Yes, I reached the same conclusions the hard way while exploiting memory corruption bugs. Once I understood how misleading these representations can be, everything finally became clear.

About the address notation you're describing, I'm not sure I fully get the problem. Can you spell out the question with a concrete example?

This is what the address space of a real bash process looks like on my machine:

    $ cat /proc/$(pidof bash)/maps
    5e6e8fd0f000-5e6e8fd3f000 r--p 00000000 fc:00 3539412 /usr/bin/bash
    5e6e8fd3f000-5e6e8fe2e000 r-xp 00030000 fc:00 3539412 /usr/bin/bash
    5e6e8fe2e000-5e6e8fe63000 r--p 0011f000 fc:00 3539412 /usr/bin/bash
    5e6e8fe63000-5e6e8fe67000 r--p 00154000 fc:00 3539412 /usr/bin/bash
    5e6e8fe67000-5e6e8fe70000 rw-p 00158000 fc:00 3539412 /usr/bin/bash
    5e6e8fe70000-5e6e8fe7b000 rw-p 00000000 00:00 0
    5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]
    7ec3d1400000-7ec3d16eb000 r--p 00000000 fc:00 3550901 /usr/lib/locale/locale-archive
    7ec3d1800000-7ec3d1828000 r--p 00000000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1828000-7ec3d19b0000 r-xp 00028000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d19b0000-7ec3d19ff000 r--p 001b0000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d19ff000-7ec3d1a03000 r--p 001fe000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1a03000-7ec3d1a05000 rw-p 00202000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1a05000-7ec3d1a12000 rw-p 00000000 00:00 0
    7ec3d1a2b000-7ec3d1a84000 r--p 00000000 fc:00 3549063 /usr/lib/locale/C.utf8/LC_CTYPE
    7ec3d1a84000-7ec3d1a85000 r--p 00000000 fc:00 3549069 /usr/lib/locale/C.utf8/LC_NUMERIC
    7ec3d1a85000-7ec3d1a86000 r--p 00000000 fc:00 3549072 /usr/lib/locale/C.utf8/LC_TIME
    7ec3d1a86000-7ec3d1a87000 r--p 00000000 fc:00 3549062 /usr/lib/locale/C.utf8/LC_COLLATE
    7ec3d1a87000-7ec3d1a88000 r--p 00000000 fc:00 3549067 /usr/lib/locale/C.utf8/LC_MONETARY
    7ec3d1a88000-7ec3d1a89000 r--p 00000000 fc:00 3549066 /usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES
    7ec3d1a89000-7ec3d1a8a000 r--p 00000000 fc:00 3549070 /usr/lib/locale/C.utf8/LC_PAPER
    7ec3d1a8a000-7ec3d1a8b000 r--p 00000000 fc:00 3549068 /usr/lib/locale/C.utf8/LC_NAME
    7ec3d1a8b000-7ec3d1a8c000 r--p 00000000 fc:00 3549061 /usr/lib/locale/C.utf8/LC_ADDRESS
    7ec3d1a8c000-7ec3d1a8d000 r--p 00000000 fc:00 3549071 /usr/lib/locale/C.utf8/LC_TELEPHONE
    7ec3d1a8d000-7ec3d1a90000 rw-p 00000000 00:00 0
    7ec3d1a90000-7ec3d1a9e000 r--p 00000000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1a9e000-7ec3d1ab1000 r-xp 0000e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ab1000-7ec3d1abf000 r--p 00021000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1abf000-7ec3d1ac3000 r--p 0002e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ac3000-7ec3d1ac4000 rw-p 00032000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ac4000-7ec3d1ac5000 r--p 00000000 fc:00 3549065 /usr/lib/locale/C.utf8/LC_MEASUREMENT
    7ec3d1ac5000-7ec3d1ac6000 r--p 00000000 fc:00 3549064 /usr/lib/locale/C.utf8/LC_IDENTIFICATION
    7ec3d1ac6000-7ec3d1acd000 r--s 00000000 fc:00 3548984 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
    7ec3d1acd000-7ec3d1acf000 rw-p 00000000 00:00 0
    7ec3d1acf000-7ec3d1ad0000 r--p 00000000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1ad0000-7ec3d1afb000 r-xp 00001000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1afb000-7ec3d1b05000 r--p 0002c000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1b05000-7ec3d1b07000 r--p 00036000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1b07000-7ec3d1b09000 rw-p 00038000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]
    7ffd2678a000-7ffd2678e000 r--p 00000000 00:00 0 [vvar]
    7ffd2678e000-7ffd26790000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]

Each line is a memory mapping. The first field is the start address. The second field is the end address. So an entry like

7ffd266f8000-7ffd26719000

means "this mapping covers virtual addresses from 0x7ffd266f8000 up to, but not including, 0x7ffd26719000."

The addresses are always increasing:

- left to right: within a single line you go from lower address to higher address

- top to bottom: as you go down the list you also go to higher and higher addresses

Exactly like reading a book: left to right and then top to bottom.
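
If you want to check this ordering yourself, here is a minimal sketch that parses a few lines copied from the listing above. `parse_maps` is a hypothetical helper written for this comment, not an existing API; on a Linux machine you could feed it the contents of /proc/self/maps instead of the sample text.

```python
# Three lines copied from the bash listing above.
SAMPLE = """\
5e6e8fd0f000-5e6e8fd3f000 r--p 00000000 fc:00 3539412 /usr/bin/bash
5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]
7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text):
    """Return a list of (start, end, perms, name) tuples, one per mapping."""
    mappings = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields: address range, perms, offset, device, inode, optional name.
        fields = line.split(maxsplit=5)
        start_s, end_s = fields[0].split("-")
        name = fields[5] if len(fields) > 5 else ""
        mappings.append((int(start_s, 16), int(end_s, 16), fields[1], name))
    return mappings


maps = parse_maps(SAMPLE)
for start, end, perms, name in maps:
    size_kib = (end - start) // 1024
    print(f"{start:#x}-{end:#x} {perms} {size_kib:>6} KiB {name}")

# Within a line: start < end. Down the list: start addresses only increase.
starts = [m[0] for m in maps]
print(all(s < e for s, e, _, _ in maps), starts == sorted(starts))
```

For the [stack] line this gives a size of 0x21000 bytes (132 KiB), and both ordering checks print True for the sample.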

by 1718627440 2 days ago

The issue amitprasad is pointing out is when you read addresses byte-wise and you determine that they are in little-endian.

by 1718627440 2 days ago

That's how stacks on my desk grow, and how everything grows in reality. I wouldn't number stacked things on my desk from the top, since the top constantly changes. You also wouldn't call the top-most branch of a tree (the plant) the first one.

In your example, the "stack grows down" label seems to be wrong in the image.

by bignerd_95 2 days ago

Thanks! I tried to rewrite the final sentence

by 1718627440 2 days ago

Yeah, but does that really help? The phrases "growing down/up" still exist, and now you have defined them to mean the opposite. The issue hasn't gone away either, since heap and stack still grow in different directions. Can't you just start drawing from the bottom of the blackboard, and it will be obvious? Coordinate systems also typically work that way.

by bignerd_95 2 days ago

Yes, I draw the heap starting at the top of the board and the stack starting at the bottom of the board and grow them toward each other. That works fine in a one-off explanation.

The problem is that most textbooks draw the opposite, so the student leaves my lecture, opens a book or a slide deck, and now “down” means a different thing.

It gets worse when they get curious and look at a real process with /proc/<pid>/maps. Linux prints mappings from low address to high address as you scroll down (which matches my representation). That is literally reversed from the usual textbook diagram. Students notice and ask why the book is “wrong.”

So I've learned I have to explicitly call this out as notation.

Same story as electronics classes that still teach conventional current flow (positive to negative), even though electrons move the other way (negative to positive). Source: https://www.allaboutcircuits.com/textbook/direct-current/chp.... Historical convention, and then pedagogy has to patch it forever.

by ofalkaed 2 days ago

Starting at the bottom of the blackboard would be backwards from how it prints in the terminal when you cat /proc/<pid>/maps.

by 1718627440 1 day ago

The way it's printed in the terminal is honestly backwards to me. This seems to come only from the scroll direction of the terminal, and because this is not a drawing but a simple list. Every other tool, like a debugger, shows it in the opposite direction, and in all the illustrations I have seen it's that way too.

by bignerd_95 1 day ago

So you use debuggers. Good. Then you can confirm that the program counter is incremented after each instruction, and that you read assembly from top to bottom. That means smaller addresses are at the top and larger addresses are at the bottom. This matches my layout, and it also matches what you see in the terminal in /proc/<pid>/maps.

by 1718627440 1 day ago

Incrementation direction is orthogonal to up/down. When I disassemble a program, the addresses are all over the place and not ordered.

> That means smaller addresses are at the top and larger addresses are at the bottom.

No, my mental model is the exact opposite and this matches the jargon out there.

> also matches what you see in the terminal in /proc/<pid>/maps

I think of this as a sorted list, not as a display or description of a model.

When I drive on a road, I think of things on the road near me as having lower addresses in my coordinate system and things further away as having higher addresses. When I write a list of things I see, this will be from left to right and then from top to bottom on a sheet of paper, because it is a list that follows the writing direction of my language/script. When I look at a traffic sign, things nearer to me will be at the bottom and things far away at the top, because that's the agreed-upon mental model of a road. When I look at my navigation display, things near me will be in the center and things far away from me at the edge of the display.

When I write down points in the first quadrant of the coordinate system, I might order them by ascending x-coordinate, top to bottom. That doesn't mean I would draw the axis inverted.

The correspondence of physical addresses to position is entirely non-linear and also three-dimensional, so there is no natural top and bottom, especially when we are talking about virtual addresses.

When I get taught a new concept, I want to get to know the model everyone uses. I will not like a teacher who gives me an ordering that differs from how everyone else does it just because it is the output of some random command on some random OS, which actually shows a list, not a graph of a memory model. (Sorry, that's harsh; of course I still appreciate didactic simplification.)

Maybe the issue is that you consider the stack so important that it determines the model of the whole process space. When I draw a stack on its own, I DO draw it from top to bottom. But when I draw a whole process space, I do not, because everything else is mapped/allocated from bottom to top. When you invert the direction of the mental model, yes, the stack now grows from bottom to top. But the other things are now allocated from top to bottom instead. These are more things: the text, libraries, mmap'd files, and the most-used thing, the heap, are all allocated inverted now. And the most important thing, the index for all of that: the addresses now run from top to bottom.

by krackers 1 day ago

Yeah this is sort of the same objection I had in https://news.ycombinator.com/item?id=45709016

Although thinking about it more, the fact that address indexing is now from top-to-bottom is actually consistent with how I imagine indexing for Nx1 array (or equivalently, how you index matrices in math).

Thinking about it more, I don't think there is any real convention, even diagrams are split. You are going to have to learn to "flip" things either way, the same way matrix indexing differs from cartesian plane indexing. Same way different graphical systems have different conventions for where the origin is. People colloquially say things like "high mem", and you'll have to translate this to your mental model.

It's why I suggested a horizontal, left-to-right visualization. I think everyone would agree that lower addresses go on the left and higher addresses on the right.

Also could you elaborate more on what you mean by debuggers showing it the opposite direction? If you do `info proc mappings` in gdb it is also lower addresses at top. This might be debugger specific though.

by hagbard_c 2 days ago

On the subject of symbols:

> Yeah, that’s it. Now, 2308 may be slightly bloated because we link against musl instead of glibc, but the point still stands: There’s a lot of stuff going on behind the scenes here.

Slightly bloated is a slight understatement. The same program linked to glibc tops out at 36 symbols in .symtab:

    $ readelf -a hello|grep "'.symtab'"
    Symbol table '.symtab' contains 36 entries:

by amitprasad 2 days ago

Ah, I should have taken the time to verify; it might also have something to do with the way I was compiling / cross-compiling for RISC-V!

More generally, I'm not surprised at the symtab bloat from static linking, given the absolute size increase of the binary.

by itopaloglu83 2 days ago

I like doing this with old microcontrollers like the PIC16 series. You can see how the stack pointer, timers, variables, etc. are all configured.

by ramanvarma 2 days ago

did you see the relocations for the main binary applied before or after the linker resolves its own symbols? the ordering always feels like black magic when you step through it in a debugger

by yawpitch 1 day ago

You’ve got a broken link in your markdown, round about the phrase “lang_start function (defined here)”.

by matheusmoreira 1 day ago

Hacking this stuff is so fun!!

> Depending on your program, _start may be the only thing between the entrypoint and your main function

I once developed a liblinux project entirely built around this idea.

I wanted to get rid of libc and all of its initialization, complexity and global state. The C library is so complex it has a primitive form of package management built into it:

https://blogs.oracle.com/solaris/post/init-and-fini-processi...

So I made _start functions which did nothing but pass argc, argv, envp and auxv to the actual main function:

https://github.com/matheusmoreira/liblinux/blob/master/start...

https://github.com/matheusmoreira/liblinux/blob/master/start...
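
For readers who haven't seen what such a _start has to walk, here is a rough model of the initial stack Linux hands to a new process: argc, then the argv pointers, a NULL, the envp pointers, a NULL, and finally the auxiliary vector as (type, value) pairs ending with AT_NULL. This is a simulation, not the liblinux code: `parse_initial_stack`, the word list and the `strings` table (standing in for memory behind the pointers) are all made up for the sketch.

```python
AT_NULL = 0    # terminates the auxiliary vector
AT_PAGESZ = 6  # system page size
AT_ENTRY = 9   # entry point of the program


def parse_initial_stack(words, strings):
    """Walk machine words starting at the initial stack pointer.

    `strings` is a hypothetical address -> text table that stands in for
    dereferencing the argv/envp pointers.
    """
    i = 0
    argc = words[i]
    i += 1
    argv = [strings[p] for p in words[i:i + argc]]
    i += argc
    if words[i] != 0:
        raise ValueError("argv is not NULL-terminated")
    i += 1
    envp = []
    while words[i] != 0:
        envp.append(strings[words[i]])
        i += 1
    i += 1  # skip the NULL that ends envp
    auxv = {}
    while words[i] != AT_NULL:
        auxv[words[i]] = words[i + 1]
        i += 2
    return argc, argv, envp, auxv


# Made-up addresses and values, purely for illustration.
strings = {0x1000: "./hello", 0x1008: "world", 0x2000: "HOME=/root"}
words = [
    2, 0x1000, 0x1008, 0,   # argc, argv[0], argv[1], NULL
    0x2000, 0,              # envp[0], NULL
    AT_PAGESZ, 4096,        # auxv entries: (type, value)
    AT_ENTRY, 0x401000,
    AT_NULL, 0,
]

argc, argv, envp, auxv = parse_initial_stack(words, strings)
print(argc, argv, envp, auxv)
```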

You can get surprisingly far with just this, and it's actually possible to understand what's going on. Biggest pain point was the lack of C library utility functions like number/string conversion. I simply wrote my own.

https://github.com/matheusmoreira/liblinux/tree/master/examp...

Linux is the only mainstream operating system that lets us do this, because it treats the system call interface itself as stable. In other systems, the C library is part of the kernel interface. Bypassing it like this can and does break things. Go developers once discovered this the hard way.

https://www.matheusmoreira.com/articles/linux-system-calls

The kernel has its own nolibc infrastructure now, no doubt much better than my project.

https://github.com/torvalds/linux/tree/master/tools/include/...

I encourage everyone to use it.

Note also that _start is an arbitrary symbol. The name is not special at all. It's just some linker default. The ELF header contains a pointer to the entry point, not a symbol. Feel free to choose a nice name!
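
You can see that the header stores an address rather than a name with a few lines of Python. This is a minimal sketch: `elf_entry_point` is a hypothetical helper, and the header below is a synthetic 64-byte buffer with a made-up entry address, not a real binary. It relies on e_entry sitting at byte offset 24 in both the 32-bit and 64-bit ELF headers.

```python
import struct


def elf_entry_point(header):
    """Return e_entry from the first bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    fmt = endian + ("Q" if is_64 else "I")
    # e_ident (16 bytes) + e_type (2) + e_machine (2) + e_version (4) = 24
    return struct.unpack_from(fmt, header, 24)[0]


# Synthetic little-endian ELF64 header with a made-up entry address.
fake = bytearray(64)
fake[:4] = b"\x7fELF"
fake[4] = 2  # 64-bit
fake[5] = 1  # little-endian
struct.pack_into("<Q", fake, 24, 0x401000)

print(hex(elf_entry_point(bytes(fake))))  # 0x401000
```

On a Linux box you could pass it the first 64 bytes of a real executable (read in binary mode) and compare the result with the "Entry point address" line from `readelf -h`.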
