Are there any merits other than the availability of literals in C?

It seems like one of the worst data structures ever: the lookup complexity of a linked list, the expansion complexity of an array list, and security problems added as a bonus.

One I can think of is simplicity. There's no need to worry about what type the length should be (size_t?) or where it should be stored; you just pass around a pointer, and pointers fit in a CPU register most of the time. In my opinion the drawbacks (O(N) length computation, NUL forbidden inside strings, etc.) outweigh this benefit, but we are stuck with them: many kernel interfaces, like open and getdents, assume NUL-terminated strings, so any low-level language or library has to support them.
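
A minimal sketch of what that support costs, assuming a program that otherwise keeps strings as (pointer, length) pairs: before calling open(2) it has to copy the bytes into a NUL-terminated buffer, and it has to reject any path containing a NUL byte. `to_cstring` and `open_counted` are hypothetical helper names, not a real library API.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a counted (pointer, length) string into a
 * new NUL-terminated buffer that the caller must free(). Returns NULL
 * with errno = EINVAL if the input contains an embedded NUL (a C string
 * cannot represent it), or NULL with errno = ENOMEM if malloc fails. */
char *to_cstring(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    if (len > 0 && memchr(data, '\0', len) != NULL) {
        errno = EINVAL;
        return NULL;
    }
    char *s = malloc(len + 1);
    if (s == NULL) {
        errno = ENOMEM;
        return NULL;
    }
    if (len > 0)
        memcpy(s, data, len);
    s[len] = '\0';            /* the terminator open(2) and friends expect */
    return s;
}

/* Hypothetical wrapper: open(2) accepts only a NUL-terminated path, so a
 * counted path is copied and terminated first. Returns a file descriptor,
 * or -1 with errno set. */
int open_counted(const char *path, size_t len, int flags)
{
    char *cpath = to_cstring(path, len);
    if (cpath == NULL)
        return -1;
    int fd = open(cpath, flags);
    int saved = errno;        /* keep open's errno across free() */
    free(cpath);
    errno = saved;
    return fd;
}
```

The extra allocation and copy on every system call is exactly the friction that keeps NUL-terminated strings around in low-level code.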
But (i32 length, byte[] data) is no more complex than (byte[] data, '\0'); it's two parts either way. Granted, the terminator does allow arbitrarily long strings at the cost of a single byte. Besides the rarity of such a case, the "space savings" might matter on a PDP-11 or a Z80, but not on modern architectures that need structures aligned to 32- or even 64-bit boundaries. The efficiency and security costs far outweigh any savings in space or simplicity (heh) of processing.
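
A minimal sketch of the (length, data) alternative, assuming an unsigned 32-bit length instead of the i32 above (a negative length has no meaning). `counted_str`, `cs_new` and `cs_len` are hypothetical names. The length is read in constant time, and an embedded NUL is just another byte, whereas strlen() stops at the first one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: a 32-bit length followed by the bytes
 * (flexible array member). No terminator, so NUL is an ordinary byte. */
typedef struct {
    uint32_t len;
    unsigned char data[];
} counted_str;

/* Allocate a counted string holding a copy of `len` bytes. */
counted_str *cs_new(const void *bytes, uint32_t len)
{
    assert(bytes != NULL || len == 0);
    counted_str *s = malloc(sizeof *s + len);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len > 0)
        memcpy(s->data, bytes, len);
    return s;
}

/* O(1): the length is stored, not found by scanning for '\0'. */
uint32_t cs_len(const counted_str *s)
{
    return s->len;
}
```

The 4-byte header costs three bytes more than a terminator, which is the space trade-off being argued about here.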

Null-terminated strings are the other billion-dollar mistake, along with the original NULL.

It's fine as a serialization/deserialization primitive for on-disk files, as long as the NUL character is not valid inside the strings.

String tables in most object file formats work like that: a concatenated series of ASCIIZ strings. Each string costs one byte of overhead (the NUL), addressing a string requires only an offset into the table, and strings with common suffixes can be shared. It's a very compact layout.
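
To illustrate the layout, here is a hand-built table (made-up contents, not taken from a real object file). A symbol refers to its name by offset alone, so a tool that deduplicates names can point "main" at offset 2, inside "_main", instead of appending a new entry. `strtab_find` is a hypothetical, deliberately naive search; a production linker would use something faster.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative string table: concatenated NUL-terminated strings, with
 * an empty string at offset 0. Layout:
 *   0: ""   1: "_main"   7: "printf"
 * "main" needs no entry of its own: it is the suffix of "_main" at 2. */
static const char strtab[] = "\0_main\0printf";

/* Naive search: return the offset at which `name` already occurs as a
 * whole string or as the tail of a longer one, or -1 if absent. Every
 * position is safe to read as a C string because the table ends in NUL. */
long strtab_find(const char *tab, size_t tab_size, const char *name)
{
    assert(tab_size > 0 && tab[tab_size - 1] == '\0');
    for (size_t i = 0; i < tab_size; i++)
        if (strcmp(tab + i, name) == 0)
            return (long)i;
    return -1;
}
```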

Nothing prevents you from using a shared pool of strings that don't have a null terminator. It can even be more efficient, since there is no null byte to handle at the end of each string. Depending on the maximum string length you want to support, it doesn't even have to take more space.
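
One way to make the "doesn't have to take more space" point concrete, as a sketch under assumed limits (strings of at most 255 bytes, a pool of at most 16 MiB): pack a 24-bit offset and an 8-bit length into one 32-bit reference, the same size as a plain offset. Without terminators any substring can be shared, not just suffixes; "print" below reuses the start of "printf". The packing scheme and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical terminator-free pool: strings are simply concatenated
 * ("_main" at 0..4, "printf" at 5..10). The literal's own trailing NUL
 * is not part of the pool's contents. */
static const char pool[] = "_mainprintf";

/* Pack a 24-bit offset and an 8-bit length into 32 bits.
 * Limits: 16 MiB pool, 255-byte strings. */
uint32_t ref_pack(uint32_t offset, uint32_t len)
{
    assert(offset < (1u << 24) && len < (1u << 8));
    return (offset << 8) | len;
}

uint32_t ref_offset(uint32_t ref) { return ref >> 8; }
uint32_t ref_len(uint32_t ref) { return ref & 0xFFu; }

/* Compare a pooled string with a C string; no terminator is needed in
 * the pool because the reference carries the length. */
int ref_equals(const char *pool_base, uint32_t ref, const char *s)
{
    size_t n = strlen(s);
    return n == ref_len(ref) &&
           memcmp(pool_base + ref_offset(ref), s, n) == 0;
}
```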
How do you represent that pool of strings on-disk?

If we concatenate the raw strings without null terminators, either every string reference will need a length on top of the offset (a 25% size penalty for an Elf32_Sym, which would grow from 16 to 20 bytes), or we'll need a separate descriptor table of string offsets and lengths to index into.

If we prefix strings with a length (say, LEB128), we'll at best tie with null-terminated strings, since we'd spend a byte on the length instead of a byte on the terminator. At worst, the string table gets longer, because encoding a long string's length takes more than one byte, and we would lose the ability to share string suffixes.
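
The break-even point is easy to check with a small unsigned LEB128 encoder, a sketch written for this comparison rather than taken from any particular toolchain: lengths 0 to 127 fit in one byte, the same as a NUL terminator, while 128 to 16383 already take two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `value` as unsigned LEB128: 7 bits per byte, least significant
 * group first, high bit set on every byte except the last. `out` must
 * have room for 10 bytes (enough for any uint64_t). Returns bytes written. */
size_t uleb128_encode(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;     /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Size in bytes of a LEB128 length prefix; a NUL terminator is always 1. */
size_t leb128_prefix_size(uint64_t len)
{
    uint8_t buf[10];
    return uleb128_encode(len, buf);
}
```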

Of all the jank from a.out and COFF that ELF eliminated, this string table representation was kept (in fact, the only change was mandating a null byte at the beginning so that offset 0 indicates a null string). It has worked fine since the 1970s and doesn't cause undue problems, as nothing prevents a parser from handing std::string_view instead of const char* to the application code.
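
In C the same idea can be sketched as a lookup that returns a pointer-plus-length view instead of a bare const char*. `str_view` and `strtab_view` are hypothetical names; the point is that the parser checks once that the string is terminated inside the section, and application code then works with explicit lengths. Offset 0 yields the empty name that ELF reserves it for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A C stand-in for std::string_view: pointer plus length. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Hypothetical bounds-checked lookup in an ELF-style string table of
 * `size` bytes. Returns 0 and fills `*out` on success. Returns -1 if
 * `offset` is past the end, or if no NUL follows it inside the table
 * (a truncated or malicious file), so nothing is read out of bounds. */
int strtab_view(const char *tab, size_t size, uint32_t offset, str_view *out)
{
    assert(tab != NULL && out != NULL);
    if (offset >= size)
        return -1;
    const char *start = tab + offset;
    const char *nul = memchr(start, '\0', size - offset);
    if (nul == NULL)
        return -1;
    out->ptr = start;
    out->len = (size_t)(nul - start);
    return 0;
}
```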

Hearing someone mention FreeBASIC really brings me back. It was the first language I ever used pointers in.