undefined

upvote

points

by AKSF_Ackermann7 hours ago |

upvote

by jcalvinowens5 hours ago|

[-]

Don't ignore endianness. But making little endian the default is the right thing to do, it is so much more ubiquitous in the modern world.

The vast majority of modern network protocols use little endian byte ordering. Most Linux filesystems use little endian for their on-disk binary representations.

There is absolutely no good reason for networking protocols to be defined to use big endian. It's an antiquated arbitrary idea: just do what makes sense.

Use these functions to avoid ifdef noise: https://man7.org/linux/man-pages/man3/endian.3.html

reply

upvote

by drob5181 hours ago|

[-]

What do you mean by “networking protocols,” exactly? Most packet level Internet protocols (TCP, UDP, etc.) are big endian. Ethernet is big endian at the octet level and little endian on the wire at the bit level. Network order is big endian because it has to be something and it’s easier to draw pictures as a matrix of bytes that are transmitted from left to right and top to bottom. There is no right answer to endianness. It’s like which side of the road cars should drive on. You just need to pick one and stick with it. Mostly people bitch about endianness when their processor is the opposite of whatever someone else picked. But processors are all over the map. IBM mainframes are big endian. Motorola 68k is big. HP PA-RISC is big. IBM Power started big and then went bi. MIPS is bi. RISC-V is little. ARM is bi but dominantly little (AArch64). And of course x86 is little. So, take your pick. That said, little endianness is the right answer as is driving on the right side of the road.

reply

upvote

by hmry1 hours ago|

[-]

> RISC-V is little

These days it's bi, actually :) Although I don't see any CPU designer actually implementing that feature, except maybe MIPS (who have stopped working on their own ISA, and now want all their locked-in customers to switch to RISC-V without worrying about endianness bugs)

reply

upvote

by drob5181 hours ago|

[-]

Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big. But IIRC the standard profiles specify little, so it's pretty much always going to be little. But yea, technically speaking data load/store could be big. Maybe that's important for some embedded environments.

reply

upvote

by hmry36 minutes ago|

[-]

> Well, sort of. Instruction fetch is always little-endian but data load/store can be flipped into big

ARM works the same way. And SPARC is the opposite, instructions are always big-endian, but data can be switched to little-endian.

reply

upvote

by Veserv2 hours ago|

[-]

You should actually not use format-swapping operations.

You should actually use format-swapping loads/stores (i.e deserialization/serialization).

This is because your computer can not compute on values of non-native endianness. As such, the value is logically converted back and forth on every operation. Of course, a competent optimizer can elide these conversions, but such actions fundamentally lack machine sympathy.

The better model is viewing the endianness as a serialization format and converting at the boundaries of your compute engine. This ensures you only need to care about endianness when serializing and deserializing wire formats and that you have no accidental mixing of formats in your internals; everything has been parsed to native before any computation occurs.

Essentially, non-native endianness should only exist in memory and preferably only memory filled in by the outside world before being parsed.

reply

upvote

by cbmuser3 hours ago|

[-]

> What you should do instead is write all your code so it is little-endian only, as the only relevant big-endian architecture is s390x, and if someone wants to run your code on s390x, they can afford a support contract.

Or you can just be a nice person and make your code endian-agnostic. ;-)

reply

upvote

by addaon6 hours ago|

[-]

There's still at least one relevant big-endian-only ARM chip out there, the TI Hercules. While in the past five or ten years we've gone from having very few options for lockstep microcontrollers (with the Hercules being a very compelling option) to being spoiled for choice, the Hercules is still a good fit for some applications, and is a pretty solid chip.

reply

upvote

by j16sdiz6 hours ago|

[-]

If you comes to low level network protocol (e.g. writing a TCP stack), the "network byte order" is always big-endian.

reply

upvote

by edflsafoiewq5 hours ago|

[-]

That's a serialization format.

reply

upvote

by 7jjjjjjj4 hours ago|

[-]

It goes without saying that all binary network protocols should document their byte order, and that if you're implementing a protocol documented as big endian you should use ntohl and friends to ensure correctness.

However if designing a new network protocol, choosing big endian is insanity. Use little endian, skip the macros, and just add

  #ifndef LITTLE_ENDIAN
    #error

Or the like to a header somewhere.

reply

upvote

by AnthonyMouse3 hours ago|

[-]

What does it actually cost you to define a macro which is a no-op on little endian architectures and then use it at the point of serialization/deserialization?

reply

upvote

by kccqzy2 hours ago|

[-]

A lot because to the compiler a no-op macro is the same as not having the macro in the same place so it won’t catch cases where you should use the macro but didn’t. Then you just give yourself a false sense of security unless you actually test on big endian.

reply

upvote

by AnthonyMouse2 hours ago|

[-]

The article demonstrates how you can run your existing test suite on big endian with a few simple commands. Or you can just wait until someone actually wants to use it there, they run your program or test suite on their actual big endian machine and then you get a one-line pull request for the place you forgot to use the macro.

Adding other architectures to your build system also tends to reveal nasty bugs in general, e.g. you were unknowingly triggering UB on all architectures but on the one you commonly use it causes silent data corruption whereas one with a different memory layout results in a much more conspicuous segfault.

reply

upvote

by whizzter4 hours ago|

[-]

And honestly at this point it's mostly a historical artifact, if we write that kind of stuff then sure we need to care but to produce modern stuff is a honestly massive waste of time at this point.

FWIW I doing hobby-stuff for Amiga's (68k big-endian) but that's just that, hobby stuff.

reply

upvote

by skrtskrt6 hours ago|

[-]

Prometheus index format is also a big-endian binary file - haven’t found any reference to why it was chosen.

reply

upvote

by GandalfHN5 hours ago|

[-]

Outsourcing endianness pain to your customers is an easy way to teach them about segfaults and silent data corruption. s390x is niche, endian bugs are not.

Network protocols and file formats still need a defined byte order, and the first time your code talks to hardware or reads old data, little-endian assumptions leak all over the place. Ignoring portability buys you a pile of vendor-specific hacks later, because your team will meet those 'irrelevant' platforms in appliances, embedded boxes, or somebody else's DB import path long before a sales rep waves a support contract at you.

reply

upvote

by AKSF_Ackermann5 hours ago|

[-]

Not sure why you consider that to be an issue, if you need to interact with a format that specifies values to be BE, just always byte-swap. And every appliance/embedded box i had to interact with ran either x86 or some flavour of 32-bit arm (in LE mode, of course).

reply

upvote

by adrian_b3 hours ago|

[-]

Endianness problems should have been solved by compilers, not by programmers.

Most existing CPUs, have instructions to load and store memory data of various sizes into registers, while reversing the byte order.

So programs that work with big-endian data typically differ from those working with little-endian data just by replacing the load and store instructions.

Therefore you should have types like int16, int32, int64, int16_be, int32_be, int64_be, for little-endian integers and big-endian integers and the compiler should generate the appropriate code.

At least in the languages with user-defined data types and overloadable operators and functions, like C++, you can define these yourself, when the language does not provide them, instead of using ugly workarounds like htonl and the like, which can be very inefficient if the compiler is not clever enough to optimize them away.

reply

upvote

by 5 hours ago|

[-]

deleted

reply

upvote

by 7jjjjjjj5 hours ago|

[-]

Assuming an 8-bit byte used to be a "vendor specific hack." Assuming twos complement integers used to be a "vendor specific hack." When all the 36-bit machines died, and all the one's complement machines died, we got over it.

That's where big endian is now. All the BE architectures are dying or dead. No big endian system will ever be popular again. It's time for big endian to be consigned to the dustbin of history.

reply

upvote

by namibj3 hours ago|

[-]

JS numbers behave much more like C's definition of signed overflow being UB as it's signed numbers are effectively like 51-ish bit with a SEPARATE sign bit and non-assiciative behavior when overflow happens.

reply

upvote

by cmrdporcupine4 hours ago|

[-]

> No big endian system will ever be popular again

Cries in 68k nostalgia

reply

upvote

by zephen3 hours ago|

[-]

> It's time for big endian to be consigned to the dustbin of history.

And, especially what most people call big-endian, which is a bastardized mixed-endian mess of most significant byte is zero, while least significant bit is likewise zero.

reply

upvote

by sllabres4 hours ago|

[-]

Not only the System/390. Its also IBM i, AIX, and for many protocols the network byte order. AFAIK the binary data in JPG (1) and Java Class [2] files a re big endian. And if you write down a hexadecimal number as 0x12345678 you are writing big-endian.

(1) for JPG for embedded TIFF metadata which can have both.

[2] https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.ht...

reply

upvote

by hmry4 hours ago|

[-]

The endianness of file formats and handwriting is irrelevant when it comes to deciding whether your code should support running on big-endian CPUs.

The only question that matters: Do your customers / users want to run it on big-endian hardware? And for 99% of programmers, the answer is no, because their customers have never knowingly been in the same room as a big-endian CPU.

reply

upvote

by socalgal22 hours ago|

[-]

I'm with you this. I lived through the big endian / little endian hell in the 80/90s. Little endian won. Anyone making a big endian architechture at this point would be shooting themselves in the foot because off all the incompatibilities. Don't make things more complicated.

In fact, I'd be surprised if you made a big endian arch and then ran a browser on it if some large number of websites would fail because they used typedarrays and aren't endian aware.

The solution is not to ask every programmer in the universe to write endian aware code. The solution is to standardize on little endian

reply

upvote

by classichasclass1 hours ago|

[-]

We already know that's the case. I had to add little endian typed array emulation to TenFourFox.

reply

upvote

by nyrikki6 hours ago|

[-]

The linked to blog post in the OP explains this better IMHO [0]:

   If the data stream encodes values with byte order B, then the algorithm to decode the value on computer with byte order C should be about B, not about the relationship between B and C.

One cannot just ignore the big/little data interchange problem MacOS[1], Java, TCP/IP, Jpeg etc...

The point (for me) is not that your code runs on a s390, it is that you abstract your personal local implementation details from the data interchange formats. And unfortunately almost all of the processors are little, and many of the popular and unavoidable externalization are big...

[0] https://commandcenter.blogspot.com/2012/04/byte-order-fallac... [1] https://github.com/apple/darwin-xnu/blob/main/EXTERNAL_HEADE...

reply

upvote

by whizzter4 hours ago|

[-]

MacOS "was" big-endian due to 68k and later PPC cpu's (the PPC Mac's could've been little but Apple picked big for convenience and porting).

Their x86 changeover moved the CPU's to little-endian and Aarch64 continues solidifies that tradition.

Same with Java, there's probably a strong influence from SPARC's and with PPC, 68k and SPARC being relevant back in the 90s it wasn't a bold choice.

But all of this is more or less legacy at this point, I have little reason to believe that the types of code I write will ever end up on a s390 or any other big-endian platform unless something truly revolutionizes the computing landscape since x86, aarch64, risc-v and so on run little now.

reply

upvote

by adrian_b3 hours ago|

[-]

To cope with data interchange formats, you need a set of big endian data types, e.g. for each kind of signed or unsigned integer with a size of 16 bits or bigger you must have a big endian variant, e.g. identified with a "_be" suffix.

Most CPUs (including x86-64) have variants of the load and store instructions that reverse the byte order (e.g. MOVBE in x86-64). The remaining CPUs have byte reversal instructions for registers, so a reversed byte order load or store can be simulated by a sequence of 2 instructions.

So the little-endian types and the big-endian data types must be handled identically by a compiler, except that the load and store instructions use different encodings.

The structures used in a data-exchange format must be declared with the correct types and that should take care of everything.

Any decent programming language must provide means for the user to define such data types, when they are not provided by the base language.

The traditional UNIX conversion functions are the wrong way to handle endianness differences. An optimizing compiler must be able to recognize them as special cases in order to be able to optimize them away from the machine code.

A program that is written using only data types with known endianness can be compiled for either little-endian targets or big-endian targets and it will work identically.

All the problems that have ever existed in handling endianness have been caused by programming languages where the endianness of the base data types was left undefined, for fear that recompiling a program for a target of different endianness could result in a slower program.

This fear is obsolete today.

reply

upvote

by userbinator1 hours ago|

[-]

Linus agrees: https://www.phoronix.com/news/Torvalds-No-RISC-V-BE

reply

upvote

by EPWN3D6 hours ago|

[-]

I mostly agree, but network byte ordering is still a thing.

reply

upvote

by bear86426 hours ago|

[-]

> the only relevant big-endian architecture is s390x

The adjacent POWER architecture is also still relevant - but as you say, they too can afford a support contract.

reply

upvote

by AKSF_Ackermann5 hours ago|

[-]

The adjacent POWER architecture seems to be used in ppc64le mode these days.

reply

upvote

by classichasclass3 hours ago|

[-]

For Linux, yes. AIX and IBM i still run big.

reply

upvote

by namibj3 hours ago|

[-]

The latter can definitely afford a support contract.

reply