undefined

points

[-]

Yikes. That would lose the ability to know the meaning of the current bytes, or misinterpret them badly, if you happen to get one critical byte dropped or mangled in transmission. At least UTF-8 is self-syncing: if you end up starting to read in the middle of a non-rewindable stream whose beginning has already passed, you can identify the start of the next valid codepoint sequence unambiguously, and then end up being able to sync up with the stream, and you're guaranteed not to have to read more than 4 bytes (6 bytes when UTF-8 was originally designed) in order to find a sync point.

But if you have to rely on a byte that may have already gone past? No way to pick up in the middle of a stream and know what went before.

by 41 minutes ago|

parent|

[-]

deleted

by yencabulator1 hours ago|

prev|

[-]

A huge, central, part of UTF-8 design is that you can start decoding it from any arbitrary offset, it is self-aligning.