> It does not matter which is the relationship between the sizes of such types, there will always be values of the operand that cannot be represented in the result.

Hmm? Seems to me that unsigned -> larger signed works, although other conversions may not.

But yes, I generally agree that these are terrible conversions to do implicitly, given that the entire point of those types is to control the interpretation of memory at a bits-and-bytes level. Languages where implicit numeric conversions make sense are generally not languages that care so much about integer size, and the entire point of having unsigned types is to bake that range constraint in.

reply
You could just use -Wsign-conversion.
reply
Obviously, that should always be used, along with the compiler options for checking integer overflow and out-of-bounds accesses.

However, this kind of implicit conversion really needs to be forbidden by the standard, because the correct program source differs from the one the standard permits.

When you activate most compiler options that detect undefined behaviors, the correct program source remains the same, even if the compiler now implements a better behavior for the translated program than the minimal behavior specified by the standard.

That happens because most undefined behaviors are detected at run time. Incorrect implicit conversions, on the other hand, are a property of the source code, which can always be detected during compilation, so such programs should be rejected outright.

reply
Integer overflow and out-of-bounds accesses must be checked at run time, which makes the program slower. It looks like -Wsign-conversion can be checked at compile time, perhaps with a few false positives where the numbers are "always" small enough.

Does it also complain when the assigned variable is big enough to avoid the problem? Does the compiler generate slower code with the explicit conversions?

It looks like a nice task to compile major projects with -Wsign-conversion and send PRs fixing the warnings. (Assuming there are only a few, let's say 5. Sending an uninvited PR with a thousand changes will make the maintainers unhappy.)

reply
The standard will not forbid anything that breaks billions of lines of code still being used and maintained.

But it is easy enough to use modern tooling and coding styles to deal with signed overflow. Nowadays, silent unsigned wraparound causing logic errors is the more vexing issue, which suggests the undefined behavior actually helps rather than hurts when paired with good tooling.

reply
> which indicates the undefined behavior actually helps rather than hurts when used with good tooling

No, one doesn't need undefined behavior for that at all (which does hurt).

What actually helps is diagnosing the issue, just like one can diagnose the unsigned case just fine (which is not UB).

Instead, for this sort of thing, C could have "Erroneous Behavior", like Rust has (C++ also added it, recently).

Of course, existing ambiguous C code will remain tricky. What matters, after all, is having ways to express what we are expecting in the source code, so that a reader (whether tooling, humans or LLMs) can rely on that.

reply
Silent unsigned wraparound is caused by another mistake of the C language (and of all later languages inspired by C): there is only a single kind of unsigned type.

The hardware of modern CPUs actually implements 5 distinct data types that must be declared as "unsigned" in C: non-negative integers, integer residues a.k.a. modular integers, bit strings, binary polynomials and binary polynomial residues.

A modern programming language would ideally have all 5 distinct types, but it must at least have distinct types for non-negative integers and for integer residues. Several programming languages provide at least this distinction. The other data types would be more difficult to support in a high-level language, as they rely on machine instructions that compilers typically do not know how to use.

The change in the C standard that redefined "unsigned" to mean integer residue has left the language without any means to specify a data type for non-negative integers, which is extremely wrong, because more programs use "unsigned" for non-negative integers than use it for integer residues.

The hardware of most CPUs implements non-negative integers very well, so non-negative integer overflow is easily detected, but the current standard makes it impossible to use that hardware support.

reply
> CPUs actually implements 5 distinct data types

Yes, that's true, but the registers themselves are untyped: what modern CPUs really implement is multiple instruction semantics over the same bit patterns. In short: same bits, five algebras! Each algebra is given by a different set of instructions operating on the same bit patterns.

Here is an example, the bit pattern 1011:

• as a non-negative integer: 11. ISA operations: Arm UDIV, RISC-V DIVU, x86 DIV

• as an integer residue mod 16: the class [11] in Z/16Z. ISA operations: Arm ADD, RISC-V ADD/ADDI, x86 ADD

• as a bit string: bits 3, 1, and 0 are set. ISA operations: Arm EOR, RISC-V ANDI/ORI/XORI, x86 AND.

• as a binary polynomial: x^3 + x + 1. ISA operations: Arm PMULL, RISC-V clmul/clmulh/clmulr, x86 PCLMULQDQ

• as a binary polynomial residue modulo, say, x^4 + x + 1: the residue class of x^3 + x + 1 in GF(2)[x] / (x^4 + x + 1). ISA operations: Arm CRC32* / CRC32C*, x86 CRC32, RISC-V clmulr

And actually ... floating-point numbers also use the same bit patterns and could, in principle, reside in the same registers. On modern ISAs, though, floats usually get a distinct register file.

In C, you select among these algebras by applying different operators and functions to the same bit patterns we declare as unsigned.

reply
There are other languages such as Ada that allow you to more precisely specify such things. Before requesting many new types for C, one should clarify why those languages did not already replace C.

I agree though that using "unsigned" for non-negative integers is problematic and that there should be a way to specify non-negative integers. I would be fine with an attribute.

The problem is also that the standard committee is not the ruling body of the C language. It is the place where people come together to negotiate some minimal requirements. If you want something, you need to first convince the compiler vendors to implement it as an extension.

reply
deleted
reply
Those billions of lines are already broken by definition.
reply
Sure, buddy.
reply
> It does not matter which is the relationship between the sizes of such types, there will always be values of the operand that cannot be represented in the result.

It's not that bad actually; not "always". The only nontrivial case is when, as a part of the usual arithmetic conversions, you (perhaps unwittingly) convert a signed integer type to an unsigned integer type [*], and the original value was negative.

[*] This can happen in two cases (paraphrasing the standard):

- if the operand that has unsigned integer type has rank greater than or equal to the rank of the signed integer type of the other operand,

- if the operand that has signed integer type has rank greater than or equal to the rank of the unsigned integer type of the other operand, but the signed integer type cannot represent all values of the unsigned integer type.

Examples: (a) "unsigned int" vs. "signed int"; (b) "long signed int" vs. "unsigned int" in a POSIX ILP32 programming environment. Under (a), you get conversion to "unsigned int"; under (b), you get conversion (for both operands) to "long unsigned int".

Section "3.2 Conversions | 3.2.1 Arithmetic operands | 3.2.1.1 Characters, and integers" in the C89 Rationale <https://www.open-std.org/Jtc1/sc22/WG14/www/C89Rationale.pdf> is worth reading. (An updated version of the same section is included in the C99 Rationale <https://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.1...> under 6.3.1.1.)

It deals precisely with the problem highlighted in the blog post. I'll quote just the beginning and the end:

> Since the publication of K&R, a serious divergence has occurred among implementations of C in the evolution of integral promotion rules. Implementations fall into two major camps, which may be characterized as unsigned preserving and value preserving. [...]

> The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.

> QUIET CHANGE -- A program that depends upon unsigned preserving arithmetic conversions will behave differently, probably without complaint. This is considered the most serious semantic change made by the Committee to a widespread current practice.

reply