Very few software engineers understand that tolerances are fundamental.
In the physical sciences, strict equality - of actual quantities, not variables in equations - is almost never a thing. Even though an equation might show that two theoretical quantities are exactly equal, the practical fact is that every quantity begins life as a measurement, and measurements have inherent uncertainty.
Even in design work, nothing is exact. It's simply not possible. A resistor's value is subject to manufacturing tolerances, will vary with temperature, and can drift as it ages. A mechanical part will also have manufacturing tolerance, and changes size and shape with temperature, applied forces, and wear. So even if a spec sheet states an exact number, the heading or notes will tell you that this is a nominal value under specific conditions. (Those conditions are also impossible to achieve and maintain exactly for all the same reasons.)
Even the voltages that represent 0 and 1 inside a computer aren't exact. Digital parts like CPUs, GPUs, RAM, etc. specify low and high thresholds, under or over which a voltage is considered a 0 or 1.
Floating-point numbers have uses outside the physical sciences, so there's no one-size-fits-all approach to using them correctly. But if you are writing code that deals with physical quantities, making equality comparisons is almost always going to be wrong even if floating-point numbers had infinite precision and no rounding error. Physical quantities simply can't be used that way.
The thing with computational geometry is, that its usually someone else's geometry, i.e you have no control over its quality or intention. In other words, whether two points or planes or lines actually align or align within 1e-4 is no longer really mathematically interesting because its all about the intention of the user: does the user think these planes overlap?.
This is why most geometry kernels (see open cascade) sport things like "fuzzy boolean operations" [0]) that lean into epsilons. These epsilons mask the error-prone supply chain of these meshes that arrive in your program by allowing some tolerance.
Finally, the remark "There are many ways of solving this problem" is also overly reductive, everyone reading here should really understand that this is a topic that is being actively researched right now in 2026, hence there are currently no blessed solutions to this problem, otherwise this research would not be needed. Even more so, to some extent this problem is fundamentally unsolvable depending on what you mean by "solvable", because your input is inexact not all geometrical operations are topologically valid, hence an "exact" or let alone "correct along some dimension" result cannot be achieved for all (combination of) inputs.
[0] https://dev.opencascade.org/content/fuzzy-boolean-operations
They don’t just lean into epsilons, the session context tolerance is used for almost every single point classification operation in geometric kernels and many primitives carry their own accumulating error component for downstream math.
Even then the current state of the art (in production kernels) is tolerance expansion where the kernel goes through up to 7 expansion steps retrying point classification until it just gives up. Those edge cases were some of the hardest parts of working on a kernel.
This is a fundamentally unsolvable problem with floating point math (I worked on both Parasolid and ACIS in the 2000s). Even the ray-box intersection example TFA gives is a long standing thorn - raytracing is one of the last fallbacks for nasty point classification problems.
Could you point to any literature/freely available resource that comes close to the SOTA for these kinds of operations? I would be greatly helped.
It's a fundamentally unsolvable problem with B-reps! The problem completely disappears with F-reps. (In exchange for some other difficult problems).
Ahhaha.
(I used to work in nTop, and boy is this an understatement when it comes to field based solid modeling)
The GP wasn't wrong. To "lean in" means to fully commit to, go all in on, (or, equivalently, go all out on).
I'm wondering if people have heard the expression "leaning in" from people who were insincere/lying, and assumed that that was what the phrase means?
The Rust code in the assert_f64_eq macro is:
if (a >= b && a - b < f64::EPSILON) || (a <= b && b - a < f64::EPSILON)
I'm the author of the Rust assertables crate. It provides floating-point assert macros much as described in the article.https://github.com/SixArm/assertables-rust-crate/blob/main/s...
If there's a way to make it more precise and/or specific and/or faster, or create similar macros with better functionality and/or correctness, that's great.
See the same directory for corresponding assert_* macros for less than, greater than, etc.
It's defined as the difference between 1.0 and the smallest number larger than 1.0. More usefully, it's the spacing between adjacent representable float numbers in the range 1.0 to 2.0.
Because floats get less precise at every integer power of two, it's impossible for two numbers greater than or equal to 2.0 to be epsilon apart. The spacing between 2.0 and the next larger number is 2*epsilon.
That means `abs(a - b) <= epsilon` is equivalent to `a == b` for any a or b greater than or equal to 2.0. And if you use `<` then the limit will be 1.0 instead.
Epsilon is the wrong tool for the job in 99.9% of cases.
Multiplying epsilon by the largest number you are dealing with is a strategy that makes using epsilons at least somewhat logical.
So I'd probably rewrite that code to first find the ulp of the larger of the abs of a and b and then assert that their difference is less than or equal to that.
Edit: Or maybe the smaller of the abs of the two, I haven't totally thought through the consequences. It might not matter, because the ulps will only differ when the numbers are significantly apart and then it doesn't matter which one you pick. Perhaps you can just always pick the first number and get its ULP.
IIRC it would compute the "dynamic" epsilon value essentially by adding one to the mantissa (treated as an integer) to get the next possible float. Then subtract from that the initial value to get the dynamic epsilon value.
Definitely use library functions if you got 'em though.
epsilons are fine in the case that you actually want to put a static error bound on an equality comparison. numpy's relative errors are better for floats at arbitrary scales (https://numpy.org/doc/stable/reference/generated/numpy.isclo...).
edit: ahh i forgot all about ulps. that is what people often confuse ieee eps with. also, good background material in the necronomicon (https://en.wikipedia.org/wiki/Numerical_Recipes).
EPSILON = (1 ulp for numbers in the range [1, 2)). is a lousy choice for tolerance. Every operation whose result is in the range [1, 2) has a mathematical absolute error of ½ ulp. Doing just a few operations in a row has a chance to make the error term larger than your tolerance, simply because of the inherent inaccuracy of floating-point operations. Randomly generate a few doubles in the range [1, 10], then randomize the list and compute the sum of different random orders in the list, and your assertion should fail. I'd guess you haven't run into this issue because either very few people are using this particular assertion, or the people who do happen to be testing it in cases where the result is fully deterministic.
If you look at professional solvers for numerical algorithms, one of the things you'll notice is that not only is the (relative!) tolerance tunable, but there's actually several different tolerance values. The HiGHS linear solver for example uses 5 different tolerance values for its simplex algorithm. Furthermore, the default values for these tolerances tend to be in the region of 10^-6 - 10^-10... about the square root of f64::EPSILON. There's a basic rule of thumb in numerical analysis that you need your internal working precision to be roughly twice the number of digits as your output precision.
I would suggest that "equals" actually is for "exactly equals" as in (a == b). In many pieces of floating point code this is the correct thing to test. Then also add a function for "within range of" so your users can specify an epsilon of interest, using the formula (abs(a - b) < eps). You may also want to support multidimensional quantities by allowing the user to specify a distance metric. You probably also want a relative version of the comparison in addition to an absolute version.
Auto-computing epsilons for an equality check is really hard and depends on the usage, as well as the numerics of the code that is upstream and downstream of the comparison. I don't see how you would do it in an assertion library.
// Precise matching:
assert_f64_eq!(a, 0.1, Steps(2))
// same as: assert!(a == 0.1.next_down().next_down())
// Number of digits (after period) that are matching:
assert_f64_eq!(a, 0.1, Digits(5))
// Relative error:
assert_f64_eq!(a, 0.1, Rel(0.5))The usual pattern is abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol) to avoid both large-value and near-zero pitfalls.
https://github.com/python/cpython/blob/d61fcf834d197f0113a6a...
For my own soft-floating point math library, I expect the value is off by a some percentage, not just off by epsilon. And so I have my own almostSame method [1] which accounts for that and is quite a bit more complex. Actually multiple such methods. But well, that's just my own use case.
[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...
let y = 2.0;
let x = sqrt(y);
Now is `x` actually the square root of 2? Of course not - because the digit expansion of sqrt(2) doesn't terminate, the only way to precisely represent it is with symbolics. So what do we actually have? `x` was either rounded up or down to a number that does have an exact FP representation. So, `x` / sqrt(2) is in `[1 - eps, 1 + eps]`. The eps tells you, on a relative scale, the maximum distance to an adjacent FP number for any real number. (Full disclosure, IDK how this interacts with weird stuff like denormals).Note that in general we can only guarantee hitting this relative error for single ops. More elaborate computations may develop worse error as things compound. But it gets even worse. This error says nothing about errors that don't occur in the machine. For example, say I have a test that takes some experimental data, runs my whiz-bang algorithm, and checks if the result is close to elementary charge of an electron. Now I can't just worry about machine error but also a zillion different kinds of experimental error.
There are also cases where we want to enforce a contract on a number so we stay within acceptable domains. Author alluded to this. For example - if I compute some `x` s.t. I'm later going to take `acos(x)`, `x` had better be between `[-1, 1]`. `x >= -1 - EPS && x <= 1 + EPS` wouldn't be right because it would include two numbers, -1 - EPS and 1 + EPS, that are outside the acceptable domain.
- "I want to relax exact equality because my computation has errors" -> Make `assert_rel_tol` and `assert_abs_tol`.
- "I want to enforce determinism" -> exact equality.
- "I want to enforce a domain" -> exact comparison
Your code here is using eps for controlling absolute error, which is already not great since eps is about relative error. Unfortunately your assertion degenerates to `a == b` for large numbers but is extremely loose for small numbers.
https://numpy.org/doc/stable/reference/generated/numpy.allcl...
You probably also want an isclose and probably want to push most users toward using that.
if a.abs()+b.abs() >= (a-b).abs() * 2f64.powi(48)
It remains accurate for small and for big numbers. 48 is slightly less than 52.
Fix your precision so it matches.
You only need so many significant digits.
‘a.to_bits() == b.to_bits()’
Alternatively, use ‘partial_eq’ and fall back to bit equality if it returns None.
C++ implements this https://en.cppreference.com/cpp/numeric/math/nextafter
Rust does not https://rust-lang.github.io/rfcs/3173-float-next-up-down.htm... but people have in various places.
Rust's https://doc.rust-lang.org/std/primitive.f32.html#method.next... https://doc.rust-lang.org/std/primitive.f32.html#method.next... of course exist, they're even actually constant expressions (the C++ functions are constexpr since 2023 but of course you're not promised they actually work as constant expressions because C++ is a stupid language and "constexpr" means almost nothing)
You can also rely on the fact (not promised in C++) that these are actually the IEEE floats and so they have all the resulting properties you can (entirely in safe Rust) just ask for the integers with the same bit pattern, compare integers and because of how IEEE is designed that tells you how far away in some proportional sense, the two values are.
On an actual CPU manufactured this century that's almost free because the type system evaporates during compilation -- for example f32::to_bits is literally zero CPU instructions.
>Currently it is not possible to answer the question ‘which floating point value comes after x’ in Rust without intimate knowledge of the IEEE 754 standard.
So nevermind on it not being present in Rust I guess I was finding old documentation
By 2025 every remaining question about edge cases or real world experience was resolved and in April 2025 the finished feature was stabilized in release 1.86, so it just works in Rust since about a year.
For future reference you can follow separate links from a Rust RFC document to see whether the project took this RFC (anybody can write one, not everything gets accepted) and then also how far along the implementation work is. Can I use this in nightly? Maybe there's an outstanding question I can help answer. Or, maybe it's writing a stabilization report and this is my last chance to say "Hey, I am an expert on this and your API is a bit wrong".
IIRC this was not ALWAYS the case, on x86 not too long ago the CPU might choose to put your operation in an 80-bit fp register, and if due to multitasking the CPU state got evicted, it would only be able to store it in a 32-bit slot while it's waiting to be scheduled back in?
It might not be the case now in a modern system if based on load patterns the software decides to schedule some math operations or another on the GPU vs the CPU, or maybe some sort of corner case where you are horizontally load balancing on two different GPUs (one AMD, one Nvidia) -- I'm speculating here.
Eventually I learned about the 80-bit thing and that macos gcc was automatically adding a -ffloat-store to make == more predictable (they use a floats everywhere in the UI library). Since pdftotext was full of == comparisons, I ended up adding a -ffloat-store to the gcc command line and calling it a day.
I don't think the CPU was ever allowed to do that, but with your average compiler you were playing with fire.
Did any actual OS mess up state like that? They could and should save the full registers. There's even a bultin instruction for this, FSAVE.
The same series of operations with the same input will always produce exactly the same floating point results. Every time. No exceptions.
Hardware doesn't matter. Breed of CPU doesn't matter. Threads don't matter. Scheduling doesn't matter. IEEE floating point is a standard. Everyone follows the standard. Anything not producing indentical results for the same series of operations is *broken*.
What you are referring to is the result of different compilers doing a different series of operations than each other. In particular, if you are using the x87 fp unit, MSVC will round 80-bit floating point down to 32/64 bits before doing a comparison, and GCC will not by default.
Compliers doesn't even use 80-bit FP by default when compiling for 64 bit targets, so this is not a concern anymore, and hasn't been for a very long time.
- NaN bits are non-deterministic. x86 and ARM generate different sign bits for NaNs. Wasm says NaN payloads are completely unpredictable.
- GPUs don't give a shit about IEEE-754 and apply optimizations raging from DAZ to -ffast-math.
- sin, rsqrt, etc. behave differently when implemented by different libraries. If you're linking libm for sin, you can get different implementations depending on the libc in use. Or you can get different results on different hardware.
- C compilers are allowed to "optimize" a * b + c to FMA when they wish to. The standard only technically allows this merge within one expression, but GCC enables this in all cases by default on some `-std`s.
You're technically correct that floats can be used right, but it's just impossible to explain to a layman that, yes, floats are fine on CPUs, but not on GPUs; fine if you're doing normal arithmetic and sqrt, but not sin or rsqrt; fine on modern compilers, but not old ones; fine on x86, but not i686; fine if you're writing code yourself, but not if you're relying on linear algebra libraries, unless of course you write `a * b + c` and compile with the wrong options; fine if you rely on float equality, but not bitwise equality; etc. Everything is broken and the entire thing is a mess.
I still think it's important to fight the misinformation.
Programmers have been conditioned to be so afraid of floats that many believe that doing a + b has an essentially random outcome when it doesn't work that way at all. It leads people to spend a bunch of effort on things that they don't need to be doing.
Physical quantities involve imprecision: measurement devices, tools, display devices, ADC/DACs etc... They all have some tolerances. And when you are using epsilons, the epsilon value should be chosen based on that physical value. For example, you set the epsilon to 1e-4 because that's 100 microns and you can't display 100 micron details.
That's also the reason why there is not one size fits all solution. If you are working with microscopic objects, 100 microns is huge, and if you are doing a space simulation, 1 km may be negligible. Some operations involve a huge loss of precision, some don't, and sometimes you really want exact numbers and therefore you have to know your fractional powers of 2.
I only played it rather than modded it, so happy to be corrected or further enlightened, but seems like an interesting problem to have to solve.
Edit: sure enough, it was actually discussed here: https://news.ycombinator.com/item?id=26938812
Float is also fantastic for depth values precisely because they have more precision towards the origin, basically quasi-logarithmic precision. Having double the precision at half the distance is A+. At least if you're writing software rasterizers and do linear depth. The story with depth buffer precision in GPU pipelines with normalized depth and and hyperbolic distribution is...sad.
If "geometry" refers to the geometry of an affine space, i.e. a space of points, then indeed there is nothing special about any point that is chosen as the origin and no reason do desire lower tolerances for the coordinates of points close to the current origin.
Therefore for the coordinates of points in an affine space, using fixed-point numbers would be a better choice. There are also other quantities for which usually floating-point numbers are used, despite the fact that fixed-point numbers are preferable, e.g. angles and logarithms.
On the other hand, if you work with the vector space associated to an affine space, i.e. with the set of displacements from one point to another, then the origin is special, i.e. it corresponds with no displacement. For the components of a vector, floating-point numbers are normally the right representation.
So for the best results, one would need both fixed-point numbers and floating-point numbers in a computer.
These were provided in some early computers, but it is expensive to provide hardware for both, so eventually hardware execution units were provided only for floating-point numbers.
The reason is that fixed-point numbers can be implemented in software with a modest overhead, using operations with integer numbers. The overhead consists in implementing correct rounding, keeping track of the position of the fraction point and doing some extra shifting when multiplications or divisions are done.
In languages that allow the user to define custom types and that allow operator overloading and function overloading, like C++, it is possible to make the use of fixed-point numbers as simple as the use of the floating-point numbers.
Some programming languages, like Ada, have fixed-point numbers among the standard data types. Nevertheless, not all compilers for such programming languages include an implementation for fixed-point numbers that has a good performance.
And since they're essentially the same, there just aren't many situations where implementing your own fixed point is worth it. MCUs without FPUs are increasingly uncommon. Financial calculations seem to have converged on Decimal floating point. Floating point determinism is largely solved these days. Fixed point has better precision at a given width, but 53 vs 64 bits isn't much different for most applications. If you happen to regularly encounter situations where you need translation invariants across a huge range at a fixed (high) precision though, fixed point is probably more useful to you.
The applications where the difference does not matter are those whose accuracy requirements are much less than provided by the numeric format that is used.
When using double-precision FP64 numbers, the rounding errors are frequently small enough to satisfy the requirements of an application, regardless if those requirements are specified as a relative error or as an absolute error.
In such cases, floating-point numbers must be used, because they are supported by the existing hardware.
But when an application has more strict requirements for the maximum absolute error, there are cases when it is preferable to use smaller fixed-point formats instead of bigger floating-point formats, especially when FP64 is not sufficient, so quadruple-precision floating-point numbers would be needed, for which there is only seldom hardware support, so they must be implemented in software anyway, preferably as double-double-precision numbers.
i.e. the difference between having a limit for the absolute error or for the relative error.
The masking procedure I mentioned gives uniform absolute error in floats, at the cost of lost precision in the significand. The trade-off between the two is really space and hence precision.I'm not saying fixed point is never useful, just that it's a very situational technique these days to address specific issues rather than an alternative default. So if you aren't even doing numerical analysis (as most people don't), you should stick with floats.
Double precision floating point is like a 54-bit fixed point system that automatically scales to the exact size you need it to be. You get huge benefits for paying those 10 exponent bits. Even if you need those extra bits, you're often better off switching to a higher precision float or a double-double system.
If all you're comparing is the result from the same operations, you _may_ be fine using equality, but you should really know that you're never getting a number from an uncontrolled source.
Done some reading. Thanks to the article to waking me up to this fact at least. I didn't realize that the epsilon provided by languages tends to be the one that only works around 1.0, and if you want to use episilons globally (which the article would say is generally a bad idea) you need to be more dynamic as your ranges, and potential errors, increase.
I was arguing that we could squeeze a tiny bit more precision out of our angle types by storing angles in radians (range: -π to π) instead of degrees (range: -180 to 180) because when storing as degrees, we were wasting a ton of floating point precision on angles between -1° and 1°.
No matter what scale you pick, your number line is going to look like this: https://anniecherkaev.com/images/floating_point_density.jpg Do a 2x zoom in or out and not a single pixel of the graph will change, just the labels.
Whether your biggest value is 0.005 or 7000000, most of your range has 25 (or 54) bits of precision. 99% of values are either too small to matter or outside your range. Changing your scale shifts between the "too small" and "too big" categories, but the number of useful values stays roughly the same.
In those old FP formats, the product of the smallest normalized and non-null FP number with the biggest normalized and non-infinite FP number was approximately equal to 1.
However in the IEEE standard for FP arithmetic, it was decided that overflows are more dangerous than underflows, so the range of numbers greater than 1 has been increased by diminishing the range of numbers smaller than 1.
With IEEE FP numbers, the product of the smallest and biggest non-null non-infinite numbers is no longer approximately 1, but it is approximately 4.
So there are more numbers greater than 1 than smaller than 1. For IEEE FP numbers, there are approximately as many numbers smaller than 2 as there are numbers greater than 2.
An extra complication appears when the underflow exception is masked. Then there is an additional set of numbers smaller than 1, the denormalized numbers. Those are not many enough to compensate the additional numbers bigger than 1, but with those the mid point is no longer at 2, but somewhere between 1 and 2, close to 1.5.
This is just wrong? The largest Float64 is 1.7976931348623157e308 and the smallest is 5.0e-324 They multiply to ~1e-16.
That's a subnormal [1]. The smallest normal double is 2.22507e-308:
DBL_MIN = 2.22507e-308
DBL_TRUE_MIN = 4.94066e-324
[1] https://en.wikipedia.org/wiki/Subnormal_numberHowever, for angles the relative error is completely irrelevant. For angles only the absolute error matters.
For angles the optimum representation is as fixed-point numbers, not as floating-point numbers.
Even though the first number is smaller than the 2nd one, they actually represent the same angle once you consider that they are different units. So there's no precision advantage (absolute or relative) to converting degrees to radians.
Note that I'm not saying anything about fixed vs floating point, only responding to an earlier comment that radians give more precision in floating point representation.
Changing the unit gives the illusion of changing absolute error, but doesn't actually change the absolute error.
Re: teaching floats; when I was working with students, we touch on floats slightly, but mostly just to reinforce the idea that they aren’t always exact. I think, realistically, it can be hard. You don’t want to put an “intro to numerical analysis” class into the first couple lectures of your “intro to programming” class, where you introduce the data-types.
Then, if you are going to do a sort of numerical analysis or scientific computing class… I dunno, that bit of information could end up being a bit of trivia or easily forgotten, right?
> > the myth about exactness is that you can't use strict equality with floating point numbers because they are somehow fuzzy. They are not.
> They are though. All arithmetic operations involve rounding, so e.g. (7.0 / 1234 + 0.5) * 1234 is not equal to 7.0 + 617 (it differs in 1 ULP). On the other hand, (9.0 / 1234 + 0.5) * 1234 is equal to 9.0 + 617, so the end result is sometimes exact and sometimes is not. How can you know beforehand which one is the case in your specific case? Generally, you can't, any arithmetic operation can potentially give you 1 ULP of error, and it can (and likely, will) slowly accumulate.
Also, please don't comment how nobody has a use for "f(x) = (x / 1234 + 0.5) * 1234": there are all kinds of queer computations people do in floating point, and for most of them, figuring out the exactness of the end result requires an absurd amount of applied numerical analysis, doing which would undermine most of the "just let the computer crunch the numbers" point of doing this computation on a computer.
I think if you want to work with values that might be exactly equal to other values, floating point is simply not the right choice. For money, use BigDecimal or something like that. For lots of purposes, int might be more appropriate. If you do need floating point, maybe compare whether the value is larger than the other value.
_Bool equiv(float x, float y) {
return (x <= y && y <= x)
|| (x != x && y != y);
}
which both handles NaNs sensibly (all NaNs are equivalent) and won't warn about using == on floats. I find it also easy to remember how to write when starting a new project.It's a classic, so let's take vector normalization as an example. Topologically, you're ripping a hole in the space, and that's causing your issues. It manifests as NaN for length-zero vectors, weird precision issues too close to zero, etc, but no matter what you employ to try to fix it you're never going to have a good time squishing N-D space onto the surface of an N-D sphere if you need it to be continuous.
Some common subroutines where I see this:
1. You want to know the average direction of a bunch of objects and thus have to normalize each vector contributing to that average. Solution 1: That's not what you want almost ever. In any of the sciences, or anything loosely approximating the real world, you want to average the un-normalized vectors 99.999% of the time. Solution 2: Maybe you really do need directions for some reason (e.g., tracking where birds are looking in a game). Then don't rely on vectors for your in-band signaling. Explicitly track direction and magnitude separately and observe the magic of never having direction-related precision errors.
2. You're doing some sort of lighting normalization and need to compute something involving areas of potentially near-degenerate triangles, dividing by those values to weight contributions appropriately. Solution: Same as above, this is kind of like an average of averages problem. It can make fuzzy, intuitive sense, but you'll get better results if you do your summing and averaging in an un-normalized space. If you really do need surface normals, store those explicitly and separate from magnitude.
3. You're doing some sort of ML voodoo to try to get better empirical results via some vague appeal to vanishing gradients or whatever. Solution: The core property you want is a somewhat strange constraint on your layer's Jacobian matrix, and outside of like two papers nobody is willing to put up with the code complexity or runtime costs, even when they recognize it as the right thing to do. Everything you're doing is a hack anyway, so make your normalization term x/(|x|+eps) with eps > 0 rather than equal to zero like normal. Choose eps much smaller than most of the vectors you're normalizing this way and much bigger than zero. Something like 1e-3, 1e-20, and 1e-150 should be fine for f16, f32, and f64. You don't have to tune because it's a pretty weak constraint on the model, and it's able to learn around it.
> The title of this post is an intentional clickbait.
Unfortunately, that's where the honesty ends.
> It's NOT OK to compare floating-points using epsilons.
> So, are epsilons good or bad? Usually bad, but sometimes okay.
So which is it? Emphatically NOT OK, or sometimes okay?
bool DawsonCompare(float af, float bf, int maxDiff)
{
int ai = *reinterpret_cast<int*>(&af);
int bi = *reinterpret_cast<int*>(&bf);
if (ai < 0)
ai = 0x80000000 - ai;
if (bi < 0)
bi = 0x80000000 - bi;
int diff = ai - bi;
if (abs(diff) < maxDiff)
return true;
return false;
}Signed zero and the sign-magnitude representation is more of an issue, but can be resolved by XORing the sign bit into the mantissa and exponent fields, flipping the negative range. This places -0 adjacent to 0 which is typically enough, and can be fixed up for minimal additional cost (another subtract).
func equiv(x, y float32, ignoreBits int) bool {
mask := uint32(0xFFFFFFFF) << ignoreBits
xi, yi := math.Float32bits(x), math.Float32bits(y)
return xi&mask == yi&mask
}
with the sensitivity controlled by ignoreBits, higher values being less sensitive.Supposing y is 1.0 and x is the predecessor of 1.0, the smallest value of ignoreBits for which equiv would return true is 24.
But a worst case example is found at the very next power of 2, 2.0 (bitwise 0x40000000), whose predecessor is quite different (bitwise 0x3FFFFFFF). In this case, you'd have to set ignoreBits to 31, and thus equivalence here is no better than checking that the two numbers have the same sign.
I'm unconvinced. Doesnt this just replace the need to choose a suitable epsilon with the need to choose the right number of bits to strip? With the latter affording much fewer choices for degree of "roughness" than does the former.
I think I'll just use scaled epsilon... though I've gotten lots of performance wins out of direct bitwise trickery with floats (e.g., fast rounding with mantissa normalization and casting).
This breaks down across the positive/negative boundary, but honestly, that's probably a good property. -0.00001 is not all that similar to +0.00001 despite being close on the number line.
It also requires that the inputs are finite (no INF/NAN), unless you are okay saying that FLT_MAX is roughly equal to infinity.
double freq = getInput(0);
if (freq != mLastFreq) {
calculateCoefficients(freq);
mLastFreq = freq;
}
Also, keep in mind that certain languages, such as JS, store all numbers as double-precision floating point numbers. So every time you are writing a numeric for-loop in JS you are implicitly relying on floating point equality :)Wait is British "maths" a singular noun or is this a typo? I was willing to go along with it if it was plural, but I have to draw the line here.
However, nowadays a word like physics is understood not as "natural things", but as an implicit abbreviation for "the science of natural things". Similarly for mathematics, mechanics, dynamics and so on.
So such nouns are used as singular nouns, because the implicit noun "science" is singular.
dpps xmm0, xmm0, 0x17 ; dot product of 3 lanes, write lane 0
sqrtss xmm0, xmm0
ret
And is considerably faster than the fancy version, mainly because Intel still hasn't given us horizontal-max vector instruction! ARM is a bit better in that regard with their fancy vmaxvq_f32 and vmaxnmvq_f32...Anything else is basically a nightmare to however has to maintain the code in the future.
Also, good luck with e.g. checking if points are aligned to a grid or the like without introducing a concept of epsilon _somewhere_.
This is especially true in cases where the number comes from some kind of input (user controls, sensor reading, etc.) or random number generation.