No real numbers

Integer addition

We start with a scenario where all variables are int and we add two positive numbers.

void foo(int x, int y) {
  if (x <= 0 || y <= 0) return;
  int z = x + y;

Assume you evaluate the following expressions afterwards. Select all which can be true!

z > x

z < x

z == x

z != x

Something else

If you did not already know, you are now aware of overflows. The range of integers is limited, typically to 32 or 64 bits. So if you try to compute a larger number, the CPU cannot represent it and the number overflows into something different.

How can you detect or avoid overflows? The first idea is "if the result is negative...". Avoid that! Once the operation has happened, you already have undefined behavior. So how do you check before the operation? A good check is if (x > INT_MAX - y). This is not done by default because it comes with a serious performance hit. Instead of a single add instruction, the CPU also does checking and branching. GCC and Clang come with builtins for overflow checking, such as __builtin_add_overflow.
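To make the two options concrete, here is a minimal sketch of the manual pre-check next to the GCC/Clang builtin. The function names add_positive_checked and add_builtin_checked are made up for this illustration, and the builtin variant assumes a reasonably recent GCC or Clang.

```c
#include <limits.h>
#include <stdbool.h>

/* Manual pre-check for two positive ints: the addition only happens
   once we know the result fits. Returns false if it would overflow. */
bool add_positive_checked(int x, int y, int *sum) {
  if (x > INT_MAX - y) return false; /* x + y would exceed INT_MAX */
  *sum = x + y;
  return true;
}

/* The same idea with the compiler builtin. __builtin_add_overflow
   returns true when the mathematical result does not fit into *sum,
   so the result is negated to keep "true means success". */
bool add_builtin_checked(int x, int y, int *sum) {
  return !__builtin_add_overflow(x, y, sum);
}
```

Both variants report the problem through the return value instead of computing an overflowed result. The builtin also covers negative operands, which the manual check above does not.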

Let's try again with overflow check:

void foo(int x, int y) {
  if (x <= 0 || y <= 0) return;
  if (x > INT_MAX - y) return; // overflow
  int z = x + y;

Assume you evaluate the following expressions afterwards. Select all which can be true!

z > x

z < x

z == x

z != x

Something else

(Technically, the case z==x is possible in case the CPU uses saturating arithmetic. This is useful for digital signal processors but so rare that I ignored it in the test.)

There are also "underflows" since there is a boundary in the other direction as well. You can handle these in the same way.
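A sketch of what "the same way" could look like when both operands may have any sign. The name add_checked is hypothetical and the code is only one possible way to write the check.

```c
#include <limits.h>
#include <stdbool.h>

/* Checks both boundaries before adding. Returns false if x + y would
   leave the range of int, otherwise stores the sum and returns true. */
bool add_checked(int x, int y, int *sum) {
  if (y > 0 && x > INT_MAX - y) return false; /* result too large */
  if (y < 0 && x < INT_MIN - y) return false; /* result too small */
  *sum = x + y;
  return true;
}
```

The subtraction inside each check is safe because of the sign test in front of it: INT_MAX - y cannot overflow for positive y, and INT_MIN - y cannot overflow for negative y.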

Floating point addition

As an alternative to integers, programmers can use IEEE 754 floating point numbers. Of course, you keep overflows in mind now, but do you know how they show up?

void foo(double x, double y) {
  if (x <= 0.0 || y <= 0.0) return;
  double z = x + y;

Assume you evaluate the following expressions afterwards. Select all which can be true!

z > x

z < x

z == x

z != x

Trap

At this point you know that floating point operations may or may not trap depending on the mode. With glibc, the GNU extension feenableexcept lets you change the behavior. C99 itself only standardizes the exception flags in fenv.h.
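For the non-trapping case, here is a small sketch of two outcomes that tend to surprise people. It assumes IEEE 754 doubles, the default rounding mode and no extended-precision evaluation. The helper names are invented for this example.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True if adding y did not change x at all. This happens when y is
   too small to be visible at the magnitude of x, or when x is
   already infinite. */
bool sum_equals_first(double x, double y) {
  double z = x + y;
  return z == x;
}

/* True if the sum left the finite range. Floating point overflow
   does not wrap around, it ends up at infinity. */
bool sum_is_infinite(double x, double y) {
  double z = x + y;
  return isinf(z);
}
```

Under these assumptions, sum_equals_first(1e17, 1.0) is true because neighboring doubles near 1e17 are 16 apart, and sum_is_infinite(DBL_MAX, DBL_MAX) is true as well.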

Now let us invert the question and ask for "false". Just because I ask the question, you probably suspect that the answer is not simply the previous one inverted. So be careful. The trap option is missing because we already know that traps are possible.

void foo(double x, double y) {
  if (x <= 0.0 || y <= 0.0) return;
  double z = x + y;

Assume you evaluate the following expressions afterwards. Select all which can be false!

z > x

z < x

z == x

z != x

Not a Number (NaN) is something you need to be aware of when dealing with floating point. Every comparison with NaN is false, except != which is always true, so most of the answers above are trivial. NaN also slips past the <= 0.0 guard for the same reason. Even x == x is false for NaN, which makes it the trick to check for NaN. Let us use it.

void foo(double x, double y) {
  if (x <= 0.0 || y <= 0.0) return;
  if (! (x == x && y == y)) return;
  double z = x + y;

Assume you evaluate the following expressions afterwards. Select all which can be false!

z > x

z < x

z == x

z != x
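The self-comparison trick can be packaged as below. The second helper is a hypothetical stricter guard based on isfinite from math.h, added here because the NaN check alone still lets infinity through. Both function names are made up for this sketch.

```c
#include <math.h>
#include <stdbool.h>

/* NaN is the only value that compares unequal to itself. */
bool is_nan_by_self_compare(double x) {
  return x != x;
}

/* Hypothetical stricter guard: accepts only finite, strictly positive
   values. isfinite() rejects NaN as well as both infinities. */
bool is_positive_finite(double x) {
  return isfinite(x) && x > 0.0;
}
```

Two caveats: compiler options such as -ffast-math allow the compiler to assume NaN never occurs, which can remove the self-comparison, and two finite inputs can still add up to infinity.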

This concludes the pitfalls with addition. That leaves the other operations: subtraction, multiplication, division, modulo. We are not going to consider them because it is the same drill. Only a few special topics remain.

Division by Zero

Everybody knows division by zero is not allowed. For integers it is undefined behavior in C, and on common CPUs such as x86 you can assume a trap happens, but floats have additional options. So what happens if we try?

void foo(double x, double y) {
  double z = x / y;

Assume y is zero. Select all true statements!

Always traps.

Never traps.

z is NaN assuming no trap.

z is not negative assuming no trap and x positive.

z is not NaN assuming no trap and x not NaN.

Division by zero is undefined for real numbers and integers but IEEE 754 defines something. The question about good defaults remains. Are mistakes too easy such that we should enable exceptions by default? Are infinity and NaN values like the others or are they more dangerous than 42E10 for example?
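Here is a sketch of what IEEE 754 defines for the non-trapping case. It assumes the compiler follows IEEE 754 for double (Annex F in C terms) and that exceptions are not enabled. The enum and the function name are invented for this illustration.

```c
#include <math.h>

typedef enum { DIV_FINITE, DIV_POS_INF, DIV_NEG_INF, DIV_NAN } div_kind;

/* Performs the division and reports what kind of value came out. */
div_kind classify_division(double x, double y) {
  double z = x / y;
  if (isnan(z)) return DIV_NAN;
  if (isinf(z)) return z > 0.0 ? DIV_POS_INF : DIV_NEG_INF;
  return DIV_FINITE;
}
```

With y being zero, a positive x gives DIV_POS_INF and a zero x gives DIV_NAN. The sign of the zero matters too: classify_division(1.0, -0.0) gives DIV_NEG_INF, so a positive x does not guarantee a non-negative result.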

Equality Checks

People often advise you to not check floating point values for equality. Try it:

bool equalsDotThree(double x) { return x == 0.3; }

Select all expressions which evaluate to true when used as parameter for equalsDotThree.

0.3

0.3f

0.9 / 0.3

0.1 + 0.1 + 0.1

0.3 - 0.1 + 0.1

0.3 + 0.1 - 0.1

You have to consider the precision when checking for equality. So instead of x == y you test for fabs(x - y) < epsilon. The value of epsilon depends on the context. There are probably utility libraries around which provide a bool isZero(double x) function with a hardcoded epsilon inside. Do not try this at home! Sometimes epsilon should be 0.1 and sometimes rather 1E-10. This cannot be decided generically.
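A minimal sketch of such a comparison, with epsilon as an explicit parameter so that the caller has to pick it for the context. The name approx_equal is an assumption, not something from a particular library.

```c
#include <math.h>
#include <stdbool.h>

/* True if a and b differ by less than epsilon. The caller chooses
   epsilon. NaN inputs always yield false because the comparison
   with NaN is false. */
bool approx_equal(double a, double b, double epsilon) {
  return fabs(a - b) < epsilon;
}
```

For example, approx_equal(0.1 + 0.1 + 0.1, 0.3, 1e-9) holds, while approx_equal(0.3f, 0.3, 1e-10) does not because the float literal is off by roughly 1E-8.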

Literals

So is it better to use less-than/greater-than everywhere with floating point? Let us try that with another tricky case: converting floating point to integers. Since you are very careful by now, of course you think about a range check before you cast.

uint32 toUnsigned(float32 x) {
  if (x < 0.0F) return 0;
  if (x > 4294967295.0F) return 4294967295;
  return (uint32) x;
}

Select all true statements!

Underflow is properly checked.

Overflow is properly checked.

The integer conversion is done correctly.

Undefined behavior is possible.

The problem is that 4294967295 cannot be precisely represented in a 32-bit float. Instead it gets represented as 4294967296.0F, which is one more. For this number the if condition is false, so it gets to the cast and the result is undefined. A simple fix in this case would be to use >=. Note that a NaN input also fails both checks and reaches the cast.
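One possible corrected version is sketched below, using stdint.h types in place of uint32 and float32. Mapping NaN to 0 is an arbitrary choice made for this example, not something the original function specifies.

```c
#include <stdint.h>

/* Clamps x into the range of uint32_t before converting.
   Assumption for this sketch: NaN is mapped to 0. */
uint32_t to_unsigned(float x) {
  /* Negated >= so that NaN, for which every ordered comparison is
     false, takes this branch together with the negative values. */
  if (!(x >= 0.0f)) return 0;
  /* 4294967296.0f is exactly 2^32, the first float that does not fit.
     This branch also catches +infinity. */
  if (x >= 4294967296.0f) return UINT32_MAX;
  return (uint32_t)x;
}
```

The largest float below 2^32 is 4294967040.0f, which converts without problems, so every value that reaches the cast is in range.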

This article does not intend to teach everything about integer and floating point arithmetic. Try this paper instead. Here, I only try to make you respect the fact that the numbers are not real (unless you answered everything correctly).

Thanks Manfred, Artjom and Christoph for valuable feedback.