Tuesday, November 14, 2006

Back to the compiler...

Building a compiler seems to be endlessly entertaining. There is always something to keep me busy - particularly so because I haven't really got a clue.

Here's one you may enjoy.

Working with a signed short, defined to be 2 bytes long, the range is -32768 (0x8000) to 32767 (0x7fff). What happens when you add one to the maximum value? How about subtract one from the minimum value? Multiply by 2? Divide by 2?

Simple, right? Well...

32767 + 1 = -32768
-32768 - 1 = 32767
-32768 * 2 = 0
-32768 * 3 = -32768
-32768 / 2 = -16384

Most of those should at least make you frown. It all makes sense and comes down to the fact that nobody wants to actually do a bounds check for every single computation done by a compiled program.

Starting with 32767 + 1. 32767 is, in hex 0x7fff, or 0b0111 1111 1111 1111 (binary). Add one to that, and you get 0b1000 0000 0000 0000. The high bit in a signed operation means negative. We just flipped from positive territory into negative territory. 0x8000 is the same value as -32768. Run that in reverse for -32768 - 1 becoming 32767.

-32768 * 2 is, essentially a bit-shift to the left. So, -32768 = 0x8000 = 0b1000 0000 0000 0000. Shifting the bit pattern to the left one bit yields 0b0000 0000 0000 0000, and this is easy, that is 0x0 or 0. Times three is the same as -32768*2 + -32768*1, so, by substitution, that becomes 0 + -32768*1, or -32768 (again the "times 2" part is shifted out of our value).

-32768 / 2, comes out right simply because dividing comes down to shifting to the right. So, 0b1000 0000 0000 0000 shifted to the right becomes 0b1100 0000 0000 0000 (the high bit is replaced), and we get the right answer of -16384.

Cool, huh? Real numbers have similar "gotcha's" that also make sense with a little thinking.

No comments: