Item 43585361

zabzonk • 7 days ago

I've never understood floating point :-)

djmips • 7 days ago

Fixed point is where the number has a predetermined number of bits for the integer and the fraction, like 8.8, where you have 0-255 for the integer and the fraction goes from 0 to 255/256 in steps of 1/256.
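To make the 8.8 layout concrete, here is a minimal sketch in Python. The helper names (`to_fixed_8_8`, `from_fixed_8_8`) are made up for illustration, and it assumes an unsigned 16-bit value with the integer in the high byte and the fraction in the low byte.

```python
FRACTION_BITS = 8            # the ".8" in 8.8
SCALE = 1 << FRACTION_BITS   # 256 steps per integer unit


def to_fixed_8_8(value: float) -> int:
    """Encode a non-negative number as an unsigned 16-bit 8.8 value."""
    raw = round(value * SCALE)
    if not 0 <= raw <= 0xFFFF:
        raise OverflowError("value does not fit in unsigned 8.8")
    return raw


def from_fixed_8_8(raw: int) -> float:
    """Decode an 8.8 value back to an ordinary number."""
    return raw / SCALE


raw = to_fixed_8_8(3.75)
print(hex(raw))              # 0x3c0
print(raw >> 8, raw & 0xFF)  # 3 192  (integer part 3, fraction 192/256 = 0.75)
print(from_fixed_8_8(raw))   # 3.75
```

The point never moves: the low 8 bits are always the fraction, which is exactly the limitation floating point removes.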

Floating point at its simplest just makes that a variable. So the position of the point (.) is stored as a separate number. Now instead of being fixed, it floats around.

This way you can put more in the integer or more in the fraction.

The Microsoft Basic here used 23 bits for the number, 1 sign bit and 8 bits to say where the floating point should be placed.
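For a feel of what a 1 + 8 + 23 bit split looks like, here is a rough sketch that pulls those three fields out of an IEEE 754 single-precision value. Note the assumption: this is the IEEE layout, not Microsoft's own format, whose bit order and exponent bias may differ. `split_float32` is a made-up helper name.

```python
import struct


def split_float32(x: float):
    """Return (sign, exponent, fraction) of x as an IEEE 754 single."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 bits, the leading 1 is implicit
    return sign, exponent, fraction


print(split_float32(1.0))    # (0, 127, 0)
print(split_float32(-6.5))   # (1, 129, 5242880)
```

-6.5 is -1.625 × 2², so the stored exponent is 127 + 2 = 129 and the fraction field holds 0.625 × 2²³.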

Of course in practice you have to deal with a lot of details, depending on how robust you want your system to be. This Basic was not as robust as modern IEEE 754, but it did the job.

Reading more about IEEE 754 is a fascinating way to learn about modern floating point. I also recommend Bruce Dawson's observations on his Random ASCII blog.

codedokode • 7 days ago

Let's say we want to store numbers in computer memory but we are not allowed to use decimal point or any characters except for digits. We need to make some system to encode and decode real numbers as a sequence containing only digits.

With fixed point numbers, you write the digits into memory and have a convention that the decimal point is always after the N-th digit. For example, if we agree that the point is always after the 4th digit, then the string 000123 is interpreted as 0001.23 and 123000 means 1230.00. Using this system with 6 digits we can represent numbers from 0 to 9999.99 with a precision of 0.01.
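As a small sketch of that convention, the decoder below takes the agreed point position as a parameter. `decode_fixed` and `point_after` are hypothetical names, and `Decimal` is used only so the printed results stay exact.

```python
from decimal import Decimal


def decode_fixed(digits: str, point_after: int) -> Decimal:
    """Read a digit string with the point fixed after `point_after` digits."""
    # Treat the digits as an integer, then shift the decimal point left
    # by the number of digits that sit after the agreed position.
    return Decimal(int(digits)).scaleb(point_after - len(digits))


print(decode_fixed("000123", 4))   # 1.23
print(decode_fixed("123000", 4))   # 1230.00
print(decode_fixed("000123", 2))   # 0.0123
```

The position is part of the agreement, not part of the stored data, so changing it changes the meaning of every stored number at once.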

With floating point, you write both the decimal point position (which we call the "exponent") and the digits (called the "mantissa"). Let's agree that the first two digits are the exponent (point position) and the remaining four are the mantissa. Then this number:

020123

means 01.23 or 1.23 (the exponent is 2, meaning the decimal point is after the 2nd digit of the mantissa). Now using the same 6 digits we can represent numbers from 0 to 9999·10⁹⁵ with a relative precision of about 1/10000.
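Here is a rough sketch of a decoder for this toy format, assuming a 6-digit code such as 020123 (exponent 02, mantissa 0123). `decode_float` is a made-up name and, like the scheme itself, it ignores signs and negative exponents.

```python
from decimal import Decimal


def decode_float(code: str) -> Decimal:
    """Decode a 6-digit toy float: 2 exponent digits, then 4 mantissa digits."""
    exponent = int(code[:2])   # where the point sits within the mantissa
    mantissa = code[2:]
    # The point goes after `exponent` mantissa digits, so the value is
    # the mantissa as an integer, shifted by (exponent - 4) decimal places.
    return Decimal(int(mantissa)).scaleb(exponent - len(mantissa))


print(decode_float("020123"))   # 1.23
print(decode_float("040123"))   # 123
print(decode_float("000123"))   # 0.0123
```

The same four mantissa digits cover very different magnitudes depending on the first two digits, which is the whole trick.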

That's all you need to know, and the rest should be easy to figure out.

WalterBright • 7 days ago

In other words, a floating point number consists of 2 numbers and a sign bit:

1. the digits

2. the exponent

3. a sign bit

If you're familiar with scientific notation, yes, it's the same thing.

https://en.wikipedia.org/wiki/Scientific_notation

The rest is just the inevitable consequences of that.
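Python's standard library happens to expose exactly these three parts, so a quick sketch may help. `decimal_parts` is a made-up helper; `Decimal.as_tuple()` and `math.frexp()` are the real standard-library calls.

```python
import math
from decimal import Decimal


def decimal_parts(text: str):
    """Return (sign, digits, exponent) of a decimal number."""
    t = Decimal(text).as_tuple()
    return t.sign, t.digits, t.exponent


# -1.23 is the digits 123 with exponent -2 and the sign bit set.
print(decimal_parts("-1.23"))   # (1, (1, 2, 3), -2)

# The binary equivalent: 6.5 == 0.8125 * 2**3
print(math.frexp(6.5))          # (0.8125, 3)
```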

codedokode • 7 days ago

I like "decimal point position" more than "exponent". Also, if I remember correctly, "mantissa" is the significand (the digits of the number).

And by the way, engineering notation (where the exponent must be divisible by 3) is so much better. I hate converting things like 2.234·10¹¹ into billions in my head.
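The conversion is mechanical, so here is a minimal sketch. `to_engineering` is a hypothetical helper that rounds the exponent down to a multiple of 3 and moves the point to compensate.

```python
from decimal import Decimal


def to_engineering(text: str):
    """Return (mantissa, exponent) with the exponent a multiple of 3."""
    d = Decimal(text)
    # adjusted() is the exponent of the most significant digit.
    eng_exp = 3 * (d.adjusted() // 3)
    return d.scaleb(-eng_exp), eng_exp


mantissa, exponent = to_engineering("2.234E+11")
print(f"{mantissa}e{exponent}")   # 223.4e9, i.e. 223.4 billion
```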

And by the way (unrelated to floating point) mathematicians could make better names for things, for example instead of "numerator" and "denominator" they could use "upper" and "lower number". So much easier!

WalterBright • 7 days ago

I do get significand and mantissa mixed up. I solved that by just removing them!

hh2222 • 7 days ago

Wrote floating point routines in assembler back in college. When you get it, it's one of those aha moments.

WalterBright • 7 days ago

The specs for it are indeed hard to read. But the implementation isn't that bad. Things like the sticky bit and the guard bit are actually pretty simple.
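As a rough illustration of how simple they are, this sketch drops low bits from an integer significand and rounds to nearest, ties to even. It is a simplification under my own assumptions (non-negative integers, a made-up function name), not how any particular FPU or library does it.

```python
def round_shift_right(significand: int, shift: int) -> int:
    """Drop the low `shift` bits, rounding to nearest with ties to even."""
    if shift == 0:
        return significand
    kept = significand >> shift
    # Guard bit: the most significant of the bits being dropped.
    guard = (significand >> (shift - 1)) & 1
    # Sticky bit: 1 if any of the remaining dropped bits is set.
    sticky = (significand & ((1 << (shift - 1)) - 1)) != 0
    # Round up when above the halfway point, or exactly halfway and odd.
    if guard and (sticky or (kept & 1)):
        kept += 1
    return kept


print(round_shift_right(0b1011, 2))   # 3  (2.75 rounds up)
print(round_shift_right(0b1010, 2))   # 2  (2.5 is a tie, 2 is even)
print(round_shift_right(0b1110, 2))   # 4  (3.5 is a tie, 4 is even)
```

The sticky bit only has to remember whether anything nonzero was shifted out, which is why it can be kept as a single OR-ed bit.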

However, crafting an algorithm that uses IEEE arithmetic and avoids the limitations of IEEE is hard.
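A small, well-known example of the kind of care involved, using only the Python standard library: naive summation accumulates rounding error, while `math.fsum` tracks the lost low-order bits.

```python
import math

values = [0.1] * 10

# Each addition rounds, and the errors accumulate.
print(sum(values))        # 0.9999999999999999

# fsum keeps exact partial sums and rounds only once at the end.
print(math.fsum(values))  # 1.0
```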

whartung • 7 days ago

If you want a crash course in the mechanics of FP math, i.e. how it’s done at the bit level, then head over to the Project Oberon site and look for the PDF describing the implementation of their RISC machine in FPGA.

Chapter 16, pages 8-10, gives a very concise description of the process.