Floating-Point Operations

This example is similar to the previous addition example, but there are a few new things.

You'll notice that the program ends with some pseudoinstructions.
ORG 10 tells the assembler to continue assembling at register 10, and each DEC stores a decimal number at the current location. The effect is that register 10 holds -2.54 and register 11 holds 6.98. This is how numbers would actually be loaded into memory (rather than having them magically appear, as before).
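To make the effect of ORG and DEC concrete, here is a minimal Python sketch of a loader that handles only these two pseudoinstructions. It is not the emulator's code: the function name `load_data`, the use of a dict for memory, the three-line listing, and the reading of the ORG operand as a decimal number are all assumptions made for illustration.

```python
def load_data(lines):
    """Place DEC values into a dict that stands in for memory (hypothetical helper)."""
    memory = {}
    location = 0  # current location counter
    for line in lines:
        op, operand = line.split()
        if op == "ORG":
            # Assumption: the operand is a decimal register number.
            location = int(operand)
        elif op == "DEC":
            memory[location] = float(operand)  # store the number here
            location += 1                      # the next DEC lands one register later
        else:
            raise ValueError(f"unsupported pseudoinstruction: {op}")
    return memory


# Assumed listing, reconstructed from the description above.
memory = load_data(["ORG 10", "DEC -2.54", "DEC 6.98"])
print(memory)  # {10: -2.54, 11: 6.98}
```

The sketch stores ordinary Python floats; the real machine would store each value as a floating-point word.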

You'll also notice that we use the FAD operation instead of the ADD operation. FAD tells the computer to interpret the words as floating-point numbers rather than fixed-point numbers. Floating point is similar to scientific notation: the word stores a magnitude and an exponent rather than the literal value. Native floating-point support was a huge selling point of the IBM 704; before then, you had to manually keep track of where the imaginary binary point was in your fixed-point numbers.
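The scientific-notation idea can be sketched in Python by splitting a number into a sign, an exponent, and a fraction. The field widths below (a 27-bit fraction and an exponent stored with an offset of 128) are stated here as assumptions about the 704's word layout, truncation is an assumed rounding rule, and the helper names `to_fields` and `from_fields` are hypothetical.

```python
import math

FRACTION_BITS = 27  # assumed width of the fraction field
EXCESS = 128        # assumed offset added to the exponent before storing it


def to_fields(x):
    """Split x into (sign, characteristic, fraction), truncating extra bits."""
    sign = 1 if x < 0 else 0
    # frexp gives abs(x) == mantissa * 2**exponent with 0.5 <= mantissa < 1
    mantissa, exponent = math.frexp(abs(x))
    fraction = int(mantissa * 2**FRACTION_BITS)  # keep only FRACTION_BITS bits
    return sign, exponent + EXCESS, fraction


def from_fields(sign, characteristic, fraction):
    """Rebuild the value that the three fields actually represent."""
    value = fraction / 2**FRACTION_BITS * 2.0 ** (characteristic - EXCESS)
    return -value if sign else value


for number in (-2.54, 6.98):
    fields = to_fields(number)
    print(number, fields, from_fields(*fields))
```

Decoding the fields gives a value very close to, but not exactly, the number that went in, because only a limited number of fraction bits survive.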

The convenience of floating point comes with a downside: limited precision. You'll notice that after running this program, the accumulator holds not exactly 4.44 but something like 4.439999. The problem here is that neither 2.54 nor 6.98 can be represented in a finite number of binary digits, so some rounding is inevitable. Modern computers still use floating point, so you can see another example by opening your JavaScript console and typing 1+10000000000000000. The 1 just disappears, because the magnitude of the larger number doesn't have enough bits to accommodate such a small change.

To help with this, the FAD operation also stores a result in the MQ register, so that the sum of the accumulator and the MQ register provides a better approximation of the answer (basically, the MQ register acts as extra bits of precision). Apparently this was not often used, since the precision of the accumulator was enough for most purposes.
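Both effects can be reproduced on a modern machine. The Python sketch below checks that 2.54 is not stored exactly, repeats the disappearing-1 experiment (written as `1.0 + 1e16`, because Python integers never lose digits), and then recovers the lost part with a standard error-free addition, often called TwoSum. That last step is only an analogy for the accumulator and MQ pair, not a model of how the 704 hardware computes it; `two_part_sum` is a hypothetical name.

```python
from fractions import Fraction

# 2.54 = 127/50 has no finite binary expansion, so the stored float is only close.
stored = Fraction(2.54)     # exact value of the binary float
exact = Fraction(254, 100)  # exact decimal value
print(stored == exact)      # False

# Absorption: the small addend falls below the last bit of the large one.
big, small = 1e16, 1.0
print(big + small == big)   # True


def two_part_sum(a, b):
    """Return (high, low): the rounded sum and the rounding error it left behind."""
    high = a + b
    b_seen = high - a  # the part of b that actually made it into high
    low = (a - (high - b_seen)) + (b - b_seen)
    return high, low


high, low = two_part_sum(big, small)
print(high, low)  # low holds the 1.0 that the rounded sum dropped
```

Taken together, `high` and `low` carry the full answer, much as the accumulator and the MQ register do when read as one longer number.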