As an aside, Sinclair ZX81 BASIC, like the BASICs on most 8-bit computers, used a more precise floating-point format (5 bytes, 40 bits) than the 32-bit single precision common on later machines. This is reportedly because Microsoft Extended BASIC used a higher-precision FP format than most bigger computers of the time, and that BASIC was used by everybody from Apple to Commodore and influenced others such as Sinclair and the BBC Micro. In 1985 the IEEE standardized 32-bit single precision, but helpfully also defined double precision (64-bit), so eventually most caught up to the humble ZX81. Now, a more precise format does not mean that Sinclair BASIC's floating point was perfect. Dr. Logan talks about some of these "rounding errors" in the article below.
Understanding Floating-point Arithmetic from SYNC magazine volume 2 number 1 January/February 1982, pages 30-32
[edit: corrected the title of the article]

Understanding Floating-point
Ian Logan
24 Nurses Lane, Skellingthorpe, Lincoln
LN6 0TT
The aim of this article is to give the reader some insight into the complex world of floating-point arithmetic. Since the 4K ROM provided only integer arithmetic, readers who possess only this ROM will be unable to try the programs. Nevertheless they will be able to follow the text.
In the Sinclair Manual, ZX81 Basic Programming, chapter 27, Steven Vickers shows that a floating-point number consists of a single exponent byte and 4 mantissa bytes, but he gives no further information. In order to understand this subject it is probably best to return to first principles — so with pencil and paper to hand proceed.
Decimal format
In the beginning there were only simple integers. But soon they begat decimal numbers, which have an integer part, a decimal-point and a decimal part. And in their turn decimal numbers begat E-format, which has a mantissa part, an 'E' and an exponent part.
For example, the number 'four' can be expressed as:

Code:
    4          - its integer value
    4.000      - its decimal value
    40000E-4   - just one of many E-format choices

It can readily be seen that in the E-format we have the essential parts of floating-point notation for decimal numbers all given, but it is useful at this point to introduce two conventions that will help us in conversion from decimal-floating-point to binary-floating-point.
1) Always express the mantissa starting with the decimal-point.
2) Do not attribute a sign to the mantissa. Simply state whether the value is positive or negative.

Code:
    So instead of:   Write:
    40000E-4         .4E1 & positive
    0.00678          .678E-2 & positive
    -223.9           .2239E3 & negative
    -0.7             .7E0 & negative

These conventions can be considered to be 'normalizing' the floating-point decimal number.
With a decimal number in its 'normalized' form we can now state that the mantissa is the decimal part of the form and the exponent is the integer part after the 'E'. The exponent is a signed integer and the overall form is either positive or negative. Consider the examples in Figure 1. The reader is urged to try further examples. (Perhaps with a friend marking the results.)
Figure 1.

Code:
    Decimal     Normalized   Exponent   Mantissa   +/-
    4           .4E1         +1         4          +
    40          .4E2         +2         4          +
    .4          .4E0         +0         4          +
    -40.0       .4E2         +2         4          -
    -123.456    .123456E3    +3         123456     -
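The two conventions can be tried out mechanically. The following is a minimal sketch in modern Python (not ZX81 BASIC); the function name `normalize_decimal` is made up for illustration, and it relies on the standard `decimal` module to split a number into sign, digits and exponent.

```python
from decimal import Decimal

def normalize_decimal(text):
    """Return (exponent, mantissa digits, sign) so that value = .mantissa E exponent."""
    sign, digits, exp = Decimal(text).as_tuple()
    if not any(digits):
        # Zero has no significant digits to normalize.
        return 0, "0", "+"
    # Moving the decimal-point in front of the first digit
    # raises the exponent by the number of digits.
    exponent = len(digits) + exp
    # Trailing zeros carry no information in the mantissa.
    mantissa = "".join(str(d) for d in digits).rstrip("0")
    return exponent, mantissa, "-" if sign else "+"

for text in ["4", "40", ".4", "-40.0", "-123.456"]:
    print(text, normalize_decimal(text))
```

The five inputs are the rows of Figure 1, so the printed tuples can be compared with the table by hand.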
Binary Format
As the 8K ROM program deals with binary-floating-point numbers and not decimal-floating-point numbers, the reader will now have to convert the above conclusions so that they apply to binary-format numbers.
First, consider the state when all binary numbers represented integer values, that is:

Code:
    Decimal   Binary
    45        0010 1101
    255       1111 1111
In this state all values are integers and positive only. Next consider fixed-point binary numbers in which there is a fixed binary-point separating the integer byte(s) from the fraction byte(s). That is:

    Decimal Form   Binary Form
                   integer     point   fraction
    45             0010 1101   •       00000000
    45.5           0010 1101   •       10000000
    45.75          0010 1101   •       11000000
    45.875         0010 1101   •       11100000
Note that in a fixed-point number the first bit after the binary-point represents the value .5 and the second bit .25 etc. (The values diminish by a factor of 2.)
However, it is also possible to consider the fraction part byte by byte, which in decimal can be illustrated as follows:
From above, .11100000 gives 224/256 as the fraction part and this does give 0.875.
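The byte-by-byte view amounts to dividing the first fraction byte by 256, the second by 256 squared, and so on. As a minimal sketch in modern Python (not ZX81 BASIC), with `fraction_from_bytes` a hypothetical helper name:

```python
def fraction_from_bytes(byte_values):
    """Sum byte/256 + byte/256**2 + ... for the given fraction bytes."""
    total = 0.0
    for position, value in enumerate(byte_values, start=1):
        total += value / 256 ** position
    return total

# .11100000 binary is the single fraction byte 224.
print(fraction_from_bytes([0b11100000]))
```

A single byte of 224 therefore stands for 224/256, and each further byte adds a contribution 256 times smaller than the one before.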
Now at last the binary numbers can be 'normalized.' All that needs to be done is for the whole number to be moved to the left, or the right, as needed so that the most significant bit comes to be the first bit of the fraction part. The exponent is then given as the number of moves made (+ right, - left) and the mantissa is the number of bits wanted from the fraction part.
Hence from above:

Code:
    Decimal Form   Exponent    Mantissa
    45.75          +6 (dec.)   10110111
    45.875         +6 (dec.)   10110111

Note that in this example, with the mantissa limited to just 8 bits, the values 45.75 and 45.875 cannot be distinguished. This shows why the 8K ROM uses not one but 4 bytes for the mantissa, and even then it 'rounds' off values, sometimes inconveniently.
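The loss of 45.875 in an 8-bit mantissa can be reproduced with a short sketch in modern Python (not ZX81 BASIC). The helper name `normalize_binary` is invented here; it assumes positive inputs and simply truncates the bits that do not fit, which matches the example in the text but is not claimed to be the ROM's rounding rule.

```python
import math

def normalize_binary(value, mantissa_bits=8):
    """Return (exponent, mantissa bit string) for a positive value."""
    # frexp gives value = fraction * 2**exponent with 0.5 <= fraction < 1,
    # i.e. the most significant bit sits just after the binary-point.
    fraction, exponent = math.frexp(value)
    mantissa = int(fraction * 2 ** mantissa_bits)  # drop the bits that do not fit
    return exponent, format(mantissa, "0{}b".format(mantissa_bits))

for value in [45, 45.5, 45.75, 45.875]:
    print(value, normalize_binary(value))

# With 32 mantissa bits the two values are distinct again.
print(normalize_binary(45.75, 32) != normalize_binary(45.875, 32))
```

With `mantissa_bits=8` the last two values share one bit pattern; widening the mantissa to 32 bits, as the 8K ROM does, separates them.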
But how are negative numbers dealt with? Well, it is easy; there is just a statement made to say whether the value is positive or negative. For example:

Code:
    Decimal Form   Exponent    Mantissa   +/-
    255            +8 (dec.)   11111111   +
    -255           +8 (dec.)   11111111   -

Now it is time to run Program 1. This Floating-point Demonstration Program asks the user to enter any decimal number that he may wish, including fraction parts and 'E's. The program then returns the true exponent, e', and the four bytes of the mantissa. (e' is the exponent as developed above.) For example, entering the number 255 gives:

Code:
    Decimal number   255
    Its exponent     8
    And mantissa     255 0 0 0
    And it is        POSITIVE

and entering -9.9E37 will give:

Code:
    Decimal number   -9.9E+37
    Its exponent     127
    And mantissa     148 245 105 180
    And it is        NEGATIVE

Note: The last value can be checked by trying the line:

Code:
    PRINT (148/256+245/256**2+105/256**3+180/256**4)*2**126*2

which gives 9.9E+37 as expected. (Note that 2**126*2 is used to prevent overflow.)

Program 1 works by reading the floating-point number that has been attributed to the variable A as that number occurs in the variable area of the RAM. Certain changes have to be made to these bytes in order to give the true exponent and the appropriate mantissa. Note for interest the differences between values of A that ought to be the same. See Figure 2. The latter result is a 'rounding' error.
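The same check on the mantissa bytes 148 245 105 180 can be repeated away from the ZX81. This is a minimal sketch in modern Python (not ZX81 BASIC); `value_from_true_form` is a hypothetical helper, and it takes the true exponent and true mantissa bytes as defined in the text. Python's floats have more range than the ZX81's, so the 2**126*2 overflow workaround is not needed here.

```python
def value_from_true_form(exponent, mantissa_bytes):
    """Rebuild the magnitude from a true exponent and true mantissa bytes."""
    fraction = sum(b / 256 ** i for i, b in enumerate(mantissa_bytes, start=1))
    return fraction * 2.0 ** exponent

value = value_from_true_form(127, [148, 245, 105, 180])
# Show eight significant digits, roughly what a ZX81 PRINT displays.
print("{:.8g}".format(value))
```

The rebuilt value agrees with 9.9E37 to better than one part in a thousand million, which is about as close as four mantissa bytes allow.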
Whereas Program 1 borrows the result of the ROM program to get to its answer, Program 2, a Floating-point Builder, develops the result by successive multiplications, divisions, and subtractions. So try Program 2 in order to become more familiar with binary floating-point numbers.
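The idea behind the Floating-point Builder can be sketched in modern Python (not ZX81 BASIC). This is a variant, not a translation of Program 2: the name `build_floating_point` is invented, the mantissa bytes are peeled off by multiplying by 256 rather than dividing by a shrinking step, the mantissa is truncated rather than rounded, and the rounding workarounds are left out because halving and doubling are exact in Python's floats.

```python
def build_floating_point(a):
    """Return (true exponent, four true mantissa bytes, sign) for a number."""
    sign = "-" if a < 0 else "+"
    a = abs(a)
    exponent = 0
    # Halve or double until the value lies in the range .5 <= a < 1.
    while a != 0 and not (0.5 <= a < 1):
        if a < 1:
            a *= 2
            exponent -= 1
        else:
            a /= 2
            exponent += 1
    mantissa = []
    for _ in range(4):
        a *= 256          # bring the next byte in front of the point
        byte = int(a)     # keep the integer part as the byte
        mantissa.append(byte)
        a -= byte         # carry the remainder on to the next byte
    return exponent, mantissa, sign

print(build_floating_point(255))
print(build_floating_point(-3))
```

Any bits beyond the fourth byte are simply discarded here, so values that need more than 32 mantissa bits will differ in the last byte from what a rounding routine would give.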
Figure 2.

Code:
    1/2 dec. gives      Exp. 0    Mantissa 128 0 0 0
    but .5 dec. gives   Exp. -1   Mantissa 255 255 255 255

Note: The lines 170, 180, and 210 of Program 2 are all attempts to get around the problem of 'rounding' errors. However, the serious reader might be interested in the fact that with an initial value of A such as 8 then the value of A at line 170 is:

Code:
    .999999999 < A < 1

'PRINT A' gives 1, but 'IF A=1' is false. The explanation lies in the fact that A has the binary value of:

Code:
    EXP. 0 . Mantissa 127 255 255 253

instead of the expected

Code:
    EXP. 1 . Mantissa 128 0 0 0

and therefore shows that the COMPARISON operation is of greater sensitivity than the PRINT operation.
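The gap between what is printed and what is compared can be imitated in modern Python (not ZX81 BASIC). The sketch rests on two assumptions that are not stated in the article: that the first mantissa byte 127 is shown in its stored form, so the true first byte is 255, and that PRINT rounds to about eight significant digits.

```python
# True mantissa bytes 255 255 255 253 with a true exponent of 0.
near_one = (255 / 256 + 255 / 256**2 + 255 / 256**3 + 253 / 256**4) * 2**0

# Rounded to eight significant digits the value is shown as 1 ...
print("{:.8g}".format(near_one))
# ... but the comparison looks at every bit and finds a difference.
print(near_one == 1)
```

Under those assumptions the value falls short of 1 by three units in the last mantissa bit, far too little to show in eight digits but enough for an equality test to fail.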
Does this 'bug' account for some programming problems?
Sinclair floating-point conventions
So far in this article I have described the use of the true exponent and the true mantissa, but in Sinclair machines the floating-point numbers follow two conventions which are:
1) The exponent byte always has 128 decimal, Hex. 80, added to it, unless it is the exponent for the value zero when the exponent is always zero. Hence the 'augmented exponent,' e, is the 'true exponent' e', +128. (See how in line 120 of Program 1 this is taken into account.)
2) Bit 7 of the first mantissa byte, which is always set in a floating-point number that has been 'normalized,' is understood to be present, and the bit itself is replaced by a sign-bit. This bit is set for negative numbers and reset for positive numbers (and zero). (See how in line 140 of Program 1 this is taken into account.)
To make this clear consider the examples in Figure 3.
Figure 3.

Code:
    Decimal   True Format         Sinclair Format
    Format    Exp.   Mant.        Exp.   Mant.
    1.0       1      128 0 0 0    129    0   0 0 0
    2.0       2      128 0 0 0    130    0   0 0 0
    -2.0      2      128 0 0 0    130    128 0 0 0
    3.0       2      192 0 0 0    130    64  0 0 0
    -3.0      2      192 0 0 0    130    192 0 0 0
    0.0       0      0   0 0 0    0      0   0 0 0
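The two conventions can be combined into a small encoder and decoder. What follows is a minimal sketch in modern Python (not ZX81 BASIC and not the ROM's own routine); the names `to_sinclair` and `from_sinclair` are invented, Python's built-in rounding stands in for whatever the ROM does, and no check is made that the exponent fits in one byte.

```python
import math

def to_sinclair(x):
    """Return the five bytes (exponent, four mantissa bytes) for x."""
    if x == 0:
        return (0, 0, 0, 0, 0)           # zero is stored as all zeros
    m, e = math.frexp(abs(x))            # abs(x) = m * 2**e, 0.5 <= m < 1
    mant = round(m * 2 ** 32)            # 32-bit mantissa, top bit set
    if mant == 2 ** 32:                  # rounding carried out of the top bit
        mant >>= 1
        e += 1
    b = list(mant.to_bytes(4, "big"))
    b[0] &= 0x7F                         # drop the bit that is always set
    if x < 0:
        b[0] |= 0x80                     # reuse it as the sign-bit
    return (e + 128, *b)                 # augmented exponent e = e' + 128

def from_sinclair(five_bytes):
    """Rebuild the number from its five stored bytes."""
    e, m1, m2, m3, m4 = five_bytes
    if e == 0:
        return 0.0
    sign = -1.0 if m1 & 0x80 else 1.0
    mant = int.from_bytes(bytes([m1 | 0x80, m2, m3, m4]), "big")
    return sign * mant / 2 ** 32 * 2.0 ** (e - 128)

for x in [1.0, 2.0, -2.0, 3.0, -3.0, 0.0]:
    print(x, to_sinclair(x))
```

The six inputs are the rows of Figure 3, so the printed bytes can be compared with the Sinclair Format columns.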
Conclusions
Floating-point notation is logical, tedious perhaps, but very useful.
By way of lighter relief this month's game is an example of Basic programming that shows how bytes can be saved in 8K ROM programs; who said the 8K ROM wastes bytes?
The idea of the game is simply to find a number that results in the pattern filling the whole board. My best score so far is about 100.
Remember that RND generates a given series of numbers, depending on the SEED for its starting point, but additional dummy calls to RND will create new series. E.g.,

Code:
    145 POKE 0,RND

would be economic for a simple arithmetic series; alternate calls to RND are used by the 'pattern.'
Part 2 of "Understanding Floating-point Arithmetic" will discuss the third language of the 8K ROM — the Calculator Language.
Bibliography
Sinclair ZX81 ROM Disassembly, Part A: 0000H-0F54H, by Dr. Ian Logan. Melbourne House outlets, £7. (Deals with the 'operating system' part of the 8K ROM program.)
Sinclair ZX81 ROM Disassembly, Part B: 0F55H-1DFEH, by Dr. Ian Logan and Dr. Frank O'Hara. Melbourne House outlets, £8. (Deals with 'expression evaluation' and the 'calculator routines' in full detail.)
Program 1: Floating-point Demonstration Program
(The remarks to the right of some lines are annotations, not part of the listing.)

Code:
     10 PRINT AT 17,0;"ENTER ANY NUMBER"      Any decimal number.
     20 INPUT A
     30 CLS
     40 LET V=PEEK 16400+256*PEEK 16401       Get the present value of VARS.
     50 DIM B(5)                              For the 5 bytes.
     60 FOR C=1 TO 5                          Get each byte from the
     70 LET B(C)=PEEK (V+C)                   variable area.
     80 NEXT C
     90 PRINT "DECIMAL NUMBER";TAB 17;A
    100 PRINT
    110 PRINT
    120 PRINT "ITS EXPONENT";TAB 17;B(1)-128*(B(1)<>0)   Form the true exponent.
    130 PRINT
    140 PRINT "AND MANTISSA";TAB 17;(A<>0)*(B(2)+128*(B(2)<128));TAB 21;B(3);TAB 25;B(4);TAB 29;B(5)   Form the true mantissa.
    150 PRINT
    160 PRINT "AND IT IS";TAB 17;"POSITIVE" AND (A>=0);"NEGATIVE" AND (A<0)   Give the sign.
    170 RUN
Program 2: Floating-point Builder
(The remarks to the right of some lines are annotations, not part of the listing.)

Code:
     10 INPUT A                                   Any decimal value.
     20 CLS
     30 LET B=SGN A                               Keep the sign.
     40 PRINT "DECIMAL NUMBER";TAB 17;A
     50 LET A=ABS A                               Ignore negative sign.
     60 PRINT
     70 PRINT
     80 LET E=0                                   Set exponent to zero.
     90 PRINT "ITS EXPONENT";TAB 17;E
    100 IF A>=.5 AND A<=1 OR A=0 THEN GOTO 150    Exit when "normalized".
    110 LET E=E-(A<1)+(A>1)                       Exponent changes by one.
    120 LET A=A*(.5+1.5*(A<1))                    A changes by .5 or 2 fold.
    130 PRINT AT 3,17;E                           Watch it changing in SLOW.
    140 GOTO 100
    150 PRINT
    160 PRINT "AND MANTISSA";TAB 17
    170 IF A>.999999999 THEN LET A=1              See text.
    180 LET F=.003906249997                       A little under 1/256.
    190 FOR G=1 TO 4                              Each mantissa byte.
    200 LET H=INT (A/F)                           The decimal value.
    210 IF H>255 THEN LET H=128                   For a rounding error.
    220 PRINT H;" "                               The byte and a "space."
    230 LET A=A-INT (A/F)*F                       Decrease A.
    240 LET F=F/256                               Change for each byte.
    250 NEXT G
    260 PRINT
    270 PRINT
    280 PRINT "AND IT IS";TAB 17;"POSI" AND (B>=0);"NEGA" AND (B<0);"TIVE"   Fetch the sign.
    290 RUN

Program 3: Floating-point Number Game

Code:
     10 PRINT AT VAL "20",NOT PI;"NEW NUMBER?"
     20 INPUT N
     30 RAND N
     40 CLS
     50 FOR A=NOT PI TO VAL "15"
     60 PRINT "±";"±±±±±±±±±±±±±±" AND (NOT A OR A=VAL "15");TAB VAL "15";"±"
     70 NEXT A
     80 LET A=VAL "7"
     90 LET B=A
    100 LET C=NOT PI
    110 LET D=VAL "30"
    120 LET D=D-SGN PI
    130 IF D=NOT PI THEN RUN
    140 LET E=INT (RND*INT PI)-SGN PI
    150 LET F=INT (RND*INT PI)-SGN PI
    160 PRINT AT A+E,B+F;
    170 IF PEEK (PEEK VAL "16398"+VAL "256"*PEEK VAL "16399")<>NOT PI THEN GOTO VAL "120"
    180 PRINT "*"
    190 LET C=C+SGN PI
    200 PRINT AT VAL "18",NOT PI;"STARS = ";C
    210 LET A=A+E
    220 LET B=B+F
    230 GOTO VAL "110"