Monday, 29 October 2012

Floating Point Arithmetic



Floating-point arithmetic covers the arithmetic operations on floating-point numbers: addition, subtraction, multiplication and division. These operations use algorithms similar to those used for sign-magnitude integers, plus extra steps that handle the exponents (aligning, normalizing and rounding).
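Addition:
Using a 4-digit decimal example:
9.988 × 10^1 + 2.332 × 10^-1
> Align the exponents by shifting the smaller number right 2 places (only 4 digits are kept): 9.988 × 10^1 + 0.023 × 10^1
> Add the significands: 9.988 × 10^1 + 0.023 × 10^1 = 10.011 × 10^1
> Normalize, since the sum has two digits before the point (shift right, add 1 to the exponent): 10.011 × 10^1 = 1.0011 × 10^2
Answer (rounded to 4 digits) = 1.001 × 10^2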
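The four addition steps (align, add, normalize, round) can be sketched in Python. This is a minimal illustration, not a real floating-point implementation: it assumes positive operands, stores each number as a hypothetical pair (sig, exp) with a 4-digit integer significand, truncates the shifted operand as the worked example does, and rounds to nearest with ties rounded up.

```python
# Minimal sketch of 4-digit decimal floating-point addition.
# Assumptions: both operands are positive; a number is a hypothetical pair
# (sig, exp) meaning (sig / 1000) * 10**exp, with sig in 1000..9999,
# so (9988, 1) stands for 9.988 x 10^1.

DIGITS = 4

def fp_add(a, b):
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Step 1: align the exponents; make a the operand with the larger exponent.
    if exp_a < exp_b:
        (sig_a, exp_a), (sig_b, exp_b) = (sig_b, exp_b), (sig_a, exp_a)
    sig_b //= 10 ** (exp_a - exp_b)   # shifted-out digits are dropped (truncated)
    # Step 2: add the significands.
    sig, exp = sig_a + sig_b, exp_a
    # Step 3: normalize; a sum of 10000 or more has two digits before the point.
    while sig >= 10 ** DIGITS:
        # Step 4: round to nearest (ties up) as the extra digit is shifted out.
        sig, dropped = divmod(sig, 10)
        if dropped >= 5:
            sig += 1
        exp += 1
    return sig, exp

print(fp_add((9988, 1), (2332, -1)))   # (1001, 2), i.e. 1.001 x 10^2
```

Real hardware keeps a few extra guard bits during alignment instead of truncating, which reduces the rounding error of the final result.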

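Subtraction
Example 2 (subtraction):
Using a 4-digit binary example:
> 1.000₂ × 2^-1 - 1.000₂ × 2^-2   (0.5 - 0.25)
> Align the exponents: 1.000₂ × 2^-1 - 0.100₂ × 2^-1
> Subtract the significands: 1.000₂ × 2^-1 - 0.100₂ × 2^-1 = 0.100₂ × 2^-1
> Normalize, since the leading bit is now 0 (shift left, subtract 1 from the exponent): 0.100₂ × 2^-1 = 1.000₂ × 2^-2. The exponent -2 is still in range, so this is not an underflow.
Answer = 1.000₂ × 2^-2   (0.25)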
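The align, subtract and normalize steps for the 4-bit binary example can be sketched in Python as well. The (sig, exp) encoding, the assumption a ≥ b > 0, and the EXP_MIN limit (borrowed from IEEE 754 single precision, whose smallest normal exponent is -126) are all assumptions made for illustration.

```python
# Minimal sketch of 4-bit binary floating-point subtraction.
# Assumptions: a >= b > 0; a number is a hypothetical pair (sig, exp) meaning
# (sig / 2**3) * 2**exp, with sig in 0b1000..0b1111, so (0b1000, -1) is
# 1.000 x 2^-1. EXP_MIN is an assumed limit taken from IEEE 754 single precision.

from fractions import Fraction

BITS = 4
EXP_MIN = -126

def fp_sub(a, b):
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Step 1: align the exponents by shifting b's significand right.
    sig_b >>= exp_a - exp_b
    # Step 2: subtract the significands.
    sig, exp = sig_a - sig_b, exp_a
    if sig == 0:
        return 0, 0
    # Step 3: normalize; shift left until the leading bit is 1 again.
    while sig < 1 << (BITS - 1):
        sig <<= 1
        exp -= 1
    # Only an exponent below the representable range is a true underflow.
    if exp < EXP_MIN:
        raise ArithmeticError("exponent underflow")
    return sig, exp

def value(x):
    """Exact value of a (sig, exp) pair, for checking results."""
    sig, exp = x
    return Fraction(sig, 2 ** (BITS - 1)) * Fraction(2) ** exp

result = fp_sub((0b1000, -1), (0b1000, -2))
print(bin(result[0]), result[1], value(result))   # 0b1000 -2 1/4
```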

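Multiplication
Using a 4-digit decimal example:
(1.111 × 10^10) × (9.500 × 10^-7)
> New exponent = 10 + (-7) = 3
> Multiply the significands: 1.111 × 9.500 = 10.5545, giving 10.5545 × 10^3
> Normalize (two digits before the point): 10.5545 × 10^3 = 1.05545 × 10^4
> Round to 4 digits: Answer = 1.055 × 10^4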
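A minimal Python sketch of the multiplication steps: add the exponents, multiply the significands, normalize, then round. It assumes positive operands, a hypothetical (sig, exp) encoding where (1111, 10) stands for 1.111 × 10^10, and round to nearest with ties rounded up.

```python
# Minimal sketch of 4-digit decimal floating-point multiplication.
# Assumptions: both operands are positive; a number is a hypothetical pair
# (sig, exp) meaning (sig / 1000) * 10**exp, with sig in 1000..9999.

DIGITS = 4

def fp_mul(a, b):
    (sig_a, exp_a), (sig_b, exp_b) = a, b
    # Step 1: the new exponent is the sum of the exponents.
    exp = exp_a + exp_b
    # Step 2: multiply the significands. The exact product has 6 digits
    # after the point, so the value is (sig / 10**6) * 10**exp.
    sig = sig_a * sig_b
    # Step 3: normalize; a product of 10.0 or more needs one extra right shift.
    scale = 10 ** (DIGITS - 1)          # divisor that brings sig back to 4 digits
    if sig >= 10 ** (2 * DIGITS - 1):
        scale *= 10
        exp += 1
    # Step 4: round to nearest (ties up) back to a 4-digit significand.
    sig, rem = divmod(sig, scale)
    if 2 * rem >= scale:
        sig += 1
    if sig == 10 ** DIGITS:             # rounding carried into a fifth digit
        sig //= 10
        exp += 1
    return sig, exp

print(fp_mul((1111, 10), (9500, -7)))   # (1055, 4), i.e. 1.055 x 10^4
```

The exact product 10.5545 × 10^3 normalizes to 1.05545 × 10^4; the digits dropped when rounding to 4 digits are 45, less than half a unit in the last place, so the result rounds down to 1.055 × 10^4.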

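Example in binary:
1011.01 × 110.1   (11.25 × 6.5 = 73.125)
Multiply the bits as integers (101101 × 1101), shifting the multiplicand one place left for each multiplier bit:
             101101
   ×           1101
   ----------------
             101101
            000000
           101101
          101101
   ----------------
         1001001001
The operands have 2 + 1 = 3 bits after the point, so the product is 1001001.001 (73.125).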
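The long multiplication above is the shift-and-add method: for each 1 bit of the multiplier, add the multiplicand shifted left by that bit's position, then put the binary point back. A small Python sketch, assuming the operands are given as binary strings; the helper names parse and binary_multiply are hypothetical.

```python
# Sketch of binary shift-and-add multiplication on strings such as "1011.01".
# The point is removed, the integers are multiplied one multiplier bit at a
# time, and the point is restored from the total count of fractional bits.

def parse(s):
    """Return (integer value of the bits, number of bits after the point)."""
    whole, _, frac = s.partition(".")
    return int(whole + frac, 2), len(frac)

def binary_multiply(x, y):
    a, frac_a = parse(x)
    b, frac_b = parse(y)
    product, shift = 0, 0
    while b:
        if b & 1:                      # this multiplier bit is 1: add shifted multiplicand
            product += a << shift
        b >>= 1
        shift += 1
    frac = frac_a + frac_b             # fractional bits in the product
    bits = format(product, "b").rjust(frac + 1, "0")
    return bits[:-frac] + "." + bits[-frac:] if frac else bits

print(binary_multiply("1011.01", "110.1"))   # 1001001.001
```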

Division
11111100 / 110 = 101010   (252 / 6 = 42)
        101010
      --------
110 ) 11111100
      110
      ---
        111
        110
        ---
          110
          110
          ---
             0   (remainder)
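The division follows restoring long division: bring down one dividend bit at a time and subtract the divisor whenever the partial remainder is at least as large, recording a 1 in the quotient. A Python sketch assuming integer operands given as binary strings; binary_divide is a hypothetical name.

```python
# Sketch of binary long (restoring) division on integer bit strings.

def binary_divide(dividend, divisor):
    """Divide two binary strings; return (quotient, remainder) as binary strings."""
    d = int(divisor, 2)
    if d == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for bit in dividend:
        remainder = (remainder << 1) | int(bit)   # bring down the next bit
        quotient <<= 1
        if remainder >= d:                        # divisor fits: quotient bit is 1
            remainder -= d
            quotient |= 1
    return format(quotient, "b"), format(remainder, "b")

print(binary_divide("11111100", "110"))   # ('101010', '0')
```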


Rounding
Rounding Decimal:
Example 1 (round to nearest):
0.8842   round to 3 decimal places = 0.884
0.8842   round to 2 decimal places = 0.88

Round toward +infinity
Example 2:
1.23     round to 1 decimal place = 1.3
-2.86    round to 1 decimal place = -2.8

Round toward -infinity
Example 3:
1.23     round to 1 decimal place = 1.2
-2.86    round to 1 decimal place = -2.9
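Python's standard decimal module can reproduce these rounding examples: Decimal.quantize rounds to a given exponent, and ROUND_HALF_EVEN, ROUND_CEILING and ROUND_FLOOR select round to nearest (ties to even), round toward +infinity and round toward -infinity. The helper round_to is a hypothetical wrapper for illustration.

```python
# Sketch of the rounding modes using Python's decimal module.
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN

def round_to(value, places, mode):
    """Round the decimal string `value` to `places` decimal places using `mode`."""
    # Decimal(1).scaleb(-places) is 1E-places, which sets the target exponent.
    return Decimal(value).quantize(Decimal(1).scaleb(-places), rounding=mode)

print(round_to("0.8842", 3, ROUND_HALF_EVEN))   # 0.884
print(round_to("0.8842", 2, ROUND_HALF_EVEN))   # 0.88
print(round_to("1.23", 1, ROUND_CEILING))       # 1.3
print(round_to("-2.86", 1, ROUND_CEILING))      # -2.8
print(round_to("1.23", 1, ROUND_FLOOR))         # 1.2
print(round_to("-2.86", 1, ROUND_FLOOR))        # -2.9
```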



