topics@today: Stuck in Fixed & Floating point?

Hey, 👩

In the digital world, we represent numbers as binary bits (0's and 1's). Two popular ways to represent real numbers in the binary number system are:

Fixed-point representation
Floating-point representation

i. Fixed-point representation ⭐

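> uses a fixed number of bits for the integer part and a fixed number of bits for the fractional part, so the binary point always stays at the same position.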

Variants:

> Unsigned fixed point representation: no extra sign bit

Example: (3.5) (base = 10) = (11.1) (base = 2)

integer part(=3) + fractional part(=0.5) = 11 + 0.1 = 11.1

[ NOTE: the bits after the binary point stand for 1/2, 1/4, 1/8, ... so 0.5 = 1 x 2^(-1) = 0.1 in binary ]

> Signed fixed point representation: has an extra sign bit at the beginning

Example: (3.5) (base = 10) = (011.1) (base = 2)

sign(=+ve) + integer part(=3) + fractional part(=0.5) = 0 + 11 + 0.1 = 011.1

[ NOTE: numbers starting with '0' are +ve and those with '1' are -ve ]

So, for -3.5, we will have 111.1 as the signed fixed point representation
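Here is a minimal Python sketch of the signed fixed-point idea above, just to play with. The function name `to_signed_fixed_point` and the parameters `int_bits` and `frac_bits` are made up for this illustration (they are not from any library), and the sketch assumes the value fits in the chosen number of bits.

```python
def to_signed_fixed_point(value, int_bits=2, frac_bits=1):
    """Return a sign + magnitude fixed-point bit string such as '011.1'."""
    sign = '1' if value < 0 else '0'
    # Multiply by 2^frac_bits so the fractional bits become part of an integer.
    scaled = round(abs(value) * (1 << frac_bits))
    if scaled >= 1 << (int_bits + frac_bits):
        raise ValueError("value does not fit in the chosen number of bits")
    bits = format(scaled, '0{}b'.format(int_bits + frac_bits))
    # Put the binary point back at its fixed position.
    return sign + bits[:int_bits] + '.' + bits[int_bits:]

print(to_signed_fixed_point(3.5))                            # 011.1
print(to_signed_fixed_point(-3.5))                           # 111.1
print(to_signed_fixed_point(2.75, int_bits=2, frac_bits=3))  # 010.110
```

Because the position of the binary point never moves, the only choice you make is how many bits go on each side of it.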

-----------------------------------------------------------------------------------------------------------------------------

ii. Floating-point representation ⭐

It's much like the scientific notation (order of magnitude representation) that we've learnt in Physics.

Let's learn by an example:

Just like the previous case, we have here (3.5) (base = 10) = (011.1) (base = 2). Now,

> step 1: separate the sign bit (here = 0)

> step 2: shift the binary point towards the left, keeping only one '1' on its left (as we do for order of magnitude representation):

1.11000000.....

(you can keep as many zeros as you like, because trailing zeros after the binary point and leading zeros before it don't matter.)

> step 3: express as powers of 2:

11.1 = 1.11 x 2^1 ( 2^1 since the binary point was shifted by only 1 place)

[ steps 2 and 3 together are called normalization ]
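If you want to check a normalization by machine, here is a small Python sketch. It leans on the standard library's `math.frexp`, which returns a fraction between 0.5 and 1, so the sketch rescales it to the 1.xxx form used in steps 2 and 3. The helper name `normalize` is just an illustrative choice, and the sketch only handles positive numbers.

```python
import math

def normalize(value):
    """Return (significand, exponent) with 1 <= significand < 2 for a positive value."""
    if value <= 0:
        raise ValueError("this sketch only handles positive numbers")
    frac, exp = math.frexp(value)  # value == frac * 2**exp with 0.5 <= frac < 1
    return frac * 2, exp - 1       # rescale so the significand looks like 1.xxx

sig, exp = normalize(3.5)
print(sig, exp)  # 1.75 1   (1.75 in decimal is 1.11 in binary)
```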

> step 4: Figure out the exponent and mantissa:

This is the most vital step. Basically, the number is: (-1)^s x (1.m) x 2^(e - Bias)

where:

s = 0 (for +ve numbers) and 1 (for -ve numbers) (it determines the sign)

1.m = a '1', then the binary point, then the bits of m

to determine m, e and Bias, we have two standardized options to follow:

IEEE single-precision floating-point standard representation: it provides a 32-bit representation of floating-point numbers as follows:-

m = mantissa (or significand) = the part after the binary point in the step 3 representation

here, > m = 1100000000....up to 23 bits (following IEEE single precision format)

> Bias = 127 (127 = max. no. in 7 bits)

So, e must be 128, because then 2^(e - Bias) will be 2^(128 - 127) = 2^1 and that is what we want. [ see step 3 ]

In general, e = (power of 2 from step 3) + Bias, so e can also be smaller than 127 (e.g. 0.5 = 1.0 x 2^(-1) gives e = 126). The exponent field has 8 bits, so it can store e from 0 to 255 (0 and 255 are reserved for special values such as zero, infinity and NaN), and the remaining 32 - (1+8) = 23 bits are for the mantissa.

Hence, the representation is as:

0 (sign bit) e = 10000000 (binary of 128) and m = 1100000000....up to 23 bits

i.e. 0 10000000 11000000000000000000000
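You can verify the three fields yourself with Python's standard `struct` module. This is only a sketch: the helper name `float32_fields` is made up here, and it simply reinterprets the 4 bytes of a single-precision number as an integer and slices the bit string into 1 + 8 + 23 bits.

```python
import struct

def float32_fields(value):
    """Split a number into IEEE single-precision sign, exponent and mantissa bit strings."""
    # Pack as a big-endian 32-bit float, then read the same 4 bytes as an unsigned integer.
    (raw,) = struct.unpack('>I', struct.pack('>f', value))
    bits = format(raw, '032b')
    return bits[0], bits[1:9], bits[9:]

sign, exponent, mantissa = float32_fields(3.5)
print(sign, exponent, mantissa)  # 0 10000000 11000000000000000000000
print(int(exponent, 2) - 127)    # 1, the power of 2 from step 3
```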

-----------------------------------------------------------------------------------------------------------------------------

IEEE double-precision floating-point standard representation: it provides a 64-bit representation of floating-point numbers as follows:-

here, > m = 1100000000....up to 52 bits (following IEEE double precision format)

> Bias = 1023 (1023 = max. no. in 10 bits)

So, e must be 1024, because then 2^(e - Bias) will be 2^(1024 - 1023) = 2^1 and that is what we want here. [ refer step 3 ]

In general, e = (power of 2 from step 3) + Bias, so e can also be smaller than 1023. The exponent field has 11 bits, so it can store e from 0 to 2047 (0 and 2047 are reserved for special values such as zero, infinity and NaN), and the remaining 64 - (1+11) = 52 bits are for the mantissa.

Hence, the representation is as:

0 (sign bit) e = 10000000000 (binary of 1024) and m = 1100000000....up to 52 bits

i.e. 0 10000000000 1100000000000000000000000000000000000000000000000000

phewww... that's really huuuge!
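Nobody wants to count 52 bits by hand, so here is a hedged Python sketch that extracts the double-precision fields and then plugs them back into the formula from step 4. The names `float64_fields` and `decode` are invented for this example, and `decode` assumes a normal number (it ignores the reserved exponent values).

```python
import struct

def float64_fields(value):
    """Split a number into IEEE double-precision sign, exponent and mantissa bit strings."""
    (raw,) = struct.unpack('>Q', struct.pack('>d', value))
    bits = format(raw, '064b')
    return bits[0], bits[1:12], bits[12:]

def decode(sign, exponent, mantissa, bias=1023):
    """Rebuild the value from its fields: sign, 1.m and 2^(e - Bias). Normal numbers only."""
    s = int(sign, 2)
    e = int(exponent, 2)
    # Read the mantissa bits as a binary fraction 0.m
    fraction = int(mantissa, 2) / (1 << len(mantissa))
    return (-1) ** s * (1 + fraction) * 2.0 ** (e - bias)

sign, exponent, mantissa = float64_fields(3.5)
print(exponent, int(exponent, 2))        # 10000000000 1024
print(len(mantissa))                     # 52
print(decode(sign, exponent, mantissa))  # 3.5
```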

-----------------------------------------------------------------------------------------------------------------------------

[ NOTE: many real numbers (e.g. 0.1) have a non-terminating binary expansion, but single-precision format only affords us 23 bits to represent the fractional part. Thus, we have to settle for an approximation, rounding things to the 23rd bit (and correspondingly, the 52nd bit for double-precision format). ]
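A quick way to see this rounding in action is the small Python sketch below. Python's own floats are double precision, so the made-up helper `round_to_float32` squeezes a value through the 32-bit format with the standard `struct` module and hands it back.

```python
import struct

def round_to_float32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack('>f', struct.pack('>f', value))[0]

print(round_to_float32(3.5) == 3.5)  # True: 3.5 needs only 2 mantissa bits
print(round_to_float32(0.1) == 0.1)  # False: single precision keeps fewer bits of 0.1 than a double
print(round_to_float32(0.1))         # 0.10000000149011612
```

Keep in mind that the double-precision 0.1 is itself an approximation; it is just a closer one.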

(Images collected from: Google images)

-----------------------------------------------------------------------------------------------------------------------------

That's it!

For similar interesting posts, keep in touch with topics@today...💗


Tuesday, June 8, 2021
