Range and Precision of a Fixed-Point Number (Need for Fixed-Point Numbers)

May 29, 2019

Range and Precision

The range of a number gives the limits of the representation, while the precision gives the distance between successive numbers in the representation. The range and precision of a fixed-point number depend on the length of the word and the scaling.

Range

The following figure illustrates the range of representable numbers for an unsigned fixed-point number of size ws, scaling S, and bias B.

The following figure illustrates the range of representable numbers for a two's complement fixed-point number of size ws, scaling S, and bias B where the values of ws, scaling S, and bias B allow for both negative and positive numbers.

For both the signed and unsigned fixed-point numbers of any data type, the number of different bit patterns is 2^ws.

For example, if the fixed-point data type is an integer with scaling defined as S = 1 and B = 0, then the maximum unsigned value is 2^ws − 1, because zero must be represented. In two's complement, negative numbers must be represented as well as zero, so the maximum value is 2^(ws−1) − 1. Additionally, since there is only one representation for zero, there must be an unequal number of positive and negative numbers. This means there is a representation for −2^(ws−1) but not for 2^(ws−1).
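The limits above can be checked with a small C sketch. It assumes the common slope-bias encoding, in which the real-world value equals S times the stored integer plus B; that formula and all function names below are illustrative assumptions, not something the text above defines.

```c
#include <stdint.h>

/* Hypothetical helpers, valid for word sizes 1 <= ws <= 32.
   Assumed encoding: real-world value = S * Q + B, where Q is the stored integer. */

/* Largest stored integer of an unsigned ws-bit word: 2^ws - 1. */
static int64_t unsigned_max_q(unsigned ws) {
    return ((int64_t)1 << ws) - 1;
}

/* Smallest stored integer of a two's complement ws-bit word: -2^(ws-1). */
static int64_t signed_min_q(unsigned ws) {
    return -((int64_t)1 << (ws - 1));
}

/* Largest stored integer of a two's complement ws-bit word: 2^(ws-1) - 1. */
static int64_t signed_max_q(unsigned ws) {
    return ((int64_t)1 << (ws - 1)) - 1;
}

/* Map a stored integer to its real-world value under the assumed encoding. */
static double real_value(int64_t q, double S, double B) {
    return S * (double)q + B;
}
```

For ws = 8 with S = 1 and B = 0, these helpers give an unsigned maximum of 255 and a two's complement range of −128 to 127, which shows the one extra negative value described above.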

Precision

The precision of a data type is given by the slope. In this usage, precision means the difference between neighboring representable values.

The precision of a fixed-point word depends on the word size and binary point location. Extending the precision of a word can always be accomplished with more bits, but you face practical limitations with this approach. Instead, you must carefully select the data type, word size, and scaling such that numbers are accurately represented. Rounding and padding with trailing zeros are typical methods implemented on processors to deal with the precision of binary words.
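To make the link between binary point location and precision concrete, here is a minimal C sketch. It assumes a binary-point-only scaling with frac_bits fraction bits and no bias; the function names are hypothetical, and overflow is not checked.

```c
#include <stdint.h>

/* Distance between neighboring representable values: 2^-frac_bits.
   Valid for frac_bits < 32. */
static double fixed_step(unsigned frac_bits) {
    return 1.0 / (double)((uint32_t)1 << frac_bits);
}

/* Round x to the nearest stored integer (ties away from zero).
   No overflow check: x * 2^frac_bits must fit in an int32_t. */
static int32_t quantize(double x, unsigned frac_bits) {
    double scaled = x * (double)((uint32_t)1 << frac_bits);
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Recover the real-world value that the stored integer represents. */
static double dequantize(int32_t q, unsigned frac_bits) {
    return (double)q * fixed_step(frac_bits);
}
```

With 4 fraction bits the step is 0.0625, so 3.14159 is stored as 50 and read back as 3.125. With 8 fraction bits the step shrinks to 0.00390625 and the same input reads back as 3.140625. The extra bits buy precision, at the cost of range for a fixed word size.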

https://stackoverflow.com/questions/10067510/fixed-point-arithmetic-in-c-programming/11816993#11816993

The idea behind fixed-point arithmetic is that you store each value multiplied by a fixed amount, use the multiplied values for all calculations, and divide by the same amount when you want the result. The purpose of this technique is to use integer arithmetic (int, long...) while still being able to represent fractions.
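As a rough illustration of the store-multiplied, divide-at-the-end idea, the sketch below uses a decimal multiplier of 1000. The multiplier, the type name, and the function names are all assumptions chosen for this example.

```c
#include <stdint.h>

#define SCALE 1000  /* hypothetical multiplier: keeps three decimal places */

typedef int32_t scaled_t;  /* an integer holding value * SCALE */

/* Store a value multiplied by SCALE, rounded to the nearest integer. */
static scaled_t to_scaled(double x) {
    return (scaled_t)(x * SCALE + (x >= 0.0 ? 0.5 : -0.5));
}

/* Divide by SCALE only when the real result is needed. */
static double from_scaled(scaled_t v) {
    return (double)v / SCALE;
}

/* Both operands carry the same multiplier, so addition is plain integer addition. */
static scaled_t scaled_add(scaled_t a, scaled_t b) {
    return a + b;
}

/* A product carries the multiplier twice, so divide once to remove the extra factor.
   The 64-bit intermediate keeps the product from overflowing. */
static scaled_t scaled_mul(scaled_t a, scaled_t b) {
    return (scaled_t)(((int64_t)a * b) / SCALE);
}
```

Here 1.5 is stored as 1500 and 2.25 as 2250. Their sum is 3750 and their product is 3375, which read back as 3.75 and 3.375 without any floating-point arithmetic in between.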

The usual and most efficient way of doing this in C is with the bit-shift operators (<< and >>). Shifting bits is a simple and fast operation for the ALU, and each shift multiplies (<<) or divides (>>) the integer value by 2 (besides, a shift by several bits can be done for exactly the same price as a shift by one). Of course, the drawback is that the multiplier must be a power of 2 (which is usually not a problem in itself, as we don't really care about the exact multiplier value).
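A minimal sketch of the shift-based variant follows, assuming a 32-bit word with 16 fraction bits (often written Q16.16). The format choice and all names are assumptions for illustration. Note that right-shifting a negative signed value is implementation-defined in C; most compilers perform an arithmetic shift.

```c
#include <stdint.h>

#define FRAC_BITS 16  /* hypothetical choice: multiplier is 2^16 */

typedef int32_t fix_t;  /* an integer holding value * 2^FRAC_BITS */

/* Integer -> fixed: shifting left by FRAC_BITS multiplies by 2^16.
   The cast through uint32_t avoids undefined behavior for negative n. */
static fix_t fix_from_int(int32_t n) {
    return (fix_t)((uint32_t)n << FRAC_BITS);
}

/* Fixed -> integer: shifting right by FRAC_BITS divides by 2^16 and
   discards the fraction bits. */
static int32_t fix_to_int(fix_t v) {
    return v >> FRAC_BITS;
}

/* A product carries the multiplier twice, so one right shift removes the
   extra factor. The 64-bit intermediate keeps the product from overflowing. */
static fix_t fix_mul(fix_t a, fix_t b) {
    return (fix_t)(((int64_t)a * b) >> FRAC_BITS);
}
```

In this format 0.5 is the integer 32768 (1 << 15), so fix_mul(fix_from_int(3), 32768) yields 98304, the stored form of 1.5.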

InfoCompile