# Arithmetic overflow in the multiplication

Unlike most assemblers, the result of multiplication has the same number of digits as the multipliers in C / C + +. In order not to lose the accuracy, sometimes it is needed to perform additional operations for the multiplication.

Let us consider the task. The system determines the time since starting the program in the microprocessor’s ticks by calling the function:

`unsigned long long getTickCount ();`

Length unsigned long long - 64 bits. Converting to the physical units of time in the system, there is a constant:

`const unsigned long long TICKS_PER_SECOND = 1999000000ULL;`

It is required to determine the function of transferring ticks in a nanoseconds getNsec (unsigned long long ticks) with semantics:

`unsigned long long getNsecNaive (unsigned long long ticks) {`

static const unsigned long long NSEC_PER_SECOND = 1000000000ULL;

unsigned long long nsec = NSEC_PER_SECOND * ticks / TICKS_PER_SECOND;

return nsec;

}

It is necessary for the function getNsec () to provide the maximum possible accuracy. Parameter ticks can be as large or small. The calculations should be done for small ticks (up to 2 ^ 34) in the following order:

`(NSEC_PER_SECOND * ticks) / TICKS_PER_SECOND`

First we multiply, and then divide. When we NSEC_PER_SECOND <2 ^ 30, multiply, the overflow will not occur, since the result will be less than 2 ^ 64.

The calculations should be done for large (since a multiplication can cause overflow) in the following order:

`NSEC_PER_SECOND * (ticks / TICKS_PER_SECOND)`

The problem is that the second multiplier is always an integer, the result in decimal system always will end by nine zeros, actually is provided the seconds’ precision and function getNsec () should be renamed as getSec (). On the other hand, the unprocessed data allows providing the greater accuracy.

Considering the computer operations of multiplication and division of all possible 4 calculations, 2 more besides two that were discussed:

`(NSEC_PER_SECOND / TICKS_PER_SECOND) * ticks`

and

ticks / (TICKS_PER_SECOND / NSEC_PER_SECOND)

The first always gives zero, and the second - ticks, which is almost 50% error (which can be reduced to 0.1% in this case, but this error still is not the smallest possible).

So, at best we get the per-second accuracy. Solving the problem of increasing the accuracy we can in the following ways (in order of preference):

1. Perform calculations with the real (double) types

2. Lead arguments to the 128-bit integer

3. Use the function of MultDiv ()

These approaches may not be applied, because there are restrictions of platform (processor), programming environment and libraries, requirements of the project.

We use the following approach. Let the result of dividing ticks by TICKS_PER_SECOND obtain the partial quotient seconds and the remainder of r:

`unsigned long long seconds = ticks / TICKS_PER_SECOND;`

unsigned long long r = ticks% TICKS_PER_SECOND;

Then ticks = seconds * TICKS_PER_SECOND + r. Substitute for the formula nsec:

nsec = NSEC_PER_SECOND * (seconds * TICKS_PER_SECOND + r) / TICKS_PER_SECOND = NSEC_PER_SECOND * seconds + (NSEC_PER_SECOND * r) / TICKS_PER_SECOND. Since r <TICKS_PER_SECOND, (NSEC_PER_SECOND * r) will never have an overflow. The final function:

unsigned long long getNsec (unsigned long long ticks) {

static const unsigned long long NSEC_PER_SECOND = 1000000000ULL;

return NSEC_PER_SECOND * (ticks / TICKS_PER_SECOND)

+ ((NSEC_PER_SECOND) * (ticks% TICKS_PER_SECOND)) / TICKS_PER_SECOND;

}

The result turns out exactly if we would count NSEC_PER_SECOND * ticks as 128-bits value, and then it divided by TICKS_PER_SECOND, here is provided the maximum accuracy for the data source values and the particular number of digits of the result. The formula will not be simplified in any way in the operator: neither NSEC_PER_SECOND, nor / TICKS_PER_SECOND will not be taken out the brackets.

Vote for this post
Bring it to the Main Page |
||

## Comments

## Leave a Reply