Algorithm D, its implementation in

Hacker’s Delight, and elsewhere

- Purpose
- Introduction
- Generic implementation in Hacker’s Delight
- Specialised implementation in Hacker’s Delight
- Flawed implementation in libdivide
- Proper (and optimised) implementation …
- … in ANSI C
- … in Assembler

Algorithm D (Division of nonnegative integers):

**Note:** I added an introductory step D0 providing
definitions and descriptions.

Algorithm D(Division of nonnegative integers).

- D0. [Define]
- Let
`U`

be the dividend (or numerator) of`m+n`

digits, stored in an array of`m+n+1`

elements, onedigitper element, with the most significant digit in element`U[m+n−1]`

and the least significant digit in element`U[0]`

;

Let`V`

be the divisor (or denominator) of`n`

digits, stored in a second array of`n`

elements, with`n`

greater than 1, a non-zero most significant digit in element`V[n−1]`

and the least significant digit in element`V[0]`

;

Let`B`

be the base (or radix) of thedigits(alsolimbs,placesorwords), typically a power of 2.

(The algorithm computes the quotient`Q`

of`m+1`

digitsas`⌊U ⁄ V⌋`

(also`U`

ordivV`U ÷ V`

), and the remainder`R`

of`n`

digitsas`U`

modulo`V`

(also`U`

ormodV`U % V`

), using the followingprimitiveoperations:

addition or subtraction of two single-digit integers, giving a single-digit sum and a carry or a single-digit difference and a borrow;

multiplication of two single-digit integers, giving a double-digit product;

division of a double-digit integer by a single-digit integer, giving a single-digit quotient and a single-digit remainder.)- D1. [Normalize]
- Set
`D`

to`(B − 1) ÷ V[n−1]`

;

Multiply all digits of`U`

and`V`

by`D`

.

(On a binary computer, choose`D`

to be a power of 2 instead of the value provided above; any value of`D`

that results in`V[n−1]`

not less than`B ÷ 2`

will suffice. If`D`

is greater than 1, the eventual overflow digit of the dividend goes into element`U[m+n]`

.)- D2. [Initialize
`j`

]- Set the loop counter
`j`

to`m`

.- D3. [Calculate
`Q′`

]- Set
`Q′`

to`(U[n+j] * B + U[n−1+j]) ÷ V[n−1]`

;

Set`R′`

to`(U[n+j] * B + U[n−1+j]) % V[n−1]`

;

Test if`Q′`

equals`B`

or`Q′ * V[n−2]`

is greater than`R′ * B + U[n−2+j]`

;

If yes, then decrease`Q′`

by 1, increase`R′`

by`V[n−1]`

, and repeat this test while`R`

is less than`B`

.- D4. [Multiply and subtract]
- Replace
`(U[n+j]U[n−1+j]…U[j])`

by`(U[n+j]U[n−1+j]…U[j]) − Q′ * (V[n−1]…V[1]V[0])`

.

(The digits`(U[n+j]…U[j])`

should be kept positive; if the result of this step is actually negative,`(U[n+j]…U[j])`

should be left as the true value plus`B`

, namely as the^{n+1}`B`

’s complement of the true value, and a borrow to the left should be remembered.)- D5. [Test remainder]
- Set
`Q[j]`

to`Q′`

;

If the result of step D4 was negative, i.e. the subtraction needed a borrow, then proceed with step D6; otherwise proceed with step D7.- D6. [Add back]
- Decrease
`Q[j]`

by 1 and add`(0V[n−1]…V[1]V[0])`

to`(U[n+j]U[n−1+j]…U[1+j]U[j])`

.

(A carry will occur to the left of`U[n+j]`

, and it should be ignored since it cancels with the borrow that occurred in step D4.)- D7. [Loop on
`j`

]- Decrease
`j`

by 1;

Test if`j`

is not less than 0;

If yes, go back to step D3.- D8. [Unnormalize]
- Now
`(Q[m]…Q[1]Q[0])`

is the desired quotient, and the desired remainder may be obtained by dividing`(U[n−1]…U[1]U[0])`

by`D`

.

Algorithm Din ANSI C:

**Note:** I removed some parts of the code which are
not relevant here.

```
/* This divides an n-word dividend by an m-word divisor, giving an
n-m+1-word quotient and m-word remainder. The bignums are in arrays of
words. Here a "word" is 32 bits. This routine is designed for a 64-bit
machine which has a 64/64 division instruction. */ 1
[…]
/* q[0], r[0], u[0], and v[0] contain the LEAST significant words.
(The sequence is in little-endian order).
This is a fairly precise implementation of Knuth's Algorithm D, for a
binary computer with base b = 2**32. The caller supplies:
1. Space q for the quotient, m - n + 1 words (at least one).
2. Space r for the remainder (optional), n words.
3. The dividend u, m words, m >= 1.
4. The divisor v, n words, n >= 2.
The most significant digit of the divisor, v[n-1], must be nonzero. The
dividend u may have leading zeros; this just makes the algorithm take
longer and makes the quotient contain more leading zeros. A value of
NULL may be given for the address of the remainder to signify that the
caller does not want the remainder.
The program does not alter the input parameters u and v.
The quotient and remainder returned may have leading zeros. The
function itself returns a value of 0 for success and 1 for invalid
parameters (e.g., division by 0).
For now, we must have m >= n. Knuth's Algorithm D also requires
that the dividend be at least as long as the divisor. (In his terms,
m >= 0 (unstated). Therefore m+n >= n.) */
int divmnu(unsigned q[], unsigned r[],
const unsigned u[], const unsigned v[],
int m, int n) {
const unsigned long long b = 4294967296LL; // Number base (2**32).
unsigned *un, *vn; // Normalized form of u, v.
unsigned long long qhat; // Estimated quotient digit.
unsigned long long rhat; // A remainder.
unsigned long long p; // Product of two digits.
long long t, k;
int s, i, j;
if (m < n || n <= 1 || v[n-1] == 0)
return 1; // Return if invalid param.
[…]
/* Normalize by shifting v left just enough so that its high-order
bit is on, and shift u left the same amount. We may have to append a
high-order digit on the dividend; we do that unconditionally. */
s = nlz(v[n-1]); // 0 <= s <= 31.
vn = (unsigned *)alloca(4*n);
for (i = n - 1; i > 0; i--)
vn[i] = (v[i] << s) | ((unsigned long long)v[i-1] >> (32-s));
vn[0] = v[0] << s;
un = (unsigned *)alloca(4*(m + 1));
un[m] = (unsigned long long)u[m-1] >> (32-s);
for (i = m - 1; i > 0; i--)
un[i] = (u[i] << s) | ((unsigned long long)u[i-1] >> (32-s));
un[0] = u[0] << s;
for (j = m - n; j >= 0; j--) { // Main loop.
// Compute estimate qhat of q[j].
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
#ifdef OPTIMIZE
rhat = (un[j+n]*b + un[j+n-1])%vn[n-1];
#else
2 rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
#endif
again:
#ifdef OPTIMIZE
if (qhat >= b || (unsigned)qhat*(unsigned long long)vn[n-2] > b*rhat + un[j+n-2])
#else
3 if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
#endif
{ qhat = qhat - 1;
rhat = rhat + vn[n-1];
if (rhat < b) goto again;
}
// Multiply and subtract.
k = 0;
for (i = 0; i < n; i++) {
#ifdef OPTIMIZE
p = (unsigned)qhat*(unsigned long long)vn[i];
#else
3 p = qhat*vn[i];
#endif
t = un[i+j] - k - (p & 0xFFFFFFFFLL);
un[i+j] = t;
k = (p >> 32) - (t >> 32);
}
t = un[j+n] - k;
un[j+n] = t;
q[j] = qhat; // Store quotient digit.
if (t < 0) { // If we subtracted too
q[j] = q[j] - 1; // much, add back.
k = 0;
for (i = 0; i < n; i++) {
t = (unsigned long long)un[i+j] + vn[i] + k;
un[i+j] = t;
k = t >> 32;
}
un[j+n] = un[j+n] + k;
}
} // End j.
// If the caller wants the remainder, unnormalize
// it and pass it back.
if (r != NULL) {
for (i = 0; i < n-1; i++)
r[i] = (un[i] >> s) | ((unsigned long long)un[i+1] << (32-s));
r[n-1] = un[n-1] >> s;
}
return 0;
}
[…]
```

This implementation exhibits the following misrepresentation and
shortcomings:
- Contrary to the highlighted part of its initial comment, this routine does not need a 64-bit machine; it but needs support for 64-bit integers and elementary 64-bit arithmetic operations, which ANSI C compilers provide since the last millennium.
- Although a
64/64 division instruction

is explicitly stated, the code doesn’t take full advantage of it: instead to use the`%`

operator to~~compute~~get the remainder of the 64÷64-bit division (which comesfor free

on 64-bit machines, and almost for free with software implementations on 32-bit machines), it but performs an extraneous 64×64-bit multiplication — which is**not**free, especially on 32-bit machines! - The previous argument also holds for the multiplications
`qhat*vn[n-2]`

and`qhat*vn[i]`

: while an optimising compiler**should**detect that these are 32×32-bit multiplications returning a 64-bit product, I would not count on it — better give the compiler the proper hints!

Algorithm Dbuilds upon a 64÷32-bit division returning a 32-bit quotient and a 32-bit remainder, and a 32×32-bit multiplication returning a 64-bit product!

**Note:** I removed some parts of the code which are
not relevant here.

```
/* Long division, unsigned (64/32 ==> 32).
This procedure performs unsigned "long division" i.e., division of a
64-bit unsigned dividend by a 32-bit unsigned divisor, producing a
32-bit quotient. In the overflow cases (divide by 0, or quotient
exceeds 32 bits), it returns a remainder of 0xFFFFFFFF (an impossible
value).
The dividend is u1 and u0, with u1 being the most significant word.
The divisor is parameter v. The value returned is the quotient.
[…] Several of the variables below could be
"short," but having them fullwords gives better code on gcc/Intel.
[…]
This is the version that's in the hacker book. */
unsigned divlu2(unsigned u1, unsigned u0, unsigned v,
unsigned *r) {
const unsigned b = 65536; // Number base (16 bits).
unsigned un1, un0, // Norm. dividend LSD's.
vn1, vn0, // Norm. divisor digits.
q1, q0, // Quotient digits.
un32, un21, un10,// Dividend digit pairs.
rhat; // A remainder.
int s; // Shift amount for norm.
if (u1 >= v) { // If overflow, set rem.
if (r != NULL) // to an impossible value,
*r = 0xFFFFFFFF; // and return the largest
return 0xFFFFFFFF;} // possible quotient.
s = nlz(v); // 0 <= s <= 31.
v = v << s; // Normalize divisor.
vn1 = v >> 16; // Break divisor up into
vn0 = v & 0xFFFF; // two 16-bit digits.
un32 = (u1 << s) | (u0 >> 32 - s) & (-s >> 31);
un10 = u0 << s; // Shift dividend left.
un1 = un10 >> 16; // Break right half of
un0 = un10 & 0xFFFF; // dividend into two digits.
q1 = un32/vn1; // Compute the first
#ifdef OPTIMIZE
rhat = un32%vn1; // quotient digit, q1.
#else
rhat = un32 - q1*vn1; // quotient digit, q1.
#endif
again1:
if (q1 >= b || q1*vn0 > b*rhat + un1) {
q1 = q1 - 1;
rhat = rhat + vn1;
if (rhat < b) goto again1;}
un21 = un32*b + un1 - q1*v; // Multiply and subtract.
q0 = un21/vn1; // Compute the second
#ifdef OPTIMIZE
rhat = un21%vn1; // quotient digit, q1.
#else
rhat = un21 - q0*vn1; // quotient digit, q0.
#endif
again2:
if (q0 >= b || q0*vn0 > b*rhat + un0) {
q0 = q0 - 1;
rhat = rhat + vn1;
if (rhat < b) goto again2;}
if (r != NULL) // If remainder is wanted,
*r = (un21*b + un0 - q0*v) >> s; // return it.
return q1*b + q0;
}
[…]
```

Especially notice the highlighted part of the initial comment!
**Warning:** don’t use this function; it
demonstrates that its ~~author~~ copyist does not
understand the original code and fails to read or ignores the
comment, it exhibits poor performance, and it has a serious bug!

```
// libdivide.h
// Copyright 2010 - 2018 ridiculous_fish
[…]
// Code taken from Hacker's Delight:
// http://www.hackersdelight.org/HDcode/divlu.c.
// License permits inclusion here per:
// http://www.hackersdelight.org/permissions.htm
static uint64_t libdivide_128_div_64_to_64(uint64_t u1, uint64_t u0, uint64_t v, uint64_t *r) {
1 const uint64_t b = (1ULL << 32); // Number base (16 bits)
2 uint64_t un1, un0; // Norm. dividend LSD's
2 uint64_t vn1, vn0; // Norm. divisor digits
2 uint64_t q1, q0; // Quotient digits
3 uint64_t un64, un21, un10; // Dividend digit pairs
uint64_t rhat; // A remainder
#ifdef OPTIMIZE
uint64_t p; // A product
#endif
int32_t s; // Shift amount for norm
// If overflow, set rem. to an impossible value,
// and return the largest possible quotient
if (u1 >= v) {
if (r != NULL)
*r = (uint64_t) -1;
return (uint64_t) -1;
}
// count leading zeros
s = libdivide__count_leading_zeros64(v);
if (s > 0) {
// Normalize divisor
v = v << s;
5 un64 = (u1 << s) | ((u0 >> (64 - s)) & (-s >> 31));
4 un10 = u0 << s; // Shift dividend left
} else {
// Avoid undefined behavior
6 un64 = u1 | u0;
4 un10 = u0;
}
// Break divisor up into two 32-bit digits
7 vn1 = v >> 32;
7 vn0 = v & 0xFFFFFFFF;
// Break right half of dividend into two digits
7 un1 = un10 >> 32;
7 un0 = un10 & 0xFFFFFFFF;
// Compute the first quotient digit, q1
q1 = un64 / vn1;
#ifdef OPTIMIZE
rhat = un64 % vn1;
p = q1 * vn0;
while (q1 >= b || p > b * rhat + un1) {
q1 = q1 - 1;
rhat = rhat + vn1;
if (rhat >= b)
break;
p = p - vn0;
}
#else
8 rhat = un64 - q1 * vn1;
9 while (q1 >= b || q1 * vn0 > b * rhat + un1) {
q1 = q1 - 1;
rhat = rhat + vn1;
if (rhat >= b)
break;
}
#endif
// Multiply and subtract
un21 = un64 * b + un1 - q1 * v;
// Compute the second quotient digit
q0 = un21 / vn1;
#ifdef OPTIMIZE
rhat = un21 % vn1;
p = q0 * vn0;
while (q0 >= b || p > b * rhat + un0) {
q0 = q0 - 1;
rhat = rhat + vn1;
if (rhat >= b)
break;
p = p - vn0;
}
#else
8 rhat = un21 - q0 * vn1;
9 while (q0 >= b || q0 * vn0 > b * rhat + un0) {
q0 = q0 - 1;
rhat = rhat + vn1;
if (rhat >= b)
break;
}
#endif
// If remainder is wanted, return it
if (r != NULL)
*r = (un21 * b + un0 - q0 * v) >> s;
return q1 * b + q0;
}
[…]
```

This implementation has an obvious bug and multiple deficiencies
(in order of appearance):
- The number base is 2
^{32}, i.e. 32 bits, not 16 bits, as stated in the comment. - The
normalised

(least significant) digits of dividend and divisor as well as the digits of the quotient fit in 32 bits, i.e. the variables`un1`

,`un0`

,`vn1`

,`vn0`

,`q1`

and`q0`

can and should of course be declared as`uint32_t`

, not as`uint64_t`

. - The proper name for the variable with the two most significant
digits of the dividend is
`un32`

, not`un64`

. - The variables
`un`

and~~64~~32`un10`

are superfluous: the dividend`u1`

and`u0`

can and should of course be normalised in place, like the divisor`v`

. - The term
`& (-s >> 31)`

is superfluous: in the original code it is used to avoid the conditional expression`if (s > 0) … else …`

and evaluates to either`& 0`

or`& -1`

(alias`& 0xFFFFFFFF`

) there; here it always evaluates to`& -1`

(alias`& 0xFFFFFFFFFFFFFFFF`

). - The term
`| u0;`

is wrong and**must**be removed. - The variables
`un1`

,`un0`

,`vn1`

and`vn0`

are superfluous: they can and of course should be replaced by the expressions`(uint32_t) (un10 >> 32)`

,`(uint32_t) un10`

,`(uint32_t) (v >> 32)`

and`(uint32_t) v`

respectively. - The remainders should be computed as
`un`

and~~64~~32 % vn1`un21 % vn1`

, not as`un`

and~~64~~32 - q1 * vn1`un21 - q0 * vn1`

using expensive 64×64-bit multiplications: both hardware division instructions and software division routines typically return quotient and remainder together. - The repeated (expensive) computation of the 64×64-bit
products
`q1 * vn0`

and`q0 * vn0`

inside the`while …`

loops should be moved outside the loops.

Algorithm Dfor unsigned 128÷64-bit division on 32-bit processors is based on a 64÷32-bit division returning a 64-bit quotient and a 32-bit remainder, trivially implemented per

long(alias

schoolbook) division using a 64÷32-bit division returning a 32-bit quotient and a 32-bit remainder:

```
// Copyleft © 2011-2019, Stefan Kanthak <stefan.kanthak@nexgo.de>
// Divide a 128-bit integer dividend, supplied as a pair of 64-bit
// integers in u0 and u1, by a 64-bit integer divisor, supplied in v;
// return the 64-bit quotient and optionally the 64-bit remainder in *r.
unsigned long long divllu(unsigned long long u0,
unsigned long long u1,
unsigned long long v,
unsigned long long *r) {
unsigned long long qhat; // A quotient.
unsigned long long rhat; // A remainder.
unsigned long long uhat; // A dividend digit pair.
unsigned long q0, q1; // Quotient digits.
unsigned long s; // Shift amount for norm.
if (u1 >= v) { // If overflow, set rem.
if (r != NULL) // to an impossible value,
*r = ~0ULL; // and return the largest
return ~0ULL; // possible quotient.
}
s = __builtin_clzll(v); // 0 <= s <= 63.
if (s != 0) {
v <<= s; // Normalize divisor.
u1 <<= s; // Shift dividend left.
u1 |= u0 >> (64 - s);
u0 <<= s;
}
// Compute high quotient digit.
qhat = u1 / (unsigned long) (v >> 32);
rhat = u1 % (unsigned long) (v >> 32);
while ((unsigned long) (qhat >> 32) != 0 ||
// Both qhat and rhat are less 2**32 here!
(unsigned long long) (unsigned long) qhat * (unsigned long) v >
((rhat << 32) | (unsigned long) (u0 >> 32))) {
qhat -= 1;
rhat += (unsigned long) (v >> 32);
if ((unsigned long) (rhat >> 32) != 0) break;
}
q1 = (unsigned long) qhat;
// Multiply and subtract.
uhat = ((u1 << 32) | (unsigned long) (u0 >> 32)) - q1 * v;
// Compute low quotient digit.
qhat = uhat / (unsigned long) (v >> 32);
rhat = uhat % (unsigned long) (v >> 32);
while ((unsigned long) (qhat >> 32) != 0 ||
// Both qhat and rhat are less 2**32 here!
(unsigned long long) (unsigned long) qhat * (unsigned long) v >
((rhat << 32) | (unsigned long) u0)) {
qhat -= 1;
rhat += (unsigned long) (v >> 32);
if ((unsigned long) (rhat >> 32) != 0) break;
}
q0 = (unsigned long) qhat;
if (r != NULL) // If remainder is wanted, return it.
*r = (((uhat << 32) | (unsigned long) u0) - q0 * v) >> s;
return ((unsigned long long) q1 << 32) | q0;
}
```

**Note:** this code runs on 33 year old
Intel^{®} i386 processors, and of course
on current compatible processors too, about twice as fast as the
ANSI C
version.

```
; Copyright © 2011-2019, Stefan Kanthak <stefan.kanthak@nexgo.de>
.386
.model flat, C
.code
divllu proc public
push ebx
push ebp
push edi
push esi
[…]
mov esi, [esp+40]
mov edi, [esp+36] ; esi:edi = divisor
NORMALIZE:
mov ecx, 31
bsr ebx, esi ; ebx = index of leading '1' bit in high dword of divisor
jz @F ; high dword of divisor (and high dword of dividend) = 0?
mov edx, [esp+32]
mov eax, [esp+28] ; edx:eax = high qword of dividend
sub ecx, ebx ; ecx = number of leading '0' bits in (high dword of) divisor
jz READY ; number of leading '0' bits = 0?
shld esi, edi, cl
shl edi, cl ; esi:edi = divisor'
mov [esp+40], esi
mov [esp+36], edi ; save normalised divisor'
mov ebx, [esp+24]
mov ebp, [esp+20] ; ebx:ebp = low qword of dividend
shld edx, eax, cl
shld eax, ebx, cl
shld ebx, ebp, cl
shl ebp, cl ; edx:eax:ebx:ebp = dividend'
mov [esp+32], edx
mov [esp+28], eax
mov [esp+24], ebx
mov [esp+20], ebp ; save normalised dividend'
jmp READY
@@:
mov edx, [esp+28]
mov eax, [esp+24]
mov ebx, [esp+20] ; edx:eax:ebx = lowest dwords of dividend
; = highest dwords of dividend'
mov esi, edi ; esi = low dword of divisor
; = high dword of divisor'
bsr edi, edi ; edi = index of leading '1' bit in low dword of divisor
; = index of leading '1' bit in high dword of divisor'
sub ecx, edi ; ecx = number of leading '0' bits in high dword of divisor'
lea ecx, [ecx+32] ; ecx = number of leading '0' bits in divisor
jz @F ; number of leading '0' bits = 32?
shl esi, cl ; esi:0 = divisor'
shld edx, eax, cl
shld eax, ebx, cl
shl ebx, cl ; edx:eax:ebx:0 = dividend'
@@:
xor edi, edi ; edi = 0
; = low dword of divisor'
; = low dword of dividend'
mov [esp+40], esi
mov [esp+36], edi ; save normalised divisor'
mov [esp+32], edx
mov [esp+28], eax
mov [esp+24], ebx
mov [esp+20], esi ; save normalised dividend'
READY:
push ecx ; save number of leading '0' bits in divisor
DIVISION1:
cmp esi, edx
jna OVERFLOW1 ; overflow with normal division?
; "normal" division
NORMAL1:
div esi ; eax = (low dword of) quotient',
; edx = (low dword of) remainder'
mov ebp, edx ; ebp = (low dword of) remainder'
mov ecx, eax
xor ebx, ebx ; ebx:ecx = quotient'
jmp CHECK1
; "long" division
OVERFLOW1:
mov ecx, eax
mov eax, edx
xor edx, edx
div esi ; eax = high dword of quotient',
; edx = high dword of remainder'
mov ebx, eax ; ebx = high dword of quotient'
mov eax, ecx
div esi ; eax = low dword of quotient',
; edx = (low dword of) remainder'
mov ebp, edx ; ebp = (low dword of) remainder'
mov ecx, eax ; ebx:ecx = quotient'
ADJUST1:
add ecx, -1
adc ebx, -1 ; ebx:ecx = quotient' - 1
add ebp, esi ; ebp = (low dword of) remainder'
; + high dword of divisor'
jc BREAK1 ; remainder' >= 2**32?
AGAIN1:
test ebx, ebx
jnz ADJUST1 ; quotient' >= 2**32?
CHECK1:
mov eax, edi ; eax = low dword of divisor'
mul ecx ; edx:eax = low dword of divisor'
; * low dword of quotient'
ifdef JMPLESS
cmp [esp+28], eax
mov eax, ebp
sbb eax, edx
jb ADJUST1
else
cmp edx, ebp
jb BREAK1
ja ADJUST1
cmp eax, [esp+28]
ja ADJUST1
endif
BREAK1:
push ecx ; save (low dword of) quotient'
mov eax, edi ; eax = low dword of divisor'
mul ecx ; edx:eax = low dword of divisor'
; * (low dword of) quotient'
imul ecx, esi ; ecx = (low dword of) quotient'
; * high dword of divisor'
add ecx, edx ; ecx:eax = divisor'
; * (low dword of) quotient'
mov ebx, eax ; ecx:ebx = divisor'
; * (low dword of) quotient'
mov eax, [esp+32]
mov edx, [esp+36] ; edx:eax = "inner" qword of dividend'
sub eax, ebx
sbb edx, ecx ; edx:eax = "inner" qword of dividend"
; = intermediate (normalised) remainder
push eax ; save low dword of "inner" qword of dividend"
DIVISION2:
cmp esi, edx
jna OVERFLOW2 ; overflow with normal division?
; "normal" division
NORMAL2:
div esi ; eax = (low dword of) quotient",
; edx = (low dword of) remainder"
mov ebp, edx ; ebp = (low dword of) remainder"
mov ecx, eax
xor ebx, ebx ; ebx:ecx = quotient"
jmp CHECK2
; "long" division
OVERFLOW2:
mov ecx, eax
mov eax, edx
xor edx, edx
div esi ; eax = high dword of quotient",
; edx = high dword of remainder"
mov ebx, eax ; ebx = high dword of quotient"
mov eax, ecx
div esi ; eax = low dword of quotient",
; edx = (low dword of) remainder"
mov ebp, edx ; ebp = (low dword of) remainder"
mov ecx, eax ; ebx:ecx = quotient"
ADJUST2:
add ecx, -1
adc ebx, -1 ; ebx:ecx = quotient" - 1
add ebp, esi ; ebp = (low dword of) remainder"
; + high dword of divisor'
jc BREAK2 ; remainder" >= 2**32?
AGAIN2:
test ebx, ebx
jnz ADJUST2 ; quotient" >= 2**32?
CHECK2:
mov eax, edi ; eax = low dword of divisor'
mul ecx ; edx:eax = low dword of divisor'
; * low dword of quotient"
ifdef JMPLESS
cmp [esp+32], eax
mov eax, ebp
sbb eax, edx
jb ADJUST2
else
cmp edx, ebp
jb BREAK2
ja ADJUST2
cmp eax, [esp+32]
ja ADJUST2
endif
BREAK2:
pop ebx ; ebx = low dword of "inner" qword of dividend"
push ecx ; save (low dword of) quotient"
mov ebp, [esp+56] ; ebp = address of remainder
test ebp, ebp
jz QUOTIENT ; address of remainder = 0?
REMAINDER:
mov eax, edi ; eax = low dword of divisor'
mul ecx ; edx:eax = low dword of divisor'
; * (low dword of) quotient"
imul ecx, esi ; ecx = (low dword of) quotient"
; * high dword of divisor'
add edx, ecx ; edx:eax = (low dword of) quotient"
; * divisor'
mov edi, [esp+32]
mov esi, ebx
sub edi, eax
sbb esi, edx ; esi:edi = (normalised) remainder
mov ecx, [esp+8] ; ecx = number of leading '0' bits in divisor
; = shift count
jecxz @F ; shift count = 0?
shrd edi, esi, cl
shr esi, cl ; esi:edi = remainder
test cl, 32
jz @F ; shift count < 32?
mov edi, esi
xor esi, esi ; esi:edi = remainder
@@:
mov [ebp], edi
mov [ebp+4], esi ; store remainder
QUOTIENT:
pop eax
pop edx ; edx:eax = quotient
pop ecx
pop esi
pop edi
pop ebp
pop ebx
ret
divllu endp
end
```

don’t hesitate to

**Notes:** I dislike
HTML (and
even weirder formats too) in email, I prefer to receive plain text.

I also expect to see your full (real) name as sender, not your
nickname!

Emails in weird formats and without a proper sender name are likely
to be discarded.

I abhor top posts and expect inline quotes in replies.

- The software and the documentation on this site are provided
as is

without any warranty, neither express nor implied.

In no event will the author be held liable for any damage(s) arising from the use of the software or the documentation. - Permission is granted to use the
**current**version of the software and the**current**version of the documentation solely for personal private and non-commercial purposes.

An individuals use of the software or the documentation in his or her capacity or function as an agent, (independent) contractor, employee, member or officer of a business, corporation or organization (commercial or non-commercial) does not qualify as personal private and non-commercial purpose. - Without written approval from the author the software or the
documentation
**must not**be used for a business, for commercial, corporate, governmental, military or organizational purposes of any kind, or in a commercial, corporate, governmental, military or organizational environment of any kind. - Redistribution of the software and the documentation is allowed
only in unmodified form of its
**current**version and free of charge.

cookies.

The service provider for *.homepage.t-online.de, Deutsche Telekom AG,

- records every visitor of this web site in a log file; IP adresses are pseudonymised, personal data are not stored.
- sets a
session cookie

.

Copyright © 2005–2019 • Stefan Kanthak • <stefan.kanthak@nexgo.de>