data:image/s3,"s3://crabby-images/a2743/a2743fed0a7f4e5bed2490d3c97575df5f49e748" alt="Division algorithm"
A division algorithm is an algorithm which, given two integers N and D (respectively the numerator and the denominator), computes their quotient and/or remainder, the result of Euclidean division. Some are applied by hand, while others are employed by digital circuit designs and software.
Division algorithms fall into two main categories: slow division and fast division. Slow division algorithms produce one digit of the final quotient per iteration. Examples of slow division include restoring, non-performing restoring, non-restoring, and SRT division. Fast division methods start with a close approximation to the final quotient and produce twice as many digits of the final quotient on each iteration.Newton–Raphson and Goldschmidt algorithms fall into this category.
Variants of these algorithms allow using fast multiplication algorithms. It results that, for large integers, the computer time needed for a division is the same, up to a constant factor, as the time needed for a multiplication, whichever multiplication algorithm is used.
Discussion will refer to the form , where
- N = numerator (dividend)
- D = denominator (divisor)
is the input, and
is the output.
Division by repeated subtraction
The simplest division algorithm, historically incorporated into a greatest common divisor algorithm presented in Euclid's Elements, Book VII, Proposition 1, finds the remainder given two positive integers using only subtractions and comparisons:
R := N Q := 0 while R ≥ D do R := R − D Q := Q + 1 end return (Q,R)
The proof that the quotient and remainder exist and are unique (described at Euclidean division) gives rise to a complete division algorithm, applicable to both negative and positive numbers, using additions, subtractions, and comparisons:
function divide(N, D) if D = 0 then error(DivisionByZero) end if D < 0 then (Q, R) := divide(N, −D); return (−Q, R) end if N < 0 then (Q,R) := divide(−N, D) if R = 0 then return (−Q, 0) else return (−Q − 1, D − R) end end -- At this point, N ≥ 0 and D > 0 return divide_unsigned(N, D) end function divide_unsigned(N, D) Q := 0; R := N while R ≥ D do Q := Q + 1 R := R − D end return (Q, R) end
This procedure always produces R ≥ 0. Although very simple, it takes Ω(Q) steps, and so is exponentially slower than even slow division algorithms like long division. It is useful if Q is known to be small (being an output-sensitive algorithm), and can serve as an executable specification.
Long division
Long division is the standard algorithm used for pen-and-paper division of multi-digit numbers expressed in decimal notation. It shifts gradually from the left to the right end of the dividend, subtracting the largest possible multiple of the divisor (at the digit level) at each stage; the multiples then become the digits of the quotient, and the final difference is then the remainder.
When used with a binary radix, this method forms the basis for the (unsigned) integer division with remainder algorithm below. Short division is an abbreviated form of long division suitable for one-digit divisors. Chunking – also known as the partial quotients method or the hangman method – is a less-efficient form of long division which may be easier to understand. By allowing one to subtract more multiples than what one currently has at each stage, a more freeform variant of long division can be developed as well.
Integer division (unsigned) with remainder
The following algorithm, the binary version of the famous long division, will divide N by D, placing the quotient in Q and the remainder in R. In the following pseudo-code, all values are treated as unsigned integers.
if D = 0 then error(DivisionByZeroException) end Q := 0 -- Initialize quotient and remainder to zero R := 0 for i := n − 1 .. 0 do -- Where n is number of bits in N R := R << 1 -- Left-shift R by 1 bit R(0) := N(i) -- Set the least-significant bit of R equal to bit i of the numerator if R ≥ D then R := R − D Q(i) := 1 end end
Example
If we take N=11002 (1210) and D=1002 (410)
Step 1: Set R=0 and Q=0
Step 2: Take i=3 (one less than the number of bits in N)
Step 3: R=00 (left shifted by 1)
Step 4: R=01 (setting R(0) to N(i))
Step 5: R < D, so skip statement
Step 2: Set i=2
Step 3: R=010
Step 4: R=011
Step 5: R < D, statement skipped
Step 2: Set i=1
Step 3: R=0110
Step 4: R=0110
Step 5: R>=D, statement entered
Step 5b: R=10 (R−D)
Step 5c: Q=10 (setting Q(i) to 1)
Step 2: Set i=0
Step 3: R=100
Step 4: R=100
Step 5: R>=D, statement entered
Step 5b: R=0 (R−D)
Step 5c: Q=11 (setting Q(i) to 1)
end
Q=112 (310) and R=0.
Slow division methods
Slow division methods are all based on a standard recurrence equation
where:
- Rj is the j-th partial remainder of the division
- B is the radix (base, usually 2 internally in computers and calculators)
- qn − (j + 1) is the digit of the quotient in position n−(j+1), where the digit positions are numbered from least-significant 0 to most significant n−1
- n is number of digits in the quotient
- D is the divisor
Restoring division
Restoring division operates on fixed-point fractional numbers and depends on the assumption 0 < D < N.[citation needed]
The quotient digits q are formed from the digit set {0,1}.
The basic algorithm for binary (radix 2) restoring division is:
R := N D := D << n -- R and D need twice the word width of N and Q for i := n − 1 .. 0 do -- For example 31..0 for 32 bits R := 2 * R − D -- Trial subtraction from shifted value (multiplication by 2 is a shift in binary representation) if R >= 0 then q(i) := 1 -- Result-bit 1 else q(i) := 0 -- Result-bit 0 R := R + D -- New partial remainder is (restored) shifted value end end -- Where: N = numerator, D = denominator, n = #bits, R = partial remainder, q(i) = bit #i of quotient
Non-performing restoring division is similar to restoring division except that the value of 2R is saved, so D does not need to be added back in for the case of R < 0.
Non-restoring division
Non-restoring division uses the digit set {−1, 1} for the quotient digits instead of {0, 1}. The algorithm is more complex, but has the advantage when implemented in hardware that there is only one decision and addition/subtraction per quotient bit; there is no restoring step after the subtraction, which potentially cuts down the numbers of operations by up to half and lets it be executed faster. The basic algorithm for binary (radix 2) non-restoring division of non-negative numbers is:[verification needed]
R := N D := D << n -- R and D need twice the word width of N and Q for i = n − 1 .. 0 do -- for example 31..0 for 32 bits if R >= 0 then q(i) := +1 R := 2 * R − D else q(i) := −1 R := 2 * R + D end if end -- Note: N=numerator, D=denominator, n=#bits, R=partial remainder, q(i)=bit #i of quotient.
Following this algorithm, the quotient is in a non-standard form consisting of digits of −1 and +1. This form needs to be converted to binary to form the final quotient. Example:
Convert the following quotient to the digit set {0,1}: | |
Start: | |
1. Form the positive term: | |
2. Mask the negative term: | |
3. Subtract: | |
|
If the −1 digits of are stored as zeros (0) as is common, then
is
and computing
is trivial: perform a ones' complement (bit by bit complement) on the original
.
Q := Q − bit.bnot(Q) -- Appropriate if −1 digits in Q are represented as zeros as is common.
Finally, quotients computed by this algorithm are always odd, and the remainder in R is in the range −D ≤ R < D. For example, 5 / 2 = 3 R −1. To convert to a positive remainder, do a single restoring step after Q is converted from non-standard form to standard form:
if R < 0 then Q := Q − 1 R := R + D -- Needed only if the remainder is of interest. end if
The actual remainder is R >> n. (As with restoring division, the low-order bits of R are used up at the same rate as bits of the quotient Q are produced, and it is common to use a single shift register for both.)
SRT division
SRT division is a popular method for division in many microprocessor implementations. The algorithm is named after D. W. Sweeney of IBM, James E. Robertson of University of Illinois, and K. D. Tocher of Imperial College London. They all developed the algorithm independently at approximately the same time (published in February 1957, September 1958, and January 1958 respectively).
SRT division is similar to non-restoring division, but it uses a lookup table based on the dividend and the divisor to determine each quotient digit.
The most significant difference is that a redundant representation is used for the quotient. For example, when implementing radix-4 SRT division, each quotient digit is chosen from five possibilities: { −2, −1, 0, +1, +2 }. Because of this, the choice of a quotient digit need not be perfect; later quotient digits can correct for slight errors. (For example, the quotient digit pairs (0, +2) and (1, −2) are equivalent, since 0×4+2 = 1×4−2.) This tolerance allows quotient digits to be selected using only a few most-significant bits of the dividend and divisor, rather than requiring a full-width subtraction. This simplification in turn allows a radix higher than 2 to be used.
Like non-restoring division, the final steps are a final full-width subtraction to resolve the last quotient bit, and conversion of the quotient to standard binary form.
The Intel Pentium processor's infamous floating-point division bug was caused by an incorrectly coded lookup table. Five of the 1066 entries had been mistakenly omitted.
Fast division methods
Newton–Raphson division
Newton–Raphson uses Newton's method to find the reciprocal of and multiply that reciprocal by
to find the final quotient
.
The steps of Newton–Raphson division are:
- Calculate an estimate
for the reciprocal
of the divisor
.
- Compute successively more accurate estimates
of the reciprocal. This is where one employs the Newton–Raphson method as such.
- Compute the quotient by multiplying the dividend by the reciprocal of the divisor:
.
In order to apply Newton's method to find the reciprocal of , it is necessary to find a function
that has a zero at
. The obvious such function is
, but the Newton–Raphson iteration for this is unhelpful, since it cannot be computed without already knowing the reciprocal of
(moreover it attempts to compute the exact reciprocal in one step, rather than allow for iterative improvements). A function that does work is
, for which the Newton–Raphson iteration gives
which can be calculated from using only multiplication and subtraction, or using two fused multiply–adds.
From a computation point of view, the expressions and
are not equivalent. To obtain a result with a precision of 2n bits while making use of the second expression, one must compute the product between
and
with double the given precision of
(n bits).[citation needed] In contrast, the product between
and
need only be computed with a precision of n bits, because the leading n bits (after the binary point) of
are zeros.
If the error is defined as , then:
This squaring of the error at each iteration step – the so-called quadratic convergence of Newton–Raphson's method – has the effect that the number of correct digits in the result roughly doubles for every iteration, a property that becomes extremely valuable when the numbers involved have many digits (e.g. in the large integer domain). But it also means that the initial convergence of the method can be comparatively slow, especially if the initial estimate is poorly chosen.
Initial estimate
For the subproblem of choosing an initial estimate , it is convenient to apply a bit-shift to the divisor D to scale it so that 0.5 ≤ D ≤ 1. Applying the same bit-shift to the numerator N ensures the quotient does not change. Once within a bounded range, a simple polynomial approximation can be used to find an initial estimate.
The linear approximation with mimimum worst-case absolute error on interval the interval is:
The coefficients of the linear approximation are determined as follows. The absolute value of the error is
. The minimum of the maximum absolute value of the error is determined by the Chebyshev equioscillation theorem applied to
. The local minimum of
occurs when
, which has solution
. The function at that minimum must be of opposite sign as the function at the endpoints, namely,
. The two equations in the two unknowns have a unique solution
and
, and the maximum error is
. Using this approximation, the absolute value of the error of the initial value is less than
The best quadratic fit to in the interval is
It is chosen to make the error equal to a re-scaled third order Chebyshev polynomial of the first kind, and gives an absolute value of the error less than or equal to 1/99. This improvement is equivalent to Newton–Raphson iterations, at a computational cost of less than one iteration.
It is possible to generate a polynomial fit of degree larger than 2, computing the coefficients using the Remez algorithm. The trade-off is that the initial guess requires more computational cycles but hopefully in exchange for fewer iterations of Newton–Raphson.
Since for this method the convergence is exactly quadratic, it follows that, from an initial error ,
iterations will give an answer accurate to
binary places. Typical values are:
Iterations | |||||
---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | |
3.09 | 7.17 | 15.35 | 31.70 | 64.40 | |
5.63 | 12.26 | 25.52 | 52.03 | 105.07 |
A quadratic initial estimate plus two iterations is accurate enough for IEEE single precision, but three iterations are marginal for double precision. A linear initial estimate plus four iterations is sufficient for both double and double extended formats.
Pseudocode
The following computes the quotient of N and D with a precision of P binary places:
Express D as M × 2e where 1 ≤ M < 2 (standard floating point representation) D' := D / 2e+1// scale between 0.5 and 1, can be performed with bit shift / exponent subtraction N' := N / 2e+1 X := 48/17 − 32/17 × D' // precompute constants with same precision as D repeat
times // can be precomputed based on fixed P X := X + X × (1 - D' × X) end return N' × X
For example, for a double-precision floating-point division, this method uses 10 multiplies, 9 adds, and 2 shifts.
Cubic iteration
There is an iteration which uses three multiplications to cube the error:
The Yiεi term is new.
Expanding out the above, can be written as
with the result that the error term
This is 3/2 the computation of the quadratic iteration, but achieves as much convergence, so is slightly more efficient. Put another way, two iterations of this method raise the error to the ninth power at the same computational cost as three quadratic iterations, which only raise the error to the eighth power.
The number of correct bits after iterations is
binary places. Typical values are:
Iterations | ||||
---|---|---|---|---|
0 | 1 | 2 | 3 | |
3.09 | 11.26 | 35.79 | 109.36 | |
5.63 | 18.89 | 58.66 | 177.99 |
A quadratic initial estimate plus two cubic iterations provides ample precision for an IEEE double-precision result. It is also possible to use a mixture of quadratic and cubic iterations.
Using at least one quadratic iteration ensures that the error is positive, i.e. the reciprocal is underestimated.: 370 This can simplify a following rounding step if an exactly-rounded quotient is required.
Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations.[citation needed]
Goldschmidt division
Goldschmidt division (after Robert Elliott Goldschmidt) uses an iterative process of repeatedly multiplying both the dividend and divisor by a common factor Fi, chosen such that the divisor converges to 1. This causes the dividend to converge to the sought quotient Q:
The steps for Goldschmidt division are:
- Generate an estimate for the multiplication factor Fi .
- Multiply the dividend and divisor by Fi .
- If the divisor is sufficiently close to 1, return the dividend, otherwise, loop to step 1.
Assuming N/D has been scaled so that 0 < D < 1, each Fi is based on D:
Multiplying the dividend and divisor by the factor yields:
After a sufficient number k of iterations .
The Goldschmidt method is used in AMD Athlon CPUs and later models. It is also known as Anderson Earle Goldschmidt Powers (AEGP) algorithm and is implemented by various IBM processors. Although it converges at the same rate as a Newton–Raphson implementation, one advantage of the Goldschmidt method is that the multiplications in the numerator and in the denominator can be done in parallel.
Binomial theorem
The Goldschmidt method can be used with factors that allow simplifications by the binomial theorem. Assume has been scaled by a power of two such that
. We choose
and
. This yields
.
After n steps , the denominator
can be rounded to 1 with a relative error
which is maximum at when
, thus providing a minimum precision of
binary digits.
Large-integer methods
Methods designed for hardware implementation generally do not scale to integers with thousands or millions of decimal digits; these frequently occur, for example, in modular reductions in cryptography. For these large integers, more efficient division algorithms transform the problem to use a small number of multiplications, which can then be done using an asymptotically efficient multiplication algorithm such as the Karatsuba algorithm, Toom–Cook multiplication or the Schönhage–Strassen algorithm. The result is that the computational complexity of the division is of the same order (up to a multiplicative constant) as that of the multiplication. Examples include reduction to multiplication by Newton's method as described above, as well as the slightly faster ,Barrett reduction and Montgomery reduction algorithms.[verification needed] Newton's method is particularly efficient in scenarios where one must divide by the same divisor many times, since after the initial Newton inversion only one (truncated) multiplication is needed for each division.
Division by a constant
The division by a constant D is equivalent to the multiplication by its reciprocal. Since the denominator is constant, so is its reciprocal (1/D). Thus it is possible to compute the value of (1/D) once at compile time, and at run time perform the multiplication N·(1/D) rather than the division N/D. In floating-point arithmetic the use of (1/D) presents little problem, but in integer arithmetic the reciprocal will always evaluate to zero (assuming |D| > 1).
It is not necessary to use specifically (1/D); any value (X/Y) that reduces to (1/D) may be used. For example, for division by 3, the factors 1/3, 2/6, 3/9, or 194/582 could be used. Consequently, if Y were a power of two the division step would reduce to a fast right bit shift. The effect of calculating N/D as (N·X)/Y replaces a division with a multiply and a shift. Note that the parentheses are important, as N·(X/Y) will evaluate to zero.
However, unless D itself is a power of two, there is no X and Y that satisfies the conditions above. Fortunately, (N·X)/Y gives exactly the same result as N/D in integer arithmetic even when (X/Y) is not exactly equal to 1/D, but "close enough" that the error introduced by the approximation is in the bits that are discarded by the shift operation.Barrett reduction uses powers of 2 for the value of Y to make division by Y a simple right shift.
As a concrete fixed-point arithmetic example, for 32-bit unsigned integers, division by 3 can be replaced with a multiply by 2863311531/233, a multiplication by 2863311531 (hexadecimal 0xAAAAAAAB) followed by a 33 right bit shift. The value of 2863311531 is calculated as 233/3, then rounded up. Likewise, division by 10 can be expressed as a multiplication by 3435973837 (0xCCCCCCCD) followed by division by 235 (or 35 right bit shift).: p230-234 OEIS provides sequences of the constants for multiplication as A346495 and for the right shift as A346496.
For general x-bit unsigned integer division where the divisor D is not a power of 2, the following identity converts the division into two x-bit addition/subtraction, one x-bit by x-bit multiplication (where only the upper half of the result is used) and several shifts, after precomputing and
:
In some cases, division by a constant can be accomplished in even less time by converting the "multiply by a constant" into a series of shifts and adds or subtracts. Of particular interest is division by 10, for which the exact quotient is obtained, with remainder if required.
Rounding error
This section needs expansion. You can help by adding to it. (September 2012) |
When a division operation is performed, the exact quotient and remainder
are approximated to fit within the computer’s precision limits. The Division Algorithm states:
where .
In floating-point arithmetic, the quotient is represented as
and the remainder
as
, introducing rounding errors
and
:
This rounding causes a small error, which can propagate and accumulate through subsequent calculations. Such errors are particularly pronounced in iterative processes and when subtracting nearly equal values - is told loss of significance. To mitigate these errors, techniques such as the use of guard digits or are employed.
See also
- Galley division
- Multiplication algorithm
- Pentium FDIV bug
Notes
- Despite how "little" problem the optimization causes, this reciprocal optimization is still usually hidden behind a "fast math" flag in modern compilers as it is inexact.
- Modern compilers commonly perform this integer multiply-and-shift optimization; for a constant only known at run-time, however, the program must implement the optimization itself.
References
- Rodeheffer, Thomas L. (2008-08-26). Software Integer Division (PDF) (Technical report). Microsoft Research, Silicon Valley.
- Morris, James E.; Iniewski, Krzysztof (2017-11-22). Nanoelectronic Device Applications Handbook. CRC Press. ISBN 978-1-351-83197-0.
- Shaw, Robert F. (1950). "Arithmetic Operations in a Binary Computer". Review of Scientific Instruments. 21 (8): 690. Bibcode:1950RScI...21..687S. doi:10.1063/1.1745692. ISSN 0034-6748. Archived from the original on 2022-02-28. Retrieved 2022-02-28.
- Flynn. "Stanford EE486 (Advanced Computer Arithmetic Division) – Chapter 5 Handout (Division)" (PDF). Stanford University. Archived (PDF) from the original on 2022-04-18. Retrieved 2019-06-24.
- Harris, David L.; Oberman, Stuart F.; Horowitz, Mark A. (9 September 1998). SRT Division: Architectures, Models, and Implementations (PDF) (Technical report). Stanford University. Archived (PDF) from the original on 24 December 2016. Retrieved 23 December 2016.
- McCann, Mark; Pippenger, Nicholas (2005). "SRT Division Algorithms as Dynamical Systems". SIAM Journal on Computing. 34 (6): 1279–1301. CiteSeerX 10.1.1.72.6993. doi:10.1137/S009753970444106X. hdl:2429/12179. Archived from the original on 2022-08-24. Retrieved 2022-08-24.
- Cocke, John; Sweeney, D.W. (11 February 1957), High speed arithmetic in a parallel device (Company Memo), IBM, p. 20, archived from the original on 24 August 2022, retrieved 24 August 2022
{{citation}}
: CS1 maint: location missing publisher (link) - Robertson, James (1958-09-01). "A New Class of Digital Division Methods". IRE Transactions on Electronic Computers. EC-7 (3). IEEE: 218–222. doi:10.1109/TEC.1958.5222579. hdl:2027/uiuo.ark:/13960/t0gt7529c. Archived from the original on 2022-08-24. Retrieved 2022-08-24.
- Tocher, K.D. (1958-01-01). "Techniques of Multiplication and Division for Automatic Binary Computers". The Quarterly Journal of Mechanics and Applied Mathematics. 11 (3): 364–384. doi:10.1093/qjmam/11.3.364. Archived from the original on 2022-08-24. Retrieved 2022-08-24.
- "Statistical Analysis of Floating Point Flaw". Intel Corporation. 1994. Archived from the original on 23 October 2013. Retrieved 22 October 2013.
- Oberman, Stuart F.; Flynn, Michael J. (July 1995). An Analysis of Division Algorithms and Implementations (PDF) (Technical report). Stanford University. CSL-TR-95-675. Archived (PDF) from the original on 2017-05-17. Retrieved 2016-12-23.
- Shirriff, Ken (28 Dec 2024). "Intel's $475 million error: the silicon behind the Pentium division bug". Righto. Retrieved 30 Dec 2024.
- Ercegovac, Miloš D.; Lang, Tomás (2004). "Chapter 7: Reciprocal. Division, Reciprocal Square Root, and Square Root by Iterative Approximation". Digital Arithmetic. Morgan Kaufmann. pp. 367–395. ISBN 1-55860-798-6.
- Goldschmidt, Robert E. (1964). Applications of Division by Convergence (PDF) (Thesis). M.Sc. dissertation. M.I.T. OCLC 34136725. Archived (PDF) from the original on 2015-12-10. Retrieved 2015-09-15.
- "Authors". IBM Journal of Research and Development. 11: 125–127. 1967. doi:10.1147/rd.111.0125. Archived from the original on 18 July 2018.
- Oberman, Stuart F. (1999). "Floating point division and square root algorithms and implementation in the AMD-K7 Microprocessor" (PDF). Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336). pp. 106–115. doi:10.1109/ARITH.1999.762835. ISBN 0-7695-0116-8. S2CID 12793819. Archived (PDF) from the original on 2015-11-29. Retrieved 2015-09-15.
- Soderquist, Peter; Leeser, Miriam (July–August 1997). "Division and Square Root: Choosing the Right Implementation". IEEE Micro. 17 (4): 56–66. doi:10.1109/40.612224.
- S. F. Anderson, J. G. Earle, R. E. Goldschmidt, D. M. Powers. The IBM 360/370 model 91: floating-point execution unit, IBM Journal of Research and Development, January 1997
- Guy, Even; Peter, Siedel; Ferguson, Warren (1 February 2005). "A parametric error analysis of Goldschmidt's division algorithm". Journal of Computer and System Sciences. 70 (1): 118–139. doi:10.1016/j.jcss.2004.08.004.
- Hasselström, Karl (2003). Fast Division of Large Integers: A Comparison of Algorithms (PDF) (M.Sc. in Computer Science thesis). Royal Institute of Technology. Archived from the original (PDF) on 8 July 2017. Retrieved 2017-07-08.
- Joachim Ziegler, Christoph Burnikel (1998), Fast Recursive Division, Max-Planck-Institut für Informatik, archived from the original on 2011-04-26, retrieved 2021-09-10
{{citation}}
: CS1 maint: location missing publisher (link) - Barrett, Paul (1987). "Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor". Proceedings on Advances in cryptology---CRYPTO '86. London, UK: Springer-Verlag. pp. 311–323. ISBN 0-387-18047-8.
- Granlund, Torbjörn; Montgomery, Peter L. (June 1994). "Division by Invariant Integers using Multiplication" (PDF). SIGPLAN Notices. 29 (6): 61–72. CiteSeerX 10.1.1.1.2556. doi:10.1145/773473.178249. Archived (PDF) from the original on 2019-06-06. Retrieved 2015-12-08.
- Möller, Niels; Granlund, Torbjörn (February 2011). "Improved Division by Invariant Integers" (PDF). IEEE Transactions on Computers. 60 (2): 165–175. doi:10.1109/TC.2010.143. S2CID 13347152. Archived (PDF) from the original on 2015-12-22. Retrieved 2015-12-08.
- ridiculous_fish. "Labor of Division (Episode III): Faster Unsigned Division by Constants" Archived 2022-01-08 at the Wayback Machine. 2011.
- ridiculous_fish. "libdivide, optimized integer division". Archived from the original on 23 November 2021. Retrieved 6 July 2021.
- Warren Jr., Henry S. (2013). Hacker's Delight (2 ed.). Addison Wesley - Pearson Education, Inc. ISBN 978-0-321-84268-8.
- LaBudde, Robert A.; Golovchenko, Nikolai; Newton, James; and Parker, David; Massmind: "Binary Division by a Constant" Archived 2022-01-09 at the Wayback Machine
- Vowels, R. A. (1992). "Division by 10". Australian Computer Journal. 24 (3): 81–85.
- L. Popyack, Jeffrey (June 2000). "Rounding Error". Drexel University.
- "9. Machine Numbers, Rounding Error and Error Propagation". College of Charleston. 8 February 2021.
Further reading
- Savard, John J. G. (2018) [2006]. "Advanced Arithmetic Techniques". quadibloc. Archived from the original on 2018-07-03. Retrieved 2018-07-16.
A division algorithm is an algorithm which given two integers N and D respectively the numerator and the denominator computes their quotient and or remainder the result of Euclidean division Some are applied by hand while others are employed by digital circuit designs and software Division algorithms fall into two main categories slow division and fast division Slow division algorithms produce one digit of the final quotient per iteration Examples of slow division include restoring non performing restoring non restoring and SRT division Fast division methods start with a close approximation to the final quotient and produce twice as many digits of the final quotient on each iteration Newton Raphson and Goldschmidt algorithms fall into this category Variants of these algorithms allow using fast multiplication algorithms It results that for large integers the computer time needed for a division is the same up to a constant factor as the time needed for a multiplication whichever multiplication algorithm is used Discussion will refer to the form N D Q R displaystyle N D Q R where N numerator dividend D denominator divisor is the input and Q quotient R remainder is the output Division by repeated subtractionThe simplest division algorithm historically incorporated into a greatest common divisor algorithm presented in Euclid s Elements Book VII Proposition 1 finds the remainder given two positive integers using only subtractions and comparisons R N Q 0 while R D do R R D Q Q 1 end return Q R The proof that the quotient and remainder exist and are unique described at Euclidean division gives rise to a complete division algorithm applicable to both negative and positive numbers using additions subtractions and comparisons function divide N D if D 0 then error DivisionByZero end if D lt 0 then Q R divide N D return Q R end if N lt 0 then Q R divide N D if R 0 then return Q 0 else return Q 1 D R end end At this point N 0 and D gt 0 return divide unsigned N D end function divide unsigned N D Q 0 R N while R D do Q Q 1 R R D end return Q R end This procedure always produces R 0 Although very simple it takes W Q steps and so is exponentially slower than even slow division algorithms like long division It is useful if Q is known to be small being an output sensitive algorithm and can serve as an executable specification Long divisionLong division is the standard algorithm used for pen and paper division of multi digit numbers expressed in decimal notation It shifts gradually from the left to the right end of the dividend subtracting the largest possible multiple of the divisor at the digit level at each stage the multiples then become the digits of the quotient and the final difference is then the remainder When used with a binary radix this method forms the basis for the unsigned integer division with remainder algorithm below Short division is an abbreviated form of long division suitable for one digit divisors Chunking also known as the partial quotients method or the hangman method is a less efficient form of long division which may be easier to understand By allowing one to subtract more multiples than what one currently has at each stage a more freeform variant of long division can be developed as well Integer division unsigned with remainder The following algorithm the binary version of the famous long division will divide N by D placing the quotient in Q and the remainder in R In the following pseudo code all values are treated as unsigned integers if D 0 then error DivisionByZeroException end Q 0 Initialize quotient and remainder to zero R 0 for i n 1 0 do Where n is number of bits in N R R lt lt 1 Left shift R by 1 bit R 0 N i Set the least significant bit of R equal to bit i of the numerator if R D then R R D Q i 1 end end Example If we take N 11002 1210 and D 1002 410 Step 1 Set R 0 and Q 0 Step 2 Take i 3 one less than the number of bits in N Step 3 R 00 left shifted by 1 Step 4 R 01 setting R 0 to N i Step 5 R lt D so skip statement Step 2 Set i 2 Step 3 R 010 Step 4 R 011 Step 5 R lt D statement skipped Step 2 Set i 1 Step 3 R 0110 Step 4 R 0110 Step 5 R gt D statement entered Step 5b R 10 R D Step 5c Q 10 setting Q i to 1 Step 2 Set i 0 Step 3 R 100 Step 4 R 100 Step 5 R gt D statement entered Step 5b R 0 R D Step 5c Q 11 setting Q i to 1 end Q 112 310 and R 0 Slow division methodsSlow division methods are all based on a standard recurrence equation Rj 1 B Rj qn j 1 D displaystyle R j 1 B times R j q n j 1 times D where Rj is the j th partial remainder of the division B is the radix base usually 2 internally in computers and calculators qn j 1 is the digit of the quotient in position n j 1 where the digit positions are numbered from least significant 0 to most significant n 1 n is number of digits in the quotient D is the divisorRestoring division Restoring division operates on fixed point fractional numbers and depends on the assumption 0 lt D lt N citation needed The quotient digits q are formed from the digit set 0 1 The basic algorithm for binary radix 2 restoring division is R N D D lt lt n R and D need twice the word width of N and Q for i n 1 0 do For example 31 0 for 32 bits R 2 R D Trial subtraction from shifted value multiplication by 2 is a shift in binary representation if R gt 0 then q i 1 Result bit 1 else q i 0 Result bit 0 R R D New partial remainder is restored shifted value end end Where N numerator D denominator n bits R partial remainder q i bit i of quotient Non performing restoring division is similar to restoring division except that the value of 2R is saved so D does not need to be added back in for the case of R lt 0 Non restoring division Non restoring division uses the digit set 1 1 for the quotient digits instead of 0 1 The algorithm is more complex but has the advantage when implemented in hardware that there is only one decision and addition subtraction per quotient bit there is no restoring step after the subtraction which potentially cuts down the numbers of operations by up to half and lets it be executed faster The basic algorithm for binary radix 2 non restoring division of non negative numbers is verification needed R N D D lt lt n R and D need twice the word width of N and Q for i n 1 0 do for example 31 0 for 32 bits if R gt 0 then q i 1 R 2 R D else q i 1 R 2 R D end if end Note N numerator D denominator n bits R partial remainder q i bit i of quotient Following this algorithm the quotient is in a non standard form consisting of digits of 1 and 1 This form needs to be converted to binary to form the final quotient Example Convert the following quotient to the digit set 0 1 Start Q 1111 11 11 displaystyle Q 111 bar 1 1 bar 1 1 bar 1 1 Form the positive term P 11101010 displaystyle P 11101010 2 Mask the negative term M 00010101 displaystyle M 00010101 3 Subtract P M displaystyle P M Q 11010101 displaystyle Q 11010101 Signed binary notation with ones complement without two s complement If the 1 digits of Q displaystyle Q are stored as zeros 0 as is common then P displaystyle P is Q displaystyle Q and computing M displaystyle M is trivial perform a ones complement bit by bit complement on the original Q displaystyle Q Q Q bit bnot Q Appropriate if 1 digits in Q are represented as zeros as is common Finally quotients computed by this algorithm are always odd and the remainder in R is in the range D R lt D For example 5 2 3 R 1 To convert to a positive remainder do a single restoring step after Q is converted from non standard form to standard form if R lt 0 then Q Q 1 R R D Needed only if the remainder is of interest end if The actual remainder is R gt gt n As with restoring division the low order bits of R are used up at the same rate as bits of the quotient Q are produced and it is common to use a single shift register for both SRT division SRT division is a popular method for division in many microprocessor implementations The algorithm is named after D W Sweeney of IBM James E Robertson of University of Illinois and K D Tocher of Imperial College London They all developed the algorithm independently at approximately the same time published in February 1957 September 1958 and January 1958 respectively SRT division is similar to non restoring division but it uses a lookup table based on the dividend and the divisor to determine each quotient digit The most significant difference is that a redundant representation is used for the quotient For example when implementing radix 4 SRT division each quotient digit is chosen from five possibilities 2 1 0 1 2 Because of this the choice of a quotient digit need not be perfect later quotient digits can correct for slight errors For example the quotient digit pairs 0 2 and 1 2 are equivalent since 0 4 2 1 4 2 This tolerance allows quotient digits to be selected using only a few most significant bits of the dividend and divisor rather than requiring a full width subtraction This simplification in turn allows a radix higher than 2 to be used Like non restoring division the final steps are a final full width subtraction to resolve the last quotient bit and conversion of the quotient to standard binary form The Intel Pentium processor s infamous floating point division bug was caused by an incorrectly coded lookup table Five of the 1066 entries had been mistakenly omitted Fast division methodsNewton Raphson division Newton Raphson uses Newton s method to find the reciprocal of D displaystyle D and multiply that reciprocal by N displaystyle N to find the final quotient Q displaystyle Q The steps of Newton Raphson division are Calculate an estimate X0 displaystyle X 0 for the reciprocal 1 D displaystyle 1 D of the divisor D displaystyle D Compute successively more accurate estimates X1 X2 XS displaystyle X 1 X 2 ldots X S of the reciprocal This is where one employs the Newton Raphson method as such Compute the quotient by multiplying the dividend by the reciprocal of the divisor Q NXS displaystyle Q NX S In order to apply Newton s method to find the reciprocal of D displaystyle D it is necessary to find a function f X displaystyle f X that has a zero at X 1 D displaystyle X 1 D The obvious such function is f X DX 1 displaystyle f X DX 1 but the Newton Raphson iteration for this is unhelpful since it cannot be computed without already knowing the reciprocal of D displaystyle D moreover it attempts to compute the exact reciprocal in one step rather than allow for iterative improvements A function that does work is f X 1 X D displaystyle f X 1 X D for which the Newton Raphson iteration gives Xi 1 Xi f Xi f Xi Xi 1 Xi D 1 Xi2 Xi Xi 1 DXi Xi 2 DXi displaystyle X i 1 X i f X i over f X i X i 1 X i D over 1 X i 2 X i X i 1 DX i X i 2 DX i which can be calculated from Xi displaystyle X i using only multiplication and subtraction or using two fused multiply adds From a computation point of view the expressions Xi 1 Xi Xi 1 DXi displaystyle X i 1 X i X i 1 DX i and Xi 1 Xi 2 DXi displaystyle X i 1 X i 2 DX i are not equivalent To obtain a result with a precision of 2n bits while making use of the second expression one must compute the product between Xi displaystyle X i and 2 DXi displaystyle 2 DX i with double the given precision of Xi displaystyle X i n bits citation needed In contrast the product between Xi displaystyle X i and 1 DXi displaystyle 1 DX i need only be computed with a precision of n bits because the leading n bits after the binary point of 1 DXi displaystyle 1 DX i are zeros If the error is defined as ei 1 DXi displaystyle varepsilon i 1 DX i then ei 1 1 DXi 1 1 D Xi 2 DXi 1 2DXi D2Xi2 1 DXi 2 ei2 displaystyle begin aligned varepsilon i 1 amp 1 DX i 1 amp 1 D X i 2 DX i amp 1 2DX i D 2 X i 2 amp 1 DX i 2 amp varepsilon i 2 end aligned This squaring of the error at each iteration step the so called quadratic convergence of Newton Raphson s method has the effect that the number of correct digits in the result roughly doubles for every iteration a property that becomes extremely valuable when the numbers involved have many digits e g in the large integer domain But it also means that the initial convergence of the method can be comparatively slow especially if the initial estimate X0 displaystyle X 0 is poorly chosen Initial estimate For the subproblem of choosing an initial estimate X0 displaystyle X 0 it is convenient to apply a bit shift to the divisor D to scale it so that 0 5 D 1 Applying the same bit shift to the numerator N ensures the quotient does not change Once within a bounded range a simple polynomial approximation can be used to find an initial estimate The linear approximation with mimimum worst case absolute error on interval the interval 0 5 1 displaystyle 0 5 1 is X0 4817 3217D displaystyle X 0 48 over 17 32 over 17 D The coefficients of the linear approximation T0 T1D displaystyle T 0 T 1 D are determined as follows The absolute value of the error is e0 1 D T0 T1D displaystyle varepsilon 0 1 D T 0 T 1 D The minimum of the maximum absolute value of the error is determined by the Chebyshev equioscillation theorem applied to F D 1 D T0 T1D displaystyle F D 1 D T 0 T 1 D The local minimum of F D displaystyle F D occurs when F D 0 displaystyle F D 0 which has solution D T0 2T1 displaystyle D T 0 2T 1 The function at that minimum must be of opposite sign as the function at the endpoints namely F 1 2 F 1 F T0 2T1 displaystyle F 1 2 F 1 F T 0 2T 1 The two equations in the two unknowns have a unique solution T0 48 17 displaystyle T 0 48 17 and T1 32 17 displaystyle T 1 32 17 and the maximum error is F 1 1 17 displaystyle F 1 1 17 Using this approximation the absolute value of the error of the initial value is less than e0 117 0 059 displaystyle vert varepsilon 0 vert leq 1 over 17 approx 0 059 The best quadratic fit to 1 D displaystyle 1 D in the interval is X 14033 6411D 25699D2 displaystyle X frac 140 33 frac 64 11 D frac 256 99 D 2 It is chosen to make the error equal to a re scaled third order Chebyshev polynomial of the first kind and gives an absolute value of the error less than or equal to 1 99 This improvement is equivalent to log2 log 99 log 17 0 7 displaystyle log 2 log 99 log 17 approx 0 7 Newton Raphson iterations at a computational cost of less than one iteration It is possible to generate a polynomial fit of degree larger than 2 computing the coefficients using the Remez algorithm The trade off is that the initial guess requires more computational cycles but hopefully in exchange for fewer iterations of Newton Raphson Since for this method the convergence is exactly quadratic it follows that from an initial error e0 displaystyle varepsilon 0 S displaystyle S iterations will give an answer accurate to P 2Slog2 e0 1 2Slog2 1 e0 1 displaystyle P 2 S log 2 varepsilon 0 1 2 S log 2 1 varepsilon 0 1 binary places Typical values are Binary digits of reciprocal accuracy e0 displaystyle varepsilon 0 Iterations0 1 2 3 41 17 displaystyle 1 17 3 09 7 17 15 35 31 70 64 401 99 displaystyle 1 99 5 63 12 26 25 52 52 03 105 07 A quadratic initial estimate plus two iterations is accurate enough for IEEE single precision but three iterations are marginal for double precision A linear initial estimate plus four iterations is sufficient for both double and double extended formats Pseudocode The following computes the quotient of N and D with a precision of P binary places Express D as M 2e where 1 M lt 2 standard floating point representation D D 2e 1 scale between 0 5 and 1 can be performed with bit shift exponent subtraction N N 2e 1 X 48 17 32 17 D precompute constants with same precision as D repeat log2 P 1log2 17 displaystyle left lceil log 2 frac P 1 log 2 17 right rceil times can be precomputed based on fixed P X X X 1 D X end return N X For example for a double precision floating point division this method uses 10 multiplies 9 adds and 2 shifts Cubic iteration There is an iteration which uses three multiplications to cube the error ei 1 DXi displaystyle varepsilon i 1 DX i Yi Xiei displaystyle Y i X i varepsilon i Xi 1 Xi Yi Yiei displaystyle X i 1 X i Y i Y i varepsilon i The Yiei term is new Expanding out the above Xi 1 displaystyle X i 1 can be written as Xi 1 Xi Xiei Xiei2 Xi Xi 1 DXi Xi 1 DXi 2 3Xi 3DXi2 D2Xi3 displaystyle begin aligned X i 1 amp X i X i varepsilon i X i varepsilon i 2 amp X i X i 1 DX i X i 1 DX i 2 amp 3X i 3DX i 2 D 2 X i 3 end aligned with the result that the error term ei 1 1 DXi 1 1 3DXi 3D2Xi2 D3Xi3 1 DXi 3 ei3 displaystyle begin aligned varepsilon i 1 amp 1 DX i 1 amp 1 3DX i 3D 2 X i 2 D 3 X i 3 amp 1 DX i 3 amp varepsilon i 3 end aligned This is 3 2 the computation of the quadratic iteration but achieves log 3 log 2 1 585 displaystyle log 3 log 2 approx 1 585 as much convergence so is slightly more efficient Put another way two iterations of this method raise the error to the ninth power at the same computational cost as three quadratic iterations which only raise the error to the eighth power The number of correct bits after S displaystyle S iterations is P 3Slog2 e0 1 3Slog2 1 e0 1 displaystyle P 3 S log 2 varepsilon 0 1 3 S log 2 1 varepsilon 0 1 binary places Typical values are Bits of reciprocal accuracy e0 displaystyle varepsilon 0 Iterations0 1 2 31 17 displaystyle 1 17 3 09 11 26 35 79 109 361 99 displaystyle 1 99 5 63 18 89 58 66 177 99 A quadratic initial estimate plus two cubic iterations provides ample precision for an IEEE double precision result It is also possible to use a mixture of quadratic and cubic iterations Using at least one quadratic iteration ensures that the error is positive i e the reciprocal is underestimated 370 This can simplify a following rounding step if an exactly rounded quotient is required Using higher degree polynomials in either the initialization or the iteration results in a degradation of performance because the extra multiplications required would be better spent on doing more iterations citation needed Goldschmidt division Goldschmidt division after Robert Elliott Goldschmidt uses an iterative process of repeatedly multiplying both the dividend and divisor by a common factor Fi chosen such that the divisor converges to 1 This causes the dividend to converge to the sought quotient Q Q NDF1F1F2F2F F displaystyle Q frac N D frac F 1 F 1 frac F 2 F 2 frac F ldots F ldots The steps for Goldschmidt division are Generate an estimate for the multiplication factor Fi Multiply the dividend and divisor by Fi If the divisor is sufficiently close to 1 return the dividend otherwise loop to step 1 Assuming N D has been scaled so that 0 lt D lt 1 each Fi is based on D Fi 1 2 Di displaystyle F i 1 2 D i Multiplying the dividend and divisor by the factor yields Ni 1Di 1 NiDiFi 1Fi 1 displaystyle frac N i 1 D i 1 frac N i D i frac F i 1 F i 1 After a sufficient number k of iterations Q Nk displaystyle Q N k The Goldschmidt method is used in AMD Athlon CPUs and later models It is also known as Anderson Earle Goldschmidt Powers AEGP algorithm and is implemented by various IBM processors Although it converges at the same rate as a Newton Raphson implementation one advantage of the Goldschmidt method is that the multiplications in the numerator and in the denominator can be done in parallel Binomial theorem The Goldschmidt method can be used with factors that allow simplifications by the binomial theorem Assume N D displaystyle N D has been scaled by a power of two such that D 12 1 displaystyle D in left tfrac 1 2 1 right We choose D 1 x displaystyle D 1 x and Fi 1 x2i displaystyle F i 1 x 2 i This yields N1 x N 1 x 1 x2 N 1 x 1 x2 1 x4 Q N N 1 x 1 x2 1 x2 n 1 D 1 x2n 1 displaystyle frac N 1 x frac N cdot 1 x 1 x 2 frac N cdot 1 x cdot 1 x 2 1 x 4 cdots Q frac N N cdot 1 x cdot 1 x 2 cdot cdot cdot 1 x 2 n 1 D 1 x 2 n approx 1 After n steps x 0 12 displaystyle left x in left 0 tfrac 1 2 right right the denominator 1 x2n displaystyle 1 x 2 n can be rounded to 1 with a relative error en Q N Q x2n displaystyle varepsilon n frac Q N Q x 2 n which is maximum at 2 2n displaystyle 2 2 n when x 12 displaystyle x tfrac 1 2 thus providing a minimum precision of 2n displaystyle 2 n binary digits Large integer methodsMethods designed for hardware implementation generally do not scale to integers with thousands or millions of decimal digits these frequently occur for example in modular reductions in cryptography For these large integers more efficient division algorithms transform the problem to use a small number of multiplications which can then be done using an asymptotically efficient multiplication algorithm such as the Karatsuba algorithm Toom Cook multiplication or the Schonhage Strassen algorithm The result is that the computational complexity of the division is of the same order up to a multiplicative constant as that of the multiplication Examples include reduction to multiplication by Newton s method as described above as well as the slightly faster Barrett reduction and Montgomery reduction algorithms verification needed Newton s method is particularly efficient in scenarios where one must divide by the same divisor many times since after the initial Newton inversion only one truncated multiplication is needed for each division Division by a constantThe division by a constant D is equivalent to the multiplication by its reciprocal Since the denominator is constant so is its reciprocal 1 D Thus it is possible to compute the value of 1 D once at compile time and at run time perform the multiplication N 1 D rather than the division N D In floating point arithmetic the use of 1 D presents little problem but in integer arithmetic the reciprocal will always evaluate to zero assuming D gt 1 It is not necessary to use specifically 1 D any value X Y that reduces to 1 D may be used For example for division by 3 the factors 1 3 2 6 3 9 or 194 582 could be used Consequently if Y were a power of two the division step would reduce to a fast right bit shift The effect of calculating N D as N X Y replaces a division with a multiply and a shift Note that the parentheses are important as N X Y will evaluate to zero However unless D itself is a power of two there is no X and Y that satisfies the conditions above Fortunately N X Y gives exactly the same result as N D in integer arithmetic even when X Y is not exactly equal to 1 D but close enough that the error introduced by the approximation is in the bits that are discarded by the shift operation Barrett reduction uses powers of 2 for the value of Y to make division by Y a simple right shift As a concrete fixed point arithmetic example for 32 bit unsigned integers division by 3 can be replaced with a multiply by 2863311531 233 a multiplication by 2863311531 hexadecimal 0xAAAAAAAB followed by a 33 right bit shift The value of 2863311531 is calculated as 233 3 then rounded up Likewise division by 10 can be expressed as a multiplication by 3435973837 0xCCCCCCCD followed by division by 235 or 35 right bit shift p230 234 OEIS provides sequences of the constants for multiplication as A346495 and for the right shift as A346496 For general x bit unsigned integer division where the divisor D is not a power of 2 the following identity converts the division into two x bit addition subtraction one x bit by x bit multiplication where only the upper half of the result is used and several shifts after precomputing k x log2 D displaystyle k x lceil log 2 D rceil and a 2kD 2x displaystyle a left lceil frac 2 k D right rceil 2 x ND N b2 b2k x 1 where b Na2x displaystyle left lfloor frac N D right rfloor left lfloor frac left lfloor frac N b 2 right rfloor b 2 k x 1 right rfloor text where b left lfloor frac Na 2 x right rfloor In some cases division by a constant can be accomplished in even less time by converting the multiply by a constant into a series of shifts and adds or subtracts Of particular interest is division by 10 for which the exact quotient is obtained with remainder if required Rounding errorThis section needs expansion You can help by adding to it September 2012 When a division operation is performed the exact quotient q displaystyle q and remainder r displaystyle r are approximated to fit within the computer s precision limits The Division Algorithm states a bq r displaystyle a bq r where 0 r lt b displaystyle 0 leq r lt b In floating point arithmetic the quotient q displaystyle q is represented as q displaystyle tilde q and the remainder r displaystyle r as r displaystyle tilde r introducing rounding errors ϵq displaystyle epsilon q ϵq displaystyle epsilon q and ϵr displaystyle epsilon r q q ϵq r r ϵr displaystyle tilde q q epsilon q tilde r r epsilon r This rounding causes a small error which can propagate and accumulate through subsequent calculations Such errors are particularly pronounced in iterative processes and when subtracting nearly equal values is told loss of significance To mitigate these errors techniques such as the use of guard digits or are employed See alsoGalley division Multiplication algorithm Pentium FDIV bugNotesDespite how little problem the optimization causes this reciprocal optimization is still usually hidden behind a fast math flag in modern compilers as it is inexact Modern compilers commonly perform this integer multiply and shift optimization for a constant only known at run time however the program must implement the optimization itself ReferencesRodeheffer Thomas L 2008 08 26 Software Integer Division PDF Technical report Microsoft Research Silicon Valley Morris James E Iniewski Krzysztof 2017 11 22 Nanoelectronic Device Applications Handbook CRC Press ISBN 978 1 351 83197 0 Shaw Robert F 1950 Arithmetic Operations in a Binary Computer Review of Scientific Instruments 21 8 690 Bibcode 1950RScI 21 687S doi 10 1063 1 1745692 ISSN 0034 6748 Archived from the original on 2022 02 28 Retrieved 2022 02 28 Flynn Stanford EE486 Advanced Computer Arithmetic Division Chapter 5 Handout Division PDF Stanford University Archived PDF from the original on 2022 04 18 Retrieved 2019 06 24 Harris David L Oberman Stuart F Horowitz Mark A 9 September 1998 SRT Division Architectures Models and Implementations PDF Technical report Stanford University Archived PDF from the original on 24 December 2016 Retrieved 23 December 2016 McCann Mark Pippenger Nicholas 2005 SRT Division Algorithms as Dynamical Systems SIAM Journal on Computing 34 6 1279 1301 CiteSeerX 10 1 1 72 6993 doi 10 1137 S009753970444106X hdl 2429 12179 Archived from the original on 2022 08 24 Retrieved 2022 08 24 Cocke John Sweeney D W 11 February 1957 High speed arithmetic in a parallel device Company Memo IBM p 20 archived from the original on 24 August 2022 retrieved 24 August 2022 a href wiki Template Citation title Template Citation citation a CS1 maint location missing publisher link Robertson James 1958 09 01 A New Class of Digital Division Methods IRE Transactions on Electronic Computers EC 7 3 IEEE 218 222 doi 10 1109 TEC 1958 5222579 hdl 2027 uiuo ark 13960 t0gt7529c Archived from the original on 2022 08 24 Retrieved 2022 08 24 Tocher K D 1958 01 01 Techniques of Multiplication and Division for Automatic Binary Computers The Quarterly Journal of Mechanics and Applied Mathematics 11 3 364 384 doi 10 1093 qjmam 11 3 364 Archived from the original on 2022 08 24 Retrieved 2022 08 24 Statistical Analysis of Floating Point Flaw Intel Corporation 1994 Archived from the original on 23 October 2013 Retrieved 22 October 2013 Oberman Stuart F Flynn Michael J July 1995 An Analysis of Division Algorithms and Implementations PDF Technical report Stanford University CSL TR 95 675 Archived PDF from the original on 2017 05 17 Retrieved 2016 12 23 Shirriff Ken 28 Dec 2024 Intel s 475 million error the silicon behind the Pentium division bug Righto Retrieved 30 Dec 2024 Ercegovac Milos D Lang Tomas 2004 Chapter 7 Reciprocal Division Reciprocal Square Root and Square Root by Iterative Approximation Digital Arithmetic Morgan Kaufmann pp 367 395 ISBN 1 55860 798 6 Goldschmidt Robert E 1964 Applications of Division by Convergence PDF Thesis M Sc dissertation M I T OCLC 34136725 Archived PDF from the original on 2015 12 10 Retrieved 2015 09 15 Authors IBM Journal of Research and Development 11 125 127 1967 doi 10 1147 rd 111 0125 Archived from the original on 18 July 2018 Oberman Stuart F 1999 Floating point division and square root algorithms and implementation in the AMD K7 Microprocessor PDF Proceedings 14th IEEE Symposium on Computer Arithmetic Cat No 99CB36336 pp 106 115 doi 10 1109 ARITH 1999 762835 ISBN 0 7695 0116 8 S2CID 12793819 Archived PDF from the original on 2015 11 29 Retrieved 2015 09 15 Soderquist Peter Leeser Miriam July August 1997 Division and Square Root Choosing the Right Implementation IEEE Micro 17 4 56 66 doi 10 1109 40 612224 S F Anderson J G Earle R E Goldschmidt D M Powers The IBM 360 370 model 91 floating point execution unit IBM Journal of Research and Development January 1997 Guy Even Peter Siedel Ferguson Warren 1 February 2005 A parametric error analysis of Goldschmidt s division algorithm Journal of Computer and System Sciences 70 1 118 139 doi 10 1016 j jcss 2004 08 004 Hasselstrom Karl 2003 Fast Division of Large Integers A Comparison of Algorithms PDF M Sc in Computer Science thesis Royal Institute of Technology Archived from the original PDF on 8 July 2017 Retrieved 2017 07 08 Joachim Ziegler Christoph Burnikel 1998 Fast Recursive Division Max Planck Institut fur Informatik archived from the original on 2011 04 26 retrieved 2021 09 10 a href wiki Template Citation title Template Citation citation a CS1 maint location missing publisher link Barrett Paul 1987 Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor Proceedings on Advances in cryptology CRYPTO 86 London UK Springer Verlag pp 311 323 ISBN 0 387 18047 8 Granlund Torbjorn Montgomery Peter L June 1994 Division by Invariant Integers using Multiplication PDF SIGPLAN Notices 29 6 61 72 CiteSeerX 10 1 1 1 2556 doi 10 1145 773473 178249 Archived PDF from the original on 2019 06 06 Retrieved 2015 12 08 Moller Niels Granlund Torbjorn February 2011 Improved Division by Invariant Integers PDF IEEE Transactions on Computers 60 2 165 175 doi 10 1109 TC 2010 143 S2CID 13347152 Archived PDF from the original on 2015 12 22 Retrieved 2015 12 08 ridiculous fish Labor of Division Episode III Faster Unsigned Division by Constants Archived 2022 01 08 at the Wayback Machine 2011 ridiculous fish libdivide optimized integer division Archived from the original on 23 November 2021 Retrieved 6 July 2021 Warren Jr Henry S 2013 Hacker s Delight 2 ed Addison Wesley Pearson Education Inc ISBN 978 0 321 84268 8 LaBudde Robert A Golovchenko Nikolai Newton James and Parker David Massmind Binary Division by a Constant Archived 2022 01 09 at the Wayback Machine Vowels R A 1992 Division by 10 Australian Computer Journal 24 3 81 85 L Popyack Jeffrey June 2000 Rounding Error Drexel University 9 Machine Numbers Rounding Error and Error Propagation College of Charleston 8 February 2021 Further readingSavard John J G 2018 2006 Advanced Arithmetic Techniques quadibloc Archived from the original on 2018 07 03 Retrieved 2018 07 16