Re: Bernstein's NFS machine

------- Forwarded message follows -------
From:       Nomen Nescio <nobody@dizum.com>
To:         cryptography@wasabisystems.com
Subject:    Re: Bernstein's NFS machine
Date sent:  Sat, 2 Mar 2002 21:20:26 +0100 (CET)

More analysis of Dan Bernstein's factoring machine from http://cr.yp.to/papers.html#nfscircuit.

The NFS algorithm has two phases. The first searches for coefficient pairs (a,b) from some interval which are relatively prime and which satisfy two smoothness bounds. The smoothness is with respect to a factor base consisting of all primes less than some bound. (Actually there may be two or more bounds used for different tests, but that complication will be ignored here.) Each success produces one relation, which becomes one row of a matrix whose columns correspond to the primes in the factor base.

Once enough relations are collected (more rows than columns) the second phase begins. This involves looking for linear dependencies among the rows of the matrix, where the arithmetic is over GF(2). Once a dependency is found, the corresponding rows can be combined to produce a congruence of squares modulo n (a value with two essentially different square roots), which with good probability leads to a factorization of n.
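
To make the end goal concrete: once the linear algebra hands us x and y with x^2 = y^2 (mod n) but x != +/-y (mod n), a single gcd splits n. A toy illustration in Python (my numbers, not from Bernstein's paper):

    from math import gcd

    # Toy modulus and a hypothetical output of the dependency-combining step:
    # 45^2 = 10^2 (mod 77), but 45 != +/-10 (mod 77).
    n, x, y = 77, 45, 10
    assert (x * x - y * y) % n == 0
    p = gcd(x - y, n)          # nontrivial factor with good probability
    print(p, n // p)           # -> 7 11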

Dan Bernstein proposes methods to significantly reduce the cost (defined as space times time) of both phases.

L is defined as exp(log(n)^(1/3) * log(log(n))^(2/3)), where n is the number to be factored. It is a key parameter for the work cost of the NFS class of algorithms.

    n          L
    -------------------
    2^512      2^33
    2^1024     2^45
    2^1536     2^54
    2^2048     2^61
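
As a sanity check on that table, the exponents can be recomputed directly from the definition of L; a few lines of Python (my own check, not part of the original analysis):

    from math import log

    def log2_L(bits):
        # L = exp(log(n)^(1/3) * log(log(n))^(2/3)), natural logs, n = 2^bits
        ln_n = bits * log(2)
        return ln_n ** (1 / 3) * log(ln_n) ** (2 / 3) / log(2)

    for bits in (512, 1024, 1536, 2048):
        print(bits, round(log2_L(bits)))   # -> 33, 45, 54, 61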

Let E represent the bound on the (a,b) coefficient values which are searched. Then E^2 values will be checked for relative primality and smoothness. Let B represent the bound on the factor base. A smooth value is one for which all of its prime factors are less than B.

In previous work, sieving is used to test for smoothness, which is in part where the Number Field Sieve algorithm gets its name. Sieving E^2 values over a factor base of size B takes roughly E^2 log log B time and B space. Ignoring the log log B factor, that is E^2 time and B space, for a total work factor (space times time) of B*E^2.
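
For intuition, here is a heavily simplified sketch of sieving one segment of consecutive values for B-smoothness (my own illustration; a real NFS siever works on polynomial values, uses logarithm approximations, and handles prime powers):

    def smooth_in_segment(start, length, primes):
        # Sieve the segment start..start+length-1 for smoothness over 'primes'.
        # Space is O(length + len(primes)); time is about length * sum(1/p),
        # i.e. roughly length * log log B, matching the estimate above.
        residue = list(range(start, start + length))
        for p in primes:
            i = (-start) % p                # first index with p | start+i
            while i < length:
                while residue[i] % p == 0:
                    residue[i] //= p
                i += p
        return [start + i for i in range(length) if residue[i] == 1]

    # Toy usage: which values in 1000..1099 are smooth over primes up to 13?
    print(smooth_in_segment(1000, 100, [2, 3, 5, 7, 11, 13]))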

With a factor base of size B, roughly B relations have to be found. Then a sparse B by B matrix must be solved in the second phase. Using conventional algorithms this solution takes roughly B^2 time on a machine that has B space. The total work factor is B^3.

For optimum balance, the work factor of these two phases is set equal, giving B^3 = B*E^2, or B=E. It is also necessary that E be large enough to produce at least B relations. This leads to the solution E = B = L^0.961. The corresponding work factor is this value cubed, or L^2.883. (Dan Bernstein gives a slightly lower figure due to an improvement by Don Coppersmith.)
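
Writing B = L^b and E = L^e, the balance is just arithmetic on exponents; a trivial check of the numbers quoted above:

    b = e = 0.961            # conventional parameter choice from above
    sieve_work  = b + 2 * e  # exponent of B * E^2
    matrix_work = 3 * b      # exponent of B^3
    print(sieve_work, matrix_work)   # both come out near 2.88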

Bernstein improves on both of these phases. He points out that for sufficiently large n, it will be less work to determine smoothness by simply trying to factor each value using the Elliptic Curve factoring method (ECM). This is the best factoring method known for finding relatively small factors, as is required here. Attempting to factor E^2 values over a factor base of size B takes time proportional to E^2 times a function of B which is larger than that in the sieve case, but still asymptotically small compared to E^2. So the time is basically proportional to E^2 as with the sieve.

The savings comes in the space. The sieve requires an array to hold the primes in the factor base, and an array to represent the values being sieved. The most efficient approach is to set these two arrays to roughly the same size, breaking the input into pieces of size B. This does not slow the algorithm but allows it to run in space B as assumed above.

In contrast, the ECM algorithm requires no large storage spaces. It does not have to store the factor base or any representation of a large block of numbers to be checked. It simply runs the ECM factoring algorithm on each value to be checked until it either finds a factor or times out. If it finds one, it divides it out and repeats the process with the residue. Eventually the residue is small enough that we know the number is smooth, or the algorithm fails indicating that the value is not smooth. Storage requirements are minimal. Based on this, the ECM approach takes time proportional to E^2 and space essentially a constant, for a total work of E^2. This is compared with B*E^2 for the sieve.
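
A sketch of the per-value test being described, in Python. This is only an illustration of the control flow: Pollard's rho stands in for ECM here (a real implementation would run ECM with parameters tuned to the bound B), but the structure -- split off factors until every remaining piece is at most B, or give up -- is the point.

    import random
    from math import gcd

    def find_factor(m, tries=20):
        # Stand-in for ECM: Pollard's rho, which also targets small factors.
        # Returns a nontrivial factor of m, or None if every attempt times out.
        if m % 2 == 0:
            return 2
        for _ in range(tries):
            c = random.randrange(1, m)
            x = y = random.randrange(2, m)
            d = 1
            while d == 1:
                x = (x * x + c) % m
                y = (y * y + c) % m
                y = (y * y + c) % m
                d = gcd(abs(x - y), m)
            if d != m:
                return d
        return None

    def is_smooth(value, bound):
        # Repeatedly split off factors; a piece <= bound is automatically smooth.
        stack = [value]
        while stack:
            m = stack.pop()
            if m <= bound:          # every prime factor of m is <= bound
                continue
            d = find_factor(m)      # run the factorer until it finds a factor or times out
            if d is None:           # give up: treat the value as not smooth
                return False
            stack.extend([d, m // d])
        return True

    # Toy usage: the 30-smooth values between 10000 and 10050.
    print([v for v in range(10000, 10050) if is_smooth(v, 30)])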

Bernstein also improves on the second phase, finding a dependency among the rows of the matrix. It still requires space of size B to hold the sparse B by B matrix. But by making the matrix elements active processing cells instead of dumb memory cells, he can accelerate the matrix operations that are necessary. The result is that finding the dependencies takes time B^1.5 rather than B^2. The total work is then B^2.5.
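
To make "finding a dependency among the rows" concrete, here is a tiny dense Gaussian-elimination version in Python (my own illustration; it is neither the sparse iterative method whose B^2 cost is quoted above nor Bernstein's sorting-based circuit, just the simplest way to see what is being computed):

    def find_row_dependency(rows):
        # rows[i] lists the columns where row i of a GF(2) matrix is 1.
        # Returns a set of row indices that XOR to the zero row, or None.
        pivots = {}                  # leading column -> (row bits, combination bits)
        for i, cols in enumerate(rows):
            bits = 0
            for j in cols:
                bits ^= 1 << j
            combo = 1 << i           # which original rows are combined into 'bits'
            while bits:
                lead = bits.bit_length() - 1
                if lead not in pivots:
                    pivots[lead] = (bits, combo)
                    break
                pbits, pcombo = pivots[lead]
                bits ^= pbits
                combo ^= pcombo
            else:                    # reduced to the zero row: a dependency
                return {k for k in range(len(rows)) if (combo >> k) & 1}
        return None

    # Toy example: rows 0, 1 and 3 XOR to zero, so they form a dependency.
    rows = [[0, 1], [1, 2], [0, 1, 2], [0, 2]]
    print(find_row_dependency(rows))     # -> {0, 1, 3}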

For optimality these two phases are balanced with E^2 = B^2.5 or E=B^1.25. Incorporating the requirement that E be big enough to generate the necessary B relations, the solution is E=L^0.988 and B=L^0.790. The total work is E^2 or L^1.976. This is compared to the L^2.883 value above, a significant savings.
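
Plugging the tabulated value of L for a 1024-bit modulus into these exponents gives a feel for the gap. The usual caveat applies: these are asymptotic exponents with o(1) terms and all constants dropped, so this is illustrative arithmetic, not a cost estimate for real hardware.

    log2_L = 45                    # from the table above, n = 2^1024
    print(round(2.883 * log2_L))   # conventional work factor: about 2^130
    print(round(1.976 * log2_L))   # Bernstein's work factor:  about 2^89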

It is interesting to compare the approaches used to accelerate the two phases. In each case we are essentially replacing memory with computational elements, but the kinds of computation being done are very different.

The matrix solution phase requires a high degree of parallelism. The matrix-storage elements need to be enhanced with high speed local interconnects and rudimentary processing capable of simple comparison and exchange operations. This would be an attractive candidate for a custom IC where the basic processing cell would be small and laid out in many replicating units on a chip.

In the smoothness testing phase, on the other hand, the requirements for the computing elements are much greater. Each one needs to be capable of running the ECM factoring algorithm on large values of hundreds of bits in size. This is a relatively complex algorithm and would probably require what is essentially a CPU with some kind of firmware to hold the program. It would probably not have to be as large and complex as modern CPUs, but would be much more than the simple augmented memory cell needed in the matrix solution phase.

The smoothness testing phase has the advantage that the individual tests can be run independently, without any need to interconnect the processing units (other than for I/O). There is more flexibility in terms of time/space tradeoffs in this phase: if we need to use fewer units and take more time, that will not affect the total work involved.

(Possibly a good architecture for the smoothness phase would be similar to the 1980s-era Connection Machine from Thinking Machines. The CM-2 had up to 64K processors with up to 32K of memory per processor. It used a SIMD (single instruction multiple data) architecture which meant that each processing node ran the same instruction at the same time. However there were condition flags which could be set to selectively disable instructions on certain processors based on the result of earlier tests. This allows different nodes to behave differently depending on the data.)

The question remains at what point these improvements become effective. In the first phase, the elementary step of the sieving machine is to add two numbers together. For the ECM machine it is to run the entire ECM algorithm on a test value. Both of these take asymptotically constant time, but obviously the ECM operation will be far slower than the sieving one. Only by replacing passive sieve memory with active custom ECM engines will the potential performance advantage of the ECM machine be realized.

Note that the improvements to the two phases can be applied independently. Either the speedup of the factoring or the speedup of the linear algebra could be used with the other phase being done in the conventional way. Because the nature of the machines for the two phases is very different, it is likely that they have different crossover points at which the new techniques become advantageous. So it is very likely that there are key sizes where only one of the new techniques should be used.

A final point: the factoring phase gets a greater improvement (exponent reduced from 3 to 2) than the linear algebra phase (exponent reduced from 3 to 2.5). This leads to the use of a smaller factor base than in the conventional method. A smaller factor base is helpful for the linear algebra since it gives a smaller matrix, but it is harmful to the smoothness test since fewer numbers will be smooth over the smaller factor base. The greater speedup of the smoothness phase accounts for this shift in emphasis.

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@wasabisystems.com

------- End of forwarded message -------