Faster Mersenne Primes

How can the code given on the Mersenne page be made faster?

First a few words about high-precision arithmetic. We know that to add by hand two n-digit numbers takes n add-with-carry operations, each on a pair of digits, one from each number. However, multiplication is harder. Long multiplication requires n multiplications of the whole of the first number by a digit of the second, followed by adding n numbers, each of 2n digits. In all there are about 3n2 operations.

I believe that python uses this "n2" for "small" large numbers, and then switches to Karatsuba's algorithm, which was discovered in the 1960s and is order n1.59 rather than n2. I also believe that for a "digit", rather than use something which can represent 0 to 9 as a Human would, python uses something that can represent 0 to 215 on (most) 32-bit systems, and something that can represent 0 to 230 on (most) 64-bit systems.

From this one may conclude that this is a rare area where 64 bit code is likely to be significantly faster than 32 bit code, as numbers will be represented internally by half as many "digits", which will make addition twice as fast, and multiplication with Karatsuba's algorithm three times as fast. Also that multiplication is much more expensive than addition, and it might be reasonable to expect that the modulus operation is expensive too. However, bitwise operations should be fast, as python is using a binary internal representation.

The previous code took about 75s to find all Mersenne primes up to p=3300 on a Pi 4. The following code can do the same in just under 20s.

    for i in range (1,p-1):
        s=(s*s-2)
        if (s<0): s+=Mp
        s=(s&Mp)+(s>>p)
        s=(s&Mp)+(s>>p)
        if (s==Mp): s=0

(Just the new loop for calculating the s series is shown above, the rest of the code is the same.)

The modulus operation has been removed, and has been replaced by four lines of code. How do the new lines work?

The first simply ensures that s is always positive. If s is ever zero or one, the next term will be negative, and I cannot prove that this can never happen. Given that we only ever want the value of s modulo Mp, we are free to add Mp as often as we like.

After taking the modulus with respect to Mp, s will have at most p binary digits. So before the modulus operation, it will have at most 2p binary digits, as s2 has twice as many digits as s. So we could chose to write s in the following form:

  s = a*2p + b

with both a and b in the range 0<=a,b<2p. This is easy: b is simply the last p bits of s, which we can extract with a bitwise and operation as s&Mp given that Mp has p binary digits, all of which are set to one. And a is simply the first p bits of s (if s is expressed in 2p bits). So a can be found by shifting the bits of s to the right p times, which in python is written s>>p.

If we had wanted s modulo 2p then the answer would be simply b. But we want s modulo (2p-1). But we can use the result that

 (x*y) mod z = (x*(y mod z)) mod z

So we have

  s mod Mp = (a*2p + b) mod  Mp
            = (a*(2p mod Mp) + b) mod Mp
            = (a*1 + b) mod Mp
            = (a + b) mod Mp

At first glance this is no improvement, for we still have the mod Mp at the end. But a and b must both be less than 2p, or less than or equal to Mp. So their sum is less than or equal to 2Mp. And they cannot both be equal to Mp (for this would imply that s=22p-1 which cannot be achieved in a single step from s<Mp). So (a+b)<2Mp.

If we set s=(a+b) and try another step of this formula for finding the modulus, we are now in the position that a is either zero or one, and, if a is one, b is less than Mp. So the second step is guaranteed to give an (a+b) which is at most Mp. Finding the modulus with respect to Mp of a number which is no more than Mp is easy: if the number is Mp the answer is zero, else it is the original number.

One could improve this code further, but for now I conclude with the thought that setting it to test the single number with p=44497 produces the result that M44497 is indeed prime after just over two minutes. This prime was found in 1979 using a Cray I. In that era it was common for newly-launched supercomputers to prove their worth by finding a new prime. To be fair, verifying that M44497 is prime is much easier than searching for it and discarding thousands(?) of non-prime candidates. Searching through all Mersenne numbers up to 11213 inclusive took this new code just under 22 minutes.

The algorithms used in practice to find very large Mersenne primes use the Strassen algorithm for the multiplication, and are able to combine the multiplication and modulus operation into a single step which is faster than the multiplication without the modulus would be. Some rather dusty Fortran77 code I have which uses this approach managed to do the above search up to M11213 in under six minutes. More impressively, it verified that M44497 is prime in about twenty seconds. (Still all on a Pi 4.)