# Faster Mersenne Primes

How can the code given on the Mersenne page be made faster?

First, a few words about high-precision arithmetic. We know that adding two n-digit numbers by hand takes n add-with-carry operations, each on a pair of digits, one from each number. Multiplication, however, is harder. Long multiplication requires n multiplications of the whole of the first number by a digit of the second, followed by adding the n resulting partial products to form a 2n-digit result. In all there are about 3n^{2} digit operations.

I believe that Python uses this "n^{2}" method for "small" large
numbers, and then switches
to Karatsuba's
algorithm, which was discovered in the 1960s and is order
n^{1.59} rather than n^{2}. I also believe that for
a "digit", rather than use something which can represent 0 to 9 as a
human would, Python uses something that can represent 0 to
2^{15}-1 on (most) 32-bit systems, and something that can
represent 0 to 2^{30}-1 on (most) 64-bit systems.
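To illustrate the idea (this is only a sketch of mine; CPython's real implementation lives in C inside the interpreter), Karatsuba's trick replaces the four half-size multiplications of the schoolbook method with three:

```python
def karatsuba(x, y):
    # Sketch of Karatsuba multiplication for non-negative integers.
    # Small operands fall back to the builtin multiply.
    if x < 2**15 or y < 2**15:
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << n) - 1
    xh, xl = x >> n, x & mask    # split x into high and low n-bit halves
    yh, yl = y >> n, y & mask
    a = karatsuba(xh, yh)        # high * high
    c = karatsuba(xl, yl)        # low * low
    # Karatsuba's trick: both cross terms from one extra multiplication.
    b = karatsuba(xh + xl, yh + yl) - a - c
    return (a << (2 * n)) + (b << n) + c
```

Three recursive calls on half-size numbers instead of four is what turns n^{2} into n^{log2(3)} ≈ n^{1.59}.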

From this one may conclude that this is a rare area where 64-bit code is likely to be significantly faster than 32-bit code, as numbers will be represented internally by half as many "digits", which will make addition twice as fast, and multiplication with Karatsuba's algorithm about three times as fast (2^{1.59} is roughly 3). It also suggests that multiplication is much more expensive than addition, and it might be reasonable to expect that the modulus operation is expensive too. However, bitwise operations should be fast, as Python is using a binary internal representation.
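One can check the digit size of a particular build directly, as `sys.int_info` reports it:

```python
import sys

# CPython stores big integers as arrays of "digits" in base
# 2**bits_per_digit: typically 30 on 64-bit builds, 15 on 32-bit ones.
print(sys.int_info.bits_per_digit)
print(sys.int_info.sizeof_digit)  # bytes of storage used per digit
```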

The previous code took about 75s to find all Mersenne primes up to p=3300 on a Pi 4. The following code can do the same in just under 20s.

```python
for i in range(1, p-1):
    s = s*s - 2
    if s < 0: s += Mp
    s = (s & Mp) + (s >> p)
    s = (s & Mp) + (s >> p)
    if s == Mp: s = 0
```

(Just the new loop for calculating the s series is shown above, the rest of the code is the same.)
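For readers without the earlier page to hand, a minimal self-contained version might look like the following (the function name is my own; the loop body is exactly the one above):

```python
def is_mersenne_prime(p):
    # Lucas-Lehmer test: for an odd prime p, M_p = 2^p - 1 is prime
    # iff s_{p-2} == 0 (mod M_p), where s_0 = 4 and s_{i+1} = s_i^2 - 2.
    Mp = (1 << p) - 1
    s = 4
    for i in range(1, p-1):
        s = s*s - 2
        if s < 0: s += Mp
        s = (s & Mp) + (s >> p)   # fold the high bits onto the low p bits
        s = (s & Mp) + (s >> p)   # second fold guarantees s <= Mp
        if s == Mp: s = 0
    return s == 0
```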

The modulus operation has been removed and replaced by four lines of code. How do the new lines work?

The first simply ensures that s is never negative. If s is ever
zero or one, the next term will be negative, and I cannot prove that
this can never happen. Given that we only ever want the value of s
modulo M_{p}, we are free to add M_{p} as often as
we like.

After taking the modulus with respect to M_{p}, s will have
at most p binary digits. So before the modulus operation, it will
have at most 2p binary digits, as s^{2} has at most twice as many
digits as s. So we could choose to write s in the following form:

s = a*2^{p} + b

with both a and b in the range 0 <= a,b < 2^{p}. This is
easy: b is simply the last p bits of s, which we can extract with a
bitwise AND operation as s&M_{p}, given that
M_{p} has p binary digits, all of which are set to one. And
a is simply the first p bits of s (if s is expressed in 2p bits), so
a can be found by shifting the bits of s to the right p places,
which in Python is written s>>p.
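A quick concrete check of this decomposition (the particular values here are arbitrary illustrations of mine):

```python
p = 13
Mp = (1 << p) - 1        # 2^13 - 1 = 8191: thirteen one-bits
s = 60000000             # an arbitrary value of at most 2p = 26 bits
b = s & Mp               # the low p bits of s
a = s >> p               # the remaining high bits of s
assert s == a * 2**p + b # s = a*2^p + b, as claimed
```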

If we had wanted s modulo 2^{p}, the answer would simply be
b. But we want s modulo M_{p}=2^{p}-1, and for that we can use the
result that

(x*y) mod z = (x*(y mod z)) mod z

So we have

s mod M_{p} = (a*2^{p} + b) mod M_{p}
            = (a*(2^{p} mod M_{p}) + b) mod M_{p}
            = (a*1 + b) mod M_{p}
            = (a + b) mod M_{p}

At first glance this is no improvement, for we still have the
mod M_{p} at the end. But a and b are both less
than 2^{p}, which is to say less than or equal to M_{p}. So their
sum is at most 2M_{p}. And they cannot both be
equal to M_{p} (for this would imply that s=2^{2p}-1,
which cannot be reached in a single step from s<M_{p}).
So (a+b)<2M_{p}.

If we set s=(a+b) and try another step of this formula for finding
the modulus, we are now in the position that a is either zero or
one, and, if a is one, b is less than M_{p}. So the second
step is guaranteed to give an (a+b) which is at most
M_{p}. Finding the modulus with respect to M_{p} of
a number which is no more than M_{p} is easy: if the number
is M_{p} the answer is zero, else it is the original
number.
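The whole reduction can be packaged as a small helper (the name mod_mersenne is my own; the loop above simply inlines these same steps):

```python
def mod_mersenne(s, p):
    # Reduce a non-negative s of at most 2p bits modulo Mp = 2^p - 1,
    # using only a mask, a shift and an add in place of division.
    Mp = (1 << p) - 1
    s = (s & Mp) + (s >> p)   # first fold: now s <= 2*Mp
    s = (s & Mp) + (s >> p)   # second fold: now s <= Mp
    return 0 if s == Mp else s
```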

One could improve this code further, but for now I conclude with
the thought that setting it to test the single number with p=44497
produces the result that M_{44497} is indeed prime after
just over two minutes. This prime was found in 1979 using a
Cray-1. In that era it was common for newly-launched
supercomputers to prove their worth by finding a new prime. To be
fair, verifying that M_{44497} is prime is much easier than
searching for it and discarding thousands(?) of non-prime
candidates. Searching through all Mersenne numbers up to p=11213
inclusive took this new code just under 22 minutes.

The algorithms used in practice to find very large Mersenne primes
use
the Schönhage-Strassen
algorithm for the multiplication, and are able to combine the
multiplication and modulus operation into a single step which is
faster than the multiplication without the modulus would be. Some
rather dusty Fortran77 code I have which uses this approach managed
to do the above search up to M_{11213} in under six
minutes. More impressively, it verified that M_{44497} is
prime in about twenty seconds. (Still all on a Pi 4.)