Yet Faster Fixed Point Mandelbrot Set

Interpreted languages such as python are slow. One thing which makes python particularly slow is its dynamic typing. Every time it meets a line such as x=x+y it has to determine what type of variables x and y are. Is the addition required integer addition, floating point addition, or string concatenation? Or is it an error condition because we are trying to add apples and pears, or some other incompatible types? Perhaps, given our example code, it is an GMP MPZ addition. Python's typing is so dynamic that the types may have changed from the last time that the statement was executed. The result is the the interpreter may take longer to determine what sort of addition is required than it takes to perform the addition!

A traditional compiled language such as C does not have this issue. So we will try rewritting the core loop of our code in C, and then attempt to use it from python. We cannot use numba in this case, for numba does not understand python's inbuilt arbitrary precision integers, and insists that they get converted to fixed-length C integers, and numba also does not understand the MPZ objects fromthe gmpy2 module.

Using GMP with C is not very pleasant. C does not permit the usual operators to be overloaded, so one is left calling functions to perform every operation, and the result looks rather too much like assembler.

#include <gmp.h>
  
int mandel (mpz_t x0, mpz_t y0, long bshift, int maxiter){
  mpz_t x,y,x2,y2,mod_test,tmp;
  int i;

  mpz_inits(x,y,x2,y2,mod_test,tmp,NULL);

  mpz_set(x0,MPZ(x0_py));
  mpz_set(y0,MPZ(y0_py));

  /* mod_test = 4<<(2*bshift) */
  mpz_set_ui(mod_test,(unsigned long)4);
  mpz_mul_2exp(mod_test,mod_test,2*bshift);
  
  mpz_set(x,x0);
  mpz_set(y,y0);
  mpz_mul(x2,x,x);
  mpz_mul(y2,y,y);

  for(i=0;i<maxiter;i++){
    /* tmp=x2+y2 */
    mpz_add(tmp,x2,y2);
    if (mpz_cmp(tmp,mod_test)>=0) break;
    /* y=((x*y)>>(bshift-1))+y0 */
    mpz_mul(y,x,y);
    mpz_tdiv_q_2exp(y,y,bshift-1);
    mpz_add(y,y,y0);
    /* x=((x2-y2)>>bshift)+x0 */
    mpz_sub(x,x2,y2);
    mpz_tdiv_q_2exp(x,x,bshift);
    mpz_add(x,x,x0);
    /* x2=x*x */
    mpz_mul(x2,x,x);
    /* y2=y*y */
    mpz_mul(y2,y,y);
  }
  
  mpz_clears(x,y,x2,y2,mod_test,tmp,NULL);
  return i;
}

But trying to interface this to python creates another layer of unpleasantness. Firstly python will pass its arguments as a tuple which needs unpacking, and will expect the return value to be a python object.

#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <gmp.h>
  
static PyObject *mandel(PyObject *self, PyObject *args){
  int maxiter;
  long bshift;
  PyObject *x0_py,*y0_py;
  mpz_t x,y,x0,y0,x2,y2,mod_test,tmp;
  int i;
  
  if (!PyArg_ParseTuple(args,"OOli",&x0_py,&y0_py,&bshift,&maxiter)){
    fprintf(stderr,"Argument error\n");
    exit(1);
  }

  mpz_inits(x,y,x0,y0,x2,y2,mod_test,tmp,NULL);
  mpz_set(x0,MPZ(x0_py));
  mpz_set(y0,MPZ(y0_py));

[...]

  return PyLong_FromLong((long)i);
}

And then there is the matter of adding an initialisation function too.

Finally one needs to compile it. The recommended route is via distutils.

A recipe

(This recipe confines iteself to a single directory, and does not clutter your collection of installed python modules. It is thus useful whilst developing. It needs the GMP development package to be installed, and some versions of Debian seem not to realise that that depends on two more Gnu arbitrary precision pacakges. One will probably need

  sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev

as well as a standard C compilation environment.)

Place in the same directory

Then build the module.

  $ python3 setup.py build_ext --inplace

which will place the module (shared library) in the current directory. It will have a very long name, something like
gmp_mandel.cpython-39-aarch64-linux-gnu.so
Then one should be able to run it.

  $ python3 mandel_tk_gmpc_par.py

Was it worth it? Continuing the timings presented earlier, the default set took 5.2s to display, and with the bit shift set to 160 it took just under 8s. It is still a lot slower than the standard double precision version using numba, which takes under 0.5s, but it is about 3.5x as fast as our previous best attempt at writing our own fixed point code, and it has shown how to write a C function which is callable directly from python without using cython.

Improvements?

One might wonder if trying to convert to C the whole of the loop generating a column would produce a further improvement. But what we have done has converted to C the majority of the code in terms of where the execution time is spent, and, once the number of iterations per pixel starts to rise, as it will at high zoom levels, the amount of work outside of the inner loop for a pixel becomes negligible.

But as an exercise to learn a few more techniques, perhaps moving the whole of the mandel_column function to C is worthwhile. To make it more interesting, we could pass the C function standard python integers, and have the C part convert them to the GMP MPZ type, leaving the python completely free of GMP and hence of the gmpy2 module too.

Done. It is about 0.7s faster on the two tests above, and the time saved is expected to depend on the number of pixels, but not the value of maxiter or the bit shift. The function PyLong2GMP shows how python stores integers. A count of "digits", with negative counts indicating a negative number, and the number of "digits" being the absolute value of the count. Each "digit" contains the next PyLong_SHIFT bits of the number stored. On 64-bit platforms a "digit" is a 32-bit unsigned integer, and PyLong_SHIFT is 30.

The code also shows how to return a newly-created Numpy array from C to python.