Yet Faster Fixed Point Mandelbrot Set
Interpreted languages such as python are slow. One thing which
makes python particularly slow is its dynamic typing. Every time it
meets a line such as x=x+y
it has to determine what
type of variables x
and y
are. Is the
addition required integer addition, floating point addition, or
string concatenation? Or is it an error condition because we are
trying to add apples and pears, or some other incompatible types?
Perhaps, given our example code, it is an GMP MPZ addition. Python's
typing is so dynamic that the types may have changed from the last
time that the statement was executed. The result is the the
interpreter may take longer to determine what sort of addition is
required than it takes to perform the addition!
A traditional compiled language such as C does not have this issue. So we will try rewritting the core loop of our code in C, and then attempt to use it from python. We cannot use numba in this case, for numba does not understand python's inbuilt arbitrary precision integers, and insists that they get converted to fixed-length C integers, and numba also does not understand the MPZ objects fromthe gmpy2 module.
Using GMP with C is not very pleasant. C does not permit the usual operators to be overloaded, so one is left calling functions to perform every operation, and the result looks rather too much like assembler.
#include <gmp.h> int mandel (mpz_t x0, mpz_t y0, long bshift, int maxiter){ mpz_t x,y,x2,y2,mod_test,tmp; int i; mpz_inits(x,y,x2,y2,mod_test,tmp,NULL); mpz_set(x0,MPZ(x0_py)); mpz_set(y0,MPZ(y0_py)); /* mod_test = 4<<(2*bshift) */ mpz_set_ui(mod_test,(unsigned long)4); mpz_mul_2exp(mod_test,mod_test,2*bshift); mpz_set(x,x0); mpz_set(y,y0); mpz_mul(x2,x,x); mpz_mul(y2,y,y); for(i=0;i<maxiter;i++){ /* tmp=x2+y2 */ mpz_add(tmp,x2,y2); if (mpz_cmp(tmp,mod_test)>=0) break; /* y=((x*y)>>(bshift-1))+y0 */ mpz_mul(y,x,y); mpz_tdiv_q_2exp(y,y,bshift-1); mpz_add(y,y,y0); /* x=((x2-y2)>>bshift)+x0 */ mpz_sub(x,x2,y2); mpz_tdiv_q_2exp(x,x,bshift); mpz_add(x,x,x0); /* x2=x*x */ mpz_mul(x2,x,x); /* y2=y*y */ mpz_mul(y2,y,y); } mpz_clears(x,y,x2,y2,mod_test,tmp,NULL); return i; }
But trying to interface this to python creates another layer of unpleasantness. Firstly python will pass its arguments as a tuple which needs unpacking, and will expect the return value to be a python object.
#define PY_SSIZE_T_CLEAN #include <Python.h> #include <gmp.h> static PyObject *mandel(PyObject *self, PyObject *args){ int maxiter; long bshift; PyObject *x0_py,*y0_py; mpz_t x,y,x0,y0,x2,y2,mod_test,tmp; int i; if (!PyArg_ParseTuple(args,"OOli",&x0_py,&y0_py,&bshift,&maxiter)){ fprintf(stderr,"Argument error\n"); exit(1); } mpz_inits(x,y,x0,y0,x2,y2,mod_test,tmp,NULL); mpz_set(x0,MPZ(x0_py)); mpz_set(y0,MPZ(y0_py)); [...] return PyLong_FromLong((long)i); }
And then there is the matter of adding an initialisation function too.
Finally one needs to compile it. The recommended route is via distutils.
A recipe
(This recipe confines iteself to a single directory, and does not clutter your collection of installed python modules. It is thus useful whilst developing. It needs the GMP development package to be installed, and some versions of Debian seem not to realise that that depends on two more Gnu arbitrary precision pacakges. One will probably need
sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev
as well as a standard C compilation environment.)
Place in the same directory
Then build the module.
$ python3 setup.py build_ext --inplace
which will place the module (shared library) in the current
directory. It will have a very long name, something like
gmp_mandel.cpython-39-aarch64-linux-gnu.so
Then one
should be able to run it.
$ python3 mandel_tk_gmpc_par.py
Was it worth it? Continuing the timings presented earlier, the default set took 5.2s to display, and with the bit shift set to 160 it took just under 8s. It is still a lot slower than the standard double precision version using numba, which takes under 0.5s, but it is about 3.5x as fast as our previous best attempt at writing our own fixed point code, and it has shown how to write a C function which is callable directly from python without using cython.
Improvements?
One might wonder if trying to convert to C the whole of the loop generating a column would produce a further improvement. But what we have done has converted to C the majority of the code in terms of where the execution time is spent, and, once the number of iterations per pixel starts to rise, as it will at high zoom levels, the amount of work outside of the inner loop for a pixel becomes negligible.
But as an exercise to learn a few more techniques, perhaps moving
the whole of the mandel_column
function to C is
worthwhile. To make it more interesting, we could pass the C
function standard python integers, and have the C part convert them
to the GMP MPZ type, leaving the python completely free of GMP and
hence of the gmpy2 module too.
Done. It is about 0.7s faster on the two tests above, and the time
saved is expected to depend on the number of pixels, but not the
value of maxiter or the bit shift. The
function PyLong2GMP
shows how python stores integers. A
count of "digits", with negative counts indicating a negative
number, and the number of "digits" being the absolute value of the
count. Each "digit" contains the next PyLong_SHIFT
bits of the number stored. On 64-bit platforms a "digit" is a 32-bit
unsigned integer, and PyLong_SHIFT
is 30.
The code also shows how to return a newly-created Numpy array from C to python.