A Faster Fixed Point Mandelbrot Set
The example fixed point Mandelbrot Set is very slow, and inevitably it gets slower as the number of bits used increases. It is instructive as its slowness arises from many different areas, some of which can be improved.
while loop into a
tends to help in python (and to be of no consequence in many other
languages), and with a little care we can eliminate one shift per
loop iteration. Also the multiplication by two can be eliminated by
adjusting the shift which immediately follows it.
When Humans work with long numbers, they can add or multiply single digits in a trice, and break larger numbers down into strings of single digits. The effort involved in adding two n digit numbers simply scales as n, whereas that involved in multiplying two n digit numbers scales as n2 for standard long multiplication.
Computers differ in two regards. Firstly the range of numbers that they can calculate on with trivial constant-time operations is not zero to ten, as it is for most Humans, but zero to 232 (about four million) for 32-bit computers, and zero to 264 for 64-bit computers. Beyond this numbers need breaking down into multiple chunks (digits?), just as Humans would for much smaller numbers.
Secondly whilst the effort involved in addition is just proportional to the number of chunks, multiplication is much more interesting. One can use the familiar n2 long multiplication algorithm, but there are also more efficient algorithms for large values of n. The Gnu Multi Precision project currently lists seven.
Python uses a chunk size of just 230 on 64-bit platforms, and only 215 on 32-bit platforms. (It finds it convenient if squaring a chunk does not cause an overflow.) It then knows two different multiplication algorithms, standard long multiplication and Karatsuba's algorithm, which scales as n1.58. However, if we switch to the GMP library using python's gmpy2 module we can make use of a larger chunk size, which will halve the number of chunks big numbers are broken down into, and better multiplication algorithms.
- mandel_tk_int_par.py: Python Mandelbrot, fixed point, parallel, using Python's integers
- mandel_tk_gmp_par.py: Python Mandelbrot, fixed point, parallel, using GMP's integers
It may be the case that you do not have the python gmpy2 module installed, in which case you will need to install it using one of the below commands, the first being easier on Debian-based Linux systems (including Raspberry Pi OS), and the second for most other systems, or for a Linux system to which you do not have root access:
sudo apt-get install python3-gmpy2 pip3 install --user gmpy2
A quick test suggests that our new code is actually slower. But both codes default to using a bit shift of eight, which is well within the capabilities of both for working with single-chunk numbers. Change the bit shift to 80, at which point the code becomes useful as its precision is then greater than double precision, and, on the machine I tested on, the python version slows from 17s to 26.5s, but the GMP version slows from 24s to just 26.5s. Increasing the bit shift to 160 slowed the python version to 33.5s, and the GMP version was now leading at 29s. Which code is faster? It depends on the bit shift (i.e. the level of zoom). At high zoom levels GMP wins.
But neither is fast, and neither works with numba. So the next approach is to try calling C from python.