A Better Mandelbrot Set

How well optimised is the Mandelbrot Set code? What is meant by "well optimised"?

Too often one thinks solely of speed when considering optimisation. But perhaps there are other issues too which should be considered.

For the above code, the human interface is poor (one needs to edit the source and re-run to change any parameters), the accuracy is possibly not an issue, the algorithmic efficiency is rather basic, and so is the efficiency of its implementation.

Is the code a complete mess? Not quite: it could be claimed that it was optimised for brevity and readability. And the human interface is not a complete disaster, for the mouse cursor is tracked with co-ordinates, and there is an option to save the image. Some Mandelbrot set generators simply produce an image file and expect one to find another program to view this image, and this code is a step above that, particularly as it updates the on-screen image whilst calculating.

But it could be better, and, as a small step in the direction of "better" the following is offered.

mandel_improved.py (or, for MacOS users who have failed to install a numpy package, mandel_improved_macos.py).

(Any error on running about colorama being insufficiently recent can be ignored, and arises from an inconsistency between the packages in Raspbian.)

What is better? There are three areas of improvement, and one of regression.

Mandelbrot Set

How much faster is this new code, and why? We can separate the time spent computing the set, and the time spent updating the display, by comparing times with the normal code, and with the if statement which controls the screen updates changed from

    if (i%10==0):

to

    if (i%10==11):

This will never be true, so no update to the screen will occur until the whole set is calculated. The original code runs in about 38s on a Pi 4, or 31s with the updates removed, suggesting that the calculation takes around 31s, and the graphics around 7s. Having the helpful updates in the output adds almost a quarter to the run-time, which is not ideal, but just about acceptable.

The improved code runs in 5.6s with the screen updates, or 0.35s without them. So the speed of the graphics has been improved marginally from around 7s to around 5.3s, but the calculation time has been reduced from 31s to 0.35s. How?
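This sort of split can be measured directly with `time.perf_counter()`. A minimal sketch of the approach (the function names here are illustrative stand-ins, not the program's own):

```python
import time

def calculate_set():
    # Stand-in for the Mandelbrot calculation; here just a busy loop.
    total = 0
    for i in range(100_000):
        total += i*i
    return total

def update_display():
    # Stand-in for the screen update.
    pass

t0 = time.perf_counter()
result = calculate_set()
t1 = time.perf_counter()
update_display()
t2 = time.perf_counter()

print(f"calculation: {t1-t0:.3f}s, graphics: {t2-t1:.3f}s")
```

Timing each phase separately like this avoids having to run the whole program twice with the update test edited.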

Python is an interpreted language. C, C++ and Fortran are compiled languages. Compiled languages are converted, before they are run, into the native language of the CPU which will execute them, and the CPU then runs this native code. The same piece of C++ might be translated into 32 bit ARM code for running on a Raspberry Pi, or 64 bit x86_64 code for running on a PC. Either way, the CPU is presented with its native language, and runs fast.

Python is not like that. Although it is translated from the human-readable source to a more computer-friendly "byte code", the same byte code is executed on Pi, PC, or any other computer. The CPU has to run an interpreter to execute these foreign (to it) instructions in a Python virtual machine. If the individual instructions involve a lot of processing, this extra translation step is not too costly. As an extreme example, on the Linpack page there is a single line of Python,

    X=linalg.solve(A,B)

which takes seconds to execute just once. The extra overhead of presenting that as byte code rather than in native instructions will be tiny, and the code used inside linalg.solve will be native. The core of the Mandelbrot algorithm is the opposite.

    while (i<maxiter)and(z.real**2+z.imag**2<4):
        z=z*z+z0
        i=i+1

Each iteration of that loop will require much more effort to interpret than the computational effort required to perform the operations represented. In a compiled language, the compiler translates to native code before the program is first run, and the compiler will compile that loop once, and then it will execute hundreds of thousands of times as native code. In an interpreted language, each iteration sees the overhead of interpretation. The amount of computation in those three lines is quite small: six floating point multiplies, four floating point adds, one integer add, one floating point comparison, and one integer comparison.

Python has a trick to address this. The module numba is able to compile some functions, and it will attempt to compile those "decorated" by adding @jit on the preceding line. However, numba is not in python.org's distribution, so Mac users have to face a more complicated installation process, not described here, before they can use it. Those using a conda environment on MacOS may well have this package.

This leaves a code where the overhead of the graphics is not around 22%, but around 1500%. There is no point trying to optimise the calculation part further unless the graphics can be speeded up considerably, and one might well wish to replace (i%10==0) with (i%20==0).

(The compilation performed by numba occurs the first time a function is called. It seems that if one runs this program, and then presses recalculate to recalculate the same image once it has finished, the recalculated image appears about 0.7s faster. Remove the @jit line, and this time difference between the first and subsequent calculations disappears. So one might conclude that numba took around 0.7s to process the one function it compiles.)

Yet Faster Graphics

I failed to find a matplotlib solution which was significantly faster than this, and which updated the screen whilst it calculated. So eventually I abandoned matplotlib, and rewrote the code to use GTK3 directly for its graphics. The aim of making it faster was certainly achieved: the 5.6s benchmark above dropped in execution time to about 0.65s.

Then I realised that using Tk would result in better portability than GTK3, so I rewrote the code to use Tk. The result was marginally faster, and it is this version which I have continued to develop and which I used to produce the gallery of Mandelbrot images.
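One way of pushing a computed image to Tk without an intermediate file is via tkinter.PhotoImage, which accepts PPM data directly through its data= argument. A minimal sketch of the data preparation (the helper `counts_to_ppm` is illustrative, not taken from the program, and the window-handling code is omitted):

```python
def counts_to_ppm(counts, width, height):
    # Map iteration counts (0..255) to a greyscale binary PPM (P6) image,
    # a format tkinter.PhotoImage can accept via its data= argument.
    header = f"P6 {width} {height} 255\n".encode("ascii")
    pixels = bytearray()
    for c in counts:
        g = c & 0xFF
        pixels += bytes((g, g, g))   # R, G, B equal: greyscale
    return header + bytes(pixels)

# Illustrative 2x2 image; in the real program the counts come from the
# Mandelbrot calculation, and the result would be displayed with e.g.
#   img = tkinter.PhotoImage(data=ppm)
#   canvas.create_image(0, 0, image=img, anchor="nw")
ppm = counts_to_ppm([0, 64, 128, 255], 2, 2)
print(len(ppm))
```

Rebuilding and re-posting the image in this way on each partial update is what allows the display to follow the calculation as it proceeds.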