32- or 64-bit

Today when one refers to a computer as being 32-bit or 64-bit, one is referring to the size of the pointer used to address memory. If memory addresses are constrained to 32 bits, then there are only 2^32 different ways of setting 32 bits, and one can address at most 4GB of memory. As 4GB of memory can be purchased for under £20, this is not very much. If memory addresses can have 64 bits, then one can address about four thousand million (2^32) times as much memory.
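The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are invented for the example, and the last part reports the pointer size of whichever Python interpreter runs it, which need not match the CPU (a 32 bit interpreter can run on a 64 bit CPU).

```python
import struct

# Number of distinct addresses available with 32 bit and 64 bit pointers
# (memory is addressed one byte at a time).
addresses_32 = 2 ** 32
addresses_64 = 2 ** 64

GIB = 2 ** 30  # bytes in one gigabyte, as memory sizes are usually counted

print(addresses_32 // GIB, "GB addressable with 32 bit pointers")
print(addresses_64 // addresses_32, "times as much with 64 bit pointers")

# Size of a pointer in the interpreter running this script:
# 4 bytes on a 32 bit build, 8 bytes on a 64 bit build.
pointer_bits = struct.calcsize("P") * 8
print("this interpreter uses", pointer_bits, "bit pointers")
```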

Floating point

Floating point numbers are also referred to as being 64 bit (or, equivalently, double precision), or 32 bit (single precision). However, 32 bit computers, and even pre-32-bit computers, can still use 64 bit datatypes, such as double precision numbers. The original IBM PC (launched 1981) had an optional maths co-processor supporting the same 64 bit double precision standard that is used today. Its integer registers were merely 16 bits in size, and memory addresses were constrained to 20 bits (1MB).
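As a small sketch of the difference between the two formats, the following uses Python's standard struct module to pack a value as single and as double precision. It assumes, as is the case on essentially all current platforms, that Python's own floats are 64 bit doubles; the variable names are chosen for the example.

```python
import struct

# Storage sizes in bytes of the two formats.
single_bytes = struct.calcsize("<f")
double_bytes = struct.calcsize("<d")
print(single_bytes * 8, "bit single,", double_bytes * 8, "bit double")

# Round-trip 0.1 through each format. The double survives unchanged,
# the single loses some of its digits.
x = 0.1
as_single = struct.unpack("<f", struct.pack("<f", x))[0]
as_double = struct.unpack("<d", struct.pack("<d", x))[0]
print("single:", as_single)
print("double:", as_double)

# The 20 bit addresses of the original IBM PC give 2^20 bytes, i.e. 1MB.
pc_address_space = 2 ** 20
print(pc_address_space // 2 ** 20, "MB addressable with 20 bit addresses")
```

None of this depends on the pointer size of the machine: the same 8 byte double is used whether the interpreter is a 32 bit or a 64 bit build.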

Integer

Most current CPUs, including ARM and Intel, use "general purpose" integer registers which can hold either data or addresses (pointers). Thus on 64 bit computers these will be 64 bits in size, and on 32 bit ones 32 bits.

Performance

A 64 bit pointer will take twice as much memory as a 32 bit one, and may (though rarely) take twice as long to access. However, code is usually processing data, not pointers, so this is not generally significant.

Using 64 bit integers on hardware with 64 bit integer registers is much faster than emulating 64 bit integers on hardware with 32 bit registers. But most code, even on 64 bit computers, is written to use 32 bit integers for most purposes. So this performance benefit is rarely seen.
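To see why emulation is slower, here is a rough sketch of how a single 64 bit addition has to be built from 32 bit pieces: add the low halves, note the carry, then add the high halves plus the carry. Python integers are not limited to 32 bits, so the masking below merely imitates 32 bit registers; the function names are hypothetical and this is not how any particular compiler does it.

```python
MASK32 = 0xFFFFFFFF  # keeps only the low 32 bits, imitating a 32 bit register


def split(x):
    """Split a 64 bit value into (low, high) 32 bit halves."""
    return x & MASK32, (x >> 32) & MASK32


def join(lo, hi):
    """Reassemble a 64 bit value from its 32 bit halves."""
    return (hi << 32) | lo


def add64_with_32bit_registers(a_lo, a_hi, b_lo, b_hi):
    """Add two 64 bit values supplied as 32 bit halves, wrapping on overflow."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32               # 1 if the low halves overflowed, else 0
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32
    return lo, hi


a = 0x00000001FFFFFFFF
b = 0x0000000000000001
lo, hi = add64_with_32bit_registers(*split(a), *split(b))
result = join(lo, hi)
print(hex(result))
```

Hardware with 64 bit registers does the same addition in one instruction, whereas the emulated version needs at least two additions and must track the carry between them. Multiplication and division are considerably worse.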

And Another Thing

Changing a CPU from 32 bit to 64 bit is quite a major design change. So other changes not required by the increase from 32 bits to 64 bits might be made at the same time. This is the case for both the AMD/Intel line of CPUs, and for the ARM line found in the Raspberry Pis.

When AMD extended Intel's 32 bit instruction set to 64 bits, it needed to increase the size of the eight 32 bit integer registers to 64 bit. However, it also chose to increase their number to sixteen.

The eight SSE vector registers did not need any changes, as they are never used for storing addresses, and were 128 bits long anyway. However, AMD also doubled their number to sixteen.

In the case of ARM the story is similar. The transition to 64 bits in the ARMv8-A instruction set also saw a near doubling of the integer registers from sixteen to thirty-one, and a doubling of the 128 bit vector registers from sixteen to thirty-two. Additionally, the vector support for floating point was limited to single precision in the 32 bit instruction set. A 128 bit vector register could hold one, two or four single precision floating point numbers (or many combinations of different length integers). With the move to ARMv8, it became possible for the vector registers to hold one or two double precision floating point values. However, early ARMv8 cores, including the Cortex-A72 of the Pi 4, take longer to process a full vector register than a half-full one, so this does not represent as much of a performance benefit as might be imagined.
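The lane counts quoted above follow directly from the register width, as this short sketch shows. The constant and variable names are invented for the example.

```python
VECTOR_BITS = 128   # width of one vector register
SINGLE_BITS = 32    # single precision floating point
DOUBLE_BITS = 64    # double precision floating point

# Maximum number of values of each type that fit in one vector register.
singles_per_register = VECTOR_BITS // SINGLE_BITS
doubles_per_register = VECTOR_BITS // DOUBLE_BITS

print("up to", singles_per_register, "single precision values per register")
print("up to", doubles_per_register, "double precision values per register")
```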

Doubling the number of registers is likely to have a beneficial impact on performance. (If it were not of benefit, it would not have been done!) In many cases this benefit will be greater than the other performance impacts of the move to 64 bits.

The OS

Most 64 bit CPUs retain all the earlier 32 bit instructions from older members of their respective series. So they can still run 32 bit operating systems. However, doing so generally turns off all the changes associated with the move to 64 bits.

The OS is responsible for allocating memory, and will be unable to permit a program to use 64 bit addresses if it does not itself understand them. It is also responsible for saving a program's state, including all of its registers, whenever multitasking requires that a program be suspended. Again, it cannot do this correctly if additional registers of which it is unaware exist and are in use.
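A rough count shows how much more state there is to save. The sketch below uses only the AMD/Intel integer and SSE register counts given earlier, and deliberately ignores everything else a real OS must save (flags, the instruction pointer, the older floating point registers and so on), so the figures are illustrative rather than exact. The function name is hypothetical.

```python
def register_state_bytes(int_regs, int_bits, vec_regs, vec_bits):
    """Bytes needed to hold the given integer and vector registers."""
    return (int_regs * int_bits + vec_regs * vec_bits) // 8


# Eight 32 bit integer registers and eight 128 bit SSE registers.
state_32 = register_state_bytes(8, 32, 8, 128)

# Sixteen 64 bit integer registers and sixteen 128 bit SSE registers.
state_64 = register_state_bytes(16, 64, 16, 128)

print(state_32, "bytes of register state in 32 bit mode")
print(state_64, "bytes of register state in 64 bit mode")
```

An OS which saves only the smaller set on a context switch would silently corrupt any program using the extra registers or the upper halves of the wider ones.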

(Many 32 bit CPUs, including both ARM and Intel, offer some form of "Physical Address Extension" which allows a 32 bit OS to address more than 4GB of physical memory, but programs running remain confined to 4GB of memory each.)

It is possible for a 64 bit OS to support both 32 and 64 bit programs. However, this does lead to significant memory and disk overheads, as two versions will be required of most libraries, and the kernel will need extra code to interface with 32 bit applications too. For machines with large amounts of memory and disk space, this is a very minor issue. But arguably the Pi is not such a machine.

Raspbian is (currently, April 2020) 32-bit only, and the above paragraph suggests that a gentle transition, rather than an abrupt change, may be less suited to the Pi ecosystem than it was in the world of desktop PCs (where Linux, macOS and Windows have all made the transition).