University College London now has a computer that is said to never crash. We’ve all experienced it: computers do crash, so what makes this particular one impervious to crash events? To explain that, let’s look at why computers crash in the first place: at the highest (and simplest) level, a computer program is a linear sequence of instructions that is executed in order. If for some reason the execution stops (a divide by zero, a memory access fault…), that particular program will crash. Usually the Operating System (OS) can recover from that and terminate the crashed task. Sometimes, the crash takes out the OS itself too (that’s the BSOD or Blue Screen of Death on Windows, and of course other systems experience that too from time to time).
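For illustration only (this toy snippet is mine, not the researchers’), here is that failure mode in miniature: one faulting step in a linear sequence of instructions takes the whole program down.

```python
# Hypothetical sketch: a conventional program is a linear sequence of
# instructions, and a single faulting step ends the entire run.
def linear_program(values):
    total = 0
    for v in values:
        total += 100 // v  # raises ZeroDivisionError if v == 0
    return total

# The unhandled fault aborts the whole program; the OS then recovers
# by cleaning up the terminated process.
print(linear_program([5, 2, 0, 4]))  # "crashes" at the third step
```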
The researchers who worked on this self-repairing, or “systemic”, computer point out that humans don’t “crash” or “freeze”, and that’s because the nature of the computations happening in our brains is radically different. Instead of a small number of critical tasks, there are millions or billions of tasks whose results aggregate to form a larger result. This way, if something happens (a neuron dies?), the system is not threatened and can recover by re-computing the lost data later. The bottom line is: things do go wrong eventually, so we need a way to guarantee that the system can keep working.
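To make that idea concrete, here is a minimal sketch of the principle (my own toy model, not the UCL design): the final answer aggregates many small, independent tasks, so a lost task is simply recomputed on a later pass instead of bringing everything down.

```python
import random

# Hypothetical sketch of the "many small tasks" idea: losing one task
# (a "neuron" dying) doesn't threaten the system; the lost data is
# simply recomputed later.
def small_task(i):
    if random.random() < 0.1:      # simulate a task randomly dying
        raise RuntimeError(f"task {i} lost")
    return i * i

def systemic_run(n):
    results, pending = {}, list(range(n))
    while pending:
        retry = []
        for i in pending:
            try:
                results[i] = small_task(i)
            except RuntimeError:
                retry.append(i)    # recompute the lost data on a later pass
        pending = retry
    return sum(results.values())   # aggregate of all partial results

print(systemic_run(1000))
```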
Also, a major cause of computer crashes is data corruption, whether in memory or on disk. To prevent that, the systemic computer has built-in redundancy (like a bee swarm, or the brain’s network of neurons), and data is always copied in multiple places. The next step for the research team is to make the machine able to “rewire its own code” to react to changing conditions. This is rather ambitious, but definitely interesting work.
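As a rough picture of what “data is always copied in multiple places” buys you (again, a hypothetical sketch rather than the actual machine’s mechanism): a read survives as long as at least one copy does.

```python
import random

# Hypothetical sketch of built-in redundancy: every value is written to
# several independent stores, so corruption of one copy is survivable.
REPLICAS = 3

def replicated_write(stores, key, value):
    for store in stores:
        store[key] = value         # data is always copied in multiple places

def replicated_read(stores, key):
    for store in stores:
        if key in store:           # any surviving copy is enough
            return store[key]
    raise KeyError(key)

stores = [{} for _ in range(REPLICAS)]
replicated_write(stores, "temperature", 21.5)
random.choice(stores).clear()      # simulate corruption of one replica
print(replicated_read(stores, "temperature"))  # still recoverable: 21.5
```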
Now, the question is: what can this computer do, and what is the programming model? Massive parallelism is great for scaling both performance and redundancy, and with a statistical approach to certain computations, the “system” does become much more resilient to failures. However, massively parallel computing is often poorly suited to tasks that are not parallel in nature.
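A Monte Carlo estimate is the classic example of the statistical approach mentioned above (my own illustration, not a program from the project): the answer degrades gracefully when part of the work is lost, rather than failing outright.

```python
import random

# Hypothetical sketch of a statistically resilient computation: a Monte
# Carlo estimate of pi still produces a usable answer even when a
# fraction of the parallel work is lost.
def monte_carlo_pi(samples, loss_rate=0.0):
    hits = kept = 0
    for _ in range(samples):
        if random.random() < loss_rate:  # simulate a failed worker/sample
            continue
        kept += 1
        x, y = random.random(), random.random()
        hits += (x * x + y * y) <= 1.0
    return 4.0 * hits / kept

print(monte_carlo_pi(1_000_000))                 # all work completed
print(monte_carlo_pi(1_000_000, loss_rate=0.2))  # 20% lost, result barely moves
```

An inherently sequential task, by contrast, offers no such slack: each step depends on the previous one, so there is nothing to average over and nothing to recompute in parallel.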