The idea of computational error correction has been around for over half a century. The motivation has largely been to mitigate unreliable devices, manufacturing defects, or harsh environments. Historically this was a mandatory measure to preserve reliability. More recently, it has also served as a means to lower energy consumption by allowing soft errors to occasionally creep in. While residue codes have shown great promise for this purpose, several orthogonal, non-residue-based techniques have also been proposed. In this article, we provide a high-level outline of some of these non-residue approaches.
Overview
We first classify the various approaches to computational error correction into two broad categories:
1. Temporal Redundancy. This approach rests on the hypothesis that a transient error at a given location is very unlikely to recur over time (that is, to have temporal multiplicity). In other words, soft errors rarely strike the same device repeatedly. Repeating a computation in some manner and comparing the outcomes therefore serves as an indicator of the correct result.
2. Spatial Redundancy. This approach rests on the hypothesis that multiple identical computations are very unlikely to all be in error at the same time. In other words, when a computation is replicated, an error in a small fraction of the replicas can be masked, or outvoted, by the remaining correct replicas.
These principles, it turns out, are fundamental to any sort of error correction, including computation (e.g., arithmetic), storage (e.g., memory), and transmission (e.g., networking). Some proposals favor spatial redundancy over temporal redundancy, some favor the reverse, and some employ both, depending on the target fault model and environment. Given a technique, it is relatively straightforward to determine whether it uses temporal redundancy, spatial redundancy, or both, so we leave this to the interested reader.
Von Neumann [1] was among the first to propose using redundant components to overcome the effects of defective devices. He introduced the now widely used technique of Triple Modular Redundancy (TMR), which uses three devices instead of one and relies on a majority voter to infer the correct output. Note that such a mechanism can correct a single error (meaning that at least two of the three devices are not in error), or detect most double errors (where at least