Abstract: There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer resilience to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput…
“…1(a). Beyond the single LSB operator indicated in (2) and illustrated in Fig. 1(a), we can also assume a series of such operators applied consecutively in order to realize higher-level algorithmic processing, e.g., multiple consecutive additions, subtractions and scaling operations with pre-established kernels, followed by circular convolutions and permutation operations.…”
Section: ABFT/MR Methods Versus Numerical Entanglement
confidence: 99%
“…System-induced faults in DSP routines manifest as [1], [2], [18]: (i) transient faults, where execution continues uninterrupted on all input data streams (albeit with corrupted data, possibly carrying out erroneous logic or arithmetic operations), or (ii) fail-stop failures, where the execution on one of the processor cores halts due to a fail-stop exception (e.g., overflow detection, memory leak assertion, etc.) or a system crash.…”
Abstract: A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on M integer data streams (M ≥ 3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the M input integer data streams are linearly superimposed to form M numerically-entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The results are extracted from the M entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the M streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via several types of operations: fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between a 0.03% and 7% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.
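The core idea of the abstract (superimpose streams, compute directly on the entangled data, then extract and cross-check) can be sketched in a few lines. The snippet below is a deliberately simplified toy packing scheme for M = 3 unsigned streams, not the paper's exact construction (which also handles signed data and overflow detection):

```python
# Toy illustration of numerical entanglement for M = 3 unsigned integer
# streams. Simplified for clarity; NOT the paper's exact scheme.
# Inputs and all intermediate results must fit in K bits.

K = 8                      # bits reserved per stream value
MASK = (1 << K) - 1

def entangle(a):
    """Superimpose each stream with its cyclic successor:
    e[m] = a[m] + 2^K * a[(m+1) % 3]."""
    return [a[m] | (a[(m + 1) % 3] << K) for m in range(3)]

def extract(e):
    """Every value exists in two entangled streams: the low half of
    e[m] and the high half of e[(m-1) % 3]. A mismatch between the
    two copies flags a soft error."""
    out = []
    for m in range(3):
        low = e[m] & MASK
        high = e[(m - 1) % 3] >> K
        if low != high:
            raise ValueError(f"soft error detected in stream {m}")
        out.append(low)
    return out

# One LSB operation (scaling by 2) applied directly in entangled form:
streams = [3, 5, 7]
e_scaled = [2 * x for x in entangle(streams)]
print(extract(e_scaled))        # [6, 10, 14]

# Fail-stop recovery: if the core holding e_scaled[1] halts, all three
# outputs are still recoverable from the two surviving streams.
survivors = [e_scaled[0] & MASK, e_scaled[0] >> K, e_scaled[2] & MASK]
print(survivors)                # [6, 10, 14]
```

Because each value is carried redundantly by two of the three entangled streams, any single corrupted or lost stream leaves enough information to detect the error or reconstruct all outputs, while the entangle/extract cost stays linear in the number of inputs.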
Abstract: Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e., changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for a 280%–440% increase of processing throughput and a 75%–80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact on the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications.
“…Error-tolerant multimedia processing [1] comprises any system that processes large volumes of input data (image pixels, sensor measurements, database entries, etc.) with performance-critical digital signal processing (DSP) or linear algebra kernels (filtering, decomposition, factorization, feature extraction, principal components, probability mixtures, Monte-Carlo methods, etc.)…”
Section: Introduction
confidence: 99%
“…When aiming for high-throughput/low-energy performance, the critical issues of the execution environment of Figure 1 are [1], [39], [40]: (i) the data movement to/from cores; (ii) the processing time and energy consumption per core; (iii) the limited concurrency when the top-level processing allows for only a few blocks. These issues are addressed in our proposal by viewing the process between L2 and L3 as a computation channel [38] that returns approximate results.…”
Generic matrix multiplication (GEMM) and one-dimensional discrete convolution/cross-correlation (CONV) kernels perform the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by decreasing the number of projections computed by each kernel, which in turn produces approximate results, i.e., lowers the precision of the performed computation. Existing realizations of error-tolerant multimedia applications can opt to utilize a small number of the input projections (typically just one) in order to save energy and processing cycles, while all error-intolerant systems can compute all input projections and obtain full-precision outputs. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition demonstrate that the proposed approach allows for a 5-fold to 10-fold increase of processing throughput and more than an 80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact on the expected recognition and matching precision.
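The precision-scaling mechanism described above (fewer projections → faster, coarser GEMM; all projections → exact result) can be illustrated with a small NumPy analogue. The paper's specific projection construction is not reproduced here; as a hypothetical stand-in, an orthonormal SVD basis of the second operand is used so that accumulating more projections monotonically refines the product:

```python
# Illustrative analogue of projection-based precision scaling for GEMM.
# Each projection contributes one rank-1 term to C = A @ B; computing
# fewer terms trades precision for throughput/energy, and computing all
# of them recovers the exact full-precision product.
import numpy as np

def projected_gemm(A, B, num_proj):
    """Approximate A @ B by accumulating `num_proj` projection terms.
    With num_proj equal to the inner dimension, the result is exact."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    C = np.zeros((A.shape[0], B.shape[1]))
    for k in range(num_proj):
        # One projected-input kernel pass per term, accumulated into C.
        C += (A @ U[:, k:k + 1]) * s[k] @ Vt[k:k + 1, :]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 5))

# Approximation error shrinks as more projections are accumulated;
# with all 4 projections the product is exact to machine precision.
errs = [np.linalg.norm(projected_gemm(A, B, k) - A @ B) for k in (1, 2, 4)]
```

An error-tolerant application would stop after one or two terms; an error-intolerant one would run the loop to completion, mirroring the trade-off the abstract describes.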