Abstract: There is a growing realization that the expected fault rates and energy dissipation stemming from increases in CMOS integration will lead to the abandonment of traditional system reliability in favor of approaches that offer resilience to hardware-induced errors across the application, runtime support, architecture, device and integrated-circuit (IC) layers. Commercial stakeholders of multimedia stream processing (MSP) applications, such as information retrieval, stream mining systems, and high-throughput…
“…1(a). Beyond the single LSB operator indicated in (2) and illustrated in Fig. 1(a), we can also assume a series of such operators applied consecutively in order to realize higher-level algorithmic processing, e.g., multiple consecutive additions, subtractions and scaling operations with pre-established kernels, followed by circular convolutions and permutation operations.…”
Section: ABFT/MR Methods Versus Numerical Entanglement
confidence: 99%
“…System-induced faults in DSP routines manifest as [1], [2], [18]: (i) transient faults, where execution continues uninterrupted on all input data streams (albeit with corrupted data, possibly carrying out erroneous logic or arithmetic operations), or (ii) fail-stop failures, where the execution on one of the processor cores halts due to a fail-stop exception (e.g., overflow detection, memory leak assertion, etc.) or a system crash.…”
Abstract: A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on M integer data streams (M ≥ 3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the M input integer data streams are linearly superimposed to form M numerically-entangled integer data streams that are stored in place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The results are extracted from the M entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the M streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal on an Intel processor (Haswell architecture with AVX2 support) via several types of operations: fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between a 0.03% and 7% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating processor hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.
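The core idea of the abstract (superimpose streams, compute directly on the entangled data, then extract and cross-check) can be sketched in a few lines. The snippet below is a deliberately simplified toy packing scheme for M = 3 unsigned streams, not the paper's exact construction (which also handles signed data and overflow detection):

```python
# Toy illustration of numerical entanglement for M = 3 unsigned integer
# streams. Simplified for clarity; NOT the paper's exact scheme.
# Inputs and all intermediate results must fit in K bits.

K = 8                      # bits reserved per stream value
MASK = (1 << K) - 1

def entangle(a):
    """Superimpose each stream with its cyclic successor:
    e[m] = a[m] + 2^K * a[(m+1) % 3]."""
    return [a[m] | (a[(m + 1) % 3] << K) for m in range(3)]

def extract(e):
    """Every value exists in two entangled streams: the low half of
    e[m] and the high half of e[(m-1) % 3]. A mismatch between the
    two copies flags a soft error."""
    out = []
    for m in range(3):
        low = e[m] & MASK
        high = e[(m - 1) % 3] >> K
        if low != high:
            raise ValueError(f"soft error detected in stream {m}")
        out.append(low)
    return out

# One LSB operation (scaling by 2) applied directly in entangled form:
streams = [3, 5, 7]
e_scaled = [2 * x for x in entangle(streams)]
print(extract(e_scaled))        # [6, 10, 14]

# Fail-stop recovery: if the core holding e_scaled[1] halts, all three
# outputs are still recoverable from the two surviving streams.
survivors = [e_scaled[0] & MASK, e_scaled[0] >> K, e_scaled[2] & MASK]
print(survivors)                # [6, 10, 14]
```

Because each value is carried redundantly by two of the three entangled streams, any single corrupted or lost stream leaves enough information to detect the error or reconstruct all outputs, while the entangle/extract cost stays linear in the number of inputs.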
Abstract: Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by changing the number of projections computed by each kernel, which in turn produces approximate results, i.e., changes the precision of the performed computation. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition and music matching algorithms demonstrate that the proposed approach allows for a 280%–440% increase of processing throughput and a 75%–80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact on the obtained recognition or matching accuracy. Even higher gains can be obtained if one is willing to tolerate some reduction in the accuracy of the recognition and matching applications.
“…Error-tolerant multimedia processing [1] comprises any system that processes large volumes of input data (image pixels, sensor measurements, database entries, etc.) with performance-critical digital signal processing (DSP) or linear algebra kernels (filtering, decomposition, factorization, feature extraction, principal components, probability mixtures, Monte-Carlo methods, etc.)…”
Section: Introduction
confidence: 99%
“…When aiming for high-throughput/low-energy performance, the critical issues of the execution environment of Figure 1 are [1], [39], [40]: (i) the data movement to/from cores; (ii) the processing time and energy consumption per core; (iii) the limited concurrency when the top-level processing allows for only a few blocks. These issues are addressed in our proposal by viewing the process between L2 and L3 as a computation channel [38] that returns approximate results.…”
Generic matrix multiplication (GEMM) and one-dimensional discrete convolution/cross-correlation (CONV) kernels perform the bulk of the compute- and memory-intensive processing within image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels for such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernel processing then uses the projected inputs and the results are accumulated to form the final outputs. Throughput and energy scaling takes place by decreasing the number of projections computed by each kernel, which in turn produces approximate results, i.e., lowers the precision of the performed computation. Existing realizations of error-tolerant multimedia applications can opt to utilize a small number of the input projections (typically just one) in order to save energy and processing cycles, while all error-intolerant systems can compute all input projections and obtain full-precision outputs. Results derived from a voltage- and frequency-scaled ARM Cortex A15 processor running face recognition demonstrate that the proposed approach allows for a 5-fold to 10-fold increase of processing throughput and more than an 80% decrease of energy consumption against optimized GEMM and CONV kernels without any impact on the expected recognition and matching precision.
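The precision-scaling mechanism described above (fewer projections → faster, coarser GEMM; all projections → exact result) can be illustrated with a small NumPy analogue. The paper's specific projection construction is not reproduced here; as a hypothetical stand-in, an orthonormal SVD basis of the second operand is used so that accumulating more projections monotonically refines the product:

```python
# Illustrative analogue of projection-based precision scaling for GEMM.
# Each projection contributes one rank-1 term to C = A @ B; computing
# fewer terms trades precision for throughput/energy, and computing all
# of them recovers the exact full-precision product.
import numpy as np

def projected_gemm(A, B, num_proj):
    """Approximate A @ B by accumulating `num_proj` projection terms.
    With num_proj equal to the inner dimension, the result is exact."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    C = np.zeros((A.shape[0], B.shape[1]))
    for k in range(num_proj):
        # One projected-input kernel pass per term, accumulated into C.
        C += (A @ U[:, k:k + 1]) * s[k] @ Vt[k:k + 1, :]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((4, 5))

# Approximation error shrinks as more projections are accumulated;
# with all 4 projections the product is exact to machine precision.
errs = [np.linalg.norm(projected_gemm(A, B, k) - A @ B) for k in (1, 2, 4)]
```

An error-tolerant application would stop after one or two terms; an error-intolerant one would run the loop to completion, mirroring the trade-off the abstract describes.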