2017
DOI: 10.1145/3001935

Exploiting Idle Hardware to Provide Low Overhead Fault Tolerance for VLIW Processors

Abstract: Because of technology scaling, the soft error rate in digital circuits has been increasing, which affects system reliability. Therefore, modern processors, including VLIW architectures, must have means to mitigate such effects to guarantee reliable computing. In this scenario, our work proposes three low-overhead fault tolerance approaches based on instruction duplication with zero-latency detection, which use a rollback mechanism to correct soft errors in the pipelanes of a configurable VLIW processor. The f…
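The general idea of duplicating instructions, comparing the copies before anything is committed, and rolling back on a mismatch can be illustrated with a small software model. This is a toy sketch, not the paper's hardware design. It assumes a single-bit-flip fault model on lane results, treats each "pipelane" as an issue lane that can host a shadow copy, and uses hypothetical names (`Op`, `execute_lane`, `run_bundle`) and a made-up three-operation ALU. Because results are compared before any register write, the register file as it stood at the start of the bundle serves as the rollback point.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical 32-bit ALU operations, chosen only for illustration.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & 0xFFFFFFFF,
    "sub": lambda a, b: (a - b) & 0xFFFFFFFF,
    "and": lambda a, b: a & b,
}


@dataclass
class Op:
    opcode: str
    dst: int
    src1: int
    src2: int


def execute_lane(op: Op, regs: List[int], flip_bit: Optional[int]) -> int:
    """Run one op in one lane; flip_bit models a single-event upset on the result."""
    result = OPS[op.opcode](regs[op.src1], regs[op.src2])
    if flip_bit is not None:
        result ^= 1 << flip_bit
    return result


def maybe_fault(rng: random.Random, fault_rate: float) -> Optional[int]:
    """Return a bit index to flip with probability fault_rate, else None."""
    return rng.randrange(32) if rng.random() < fault_rate else None


def run_bundle(bundle: List[Op], regs: List[int], fault_rate: float,
               rng: random.Random, max_retries: int = 3) -> int:
    """Execute a bundle with each op duplicated in a second (idle) lane.

    Returns the number of rollbacks; raises if mismatches persist past
    max_retries. If both copies happen to flip the same bit, the error is
    NOT detected: a known limit of plain duplication with comparison.
    """
    for attempt in range(max_retries + 1):
        pending: List[Tuple[int, int]] = []
        for op in bundle:
            primary = execute_lane(op, regs, maybe_fault(rng, fault_rate))
            shadow = execute_lane(op, regs, maybe_fault(rng, fault_rate))
            if primary != shadow:  # mismatch caught before any write-back
                break
            pending.append((op.dst, primary))
        else:
            # Every pair agreed: commit. All reads above saw pre-bundle values,
            # matching VLIW semantics where a bundle's ops issue together.
            for dst, value in pending:
                regs[dst] = value
            return attempt
        # Mismatch: pending results are discarded and the bundle re-executes.
    raise RuntimeError("error persisted after all retries")
```

With `fault_rate=0.0`, a bundle `[Op("add", 3, 1, 2), Op("sub", 4, 2, 1)]` on registers `[0, 5, 7, 0, 0]` commits `r3 = 12` and `r4 = 2` with no rollbacks. Deferring every write until all pairs agree is what makes a whole-bundle re-execution a valid rollback in this model.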

Cited by 13 publications (9 citation statements)
References 41 publications
“…Hardware-based approaches duplicate the instructions at runtime using specific hardware using the compiler's result. To do so, coupling of the VLIW pipelines is applied [4], [13]. When the duplicated instructions do not fit in the current bundle, an additional time slot is added.…”
Section: Related Work (mentioning)
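The scheduling rule in the statement above (duplicates fill the idle slots of the current bundle, and an extra time slot is added when they do not fit) can be sketched as follows. This is an illustrative model under stated assumptions: any slot can host any duplicate, and functional-unit restrictions and data dependences are ignored. The function names `schedule_with_duplicates` and `total_cycles` are hypothetical. Since a bundle never holds more instructions than the issue width, its duplicates never need more than one additional time slot.

```python
from typing import List, Tuple

Slot = Tuple[str, str]  # (instruction, "orig" or "dup")


def schedule_with_duplicates(bundle: List[str], issue_width: int) -> List[List[Slot]]:
    """Place each instruction's duplicate in an idle slot of the same bundle,
    spilling the duplicates that do not fit into one additional time slot."""
    if len(bundle) > issue_width:
        raise ValueError("bundle is wider than the issue width")
    originals = [(ins, "orig") for ins in bundle]
    duplicates = [(ins, "dup") for ins in bundle]
    idle = issue_width - len(bundle)
    cycles = [originals + duplicates[:idle]]
    overflow = duplicates[idle:]
    # len(overflow) <= len(bundle) <= issue_width, so one extra slot suffices.
    if overflow:
        cycles.append(overflow)
    return cycles


def total_cycles(program: List[List[str]], issue_width: int) -> int:
    """Cycles needed for a sequence of bundles once every instruction is duplicated."""
    return sum(len(schedule_with_duplicates(b, issue_width)) for b in program)
```

On a 4-issue machine, the program `[["a"], ["a", "b", "c"], ["w", "x", "y", "z"]]` needs 5 cycles instead of 3: only bundles using at most half of the issue width absorb their duplicates for free.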
“…First concepts involving coarse-grained lockstepping are promising [18]- [20], but do not address the specific challenges to FT in space [21]. FT using thread-level very-long-instruction word architectures [22], [23] has also been explored, though the approach still requires pipeline-level voters in hardware. Most implement checkpoint & rollback or restart, which makes them unsuitable for spacecraft command & control applications [24], others ignore fault-detection [25], [26], or require external, infallible fault detection entities with deep knowledge about application-intrinsics [27] but no concept of how this could be obtained.…”
Section: Related Work (mentioning)
“…These solutions are noninvasive, flexible, and have been proven in GPGPUs [8], but can be very costly in terms of performance [9]. In [10], the authors developed fault-tolerance solutions for parallel processors by adjusting the instruction-level parallelism, increasing the reliability at the cost of workload performance. On the other hand, authors in [11] propose a reduced precision Duplication with Comparison (DWC) approach to increase the reliability in GPUs by replicating instructions and operating them in execution units at different precision, so obtaining redundancy at zero cost, but degrading performance and output precision.…”
Section: Introduction (mentioning)
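The reduced-precision Duplication with Comparison (DWC) idea attributed to [11] above can be illustrated in software: compute a result once in full precision and once in reduced precision, then flag a fault only when the two differ by more than a tolerance. This sketch emulates binary32 arithmetic by rounding every intermediate value; it is not the GPU mechanism of [11]. The helper names and the tolerance values (`rel_tol=1e-4`, `abs_tol=1e-6`) are assumptions chosen for illustration. The tolerance has to absorb the normal binary32 rounding error, so corruptions confined to low-order bits of the full-precision result go undetected.

```python
import math
import struct
from typing import Sequence


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_f64(a: Sequence[float], b: Sequence[float]) -> float:
    """Full-precision copy of the computation."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_f32(a: Sequence[float], b: Sequence[float]) -> float:
    """Reduced-precision copy: every operand and intermediate rounded to binary32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(to_f32(x) * to_f32(y)))
    return acc


def rp_dwc_check(full: float, reduced: float, rel_tol: float = 1e-4) -> bool:
    """True if the copies agree within tolerance (no error detected).
    abs_tol covers results close to zero, where a relative test is too strict."""
    return math.isclose(full, reduced, rel_tol=rel_tol, abs_tol=1e-6)


def flip_bit_f64(x: float, bit: int) -> float:
    """Flip one bit of a binary64 value to model a soft error."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))[0]
```

For `a = [1.5, 2.25, -3.0]` and `b = [4.0, 0.5, 2.0]`, both copies yield `1.125`. Flipping bit 52 of the full result (the exponent's lowest bit) halves it to `0.5625` and is detected, while flipping bit 0 changes it by one unit in the last place and passes the check.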