2018
DOI: 10.1145/3158229

Continuous Online Self-Monitoring Introspection Circuitry for Timing Repair by Incremental Partial-Reconfiguration (COSMIC TRIP)

Abstract: We show that continuously monitoring on-chip delays at the LUT-to-LUT link level during operation allows a field-programmable gate array to detect and self-adapt to aging and environmental timing effects. Using a lightweight (<4% added area) mechanism for monitoring transition timing, a Difference Detector with First-Fail Latch, we can estimate the timing margin on circuits and identify the individual links that have degraded and whose delay is determining the worst-case circuit delay. Combined with Choose-…

Cited by 2 publications (2 citation statements)
References 41 publications
“…In the active case, explicit and dynamic redundancy is used to detect and manage the fault effects by mechanisms like detection, localization, adaptation, retraining, or self-healing, which may lead to complexity [46]. For example, in the case of the DNN accelerator, this approach can be used for fault recovery by remapping, error detection, or reconfiguration [47]. Overall, not all failure scenarios can be considered at design time to provide passive or active arrangements.…”
Section: E Terms and Concepts of Dependability | Citation type: mentioning
confidence: 99%
“…4) TE forecasting (TEF) prevents TE occurrence by relying on runtime TEs prediction and monitoring like the Canary circuit [69] or dynamic timing analysis [110]. We have different runtime adaptation mechanisms considering TEF, such as model retraining [95], hardware reconfiguration [47], and DVFS adaption [114]. The Razor is a circuit-level mechanism for timing speculation based on the dynamic detection of critical path errors.…”
Section: ) TE Detection (TED) Tries To Detect Errors In Runtime Using | Citation type: mentioning
confidence: 99%