2019
DOI: 10.1007/s00778-019-00565-w

On the performance and convergence of distributed stream processing via approximate fault tolerance

Abstract: Fault tolerance is critical for distributed stream processing systems, yet achieving error-free fault tolerance often incurs substantial performance overhead. We present AF-Stream, a distributed stream processing system that addresses the trade-off between performance and accuracy in fault tolerance. AF-Stream builds on a notion called approximate fault tolerance, whose idea is to mitigate backup overhead by adaptively issuing backups, while ensuring that the errors upon failures are bounded with theoretical g…

Cited by 7 publications (1 citation statement)
References 54 publications
“…Approximate fault tolerance is another method used to improve fault-tolerance, which addresses the trade-off between accuracy and performance. 33,34 This approach checkpoints the state and the unprocessed data only when the number of errors due to failures is higher than the user-defined bound. Although approximate fault tolerance does not always guarantee accuracy, it gives better performance as the number of performed checkpoints are fewer compared to a system that does periodic checkpoints.…”
Section: Include Message Delay Between Operators (mentioning)
confidence: 99%
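
The mechanism described in the citation statement above can be illustrated with a minimal sketch. This is not AF-Stream's actual implementation or API: the class name `ApproxCheckpointer`, the parameter `max_unsaved_updates`, and the choice to measure potential error as a count of state updates not yet backed up are all assumptions made for illustration. The sketch takes a backup only when the loss a failure could cause would otherwise exceed the user-defined bound, so a recovery never loses more than that many updates.

```python
import copy


class ApproxCheckpointer:
    """Hypothetical operator state that is backed up adaptively, not periodically."""

    def __init__(self, max_unsaved_updates):
        # Assumed error bound: the most updates a failure is allowed to lose.
        self.max_unsaved_updates = max_unsaved_updates
        self.state = {}             # live operator state (key -> running sum)
        self.backup = {}            # last checkpointed copy of the state
        self.unsaved = 0            # updates applied since the last checkpoint
        self.checkpoints_taken = 0  # how many backups were issued

    def update(self, key, delta):
        """Apply one update, then checkpoint only if the bound would be exceeded."""
        self.state[key] = self.state.get(key, 0) + delta
        self.unsaved += 1
        if self.unsaved > self.max_unsaved_updates:
            self.checkpoint()

    def checkpoint(self):
        """Copy the live state to the backup and reset the unsaved counter."""
        self.backup = copy.deepcopy(self.state)
        self.unsaved = 0
        self.checkpoints_taken += 1

    def recover(self):
        """Simulate a failure: restore the backup and report how many updates were lost."""
        lost = self.unsaved
        self.state = copy.deepcopy(self.backup)
        self.unsaved = 0
        return lost


if __name__ == "__main__":
    op = ApproxCheckpointer(max_unsaved_updates=3)
    for _ in range(10):
        op.update("a", 1)
    print("checkpoints taken:", op.checkpoints_taken)
    print("updates lost on failure:", op.recover())
    print("recovered state:", op.state)
```

A system that checkpoints after every update would issue ten backups for the ten updates in the usage example, whereas this sketch issues fewer at the cost of losing a bounded number of updates on failure. That is the accuracy-for-performance trade-off the statement describes.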