Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2021)
DOI: 10.1145/3468264.3468612
Exposing numerical bugs in deep learning via gradient back-propagation

Cited by 30 publications (13 citation statements)
References 36 publications
“…The Training stage tends to be time-consuming due to heavy numerical computation based on a large amount of training data. As presented in the existing study [51], the typical training time ranges from a few minutes to several days. Hence, the bugs observed at this stage, especially the Incorrect Functionality bugs (accounting for 30.7% of bugs observed at this stage), may only manifest hours or even days into the training process.…”
Section: Symptom Distribution
confidence: 99%
“…A DL system typically involves three levels [51]: the production level (i.e., DL models), the program level (i.e., DL programs used for training DL models), and the framework level (i.e., DL frameworks used by developers for implementing DL programs). Bugs at any level can affect the overall quality of the DL system.…”
Section: Introduction
confidence: 99%
“…For example, when generating 20-operator graphs, FP exceptional values occur in 56.8% of generated graphs if we use random weights and inputs. This is because some operators, which we refer to as vulnerable operators [63], produce real (e.g., √x returns NaN if x < 0) or stable (e.g., x^y returns Inf for large x and y) results only for a subset of their input domain. If a vulnerable operator's input lies outside this domain, the operator outputs an FP exceptional value, which propagates through the model and affects the model's output, preventing us from comparing model outputs during differential testing.…”
Section: Improving Numeric Validity With Gradients
confidence: 99%
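The propagation mechanism described in the citation above can be sketched with NumPy. This is a minimal illustration under assumed operator choices (sqrt and power), not the cited tool's implementation:

```python
import numpy as np

# "Vulnerable" operators produce floating-point exceptional values when their
# input falls outside the domain where they are real-valued (sqrt) or
# numerically stable (power).
with np.errstate(invalid="ignore", over="ignore"):
    a = np.sqrt(np.float64(-1.0))   # sqrt is real-valued only for x >= 0 -> NaN
    b = np.float64(1e300) ** 2      # x**y overflows for large x and y -> Inf

print(np.isnan(a), np.isinf(b))     # True True

# Once introduced, the exceptional value propagates through downstream
# operators, so the final model output is itself NaN and cannot be compared
# against another backend during differential testing.
out = a * 0.5 + 1.0
print(np.isnan(out))                # True
```

This is why the cited work steers inputs back into valid operator domains (via gradients) before comparing model outputs across backends.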
“…Software testing is one of the most important parts of the entire software development cycle and is vital to guaranteeing software quality [20,47,60,61]. Since compilers are themselves important software, compiler testing, like quality assurance for other software, is one of the most widely used ways of guaranteeing compiler quality and has received extensive attention from both practitioners and researchers in software engineering [8, 19, 22–24, 36, 56].…”
Section: Introduction
confidence: 99%