Dynamically typed languages such as JavaScript and Python have emerged as the most popular programming languages in use today. However, important benefits also accrue from including static type annotations in dynamically typed programs where possible, e.g., improved documentation, improved static analysis of program errors, and improved code optimization. This approach to gradual typing is exemplified by the TypeScript programming system, which allows programmers to specify partially typed programs and then uses static analysis to infer as many of the remaining types as possible. However, static type inference is in general unable to infer all types in a program, and in practice its effectiveness depends on the complexity of the program's structure and on the initial types specified by the programmer. As a result, there is strong motivation for new approaches that advance the state of the art in statically predicting types in dynamically typed programs, and that do so with performance acceptable for use in interactive programming environments.

Previous work has demonstrated the promise of probabilistic type analysis techniques that use deep learning methods such as recurrent neural networks and graph neural networks (GNNs) to predict types for variable declarations and occurrences. In this paper, we advance past work by introducing a range of graph-based deep learning models that operate on a novel type flow graph (TFG) representation. The TFG represents an input program's elements as graph nodes connected by syntax edges and over-approximated data flow edges, and our GNN models are trained to predict the type labels in the TFG for a given input program.

We study different design choices for our GNN-based type inference system for the 100 most common types in our evaluation corpus, and show that our most accurate GNN configuration (R-GNN NS-CTX) achieves a top-1 accuracy of 87.76%.
This outperforms the two most closely related deep learning type inference approaches from past work: DeepTyper, with a top-1 accuracy of 84.62%, and LambdaNet, with a top-1 accuracy of 79.45%. Equivalently, the error (100% - accuracy) of R-GNN NS-CTX is 0.80× that of DeepTyper and 0.60× that of LambdaNet. Further, the average inference throughput of R-GNN NS-CTX is 353.8 files/second, compared to 186.7 files/second for DeepTyper and 1,050.3 files/second for LambdaNet. If inference throughput is a higher priority, we recommend the next most accurate GNN configuration from our approach (R-GNN NS), which achieves a top-1 accuracy of 86.89% and an average inference throughput of 1,303.9 files/second. In summary, our work introduces advances in graph-based deep learning that yield superior accuracy and performance compared to past work on probabilistic type analysis, while also providing a range of GNN models that could be applied in the future to graph structures used in program analysis beyond the TFG.
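The relative-error figures quoted above follow directly from the reported top-1 accuracies, with error defined as 100% minus accuracy. The short Python sketch below recomputes them as a sanity check; the dictionary and function names are illustrative only and are not part of the authors' system.

```python
# Illustrative check of the error ratios quoted in the abstract.
# Names (ACCURACY, error, error_ratio) are hypothetical helpers, not the authors' code.

# Top-1 accuracies (%) as reported in the abstract.
ACCURACY = {
    "R-GNN NS-CTX": 87.76,
    "DeepTyper": 84.62,
    "LambdaNet": 79.45,
}


def error(accuracy_pct: float) -> float:
    """Top-1 error in percent, i.e. 100% minus top-1 accuracy."""
    return 100.0 - accuracy_pct


def error_ratio(model: str, baseline: str) -> float:
    """Error of `model` expressed as a multiple of the error of `baseline`."""
    return error(ACCURACY[model]) / error(ACCURACY[baseline])


if __name__ == "__main__":
    for baseline in ("DeepTyper", "LambdaNet"):
        ratio = error_ratio("R-GNN NS-CTX", baseline)
        print(f"R-GNN NS-CTX error vs {baseline}: {ratio:.2f}x")
```

With an error of 12.24% for R-GNN NS-CTX against 15.38% for DeepTyper and 20.55% for LambdaNet, the ratios round to 0.80 and 0.60, matching the figures stated in the text.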