Optimal and Perfectly Parallel Algorithms for On-demand Data-Flow Analysis

Chatterjee, Krishnendu; Goharshady, Amir Kafshdar; Ibsen-Jensen, Rasmus; Pavlogiannis, Andreas

doi:10.1007/978-3-030-44914-8_5

“…Sub-cubic algorithms do exist, but they only offer logarithmic speedups [Chaudhuri 2008]. When the underlying graph is a Recursive State Machine (RSM) with constant entries and exits, treewidth has been shown to lead to fast on-demand reachability queries [Chatterjee et al 2020; Chatterjee et al 2015]. Despite the cubic hardness of the general problem, it is known to have sub-cubic certificates for both positive and negative instances [Chistikov et al 2021].…”

Section: Related Work (mentioning)

confidence: 99%

The decidability and complexity of interleaved bidirected Dyck reachability

Kjelstrøm, Pavlogiannis

2022

Proc. ACM Program. Lang.

Self Cite


Dyck reachability is the standard formulation of a large domain of static analyses, as it achieves the sweet spot between precision and efficiency, and has thus been studied extensively. Interleaved Dyck reachability (denoted $D_k \odot D_k$) uses two Dyck languages for increased precision (e.g., context and field sensitivity) but is well-known to be undecidable. As many static analyses yield a certain type of bidirected graphs, they give rise to interleaved bidirected Dyck reachability problems. Although these problems have seen numerous applications, their decidability and complexity has largely remained open. In a recent work, Li et al. made the first steps in this direction, showing that (i) $D_1 \odot D_1$ reachability (i.e., when both Dyck languages are over a single parenthesis and act as counters) is computable in $O(n^7)$ time, while (ii) $D_k \odot D_k$ reachability is NP-hard. However, despite this recent progress, most natural questions about this intricate problem are open. In this work we address the decidability and complexity of all variants of interleaved bidirected Dyck reachability. First, we show that $D_1 \odot D_1$ reachability can be computed in $O(n^3 \cdot \alpha(n))$ time, significantly improving over the existing $O(n^7)$ bound. Second, we show that $D_k \odot D_1$ reachability (i.e., when one language acts as a counter) is decidable, in contrast to the non-bidirected case where decidability is open. We further consider $D_k \odot D_1$ reachability where the counter remains linearly bounded. Our third result shows that this bounded variant can be solved in $O(n^2 \cdot \alpha(n))$ time, while our fourth result shows that the problem has a (conditional) quadratic lower bound, and thus our upper bound is essentially optimal. Fifth, we show that full $D_k \odot D_k$ reachability is undecidable. This improves the recent NP-hardness lower-bound, and shows that the problem is equivalent to the non-bidirected case. Our experiments on standard benchmarks show that the new algorithms are very fast in practice, offering many orders-of-magnitude speedups over previous methods.
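For readers unfamiliar with the underlying problem, the following is a minimal sketch of plain (single-language, non-interleaved) Dyck reachability on a labeled graph, computed by a naive worklist fixpoint. It is not any of the algorithms from the abstract above; the function name `dyck_reachable_pairs` and the edge encoding (`("(", i)`, `(")", i)`, or `None` for an unlabeled edge) are assumptions made for illustration only.

```python
from collections import defaultdict, deque


def dyck_reachable_pairs(nodes, edges):
    """Return all (u, v) joined by a path whose labels form a balanced string.

    edges: iterable of (u, label, v), where label is ("(", i), (")", i),
    or None for an unlabeled edge (hypothetical encoding for this sketch).
    """
    opens_into = defaultdict(set)   # x -> {(u, i)} for each edge u --(_i--> x
    closes_from = defaultdict(set)  # y -> {(v, i)} for each edge y --)_i--> v
    succ = defaultdict(set)         # known balanced pairs, indexed by source
    pred = defaultdict(set)         # known balanced pairs, indexed by target
    work = deque()

    def add(u, v):
        # Record a newly discovered balanced pair exactly once.
        if v not in succ[u]:
            succ[u].add(v)
            pred[v].add(u)
            work.append((u, v))

    for n in nodes:
        add(n, n)  # the empty path is balanced
    for u, label, v in edges:
        if label is None:
            add(u, v)
        elif label[0] == "(":
            opens_into[v].add((u, label[1]))
        else:
            closes_from[u].add((v, label[1]))

    while work:
        x, y = work.popleft()
        # Rule S -> (_i S )_i: wrap a balanced pair in a matching parenthesis pair.
        for u, i in opens_into[x]:
            for v, j in closes_from[y]:
                if i == j:
                    add(u, v)
        # Rule S -> S S: concatenate with balanced pairs already known.
        for w in list(pred[x]):
            add(w, y)
        for z in list(succ[y]):
            add(x, z)

    return {(u, v) for u in succ for v in succ[u]}
```

On a chain a --(1--> b --(2--> c --)2--> d --)1--> e, this sketch reports (b, d) and (a, e) as balanced, while (a, d) is not, because the outer parenthesis is still open at d.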


“…Sub-cubic algorithms do exist, but they only offer logarithmic speedups [Chaudhuri 2008]. When the underlying graph is a Recursive State Machine (RSM) with constant entries and exits, treewidth has been shown to lead to fast on-demand reachability queries [Chatterjee et al 2020; Chatterjee et al 2015]. Despite the cubic hardness of the general problem, it is known to have sub-cubic certificates for both positive and negative instances [Chistikov et al 2021].…”

Section: Related Work (mentioning)

confidence: 99%

The Decidability and Complexity of Interleaved Bidirected Dyck Reachability

Kjelstrøm, Pavlogiannis

2021

Preprint

Self Cite


Dyck reachability is the standard formulation of a large domain of static analyses, as it achieves the sweet spot between precision and efficiency, and has thus been studied extensively. Interleaved Dyck reachability (denoted $D_k \odot D_k$) uses two Dyck languages for increased precision (e.g., context and field sensitivity) but is well-known to be undecidable. As many static analyses yield a certain type of bidirected graphs, they give rise to interleaved bidirected Dyck reachability problems. Although these problems have seen numerous applications, their decidability and complexity has largely remained open. In a recent work, Li et al. made the first steps in this direction, showing that (i) $D_1 \odot D_1$ reachability (i.e., when both Dyck languages are over a single parenthesis and act as counters) is computable in $O(n^7)$ time, while (ii) $D_k \odot D_k$ reachability is NP-hard. However, despite this recent progress, most natural questions about this intricate problem are open. In this work we address the decidability and complexity of all variants of interleaved bidirected Dyck reachability. First, we show that $D_1 \odot D_1$ reachability can be computed in $O(n^3 \cdot \alpha(n))$ time, significantly improving over the existing $O(n^7)$ bound. Second, we show that $D_k \odot D_1$ reachability (i.e., when one language acts as a counter) is decidable, in contrast to the non-bidirected case where decidability is open. We further consider $D_k \odot D_1$ reachability where the counter remains linearly bounded. Our third result shows that this bounded variant can be solved in $O(n^2 \cdot \alpha(n))$ time, while our fourth result shows that the problem has a (conditional) quadratic lower bound, and thus our upper bound is essentially optimal. Fifth, we show that full $D_k \odot D_k$ reachability is undecidable. This improves the recent NP-hardness lower-bound, and shows that the problem is equivalent to the non-bidirected case. Our experiments on standard benchmarks show that the new algorithms are very fast in practice, offering many orders-of-magnitude speedups over previous methods.

CCS Concepts: • Software and its engineering → Software verification and validation; • Theory of computation → Theory and algorithms for application domains; Program analysis.


“…Due to the expensiveness of exhaustive data-flow analysis, i.e. an analysis that considers every possible starting point, many works in the literature have turned their focus to on-demand analysis [45,22,6,77,81,82,34,67]. In this setting, the algorithm can first run a preprocessing phase in which it collects some information about the program and produces summaries that can be used to speedup the query phase.…”

Section: Introduction (mentioning)

confidence: 99%
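The preprocessing/query split described in the statement above can be illustrated with a small sketch on ordinary directed-graph reachability. This is not the IFDS algorithm of any cited work: the class name `OnDemandReachability` and the choice of weakly connected components as the "summary" are assumptions made purely to show the shape of an on-demand analysis, where a cheap linear-time pass lets some queries be answered without any search.

```python
from collections import defaultdict, deque


class OnDemandReachability:
    """Hypothetical two-phase reachability: preprocess once, then answer queries."""

    def __init__(self, edges):
        self.succ = defaultdict(list)
        undirected = defaultdict(list)
        for u, v in edges:
            self.succ[u].append(v)
            undirected[u].append(v)
            undirected[v].append(u)

        # Preprocessing phase: label each node with its weakly connected
        # component. This is the summary that later queries consult.
        self.component = {}
        for start in list(undirected):
            if start in self.component:
                continue
            self.component[start] = start
            queue = deque([start])
            while queue:
                node = queue.popleft()
                for nxt in undirected[node]:
                    if nxt not in self.component:
                        self.component[nxt] = start
                        queue.append(nxt)

    def query(self, src, dst):
        """Query phase: is dst reachable from src along directed edges?"""
        if src == dst:
            return True
        src_comp = self.component.get(src)
        # Summary check: nodes in different components cannot reach each other.
        if src_comp is None or src_comp != self.component.get(dst):
            return False
        # Fall back to a directed breadth-first search within the component.
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in self.succ.get(node, ()):
                if nxt == dst:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False
```

A caller would build the object once, for example `analysis = OnDemandReachability([("a", "b"), ("b", "c"), ("x", "y")])`, and then issue queries such as `analysis.query("a", "c")` as they arrive. Stronger summaries trade more preprocessing time for faster queries, which is the trade-off the cited works study.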

“…It is also noteworthy that on-demand algorithms commonly use information found in previous queries to handle the current query more efficiently. On-demand analyses are especially important in just-in-time compilers and their speculative optimizations [22,28,53,7,37], in which having dynamic information about the current state of the program can dramatically decrease the overhead for the compiler. In addition, on-demand analyses have the following merits (quoted from [45,68]):…”

Section: Introduction (mentioning)

confidence: 99%

“…The work [22] provides a parameterized algorithm for a special case of the on-demand IFDS problem. The main idea in [22] is to observe that control-flow graphs of real-world programs are sparse and tree-like and that this sparsity can be exploited to find faster algorithms for same-context IFDS analysis. More specifically, the sparsity is formalized by a graph parameter called treewidth [71,70].…”

Section: Introduction (mentioning)

confidence: 99%


Efficient Interprocedural Data-Flow Analysis Using Treedepth and Treewidth

Goharshady, Zaher

2023

Lecture Notes in Computer Science

Self Cite


We consider interprocedural data-flow analysis as formalized by the standard IFDS framework, which can express many widely-used static analyses such as reaching definitions, live variables, and null-pointer. We focus on the well-studied on-demand setting in which queries arrive one-by-one in a stream and each query should be answered as fast as possible. While the classical IFDS algorithm provides a polynomial-time solution for this problem, it is not scalable in practice. More specifically, it either requires a quadratic-time preprocessing phase or takes linear time per query, both of which are untenable for modern huge codebases with hundreds of thousands of lines. Previous works have already shown that parameterizing the problem by the treewidth of the program's control-flow graph is promising and can lead to significant gains in efficiency. Unfortunately, these results were only applicable to the limited special case of same-context queries. In this work, we obtain significant speedups for the general case of on-demand IFDS with queries that are not necessarily same-context. This is achieved by exploiting a new graph sparsity parameter, namely the treedepth of the program's call graph. Our approach is the first to exploit the sparsity of control-flow graphs and call graphs at the same time and parameterize by both the treewidth and the treedepth. We obtain an algorithm with a linear preprocessing phase that can answer each query in constant time with respect to the size of the input. Finally, our experimental results demonstrate that our approach significantly outperforms the classical IFDS and its on-demand variant.



Cited by 7 publications

References 72 publications
