Abstract: Interprocedural data-flow analyses form an expressive and useful paradigm for numerous static analysis applications, such as live variables analysis, alias analysis and null pointers analysis. The most widely used framework for interprocedural data-flow analysis is IFDS, which encompasses distributive data-flow functions over a finite domain. On-demand data-flow analyses restrict the focus of the analysis to specific program locations and data facts. This setting provides a natural split between (i) an offline …
“…Sub-cubic algorithms do exist, but they only offer logarithmic speedups [Chaudhuri 2008]. When the underlying graph is a Recursive State Machine (RSM) with constant entries and exits, treewidth has been shown to lead to fast on-demand reachability queries [Chatterjee et al 2020, 2015]. Despite the cubic hardness of the general problem, it is known to have sub-cubic certificates for both positive and negative instances [Chistikov et al 2021].…”
Dyck reachability is the standard formulation of a large domain of static analyses, as it achieves the sweet spot between precision and efficiency, and has thus been studied extensively. Interleaved Dyck reachability (denoted $D_k \odot D_k$) uses two Dyck languages for increased precision (e.g., context and field sensitivity) but is well-known to be undecidable. As many static analyses yield a certain type of bidirected graphs, they give rise to interleaved bidirected Dyck reachability problems. Although these problems have seen numerous applications, their decidability and complexity have largely remained open. In a recent work, Li et al. made the first steps in this direction, showing that (i) $D_1 \odot D_1$ reachability (i.e., when both Dyck languages are over a single parenthesis and act as counters) is computable in $O(n^7)$ time, while (ii) $D_k \odot D_k$ reachability is NP-hard. However, despite this recent progress, most natural questions about this intricate problem are open.
In this work we address the decidability and complexity of all variants of interleaved bidirected Dyck reachability. First, we show that $D_1 \odot D_1$ reachability can be computed in $O(n^3 \cdot \alpha(n))$ time, significantly improving over the existing $O(n^7)$ bound. Second, we show that $D_k \odot D_1$ reachability (i.e., when one language acts as a counter) is decidable, in contrast to the non-bidirected case where decidability is open. We further consider $D_k \odot D_1$ reachability where the counter remains linearly bounded. Our third result shows that this bounded variant can be solved in $O(n^2 \cdot \alpha(n))$ time, while our fourth result shows that the problem has a (conditional) quadratic lower bound, and thus our upper bound is essentially optimal. Fifth, we show that full $D_k \odot D_k$ reachability is undecidable. This improves the recent NP-hardness lower bound, and shows that the problem is equivalent to the non-bidirected case. Our experiments on standard benchmarks show that the new algorithms are very fast in practice, offering many orders-of-magnitude speedups over previous methods.
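To make the interleaved Dyck notation concrete, the following minimal sketch checks whether the label word of a single, already given path belongs to an interleaved Dyck language: the word is accepted when its projection onto each of the two parenthesis alphabets is balanced on its own. It does not implement any of the reachability algorithms from the abstract. The label encoding and the names `is_dyck` and `is_interleaved_dyck` are assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding: a label is (language, kind, symbol), where
# language is "A" or "B", kind is "open" or "close", and symbol names
# the parenthesis type within that language.
Label = Tuple[str, str, str]


def is_dyck(word: List[Tuple[str, str]]) -> bool:
    """Return True if a word of (kind, symbol) pairs is well parenthesized."""
    stack: List[str] = []
    for kind, symbol in word:
        if kind == "open":
            stack.append(symbol)
        elif not stack or stack.pop() != symbol:
            # A closing parenthesis with no matching open one.
            return False
    return not stack


def is_interleaved_dyck(path_labels: Iterable[Label]) -> bool:
    """Accept when the projection onto each language is a Dyck word."""
    labels = list(path_labels)
    proj_a = [(kind, sym) for lang, kind, sym in labels if lang == "A"]
    proj_b = [(kind, sym) for lang, kind, sym in labels if lang == "B"]
    return is_dyck(proj_a) and is_dyck(proj_b)


# "( [ ) ]": the two languages cross each other, yet each projection is balanced.
crossing = [("A", "open", "("), ("B", "open", "["),
            ("A", "close", "("), ("B", "close", "[")]
# The B language is left with an unmatched open parenthesis.
unbalanced = [("A", "open", "("), ("B", "open", "["), ("A", "close", "(")]
```

With a single symbol per language, each stack degenerates to a counter, which corresponds to the $D_1 \odot D_1$ case described above.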
CCS Concepts: • Software and its engineering → Software verification and validation; • Theory of computation → Theory and algorithms for application domains; Program analysis.
“…Due to the expensiveness of exhaustive data-flow analysis, i.e., an analysis that considers every possible starting point, many works in the literature have turned their focus to on-demand analysis [45,22,6,77,81,82,34,67]. In this setting, the algorithm can first run a preprocessing phase in which it collects some information about the program and produces summaries that can be used to speed up the query phase.…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…It is also noteworthy that on-demand algorithms commonly use information found in previous queries to handle the current query more efficiently. On-demand analyses are especially important in just-in-time compilers and their speculative optimizations [22,28,53,7,37], in which having dynamic information about the current state of the program can dramatically decrease the overhead for the compiler. In addition, on-demand analyses have the following merits (quoted from [45,68]):…”
Section: Introduction; citation type: mentioning; confidence: 99%
“…The work [22] provides a parameterized algorithm for a special case of the on-demand IFDS problem. The main idea in [22] is to observe that control-flow graphs of real-world programs are sparse and tree-like and that this sparsity can be exploited to find faster algorithms for same-context IFDS analysis. More specifically, the sparsity is formalized by a graph parameter called treewidth [71,70].…”
We consider interprocedural data-flow analysis as formalized by the standard IFDS framework, which can express many widely used static analyses such as reaching definitions, live variables, and null-pointer analysis. We focus on the well-studied on-demand setting in which queries arrive one by one in a stream and each query should be answered as fast as possible. While the classical IFDS algorithm provides a polynomial-time solution for this problem, it is not scalable in practice. More specifically, it either requires a quadratic-time preprocessing phase or takes linear time per query, both of which are untenable for modern huge codebases with hundreds of thousands of lines. Previous works have already shown that parameterizing the problem by the treewidth of the program's control-flow graph is promising and can lead to significant gains in efficiency. Unfortunately, these results were only applicable to the limited special case of same-context queries. In this work, we obtain significant speedups for the general case of on-demand IFDS with queries that are not necessarily same-context. This is achieved by exploiting a new graph sparsity parameter, namely the treedepth of the program's call graph. Our approach is the first to exploit the sparsity of control-flow graphs and call graphs at the same time and parameterize by both the treewidth and the treedepth. We obtain an algorithm with a linear preprocessing phase that can answer each query in constant time with respect to the size of the input. Finally, our experimental results demonstrate that our approach significantly outperforms the classical IFDS and its on-demand variant.
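The trade-off between preprocessing and per-query cost mentioned above can be illustrated with a deliberately naive sketch on plain directed-graph reachability. This is a swapped-in toy problem, not IFDS, and it does not use treewidth or treedepth. Without preprocessing, each query runs a linear-time search; with preprocessing, a full closure is computed up front (quadratic in the worst case) and each query becomes a set lookup. The class name `NaiveOnDemandReachability` and its methods are hypothetical.

```python
from collections import defaultdict


class NaiveOnDemandReachability:
    """Toy illustration of the preprocessing/query trade-off (not IFDS)."""

    def __init__(self, edges):
        self.succ = defaultdict(list)
        self.nodes = set()
        for u, v in edges:
            self.succ[u].append(v)
            self.nodes.update((u, v))
        self.closure = None  # filled in only if preprocess() is called

    def _reachable_from(self, start):
        # Iterative depth-first search: linear in the size of the graph.
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in self.succ.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def preprocess(self):
        # One search per node: expensive up front, cheap queries afterwards.
        self.closure = {n: self._reachable_from(n) for n in self.nodes}

    def query(self, source, target):
        if self.closure is not None:
            # Set lookup against the precomputed closure.
            return target in self.closure.get(source, {source})
        # No preprocessing was done: pay a linear-time search per query.
        return target in self._reachable_from(source)


graph = NaiveOnDemandReachability([(1, 2), (2, 3), (4, 1)])
lazy_answers = (graph.query(4, 3), graph.query(3, 1))
graph.preprocess()
fast_answers = (graph.query(4, 3), graph.query(3, 1))
```

Both modes return the same answers; only the point at which the cost is paid differs.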