Abstract. Operator precedence languages were introduced half a century ago by Robert Floyd to support deterministic and efficient parsing of context-free languages. Recently, we renewed our interest in this class of languages thanks to a few distinguishing properties that make them attractive for exploiting various modern technologies. Specifically, their local parsability enables parallel and incremental parsing, whereas their closure properties make them amenable to automatic verification techniques, including model checking. In this paper we provide a fairly complete theory of this class of languages: we introduce a class of automata with the same recognizing power as the generative power of their grammars; we provide a characterization of their sentences in terms of monadic second-order logic, as has been done in previous literature for more restricted language classes such as regular, parenthesis, and input-driven ones; we investigate which properties are preserved and which are lost when extending the language sentences from finite length to infinite length (ω-languages). As a result, we obtain a class of languages that enjoys many nice properties of regular languages (closure and decidability properties, logic characterization) but is considerably larger than other families with the same properties (typically parenthesis and input-driven ones), covering "almost" all deterministic languages.
Key words. Operator Precedence, Visibly Pushdown Languages, Monadic Second Order Logic, Omega-languages.
AMS subject classifications. 03D05, 68Q45.
Introduction. Operator precedence grammars and languages (OPGs and OPLs) certainly deserve an important place in the history of formal languages and compilers. They were invented by Robert Floyd [23] with the major motivation of enabling efficient, deterministic parsing of programming languages.
In fact, Floyd's intuition was inspired by arithmetic expressions, whose structure is determined either by explicit parentheses or by the conventional, "hidden" precedence of multiplicative operators over additive ones. By generalizing this observation, Floyd defined three basic relations between terminal symbols, namely yields precedence, takes precedence, and equal in precedence (respectively denoted by the symbols ⋖, ⋗, ≐), in such a way that the right-hand side (r.h.s.) of an operator precedence grammar rule is enclosed within a pair ⋖, ⋗, and ≐ holds between consecutive terminal symbols thereof (in OPGs nonterminal symbols are "transparent", i.e., irrelevant, w.r.t. the precedence relations [23]).
Subsequently, under the main motivation of grammar inference, it was shown that, once an operator precedence matrix (OPM) is given such that at most one relation holds between any two terminal characters, the family of OPLs sharing the given OPM is a Boolean algebra [19]. This result somewhat generalizes closure properties enjoyed by regular languages and by context-free languages whose structure, i.e., the syntax tree, is immediately visible in the terminal sentences, such as parenthesis languages [31] and tree-automata lan...
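The three precedence relations described above can be made concrete with a small sketch. It assumes a toy operator precedence matrix for arithmetic expressions over the terminals + * ( ) and a generic operand n, with # as an end-of-string delimiter; the matrix, the delimiter, and the names OPM and annotate are illustrative assumptions, not taken from the cited papers. The relation "equal in precedence" is written ≐ here. The function only labels each pair of consecutive terminals with its relation, so that every ⋖ ... ⋗ pair marks a candidate right-hand side.

```python
# Hypothetical operator precedence matrix (OPM) for arithmetic expressions.
# Terminals: + * ( ) n, plus "#" as an assumed end-of-string delimiter.
LT, EQ, GT = "⋖", "≐", "⋗"

OPM = {}
for a, yields_to, equal_to, takes_over in [
    ("+", "*(n", "", "+)#"),   # + yields to *, so * binds tighter
    ("*", "(n", "", "+*)#"),
    ("(", "+*(n", ")", ""),    # matching parentheses are equal in precedence
    (")", "", "", "+*)#"),
    ("n", "", "", "+*)#"),
    ("#", "+*(n", "", ""),
]:
    OPM.update({(a, b): LT for b in yields_to})
    OPM.update({(a, b): EQ for b in equal_to})
    OPM.update({(a, b): GT for b in takes_over})


def annotate(tokens):
    """Interleave "#" + tokens + "#" with the relation holding between neighbors."""
    padded = ["#", *tokens, "#"]
    out = [padded[0]]
    for a, b in zip(padded, padded[1:]):
        rel = OPM.get((a, b))
        if rel is None:
            # At most one relation per pair; none at all means a syntax error.
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
        out += [rel, b]
    return " ".join(out)


print(annotate("n+n*n"))
print(annotate("(n+n)*n"))
```

In the first annotated string, each operand n sits alone between ⋖ and ⋗, which is how a precedence parser would identify it as the first right-hand side to reduce.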
The property of local parsability makes it possible to parse an input by inspecting only a bounded-length string around the current token. This in turn enables the construction of a scalable, data-parallel parsing algorithm, which is presented in this work. Such an algorithm lends itself to automatic generation by a parser generator tool, which has been implemented and is also presented in the following. Furthermore, to complete the framework of parallel input analysis, a parallel scanner can also be combined with the parser. To prove the practicality of a parallel lexing and parsing approach, we report the results of adapting JSON and Lua to a form fit for parallel parsing (i.e., an operator precedence grammar) through simple grammar changes and scanning transformations. The approach is validated with performance figures from both high-performance and embedded multicore platforms, obtained by analyzing real-world inputs as a test bench. The results show that our approach matches or exceeds the performance of production-grade LR parsers in sequential execution, and achieves significant speedups and good scaling on multicore machines. The work concludes with a broad and critical survey of past work on parallel parsing and of future directions on the integration with semantic analysis and incremental parsing.
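The idea of reducing each piece of the input independently can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm or generated code of the tool described above: the toy precedence matrix, the "#" delimiter, the single-character tokens, and the names NT, build, try_reduce, reduce_chunk, and parse_parallel are all hypothetical. Each worker receives a chunk plus the terminal just before it (lookback) and just after it (lookahead), reduces only the right-hand sides fully enclosed in its chunk, and hands back what is left; a final sequential pass reduces the leftovers.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

LT, EQ, GT = "<", "=", ">"
OPM = {("#", "#"): EQ}  # assumed toy matrix over + * ( ) n and delimiter "#"
for a, lt, eq, gt in [("+", "*(n", "", "+)#"), ("*", "(n", "", "+*)#"),
                      ("(", "+*(n", ")", ""), (")", "", "", "+*)#"),
                      ("n", "", "", "+*)#"), ("#", "+*(n", "", "")]:
    OPM.update({(a, b): LT for b in lt})
    OPM.update({(a, b): EQ for b in eq})
    OPM.update({(a, b): GT for b in gt})


@dataclass
class NT:
    """A nonterminal carrying the subtree it was reduced from."""
    tree: object


def build(handle):
    """Turn a complete right-hand side into a nonterminal."""
    shape = [x if isinstance(x, str) else "N" for x in handle]
    if shape == ["n"]:
        return NT("n")
    if shape in (["N", "+", "N"], ["N", "*", "N"]):
        return NT((handle[1], handle[0].tree, handle[2].tree))
    if shape == ["(", "N", ")"]:
        return handle[1]
    raise SyntaxError(f"no rule matches {shape}")


def try_reduce(stack, b):
    """Reduce once if a handle lies entirely above stack[0]; report success."""
    terms = [i for i, x in enumerate(stack) if isinstance(x, str)]
    rel = OPM.get((stack[terms[-1]], b))
    if rel is None:
        raise SyntaxError(f"no relation between {stack[terms[-1]]!r} and {b!r}")
    if rel != GT:
        return False
    # Walk left over terminals until one yields precedence to its successor.
    while len(terms) > 1 and OPM.get((stack[terms[-2]], stack[terms[-1]])) != LT:
        terms.pop()
    if len(terms) == 1:
        return False  # the handle would start before this chunk: leave it
    start = terms[-1]
    if isinstance(stack[start - 1], NT):
        start -= 1  # a nonterminal just left of the handle belongs to it
    stack[start:] = [build(stack[start:])]
    return True


def reduce_chunk(lookback, items, lookahead):
    """Shift-reduce `items`, reducing only handles enclosed in the chunk."""
    stack = [lookback]
    for b in [*items, lookahead]:
        if isinstance(b, str):
            while try_reduce(stack, b):
                pass
        stack.append(b)
    return stack[1:-1]  # drop the lookback and the lookahead


def parse_parallel(text, n_chunks=2):
    tokens = list(text)
    size = max(1, -(-len(tokens) // n_chunks))  # ceiling division
    jobs = [("#" if i == 0 else tokens[i - 1], tokens[i:i + size],
             tokens[i + size] if i + size < len(tokens) else "#")
            for i in range(0, len(tokens), size)]
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda job: reduce_chunk(*job), jobs)
        leftovers = [item for chunk in partial for item in chunk]
    result = reduce_chunk("#", leftovers, "#")  # final sequential pass
    if len(result) != 1 or not isinstance(result[0], NT):
        raise SyntaxError("input is not a single expression")
    return result[0].tree


print(parse_parallel("n+n*n", n_chunks=2))
```

Threads are used here only to show that the chunk jobs share no state; in CPython this sketch should not be expected to run faster than a sequential parse. The point is that the resulting tree does not depend on where the input is cut.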
Abstract. The increasing use of multicore processors has deeply transformed computing paradigms and applications. The wide availability of multicore systems has also had an impact on the field of compiler technology, although research on deterministic parsing has not proved effective in exploiting the architectural advantages, the main impediment being the inherently sequential nature of traditional LL and LR algorithms. We present PAPAGENO, an automated parser generator relying on operator precedence grammars. We complemented the PAPAGENO-generated parallel parsers with parallel lexing techniques, obtaining near-linear speedups on multicore machines, and the same speed as Bison parsers in sequential execution.
Keywords: Parser generation, Parallel Parsing, Operator Precedence Grammars
Introduction. Parsing, or syntactic analysis, plays a fundamental role in a wide variety of computing applications, from compilation to browsing of structured and semi-structured data, from natural language processing to genomics, etc. In recent years all these fields have experienced increasingly demanding requirements in terms of time and energy consumption or size of the data sets to be processed, which called for new effective parsing solutions. Some attempts have been made to devise new parsing algorithms, or to obtain relevant speedups from the classical deterministic ones, by leveraging the computing capability offered by modern multiprocessor architectures, but they had almost no success except for a few overly specific cases (e.g.,
for ad-hoc parsers for XML and HTML).
The classical parsing algorithms used for deterministic context-free (DCF) languages, such as LR and LL, can in fact be efficiently implemented (in linear time) on serial machines, but they do not speed up on multicore architectures because of their inherently left-to-right sequential nature: if an input string is split into several parts, handled by different processors, the parsing actions may require communication among the different processing nodes, with considerable additional overhead. Although this paper is not the place for a comprehensive survey, we point out the works of Mickunas and Schell [1] and the more recent ones of [2] as examples of such issues.
Recently we focused on a subclass of DCF languages, the operator precedence languages (OPLs), and their grammars (operator precedence grammars, OPGs), which were defined by Robert Floyd a few decades ago [3] and represent a precursor of LR languages. OPLs have some limits in terms of expressive power and were soon overtaken by parsing techniques based on the more expressive LR family; still, OPGs are adequate for many common programming languages [4]. The remarkable, and until now unnoticed, aspect of OPLs is that, unlike the larger class of DCF languages, they enjoy a property of local parsability, which makes them suitable for efficient parallel parsing. Local parsability means that parsing of any substring of a string according to an OPG depends only on information that can be obtained from a ...
Abstract. Operator Precedence Grammars (OPGs) define a deterministic class of context-free languages, which extend input-driven languages and still enjoy many properties: they are closed w.r.t. Boolean operations, concatenation and Kleene star; the emptiness problem is decidable; they are recognized by a suitable model of pushdown automaton; they can be characterized in terms of a monadic second-order logic. Also, they admit efficient parallel parsing. In this paper we introduce a subclass of OPGs, namely Free Grammars (FrGs); we prove some of its basic properties, and that, for each such grammar G, a first-order logic formula ψ can effectively be built so that L(G) is the set of all and only the strings satisfying ψ. FrGs were originally introduced for grammatical inference of programming languages. Our result can naturally boost their applicability; to this end, a tool is made freely available for the semiautomatic construction of FrGs.
Abstract. Recent literature extended the analysis of ω-languages from the regular ones to various classes of languages with "visible syntax structure", such as visibly pushdown languages (VPLs). Operator precedence languages (OPLs), instead, were originally defined to support deterministic parsing and exhibit interesting relations with these classes of languages: OPLs strictly include VPLs, enjoy all relevant closure properties and have been characterized by a suitable automata family and a logic notation. We introduce here operator precedence ω-languages (ωOPLs), investigating various acceptance criteria and their closure properties. Whereas some properties are natural extensions of those holding for regular languages, others require novel investigation techniques. Application-oriented examples show the gain in expressiveness and verifiability offered by ωOPLs w.r.t. smaller classes.