Packrat parsing is a linear-time implementation method of recursive descent parsers. The trick is a memoization mechanism, where all parsing results are memorized to avoid redundant parsing in cases of backtracking. An arising problem is extremely huge heap consumption in memoization, resulting in the fact that the cost of memoization is likely to outweigh its benefits. In many cases, developers need to make a difficult choice to abandon packrat parsing despite the possible exponential time parsing. Elastic packrat parsing is developed in order to avoid such a difficult choice. The heap consumption is upper-bounded since memorized results are stored on a sliding window buffer. In addition, the buffer capacity is adjusted by tracing each of nonterminal backtracking activities at runtime. Elastic packrat parsing is implemented in a part of our Nez parser. We demonstrate that the elastic packrat parsing achieves stable and robust performance against a variety of inputs with different backtracking activities.
Nez is a PEG(Parsing Expressing Grammar)-based open grammar language that allows us to describe complex syntax constructs without action code. Since open grammars are declarative and free from a host programming language of parsers, software engineering tools and other parser applications can reuse once-defined grammars across programming languages.A key challenge to achieve practical open grammars is the expressiveness of syntax constructs and the resulting parser performance, as the traditional action code approach has provided very pragmatic solutions to these two issues. In Nez, we extend the symbol-based state management to recognize context-sensitive language syntax, which often appears in major programming languages. In addition, the Abstract Syntax Tree constructor allows us to make flexible tree structures, including the left-associative pair of trees. Due to these extensions, we have demonstrated that Nez can parse not all but many grammars.Nez can generate various types of parsers since all Nez operations are independent of a specific parser language. To highlight this feature, we have implemented Nez with dynamic parsing, which allows users to integrate a Nez parser as a parser library that loads a grammar at runtime. To achieve its practical performance, Nez operators are assembled into low-level virtual machine instructions, including automated state modifications when backtracking, transactional controls of AST construction, and efficient memoization in packrat parsing. We demonstrate that Nez dynamic parsers achieve very competitive performance compared to existing efficient parser generators.
Abstract:We address a declarative construction of abstract syntax trees with Parsing Expression Grammars. AST operators (constructor, connector, and tagging) are newly defined to specify flexible AST constructions. A new challenge coming with PEGs is the consistency management of ASTs in backtracking and packrat parsing. We make the transaction AST machine in order to perform AST operations in the context of the speculative parsing of PEGs. All the consistency control is automated by the analysis of AST operators. The proposed approach is implemented in the Nez parser, written in Java. The performance study shows that the transactional AST machine requires 25% approximately more time in CSV, XML, and C grammars.
Parsing Expression Grammars are a popular foundation for describing syntax. Unfortunately, several syntax of programming languages are still hard to recognize with pure PEGs. Notorious cases appears: typedef-defined names in C/C++, indentation-based code layout in Python, and HERE document in many scripting languages. To recognize such PEG-hard syntax, we have addressed a declarative extension to PEGs. The "declarative" extension means no programmed semantic actions, which are traditionally used to realize the extended parsing behavior. Nez is our extended PEG language, including symbol tables and conditional parsing. This paper demonstrates that the use of Nez Extensions can realize many practical programming languages, such as C, C#, Ruby, and Python, which involve PEG-hard syntax.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.