Adaptive Pattern Matching on Binary Data

Gustafsson, Per E.; Sagonas, Konstantinos

doi:10.1007/978-3-540-24725-8_10

Cited by 6 publications

(4 citation statements)

References 11 publications


“…Many of its commands are directly inspired by the design of domain-specific languages and language extensions such as PADS [10,11,21], DATASCRIPT [4], PACKETTYPES [23], Demeter [20], BINPAC [27] and Erlang binaries [32,15].…”

Section: Related Work (mentioning)

confidence: 99%

A context-free markup language for semi-structured text

Xi, Qian; Walker, David

2010

SIGPLAN Not.


An ad hoc data format is any non-standard, semi-structured data format for which robust data processing tools are not available. In this paper, we present ANNE, a new kind of mark-up language designed to help users generate documentation and data processing tools for ad hoc text data. More specifically, given a new ad hoc data source, an ANNE programmer will edit the document to add a number of simple annotations, which serve to specify its syntactic structure. Annotations include elements that specify constants, optional data, alternatives, enumerations, sequences, tabular data, and recursive patterns. The ANNE system uses a combination of user annotations and the raw data itself to extract a context-free grammar from the document. This context-free grammar can then be used to parse the data and transform it into an XML parse tree, which may be viewed through a browser for analysis or debugging purposes. In addition, the ANNE system will generate a PADS/ML description [21], which may be saved as lasting documentation of the data format or compiled into a host of useful data processing tools ranging from parsers, printers and traversal libraries to format translators and query engines. Overall, ANNE simplifies the process of generating descriptions for data formats and improves the productivity of programmers who work with ad hoc data regularly.

In addition to designing and implementing ANNE, we have devised a semantic theory for the core elements of the language. This semantic theory describes the editing process, which translates a raw, unannotated text document into an annotated document, and the grammar extraction process, which generates a context-free grammar from an annotated document. We also present an alternative characterization of system behavior by drawing upon ideas from the field of relevance logic. This secondary characterization, which we call relevance analysis, specifies a direct relationship between unannotated documents and the context-free grammars that our system can generate from them. Relevance analysis allows us to prove a number of important theorems concerning the expressiveness and utility of our system.


“…DATASCRIPT has been used to manipulate Java jar files and ELF object files. The developers of Erlang have also introduced language extensions that they refer to as binaries [Wikström and Rogvall 1999; Gustafsson and Sagonas 2004] to aid in packet processing and protocol programming. Finally, we are part of a group developing PADS, another system for managing ad hoc data.…”

Section: Promising Solutions (mentioning)

confidence: 99%

The next 700 data description languages

Fisher, Kathleen; Mandelbaum, Yitzhak; Walker, David

2006

SIGPLAN Not.


XML. HTML. CSV. JPEG. MPEG. These data formats represent vast quantities of industrial, governmental, scientific, and private data. Because they have been standardized and are widely used, many reliable, efficient, and convenient tools for processing data in these formats are readily available. For instance, your favorite programming language undoubtedly has libraries for parsing XML and HTML as well as reading and transforming images in JPEG or movies in MPEG. Query engines are available for querying XML documents. Widely-used applications like Microsoft Word and Excel automatically translate documents between HTML and other standard formats. In short, life is good when working with standard data formats. In an ideal world, all data would be in such formats. In reality, however, we are not nearly so fortunate. An ad hoc data format is any non-standard data format. Typically, such formats do not have parsing, querying, analysis, or transformation tools readily available. Every day, network administrators, financial analysts, computer scientists, biologists, chemists, astronomers, and physicists deal with ad hoc data in a myriad of complex formats. Figure 1 gives a partial sense of the range and pervasiveness of such data. Since off-the-shelf tools for processing these ad hoc data formats do not exist or are not readily available, talented scientists, data analysts, and programmers must waste their time on low-level chores like parsing and format translation to extract the valuable information they need from their data.


“…Some of the different forms of type specifiers are shown in Table 1 together with a brief description of their use; they are explained in detail below. The specifiers for signedness and endianness are not described in this paper, but a description of these specifiers can be found in [1]. If all type specifiers are used, the syntax of each segment expression is:…”

Section: Segments Each Segment Expression Has the General Syntax (mentioning)

confidence: 99%

“…[{X,Y} || X <- [1,2,3], Y <- [4,5], is_odd(X)] produces the list of pairs: [{1,4},{1,5},{3,4},{3,5}]. There is nothing wrong with multiple generators, but our experience is that they are rarely used in practice.…”

Section: Extended Comprehensions With Multiple Generators (mentioning)

confidence: 99%

Bit-level binaries and generalized comprehensions in Erlang

Gustafsson; Sagonas

2005

Proceedings of the 2005 ACM SIGPLAN Workshop on Erlang

Self Cite


Binary (i.e., bit stream) data are omnipresent in computer and network applications but most functional programming languages currently do not provide sufficient support for them. Erlang is an exception since it does support direct manipulation of binary data, albeit currently restricted to byte streams, not bit streams. To ameliorate the situation, we extend Erlang's built-in binary datatype so that it becomes flexible enough to handle bit streams properly. To further simplify programming on bit streams we then show how binary comprehensions can be introduced in the language and how binary and list comprehensions can be extended to allow both binary and list generators.
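The multi-generator comprehension quoted in the citation statement above can be checked with a rough Python analogue. This is only a sketch: the Erlang tuples become Python tuples, and the `is_odd` helper is an assumed stand-in for the guard in the quoted snippet, not something defined on this page.

```python
def is_odd(n: int) -> bool:
    """Assumed stand-in for the odd-number guard in the quoted Erlang snippet."""
    return n % 2 == 1


# Two generators (x, then y) and one filter, evaluated left to right,
# mirroring: [{X,Y} || X <- [1,2,3], Y <- [4,5], <odd test on X>]
pairs = [(x, y) for x in [1, 2, 3] for y in [4, 5] if is_odd(x)]

print(pairs)  # [(1, 4), (1, 5), (3, 4), (3, 5)]
```

The result matches the list of pairs given in the statement: every odd `x` is paired with every `y`, and `x = 2` is filtered out.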
