Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Keywords: text processing, structured text, pattern matching, regular expressions, grammars, web automation, repetitive text editing, machine learning, programming-by-demonstration, simultaneous editing, outlier finding, LAPIS
AbstractPattern matching is heavily used for searching, filtering, and transforming text, but existing pattern languages offer few opportunities for reuse. Lightweight structure is a new approach that solves the reuse problem. Lightweight structure has three parts: a model of text structure as contiguous segments of text, or regions; an extensible library of structure abstractions (e.g., HTML elements, Java expressions, or English sentences) that can be implemented by any kind of pattern or parser; and a region algebra for composing and reusing structure abstractions. Lightweight structure does for text pattern matching what procedure abstraction does for programming, enabling construction of a reusable library.Lightweight structure has been implemented in LAPIS, a web browser/text editor that demonstrates several novel techniques:• Text constraints is a new pattern language for composing structure abstractions, based on the region algebra. Text constraint patterns are simple and high-level, and user studies have shown that users can generate and comprehend them.