A model is presented for converting a collection of documents to hypertext by means of indexing. The documents are assumed to be semistructured, i.e., their text is a hierarchy of parts, and some of the parts consist of natural language. The model is intended as a framework for specifying hypertextual reading capabilities for specific application areas and for developing new automated tools for the conversion of semistructured text to hypertext. In the model, two well-known paradigms—formal grammars and document indexing—are combined.
The structure of the source text is defined by a schema that is a constrained context-free grammar. The hierarchical structure of the source may thus be modeled by a parse tree for the grammar. The effect of indexing is described by grammar transformations. The resulting grammar, called an indexing schema, is associated with a new parse tree in which some text parts are index elements. The indexing schema may hide some parts of the original documents, or the internal structure of some parts. For information retrieval, parts of the indexed text are treated as nodes of a hypergraph. Hypergraph-based information access combines the navigation capabilities of hypertext systems with the querying capabilities of information retrieval systems.
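The pipeline described above (schema, grammar transformation, indexed parse tree, hypergraph) can be illustrated with a toy sketch. This is not the formalism of the model itself: the schema, the tuple-based tree encoding, the names `indexing_schema`, `index_tree`, and `hyperedges`, and indexing by simple vocabulary matching are all assumptions made here for illustration.

```python
from collections import defaultdict

# Hypothetical schema: one production per nonterminal.
# "TEXT" marks a part that consists of natural language.
SCHEMA = {
    "document": ["title", "section"],
    "title": ["TEXT"],
    "section": ["TEXT"],
}

def indexing_schema(schema):
    """Grammar transformation (assumed form): each TEXT part gains an INDEX element."""
    return {lhs: rhs + ["INDEX"] if rhs == ["TEXT"] else list(rhs)
            for lhs, rhs in schema.items()}

# Parse tree for one document under SCHEMA:
# (label, text) for leaves, (label, [children]) for inner nodes.
TREE = ("document", [
    ("title", "Hypertext by indexing"),
    ("section", "Indexing turns text parts into hypertext nodes"),
])

def index_tree(tree, vocabulary):
    """Build the new parse tree: leaves become (label, text, index_terms)."""
    label, content = tree
    if isinstance(content, str):
        words = {w.lower() for w in content.split()}
        return (label, content, sorted(words & vocabulary))
    return (label, [index_tree(child, vocabulary) for child in content])

def hyperedges(tree, path=()):
    """Map each index term to the set of parts (identified by tree path) it indexes.

    Each term acts as one hyperedge over the text parts, which are the nodes.
    Paths are built from labels only, so repeated sibling labels would collide.
    """
    edges = defaultdict(set)
    here = path + (tree[0],)
    if len(tree) == 3:  # indexed leaf
        for term in tree[2]:
            edges[term].add("/".join(here))
    else:               # inner node
        for child in tree[1]:
            for term, nodes in hyperedges(child, here).items():
                edges[term] |= nodes
    return edges

if __name__ == "__main__":
    indexed = index_tree(TREE, {"hypertext", "indexing", "nodes"})
    for term, parts in sorted(hyperedges(indexed).items()):
        print(term, sorted(parts))
```

In this sketch, following a hyperedge from one part to the others that share its term corresponds to navigation, while looking up a term directly corresponds to querying.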