This paper presents a theory of the syntactic aspects of human sentence production. An important characteristic of unprepared speech is that overt pronunciation of a sentence can be initiated before the speaker has completely worked out the meaning content he or she is going to express in that sentence. Apparently, the speaker is able to build up a syntactically coherent utterance out of a series of syntactic fragments each rendering a new part of the meaning content. This incremental, left‐to‐right mode of sentence production is the central capability of the proposed Incremental Procedural Grammar (IPG). Certain other properties of spontaneous speech, as derivable from speech errors, hesitations, self‐repairs, and language pathology, are accounted for as well. The psychological plausibility thus gained by the grammar appears compatible with a satisfactory level of linguistic plausibility in that sentences receive structural descriptions which are in line with current theories of grammar. More importantly, an explanation for the existence of configurational conditions on transformations and other linguistics rules is proposed. The basic design feature of IPG which gives rise to these psychologically and linguistically desirable properties, is the “Procedures + Stack” concept. Sentences are built not by a central constructing agency which overlooks the whole process but by a team of syntactic procedures (modules) which work—in parallel—on small parts of the sentence, have only a limited overview, and whose sole communication channel is a stack. IPG covers object complement constructions, interrogatives, and word order in main and subordinate clauses. It handles unbounded dependencies, crossserial dependencies and coordination phenomena such as gapping and conjunction reduction. It is also capable of generating self‐repairs and elliptical answers to questions. IPG has been implemented as an incremental Dutch sentence generator written in LISP.
When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents are far fewer than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
In most intent recognition studies, annotations of query intent are created post hoc by external assessors who are not the searchers themselves. It is important for the field to get a better understanding of the quality of this process as an approximation for determining the searcher's actual intent. Some studies have investigated the reliability of the query intent annotation process by measuring the interassessor agreement. However, these studies did not measure the validity of the judgments, that is, to what extent the annotations match the searcher's actual intent. In this study, we asked both the searchers themselves and external assessors to classify queries using the same intent classification scheme. We show that of the seven dimensions in our intent classification scheme, four can reliably be used for query annotation. Of these four, only the annotations on the topic and spatial sensitivity dimension are valid when compared with the searcher's annotations. The difference between the interassessor agreement and the assessor‐searcher agreement was significant on all dimensions, showing that the agreement between external assessors is not a good estimator of the validity of the intent classifications. Therefore, we encourage the research community to consider using query intent classifications by the searchers themselves as test data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.