Human interaction research has always been inventive in its use of the latest technology. Even 50 years ago, Bales (1951) adopted one-way mirrors to observe his groups and then had to design motorized paper scrolls so that his observers could keep up during live scoring. Since then, signal recording technologies have advanced significantly: video cameras are portable, microphones can be arranged to pick up individual subjects even without the use of wires, and multiple signals can be synchronized using a mixing desk. Not only that, but now that every garage band makes music videos, these technologies are so cheap that researchers can focus less on cost and more on what would make for their ideal data capture. With these advances in signal recording come new ideas about what sort of data to collect and how to use them.

One research area that can benefit greatly from better signal recording is the study of how people use language. When people communicate, gestures, postural shifts, facial expressions, backchannel continuers such as "mmhmm," and spoken turns from the subjects all work in concert to bring about mutual understanding (Goodwin, 1981). Apart from the scientific good of understanding how this process works, information about it is in demand for applications ranging from the documentation of endangered languages to animation for computer games. Observational analysis packages can help us determine some things about the timing, frequency, and sequencing of communicative behaviors, but that is not enough. In language data, behaviors are related less by their timing than by their structure: pronouns have discourse referents, answers relate to questions, and deictic instances of the word "that" are resolved by pointing gestures that themselves relate to real-world objects, but with no guarantees about when the related behavior will occur.
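The contrast between temporal and structural relatedness can be made concrete in a small sketch. The types and fields below are invented for illustration and do not come from any of the tools discussed here:

```python
from dataclasses import dataclass, field

@dataclass
class Behavior:
    """One observed communicative behavior (word, gesture, nod, ...)."""
    kind: str            # e.g. "word", "pointing-gesture"
    content: str
    start: float         # seconds into the recording
    end: float
    # Structural links to related behaviors, with no timing guarantee:
    relates_to: list["Behavior"] = field(default_factory=list)

# A deictic "that" resolved by a pointing gesture that happens later:
point = Behavior("pointing-gesture", "points at the red box", 12.8, 13.4)
that = Behavior("word", "that", 11.9, 12.1, relates_to=[point])

# The two behaviors do not overlap in time...
temporally_related = that.end >= point.start and point.end >= that.start
# ...but the structural link still recovers the connection.
structurally_related = point in that.relates_to
print(temporally_related, structurally_related)  # False True
```

A purely time-based coding tool would miss the link between the word and the gesture here; a structural annotation keeps it regardless of when each behavior occurs.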
Linguistic analysis reveals this structure, but current tools support only specific codes and structures, and only allow them to be imposed over the top of a textual transcription. This approach discards temporal information and makes it difficult to describe behaviors from different subjects that happen at the same time.

We gratefully acknowledge support of the NITE project by the European Commission's Human Language Technologies Programme. The samples described in the paper use data kindly provided either to us personally or to the community at large by the SmartKom project (http://smartkom.dfki.de/), by ISIP's Switchboard project (http://www.isip.msstate.edu/projects/switchboard/), and by the University of Edinburgh's Human Communication Research Centre (http://www.hcrc.ed.ac.uk/). The software described in this paper is available for download from http://www.ltg.ed.ac.uk/NITE. Correspondence concerning this article should be addressed to J. Carletta, Human Communication Research Centre and Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland (e-mail: j.carletta@edinburgh.ac.uk).

Multimodal corpora that show humans interacting ...
The NITE XML Toolkit (NXT) is open source software for working with language corpora, with particular strengths for multimodal and heavily cross-annotated data sets. In NXT, annotations are described by types and attribute-value pairs, and can relate to signal via start and end times, to representations of the external environment, and to each other via either an arbitrary graph structure or a multirooted tree structure characterized by both temporal and structural orderings. Simple queries in NXT express variable bindings for n-tuples of objects, optionally constrained by type, and give a set of conditions on the n-tuples combined with Boolean operators. The defined operators for the condition tests allow full access to the timing and structural properties of the data model. A complex query facility passes variable bindings from one query to another for filtering, returning a tree structure. In addition to describing NXT's core data handling and search capabilities, we explain the stand-off XML data storage format that it employs and illustrate its use with examples from an early adopter of the technology.
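The idea of typed n-tuple queries with conditions over timing and attributes can be sketched in a few lines of Python. This is a deliberately simplified model loosely inspired by the description above; the function names and data layout are invented and are not NXT's actual API or query language:

```python
from itertools import product

# Annotations carry a type, attribute-value pairs, and start/end times.
annotations = [
    {"type": "word", "orth": "that", "start": 11.9, "end": 12.1},
    {"type": "word", "orth": "box", "start": 12.3, "end": 12.6},
    {"type": "gesture", "form": "point", "start": 11.8, "end": 12.4},
]

def query(types, condition):
    """Bind one annotation per requested type and keep the tuples that
    satisfy the condition; conditions can test timing and attributes."""
    pools = [[a for a in annotations if a["type"] == t] for t in types]
    return [tup for tup in product(*pools) if condition(*tup)]

# All (word, gesture) pairs that overlap in time:
overlapping = query(
    ["word", "gesture"],
    lambda w, g: w["start"] < g["end"] and g["start"] < w["end"],
)
print([(w["orth"], g["form"]) for w, g in overlapping])
# [('that', 'point'), ('box', 'point')]
```

A real query facility additionally supports structural operators (dominance, precedence) over the tree and graph relations, and chains queries by passing bindings from one to the next, as the abstract describes.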
This paper deals with computational linguistic tools and methods for the extraction of raw material for terminological glossaries from machine-readable text. We concentrate on monolingual German term candidates, and only briefly hint at tools and procedures for the creation of bilingual glossaries. Most of the examples we use to illustrate methods and results of our work come from technical texts provided by the translation services of DaimlerChrysler AG and from legal texts made available by the European Academy in Bozen, Südtirol. The Academy is working on translations of legal documents for bilingual South Tyrol, and, in this context, on the creation, upgrading, and maintenance of terminological resources.
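One common heuristic behind term-candidate extraction is that domain terms recur as fixed multiword units. The toy sketch below counts recurring word bigrams after dropping a few function words; it illustrates the general idea only, and is not the method used in the paper (real systems filter by part-of-speech patterns and linguistic criteria):

```python
from collections import Counter
import re

# Invented stopword list for illustration; a real system would use a
# full function-word lexicon and POS-based term patterns.
STOP = {"the", "a", "of", "is", "to", "and", "in"}

def candidate_bigrams(text):
    """Count adjacent word pairs that contain no stopword."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-zÄÖÜäöüß-]+", text)]
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(p) for p in pairs if not (set(p) & STOP))

text = ("The brake disc must be replaced when the brake disc shows "
        "wear. Check the brake disc thickness.")
print(candidate_bigrams(text).most_common(1))  # [('brake disc', 3)]
```

Frequency alone over-generates, which is why such counts are treated as raw material for a glossary, to be validated by a terminologist, rather than as a finished resource.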
Electronic dictionaries should support their users by giving them guidance in text production and text reception, alongside a user-definable selection of lexicographic data for cognitive purposes. In this article, we sketch the principles of an interactive and dynamic electronic dictionary aimed at text production and text reception that guides users in innovative ways, especially with respect to difficult, complicated, or confusing issues. The lexicographer has to analyze the nature of the possible problems very carefully in order to suggest an optimal solution for a specific problem. We are of the opinion that there are numerous complex situations in which users need more detailed support than is currently available in e-dictionaries in order to make valid and correct choices. For highly complex situations, we suggest guidance through a decision-tree-like device. We assume that the solutions proposed here are not specific to one language only but can, after careful analysis, be applied to e-dictionaries in different languages across the world.
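A decision-tree-like guidance device can be pictured as a nested structure of questions whose answers narrow down what the dictionary should display. The questions and advice strings below are invented for illustration and are not taken from the article:

```python
# Hypothetical guidance tree: each node is either a question with
# answer branches, or a leaf describing what to show the user.
TREE = {
    "question": "Are you writing (production) or reading (reception)?",
    "answers": {
        "production": {
            "question": "Unsure about spelling or about word choice?",
            "answers": {
                "spelling": "Show orthographic data only.",
                "word choice": "Show sense distinctions with examples.",
            },
        },
        "reception": "Show a short gloss and translation equivalents.",
    },
}

def guide(node, choices):
    """Walk the tree with a sequence of user choices; return the
    advice found at the leaf."""
    for choice in choices:
        node = node["answers"][choice]
    return node

print(guide(TREE, ["production", "word choice"]))
# Show sense distinctions with examples.
```

The point of such a device is that the user answers a few simple questions instead of confronting a full dictionary article, and the dictionary adapts its output to the situation of use.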
Some recent dictionaries include corpus lines, links to concordances or to internet pages, or other links to dictionary-external data. Lexicographers present such external data to the user either to complement their lexicographic descriptions, or even instead of such lexicographically processed material, when the latter would be redundant alongside an authoritative source or when it is simply not available. In this article, we critically review some dictionaries that offer such devices, and we make an attempt at a classification of the ways in which dictionary-internal data and dictionary-external material are related. On this basis, we come up with some proposals both for future research on the topic and for future lexicographic realizations.
We present a proposal for the structuring of collocation knowledge in the lexicon of a multilingual generation system and show to what extent it can be used in the process of lexical selection. This proposal is part of Polygloss, a new research project on multilingual generation, and it has been inspired by work carried out in the SEMSYN project (see, e.g., [RÖSNER 1988]). The descriptive approach presented in this proposal is based on a combination of results from recent lexicographical research and the application of Meaning-Text Theory (MTT) (see, e.g., [MEL'CUK et al. 1981], [MEL'CUK et al. 1984]). We first outline the overall structure of the dictionary system that is needed by a multilingual generator; Section 2 gives an overview of the results of lexicographical work on collocations and compares them with "lexical functions" as used in Meaning-Text Theory. Section 3 shows how we intend to integrate collocations in the generation dictionary and how "lexical functions" can be used in generation.

We use the term "collocation" in the sense of [HAUSMANN 1985], referring to constraints on the cooccurrence of two lexemes: the two elements are not combined completely freely, but one of them semantically determines the other. Examples are, for instance, solve a problem, turn dark, and expose someone to a risk. For a more detailed definition see Section 2.

Research reported in this paper is supported by the German Bundesministerium für Forschung und Technologie (BMFT) under grant No. 08 B 3116 3. The views and conclusions contained herein are those of the authors and should not be interpreted as positions of the project as a whole.
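MTT's lexical functions can be read as a mapping from a function name and a base lexeme to the lexically determined collocate, which is exactly the kind of lookup a generator needs during lexical selection. The sketch below uses standard textbook examples of lexical functions; the table layout and helper are invented and do not reflect the Polygloss dictionary's actual structure:

```python
# Toy table of MTT-style lexical functions: (function, base) -> collocate.
LEXICAL_FUNCTIONS = {
    # Oper1: "light" support verb taken by the noun
    ("Oper1", "attention"): "pay",
    ("Oper1", "problem"): "have",
    # Real1: verb realizing the noun's inherent purpose
    ("Real1", "problem"): "solve",
    # Magn: intensifier
    ("Magn", "rain"): "heavy",
}

def collocate(function, base):
    """Select the lexically determined partner of `base`; return None
    when the dictionary has no entry for this function/base pair."""
    return LEXICAL_FUNCTIONS.get((function, base))

print(collocate("Real1", "problem"))  # solve
print(collocate("Magn", "rain"))      # heavy
```

Because the base semantically determines its partner, a generator that has chosen "problem" and needs to express its resolution can look up Real1 rather than choosing the verb freely — which is precisely why collocations constrain lexical selection.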