We describe an annotation scheme and a tool developed for creating linguistically annotated corpora for non-configurational languages. Since the requirements for such a formalism differ from those posited for configurational languages, several features have been added, influencing the architecture of the scheme. The resulting scheme reflects a stratificational notion of language, and makes only minimal assumptions about the interrelation of the particular representational strata.
This paper describes applications of stochastic and symbolic NLP methods to treebank annotation. In particular, we focus on (1) the automation of treebank annotation, (2) the comparison of conflicting annotations for the same sentence and (3) the automatic detection of inconsistencies. These techniques are currently employed for building a German treebank.
Abstract. We present a flexible rule compiler developed for a text-to-speech (TTS) system. The compiler converts a set of rules into a finite-state transducer (FST). The input and output of the FST are subject to parameterization, so that the system can be applied to strings and sequences of feature-structures. The resulting transducer is guaranteed to realize a function (as opposed to a relation), and therefore can be implemented as a deterministic device (either a deterministic FST or a bimachine).
Motivation
Implementations of TTS systems are often based on operations transforming one sequence of symbols or objects into another. Starting from the input string, the system creates a sequence of tokens which are subject to part-of-speech tagging, homograph disambiguation rules, lexical lookup and grapheme-to-phoneme conversion. The resulting phonetic transcriptions are also transformed by syllabification rules, post-lexical reductions, etc.
The character of the above transformations suggests finite-state transducers (FSTs) as a modelling framework (Mohri, 1997). However, applying them is not always straightforward, for two reasons.
Firstly, the transformations are more often expressed by rules than encoded directly in finite-state networks. In order to overcome this difficulty, we need an adequate compiler converting the rules into an FST.
Secondly, finite-state machines require a finite alphabet of symbols, while it is often more adequate to encode linguistic information using structured representations (e.g. feature structures), the inventory of which might be potentially infinite. Thus, the compilation method must be able to reduce the infinite set of feature structures to a finite FST input alphabet.
In this paper, we show how these two problems have been solved in rVoice, a speech synthesis system developed at Rhetorical Systems.
Definitions and Notation
A deterministic finite-state automaton (acceptor, DFSA) over a finite alphabet Σ is a quintuple A = (Σ, Q, q₀, δ, F) such that:
Q is a finite set of states, and q₀ ∈ Q is the initial state of A;
δ : Q × Σ → Q is the transition function of A;
F ⊂ Q is a non-empty set of final states.
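The quintuple definition above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than code from rVoice: the class name `DFSA`, its field names, the `accepts` method and the example automaton (strings over {a, b} ending in "ab") are all hypothetical. The transition function δ is stored as a dictionary and checked for totality, as the definition requires.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class DFSA:
    """Hypothetical container for the quintuple (Sigma, Q, q0, delta, F)."""
    alphabet: FrozenSet[str]            # Sigma: finite alphabet
    states: FrozenSet[int]              # Q: finite set of states
    initial: int                        # q0: initial state, must be in Q
    delta: Dict[Tuple[int, str], int]   # delta: Q x Sigma -> Q
    finals: FrozenSet[int]              # F: non-empty set of final states

    def __post_init__(self) -> None:
        # Validate the conditions stated in the definition.
        if self.initial not in self.states:
            raise ValueError("initial state must belong to Q")
        if not self.finals or not self.finals <= self.states:
            raise ValueError("F must be a non-empty subset of Q")
        for q in self.states:
            for a in self.alphabet:
                # delta must be defined on every (state, symbol) pair
                # and must map back into Q.
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta is not total at ({q!r}, {a!r})")

    def accepts(self, word: Iterable[str]) -> bool:
        """Run the automaton on `word`; accept iff it ends in a final state."""
        state = self.initial
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state in self.finals


# Assumed example: strings over {a, b} that end in "ab".
ends_in_ab = DFSA(
    alphabet=frozenset({"a", "b"}),
    states=frozenset({0, 1, 2}),
    initial=0,
    delta={
        (0, "a"): 1, (0, "b"): 0,
        (1, "a"): 1, (1, "b"): 2,
        (2, "a"): 1, (2, "b"): 0,
    },
    finals=frozenset({2}),
)
```

With this sketch, `ends_in_ab.accepts("aab")` follows the states 0, 1, 1, 2 and stops in a final state, whereas `ends_in_ab.accepts("aba")` stops in state 1 and is rejected. Because δ is a total function, each input determines exactly one run.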
This paper describes a novel method of compiling ranked tagging rules into a deterministic finite-state device called a bimachine. The rules are formulated in the framework of regular rewrite operations and allow unrestricted regular expressions in both left and right rule contexts. The compiler is illustrated by an application within a speech synthesis system.