2016
DOI: 10.1162/tacl_a_00114

Minimally Supervised Number Normalization

Abstract: We propose two models for verbalizing numbers, a key component in speech recognition and synthesis systems. The first model uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed with a minimal amount of training data. While both models achieve near-perfect performance, the latter model can be trained using several orders of magnitude less data than the former, making it particularly useful for low-resource languages.

Cited by 19 publications (23 citation statements). References 17 publications.

“…Silly errors: This category consists of those "bizarre" errors which defy any purely linguistic characterization. In addition to the aforementioned case of *membled, such errors have also been reported for other language generation tasks such as machine translation (Arthur et al. 2016) and text normalization (Gorman and Sproat 2016, Sproat and Jaitly 2017, Zhang et al. 2019).…”
Section: Error Taxonomy
confidence: 69%
“…Our general process of minimally supervised number names induction is described in [4]. In essence, we use a set of training data consisting of digits mapped to their verbalization (like 123 → one hundred twenty three) to induce a finite state transducer (FST) which can produce the factorization for any number.…”
Section: Number Names Induction
confidence: 99%
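
The mapping the quotation describes can be illustrated concretely. Below is a minimal sketch, assuming pynini (a Python wrapper around OpenFst); the grammar is hand-written for a toy range of numbers and is not the induction algorithm of [4], which learns the factorization automatically from pairs like 123 → one hundred twenty three. All names in the sketch are illustrative.

# A minimal hand-written toy verbalizer FST, assuming pynini.
# NOT the induction algorithm of [4]: the paper learns such grammars
# from digit-to-verbalization training pairs; here the factorization
# (e.g., 123 = 1 hundred + 2 tens + 3) is spelled out by hand.
import pynini

# Single digits to their names, e.g. "3" -> "three".
units = pynini.union(*(
    pynini.cross(str(i), word) for i, word in enumerate(
        ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"])))

# A deliberately tiny tens inventory: just twenty and thirty.
tens = pynini.union(pynini.cross("2", "twenty"), pynini.cross("3", "thirty"))

# "23" -> "twenty three": a tens digit, an inserted space, a units digit.
two_digit = tens + pynini.cross("", " ") + units

# "123" -> "one hundred twenty three".
hundreds = units + pynini.cross("", " hundred ") + two_digit

verbalizer = pynini.union(units, two_digit, hundreds).optimize()

def verbalize(number: str) -> str:
    """Verbalizes a digit string by composing it against the grammar."""
    lattice = pynini.accep(number) @ verbalizer
    return pynini.shortestpath(lattice).string()

print(verbalize("123"))  # one hundred twenty three

What the induction algorithm adds over this sketch is precisely the part written by hand here: it recovers the arithmetic factorization of each number from the aligned training pairs rather than requiring a grammar writer to encode it.
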
“…Verbalizers for most semiotic classes depend on underlying core number names grammars specifying the verbalization of numbers like English one, two, three. We first describe how we modify the induction algorithm in [4] to build these number names grammars across a wider range of languages. We then describe a system which builds on this algorithm to induce verbalization grammars for ASR and TTS systems alike.…”
Section: Introduction
confidence: 99%
“…One solution to this problem is to use covering grammars, (usually) finite-state models that can constrain the neural models to a reasonable (context-independent) space of options so that 2mA could be read as two milliamperes or two m a, but not two million liters [3]. These covering grammars can be learned in whole or in part from data [7].…”
Section: Introduction
confidence: 99%
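
The covering-grammar constraint in the quotation above can likewise be sketched in pynini. The two permitted readings of 2mA come from the quotation; the helper function and its emptiness test are illustrative assumptions of mine, not the API of the cited systems.

# A minimal sketch of a covering grammar, assuming pynini. The FST
# enumerates the permissible readings of "2mA"; a neural hypothesis
# is licensed only if it lies inside that space.
import pynini

cover = pynini.union(
    pynini.cross("2mA", "two milliamperes"),
    pynini.cross("2mA", "two m a"),
).optimize()

def permitted(token: str, hypothesis: str) -> bool:
    """True iff the covering grammar licenses this (token, reading) pair."""
    # Compose the token on the input side and the hypothesis on the
    # output side; a nonempty lattice means the pair is allowed.
    lattice = pynini.accep(token) @ cover @ pynini.accep(hypothesis)
    return lattice.num_states() > 0

print(permitted("2mA", "two milliamperes"))    # True
print(permitted("2mA", "two million liters"))  # False

In a full system the grammar would constrain the neural model's options during decoding rather than being checked post hoc, but the membership test above captures the space of readings the quotation describes.
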