Stochastic Contextual Edit Distance and Probabilistic FSTs

Cotterell, Ryan; Peng, Nanyun; Eisner, Jason

doi:10.3115/v1/p14-2102

Cited by 30 publications

(44 citation statements)

References 16 publications


“…Most downstream NLP systems have simply employed a static edit distance module to decide whether two names can be matched (Chen et al, 2010; Cassidy et al, 2011; Martschat et al, 2012). An exception is work on training finite state transducers for edit distance metrics (Ristad and Yianilos, 1998; Bouchard-Côté et al, 2008; Dreyer et al, 2008; Cotterell et al, 2014). More recently, presented a phylogenetic model of string variation using transducers that applies to pairs of names string (supervised) and unpaired collections (unsupervised).…”

Section: Name Matching Methods (mentioning)

confidence: 99%

“…Transducers are common choices for learning edit distance metrics for strings, and they perform better than string similarity (Ristad and Yianilos, 1998; Cotterell et al, 2014). We use the probabilistic transducer of Cotterell et al (2014) to learn a stochastic edit distance. The model represents the conditional probability p(y|x; θ), where y is a generated string based on editing x according to parameters θ.…”

Section: Methods (mentioning)

confidence: 99%
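
To make the conditional probability p(y|x; θ) in the statement above concrete, here is a minimal Python sketch of a memoryless stochastic edit model. It is a deliberate simplification and not the contextual model of Cotterell et al (2014): edit probabilities here ignore all context. The parameter names (p_ins, p_copy, p_sub, p_del) and the function names are hypothetical, chosen only for illustration. At each step the model either inserts a uniformly chosen character (probability p_ins) or consumes the next input character by copying, substituting, or deleting it; once the input is exhausted, the non-insert choice halts. The forward dynamic program sums over all edit sequences that turn x into y.

```python
from itertools import product


def cond_prob(x, y, alphabet, p_ins=0.1, p_copy=0.8, p_sub=0.1, p_del=0.1):
    """Return p(y | x) under an illustrative memoryless edit model.

    Assumptions (not from the cited paper): all characters of x and y are in
    `alphabet`, which has at least two symbols; p_copy + p_sub + p_del == 1.
    """
    assert abs(p_copy + p_sub + p_del - 1.0) < 1e-9 and 0.0 <= p_ins < 1.0
    k = len(alphabet)
    assert k >= 2
    n, m = len(x), len(y)
    # alpha[i][j]: total probability of consuming x[:i] while emitting y[:j].
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if j > 0:  # insert y[j-1], chosen uniformly from the alphabet
                total += alpha[i][j - 1] * p_ins / k
            if i > 0:  # delete x[i-1]
                total += alpha[i - 1][j] * (1.0 - p_ins) * p_del
            if i > 0 and j > 0:  # copy or substitute x[i-1] -> y[j-1]
                edit = p_copy if x[i - 1] == y[j - 1] else p_sub / (k - 1)
                total += alpha[i - 1][j - 1] * (1.0 - p_ins) * edit
            alpha[i][j] = total
    # After the whole input is consumed, the model halts with prob. 1 - p_ins.
    return alpha[n][m] * (1.0 - p_ins)


def total_mass(x, alphabet, max_len):
    """Sum p(y | x) over every y with len(y) <= max_len (sanity check)."""
    return sum(
        cond_prob(x, "".join(chars), alphabet)
        for length in range(max_len + 1)
        for chars in product(alphabet, repeat=length)
    )
```

Because every step's choices sum to one and p_ins < 1, the model defines a proper distribution over output strings; total_mass("ab", "ab", 8) should therefore come out very close to 1, with the small remainder belonging to longer outputs.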


An Empirical Study of Chinese Name Matching and Applications

Peng

Dredze

2015

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confere

Self Cite


Methods for name matching, an important component to support downstream tasks such as entity linking and entity clustering, have focused on alphabetic languages, primarily English. In contrast, logogram languages such as Chinese remain untested. We evaluate methods for name matching in Chinese, including both string matching and learning approaches. Our approach, based on new representations for Chinese, improves both name matching and a downstream entity clustering task.


“…Substring-to-substring edit operations - or equivalently, (monotone) many-to-many alignments - have appeared in the NLP context, e.g., in (Deligne et al, 1995), (Brill and Moore, 2000), (Jiampojamarn et al, 2007), (Bisani and Ney, 2008), (Jiampojamarn et al, 2010), or, significantly earlier, in (Ukkonen, 1985), (Véronis, 1988). Learning edit distance/monotone alignments in an unsupervised manner has been the topic of, e.g., (Ristad and Yianilos, 1998), (Cotterell et al, 2014), besides the works already mentioned. All of these approaches are special cases of our unigram model outlined in Section 2 - i.e., they consider particular S (most prominently, S = {(1, 0), (0, 1), (1, 1)}) and/or restrict attention to only N = 2 strings.…”

Section: Related Work (mentioning)

confidence: 99%
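
The step-set notation in the statement above can be illustrated with a short Python sketch for N = 2 strings. Each step (a, b) in S aligns a substring of length a from the first string with a substring of length b from the second, so S = {(1, 0), (0, 1), (1, 1)} corresponds to deletion, insertion, and substitution/copy. The function names below are hypothetical, and this is only an illustration of the notation, not the implementation from the citing paper. It assumes S does not contain (0, 0), which would make the recursion non-terminating.

```python
from functools import lru_cache


def count_alignments(n, m, steps):
    """Count monotone alignments of strings of lengths n and m using `steps`.

    `steps` is a collection of (a, b) pairs, excluding (0, 0) by assumption.
    """
    steps = tuple(steps)

    @lru_cache(maxsize=None)
    def count(i, j):
        if i == 0 and j == 0:
            return 1  # the empty alignment
        # The last step (a, b) must fit inside the remaining prefix lengths.
        return sum(count(i - a, j - b) for a, b in steps if a <= i and b <= j)

    return count(n, m)


def enumerate_alignments(x, y, steps):
    """List every alignment as a sequence of (x-substring, y-substring) pairs."""
    steps = tuple(steps)

    def rec(i, j):
        if i == len(x) and j == len(y):
            yield []
            return
        for a, b in steps:
            if i + a <= len(x) and j + b <= len(y):
                for rest in rec(i + a, j + b):
                    yield [(x[i:i + a], y[j:j + b])] + rest

    return list(rec(0, 0))


EDIT_STEPS = {(1, 0), (0, 1), (1, 1)}  # the classic edit-distance step set
```

Adding a step such as (2, 1) to the set admits many-to-one alignments; for example, it lets a two-character grapheme sequence align with a single phoneme.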

Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment

Eger

2015

Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Confere


We investigate multiple many-to-many alignments as a primary step in integrating supplemental information strings in string transduction. Besides outlining DP based solutions to the multiple alignment problem, we detail an approximation of the problem in terms of multiple sequence segmentations satisfying a coupling constraint. We apply our approach to boosting baseline G2P systems using homogeneous as well as heterogeneous sources of supplemental information.


“…The above choice of F corresponds to the "(0, 1, 1) topology" in the more general scheme of Cotterell et al (2014). For practical reasons, we actually modify it to limit the number of consecutive INS edits to 3.…”

Section: Transducer Topology (mentioning)

confidence: 99%

Weighting Finite-State Transductions With Neural Context

Rastogi

Cotterell

Eisner

2016

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

Self Cite


How should one apply deep learning to tasks such as morphological reinflection, which stochastically edit one string to get another? A recent approach to such sequence-to-sequence tasks is to compress the input string into a vector that is then used to generate the output string, using recurrent neural networks. In contrast, we propose to keep the traditional architecture, which uses a finite-state transducer to score all possible output strings, but to augment the scoring function with the help of recurrent networks. A stack of bidirectional LSTMs reads the input string from left-to-right and right-to-left, in order to summarize the input context in which a transducer arc is applied. We combine these learned features with the transducer to define a probability distribution over aligned output strings, in the form of a weighted finite-state automaton. This reduces hand-engineering of features, allows learned features to examine unbounded context in the input string, and still permits exact inference through dynamic programming. We illustrate our method on the tasks of morphological reinflection and lemmatization.
