Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) 2014
DOI: 10.3115/v1/p14-2102

Stochastic Contextual Edit Distance and Probabilistic FSTs

Abstract: String similarity is most often measured by weighted or unweighted edit distance d(x, y). Ristad and Yianilos (1998) defined stochastic edit distance, a probability distribution p(y | x) whose parameters can be trained from data. We generalize this so that the probability of choosing each edit operation can depend on contextual features. We show how to construct and train a probabilistic finite-state transducer that computes our stochastic contextual edit distance. To illustrate the improvement from conditionin…
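To make "a probability distribution p(y | x)" over edit sequences concrete, here is a minimal sketch of a context-free, locally normalized conditional edit model in the spirit of stochastic edit distance. The parameter names, their default values, the uniform insertion distribution, and the copy-biased substitution distribution are all illustrative assumptions. The sketch does not reproduce Ristad and Yianilos's exact parameterization. p(y | x) is the sum over all edit sequences, computed with a forward dynamic program over (positions consumed in x, positions emitted in y).

```python
from itertools import product


def stochastic_edit_prob(x, y, alphabet, p_ins=0.1, p_del=0.1,
                         p_ins_end=0.1, p_copy=0.9):
    """p(y | x) under a toy context-free conditional edit model.

    All parameters are hypothetical. While input remains, each step is
    INS (prob p_ins), DEL (p_del) or SUB (the remaining mass); once x is
    consumed, each step is INS (p_ins_end) or STOP (1 - p_ins_end).
    """
    p_sub = 1.0 - p_ins - p_del
    k = len(alphabet)  # assumes k >= 2 and that x, y use only these symbols

    def q(c):  # inserted symbol: uniform over the alphabet (assumption)
        return 1.0 / k

    def r(c, a):  # SUB a -> c: copy with prob p_copy, else uniform (assumption)
        return p_copy if c == a else (1.0 - p_copy) / (k - 1)

    m, n = len(x), len(y)
    # alpha[i][j] = total prob of edit prefixes that consumed x[:i] and emitted y[:j]
    alpha = [[0.0] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = 1.0
    for i in range(m + 1):
        for j in range(n + 1):
            a = alpha[i][j]
            if a == 0.0:
                continue
            if i < m:
                alpha[i + 1][j] += a * p_del                          # DEL x[i]
                if j < n:
                    alpha[i][j + 1] += a * p_ins * q(y[j])            # INS y[j]
                    alpha[i + 1][j + 1] += a * p_sub * r(y[j], x[i])  # SUB x[i] -> y[j]
            elif j < n:
                alpha[i][j + 1] += a * p_ins_end * q(y[j])            # INS after x is consumed
    return alpha[m][n] * (1.0 - p_ins_end)                            # then STOP


def total_mass(x, alphabet, max_len, **params):
    """Sum p(y | x) over every y with |y| <= max_len (a normalization check)."""
    return sum(stochastic_edit_prob(x, "".join(t), alphabet, **params)
               for length in range(max_len + 1)
               for t in product(alphabet, repeat=length))
```

With `p_ins = p_ins_end = 0` no insertions are possible, so summing over every output of length at most |x| gives exactly 1. The explicit STOP factor is what turns the sum over edit sequences into a proper distribution over output strings rather than an unnormalized score.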

Cited by 30 publications (44 citation statements)
References 16 publications
“…Most downstream NLP systems have simply employed a static edit distance module to decide whether two names can be matched (Chen et al, 2010; Cassidy et al, 2011; Martschat et al, 2012). An exception is work on training finite state transducers for edit distance metrics (Ristad and Yianilos, 1998; Bouchard-Côté et al, 2008; Dreyer et al, 2008; Cotterell et al, 2014). More recently, presented a phylogenetic model of string variation using transducers that applies to pairs of names string (supervised) and unpaired collections (unsupervised).…”
Section: Name Matching Methods (mentioning)
confidence: 99%
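A static matcher of the general kind described above can be sketched as a fixed-cost Levenshtein distance d(x, y) plus a decision threshold. The unit costs, the lowercasing, and the threshold of 2 are illustrative assumptions, not settings taken from any of the cited systems. Nothing in this matcher is learned from data, which is the contrast with the trained transducers.

```python
def edit_distance(x, y, ins=1.0, dele=1.0, sub=1.0):
    """Weighted Levenshtein distance d(x, y); unit costs by default."""
    m, n = len(x), len(y)
    prev = [j * ins for j in range(n + 1)]  # row 0: insert all of y[:j]
    for i in range(1, m + 1):
        cur = [i * dele] + [0.0] * n        # column 0: delete all of x[:i]
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + dele,    # delete x[i-1]
                         cur[j - 1] + ins,  # insert y[j-1]
                         prev[j - 1] + (0.0 if x[i - 1] == y[j - 1] else sub))  # copy/substitute
        prev = cur
    return prev[n]


def names_match(a, b, threshold=2):
    """Hypothetical static rule: two names match when d(a, b) <= threshold."""
    return edit_distance(a.lower(), b.lower()) <= threshold
```

For example, `names_match("Jon Smith", "John Smith")` is true (one insertion), while `names_match("Smith", "Jones")` is false (distance 5).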
“…Transducers are common choices for learning edit distance metrics for strings, and they perform better than string similarity (Ristad and Yianilos, 1998; Cotterell et al, 2014). We use the probabilistic transducer of Cotterell et al (2014) to learn a stochastic edit distance. The model represent the conditional probability p(y|x; θ), where y is a generated string based on editing x according to parameters θ.…”
Section: Methods (mentioning)
confidence: 99%
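The conditional model p(y | x; θ) is what makes edit probabilities depend on context. This can be sketched with a log-linear (softmax) distribution over the edits allowed at each step. The feature templates used here are hypothetical and are not claimed to be the parameterization of Cotterell et al. (2014): the action alone, the action conjoined with the current input symbol x[i], and the action conjoined with the previous output symbol y[j-1]. The boundary markers `BOS`/`EOS` and the weight dictionary `theta` are also illustrative assumptions.

```python
import math

BOS, EOS = "<s>", "</s>"  # boundary markers (assumption)


def features(action, x_cur, y_prev):
    """Hypothetical feature templates: the action alone and conjoined with context."""
    return [("act",) + action,
            ("act+in", x_cur) + action,
            ("act+prevout", y_prev) + action]


def local_dist(x_cur, y_prev, alphabet, theta):
    """Log-linear (softmax) distribution over the edits allowed in this context."""
    if x_cur == EOS:  # input consumed: only insertion or stopping remain
        actions = [("INS", c) for c in alphabet] + [("STOP",)]
    else:
        actions = ([("INS", c) for c in alphabet] + [("DEL",)]
                   + [("SUB", c) for c in alphabet])
    scores = [sum(theta.get(f, 0.0) for f in features(a, x_cur, y_prev))
              for a in actions]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {a: e / z for a, e in zip(actions, exps)}


def contextual_edit_prob(x, y, alphabet, theta):
    """p(y | x; theta) by a forward pass over (i, j); context is x[i] and y[j-1]."""
    m, n = len(x), len(y)
    alpha = [[0.0] * (n + 1) for _ in range(m + 1)]
    alpha[0][0] = 1.0
    prob = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            a = alpha[i][j]
            if a == 0.0:
                continue
            x_cur = x[i] if i < m else EOS
            y_prev = y[j - 1] if j > 0 else BOS
            dist = local_dist(x_cur, y_prev, alphabet, theta)
            if j < n:
                alpha[i][j + 1] += a * dist[("INS", y[j])]          # INS y[j]
            if i < m:
                alpha[i + 1][j] += a * dist[("DEL",)]               # DEL x[i]
                if j < n:
                    alpha[i + 1][j + 1] += a * dist[("SUB", y[j])]  # SUB x[i] -> y[j]
            elif j == n:
                prob += a * dist[("STOP",)]                         # halt after x and y are consumed
    return prob
```

When y is fixed, the context this sketch uses is fully determined by the cell (i, j). The same O(|x||y|) forward pass as in the context-free case therefore still sums over all edit sequences exactly. In the sketch, widening the output window changes only the feature function. The extra states matter when the model must be represented as a transducer over all possible y, for example to sample or decode.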
“…Substring-to-substring edit operations, or equivalently, (monotone) many-to-many alignments, have appeared in the NLP context, e.g., in (Deligne et al, 1995), (Brill and Moore, 2000), (Jiampojamarn et al, 2007), (Bisani and Ney, 2008), (Jiampojamarn et al, 2010), or, significantly earlier, in (Ukkonen, 1985), (Véronis, 1988). Learning edit distance/monotone alignments in an unsupervised manner has been the topic of, e.g., (Ristad and Yianilos, 1998), (Cotterell et al, 2014), besides the works already mentioned. All of these approaches are special cases of our unigram model outlined in Section 2, i.e., they consider particular S (most prominently, S = {(1, 0), (0, 1), (1, 1)}) and/or restrict attention to only N = 2 strings.…”
Section: Related Work (mentioning)
confidence: 99%
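To see what a step set S means for N = 2 strings, the sketch below counts the monotone alignments of two strings of lengths m and n. It assumes that each pair (a, b) in S gives the number of symbols one edit consumes from the first and second string. The quoted passage does not spell this out, so treat it as an assumption. With the standard S = {(1, 0), (0, 1), (1, 1)}, the counts are the Delannoy numbers (3, 13, 63 for m = n = 1, 2, 3). Adding a step such as (2, 1) gives a substring-to-substring edit.

```python
from functools import lru_cache


def count_alignments(m, n, steps=((1, 0), (0, 1), (1, 1))):
    """Count monotone alignments of lengths m and n built from the given steps.

    Each step (a, b) consumes a symbols of the first string and b of the
    second (assumed reading of S); (0, 0) must not be a step.
    """
    @lru_cache(maxsize=None)
    def count(i, j):
        # number of ways to consume exactly i and j remaining symbols
        if i == 0 and j == 0:
            return 1
        return sum(count(i - a, j - b) for a, b in steps if a <= i and b <= j)

    return count(m, n)
```

For example, `count_alignments(2, 1)` is 5 with the standard steps and 6 once `(2, 1)` is added. The count grows exponentially with string length. That is why these models sum over alignments with dynamic programming instead of enumerating them.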
“…The above choice of F corresponds to the "(0, 1, 1) topology" in the more general scheme of Cotterell et al (2014). For practical reasons, we actually modify it to limit the number of consecutive INS edits to 3.…”
Section: Transducer Topology (mentioning)
confidence: 99%
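A cap on consecutive INS edits can be imposed by adding a counter of the current insertion run to the state. The sketch below assumes that implementation, which the quoted passage does not describe. It counts the INS/DEL/SUB edit sequences from a length-m input to a length-n output with at most `max_ins` insertions in a row. One consequence of a cap of 3 is that |y| can be at most |x| + 3(|x| + 1): three insertions before each input symbol and three at the end.

```python
from functools import lru_cache


def count_limited(m, n, max_ins=3):
    """Count INS/DEL/SUB sequences from length-m x to length-n y with at most
    max_ins consecutive INS edits (the run counter is an assumed design)."""
    @lru_cache(maxsize=None)
    def count(i, j, run):
        # i, j: symbols of x and y still to consume; run: current run of INS edits
        if i == 0 and j == 0:
            return 1
        total = 0
        if j > 0 and run < max_ins:
            total += count(i, j - 1, run + 1)  # INS: emit one y symbol
        if i > 0:
            total += count(i - 1, j, 0)        # DEL: consume one x symbol
        if i > 0 and j > 0:
            total += count(i - 1, j - 1, 0)    # SUB/copy: consume one of each
        return total

    return count(m, n, 0)
```

For one input symbol and four output symbols, the cap removes the two sequences that would need four insertions in a row: `count_limited(1, 4)` is 7, compared with 9 without the limit.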