Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.152

Parallel Sentence Mining by Constrained Decoding

Abstract: We present a novel method to extract parallel sentences from two monolingual corpora, using neural machine translation. Our method relies on translating sentences in one corpus, but constraining the decoding by a prefix tree built on the other corpus. We argue that a neural machine translation system by itself can be a sentence similarity scorer and it efficiently approximates pairwise comparison with a modified beam search. When benchmarked on the BUCC shared task, our method achieves results comparable to ot…
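As a rough illustration of the method described in the abstract, the sketch below builds a prefix trie over the tokenized target-side corpus and decodes while only allowing continuations that stay on a trie path, so every finished hypothesis is an actual target sentence whose model score can act as a similarity signal. This is a minimal sketch, not the authors' implementation: decoding is greedy rather than the paper's modified beam search, and `score_fn` is a hypothetical stand-in for the NMT model's next-token log-probabilities.

```python
# Minimal sketch (not the authors' code) of trie-constrained decoding for
# parallel sentence mining: the decoder may only emit tokens that keep the
# hypothesis on a path of the target-corpus prefix trie.
import math

END = "</s>"  # sentence-final marker

def build_trie(sentences):
    """Nested-dict prefix trie over tokenized target sentences."""
    trie = {}
    for tokens in sentences:
        node = trie
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return trie

def constrained_greedy_decode(score_fn, trie, max_len=50):
    """Greedy decoding restricted to trie paths.

    `score_fn(prefix, candidates)` is a hypothetical hook returning
    {token: log_prob} from the translation model for the allowed tokens.
    Returns (tokens, total_log_prob): the matched target sentence and a
    model score usable as a parallelism signal.
    """
    node, prefix, total = trie, [], 0.0
    for _ in range(max_len):
        allowed = list(node.keys())
        if not allowed:
            break
        scores = score_fn(prefix, allowed)
        tok = max(allowed, key=lambda t: scores.get(t, -math.inf))
        total += scores.get(tok, -math.inf)
        if tok == END:
            return prefix, total
        prefix.append(tok)
        node = node[tok]
    return prefix, total

if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "ran"]]
    trie = build_trie(corpus)

    def toy_score_fn(prefix, candidates):
        # Stand-in for NMT log-probabilities: mildly prefer "cat"/"sat".
        prefs = {"cat": -0.1, "sat": -0.1, END: -0.2}
        return {c: prefs.get(c, -1.0) for c in candidates}

    print(constrained_greedy_decode(toy_score_fn, trie))
```

Under the abstract's formulation, the model is conditioned on a source-side sentence and the score of the best constrained hypothesis then serves as the sentence-similarity measure.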

Cited by 14 publications (8 citation statements) | References 21 publications
“…They obtained more than 10 million parallel sentences in 112 languages. [27] leveraged a machine translation model to derive sentence representations. Moreover, [14] demonstrated that a shared word-embedding space works well for cross-lingual NLP applications via transfer learning.…”
Section: Supervised Parallel Sentences Mining
confidence: 99%
“…To exploit the event schema knowledge, we propose to employ a trie-based constrained decoding algorithm (Chen et al., 2020a; Cao et al., 2021) for event generation. During constrained decoding, the event schema knowledge is injected as the prompt of the decoder and ensures the generation of valid event structures.…”
Section: Constrained Decoding
confidence: 99%
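The mechanism described in the statement above (restricting the decoder so that only schema-valid outputs can be produced) boils down to one per-step operation: given the tokens generated so far, return the set of tokens allowed next. The hedged sketch below illustrates that operation only; the trie, token strings, and templates are invented for the example and are not taken from the cited papers.

```python
# Hedged illustration of a per-step trie constraint: walk the trie along
# the generated prefix and expose only the children of the reached node.

def allowed_next(trie, prefix):
    """Return the set of tokens permitted after `prefix`."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return set()  # prefix has left the space of valid outputs
        node = node[tok]
    return set(node.keys())

# Tiny trie over two made-up linearized "event record" templates.
templates = [
    ["<event>", "Attack", "<arg>", "attacker", "</event>"],
    ["<event>", "Transfer", "<arg>", "giver", "</event>"],
]
trie = {}
for seq in templates:
    node = trie
    for tok in seq:
        node = node.setdefault(tok, {})

print(allowed_next(trie, ["<event>"]))            # {'Attack', 'Transfer'}
print(allowed_next(trie, ["<event>", "Attack"]))  # {'<arg>'}
```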
“…Like TEXT2EVENT in this paper, TANL (Paolini et al., 2021) and GRIT (Du et al., 2021) also employ neural generation models for event extraction, but they focus on sequence generation rather than structure generation. Different from previous works that extract text spans via labeling (Straková et al., 2019) or a copy/pointer mechanism (Zeng et al., 2018; Du et al., 2021), TEXT2EVENT directly generates event schemas and text spans to form event records via constrained decoding (Cao et al., 2021; Chen et al., 2020a), which allows TEXT2EVENT to handle various event types and transfer to new types easily.…”
Section: Related Work
confidence: 99%
“…Agrawal et al (2021) investigate alternative techniques to estimate direct translation probability for reference-free quality estimation. In the context of parallel corpus filtering (Junczys-Dowmunt, 2018), Chen et al (2020) propose trie-constrained decoding to improve the efficiency of pairwise comparisons. Future work could apply their method to the other translation-based measures.…”
Section: Related Work
confidence: 99%