2005
DOI: 10.1007/11562214_11

Aligning Needles in a Haystack: Paraphrase Acquisition Across the Web

Abstract: This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not require any explicit clue indicating which documents are likely to encode parallel paraphrases, as they report on the …

Cited by 35 publications (27 citation statements)
References 14 publications

“…Hence, it is expensive to obtain and use on a large scale. The second category of techniques perform paraphrase pair extraction using standard text data [13,19]. These are motivated by the distributional similarity theory [8], which postulates that phrase pairs often sharing the same left and right contexts are likely to be paraphrases to each other.…”
Section: Paraphrase Phrase Pair Extraction (mentioning)
confidence: 99%
“…To obtain sufficient phrase coverage, a large number of paraphrase phrase pairs are required. As it is impractical to obtain expert semantic labelling at the phrase level, a distributional similarity [8] based statistical paraphrase extraction scheme that operates on standard text data [14,26,1,21] is employed. The n-gram paraphrase induction algorithm proposed in [18] is used.…”
Section: Paraphrase Model Estimation (mentioning)
confidence: 99%
“…the number of documents or the size in MB) ** These papers do not report the number of paraphrases extracted, or such a number does not exist in their approach. Table 4: Comparison with the precision and paraphrases generated per input sentence (PPS) of relevant prior work. While we wanted to show a meaningful comparison with another method from previous work, none of them do what we are doing here - extraction of sentence-size paraphrasal templates from a non-aligned corpus - and so a comparison using the same data would not be fair (and in most cases, not possible). While it seems that providing the results of human evaluation without comparison to prior methods is the norm in most relevant prior work (Ibrahim et al., 2003; Paşca and Dienes, 2005; Bannard and Callison-Burch, 2005; Fujita et al., 2012), we wanted to at least get some sense of where we stand in comparison to other methods, and so we provide a list of (not directly comparable) results reported by other authors in Table 4. While it is impossible to meaningfully compare and rate such different methods, these numbers support the conclusion that our single-corpus, domain-agnostic approach achieves a precision that is similar to or better than other methods.…”
Section: Discussion (mentioning)
confidence: 99%
“…Another line of research is based on contextual similarity (Lin and Pantel, 2001; Paşca and Dienes, 2005; Bhagat and Ravichandran, 2008). Here, shorter (phrase-level) paraphrases are extracted from a single corpus when they appear in a similar lexical (and in later approaches, also syntactic) context.…”
Section: Related Work (mentioning)
confidence: 99%
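
Several of the citation statements above describe the distributional-similarity idea: phrases that often share the same left and right contexts are likely to be paraphrases of each other. The following is a minimal sketch of that idea only, not the method of the indexed paper or of any citing paper. The function name `candidate_paraphrases`, its parameters, and the toy sentences are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_paraphrases(sentences, phrase_len=1, context_len=1, min_shared=2):
    """Pair phrases that occur between the same left and right context words.

    Illustrative assumption: whitespace tokenization, fixed phrase length,
    and a simple count of shared (left, right) contexts as the score.
    """
    contexts = defaultdict(set)  # phrase -> set of (left, right) contexts
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Only positions with a full context window on both sides are used.
        last_start = len(tokens) - phrase_len - context_len
        for start in range(context_len, last_start + 1):
            end = start + phrase_len
            phrase = tuple(tokens[start:end])
            left = tuple(tokens[start - context_len:start])
            right = tuple(tokens[end:end + context_len])
            contexts[phrase].add((left, right))

    pairs = {}
    for a, b in combinations(sorted(contexts), 2):
        shared = contexts[a] & contexts[b]
        if len(shared) >= min_shared:
            pairs[(" ".join(a), " ".join(b))] = len(shared)
    return pairs


if __name__ == "__main__":
    toy_corpus = [
        "the town was founded in 1850",
        "the town was established in 1850",
        "it is founded on trust",
        "it is established on trust",
    ]
    print(candidate_paraphrases(toy_corpus))
```

Requiring at least two shared contexts (`min_shared=2`) is one simple way to reduce accidental matches; with a single shared context, unrelated words that happen to fill the same slot would also be paired.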