Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014
DOI: 10.3115/v1/e14-3002

Unsupervised Relation Extraction of In-Domain Data from Focused Crawls

Abstract: This thesis proposal approaches unsupervised relation extraction from web data, which is collected by crawling only those parts of the web that are from the same domain as a relatively small reference corpus. The first part of this proposal is concerned with the efficient discovery of web documents for a particular domain and in a particular language. We create a combined, focused web crawling system that automatically collects relevant documents and minimizes the amount of irrelevant web content. The collecte…

Cited by 5 publications (3 citation statements)
References 30 publications
“…-Information redundancy that exists in various sources (Agichtein and Gravano 2000; Brin 1999; Dill et al 2003b; Etzioni et al 2005). -Latent semantic similarity like co-occurrence (Rosenfeld and Feldman 2007; Rozenfeld and Feldman 2006) or distributional representation (Remus 2014; Skeppstedt 2014; Zhang et al 2016). -Hybrid approaches (Cucchiarelli and Velardi 2001; Kambhatla 2004; Lechevrel et al 2017).…”
Section: Results Regarding Research Questions 2 And
Citation type: mentioning
confidence: 99%
“…Recently, many unsupervised learning-based methods [20][21][22][23][24][25] of open domain relation extraction have been proposed to reduce the heavy manual labor. TextRunner [5] was the first Open IE (OIE) system, where a large set of relational entity tuples were extracted without requiring any human labor and then these tuples were assigned a probability.…”
Section: Entity Relation Extraction
Citation type: mentioning
confidence: 99%
“…These have 1028 and 1085 unique sentences from Wikipedia, European Parliament transcriptions and crawled German sentences from the Internet, distributed equally per speaker. The crawled German sentences were collected randomly with a focused crawler [17], and were only selected from sentences encountered between quotation marks, which exhibit textual content more typical of direct speech. Unlike in the training set, where multiple speakers read the same sentences, every sentence recorded in the test and development set is unique, for evaluation purposes.…”
Section: Corpus
Citation type: mentioning
confidence: 99%