Unsupervised Relation Extraction of In-Domain Data from Focused Crawls

Remus, Steffen

doi:10.3115/v1/e14-3002

Cited by 5 publications

(3 citation statements)

References 30 publications

“…
- Information redundancy that exists in various sources (Agichtein and Gravano 2000; Brin 1999; Dill et al. 2003b; Etzioni et al. 2005).
- Latent semantic similarity like co-occurrence (Rosenfeld and Feldman 2007; Rozenfeld and Feldman 2006) or distributional representation (Remus 2014; Skeppstedt 2014; Zhang et al. 2016).
- Hybrid approaches (Cucchiarelli and Velardi 2001; Kambhatla 2004; Lechevrel et al. 2017).
…”

Section: Results Regarding Research Questions 2 And… (mentioning)

confidence: 99%

Unsupervised Approaches for Textual Semantic Annotation, A Survey

Liao, Zhao (2019), ACM Comput. Surv.

Semantic annotation is a crucial part of achieving the vision of the Semantic Web and has long been a research topic among various communities. The most challenging problem in reaching the Semantic Web’s real potential is the gap between the large amount of unlabeled existing and new data and the limited annotation capability available. To resolve this problem, numerous works have sought to increase the degree of automation of semantic annotation, from manual to semi-automatic to fully automatic. These works have been well investigated by numerous surveys focusing on different aspects of the problem. However, a comprehensive survey targeting unsupervised approaches for semantic annotation is still missing and is urgently needed. To better understand the state of the art of semantic annotation in the textual domain using unsupervised approaches, this article investigates the existing literature and presents a survey to answer three research questions: (1) To what extent can semantic annotation be performed fully automatically using unsupervised approaches? (2) What kinds of unsupervised approaches for semantic annotation already exist in the literature? (3) What characteristics and relationships do these approaches have? In contrast to existing surveys, this article gives the reader insight into the state of the art of semantic annotation using unsupervised approaches. While examining the literature, this article also addresses the inconsistent terminology used to describe the degree of automation of the various semantic annotation tools and provides a more consistent terminology. Based on this, a uniform summary of the degree of automation of the many previously investigated semantic annotation tools can now be presented.

“…Recently, many unsupervised learning-based methods [20][21][22][23][24][25] of open domain relation extraction have been proposed to reduce the heavy manual labor. TextRunner [5] was the first Open IE (OIE) system, where a large set of relational entity tuples were extracted without requiring any human labor and then these tuples were assigned a probability.…”

Section: Entity Relation Extraction (mentioning)

confidence: 99%

RGloVe: An Improved Approach of Global Vectors for Distributional Entity Relation Representation

et al. 2017

Abstract: Most previous work on relation extraction between named entities is limited to extracting pre-defined relation types, which is inefficient for massive amounts of unlabeled text. Recently, with the appearance of various distributional word representations, unsupervised methods for many natural language processing (NLP) tasks have been widely researched. In this paper, we focus on a new approach to unsupervised relation extraction, which we call distributional relation representation. Without requiring pre-defined types, distributional relation representation aims to automatically learn entity vectors and then estimate the semantic similarity between these entities. We choose global vectors (GloVe) as our base model for training entity vectors because it balances local context and global corpus statistics well. To train the model more efficiently, we improve the traditional GloVe model by using the cosine similarity between entity vectors, instead of the dot product, to approximate entity co-occurrences. Because cosine similarity normalizes vectors to unit length, it is intuitively more reasonable and converges more easily to a local optimum. We call the improved model RGloVe. Experimental results on a massive corpus of Sina News show that our proposed model outperforms the traditional GloVe model. Finally, a Neo4j graph database is introduced to store the relationships between named entities. The main advantage of Neo4j is that it provides a highly accessible way to query the direct and indirect relationships between entities.
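The abstract describes replacing the dot product in GloVe's weighted least-squares objective with a cosine similarity. The NumPy sketch below contrasts the two scoring functions under stated assumptions. It uses the standard GloVe weighting f(x) (x_max = 100, α = 0.75) and keeps GloVe's two bias terms. The exact form of the RGloVe objective, for example whether the biases are kept or the cosine is rescaled, is an assumption and is not taken from the paper. The helper names `glove_weight`, `pair_scores` and `weighted_loss` are hypothetical.

```python
import numpy as np


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Standard GloVe weighting f(x): down-weights rare pairs, caps frequent ones at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)


def pair_scores(W, W_ctx, b, b_ctx, use_cosine):
    """Predicted log co-occurrence for every (entity i, context j) pair."""
    sims = W @ W_ctx.T  # dot products w_i . w~_j
    if use_cosine:
        # Assumed RGloVe-style variant: divide by both vector norms so the
        # similarity term is a cosine in [-1, 1] rather than an unbounded dot product.
        norms = np.linalg.norm(W, axis=1)[:, None] * np.linalg.norm(W_ctx, axis=1)[None, :]
        sims = sims / norms
    return sims + b[:, None] + b_ctx[None, :]


def weighted_loss(X, W, W_ctx, b, b_ctx, use_cosine=False):
    """Weighted least-squares GloVe objective, summed over nonzero co-occurrences only."""
    mask = X > 0
    # Replace zeros with 1.0 before taking the log; those entries are masked out anyway.
    log_x = np.log(np.where(mask, X, 1.0))
    err = pair_scores(W, W_ctx, b, b_ctx, use_cosine) - log_x
    return float(np.sum(glove_weight(X) * mask * err ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 20, size=(4, 5)).astype(float)  # toy co-occurrence counts
    W, W_ctx = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
    b, b_ctx = rng.normal(size=4), rng.normal(size=5)
    print("dot-product loss:", weighted_loss(X, W, W_ctx, b, b_ctx))
    print("cosine loss:     ", weighted_loss(X, W, W_ctx, b, b_ctx, use_cosine=True))
```

The cosine variant only sees vector directions, so rescaling an entity vector leaves its loss unchanged. This is one way to read the abstract's remark about unit vectors. Because the cosine is bounded to [-1, 1], the bias terms must carry most of the scale of log X_ij in this sketch.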

“…These have 1028 and 1085 unique sentences from Wikipedia, European Parliament transcriptions and crawled German sentences from the Internet, distributed equally per speaker. The crawled German sentences were collected randomly with a focused crawler [17], and were only selected from sentences encountered between quotation marks, which exhibit textual content more typical of direct speech. Unlike in the training set, where multiple speakers read the same sentences, every sentence recorded in the test and development set is unique, for evaluation purposes.…”

Section: Corpus (mentioning)

confidence: 99%

Open Source German Distant Speech Recognition: Corpus and Acoustic Model

Radeck-Arneth, Milde, Lange, et al. (2015), Text, Speech, and Dialogue

We present a new freely available corpus for German distant speech recognition and report speaker-independent word error rate (WER) results for two open source speech recognizers trained on this corpus. The corpus has been recorded in a controlled environment with three different microphones at a distance of one meter. It comprises 180 different speakers with a total of 36 hours of audio recordings. We show recognition results with the open source toolkit Kaldi (20.5% WER) and PocketSphinx (39.6% WER) and make a complete open source solution for German distant speech recognition possible.
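WER is conventionally computed as the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions. The sketch below is a minimal implementation of that standard definition. It assumes whitespace tokenization. The scoring scripts used with Kaldi or PocketSphinx may apply their own text normalization, so this sketch is not claimed to reproduce the reported 20.5% and 39.6% figures. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (one DP row).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                cur[j - 1] + 1,      # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("das ist ein kurzer test", "das ist kurzer test")` returns 0.2: one deletion over five reference words. Insertions count as errors, so WER can exceed 1.0 when the hypothesis is much longer than the reference.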
