Combining linguistic and statistical analysis to extract relations from web documents

Suchanek, Fabian M.; Ifrim, Georgiana; Weikum, Gerhard

doi:10.1145/1150402.1150492

Cited by 135 publications

(74 citation statements)

References 20 publications

“…The outcome is used to enrich the knowledge base. So far, a variety of tools working in this scheme have been proposed, including Snowball [30], Semagix/SWETO [31], KnowItAll [32], Text2Onto [33], LEILA [34], TextRunner [35], and SEAL [36]. These systems also take advantage of natural language processing tools to improve the results by employing parts-of-speech tagging, lexical dependency parsing or using heuristics for entity disambiguation, etc.…”

Section: Related Work (mentioning)

confidence: 99%

A Statistical Approach for Knowledge Discovery: Bootstrapped Analysis of Language Models for Knowledge base Population from Unstructured Text

Momtazi

Moradiannasab

2018

Scientia Iranica

Abstract. This paper proposes a novel approach to knowledge discovery from textual data. The generated knowledge base can be used as one of the main components in the cognitive process of question answering systems. The proposed model automatically extracts relations between named entities in Persian. Our proposed model is a bootstrapping approach based on n-gram (a contiguous sequence of n items from a given sequence of text or speech) model to find the representative textual patterns of relations as n-grams in order to extract new knowledge about given named entities. The main motivation of this work is the characteristic of the sentence structure in Persian which, in comparison to English sentences, is in subject-object-verb format. The proposed approach is a purely statistical one, and no background knowledge of the target language is required. This makes our method applicable to any open domain relation extraction task. However, as for our test-bed, the domain of biographical data of international poets and scientists is considered herein to build a knowledge base about them. Qualitative evaluations based on human assessment represent the evidence of the efficacy of our method.

“…The benefit of using LGP is that there exists a structure similarity to CGs, hence it is easier to map the obtained structure to CGs [26]. Suchanek et al [27] reported that the LGP provides a much deeper semantic structure than the standard context-free parsers. The parser is able to identify the syntactic level of sentence decomposition and categorizes the phrase into the following: S, which represents sentences; NP, which represents Noun Phrases; VP, which represents Verb Phrases; and PP, which represents Preposition Phrases.…”

Section: Parsing and Conceptual Graph Generation (mentioning)

confidence: 99%

Outlier detection in financial statements: a text mining method

et al. 2009

This paper presents a text mining methodology to extract outlying knowledge from a collection of financial statements. The main idea is to extract relevant financial performance indicators and discover implicit textual description of the indicators. The extracted information was represented using a network language i.e. conceptual graph. Outlier mining was performed on the conceptual graph representation using a deviation based method. Experiments were carried out to evaluate the effectiveness of the proposed method. Results show that the proposed method is able to excerpt outlying knowledge from the financial statements with accuracy comparable to human experts.

“…LEILA [18] automatically generated negative examples using information about the cardinality of relations. Work conducted in [19] [20] employed semi-supervised learning algorithms and achieved good performance using only a small amount of labeled examples.…”

Section: Related Work (mentioning)

confidence: 99%

PORE: Positive-Only Relation Extraction from Wikipedia Text

Wang

Zhu

2007

The Semantic Web

Abstract. Extracting semantic relations is of great importance for the creation of the Semantic Web content. It is of great benefit to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern matching methods that employ information redundancy cannot work well since there is not much redundancy information in Wikipedia, compared to the Web. Multi-class classification methods are not reasonable since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction), for relation extraction from Wikipedia text. The core algorithm B-POL extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amount of training data. The experimental results show that B-POL can work effectively given only a small amount of positive training examples and it significantly outperforms the original positive learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for Ontology Population and can be adapted to other domains.
