Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER) 2019
DOI: 10.18653/v1/d19-6607
Scalable Knowledge Graph Construction from Text Collections

Abstract: We present a scalable, open-source platform that "distills" a potentially large text collection into a knowledge graph. Our platform takes documents stored in Apache Solr and scales out the Stanford CoreNLP toolkit via Apache Spark integration to extract mentions and relations that are then ingested into the Neo4j graph database. The raw knowledge graph is then enriched with facts extracted from an external knowledge graph. The complete product can be manipulated by various applications using Neo4j's native Cy…
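The pipeline the abstract describes (Solr documents → CoreNLP extraction scaled out on Spark → Neo4j ingestion → enrichment from an external knowledge graph) can be sketched end to end. The sketch below is a minimal, self-contained Python analogue of that data flow only: the regex "extractor", the in-memory adjacency map, and the `docs`/`external_kg` data are all hypothetical stand-ins for CoreNLP, Neo4j, and real collections, not the platform's actual APIs.

```python
import re
from collections import defaultdict

def extract_triples(doc):
    """Toy stand-in for CoreNLP relation extraction: match
    '<X> is based in <Y>' sentences, emit (subject, relation, object)."""
    pattern = re.compile(r"(\w+) is based in (\w+)")
    return [(subj, "BASED_IN", obj) for subj, obj in pattern.findall(doc)]

def ingest(graph, triples):
    """Stand-in for Neo4j ingestion: store triples in an adjacency map."""
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))

def enrich(graph, external_kg):
    """Enrichment step: add external facts for entities already in the graph."""
    for entity in list(graph):
        for rel, obj in external_kg.get(entity, []):
            graph[entity].add((rel, obj))

# Hypothetical document collection (stand-in for a Solr result set).
docs = ["Acme is based in Toronto.", "Globex is based in Springfield."]

graph = defaultdict(set)
for doc in docs:  # in the real platform this loop is a distributed Spark job
    ingest(graph, extract_triples(doc))

# Hypothetical external knowledge graph used for the enrichment stage.
external_kg = {"Acme": [("INDUSTRY", "Manufacturing")]}
enrich(graph, external_kg)
```

In the deployed system each stage is swapped for the real component (Solr client, CoreNLP annotators partitioned across Spark executors, Cypher `MERGE` statements against Neo4j), but the shape of the flow is the same: extract, ingest, enrich, then query.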

Cited by 19 publications (6 citation statements)
References 13 publications
“…While some works refer to knowledge found in texts or other resources as evidence for a fact [9,46,127,153] and call it fact only after the truthfulness has been determined and that knowledge is entered into a knowledge base, other works assume the truthfulness of the mentions and refer to them or the knowledge they represent as facts directly [32,71]. Very related is the task of Truth Discovery.…”
Section: Facts vs Evidence
confidence: 99%
“…Data-to-Text Generation has several benchmark datasets with slightly different objectives: WebNLG (Gardent et al., 2017) to convert a group of triples to text, E2ENLG (Dušek et al., 2018) … (Etzioni et al., 2008; Angeli et al., 2015; Clancy et al., 2019) inherently create such a corpus but these works generally do not release the extracted KG triples.…”
Section: Related Work
confidence: 99%
“…There is a vast literature on the inverse task of automatic KG construction from text (Etzioni et al., 2008; Angeli et al., 2015; Clancy et al., 2019); however, these works generally describe the methodology and do not release the corresponding dataset.…”
Section: KG-Text Alignment
confidence: 99%