Mitigating linked data quality issues in knowledge-intense information extraction methods

Weichselbraun, Albert; Kuntschik, Philipp

doi:10.1145/3102254.3102272

Cited by 3 publications

(3 citation statements)

References 32 publications

(41 reference statements)

“…The Integrity Risks Monitor draws upon documents obtained from popular Austrian, German, Swiss, U.K. and U.S. media outlets that are pre-processed, analyzed and semantically enriched using methods such as part-of-speech tagging, dependency parsing [21], sentiment analysis [18], keyword analysis [20] and named entity linking [19]. In addition, we assign a score to each document that indicates its likelihood to contain coverage on integrity risks.…”

Section: Identifying Media Coverage On Integrity Risks (mentioning)

confidence: 99%

Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence

Weichselbraun, Hörler, Hauser et al. 2020

Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics

Self Cite

A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for the task of (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, (iii) supporting companies in analyzing news media for identifying and mitigating integrity risks. Afterwards, we discuss the design, implementation, training and evaluation of classification components capable of identifying English documents covering the integrity topic of corruption. Domain experts created a gold standard dataset compiled from Anglo-American media coverage on corruption cases that has been used for training and evaluating the classifier. The experiments performed to evaluate the classifiers draw upon popular algorithms used for text classification such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN) that draw upon different word embeddings and document representations. They also demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context. CCS CONCEPTS: • Information systems → Data analytics; • Computing methodologies → Neural networks; • Applied computing → Economics; Annotation.

“…Ristoski and Paulheim [19] suggest to deal with data problems in a separate data preprocessing step that handles missing values, identifies incorrect data, eliminates duplicates and performs conflict resolution. Weichselbraun and Kuntschik [29] discuss the impact of these data quality issues on knowledge extraction methods and investigate different mitigation strategies for the corresponding dimensions. They suggest to integrate these strategies into graph mining and information extraction methods and provide real-world use cases of such mitigation strategies.…”

Section: Linked Data Quality Issues (mentioning)

confidence: 99%

Mining and Leveraging Background Knowledge for Improving Named Entity Linking

Weichselbraun, Kuntschik, Brașoveanu 2018

Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics

Self Cite

Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources that seriously challenge their usefulness such as completeness, timeliness and semantic correctness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple linked data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges in regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.

“…This high-quality healthcare data offers potential value for optimizing care delivery, but it is still perceived as a by-product of healthcare delivery, rather than a central asset source for competitive advantages [3]. Quality data, particularly concerning timeliness, completeness and accuracy are needed for a variety of purposes including health sector reviews, planning, programme monitoring, quality improvement and reporting [2] [4]. For this reason, it is critical to have high-quality data on performance in the health sector available routinely [2].…”

Section: Introduction (mentioning)

confidence: 99%

Assessment of Knowledge and Practices of Community Health Nurses on Data Quality in the Ho Municipality of Ghana

Zumah, Niyi, Eweh et al. 2022

OJN
