Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics 2017
DOI: 10.1145/3102254.3102272

Mitigating linked data quality issues in knowledge-intense information extraction methods

Abstract: Advances in research areas such as named entity linking and sentiment analysis have triggered the emergence of knowledge-intensive information extraction methods that combine classical information extraction with background knowledge from the Web. Despite data quality concerns, linked data sources such as DBpedia, GeoNames and Wikidata, which encode facts in a standardized structured format, are particularly attractive for such applications. This paper addresses the problem of data quality by introducing a framew…

Cited by 3 publications (3 citation statements)
References 32 publications (41 reference statements)
“…The Integrity Risks Monitor draws upon documents obtained from popular Austrian, German, Swiss, U.K. and U.S. media outlets that are pre-processed, analyzed and semantically enriched using methods such as part-of-speech tagging, dependency parsing [21], sentiment analysis [18], keyword analysis [20] and named entity linking [19]. In addition, we assign a score to each document that indicates its likelihood to contain coverage on integrity risks.…”
Section: Identifying Media Coverage On Integrity Risks
Citation type: mentioning
confidence: 99%
“…Ristoski and Paulheim [19] suggest to deal with data problems in a separate data preprocessing step that handles missing values, identifies incorrect data, eliminates duplicates and performs conflict resolution. Weichselbraun and Kuntschik [29] discuss the impact of these data quality issues on knowledge extraction methods and investigate different mitigation strategies for the corresponding dimensions. They suggest to integrate these strategies into graph mining and information extraction methods and provide real-world use cases of such mitigation strategies.…”
Section: Linked Data Quality Issues
Citation type: mentioning
confidence: 99%
“…This high-quality healthcare data offers potential value for optimizing care delivery, but it is still perceived as a by-product of healthcare delivery, rather than a central asset source for competitive advantages [3]. Quality data, particularly concerning timeliness, completeness and accuracy are needed for a variety of purposes including health sector reviews, planning, programme monitoring, quality improvement and reporting [2] [4]. For this reason, it is critical to have high-quality data on performance in the health sector available routinely [2].…”
Section: Introduction
Citation type: mentioning
confidence: 99%