Leveraging the Crowdsourcing of Lexical Resources for Bootstrapping a Linguistic Data Cloud

Hellmann, Sebastian; Brekle, Jonas; Auer, Sören

doi:10.1007/978-3-642-37996-3_13

Cited by 6 publications

(7 citation statements)

References 12 publications


“…There are a large number of further crowd-sourced content repositories and DBpedia already had an impact on their structured data publishing and interlinking. Two examples are Wiktionary with the Wiktionary extraction [19] meanwhile becoming part of DBpedia and LinkedGeoData [41], which aims to implement similar data extraction, publishing and linking strategies for OpenStreetMaps.…”

Section: Discussion (mentioning)

confidence: 99%

“…The overall number of links pointing to DBpedia from other data sets is 39,007,478 according to the Data Hub. However, those counts are entered by users and may not always be valid and up-to-date.…”

Section: Incoming Links (mentioning)

confidence: 99%

“…Due to its fast changing nature, together with the fragmentation of the project into Wiktionary language editions (WLE) with independent layout rules, a configurable mediator/wrapper approach is taken for its automated transformation into a structured knowledge base. The workflow of this dedicated Wiktionary extractor being part of the Wiktionary2RDF [19] project is as follows: For every WLE to be transformed an XML configuration file is provided as input. This configuration is used by the Wiktionary extractor, invoked by the DBpedia extraction framework, to first generate a schema reflecting the configured page structure (wrapper part).…”

Section: Applications Of the Extraction Framework: Wiktionary Extraction (mentioning)

confidence: 99%


DBpedia – A large-scale, multilingual knowledge base extracted from Wikipedia

Lehmann, Isele, Jakob, et al. 2015

Semantic Web

Self Cite


The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of Wikipedia. The largest DBpedia knowledge base which is extracted from the English edition of Wikipedia consists of over 400 million facts that describe 3.7 million things. The DBpedia knowledge bases that are extracted from the other 110 Wikipedia editions together consist of 1.46 billion facts and describe 10 million additional things. The DBpedia project maps Wikipedia infoboxes from 27 different language editions to a single shared ontology consisting of 320 classes and 1,650 properties. The mappings are created via a worldwide crowd-sourcing effort and enable knowledge from the different Wikipedia editions to be combined. The project publishes regular releases of all DBpedia knowledge bases for download and provides SPARQL query access to 14 out of the 111 language editions via a global network of local DBpedia chapters. In addition to the regular releases, the project maintains a live knowledge base which is updated whenever a page in Wikipedia changes. DBpedia sets 27 million RDF links pointing into over 30 external data sources and thus enables data from these sources to be used together with DBpedia data. Several hundred data sets on the Web publish RDF links pointing to DBpedia themselves and thus make DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud. In this system report, we give an overview of the DBpedia community project, including its architecture, technical implementation, maintenance, internationalisation, usage statistics and applications.




“…Indeed, there is a chance that URIs of lexical senses may change between two versions as word senses may be reordered in the original Wiktionary data. We believe that such changes are rather unfrequent, but we still have to find out a way to cope with them.…”

Section: Interlinking (mentioning)

confidence: 99%

DBnary: Wiktionary as a Lemon-based multilingual lexical resource in RDF

Sérasset, 2015


Contributive resources, such as Wikipedia, have proved to be valuable to Natural Language Processing or multilingual Information Retrieval applications. This work focusses on Wiktionary, the dictionary part of the resources sponsored by the Wikimedia foundation. In this article, we present our extraction of multilingual lexical data from Wiktionary data and to provide it to the community as a Multilingual Lexical Linked Open Data (MLLOD). This lexical resource is structured using the LEMON Model.


“…For lemon, the maintainers implemented a Python validator, which enables us to directly compare our efforts to a software validator. For NIF there was an early prototype of RDFUnit that used only manual SPARQL test cases.…”

Section: Test Case Implementation For Linguistic Ontologies (mentioning)

confidence: 99%

NLP Data Cleansing Based on Linguistic Ontology Constraints

Kontokostas, Brümmer, Hellmann, et al. 2014

Lecture Notes in Computer Science

Self Cite


Linked Data comprises of an unprecedented volume of structured data on the Web and is adopted from an increasing number of domains. However, the varying quality of published data forms a barrier for further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology of Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting and describe how the method is applied in the Natural Language Processing (NLP) area. NLP is, compared to other domains such as biology, a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies. NLP data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues.
