Nowadays an ever-increasing number of applications require complete and up-to-date spatial data, in particular maps. However, mapping is an expensive process and the vastness and dynamics of our world usually render centralized and authoritative maps outdated and incomplete. In this context crowd-sourced maps have the potential to provide a complete, up-to-date, and free representation of our world. However, the proliferation of such maps largely remains limited due to concerns about their data quality. While most of the current data quality assessment mechanisms for such maps require referencing to authoritative maps, we argue that such referencing of a crowd-sourced spatial database is ineffective. Instead we focus on the use of machine learning techniques that we believe have the potential to not only allow the assessment but also to recommend the improvement of the quality of crowd-sourced maps without referencing to external databases. This chapter gives an overview of these approaches.
Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.
Nowadays an ever-increasing number of applications require complete and up-to-date spatial data, in particular maps. However, mapping is an expensive process and the vastness and dynamics of our world usually render centralized and authoritative maps outdated and incomplete. In this context crowd-sourced maps have the potential to provide a complete, up-to-date, and free representation of our world. However, the proliferation of such maps largely remains limited due to concerns about their data quality. While most of the current data quality assessment mechanisms for such maps require referencing to authoritative maps, we argue that such referencing of a crowd-sourced spatial database is ineffective. Instead we focus on the use of machine learning techniques that we believe have the potential to not only allow the assessment but also to recommend the improvement of the quality of crowd-sourced maps without referencing to external databases. This chapter gives an overview of these approaches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.