Considerable amounts of geographical data are still collected not in the form of GIS data but as natural language texts. This paper proposes an approach for the automatic geocoding of itineraries described in natural language. The approach takes as input a text annotated with part-of-speech and geosemantic tags. The proposed method is divided into three main steps. First, we build a complete graph in which vertices represent locations and every pair of vertices is connected by an undirected edge, and we assign a weight to each edge using a multi-criteria analysis approach. Second, we compute a minimum spanning tree to obtain an undirected acyclic graph connecting all vertices. Finally, we transform this graph into a partially directed acyclic graph in order to identify the sequence of waypoints and build an approximation of a plausible footprint of the described itinerary. The rationale of the proposed approach has been verified with a set of experiments on a corpus of hiking descriptions.
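The three steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the waypoint names and coordinates are hypothetical, plain Euclidean distance stands in for the multi-criteria edge weight, Kruskal's algorithm is one of several possible ways to obtain a minimum spanning tree, and the tree is fully oriented away from an assumed starting waypoint, which is a simplification of the partially directed graph mentioned in the abstract.

```python
import math
from itertools import combinations


def build_complete_graph(points):
    """Return weighted undirected edges (weight, u, v) for every pair of locations."""
    edges = []
    for u, v in combinations(sorted(points), 2):
        # Assumption: Euclidean distance replaces the multi-criteria weight.
        weight = math.dist(points[u], points[v])
        edges.append((weight, u, v))
    return edges


def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm with a small union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the set representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


def orient_from(tree, start):
    """Direct every tree edge away from the assumed start waypoint."""
    adjacency = {}
    for u, v, _ in tree:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    directed, seen, stack = [], {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in sorted(adjacency.get(node, [])):
            if neighbour not in seen:
                seen.add(neighbour)
                directed.append((node, neighbour))
                stack.append(neighbour)
    return directed


# Hypothetical waypoints with planar coordinates (illustration only).
waypoints = {
    "car_park": (0.0, 0.0),
    "hut": (1.0, 0.0),
    "lake": (2.0, 0.0),
    "summit": (2.0, 1.0),
}
tree = minimum_spanning_tree(waypoints, build_complete_graph(waypoints))
route = orient_from(tree, "car_park")
```

In this toy setting `route` lists the directed edges of the tree starting at `car_park`. In the approach summarised above, the edge weights and the direction of travel would instead be derived from the annotated text.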
This paper proposes an approach for the reconstruction of itineraries extracted from narrative texts. The approach is divided into two main tasks. The first extracts geographical information from hiking descriptions using natural language processing; its outputs are annotations of so-called expanded entities and of expressions of displacement or perception. The second task uses these outputs to compute a minimum spanning tree in order to reconstruct a plausible footprint of the itinerary described in the text.
Fictive motion (e.g. 'The highway runs along the coast') is a pervasive phenomenon in language that can imply both a static and a moving observer. In a corpus of alpine narratives, it is used in three types of spatial descriptions: conveying the actual motion of the observer, describing a vista, and communicating encyclopaedic spatial knowledge. This study takes a knowledge-based approach to develop rules for the automated extraction and classification of these types based on an annotated corpus of fictive motion instances. In particular, we identify the differences in the sets of concepts involved in the production of the three types of descriptions, followed by their linguistic operationalization. Based on that, we build a set of rules that classify fictive motion with an overall precision of 0.87 and recall of 0.71. The article highlights the importance of examining spatially rich, naturally occurring corpora for lines of work dealing with the automated interpretation of spatial information in texts and, more broadly, for the investigation of the spatial language involved in various types of spatial discourse.