Summary

Location plays a fundamental role in human cognition and communication, particularly in this era of social media, in which current communication technology lets people communicate at any time and from anywhere. The opportunity to communicate through text messages or online social media such as Twitter, blogs or Facebook allows people, for instance, to share travel experiences.

Geographic Information Systems are well equipped to handle locational information expressed as longitude and latitude; however, they cannot convert the geographic information present in text into useful map information. Such textual location information has been recorded for over 250 years: over a billion biological specimens have been collected, each providing information about its collection locality, but not always coordinates. This is one of the barriers to using these descriptions for spatial analysis. With some interpretive effort, one might be able to understand and geocode these locations. Geocoding from textual descriptions is important because it makes it possible to address textual ambiguity and, once the geocoding is done, no other geographic identifier is required.

In a broader perspective, the research project reported here aims to understand how humans communicate location information using semi-structured text and how technology can aid in understanding and spatially representing it. For this purpose, real-world data from the published Ornithological Gazetteer of Brazil was used. In this gazetteer, localities are described using a number of statements that can be interpreted as spatial hints as to position. We identify those hints and their components, which need to be extracted and stored in a structured format. To do so, techniques of natural language processing and information extraction are used to understand the syntactic structure of the descriptions, based on which extraction patterns are developed per hint type.
Upon extraction, these hints are translated to spatial representations. Some hints allow us to represent crisp boundaries as vector representations, whereas others are represented using a probability raster approach. Using these two representation types, hints were converted into their relevant spatial representations and, for each entry description, these were combined to derive the common area where the locality at hand is expected to fall. By carrying out this methodology for those entries with available geocodes, we are able to evaluate the accuracy of our results for this gazetteer. The approach presented in the thesis is generic and can be applied to other similar text sources.
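
The idea of combining a crisp hint with a probability raster can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the grid of integer cells, the Gaussian falloff, the cell-wise product used as the combination rule, and all function names (`crisp_mask`, `distance_raster`, `combine`, `most_likely_cell`) are assumptions made purely for illustration.

```python
import math

def crisp_mask(width, height, box):
    """Crisp hint (assumed form): 1.0 inside the box (x0, y0, x1, y1), else 0.0."""
    x0, y0, x1, y1 = box
    return [[1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0
             for x in range(width)] for y in range(height)]

def distance_raster(width, height, ref, dist, spread):
    """Fuzzy hint (assumed form): 'about `dist` cells from `ref`'.

    Scores each cell with a Gaussian falloff around the ring at that distance.
    """
    rx, ry = ref
    return [[math.exp(-((math.hypot(x - rx, y - ry) - dist) ** 2) / (2 * spread ** 2))
             for x in range(width)] for y in range(height)]

def combine(rasters):
    """Combine hints by a cell-wise product, normalised to sum to 1 (an assumption)."""
    height, width = len(rasters[0]), len(rasters[0][0])
    out = [[1.0] * width for _ in range(height)]
    for raster in rasters:
        for y in range(height):
            for x in range(width):
                out[y][x] *= raster[y][x]
    total = sum(map(sum, out))
    if total == 0:
        raise ValueError("no cell satisfies all hints")
    return [[value / total for value in row] for row in out]

def most_likely_cell(raster):
    """Return the (x, y) cell with the highest combined score."""
    cells = ((x, y) for y in range(len(raster)) for x in range(len(raster[0])))
    return max(cells, key=lambda c: raster[c[1]][c[0]])

# Hypothetical entry with two hints on a 20 x 20 grid:
# "in the eastern half" and "about 8 cells from a town at (5, 10)".
mask = crisp_mask(20, 20, (10, 0, 19, 19))
ring = distance_raster(20, 20, ref=(5, 10), dist=8, spread=1.5)
combined = combine([mask, ring])
print(most_likely_cell(combined))  # (13, 10)
```

The product rule treats the hints as independent evidence, so a cell excluded by the crisp hint stays excluded regardless of the fuzzy hint. Other combination rules are possible, and a real application would work in geographic coordinates rather than abstract grid cells.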
Samenvatting

Location plays an important role in human cognition and communication, especially in the present era, in which social media give people the freedom to communicate at any moment and from any location. This ability to communicate through short text messages or via social media such as Twitter, blogs or Facebook enables people, for example, to ... travel experiences ...