Jacques Fize scite author profile

Textual data is increasingly available through different media (social networks, company data, data catalogues, etc.). Because these new resources are highly heterogeneous, new information extraction methods are needed. In this article, we propose a text matching process based on spatial features and evaluate it on heterogeneous textual data. Besides being compatible with heterogeneous data, it comprises two contributions: first, spatial information is extracted for comparison purposes and stored in a dedicated spatial textual representation (STR); second, two transformations are applied to the STR to improve the spatial similarity estimation. This article outlines the proposed approach with new contributions: (i) a new geocoding method using general co-occurrences between entities, (ii) a thorough evaluation, and (iii) an in-depth discussion. The results obtained on two corpora demonstrate that good spatial matches (≈ 80% precision on major criteria) can be obtained between the most similar STRs, with further enhancement achieved via STR transformation.
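
To make the idea of comparing documents through their spatial content more concrete, here is a minimal Python sketch. It is not the authors' STR similarity measure: it assumes, purely for illustration, that a spatial representation can be reduced to a set of place names plus a set of labelled relations between them, and it swaps in a weighted Jaccard overlap as the similarity. The class `SpatialRepresentation`, the function names, the relation labels and the `entity_weight` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialRepresentation:
    """Hypothetical, simplified stand-in for a spatial representation of a text."""
    entities: frozenset   # place names found in the document
    relations: frozenset  # (place_a, relation_label, place_b) triples


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def spatial_similarity(x: SpatialRepresentation,
                       y: SpatialRepresentation,
                       entity_weight: float = 0.5) -> float:
    """Weighted mix of entity overlap and relation overlap (an assumption)."""
    return (entity_weight * jaccard(x.entities, y.entities)
            + (1.0 - entity_weight) * jaccard(x.relations, y.relations))


# Two toy documents; the relation labels are illustrative only.
doc_a = SpatialRepresentation(
    entities=frozenset({"Montpellier", "France"}),
    relations=frozenset({("Montpellier", "inclusion", "France")}),
)
doc_b = SpatialRepresentation(
    entities=frozenset({"Montpellier", "Nîmes", "France"}),
    relations=frozenset({
        ("Montpellier", "inclusion", "France"),
        ("Nîmes", "inclusion", "France"),
        ("Montpellier", "adjacency", "Nîmes"),
    }),
)

score = spatial_similarity(doc_a, doc_b)
print(f"spatial similarity: {score:.2f}")
```

Documents that mention the same places and the same relations between them score close to 1, while documents with no shared places score 0.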

Deep Learning for Toponym Resolution: Geocoding Based on Pairs of Toponyms

Fize

Moncla

Martins

2021

IJGI

Geocoding aims to assign unambiguous locations (i.e., geographic coordinates) to place names (i.e., toponyms) referenced within documents (e.g., within spreadsheet tables or textual paragraphs). This task comes with multiple challenges, such as dealing with referent ambiguity (multiple places with the same name) or the completeness of the reference database. In this work, we propose a geocoding approach based on modeling pairs of toponyms, which returns latitude-longitude coordinates. One of the input toponyms is geocoded, and the second is used as context to reduce ambiguity. The proposed approach is based on a deep neural network that uses Long Short-Term Memory (LSTM) units to produce representations from sequences of character n-grams. To train our model, we use toponym co-occurrences collected from different contexts, namely textual (i.e., co-occurrences of toponyms in Wikipedia articles) and geographical (i.e., inclusion and proximity of places based on GeoNames data). Experiments were conducted on multiple geographical areas of interest: France, the United States, Great Britain, Nigeria, Argentina and Japan. Results show that models trained with co-occurrence data obtained a higher geocoding accuracy, and that proximity relations in combination with co-occurrences can help to obtain a slightly higher accuracy in geographical areas with fewer places in the data sources.
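
As a rough illustration of the input side of such a model, the Python sketch below turns a pair of toponyms into two sequences of character n-gram ids, the kind of sequences a recurrent layer such as an LSTM could consume. It is only a sketch under stated assumptions: the function names `char_ngrams` and `encode_pair`, the lowercasing step, the n-gram size, and the per-pair vocabulary are hypothetical choices and are not taken from the paper, which would typically use a vocabulary built over the whole training set.

```python
def char_ngrams(toponym: str, n: int = 4) -> list[str]:
    """Return the overlapping character n-grams of a lowercased toponym."""
    text = toponym.lower()
    if len(text) <= n:
        # Names shorter than n are kept whole as a single token.
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode_pair(target: str, context: str, n: int = 4):
    """Encode the toponym to geocode and its context toponym as id sequences.

    The vocabulary is built from this single pair for illustration only.
    """
    grams_target = char_ngrams(target, n)
    grams_context = char_ngrams(context, n)
    vocab: dict[str, int] = {}
    for gram in grams_target + grams_context:
        # Ids start at 1 so that 0 stays free for sequence padding.
        vocab.setdefault(gram, len(vocab) + 1)
    target_ids = [vocab[g] for g in grams_target]
    context_ids = [vocab[g] for g in grams_context]
    return target_ids, context_ids, vocab


# "Paris" is the toponym to geocode; "Texas" is the disambiguating context.
target_ids, context_ids, vocab = encode_pair("Paris", "Texas", n=3)
print(char_ngrams("Paris", 3))   # ['par', 'ari', 'ris']
print(target_ids, context_ids)   # [1, 2, 3] [4, 5, 6]
```

Working at the character n-gram level means that spelling variants of a name share most of their tokens, which is one plausible reason to prefer it over whole-word tokens for place names.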

Mapping Heterogeneous Textual Data: A Multidimensional Approach Based on Spatiality and Theme

Fize

Roche

Teisseire

2019

In this paper, we propose a multidimensional mapping approach for heterogeneous textual data that exploits first the spatial dimension and then the thematic dimension. Building on the Spatial Textual Representation (STR) and the Geodict geographic database, the contribution presented in this paper integrates the thematic dimension of documents. To support our proposal for mapping textual documents, we evaluate the different aspects of the process using two real corpora, one of which is highly heterogeneous.

Geodict: an integrated gazetteer

Fize

Shrivastava

2017

Spatial representations of texts from the BVLAC and PADI-WEB corpora

Fize

2018

Données pour l'évaluation de méthodes de géocodage

Fize

2018

Grand Débat National - Données enrichies

Sautot

Chraibi

Fize

et al. 2019


Matching Heterogeneous Textual Data Using Spatial Features

Could spatial features help the matching of textual data?
