Automatic gazetteer enrichment with user-geocoded data

Gelernter, Judith; Ganesh, Gautam; Krishnakumar, Hamsini; Zhang, Wei

doi:10.1145/2534732.2534736

Cited by 19 publications

(11 citation statements)

References 29 publications

“…The unsupervised method surpasses the supervised method when the overlap ratio is less than 60% (when the overlap ratio is at 0.6, CHF still outperforms CustomAdaptive with a 1% margin). This observation confirms that the unsupervised technique, namely CHF, can handle unknown data better than the supervised method, namely Adaptive (CustomAdaptive implementation).…”

Footnotes from the quoted passage: (15) Among the datasets used in the TopoCluster paper [8], LGL is the only dataset to which we have access. (16) Mean error distance for TopoCluster in LGL is derived from the original paper [8].

Section: Unseen Data Analysis (supporting)

confidence: 63%

“…Provided that it co-occurred with either Alberta or Canada, we can pinpoint it (i.e., the city of Edmonton located in Canada). For each toponym t_i, the preliminary disambiguation measures a score for each interpretation l_{i,j} (lines 8-13) and picks the interpretation with maximum score (lines 14-15), and in case of tie, the most populous interpretation is selected (lines 16-18). The score is acquired by finding the maximum similarity between l_{i,j} mentions and its ancestors' mentions; similarity here is the inverse of term distance (line 11), as used by Yu and Rafiei [37].…”

Section: Algorithm 1: Preliminary Toponym Disambiguation in CBH (mentioning)

confidence: 99%
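
The preliminary disambiguation step quoted above can be illustrated with a minimal sketch. It is not the citing authors' implementation: the `Interpretation` record, the `mentions` mapping from place names to token positions, and the population figures are all hypothetical and chosen only for illustration. The sketch assumes that "term distance" means the absolute difference between token positions, and that an interpretation with no mentioned ancestor scores zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One candidate referent for a toponym (hypothetical record layout)."""
    name: str
    ancestors: tuple[str, ...]  # enclosing regions, e.g. province and country
    population: int             # illustrative placeholder values only


def score(interp: Interpretation,
          toponym_positions: list[int],
          mentions: dict[str, list[int]]) -> float:
    """Maximum similarity between the toponym's mentions and mentions of
    the interpretation's ancestors; similarity = 1 / term distance."""
    best = 0.0
    for ancestor in interp.ancestors:
        for q in mentions.get(ancestor, []):
            for p in toponym_positions:
                if p != q:  # avoid division by zero
                    best = max(best, 1.0 / abs(p - q))
    return best


def disambiguate(mentions: dict[str, list[int]],
                 candidates: dict[str, list[Interpretation]]
                 ) -> dict[str, Interpretation]:
    """Pick the highest-scoring interpretation for each toponym; on a tie,
    prefer the most populous interpretation."""
    resolved = {}
    for toponym, interps in candidates.items():
        if not interps:
            continue
        positions = mentions.get(toponym, [])
        resolved[toponym] = max(
            interps,
            key=lambda c: (score(c, positions, mentions), c.population),
        )
    return resolved


# Hypothetical gazetteer entries; populations are rough placeholders.
CANDIDATES = {
    "London": [
        Interpretation("London, England", ("England", "United Kingdom"), 8_000_000),
        Interpretation("London, Ontario", ("Ontario", "Canada"), 400_000),
    ],
}

if __name__ == "__main__":
    # "London" at token 0 and "Ontario" at token 2 in the same document.
    result = disambiguate({"London": [0], "Ontario": [2]}, CANDIDATES)
    print(result["London"].name)
```

Sorting on the pair (score, population) folds the tie-break into a single comparison: population only matters when two interpretations have equal scores, for example when no ancestor of either is mentioned in the document.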

A Coherent Unsupervised Model for Toponym Resolution

Kamalloo

Rafiei

2018

Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18

Toponym Resolution, the task of assigning a location mention in a document to a geographic referent (i.e., latitude/longitude), plays a pivotal role in analyzing location-aware content. However, the ambiguities of natural language and a huge number of possible interpretations for toponyms constitute insurmountable hurdles for this task. In this paper, we study the problem of toponym resolution with no additional information other than a gazetteer and no training data. We demonstrate that a dearth of large enough annotated data makes supervised methods less capable of generalizing. Our proposed method estimates the geographic scope of documents and leverages the connections between nearby place names as evidence to resolve toponyms. We explore the interactions between multiple interpretations of mentions and the relationships between different toponyms in a document to build a model that finds the most coherent resolution. Our model is evaluated on three news corpora, two from the literature and one collected and annotated by us; then, we compare our methods to the state-of-the-art unsupervised and supervised techniques. We also examine three commercial products including Reuters OpenCalais, Yahoo! YQL Placemaker, and Google Cloud Natural Language API. The evaluation shows that our method outperforms the unsupervised technique as well as Reuters OpenCalais and Google Cloud Natural Language API on all three corpora; also, our method shows a performance close to that of the state-of-the-art supervised method and outperforms it when the test data has 40% or more toponyms that are not seen in the training data.

“…While such a platform is useful, it can be challenging to constantly encourage people to contribute, especially over a long time period. In another study, Gelernter et al (2013) proposed a matching algorithm which can compare the tags in OpenStreetMap and Wikimapia with the place entries in a gazetteer, and can add the place information that are not contained in a gazetteer. Our work aligns with the general direction of these two studies, but utilizes geotagged housing advertisements posted on local-oriented websites for harvesting local place names.…”

Section: Related Work (mentioning)

confidence: 99%

A natural language processing and geospatial clustering framework for harvesting local place names from geotagged housing advertisements

Mao

McKenzie

2018

International Journal of Geographical Information Science

Local place names are frequently used by residents living in a geographic region. Such place names may not be recorded in existing gazetteers, due to their vernacular nature, relative insignificance to a gazetteer covering a large area (e.g., the entire world), recent establishment (e.g., the name of a newly-opened shopping center), or other reasons. While not always recorded, local place names play important roles in many applications, from supporting public participation in urban planning to locating victims in disaster response. In this paper, we propose a computational framework for harvesting local place names from geotagged housing advertisements. We make use of those advertisements posted on local-oriented websites, such as Craigslist, where local place names are often mentioned. The proposed framework consists of two stages: natural language processing (NLP) and geospatial clustering. The NLP stage examines the textual content of housing advertisements, and extracts place name candidates. The geospatial stage focuses on the coordinates associated with the extracted place name candidates, and performs multi-scale geospatial clustering to filter out the non-place names. We evaluate our framework by comparing its performance with those of six baselines. We also compare our result with four existing gazetteers to demonstrate the not-yet-recorded local place names discovered by our framework.

“…A common intuition is that users often mention places that are near their current location. Several approaches have been presented to automatically geolocate non-geotagged textual clips using textual content [5,7,11,23,29]. Most of these methods rely on a training phase, during which they construct language models, in order to probabilistically infer the location of unseen messages.…”

Section: Related Work (mentioning)

confidence: 99%

A Pipeline for Measuring Brand Loyalty Through Social Media Mining

Samoaa

Catania

2021

SOFSEM 2021: Theory and Practice of Computer Science

Enhancing customer relationships through social media is an area of high relevance for companies. To this aim, Social Business Intelligence (SBI) plays a crucial role by supporting companies in combining corporate data with user-generated content, usually available as textual clips on social media. Unfortunately, SBI research is often constrained by the lack of publicly-available, real-world data for experimental activities. In this paper, we describe our experience in extracting social data and processing them through an enrichment pipeline for brand analysis. As a first step, we collect texts from social media and we annotate them based on predefined metrics for brand analysis, using features such as sentiment and geolocation. Annotations rely on various learning and natural language processing approaches, including deep learning and geographical ontologies. Structured data obtained from the annotation process are then stored in a distributed data warehouse for further analysis. Preliminary results, obtained from the analysis of three well known ICT brands, using data gathered from Twitter, news portals, and Amazon product reviews, show that different evaluation metrics can lead to different outcomes, indicating that no single metric is dominant for all brand analysis use cases.
