A visual approach for video geocoding using bag-of-scenes

Penatti, Otávio Augusto Bizetto; Li, Lin Tzy; Almeida, Jurandy; Torres, Ricardo da Silva

doi:10.1145/2324796.2324857

Cited by 29 publications

(34 citation statements)

References 33 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…A similar conclusion follows from the multimodal approach demonstrated and evaluated in [11]. In contrast, [24] discusses method using only visual information. A novel high-level representation for videos, called bag-of-scenes, is proposed.…”

Section: Finding Locations Of Resourcessupporting

confidence: 58%

Georeferencing Flickr resources based on textual meta-data

Laere

Schockaert

Dhoedt

2013

Information Sciences

View full text Add to dashboard Cite

The task of automatically estimating the location of web resources is of central importance in location-based services on the Web. Much attention has been focused on Flickr photos and videos, for which it was found that language modeling approaches are particularly suitable. In particular, state-of-the art systems for georeferencing Flickr photos tend to cluster the locations on Earth in a relatively small set of disjoint regions, apply feature selection to identify location-relevant tags, then use a form of text classification to identify which area is most likely to contain the true location of the resource, and finally attempt to find an appropriate location within the identified area. In this paper, we present a systematic discussion of each of the aforementioned components, based on the lessons we have learned from participating in the 2010 and 2011 editions of MediaEval's Placing Task. Extensive experimental results allow us to analyze why certain methods work well on this task and show that a median error of just over 1 kilometer can be achieved on a standard benchmark test set.

show abstract

Section: Finding Locations Of Resourcessupporting

confidence: 58%

Georeferencing Flickr resources based on textual meta-data

Laere

Schockaert

Dhoedt

2013

Information Sciences

View full text Add to dashboard Cite

show abstract

“…Papers included the work of Claudia Hauff at TU Delft on exploiting information extracted from microblog posts (i.e., Twitter) and geographical priors [19,20]. Also, ICSI developed a graph-based approach aimed at dealing with data sparsity [6], and UniCamp extended their approach [46]. Olivier Van Laere and colleagues from the University of Ghent published a systematic overview of their approaches to the MediaEval 2010 and 2011 Placing Tasks as [62].…”

Section: Results and Insightsmentioning

confidence: 99%

The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia

Larson

Kelm²,

Rae

et al. 2014

Multimodal Location Estimation of Videos and Images

View full text Add to dashboard Cite

show abstract

“…At its core, the geolocation problem is about assigning coordinates to an unannotated image or a video [8]. Hays and Efros [5] showed that a simple scene matching approach can achieve respectable performance.…”

Section: A Geolocationmentioning

confidence: 99%

“…These clusters are either done on the location information [7], [8], [9] or on the visual features [6], using such methods as k-means or meanshift. In this paper we propose a new on-line unsupervised clustering algorithm, called Location Aware Self-Organizing Map (LASOM), that can be used for estimating the density distribution of one variable conditioned on another.…”

Section: Introductionmentioning

confidence: 99%

LASOM: Location Aware Self-Organizing Map for discovering similar and unique visual features of geographical locations

Kit

Kong

2014

2014 International Joint Conference on Neural Networks (IJCNN)

View full text Add to dashboard Cite

Can a machine tell us if an image was taken in Beijing or New York? Automated identification of the geographical coordinates based on image content is of particular importance to data mining systems, because geolocation provides a large source of context for other useful features of an image. However, successful localization of unannotated images requires a large collection of images that cover all possible locations. Brute-force searches over the entire databases are costly in terms of computation and storage requirements, and achieve limited results. Knowing what visual features make a particular location unique or similar to other locations can be used for choosing a better match between spatially distance locations. However, doing this at global scales is a challenging problem. In this paper we propose an on-line, unsupervised, clustering algorithm called Location Aware Self-Organizing Map (LASOM), for learning the similarity graph between different regions. The goal of LASOM is to select key features in specific locations so as to increase the accuracy in geotagging untagged images, while also reducing computational and storage requirements. Different from other Self-Organizing Map algorithms, LASOM provides the means to learn a conditional distribution of visual features, conditioned on geospatial coordinates. We demonstrate that the generated map not only preserves important visual information, but provides additional context in the form of visual similarity relationships between different geographical areas. We show how this information can be used to improve geotagging results when using large databases.

show abstract

A visual approach for video geocoding using bag-of-scenes

Cited by 29 publications

References 33 publications

Georeferencing Flickr resources based on textual meta-data

Georeferencing Flickr resources based on textual meta-data

The Benchmark as a Research Catalyst: Charting the Progress of Geo-prediction for Social Multimedia

LASOM: Location Aware Self-Organizing Map for discovering similar and unique visual features of geographical locations

Contact Info

Product

Resources

About