Proceedings of the 20th International Conference on Advances in Geographic Information Systems 2012
DOI: 10.1145/2424321.2424393

Multimedia multimodal geocoding

Abstract: This work is developed in the context of the placing task of the MediaEval 2011 initiative. The objective is to geocode (or geotag) a set of videos, i.e., automatically assign geographical coordinates to them. This paper presents an architecture for multimodal geocoding that exploits both visual and textual descriptions associated with videos. This work also describes our efforts regarding the implementation of this architecture to demonstrate its applicability. Conducted experiments show how our multimodal app…

Cited by 9 publications (10 citation statements)
References 17 publications
“…Figure 5 depicts the median geotagging error (relative to the number of tags) of run1, run4 and two configurations of the approach that use the full YFCC100M dataset, one combining only the language model with feature selection and the second using all of the proposed refinements. The combination of all proposed refinements appears to result in the best geotagging accuracy in almost all tag ranges, except the [6,10] range where the base language model slightly outperforms the rest. Another noteworthy fact is that using the proposed improvements on the reduced training set (5M), i.e.…”
Section: Further Performance Analysis (citation type: mentioning)
confidence: 98%
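
The "median geotagging error" mentioned in the statement above is not defined in the excerpt. As a minimal sketch, assuming the error is the great-circle distance between predicted and ground-truth coordinates and that a spherical Earth of radius 6371 km is accurate enough, it could be computed as follows. The function names `haversine_km` and `median_error_km` are hypothetical and are not taken from the cited work.

```python
import math
from statistics import median

# Assumption: mean Earth radius in kilometres (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, actual):
    """Median distance in km between paired predicted and true coordinates.

    Both arguments are equal-length sequences of (lat, lon) tuples.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    errors = [haversine_km(p[0], p[1], a[0], a[1])
              for p, a in zip(predicted, actual)]
    return median(errors)
```

The median is used here rather than the mean because a few items placed on the wrong continent would otherwise dominate the score.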
“…The second one is Hiemstra's Language Model (HLM) with re-ranking, which combined the Terrier 5 Information Retrieval (IR) engine with the HLM weighting model. In [10], Li et al. applied a combination of textual, visual and audio analysis in order to geocode the given images/videos. Further, they re-ranked items using the RL-Sim algorithm and predicted the location of the images by clustering the top-rated results.…”
Section: The Mediaeval 2014 Placing Task (citation type: mentioning)
confidence: 99%
“…Estimating the geo-location of a video can be achieved based on visual features [17,14], textual information [18], social context [21], and their combinations [12,23,20]. Considering that some textual and social metadata used in multimodal frameworks are not always available in real life, visual approaches for video geocoding are important because they infer the geo-coordinates of a video using the visual content only.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…In the most recent visual approaches for video geocoding, the Bag-of-Scenes (BoS) model has been shown to be simple and effective [17,12]. This approach first generates a dictionary of scenes, each of which represents a specific semantic concept.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Park et al. [11] aim to find the heading of a query image by utilizing Google Street View™. Li et al. [9] and Gallagher et al. [6] propose multimodal geo-tagging systems that exploit visual and textual descriptions associated with videos and photos, and show that this multimodal approach yields better results than those of a single-modal approach. However, these methods suffer from a non-uniform geographical spatial prior; crowd-sourced georeferenced images tend to be concentrated in urban areas, at popular landmarks, or around frequently traveled places.…”
Section: Introduction (citation type: mentioning)
confidence: 99%