Proceedings of the 2005 Workshop on Geographic Information Retrieval 2005
DOI: 10.1145/1096985.1096992

Extracting metadata for spatially-aware information retrieval on the internet

Abstract: This paper presents methods used to extract geospatial information from web pages for use in SPIRIT, a new Geographic Information Retrieval (GIR) system for the web. The resulting geospatial markup tools have been used to annotate around 900,000 web pages taken from a 1TB web crawl, focused on regions in the UK, France, Germany and Switzerland. This paper discusses a versatile geo-parsing tool for extracting spatial metadata based upon the GATE Information Extraction (IE) system, and a simple geo-coding progra…

Citations: cited by 62 publications (46 citation statements)
References: 12 publications
“…The primary reason is simple - the number of georeferenced documents is relatively small - the SPIRIT collection for the whole of the UK, for example, consists of only around 340,000 documents. Further, as discussed in Clough (2005) not all documents are correctly georeferenced (around 90% in the experiments reported by Clough).…”
Section: Query (citation type: mentioning)
confidence: 89%
“…We used a default sense approach and global geographical world knowledge to resolve ambiguity based on features from the geographical resources available to us. Again, based on the evaluation described in Clough (2005), we were able to ground correctly around 89% of all place names. Locations were assigned an appropriate bounding box, representing a spatial extent derived from polygonal data stored in the geo-ontology, since the overheads associated with passing polygons through the system were too high.…”
Section: Assigning Spatial Footprints To Web Documents (citation type: mentioning)
confidence: 97%
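
The citing passage above describes replacing polygonal footprints with bounding boxes to reduce overhead. A minimal sketch of that reduction follows, assuming a polygon is given as a list of (longitude, latitude) vertices; the function name `bounding_box` and the sample polygon are hypothetical illustrations and are not taken from SPIRIT or its geo-ontology.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (longitude, latitude)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(polygon: List[Point]) -> BBox:
    """Return the axis-aligned box enclosing every vertex of the polygon."""
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


# Hypothetical footprint polygon (illustrative coordinates, not real data).
footprint = [(-1.6, 53.3), (-1.3, 53.35), (-1.35, 53.5), (-1.55, 53.45)]
box = bounding_box(footprint)
print(box)
```

The box is cheaper to store and compare than the full polygon, at the cost of covering some area that lies outside the original shape.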
“…The General Architecture for Text Engineering (GATE) system [5] was used to extract geographical information (as used in previous work [6]). The Ordnance Survey (OS) 50k Gazetteer and Locator resources were used to assist extraction of locations and assignment of spatial coordinates.…”
Section: Processing Content (citation type: mentioning)
confidence: 99%
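
The citing passage above mentions gazetteer resources assisting the extraction of locations and the assignment of coordinates. The sketch below illustrates the general idea with a plain dictionary lookup using longest-match-first; it is a simplified stand-in, not GATE's API and not the OS gazetteer. The names `TOY_GAZETTEER` and `find_places` are hypothetical, and the coordinates are rough illustrative values.

```python
import re
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)

# Hypothetical toy gazetteer: lower-cased name -> approximate coordinates.
TOY_GAZETTEER: Dict[str, Coord] = {
    "sheffield": (53.38, -1.47),
    "cardiff": (51.48, -3.18),
    "milton keynes": (52.04, -0.76),
}


def find_places(text: str, gazetteer: Dict[str, Coord],
                max_words: int = 3) -> List[Tuple[str, Coord]]:
    """Scan the text left to right, preferring the longest gazetteer match."""
    tokens = re.findall(r"[A-Za-z]+", text)
    matches: List[Tuple[str, Coord]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in gazetteer:
                matches.append((phrase, gazetteer[phrase.lower()]))
                i += n
                break
        else:
            i += 1  # no match starts at this token
    return matches


found = find_places("Trains run from Sheffield to Milton Keynes daily.",
                    TOY_GAZETTEER)
print(found)
```

A lookup like this does not resolve ambiguous names (several places sharing one name); that step would need additional rules such as the default-sense approach mentioned in the citation statements.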