Nowadays there is an increasing volume of heterogeneous documents in a company, in different media such as texts, spreadsheets, images, maps, etc. Most of these documents contain relevant information about the company; hence they need to be stored, indexed and retrieved efficiently. Digital libraries aim to address this issue by providing an infrastructure which enables to manipulate these documents. Furthermore, the use of geographical location of such documents has become more attractive recently, especially due to the dissemination of GPS units in many devices. Thus, these documents may have spatial footprints which allow for the enhancement of the quality of search in such digital libraries. This paper presents Freebie, an open source digital library, which provides both textual and spatial indexing on underlying documents. The spatial metadata may be used regarding the subject and author location; and the place of the publication.
It is well known that documents available on the Web are extremely heterogeneous in several aspects, such as the use of various idioms, different formats to represent the contents, besides other external factors like source reputation, refresh frequency, and so forth (Page & Brin, 1998). Altogether, these factors increase the complexity of Web information retrieval systems. Superficially, traditional search engines available on the Web nowadays consist of retrieving documents that contain keywords informed by users. Nevertheless, among the variety of search possibilities, it is evident that the user needs a process that involves more sophisticated analysis; for example, temporal or spatial contextualization might be considered. In these keyword-based search engines, for instance, a Web page containing the phrase “…due to the company arrival in London, a thousand java programming jobs will be open…” would not be found if the submitted search was “jobs programming England,” unless the word “England” appeared in another phrase of the page. The explanation to this fact is that the term “London” is treated merely like another word, instead of regarding its geographical position. In a spatial search engine, the expected behavior would be to return the page described in the previous example, since the system shall have information indicating that the term “London” refers to a city located in a country referred to by the term “England.” This result could only be feasible in a traditional search engine if the user repeatedly submitted searches for all possible England sub-regions (e.g., cities). In accordance with the example, it is reasonable that for several user searches, the most interesting results are those related to certain geographical regions. A variety of features extraction and automatic document classification techniques have been proposed, however, acquiring Web-page geographical features involves some peculiar complexities, such as ambiguity (e.g., many places with the same name, various names for a single place, things with place names, etc.). Moreover, a Web page can refer to a place that contains or is contained by the one informed in the user query, which implies knowing the different region topologies used by the system. Many features related to geographical context can be added to the process of elaborating relevance ranking for returned documents. For example, a document can be more relevant than another one if its content refers to a place closer to the user location. Nonetheless, in spatial search engines, there are more complex issues to be considered because of the spatial dimension concerning on ranking elaboration. Jones, Alani, and Tudhope (2001) propose a combination of Euclidian distance between place centroids with hierarchical distances in order to generate a hybrid spatial distance that may be used in the relevance ranking elaboration of returned documents. Further important issues are the indexing mechanisms and query processing. In general, these solutions try to combine well-known textual indexing techniques (e.g., inverted files) with spatial indexing mechanisms. On the subject of user interface, spatial search engines are more complex, because users need to choose regions of interest, as well as possible spatial relationships, in addition to keywords. To visualize the results, it is pleasant to use digital map resources besides textual information.
A Web 2.0 permite que os usuários publiquem livremente informações sobre diversos tópicos. Os setores públicos podem tirar vantagem deste fato criando canais de comunicação com a sociedade. Entretanto é necessário prover um ambiente adequado para inserir estes setores em sites de conteúdo social. Esse artigo apresenta uma abordagem baseada em ontologias para a modelagem semântica e geográfica de sites de conteúdo social para setores públicos e um protótipo construído utilizando o modelo proposto.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.