Information sources on the Internet (e.g., Web versions of newspapers) usually have an implicit spatial reader scope, which is the geographical location for which the content has been primarily produced. Knowledge of the spatial reader scope facilitates the construction of a news search engine that provides readers with a set of news sources relevant to the location in which they are interested. In particular, it plays an important role in disambiguating toponyms (i.e., textual specifications of geographical locations) in news articles, as the process of selecting an interpretation for a toponym often reduces to selecting the interpretation that seems natural in the context of the spatial reader scope. The key to determining the spatial reader scope of news sources is the notion of a local lexicon, which for a location s is a set of concepts such as, but not limited to, names of people, landmarks, and historical events that are spatially related to s. Techniques to automatically generate the local lexicon of a location by using the link structure of Wikipedia are described and evaluated. A key contribution is the improvement of existing methods from the semantic relatedness domain to extract concepts spatially related to a given location from Wikipedia. Results of experiments are presented that indicate that knowledge of the spatial reader scope significantly improves the disambiguation of textually specified locations in news articles and that using local lexicons is an effective method to determine the spatial reader scopes of news sources.
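The core idea can be illustrated with a minimal sketch. The data structures and the `disambiguate` function below are hypothetical, not the paper's implementation: given a toponym's candidate interpretations, prefer the one whose enclosing region matches the spatial reader scope of the news source.

```python
# Illustrative sketch (hypothetical names, not the paper's implementation):
# resolve an ambiguous toponym by preferring the candidate interpretation
# that lies within the news source's spatial reader scope.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interpretation:
    name: str    # full, unambiguous place name
    region: str  # enclosing region used as the reader-scope key

def disambiguate(candidates, reader_scope):
    """Return the interpretation whose region matches the spatial reader
    scope; fall back to the first candidate when none matches."""
    for cand in candidates:
        if cand.region == reader_scope:
            return cand
    return candidates[0]

candidates = [
    Interpretation("Paris, France", "France"),
    Interpretation("Paris, Texas", "Texas"),
]
# A Texas-scoped news source resolves "Paris" to the local interpretation.
print(disambiguate(candidates, "Texas").name)  # Paris, Texas
```

In the paper this scope-sensitive preference is informed by the local lexicon; the sketch only shows the selection step in isolation.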
The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Unlike unstructured texts, tables usually favour the automatic extraction of data because of their regular structure and properties. The data extraction is usually complemented by the annotation of the table, which determines its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, we describe an algorithm that identifies the rows of a table that contain information on entities of specific types (e.g., restaurant, museum, theatre) derived from an ontology and determines the cells in which the names of those entities occur. We implemented this algorithm while developing a faceted browser over a repository of RDF data on points of interest of cities that we extracted from Google Fusion Tables. We claim that our algorithm complements the existing approaches, which annotate entities in a table based on a pre-compiled reference catalogue that lists the types of a finite set of entities; as a result, those approaches are unable to discover and annotate entities that do not belong to the reference catalogue. Instead, we train our algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
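The row-and-cell identification step can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's trained algorithm: `type_evidence` stands in for whatever classifier or Web-lookup scores a cell's support for the target type, and the keyword cues are invented for the example.

```python
# Hypothetical sketch: flag table rows that likely describe an entity of a
# target type, and the cell holding the entity name, using a pluggable
# type-evidence function (a toy keyword lookup stands in for the trained
# component described in the paper).

def annotate_rows(table, type_evidence):
    """Return (row_index, col_index) pairs for cells judged to hold an
    entity name of the target type."""
    hits = []
    for i, row in enumerate(table):
        scores = [type_evidence(cell) for cell in row]
        best = max(range(len(row)), key=scores.__getitem__)
        if scores[best] > 0:
            hits.append((i, best))
    return hits

# Toy evidence for type "restaurant": count cue words occurring in the cell.
cues = {"trattoria", "bistro", "diner"}
evidence = lambda cell: sum(word in cues for word in cell.lower().split())

table = [["Luigi's Trattoria", "Rome"], ["Opening hours", "9-17"]]
print(annotate_rows(table, evidence))  # [(0, 0)]
```

The point of the sketch is the interface: row/cell discovery is decoupled from the evidence source, so a catalogue lookup could be swapped for Web-based evidence on previously unseen entities.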
Social Networking Sites, such as Facebook and LinkedIn, are clear examples of the impact that the Web 2.0 has on people around the world, because they target an aspect of life that is extremely important to anyone: social relationships. The key to building a social network is the ability to find people whom we know in real life, which, in turn, requires those people to make some personal information publicly available, such as their names, family names, locations and birth dates. However, it is not uncommon for individuals to create multiple profiles in several social networks, each containing partially overlapping sets of personal information. Matching those different profiles makes it possible to create a global profile that gives a holistic view of the information of an individual. In this paper, we present an algorithm that uses the network topology and the publicly available personal information to iteratively match profiles across n social networks, based on those individuals who disclose the links to their multiple profiles. The evaluation results, obtained on a real dataset composed of around 2 million profiles, show that our algorithm achieves high accuracy.
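The iterative idea can be sketched in a few lines. All names and the matching rule below are illustrative assumptions, not the paper's algorithm: seed matches come from users who disclose links to their other accounts, and further pairs are matched when their attributes agree and their neighbourhoods already contain matched pairs.

```python
# Hypothetical sketch of iterative cross-network profile matching:
# grow a set of matched (a, b) profile pairs from disclosed-link seeds,
# using attribute agreement plus already-matched friends as evidence.

def iterative_match(profiles_a, profiles_b, friends_a, friends_b, seeds):
    """profiles_*: id -> attribute dict; friends_*: id -> set of friend ids.
    Returns the grown set of matched (a, b) pairs."""
    matched = set(seeds)
    changed = True
    while changed:
        changed = False
        for a, attrs_a in profiles_a.items():
            for b, attrs_b in profiles_b.items():
                if (a, b) in matched:
                    continue
                same_name = attrs_a.get("name") == attrs_b.get("name")
                # Cross-network friend pairs; any already-matched pair
                # among them supports matching (a, b).
                pairs = {(fa, fb) for fa in friends_a[a] for fb in friends_b[b]}
                if same_name and pairs & matched:
                    matched.add((a, b))
                    changed = True
    return matched

profiles_a = {"a1": {"name": "Ann"}, "a2": {"name": "Bob"}}
profiles_b = {"b1": {"name": "Ann"}, "b2": {"name": "Bob"}}
friends_a = {"a1": {"a2"}, "a2": {"a1"}}
friends_b = {"b1": {"b2"}, "b2": {"b1"}}
seeds = {("a1", "b1")}  # Ann disclosed the link between her two profiles
print(sorted(iterative_match(profiles_a, profiles_b,
                             friends_a, friends_b, seeds)))
# [('a1', 'b1'), ('a2', 'b2')]
```

Here Bob's profiles are matched only because a matched pair (Ann's) already exists in their neighbourhoods, which conveys the iterative, topology-driven propagation at the heart of the approach.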
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.