Trail route networks provide an infrastructure for touristic and recreational walking activities worldwide. They can have a variety of layouts, signage systems, and development and management patterns, involve multiple stakeholders and contributors, and tend to be shaped by various interests on different levels and by dynamically changing circumstances. This paper aims to develop the skeleton of TRAILSIGNER, a sound geospatial conceptual data model suite of trail networks, waymarked routes and their signage systems and assets, which can be used as a basis for creating an information system for the effective, organic and consistent planning, management, maintenance and presentation of trails and their signage. Such a system reduces the potential confusion, mistrust and danger for visitors caused by information mismatches, including incomplete, incoherent or inconsistent route signposting. To ensure consistency of incrementally planned signposts with each other and with the (possibly changing) underlying trail network, a systematic, set-based approach is developed using generative logical rules and incorporated into the conceptual model suite as signpost logics. The paper also defines a reference ruleset for this approach. The approach may further be generalized, personalized and adapted to other fields or applications having similar requirements or phenomena.
Entity resolution (ER) is a computationally hard problem in data integration scenarios, where database records have to be grouped according to the real-world entities they belong to. In practice, these entities may consist of only a few records from different data sources, containing typos or historical data. In other cases they may contain significantly more records, especially when we search for entities on a higher level of a concept hierarchy than individual records. In this paper we give a theoretical foundation for a variety of practically important match functions. We show that under these formulations, ER with large entities can be solved efficiently with algorithms based on MapReduce, a distributed computing paradigm. Our algorithm can efficiently incorporate probabilistic and similarity-based record matching, enabling flexible match function definition. We demonstrate the usability of our model and algorithm in a real-world insurance ER scenario, where we identify household groups of client records.
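To make the grouping task concrete, the following is a minimal single-machine sketch of entity resolution with a similarity-based match function. It is not the MapReduce algorithm of the paper: it compares all record pairs and merges matches with a union-find structure, which is only practical for small inputs. The field names (`surname`, `address`), the 0.85 threshold and the household rule are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Similarity-based comparison of two strings (threshold is an assumed value)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match(r1: dict, r2: dict) -> bool:
    """Hypothetical household rule: surname and address must both be similar."""
    return similar(r1["surname"], r2["surname"]) and similar(r1["address"], r2["address"])


def resolve(records, match_fn):
    """Group record indices into entities: the transitive closure of match_fn."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison: quadratic in the number of records.
    for i, j in combinations(range(len(records)), 2):
        if match_fn(records[i], records[j]):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `resolve(records, match)` on a list of client records returns lists of record indices, one list per resolved entity. A record that matches no other record forms a singleton group.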