In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text is typically unrelated to the main content, may degrade search precision, and thus needs to be detected properly. In this paper, we analyze a small set of shallow text features for classifying the individual text elements in a Web page. We compare the approach to complex, state-of-the-art techniques and show that competitive accuracy can be achieved at almost no cost. Moreover, we derive a simple and plausible stochastic model for describing the boilerplate creation process. With the help of our model, we also quantify the impact of boilerplate removal on retrieval performance and show significant improvements over the baseline. Finally, we extend the principled approach with straightforward heuristics, achieving remarkable accuracy.
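As a rough illustration of classifying text elements by shallow features, the sketch below computes two such features for a text block, its word count and its link density, and applies a threshold rule. The feature choice, the function names `shallow_features` and `looks_like_boilerplate`, and both thresholds are assumptions made for this example; the abstract does not state which features or cut-offs the paper uses.

```python
def shallow_features(text, linked_text=""):
    """Compute two shallow features for one text block.

    text        -- all visible text of the block
    linked_text -- the part of that text that sits inside hyperlinks
    """
    num_words = len(text.split())
    num_linked = len(linked_text.split())
    link_density = num_linked / num_words if num_words else 0.0
    return {"num_words": num_words, "link_density": link_density}


def looks_like_boilerplate(features, min_words=10, max_link_density=0.33):
    """Hypothetical rule: short or link-heavy blocks count as boilerplate.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return (features["num_words"] < min_words
            or features["link_density"] > max_link_density)


nav = shallow_features("Home About Contact", linked_text="Home About Contact")
body = shallow_features(
    "Boilerplate detection separates the main article text "
    "from navigation menus and page templates"
)
print(looks_like_boilerplate(nav), looks_like_boilerplate(body))
```

In practice such thresholds would be learned from labelled pages rather than fixed by hand.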
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data impose new challenges on entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and on schemata known a priori. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient.
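One simple way to build blocks without relying on attribute names is token blocking: every token that appears in any attribute value defines a block, and only entities sharing a block are compared. The sketch below assumes this scheme for illustration; the abstract does not confirm that it is the mechanism used, and the names `token_blocks` and `candidate_pairs` as well as the sample entities are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocks(entities):
    """Map each token to the ids of entities whose values contain it.

    Attribute names are ignored, so differing schemata do not matter.
    """
    blocks = defaultdict(set)
    for eid, attrs in entities.items():
        for value in attrs.values():
            for tok in re.findall(r"\w+", str(value).lower()):
                blocks[tok].add(eid)
    # Blocks with a single entity produce no comparisons; drop them.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect distinct entity pairs, visiting the smallest blocks first."""
    pairs = set()
    for tok in sorted(blocks, key=lambda t: len(blocks[t])):
        for a, b in combinations(sorted(blocks[tok]), 2):
            pairs.add((a, b))
    return pairs


entities = {
    "e1": {"name": "Jane Doe", "city": "Hannover"},
    "e2": {"fullName": "Doe, Jane", "affiliation": "L3S"},
    "e3": {"title": "Web Ontology Language"},
}
pairs = candidate_pairs(token_blocks(entities))
print(pairs)
```

Here only `e1` and `e2` share tokens, so a single comparison replaces the three that an exhaustive pairwise check would need.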
Web services have the potential to enhance B2B e-commerce over the Internet by allowing companies and organizations to publish their business processes on service directories where potential trading partners can find them. This can give rise to new business paradigms based on ad-hoc trading relations, as companies, particularly small and medium-sized ones, can cheaply and flexibly enter into fruitful contracts, e.g., through subcontracting from big companies, simply by publishing their business processes and the services they offer. However, more business process support by the web service infrastructure is needed before such a paradigm change can materialize. A service for searching and matchmaking of business processes does not yet exist in the current infrastructure. We believe that such a service is needed and will enable companies and organizations to establish ad-hoc business relations without relying on manually negotiated interorganizational workflows. This paper gives a formal semantics to business process matchmaking based on finite state automata extended by logical expressions associated with states.
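To give a feel for automaton-based matchmaking, the sketch below treats two business processes as deterministic finite automata over message names and checks whether they share at least one complete message sequence, by exploring the product automaton. This is a simplification under stated assumptions: the matching criterion (non-empty intersection), the dictionary encoding, and the name `processes_match` are chosen for this example, and the logical expressions attached to states in the paper are not modelled.

```python
from collections import deque


def processes_match(dfa_a, dfa_b):
    """Return True if both automata accept at least one common sequence.

    Each automaton is a dict with keys:
      "start"  -- initial state
      "accept" -- set of accepting states
      "delta"  -- {state: {message: next_state}}
    """
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        sa, sb = queue.popleft()
        if sa in dfa_a["accept"] and sb in dfa_b["accept"]:
            return True
        moves_a = dfa_a["delta"].get(sa, {})
        moves_b = dfa_b["delta"].get(sb, {})
        # Only messages that both sides can handle advance the product state.
        for msg in moves_a.keys() & moves_b.keys():
            nxt = (moves_a[msg], moves_b[msg])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


buyer = {
    "start": 0,
    "accept": {2},
    "delta": {0: {"order": 1}, 1: {"invoice": 2, "cancel": 2}},
}
seller = {
    "start": 0,
    "accept": {2},
    "delta": {0: {"order": 1}, 1: {"invoice": 2}},
}
quote_only = {"start": 0, "accept": {1}, "delta": {0: {"quote": 1}}}

print(processes_match(buyer, seller), processes_match(buyer, quote_only))
```

The buyer and seller agree on the sequence `order`, `invoice`, whereas the quote-only process shares no message with the buyer's start state.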
XML has evolved into the format of choice for exposing data over the web. Together with mature and maturing standards for querying XML (XSLT, XPath, and XQuery), the basic infrastructure for integrating multiple heterogeneous data sources is there. However, the versatility of XML as a data model and the unrestricted expressive power of XML query languages can lead to rather complex integration architectures, where low-level syntactic heterogeneities and semantic heterogeneities are overcome all at once by means of complex query expressions. This paper explores how the Web Ontology Language OWL can be used as a more abstract modelling layer on top of XML data sources described by an XML Schema, to what extent the semantic relationships provided by OWL can be used for mapping heterogeneous data sources to a common global schema, and how the inference mechanisms of OWL can be used to check the consistency of such mappings. Moreover, it introduces a query language for OWL as a natural extension of XQuery, and describes how queries against a global schema are translated to XQueries against the original data sources.
A series of thioether profragrances was prepared by reaction of different sulfanylalkanoates with δ-damascone and tested for their release efficiencies in a fabric-softener and an all-purpose cleaner application. Dynamic headspace analysis on dry cotton and on a ceramic plate revealed that the performance of the different precursors depended on the structure, but also on the particular conditions encountered in different applications. Moreover, profragrances derived from other α,β-unsaturated fragrance aldehydes and ketones were synthesized analogously and evaluated using the same test protocol. Thioethers were found to be suitable precursors to release the corresponding fragrances, but neither the quantity of profragrance deposited from an aqueous environment onto the target surface, nor the amount of fragrance released after deposition could be linearly correlated to the hydrophilicity or hydrophobicity of the compounds. Different sets of compounds were found to be the best performers for different types of applications. Only one of the compounds evaluated in the present work, namely the thiolactic acid derivative of δ-damascone, efficiently released the corresponding fragrance in both of the tested applications. Profragrance development for functional perfumery thus remains a partially empirical endeavour. More knowledge (and control) of the various application conditions is required for an efficient profragrance design.
We present an approach to determine the similarity of classes which utilizes fuzzy and incomplete terminological knowledge together with schema knowledge. We clearly distinguish between semantic similarity, determining the degree of resemblance according to real-world semantics, and structural correspondence, explaining how classes can actually be interrelated. To compute the semantic similarity, we introduce the notion of semantic relevance and apply fuzzy set theory to reason about both terminological knowledge and schema knowledge.
Developing control methods that allow legged robots to move with skill and agility remains one of the grand challenges in robotics. In order to achieve this ambitious goal, legged robots must possess a wide repertoire of motor skills. A scalable control architecture that can represent a variety of gaits in a unified manner is therefore desirable. Inspired by the motor learning principles observed in nature, we use an optimization approach to automatically discover and fine-tune parameters for agile gaits. The success of our approach is due to the controller parameterization we employ, which is compact yet flexible, therefore lending itself well to learning through repetition. We use our method to implement a flying trot, a bound and a pronking gait for StarlETH, a fully autonomous quadrupedal robot.