Abstract. With the expansion of the Semantic Web and the availability of numerous ontologies that provide domain background knowledge and semantic descriptors for the data, the amount of semantic data is growing rapidly. The data mining community faces a paradigm shift: instead of mining the abundance of empirical data supported by background knowledge, the new challenge is to mine the abundance of knowledge encoded in domain ontologies, constrained by heuristics computed from the empirical data collection. We address this challenge with an approach, named semantic data mining, in which domain ontologies define the hypothesis search space and the data is used as a means of constraining and guiding the process of hypothesis search and evaluation. The use of the prototype semantic data mining systems SEGS and g-SEGS is demonstrated in a simple semantic data mining scenario and in two real-life functional genomics scenarios of mining biological ontologies with the support of experimental microarray data.
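The division of labour described above, where the ontology spans the hypothesis space and the data constrains the search, can be illustrated with a minimal sketch. Everything in it is hypothetical: the toy ontology, the annotated examples, the class labels, and the functions `ancestors`, `coverage`, `redundant` and `search`. It is not the SEGS or g-SEGS implementation. Hypotheses are conjunctions of ontology terms; an example is covered by a term if it is annotated with that term or one of its descendants, conjunctions that pair a term with one of its own ancestors are skipped as redundant, and candidates covering fewer than `min_support` examples are pruned using the data.

```python
from itertools import combinations

# Hypothetical toy ontology: each term maps to its parent (None marks the root).
ONTOLOGY = {
    "process": None,
    "metabolism": "process",
    "lipid_metabolism": "metabolism",
    "signaling": "process",
    "immune_signaling": "signaling",
}

# Hypothetical data: example id -> terms it is directly annotated with.
ANNOTATIONS = {
    "g1": {"lipid_metabolism"},
    "g2": {"lipid_metabolism", "immune_signaling"},
    "g3": {"immune_signaling"},
    "g4": {"metabolism"},
    "g5": {"signaling"},
    "g6": {"lipid_metabolism", "immune_signaling"},
}
POSITIVES = {"g1", "g2", "g6"}  # hypothetical class labels


def ancestors(term):
    """Return the term together with all of its ancestors."""
    result = set()
    while term is not None:
        result.add(term)
        term = ONTOLOGY[term]
    return result


def coverage(conj):
    """Examples annotated with every term of conj or with one of its descendants."""
    return {
        ex for ex, terms in ANNOTATIONS.items()
        if all(any(t in ancestors(a) for a in terms) for t in conj)
    }


def redundant(conj):
    """True if conj contains a term together with one of its ancestors."""
    return any(a != b and a in ancestors(b) for a in conj for b in conj)


def search(min_support=2, max_len=2):
    """Ontology terms span the hypothesis space; the data prunes and ranks it."""
    rules = []
    for length in range(1, max_len + 1):
        for conj in combinations(sorted(ONTOLOGY), length):
            if redundant(conj):
                continue
            covered = coverage(conj)
            if len(covered) < min_support:
                continue  # data-driven pruning of the ontology-defined space
            precision = len(covered & POSITIVES) / len(covered)
            rules.append((precision, len(covered), conj))
    return sorted(rules, reverse=True)
```

Here `search()` ranks rules by precision on the positive examples, then by coverage; the single term `lipid_metabolism` comes first, covering three examples, all of them positive.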
Recent research in coarse geometry revealed similarities between certain
concepts of analysis, large scale geometry, and topology. Property A of G. Yu is
the coarse analog of amenability for groups, and its generalization (exact
spaces) was later strengthened to be the large scale analog of paracompact
spaces using partitions of unity. In this paper we go deeper into uncovering
analogies between coarse amenability and paracompactness. In particular, we
define a new coarse analog of paracompactness modelled on the defining
characteristics of expanders. That analog gives an easy proof that three
classes of spaces are coarsely non-amenable: expander sequences, graph
spaces with girth approaching infinity, and unions of powers of a finite
non-trivial group. Comment: 24 pages, version 3 as a result of comments by a great referee.
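The new coarse analog is modelled on the defining characteristics of expanders. As background only, and not the construction used in the paper: a family of finite bounded-degree graphs is an expander family when its edge expansion (Cheeger constant) h(G) = min |∂S|/|S|, taken over vertex sets S with 0 < |S| ≤ |V|/2, stays bounded away from zero. The following brute-force sketch computes h(G) for tiny graphs; the helper names `edge_boundary`, `cheeger_constant`, `cycle` and `complete` are illustrative.

```python
from itertools import combinations


def edge_boundary(edges, subset):
    """Number of edges with exactly one endpoint in subset."""
    return sum((u in subset) != (v in subset) for u, v in edges)


def cheeger_constant(n, edges):
    """Brute-force edge expansion h(G) = min |dS|/|S| over 0 < |S| <= n/2."""
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            best = min(best, edge_boundary(edges, set(subset)) / size)
    return best


def cycle(n):
    """Edges of the cycle graph C_n on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    """Edges of the complete graph K_n on vertices 0..n-1."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]
```

For the cycle C_n with n even, h(C_n) = 4/n, which tends to zero, so cycles do not form an expander family. The exhaustive search is exponential in n and is meant only for very small examples.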
This paper addresses semantic data mining, a new data mining paradigm in which ontologies are exploited in the process of data mining and knowledge discovery. This paradigm is introduced together with the new semantic subgroup discovery systems SDM-search for enriched gene sets (SDM-SEGS) and SDM-Aleph. These systems are made publicly available in the new SDM-Toolkit for semantic data mining. The toolkit is implemented in the Orange4WS data mining platform, which supports the construction of knowledge discovery workflows from local and distributed data mining services. Using two publicly available biomedical datasets, the paper provides a thorough quantitative and qualitative experimental evaluation of SDM-SEGS and SDM-Aleph and compares them with SEGS, a system for enriched gene set discovery from microarray data.
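Enrichment of a candidate gene set among the top-ranked genes of a microarray experiment is commonly assessed with a statistical test, and Fisher's exact test is one standard choice. The following minimal sketch computes its one-sided p-value as a hypergeometric upper tail using only the standard library. The function and argument names are illustrative, and the sketch makes no claim about how SDM-SEGS, SDM-Aleph or SEGS combine, threshold or correct such scores.

```python
from math import comb


def fisher_enrichment_pvalue(n_total, n_in_set, n_selected, k_overlap):
    """One-sided Fisher's exact test, computed as a hypergeometric upper tail.

    n_total:    genes measured in the experiment
    n_in_set:   genes annotated by the candidate gene set
    n_selected: genes in the top-ranked (e.g. differentially expressed) list
    k_overlap:  genes that are both selected and in the set
    Returns P(X >= k_overlap) when n_selected genes are drawn at random.
    """
    denom = comb(n_total, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_total - n_in_set, n_selected - k)
        for k in range(k_overlap, upper + 1)
    )
    return tail / denom
```

For example, with 10 genes, a gene set of 3, and a top list of 3 that contains the whole set, the p-value is 1/C(10,3) = 1/120.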
We propose a general framework for the geometric approximation of circular arcs by parametric polynomial curves. The approach is based on constrained uniform approximation of an error function by scalar polynomials. The system of nonlinear equations for the unknown control points of the approximating polynomial given in Bézier form is derived, and a detailed analysis is provided for some low-degree cases that might be important in practice. At least for these cases, the solutions can, in principle, be written in closed form, and they provide the best known approximants with respect to radial distance. A general conjecture on the optimality of the solution is stated, and several numerical examples confirming the theoretical results are given.
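To make the Bézier form and the radial distance criterion concrete, the following sketch evaluates the widely used cubic Bézier approximation of a unit-circle arc of angle phi, whose inner control points lie along the end tangents at distance k = 4/3 tan(phi/4). This textbook choice places the curve's midpoint exactly on the circle; it is an assumption for illustration and is not the optimal approximant derived in the paper. The function names are hypothetical.

```python
import math


def cubic_arc_control_points(phi):
    """Control points of the textbook cubic Bezier approximating the unit-circle
    arc from angle -phi/2 to phi/2 (assumed handle length k = 4/3 tan(phi/4));
    not the radially optimal solution discussed in the abstract."""
    k = 4.0 / 3.0 * math.tan(phi / 4.0)
    c, s = math.cos(phi / 2.0), math.sin(phi / 2.0)
    p0, p3 = (c, -s), (c, s)
    # Inner points move from the endpoints along the circle's unit tangents.
    p1 = (c + k * s, -s + k * c)
    p2 = (c + k * s, s - k * c)
    return [p0, p1, p2, p3]


def bezier(points, t):
    """Evaluate a cubic Bezier curve via its Bernstein basis."""
    u = 1.0 - t
    b = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    x = sum(w * p[0] for w, p in zip(b, points))
    y = sum(w * p[1] for w, p in zip(b, points))
    return x, y


def max_radial_error(phi, samples=1000):
    """Sampled maximum of | ||B(t)|| - 1 |, the radial distance to the unit circle."""
    pts = cubic_arc_control_points(phi)
    return max(
        abs(math.hypot(*bezier(pts, i / samples)) - 1.0)
        for i in range(samples + 1)
    )
```

For a quarter circle (phi = pi/2) the sampled radial error is roughly 2.7e-4, and it shrinks quickly as the arc angle decreases.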