Abstract. Entries in biomolecular databases are often annotated with concepts from different ontologies and thereby establish links between pairs of concepts. Such links may reveal meaningful relationships between the linked concepts; however, they may just as well relate concepts by chance. In this work we present InterOnto, a methodology for ranking concept pairs in order to identify the most meaningful associations. The novelty of our approach compared to previous work is that we take the entire structure of the involved ontologies into account. In this way, our method also finds links that are not present in the annotated data but can be inferred through subsumed concept pairs. We have evaluated our methodology both quantitatively and qualitatively. Using real-life data from TAIR, we show that our proposed scoring function identifies the most representative concept pairs while preventing overgeneralization. In comparison to prior work, our method generally yields rankings of equivalent or better quality.
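The idea of inferring links through subsumed concept pairs can be sketched in Python. This is a minimal illustration under stated assumptions, not InterOnto's scoring function: each entry is assumed to be a pair of concept sets (one per ontology), each ontology is given as a hypothetical `parents` dict of is-a edges, and pairs are ranked by lift (observed support relative to what the marginal counts would predict), a standard association measure swapped in here for illustration. Propagating every annotation to all of its ancestors produces concept pairs that no entry is annotated with directly. Lift scores a pair of root concepts at 1.0, so such pairs do not dominate the ranking merely because every entry supports them.

```python
from collections import Counter
from itertools import product


def ancestors(concept, parents):
    """Return `concept` together with every concept that subsumes it."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(concepts, parents):
    """Union of the ancestor sets of all concepts annotated on one entry."""
    result = set()
    for c in concepts:
        result |= ancestors(c, parents)
    return result


def rank_pairs(entries, parents_a, parents_b):
    """Rank (concept_a, concept_b) pairs by lift over their inferred support.

    Hypothetical interface (illustration only):
      entries   -- list of (concepts from ontology A, concepts from ontology B)
      parents_* -- dict mapping a concept to its direct is-a parents
    """
    n = len(entries)
    count_a, count_b, pair_count = Counter(), Counter(), Counter()
    for concepts_a, concepts_b in entries:
        up_a = closure(concepts_a, parents_a)
        up_b = closure(concepts_b, parents_b)
        count_a.update(up_a)
        count_b.update(up_b)
        # Every pair of ancestors is linked by this entry, including pairs
        # that no entry is annotated with directly.
        pair_count.update(product(up_a, up_b))
    # Lift = P(a, b) / (P(a) * P(b)); root pairs supported by all entries get 1.0.
    lift = {
        (a, b): k * n / (count_a[a] * count_b[b])
        for (a, b), k in pair_count.items()
    }
    return sorted(lift.items(), key=lambda item: item[1], reverse=True)
```

For example, with toy ontologies in which `kinase` is-a `enzyme` is-a `function` and `leaf` is-a `shoot` is-a `plant`, the inferred pair (`enzyme`, `shoot`) is ranked alongside the directly annotated (`kinase`, `leaf`), while (`function`, `plant`) drops to a lift of 1.0. Note that lift alone does not break ties between a specific pair and its equally supported generalization; a dedicated scoring function would have to address that.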
Data integration projects in the life sciences often gather data on a particular subject from multiple sources. Some of these sources overlap to a certain degree. Therefore, integrated search results may be supported by one, a few, or all data sources. To reflect these differences, results should be ranked according to the number of data sources that support them. What such a ranking should look like is not obvious. Results supported by only a few sources could either be ranked high, because this information is potentially new, or ranked low, because the evidence supporting them is limited. We present two scoring schemes to rank search results in the integrated protein annotation database Columba. We define a surprisingness score, which prefers results supported by few sources, and a confidence score, which prefers frequently encountered information. Unlike many other scoring schemes, our proposal is purely data-driven and does not require users to specify preferences among sources. Both scores take the concrete overlaps of data sources into account and do not presume statistical independence. We show how both schemes have been implemented efficiently using SQL.
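The property of being data-driven without presuming independence can be illustrated with a small Python sketch. The two formulas below are hypothetical stand-ins, not Columba's published definitions: input rows are assumed to be `(result, source)` pairs such as a SQL table of evidence would hold, confidence is taken as the share of all results whose supporting-source set is contained in the given result's set, and surprisingness is taken as the information content `log2(1 / confidence)`. All probabilities come from the observed joint distribution of support patterns, so overlaps between sources are counted as they occur rather than derived by multiplying per-source rates.

```python
import math
from collections import Counter


def support_patterns(evidence):
    """Group (result, source) rows into result -> frozenset of supporting sources."""
    patterns = {}
    for result, source in evidence:
        patterns.setdefault(result, set()).add(source)
    return {r: frozenset(s) for r, s in patterns.items()}


def score_results(evidence):
    """Return result -> (confidence, surprisingness) under hypothetical definitions."""
    patterns = support_patterns(evidence)
    # Empirical joint distribution of support patterns: source overlaps are
    # taken from the data, not computed under an independence assumption.
    pattern_counts = Counter(patterns.values())
    total = len(patterns)
    scores = {}
    for result, sources in patterns.items():
        # Share of all results whose support is a subset of this result's support.
        covered = sum(k for p, k in pattern_counts.items() if p <= sources)
        confidence = covered / total
        # Information content: high when little of the data is this narrowly supported.
        surprisingness = math.log2(1 / confidence)
        scores[result] = (confidence, surprisingness)
    return scores
```

Under these assumptions, a result supported by every source receives confidence 1.0 and surprisingness 0.0, while a result backed by a single source receives low confidence and high surprisingness. Because surprisingness is defined here as a monotone transform of confidence, the two rankings are exact reverses of each other; this is a simplification for illustration only.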