David B. Blumenthal scite author profile

Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Various studies exist about the molecular mechanisms of viral infection. However, such information is spread across many publications and is very time-consuming to integrate and exploit. We develop CoVex, an interactive online platform for SARS-CoV-2 host interactome exploration and drug (target) identification. CoVex integrates virus-human protein interactions, human protein-protein interactions, and drug-target interactions. It allows visual exploration of the virus-host interactome and implements systems medicine algorithms for network-based prediction of drug candidates. Thus, CoVex is a resource to understand molecular mechanisms of pathogenicity and to prioritize candidate therapeutics. We investigate recent hypotheses on a systems biology level to explore mechanistic virus life cycle drivers and to extract drug repurposing candidates. CoVex renders COVID-19 drug research systems-medicine-ready by giving the scientific community direct access to network medicine algorithms. It is available at https://exbio.wzw.tum.de/covex/.

Comparing heuristics for graph edit distance computation

Blumenthal, Boria, Gamper et al. 2019

The VLDB Journal

Because of its flexibility, intuitiveness, and expressivity, the graph edit distance (GED) is one of the most widely used distance measures for labeled graphs. Since exactly computing GED is NP-hard, over the past years, various heuristics have been proposed. They use techniques such as transformations to the linear sum assignment problem with error-correction, local search, and linear programming to approximate GED via upper or lower bounds. In this paper, we provide a systematic overview of the most important heuristics. Moreover, we empirically evaluate all compared heuristics within an integrated implementation.
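To make the notion of upper bounds concrete: every mapping of the nodes of one graph onto the nodes of another induces an edit path, and the cost of that path is an upper bound on GED. The sketch below is not one of the heuristics surveyed in the paper. It is a brute-force illustration for tiny graphs, assuming unit edit costs, unlabeled undirected edges, and a hypothetical `(labels, edges)` graph representation.

```python
from itertools import permutations

def induced_cost(g1, g2, node_map):
    """Cost of the edit path induced by node_map (unit costs, an assumption)."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    cost = 0
    for u, v in node_map.items():
        if v is None:
            cost += 1                      # node deletion
        elif labels1[u] != labels2[v]:
            cost += 1                      # node substitution
    used = {v for v in node_map.values() if v is not None}
    cost += len(labels2) - len(used)       # node insertions
    preserved = set()
    for edge in edges1:
        u, w = tuple(edge)
        image = frozenset((node_map[u], node_map[w]))
        if None not in image and image in edges2:
            preserved.add(image)           # edge is kept
        else:
            cost += 1                      # edge deletion
    cost += len(edges2) - len(preserved)   # edge insertions
    return cost

def exact_ged(g1, g2):
    """Minimum induced cost over all node maps; feasible only for tiny graphs."""
    nodes1 = list(g1[0])
    # Each node of g1 maps to a distinct node of g2 or to None (deletion).
    targets = list(g2[0]) + [None] * len(nodes1)
    return min(
        induced_cost(g1, g2, dict(zip(nodes1, image)))
        for image in permutations(targets, len(nodes1))
    )

# Hypothetical example: a labeled path versus a labeled triangle.
path = ({0: "C", 1: "C", 2: "O"},
        {frozenset((0, 1)), frozenset((1, 2))})
triangle = ({0: "C", 1: "C", 2: "N"},
            {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))})

upper_bound = induced_cost(path, triangle, {0: 2, 1: 0, 2: 1})  # 3
ged = exact_ged(path, triangle)                                 # 2
```

The enumeration grows factorially with the number of nodes, which is why practical methods settle for bounds instead of exact values.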

Network medicine for disease module identification and drug repurposing with the NeDRex platform

et al. 2021

Traditional drug discovery faces a severe efficacy crisis. Repurposing of registered drugs provides an alternative with lower costs and faster drug development timelines. However, the data necessary for the identification of disease modules, i.e. pathways and sub-networks describing the mechanisms of complex diseases which contain potential drug targets, are scattered across independent databases. Moreover, existing studies are limited to predictions for specific diseases or non-translational algorithmic approaches. There is an unmet need for adaptable tools allowing biomedical researchers to employ network-based drug repurposing approaches for their individual use cases. We close this gap with NeDRex, an integrative and interactive platform for network-based drug repurposing and disease module discovery. NeDRex integrates ten different data sources covering genes, drugs, drug targets, disease annotations, and their relationships. NeDRex allows for constructing heterogeneous biological networks, mining them for disease modules, prioritizing drugs targeting disease mechanisms, and statistical validation. We demonstrate the utility of NeDRex in five specific use-cases.

Finding k-shortest paths with limited overlap

et al. 2020

In this paper, we investigate the computation of alternative paths between two locations in a road network. More specifically, we study the k-shortest paths with limited overlap (kSPwLO) problem that aims at finding a set of k paths such that all paths are sufficiently dissimilar to each other and as short as possible. To compute kSPwLO queries, we propose two exact algorithms, termed OnePass and MultiPass, and we formally prove that MultiPass is optimal in terms of complexity. We also study two classes of heuristic algorithms: (a) performance-oriented heuristic algorithms that trade shortness for performance, i.e., they reduce query processing time, but do not guarantee that the length of each subsequent result is minimum; and (b) completeness-oriented heuristic algorithms that trade dissimilarity for completeness, i.e., they relax the similarity constraint to return a result that contains exactly k paths. An extensive experimental analysis on real road networks demonstrates the efficiency of our proposed solutions in terms of runtime and quality of the result.
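The problem statement can be illustrated with a naive baseline: enumerate all simple paths, sort them by length, and keep a path only if it is sufficiently dissimilar to every path already kept. This is not OnePass or MultiPass, and it does not scale beyond toy graphs. The similarity measure (shared edge length divided by the length of the shorter path), the threshold name `theta`, and the example graph are assumptions made for this sketch.

```python
def simple_paths(graph, source, target, prefix=None):
    """Yield all simple paths from source to target by depth-first search."""
    prefix = prefix or [source]
    if source == target:
        yield list(prefix)
        return
    for nxt in graph.get(source, {}):
        if nxt not in prefix:
            prefix.append(nxt)
            yield from simple_paths(graph, nxt, target, prefix)
            prefix.pop()

def path_edges(path):
    return list(zip(path, path[1:]))

def length(graph, path):
    return sum(graph[u][v] for u, v in path_edges(path))

def similarity(graph, p, q):
    """Shared edge length over the shorter path's length (assumed measure)."""
    shared = set(path_edges(p)) & set(path_edges(q))
    shared_len = sum(graph[u][v] for u, v in shared)
    return shared_len / min(length(graph, p), length(graph, q))

def k_paths_limited_overlap(graph, source, target, k, theta):
    """Greedy baseline: shortest paths first, skipping overly similar ones."""
    candidates = sorted(simple_paths(graph, source, target),
                        key=lambda p: length(graph, p))
    result = []
    for p in candidates:
        if all(similarity(graph, p, q) <= theta for q in result):
            result.append(p)
            if len(result) == k:
                break
    return result

# Hypothetical directed road network with positive edge weights.
roads = {
    "s": {"a": 2, "b": 4},
    "a": {"t": 2, "b": 1},
    "b": {"t": 2},
    "t": {},
}
alternatives = k_paths_limited_overlap(roads, "s", "t", k=2, theta=0.4)
# [['s', 'a', 't'], ['s', 'b', 't']]
```

In this example the second-shortest path s, a, b, t is skipped because it shares the edge (s, a) with the shortest path, giving a similarity of 0.5, which exceeds the threshold of 0.4.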

On the limits of active module identification

Lazareva, Baumbach, List et al. 2021

In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly, by projecting gene expression data onto generic protein–protein interaction (PPI) networks. Although active module identification has led to various novel insights into complex diseases, there is increasing awareness in the field that the combination of gene expression data and PPI network is problematic because up-to-date PPI networks have a very small diameter and are subject to both technical and literature bias. In this paper, we report the results of an extensive study where we analyzed for the first time whether widely used AMIMs really benefit from using PPI networks. Our results clearly show that, except for the recently proposed AMIM DOMINO, the tested AMIMs do not produce biologically more meaningful candidate disease modules on widely used PPI networks than on random networks with the same node degrees. AMIMs hence mainly learn from the node degrees and mostly fail to exploit the biological knowledge encoded in the edges of the PPI networks. This has far-reaching consequences for the field of active module identification. In particular, we suggest that novel algorithms are needed which overcome the degree bias of most existing AMIMs and/or work with customized, context-specific networks instead of generic PPI networks.
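A random network with the same node degrees as a given network can be obtained in several ways. One common approach is repeated double-edge swapping, sketched below. Whether the study used this exact generator is not stated in the abstract, so treat the procedure, the function names, and the toy edge list as assumptions.

```python
import random
from collections import Counter

def degrees(edges):
    """Node degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize a simple undirected graph by double-edge swaps.

    Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    which leaves the degree of every node unchanged.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c                    # try both orientations
        if len({a, b, c, d}) < 4:
            continue                       # would create a self-loop
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if new1 in edge_set or new2 in edge_set:
            continue                       # would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

# Hypothetical toy network; real PPI networks are far larger.
ppi = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
       ("e", "f"), ("f", "a"), ("a", "c")]
randomized = degree_preserving_rewire(ppi, n_swaps=10, seed=1)
```

Running the same module identification method on `ppi` and on `randomized` then shows how much of its output depends on the edges rather than on the degrees alone.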

On the exact computation of the graph edit distance

Blumenthal, Gamper 2020

Pattern Recognition Letters

Identification of a cDNA for a human high-molecular-weight B-cell growth factor.

Ambrus, Pippin, Joseph et al. 1993

Proc. Natl. Acad. Sci. U.S.A.

The authors request that the following be noted. Several errors have been identified in the IL-14 cDNA sequence shown in Fig. 1. The translation of this sequence does not predict an open reading frame that would result in the production of a 50-60-kDa protein. A reading frame in the 5′ to 3′ direction (plus strand) predicts a 7.7-kDa protein and a reading frame in the 3′ to 5′ direction (minus strand) predicts a 36.4-kDa protein. The relationship of this sequence to IL-14, if any, is uncertain. The corrected sequence is shown below.

Cracking the black box of deep sequence-based protein-protein interaction prediction

Bernett, Blumenthal, List 2023

Preprint

Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways and their dysregulation. Numerous prediction methods have been developed as a cheap alternative to biological experiments, reporting phenomenal accuracy estimates. While most methods rely exclusively on sequence information, PPIs occur in 3D space. As predicting protein structure from sequence is an infamously complex problem, the almost perfect reported performances for PPI prediction seem dubious. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities, and node degree information and compared them to basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances, giving a false impression of the field. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test, performances become random, leaving this research field wide open.
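The leakage described here can be seen by comparing two ways of splitting interaction pairs. A random split over pairs lets the same protein appear in both training and test sets, while a protein-disjoint split does not. The sketch below only enforces disjoint protein sets; it does not minimize sequence similarity between the sets as the study does. The function names and protein identifiers are hypothetical.

```python
import random

def proteins_in(pairs):
    """Set of all proteins occurring in a list of interaction pairs."""
    return {p for pair in pairs for p in pair}

def random_pair_split(pairs, test_fraction, seed=0):
    """Split pairs at random; proteins may occur on both sides."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def protein_disjoint_split(pairs, test_fraction, seed=0):
    """Split proteins first, then keep only pairs that lie within one side.

    Pairs that straddle the two protein sets are discarded.
    """
    rng = random.Random(seed)
    proteins = sorted(proteins_in(pairs))
    rng.shuffle(proteins)
    n_test = int(len(proteins) * test_fraction)
    test_proteins = set(proteins[:n_test])
    train = [(a, b) for a, b in pairs
             if a not in test_proteins and b not in test_proteins]
    test = [(a, b) for a, b in pairs
            if a in test_proteins and b in test_proteins]
    return train, test

# Hypothetical interaction pairs.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"),
         ("P4", "P5"), ("P5", "P6"), ("P1", "P6"), ("P2", "P5")]

rand_train, rand_test = random_pair_split(pairs, test_fraction=0.25)
leaked = proteins_in(rand_train) & proteins_in(rand_test)

safe_train, safe_test = protein_disjoint_split(pairs, test_fraction=0.5)
shared = proteins_in(safe_train) & proteins_in(safe_test)  # always empty
```

The price of the disjoint split is that straddling pairs are lost, so the resulting datasets are smaller than under a random split.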
