Markus Schmidt scite author profile

Accurate and fast aligners are required to handle the steadily increasing volume of sequencing data. Here we present an approach allowing performant alignments of short reads (Illumina) as well as long reads (Pacific Biosciences, Ultralong Oxford Nanopore), while achieving high accuracy, based on a universal three-stage scheme. It is also suitable for the discovery of insertions and deletions that originate from structural variants. We comprehensively compare our approach to other state-of-the-art aligners in order to confirm its performance with respect to accuracy and runtime. As part of our algorithmic scheme, we introduce two line sweep-based techniques called “strip of consideration” and “seed harmonization”. These techniques represent a replacement for chaining and do not rely on any specially tailored data structures. Additionally, we propose a refined form of seeding on the foundation of the FMD-index.

Proteomic Atomics Reveals a Distinctive Uracil‐5‐Methyltransferase

Pramanik

Thaker

Perumal

et al. 2020

Molecular Informatics

Carbon (C), hydrogen (H), nitrogen (N), oxygen (O), and sulfur (S) atoms intrigue as they are the foundation for amino acid (AA) composition and the folding and functions of proteins and thus define and control the survival of a cell, the smallest unit of life. Here, we calculated the proteomic atom distribution in > 1500 randomly selected species across the entire current phylogenetic tree and identified uracil-5-methyltransferase (U5MTase) of the protozoan parasite Plasmodium falciparum (Pf, strain Pf3D7), with a distinct atom and AA distribution pattern. We determined its apicoplast location and in silico 3D protein structure to refocus attention exclusively on U5MTase with tremendous potential for therapeutic intervention in malaria. Around 300 million clinical cases of malaria occur each year in tropical and subtropical regions of the world, resulting in over one million deaths annually, placing malaria among the most serious infectious diseases. Genomic and proteomic research of the clades of parasites containing Pf is progressing slowly and the functions of most of the ~5300 genes are still unknown. We applied a 'bottom-up' comparative proteomic atomics analysis across the phylogenetic tree to visualize a protein molecule on its actual basis, i.e., its atomic level. We identified a protruding Pf3D7-specific U5MTase, determined its 3D protein structure, and identified potential inhibitory drug molecules through in silico drug screening that might serve as possible remedies for the treatment of malaria. Besides, this atomic-based proteome map provides a unique approach for the identification of parasite-specific proteins that could be considered as novel therapeutic targets.

Low Delay Filterbanks for Enhanced Low Delay Audio Coding

Schnell

Geiger

et al. 2007

Isolation and characterization of a retrovirus from the fish genus Xiphophorus

Petry

et al. 1992

Virology

A performant bridge between fixed-size and variable-size seeding

Kutzner

Kim

2020

Preprint

Background: Seeding is usually the initial step of high-throughput sequence aligners. Two popular seeding strategies are fixed-size seeding (k-mers, minimizers) and variable-size seeding (MEMs, SMEMs, maximal spanning seeds). The former strategy supports fast seed computation, while the latter one benefits from a high seed entropy. Algorithmic bridges between instances of both seeding strategies are of interest for combining their respective advantages. Results: We introduce an efficient strategy for computing MEMs out of fixed-size seeds (k-mers or minimizers). In contrast to previously proposed extend-purge strategies, our merge-extend strategy prevents the creation and filtering of duplicate MEMs. Further, we describe techniques for extracting SMEMs or maximal spanning seeds out of MEMs. A comprehensive benchmarking shows the applicability, strengths, shortcomings and computational requirements of all discussed seeding techniques. Additionally, we report the effects of seed occurrence filters in the context of these techniques. Aside from our novel algorithmic approaches, we analyze hierarchies within fixed-size and variable-size seeding along with a mapping between instances of both seeding strategies. Conclusion: Benchmarking shows that our proposed merge-extend strategy for MEM computation outperforms previous extend-purge strategies in the context of PacBio reads. The observed superiority grows with increasing read size and read quality. Further, the presented filters for extracting SMEMs or maximal spanning seeds out of MEMs outperform FMD-index based extension techniques. All code used for benchmarking is available via GitHub at https://github.com/ITBE-Lab/seed-evaluation.

A novel specialized single-linkage clustering algorithm for taxonomically ordered data

Journal of Theoretical Biology

Kutzner

Heese

2017

Cover Picture: Proteomic Atomics Reveals a Distinctive Uracil‐5‐Methyltransferase (Mol. Inf. 5/2020)

Pramanik

Thaker

Perumal

et al. 2020

Molecular Informatics

The cover picture shows how proteome‐based atom (C, H, N, O, S) distributions across all species of the phylogenetic tree revealed U5MTase of Plasmodium falciparum as a distinguished possible therapeutic target, which in turn was used for in silico structure‐based drug design strategies (i.e., 3D protein structure modeling, virtual chemical library screening, and molecular docking) to identify imanixil as a potential inhibitory drug molecule that might serve as a possible remedy for the treatment of malaria. More details can be found in the Full Paper by Subrata Pramanik, Manisha Thaker, Ananda Gopu Perumal, Rajasekaran Ekambaram, Naresh Poondla, Markus Schmidt, Pok‐Son Kim, Arne Kutzner, and Klaus Heese; please see DOI: 10.1002/minf.201900135

MSV: a modular structural variant caller that reveals nested and complex rearrangements by unifying breakends inferred directly from reads

Schmidt

Kutzner

2023

Genome Biol

Structural variant (SV) calling belongs to the standard tools of modern bioinformatics for identifying and describing alterations in genomes. Initially, this work presents several complex genomic rearrangements that reveal conceptual ambiguities inherent to the representation via basic SV. We contextualize these ambiguities theoretically as well as practically and propose a graph-based approach for resolving them. For various yeast genomes, we practically compute adjacency matrices of our graph model and demonstrate that they provide highly accurate descriptions of one genome in terms of another. An open-source prototype implementation of our approach is available under the MIT license at https://github.com/ITBE-Lab/MA.