Wolfram Höps scite author profile

Haplotype-resolved diverse human genomes and integrated analysis of structural variation

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent–child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average contig N50: 26 Mbp) integrate all forms of genetic variation even across complex loci. We identify 107,590 structural variants (SVs), of which 68% are not discovered by short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterize 130 of the most active mobile element source elements and find that 63% of all SVs arise by homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
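The abstract above reports an average contig N50 of 26 Mbp. For readers unfamiliar with the metric, the following is a minimal sketch of how N50 is commonly computed from a list of contig lengths: it is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `contig_n50` and the toy lengths are assumptions for illustration and are not taken from the paper or its pipeline.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Illustrative helper (hypothetical name, not from the paper).
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    # Walk contigs from longest to shortest, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), invented for this example.
toy_lengths = [40, 30, 20, 10, 5, 5]
print(contig_n50(toy_lengths))
```

A higher N50 means that half of the assembled sequence sits in longer contigs, which is why the value is used as a shorthand for contiguity.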


Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders

Porubský, Höps, Ashraf, et al. 2022. Cell.



Recurrent inversion toggling and great ape genome evolution

et al. 2020


Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by six-fold the number (n = 1,069) of previously reported great ape inversions using Strand-seq and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions based on its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kb) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge they preferentially (~75%) occur in an inverted orientation compared to their ancestral locus. We construct megabase-pair-scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.


Haplotype-resolved inversion landscape reveals hotspots of mutational recurrence associated with genomic disorders

Porubský, Höps, Ashraf, et al. 2021. Preprint.


Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1-retrotransposition; 80% of the larger inversions are balanced and affect twice as many base pairs as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or mobile elements. Since this suggests recurrence due to non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus and generation. Recurrent inversions exhibit a sex-chromosomal bias, and significantly co-localize to the critical regions of genomic disorders. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes to disease-causing CNVs.


Gene Unprediction with Spurio: A tool to identify spurious protein sequences

2018


We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence’s likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio
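The abstract names stop codons within the presumed translation of homologous DNA as the most informative feature. The sketch below illustrates only that one idea: counting in-frame stop codons in a DNA string. It is not Spurio's implementation, which according to the abstract relies on tblastn matches scored with a Gaussian process model. The function name `count_in_frame_stops` and the example sequences are assumptions made up for illustration.

```python
# Stop codons shared by the standard and bacterial genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_in_frame_stops(dna, frame=0):
    """Count stop codons in one reading frame of a DNA string.

    Hypothetical helper for illustration; not part of Spurio.
    `frame` is the 0-based offset (0, 1, or 2) of the first codon.
    """
    seq = dna.upper()
    # Step through complete codons only, starting at the chosen frame.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return sum(codon in STOP_CODONS for codon in codons)


# Invented toy sequences: the second differs by an in-frame TAA.
intact_homolog = "ATGGCTAAAGGT"
degraded_homolog = "ATGGCTTAAGGT"

print(count_in_frame_stops(intact_homolog))
print(count_in_frame_stops(degraded_homolog))
```

A homologous region whose translation in the matching frame is interrupted by stops, like the second toy sequence, is the kind of evidence the abstract describes as pointing toward a spurious prediction.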



