Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain. You may freely distribute the URL identifying the publication in the public portal. If you believe that this document breaches copyright, please contact us providing details, and we will remove access to the work immediately and investigate your claim.
The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in the absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP‐2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and speed. NetSurfP‐2.0 is sequence‐based and uses an architecture composed of convolutional and long short‐term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP‐2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP‐2.0 on several independent test datasets and found it to consistently produce state‐of‐the‐art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3‐class predictions. In addition to improved accuracy, the processing time has been optimized to allow prediction of more than 1,000 proteins in less than 2 hours, and of complete proteomes in less than 1 day.
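The two headline evaluation figures above can be made concrete with a short sketch. It assumes one common convention for collapsing the eight DSSP states into three classes (H, G, I to helix; E, B to strand; everything else to coil), scores 3-class agreement as the fraction of correctly labeled residues (often called Q3), and computes a Pearson correlation of the kind used for solvent accessibility. The mapping and all function names are illustrative assumptions, not NetSurfP‐2.0's own code.

```python
import math

# Assumed 8-state -> 3-state DSSP reduction (one common convention;
# the abstract does not say which mapping was used).
DSSP_TO_3 = {
    "H": "H", "G": "H", "I": "H",            # helices
    "E": "E", "B": "E",                      # strands and bridges
    "T": "C", "S": "C", "-": "C", " ": "C",  # turns, bends, loops -> coil
}


def to_three_state(dssp: str) -> str:
    """Map an 8-state DSSP string to H/E/C; unknown symbols fall back to coil."""
    return "".join(DSSP_TO_3.get(s, "C") for s in dssp)


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose 3-state label is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation, e.g. predicted vs. observed relative solvent accessibility."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

For example, `to_three_state("HGIEBTS-")` yields `"HHHEECCC"`, and `q3_accuracy("HHHC", "HHEC")` is 0.75. Falling back to coil for unrecognized symbols is a design choice of this sketch; a stricter version might raise instead.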
Pulsed field gradient diffusion sequences (PFG) with multiple diffusion encoding blocks have been suggested to offer new microstructural tissue information, such as the ability to detect nonspherical compartment shapes in macroscopically isotropic samples, i.e., samples with negligible directional signal dependence on diffusion gradients in standard diffusion experiments. However, current acquisition schemes are not rotationally invariant: the derived metrics depend on the orientation of the sample and are affected by the interplay of sampling directions and compartment orientation dispersion when applied to macroscopically anisotropic systems. Here we propose a new framework, the d-PFG 5-design, to enable rotationally invariant estimation of double wave vector diffusion (d-PFG) metrics. The method is based on the idea that an appropriate orientational average of the signal emulates the signal from a powder preparation of the same sample, where macroscopic anisotropy is absent by construction. Our approach exploits the theory of exact numerical integration (quadrature) of polynomials on the rotation group, and we exemplify the general procedure with a set of 60 pairs of diffusion wave vectors (the d-PFG 5-design) that enables a theoretically exact determination of the fourth-order Taylor or cumulant expansion of the orientationally averaged signal. The d-PFG 5-design is evaluated with numerical simulations and ex vivo high-field diffusion MRI experiments in a nonhuman primate brain. Specifically, we demonstrate rotational invariance when estimating compartment eccentricity, which we show offers new microstructural information complementary to the fractional anisotropy (FA) from diffusion tensor imaging (DTI). The imaging observations are supported by a new theoretical result directly relating compartment eccentricity to the FA of individual pores.
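The d-PFG 5-design itself works with pairs of wave vectors and quadrature over the rotation group, which is beyond a short sketch. The underlying idea, that an equal-weight average over a carefully chosen finite point set reproduces the exact orientational ("powder") average of every polynomial up to some degree, can be illustrated on a simpler space, the unit sphere. The 12 vertices of a regular icosahedron form a spherical 5-design, so they average any polynomial of degree at most 5 exactly. This spherical analogue and the helper names below are our illustration, not the paper's scheme.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_vertices() -> list[tuple[float, float, float]]:
    """The 12 unit vectors pointing to the vertices of a regular icosahedron."""
    norm = math.sqrt(1 + PHI ** 2)
    verts = []
    for s1, s2 in itertools.product((1, -1), repeat=2):
        # Cyclic permutations of (0, ±1, ±phi), scaled onto the unit sphere.
        verts.append((0.0, s1 / norm, s2 * PHI / norm))
        verts.append((s1 / norm, s2 * PHI / norm, 0.0))
        verts.append((s2 * PHI / norm, 0.0, s1 / norm))
    return verts


def design_average(f, points) -> float:
    """Equal-weight quadrature: the plain mean of f(x, y, z) over the design points."""
    return sum(f(*p) for p in points) / len(points)
```

On the sphere, the exact averages are 1/3 for x², 1/5 for x⁴ and 1/15 for x²y², and the design reproduces all three. Because the true average of (u·v)⁴ is 1/5 for every unit direction u, the design average of that quantity is also 1/5 whatever u is, which is the rotational invariance the d-PFG construction aims for. At degree 6 exactness is lost: x⁶ averages to 2/15 over the icosahedron instead of the exact 1/7.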
Identification and reconstruction of microbial species from metagenomic whole-genome sequencing data is an important and challenging task. Existing approaches rely on gene or contig co-abundance information across multiple samples and k-mer composition information in the sequences. Here we use recent advances in deep learning to develop an algorithm that uses variational autoencoders to encode co-abundance and compositional information prior to clustering. We show that the deep network is able to integrate these two heterogeneous datasets without any prior knowledge and that our method outperforms existing state-of-the-art methods by reconstructing 1.8 to 8 times more highly precise and complete genome bins from three different benchmark datasets. Additionally, we apply our method to a gene catalogue of almost 10 million genes and 1,270 samples from the human gut microbiome. Here we are able to cluster 1.3 to 1.8 million extra genes and reconstruct 117 to 246 more highly precise and complete bins, of which 70 bins were completely new compared to previous methods. Our method, Variational Autoencoders for Metagenomic Binning (VAMB), is freely available at: https://github.com/jakobnissen/vamb
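The two kinds of input described above, k-mer composition and co-abundance across samples, can be sketched as one feature vector per contig. This is a minimal illustration, assuming tetranucleotide (k = 4) composition with each k-mer merged with its reverse complement, and a hypothetical normalization that scales a contig's per-sample depths to sum to 1. VAMB's actual preprocessing and the variational autoencoder that encodes these features are not reproduced here; all function names are hypothetical.

```python
import itertools

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """Sorted k-mers, each merged with its reverse complement (136 for k = 4)."""
    kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    return sorted({min(km, reverse_complement(km)) for km in kmers})


def kmer_composition(seq: str, k: int = 4) -> list[float]:
    """Normalized canonical k-mer frequencies; windows containing non-ACGT characters are skipped."""
    index = {km: i for i, km in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[index[min(window, reverse_complement(window))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def abundance_profile(depths: list[float]) -> list[float]:
    """Hypothetical normalization: scale one contig's per-sample depths to sum to 1."""
    total = sum(depths)
    return [d / total for d in depths] if total else [0.0] * len(depths)


def contig_features(seq: str, depths: list[float], k: int = 4) -> list[float]:
    """Concatenate composition and co-abundance into one vector for downstream encoding."""
    return kmer_composition(seq, k) + abundance_profile(depths)
```

For a contig seen in three samples, `contig_features(seq, [3.0, 1.0, 0.0])` returns 136 composition values followed by `[0.75, 0.25, 0.0]`. Merging reverse complements makes the composition independent of which DNA strand a contig was assembled from.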
Research on human and murine haematopoiesis has generated a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS-sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan–Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2,000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.
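The Kaplan–Meier analysis mentioned above estimates a survival curve from follow-up times in which some patients are censored (still alive at last contact). A minimal sketch of the standard product-limit estimator follows: at each distinct event time t, survival is multiplied by 1 − dₜ/nₜ, where dₜ is the number of events at t and nₜ the number of patients still at risk just before t. How BloodSpot itself computes or groups patients for this plot is not described here, so the function below is an illustrative assumption.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (t, S(t)) at each distinct time where at least one event occurred.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censorings leave the risk set after time t.
        n_at_risk -= removed
    return curve
```

For times `[1, 2, 2, 3, 4]` with events `[1, 1, 0, 1, 0]`, the curve steps to 0.8, 0.6 and 0.3 at times 1, 2 and 3. The patient censored at time 2 still counts as at risk for the event at time 2, which follows the usual convention that censoring happens just after events at the same time.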
Motivation: Models for analysing and making relevant biological inferences from massive amounts of complex single-cell transcriptomic data typically require several individual data-processing steps, each with its own set of hyperparameter choices. With deep generative models, one can work directly with count data, perform likelihood-based model comparison, learn a latent representation of the cells and capture more of the variability in different cell populations.
Results: We propose a novel method based on variational auto-encoders (VAEs) for analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We tested several count likelihood functions and a variant of the VAE with a priori clustering in the latent space. We show for several scRNA-seq data sets that our method outperforms recently proposed scRNA-seq methods in clustering cells and that the resulting clusters reflect cell types.
Availability and implementation: Our method, called scVAE, is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://github.com/scvae/scvae.
Supplementary information: Supplementary data are available at Bioinformatics online.
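Working "directly with count data" means the decoder's output parameterizes a discrete likelihood over raw counts. As a sketch of what candidate count likelihoods can look like, the functions below give log-probabilities for a Poisson, a negative binomial with mean μ and dispersion r (variance μ + μ²/r), and a zero-inflated negative binomial that adds extra mass π at zero. These particular parameterizations are assumptions for illustration; the abstract does not list which likelihoods scVAE tested or how they are parameterized.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(K = k) for a Poisson count with mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def nb_logpmf(k: int, mu: float, r: float) -> float:
    """Negative binomial with mean mu and dispersion r; variance is mu + mu**2 / r."""
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def zinb_logpmf(k: int, mu: float, r: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    if k == 0:
        return math.log(pi + (1 - pi) * math.exp(nb_logpmf(0, mu, r)))
    return math.log(1 - pi) + nb_logpmf(k, mu, r)
```

As r grows, the negative binomial approaches the Poisson with the same mean, so the dispersion r controls how much extra variability beyond Poisson noise the model allows. Comparing the summed log-likelihoods of held-out counts under each choice is one way to carry out the likelihood-based model comparison described above.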