Exposure of cells or organisms to chemicals can trigger a series of effects at the regulatory pathway level, which involve changes of levels, interactions, and feedback loops of biomolecules of different types. A single-omics technique, e.g., transcriptomics, will detect biomolecules of one type and thus can only capture changes in a small subset of the biological cascade. Therefore, although applying single-omics analyses can lead to the identification of biomarkers for certain exposures, they cannot provide a systemic understanding of toxicity pathways or adverse outcome pathways. Integration of multiple omics data sets promises a substantial improvement in detecting this pathway response to a toxicant, by an increase of information as such and especially by a systemic understanding. Here, we report the findings of a thorough evaluation of the prospects and challenges of multi-omics data integration in toxicological research. We review the availability of such data, discuss options for experimental design, evaluate methods for integration and analysis of multi-omics data, discuss best practices, and identify knowledge gaps. Re-analyzing published data, we demonstrate that multi-omics data integration can considerably improve the confidence in detecting a pathway response. Finally, we argue that more data need to be generated from studies with a multi-omics-focused design, to define which omics layers contribute most to the identification of a pathway response to a toxicant.Archives of Toxicology (2020) 94:371-388Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Background Gaining biological insights into molecular responses to treatments or diseases from omics data can be accomplished by gene set or pathway enrichment methods. A plethora of different tools and algorithms have been developed so far. Among those, the gene set enrichment analysis (GSEA) proved to control both type I and II errors well. In recent years the call for a combined analysis of multiple omics layers became prominent, giving rise to a few multi-omics enrichment tools. Each of these has its own drawbacks and restrictions regarding its universal application. Results Here, we present the package aiding to calculate a combined GSEA-based pathway enrichment on multiple omics layers. The package queries 8 different pathway databases and relies on the robust GSEA algorithm for a single-omics enrichment analysis. In a final step, those scores will be combined to create a robust composite multi-omics pathway enrichment measure. supports 11 different organisms and includes a comprehensive mapping of transcripts, proteins, and metabolite IDs. Conclusions With we introduce a highly versatile tool for multi-omics pathway integration that minimizes previous restrictions in terms of omics layer selection, pathway database availability, organism selection and the mapping of omics feature identifiers. is publicly available under the GPL-3 license at https://github.com/yigbt/multiGSEA and at bioconductor: https://bioconductor.org/packages/multiGSEA.
Background The intestinal microbiota plays a crucial role in protecting the host from pathogenic microbes, modulating immunity and regulating metabolic processes. We studied the simplified human intestinal microbiota (SIHUMIx) consisting of eight bacterial species with a particular focus on the discovery of novel small proteins with less than 100 amino acids (= sProteins), some of which may contribute to shape the simplified human intestinal microbiota. Although sProteins carry out a wide range of important functions, they are still often missed in genome annotations, and little is known about their structure and function in individual microbes and especially in microbial communities. Results We created a multi-species integrated proteogenomics search database (iPtgxDB) to enable a comprehensive identification of novel sProteins. Six of the eight SIHUMIx species, for which no complete genomes were available, were sequenced and de novo assembled. Several proteomics approaches including two earlier optimized sProtein enrichment strategies were applied to specifically increase the chances for novel sProtein discovery. The search of tandem mass spectrometry (MS/MS) data against the multi-species iPtgxDB enabled the identification of 31 novel sProteins, of which the expression of 30 was supported by metatranscriptomics data. Using synthetic peptides, we were able to validate the expression of 25 novel sProteins. The comparison of sProtein expression in each single strain versus a multi-species community cultivation showed that six of these sProteins were only identified in the SIHUMIx community indicating a potentially important role of sProteins in the organization of microbial communities. Two of these novel sProteins have a potential antimicrobial function. Metabolic modelling revealed that a third sProtein is located in a genomic region encoding several enzymes relevant for the community metabolism within SIHUMIx. Conclusions We outline an integrated experimental and bioinformatics workflow for the discovery of novel sProteins in a simplified intestinal model system that can be generically applied to other microbial communities. The further analysis of novel sProteins uniquely expressed in the SIHUMIx multi-species community is expected to enable new insights into the role of sProteins on the functionality of bacterial communities such as those of the human intestinal tract.
BackgroundSmall nucleolar RNAs (snoRNAs) are one of the most ancient families amongst non-protein-coding RNAs. They are ubiquitous in Archaea and Eukarya but absent in bacteria. Their main function is to target chemical modifications of ribosomal RNAs. They fall into two classes, box C/D snoRNAs and box H/ACA snoRNAs, which are clearly distinguished by conserved sequence motifs and the type of chemical modification that they govern. Similarly to microRNAs, snoRNAs appear in distinct families of homologs that affect homologous targets. In animals, snoRNAs and their evolution have been studied in much detail. In plants, however, their evolution has attracted comparably little attention.ResultsIn order to chart the phylogenetic distribution of individual snoRNA families in plants, we applied a sophisticated approach for identifying homologs of known plant snoRNAs across the plant kingdom. In response to the relatively fast evolution of snoRNAs, information on conserved sequence boxes, target sequences, and secondary structure is combined to identify additional snoRNAs. We identified 296 families of snoRNAs in 24 species and traced their evolution throughout the plant kingdom. Many of the plant snoRNA families comprise paralogs. We also found that targets are well-conserved for most snoRNA families.ConclusionsThe sequence conservation of snoRNAs is sufficient to establish homologies between phyla. The degree of this conservation tapers off, however, between land plants and algae. Plant snoRNAs are frequently organized in highly conserved spatial clusters. As a resource for further investigations we provide carefully curated and annotated alignments for each snoRNA family under investigation.Electronic supplementary materialThe online version of this article (doi:10.1186/s12864-016-3301-2) contains supplementary material, which is available to authorized users.
Small nucleolar RNAs (snoRNAs) are essential players in the rRNA biogenesis due to their involvement in the nucleolytic processing of the precursor and the subsequent guidance of nucleoside modifications. Within the kingdom Fungi, merely a few species-specific surveys have explored their snoRNA repertoire. However, the wide range of the snoRNA landscape spanning all major fungal lineages has not been mapped so far, mainly because of missing tools for automatized snoRNA detection and functional analysis. For the first time, we report here a comprehensive inventory of fungal snoRNAs together with a functional analysis and an in-depth investigation of their evolutionary history including innovations, deletions, and target switches. This large-scale analysis, incorporating more than 120 snoRNA families with more than 7700 individual snoRNA sequences, catalogs and clarifies the landscape of fungal snoRNA families, assigns functions to previously orphan snoRNAs, and increases the number of sequences by 450%. We also show that the snoRNAome is subject to ongoing rearrangements and adaptations, e.g., through lineage-specific targets and redundant guiding functions.
Human organoids could facilitate research of complex and currently incurable neuropathologies, such as age-related macular degeneration (AMD) which causes blindness. Here, we establish a human retinal organoid system reproducing several parameters of the human retina, including some within the macula, to model a complex combination of photoreceptor and glial pathologies. We show that combined application of TNF and HBEGF, factors associated with neuropathologies, is sufficient to induce photoreceptor degeneration, glial pathologies, dyslamination, and scar formation: These develop simultaneously and progressively as one complex phenotype. Histologic, transcriptome, live-imaging, and mechanistic studies reveal a previously unknown pathomechanism: Photoreceptor neurodegeneration via cell extrusion. This could be relevant for aging, AMD, and some inherited diseases. Pharmacological inhibitors of the mechanosensor PIEZO1, MAPK, and actomyosin each avert pathogenesis; a PIEZO1 activator induces photoreceptor extrusion. Our model offers mechanistic insights, hypotheses for neuropathologies, and it could be used to develop therapies to prevent vision loss or to regenerate the retina in patients suffering from AMD and other diseases.
U6 small nuclear RNAs are part of the splicing machinery. They exhibit several unique features setting them appart from other snRNAs. Reports of introns in structured non-coding RNAs have been very rare. U6 genes, however, were found to be interrupted by an intron in several Schizosaccharomyces species and in 2 Basidiomycota. We conducted a homology search across 147 currently available fungal genome and identified the U6 genes in all but 2 of them. A detailed comparison of their sequences and predicted secondary structures showed that intron insertion events in the U6 snRNA were much more common in the fungal lineage than previously thought. Their positional distribution across the entire mature snRNA strongly suggests a large number of independent events. All the intron sequences reported here show canonical splice site and branch site motifs indicating that they require the splicesomal pathway for their removal.
Gene correlation network inference from single-cell transcriptomics data potentially allows to gain unprecendented insights into cell type-specific regulatory programs. ScRNA-seq data is severely affected by dropout, which significantly hampers and restrains current downstream analysis. Although newly developed tools are capable to deal with sparse data, no appropriate single-cell network inference workflow has been established. A potential way to end this deadlock is the application of data imputation methods, which already proofed to be useful in specific contexts of single-cell data analysis, e.g., recovering cell clusters. In order to infer cell-type specific networks, two prerequisites must be met: the identification of cluster-specific cell-types and the network inference itself. Here, we propose a benchmarking framework to investigate both objections. By using suitable reference data with inherent correlation structure, six representative imputation tools and appropriate evaluation measures, we were able to systematically infer the impact of data imputation on network inference. Major network structures were found to be preserved in low dropout data sets. For moderately sparse data sets, DCA was able to recover gene correlation structures, although systematically introducing higher correlation values. No imputation tool was able to recover true signals from high dropout data. However, by using an additional biological data set we could show that cell-cell correlation by means of specific marker gene expression was not compromised through data imputation. Our analysis showed that network inference is feasible for low and moderately sparse data sets by using the unimputed and DCA-prepared data, respectively. High sparsity data, on the other side, still pose a major problem since current imputation techniques are not able to facilitate network inference. The annotation of cluster-specific cell-types as a prerequisite is not hampered by data imputation but their power to restore the deeply hidden correlation structures is still not sufficient enough.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.