Galaxy is a mature, browser-accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
The impact of microbial communities, also known as the microbiome, on human health and the environment is receiving increased attention. Studying translated gene products (proteins) and comparing metaproteomic profiles may elucidate how microbiomes respond to specific environmental stimuli and interact with host organisms. Characterizing proteins expressed by a complex microbiome and interpreting their functional signature requires sophisticated informatics tools and workflows tailored to metaproteomics. Additionally, there is a need to disseminate these informatics resources to researchers undertaking metaproteomic studies, who could use them to make new and important discoveries in microbiome research. The Galaxy for proteomics platform (Galaxy-P) offers an open source, web-based bioinformatics platform for disseminating metaproteomics software and workflows. Within this platform, we have developed easily accessible and documented metaproteomic software tools and workflows aimed at training researchers in their operation and disseminating the tools for more widespread use. The modular workflows encompass the core requirements of metaproteomic informatics: (a) database generation; (b) peptide spectral matching; (c) taxonomic analysis; and (d) functional analysis. Much of the software available via the Galaxy-P platform was selected, packaged and deployed through an online metaproteomics “Contribution Fest” undertaken by a unique consortium of expert software developers and users from the metaproteomics research community, who have co-authored this manuscript. These resources are documented on GitHub and freely available through the Galaxy Toolshed, as well as a publicly accessible metaproteomics gateway Galaxy instance. These documented workflows are well suited for the training of novice metaproteomics researchers, through online resources such as the Galaxy Training Network, as well as hands-on training workshops.
Here, we describe the metaproteomics tools available within these Galaxy-based resources, as well as the process by which they were selected and implemented in our community-based work. We hope this description will increase access to and utilization of metaproteomics tools, as well as offer a framework for continued community-based development and dissemination of cutting-edge metaproteomics software.
In Brief: metaQuantome enables quantitative analysis of the taxonomic and functional state of a microbiome. Leveraging quantitative mass spectrometry data generated from metaproteomic samples along with taxonomic and functional annotations, metaQuantome unravels the complex and hierarchical data structure of taxonomic and functional ontologies. As a result, metaQuantome enables data exploration, tests hypotheses, and generates high-quality visualizations. metaQuantome deciphers the contribution of taxa to a functional process and vice versa. Its accessibility will pave the way for advanced multi-omic analysis of diverse microbiomes.
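To make the idea of a hierarchical annotation structure concrete, the sketch below sums peptide-level intensities upward through a taxonomy, so that each ancestor term accumulates the signal of its descendants. It is only an illustration under stated assumptions: the function name `roll_up`, the `parent` mapping, and the toy values are all hypothetical, and this is not metaQuantome's code or API.

```python
from collections import defaultdict
from typing import Dict, Optional


def roll_up(intensities: Dict[str, float],
            parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Add each term's intensity to the term itself and to all of its ancestors.

    `parent` maps a term to its direct parent (hypothetical toy taxonomy);
    terms missing from `parent` are treated as roots.
    """
    totals: Dict[str, float] = defaultdict(float)
    for term, value in intensities.items():
        node: Optional[str] = term
        visited = set()  # guard against accidental cycles in the mapping
        while node is not None and node not in visited:
            visited.add(node)
            totals[node] += value
            node = parent.get(node)
    return dict(totals)


# Toy data, invented for illustration only.
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Bacteria",
    "Bacteroides fragilis": "Bacteroides",
    "Bacteroides": "Bacteria",
}
intensities = {
    "Escherichia coli": 10.0,
    "Escherichia": 2.0,
    "Bacteroides fragilis": 5.0,
}
totals = roll_up(intensities, parent)
```

The same roll-up would apply to a functional ontology if each term had a single parent; ontologies in which a term can have several parents would need each ancestor counted once per observation, which this sketch does not handle.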
Multi-omics approaches focused on mass-spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe a sectioning method for generating an enriched database for those protein sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate (FDR) statistics. Compared with the traditional large-database search, the sectioning method increased true positive PSM identifications; compared with a previously described two-step method for reducing database size, it reduced false PSM identifications. The sectioning method for large sequence databases enables generation of an enriched protein sequence database and promotes increased sensitivity in identifying PSMs, while maintaining acceptable and manageable FDR. Furthermore, implementation in the Galaxy platform provides access to a usable and automated workflow for carrying out the method. Our results show the utility of this methodology for a wide range of applications where genome-guided, large sequence databases are required for MS-based proteomics data analysis.
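The abstract does not spell out the steps of the sectioning method, so the following is only one plausible reading, sketched under explicit assumptions: the large database is split into fixed-size sections, each section is searched separately, and the proteins that receive accepted matches are pooled into an enriched database. The names `section_database`, `build_enriched_database`, and the `search` callback are hypothetical placeholders, not part of any published tool.

```python
from typing import Callable, List, Set


def section_database(protein_ids: List[str], section_size: int) -> List[List[str]]:
    """Split the list of protein identifiers into consecutive sections."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return [protein_ids[i:i + section_size]
            for i in range(0, len(protein_ids), section_size)]


def build_enriched_database(protein_ids: List[str],
                            section_size: int,
                            search: Callable[[List[str]], Set[str]]) -> List[str]:
    """Search each section and pool the proteins that were matched.

    `search` is a hypothetical stand-in for a real search engine run: it takes
    one section and returns the identifiers with at least one accepted PSM.
    """
    enriched: List[str] = []
    seen: Set[str] = set()
    for section in section_database(protein_ids, section_size):
        hits = search(section)
        for pid in section:  # iterate the section to keep database order
            if pid in hits and pid not in seen:
                seen.add(pid)
                enriched.append(pid)
    return enriched


# Toy stand-in for a search: pretend identifiers starting with "hit" matched.
def toy_search(section: List[str]) -> Set[str]:
    return {pid for pid in section if pid.startswith("hit")}


enriched = build_enriched_database(["hit1", "a", "b", "hit2", "c"], 2, toy_search)
```

In a real workflow the enriched database would then be searched again with FDR control; that final step is left out here because the abstract gives no details about it.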
Intestinal proteases mediate digestion and immune signaling, while increased gut proteolytic activity disrupts the intestinal barrier and generates visceral hypersensitivity, which is common in irritable bowel syndrome (IBS). However, the mechanisms controlling protease function are unclear. Here we show that members of the gut microbiota suppress intestinal proteolytic activity through production of unconjugated bilirubin. This occurs via microbial β-glucuronidase-mediated conversion of bilirubin conjugates. Metagenomic analysis of fecal samples from patients with post-infection IBS (n=52) revealed an altered gut microbiota composition, in particular a reduction in Alistipes taxa, and high gut proteolytic activity driven by specific host serine proteases compared to controls. Germ-free mice showed 10-fold higher proteolytic activity compared with conventional mice. Colonization with microbiota from high proteolytic activity IBS patients failed to suppress proteolytic activity in germ-free mice, but suppression of proteolytic activity was achieved with colonization using microbiota from healthy donors. High proteolytic activity mice had higher intestinal permeability, a higher relative abundance of Bacteroides and a reduction in Alistipes taxa compared with low proteolytic activity mice. High proteolytic activity IBS patients had lower fecal β-glucuronidase activity and lower levels of end-products of bilirubin deconjugation. Mice were treated with unconjugated bilirubin and β-glucuronidase-overexpressing E. coli, which significantly reduced proteolytic activity, while inhibitors of microbial β-glucuronidases increased proteolytic activity. Together, these data define a disease-relevant mechanism of host-microbial interaction that maintains protease homeostasis in the gut.
There is an ongoing explosion of scientific datasets being generated, brought on by recent technological advances in many areas of the natural sciences. As a result, the life sciences have become increasingly computational in nature, and bioinformatics has taken on a central role in research studies. However, basic computational skills, data analysis, and stewardship are still rarely taught in life science educational programs, resulting in a skills gap in many of the researchers tasked with analysing these big datasets. In order to address this skills gap and empower researchers to perform their own data analyses, the Galaxy Training Network (GTN) has previously developed the Galaxy Training Platform (https://training.galaxyproject.org), an open access, community-driven framework for the collection of FAIR (Findable, Accessible, Interoperable, Reusable) training materials for data analysis utilizing the user-friendly Galaxy framework as its primary data analysis platform. Since its inception, this training platform has thrived, with the number of tutorials and contributors growing rapidly, and the range of topics extending beyond life sciences to include topics such as climatology, cheminformatics, and machine learning. While initially aimed at supporting researchers directly, the GTN framework has proven to be an invaluable resource for educators as well. We have focused our efforts in recent years on adding increased support for this growing community of instructors. We have added new features to facilitate the use of the materials in a classroom setting, simplified the contribution flow for new materials, and added a set of train-the-trainer lessons. Here, we present the latest developments in the GTN project, aimed at facilitating the use of the Galaxy Training materials by educators, and its usage in different learning environments.
moFF is a modular and operating-system-independent tool for quantitative analysis of label-free mass-spectrometry-based proteomics data. The moFF workflow, comprising matching-between-runs and apex quantification, can be applied to any upstream search engine’s output, along with the corresponding Thermo or mzML raw file. We here present moFF 2.0, with improvements in speed through multithreading, the use of a new raw file access library, and a novel filtering approach in the matching-between-runs module. This filter allows moFF to correctly identify features that are present in one run but not in another, as demonstrated using spiked-in iRT peptides. Moreover, moFF 2.0 also provides a new peptide summary export that can be used in downstream statistical analysis. moFF is open source and freely available and can be downloaded from