Background: The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust. H3Africa aims to use genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure, such as portable and reproducible bioinformatics workflows for use in heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application that requires complex, interdependent data analysis workflows. Such workflows pass the primary and secondary input data through several computationally intensive processing steps using different software packages, and some outputs become inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging. Results: H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language (CWL) framework and two using Nextflow.
All the workflows are containerized using Docker for improved portability and reproducibility, and are publicly available for use by members of the H3Africa consortium and the international research community. Conclusion: The H3ABioNet workflows were implemented to offer ease of use for the end user and high levels of reproducibility and portability, while following modern, state-of-the-art bioinformatics data processing protocols. The H3ABioNet workflows will serve the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their own needs. The H3ABioNet workflows will help develop bioinformatics capacity, assist genomics research within Africa, and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
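Because several steps consume the outputs of earlier ones, a workflow engine must run steps in dependency order. It can also run independent steps in parallel. The sketch below illustrates that scheduling idea with Python's standard `graphlib` module (Python 3.9+). The step names are hypothetical stand-ins for a variant-calling pipeline. This is not how CWL or Nextflow are implemented internally.

```python
from graphlib import TopologicalSorter

# Hypothetical variant-calling steps (illustrative only, not taken from the
# H3ABioNet CWL or Nextflow code). Each key lists the steps whose outputs it consumes.
steps = {
    "qc_reads": set(),
    "index_reference": set(),
    "align_reads": {"qc_reads"},
    "sort_bam": {"align_reads"},
    "mark_duplicates": {"sort_bam"},
    "call_variants": {"mark_duplicates", "index_reference"},
    "filter_variants": {"call_variants"},
}

def execution_order(graph):
    """Return one valid order in which every step runs after its inputs exist."""
    return list(TopologicalSorter(graph).static_order())

def parallel_batches(graph):
    """Group steps into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all steps whose inputs are available
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock successors
    return batches

if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(steps), start=1):
        print(i, batch)
```

In this toy graph, read QC and reference indexing can start together. Variant calling waits for both branches to finish.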
Up to 30% of men with normal semen parameters suffer from infertility, and the reason for this is unknown. Altered expression of sperm proteins may be a major cause of infertility in these men. Proteomic profiling was performed using LC-MS/MS on pooled semen samples from eight normozoospermic fertile men and nine normozoospermic infertile men. Key differentially expressed proteins (DEPs) related to the fertilization process were then selected for validation by Western blotting. A total of 1139 and 1095 proteins were identified in normozoospermic fertile and infertile men, respectively. Of these, 162 proteins were identified as DEPs. The canonical pathway related to free radical scavenging was enriched with upregulated DEPs in normozoospermic infertile men. Proteins associated with reproductive system development and function, and with the ubiquitination pathway, were underexpressed in normozoospermic infertile men. Western blot analysis of men with unexplained male infertility (UMI) showed overexpression of annexin A2 (ANXA2) (2.03 fold change; P = 0.0243). It also showed underexpression of sperm surface protein Sp17 (SPA17) (0.37 fold change; P = 0.0205) and serine protease inhibitor (SERPINA5) (0.32 fold change; P = 0.0073). The global proteomic profile of normozoospermic infertile men differs from that of normozoospermic fertile men. Our data suggest that SPA17, ANXA2, and SERPINA5 may serve as non-invasive protein biomarkers associated with the fertilization process of spermatozoa in UMI.
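The Western blot fold changes are ratios of expression in infertile men relative to fertile men. Values above 1 indicate overexpression and values below 1 indicate underexpression. The minimal sketch below assumes that fold change is the ratio of mean normalized band intensities. The intensities are hypothetical. The P values reported in the abstract would require a separate statistical test, which is not shown.

```python
from statistics import mean

def fold_change(infertile, fertile):
    """Ratio of mean normalized band intensity: infertile group over fertile group."""
    return mean(infertile) / mean(fertile)

def direction(fc):
    """Label a fold change as over- or underexpression in the infertile group."""
    if fc > 1:
        return "overexpressed"
    if fc < 1:
        return "underexpressed"
    return "unchanged"

# Hypothetical normalized intensities (not data from the study).
fertile = [1.0, 1.2, 0.8]
infertile = [2.1, 1.9, 2.0]

fc = fold_change(infertile, fertile)
print(round(fc, 2), direction(fc))
```

Read this way, the reported 0.37 for SPA17 means the infertile group showed roughly a third of the fertile group's signal.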
Background: Neuroblastoma is the most common extracranial solid tumor in childhood. Amplification of MYCN in neuroblastoma is a predictor of poor prognosis. Materials and methods: DNA methylation data from the TARGET data matrix were stratified into MYCN-amplified and non-amplified groups. Differential methylation analysis, clustering, recursive feature elimination (RFE), machine learning (ML), Cox regression analysis and Kaplan–Meier estimates were performed. Results and Conclusion: A total of 663 CpGs were differentially methylated between the two groups. RFE selected 25 CpGs for clustering and ML, which yielded 100% clustering accuracy. ML validation on three external datasets produced high accuracy scores of 100%, 97% and 93%. Eight survival-associated CpGs were also identified. Therapeutic interventions may need to be targeted to patient subgroups.
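Kaplan–Meier estimates give the probability of surviving beyond each observed event time. The estimator multiplies the conditional survival at each event time: S(t) = ∏(1 − d_i/n_i). Here d_i is the number of events at time t_i and n_i is the number of patients still at risk. The minimal pure-Python sketch of this product-limit estimator below uses hypothetical follow-up times with censoring. The abstract does not say which software or data layout the authors used. In an analysis like this one, separate curves would typically be computed for patient groups, such as high versus low methylation at a survival-associated CpG.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns [(t, S(t))] at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all patients (events and censored) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set after time t
    return curve

# Hypothetical follow-up times (months) and event indicators.
for t, s in kaplan_meier([2, 3, 3, 5, 8, 8, 9], [1, 1, 0, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored patients count toward the risk set at their own time point and are removed afterwards. This is the usual convention when an event and a censoring share a time.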
Abstract: Biobanks are organized collections of biological material and associated data. They are a fundamental resource for life science research and contribute to the development of pharmaceutical drugs and diagnostic markers, and to a deeper understanding of the genetics that regulate the development of all life on earth.
Biobanks are well established in high-income countries (HIC) and are rapidly emerging in low- and middle-income countries (LMIC). Surveys of biobanks operating in an LMIC setting indicate that limited resources and short-term funding tied to specific projects threaten their sustainability. Fit-for-purpose biobanks targeting major societal challenges such as HIV and malaria provide an excellent basis for integrating biobanks with the available research communities in LMIC regions. However, to become sustainable, biobanks must become an integrated part of local research communities. To achieve this, three things are needed. The cost of operating biobanks must be lowered. Templates must be developed to support local ethics committees. Researchers must be given the opportunity to build experience in successfully operating biobank-based research projects.
The B3Africa consortium is based on these conclusions. It was set up to support biobank-based research by creating a cost-efficient laboratory information management system (LIMS) for developing biobanks, and to contribute to training and capacity building in the local research community. The technical platform, called the eB3Kit, is open source. It consists of a LIMS and a bioinformatics module based on the eBiokit, which allows researchers to take control of the analysis of their own data. Along with the technical platform, the consortium will also provide training and support for the infrastructures needed to regulate the ethical and legal implications of biobank-based research.
A laboratory information management system (LIMS) is central to the informatics infrastructure that underlies biobanking activities. A wide range of commercial and open-source LIMSs is now available. The choice of one LIMS over another is often influenced by the needs of the biobank's clients and researchers, as well as by available financial resources. The Baobab LIMS was developed by customizing the Bika LIMS software () to meet the requirements of biobanking best practices. The implementation of Baobab LIMS, an open-source LIMS for biobanking, was motivated by two needs: to implement biobank standard operating procedures and to encourage the use of standards for biobank data representation. Baobab LIMS comprises modules for biospecimen kit assembly, shipping of biospecimen kits, storage management, analysis requests, reporting, and invoicing. It is based on the Plone web content management framework, so all the system requirements for Plone apply to Baobab LIMS, including a server with at least 8 GB of RAM and 120 GB of hard disk space. Baobab LIMS is a client–server system. End users access it securely over the internet through a standard web browser, which removes the need for standalone installations on every machine.
The H3ABioNet pan-African bioinformatics network, which is funded to support the Human Heredity and Health in Africa (H3Africa) program, has developed node-assessment exercises. These exercises gauge the ability of its participating research and service groups to analyze typical genome-wide datasets generated by H3Africa research groups. We describe a framework for assessing computational genomics analysis skills, which includes standard operating procedures, training and test datasets, and a process for administering the exercise. We present the experiences of 3 research groups that have taken the exercise and the impact on their ability to manage complex projects. Finally, we discuss why many H3ABioNet nodes have so far declined to participate and potential strategies to encourage them to do so.
PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi
To elucidate cancer pathogenesis and its mechanisms at the molecular level, large cohorts of individual patient tissues must be collected and characterized. Most pathology institutes routinely preserve biopsy tissues by standardized formalin fixation and paraffin embedding. These archived formalin-fixed, paraffin-embedded (FFPE) tissues are therefore important collections of pathology material that include patient metadata, such as medical history and treatments. Owing to the modifications induced by formalin, FFPE blocks can be stored under ambient conditions for decades while retaining cellular morphology. However, the effect of long-term storage at resource-limited institutions in developing countries on the quantity and quality of extractable protein has not yet been investigated. In addition, the optimal sample preparation techniques for accurate and reproducible label-free LC-MS/MS results across block ages remain unclear. This study investigated the protein extraction efficiency of 1-, 5-, and 10-year-old human colorectal carcinoma resection tissue. It also assessed three different gel-free protein purification methods for label-free LC-MS/MS analysis. A sample size of n = 17 patients per experimental group was selected (statistical power = 0.7 and α = 0.05, i.e. a 70% probability of detecting a true effect at a 5% significance level). Data were evaluated in terms of protein concentration extracted, peptide/protein identifications, method reproducibility and efficiency, and sample proteome integrity (as affected by storage time). Protein/peptide distribution according to biological processes, cellular components, and physicochemical properties was also evaluated. Data are available via ProteomeXchange with identifier PXD017198. The results indicate that the amount of protein extracted depends significantly on block age (p < 0.0001), with older blocks yielding less protein than newer blocks.
Detergent removal plates were the most efficient and, overall, the most reproducible protein purification method with regard to the number of peptide and protein identifications. They were followed by the MagReSyn® SP3/HILIC method (with on-bead enzymatic digestion) and, lastly, by the acetone precipitation and formic acid resolubilization method. Overall, the results indicate that long-term storage of FFPE tissues (as measured by methionine oxidation) does not considerably interfere with retrospective proteomic analysis (p > 0.1). Block age mainly affects initial protein extraction yields and does not substantially affect subsequent label-free LC-MS/MS analysis results.
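The abstract states the power and significance level. It does not give the statistical test or the effect size the study was designed to detect. The minimal sketch below assumes a two-sided comparison of two group means with a standardized effect size d. It uses the normal approximation n = 2(z₁₋α/₂ + z_power)²/d² to show what n = 17 per group implies under those assumptions. The function names are hypothetical, and a t-test-based calculation would give slightly larger sample sizes.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.7):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)           # about 0.52 for power = 0.7
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

def detectable_effect(n, alpha=0.05, power=0.7):
    """Smallest standardized effect size d detectable with n subjects per group."""
    z = NormalDist()
    return math.sqrt(2 / n) * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power))

print(round(detectable_effect(17), 2))  # detectable d for n = 17 per group
print(n_per_group(1.0))                 # n needed to detect d = 1.0
```

Under these assumptions, 17 patients per group gives a 70% chance of detecting a difference of about 0.85 standard deviations (d ≈ 0.85) at α = 0.05.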
The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the growth of collaborative projects like the Human Heredity and Health in Africa Consortium (H3Africa). The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized two needs: portable, reproducible pipelines adapted to heterogeneous compute environments, and technical expertise in workflow languages and containerization technologies. To address them, H3ABioNet organized a hackathon to develop workflows that run on heterogeneous computing environments and meet the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects on the lessons learned about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on quay.io.