Abstract: Most bioinformatics tools available today were not written by professional software developers, but by people who wanted to solve their own problems with computational solutions while spending the minimum time and effort possible, since these were just the means to an end. Consequently, a vast number of software applications are currently available, hindering the task of identifying the utility and quality of each. At the same time, this situation has hindered regular adoption of these tools in clinical pract…
“…Suggestions for best development practices are provided in 15 of the reviewed resources. Some of the reviewed resources are fully dedicated to best practices, including general best practices for scientific software development 22, 23 and specific best practices for biomedical software 24, 25 . Common best practice suggestions are presented in Fig.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…3A. Overall, developing with a version control system is the most common suggestion [9][10][11][12][22][23][24][25][26][27][28][29] . GitHub, BitBucket, and GitLab are commonly mentioned ready-to-use version control repositories.…”
Section: Standards and Best Practices (citation type: mentioning)
confidence: 99%
“…In Step 1.a, we recommend following general standards and best practices for scientific software, such as working from a version control system (GitHub, Bitbucket, GitLab), having code-level documentation (in-code comments, descriptions in the headers), and recording dependencies in a requirements.txt file or a README file. We recommend reading "Good enough practices in scientific computing" 22 and "General guidelines for biomedical software development" 24 . In Step 1.b, we recommend following language-specific standards and best practices, which depend on the development stack used.…”
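The dependency-recording practice quoted above can be sketched as follows. This is a minimal illustration assuming a Python stack; the `pin` and `write_requirements` helper names are hypothetical, not part of any cited guideline.

```python
# Minimal sketch of recording pinned dependencies in a requirements.txt,
# as recommended in Step 1.a. Helper names are illustrative only.

def pin(name: str, version: str) -> str:
    """Format one dependency as an exact-version requirement line."""
    return f"{name}=={version}"

def write_requirements(deps: dict, path: str = "requirements.txt") -> list:
    """Write sorted 'name==version' lines to `path`; return the lines."""
    lines = [pin(n, v) for n, v in sorted(deps.items())]
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return lines
```

Other stacks have analogous mechanisms (lockfiles, manifest files); the point is that dependencies end up in a machine-readable file kept under version control alongside the code.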
Section: Standards and Best Practices (citation type: mentioning)
confidence: 99%
“…From the reviewed resources, 14 have made suggestions about a suitable license for research software 6,[8][9][10][11]15,22,[24][25][26]28,33,44,68 . All agree that it is preferable to use an open-source license to make the software as reusable as possible (cf.…”
Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles tailored for research software have been proposed by the FAIR for Research Software (FAIR4RS) Working Group. They provide a foundation for optimizing the reuse of research software. The FAIR4RS principles are, however, aspirational and do not provide practical instructions to the researchers. To fill this gap, we propose in this work the first actionable step-by-step guidelines for biomedical researchers to make their research software compliant with the FAIR4RS principles. We designate them as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines. Our process for developing these guidelines, presented in this manuscript, is based on a re-classification of the FAIR4RS principles and a thorough review of current practices in the field. To support researchers, we have also developed a tool that streamlines the process of implementing these guidelines. This tool is incorporated in FAIRshare, a free and open-source software application aimed at simplifying the curation and sharing of FAIR biomedical data and software through user-friendly interfaces and automation. Details about this tool are also presented.
“…Over the last few years, bioinformatics has played a major role in the field of biology, raising the issue of best practices in software development for the members of the bioinformatics community 1 – 3 . These practices include facilitating the discovery, deployment, and usage of tools, and several helpful solutions are available.…”
Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable, and reproducible way. Still, integrating tools into such environments remains a cumbersome, time-consuming, and error-prone process. A major consequence is incomplete or outdated tool descriptions, often missing important information such as parameters and metadata (e.g., publications or links to documentation). ToolDog (Tool DescriptiOn Generator) facilitates the integration of tools registered in the ELIXIR tools registry (https://bio.tools) into workbench environments by generating tool description templates. ToolDog includes two modules. The first module analyses the source code of the bioinformatics software with language-specific plugins and generates a skeleton for a Galaxy XML or CWL tool description. The second module enriches the generated tool description using metadata provided by bio.tools; it can also be used on its own to complete or correct existing tool descriptions with missing metadata.
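To illustrate the kind of skeleton the first module produces, here is a hypothetical minimal CWL CommandLineTool description; the tool name, input, and output fields are placeholders chosen for this sketch, not actual ToolDog output.

```yaml
# Hypothetical minimal CWL tool description skeleton (placeholder values);
# the second module would enrich such a file with bio.tools metadata.
cwlVersion: v1.0
class: CommandLineTool
baseCommand: mytool          # placeholder executable name
label: "Example tool"        # label and metadata later enriched from bio.tools
inputs:
  input_file:
    type: File
    inputBinding:
      position: 1
outputs:
  result:
    type: stdout
stdout: output.txt
```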
Reproducibility is not only essential for the integrity of scientific research but is also a prerequisite for model validation and refinement for the future application of predictive algorithms. However, reproducible research is becoming increasingly challenging, particularly in high-dimensional genomic data analyses with complex statistical or algorithmic techniques. Given that most biomedical and statistical journals have no mandatory requirement to provide the original data, analytical source code, or other relevant materials for publication, the accessibility of these supplements naturally suggests greater credibility of the published work. In this study, we performed a reproducibility assessment of the notable paper by Gerstung et al. (Nat Genet 49:332–340, 2017) by rerunning the analysis using their original code and data, which are publicly accessible. Despite an open science setting, it was challenging to reproduce the entire research project; reasons included incomplete data and documentation, suboptimal code readability, coding errors, limited portability of intensive computing performed on a specific platform, and an R computing environment that could no longer be re-established. We learn that the availability of code and data does not guarantee the transparency and reproducibility of a study; paradoxically, the source code remains liable to error and obsolescence, essentially due to methodological and computational complexity, a lack of reproducibility checking at submission, and updates to software and operating environments. Complex code may also hide problematic methodological aspects of the proposed research. Building on the experience gained, we discuss the best programming and software engineering practices that could have been employed to improve reproducibility, and propose practical criteria for the conduct and reporting of reproducibility studies for future researchers.
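One practice the study points to, recording the computing environment so it can later be re-established, can be sketched as follows. This is a minimal Python illustration (analogous in spirit to R's `sessionInfo()`); `environment_snapshot` is a hypothetical helper, not code from the reviewed study.

```python
# Sketch: capture interpreter and platform details alongside analysis
# results, so a later reproducibility check knows what to re-create.
import json
import platform
import sys

def environment_snapshot() -> dict:
    """Return a small, serializable description of the runtime environment."""
    return {
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }

if __name__ == "__main__":
    # Persist this next to the analysis outputs.
    print(json.dumps(environment_snapshot(), indent=2))
```

A snapshot like this does not by itself make an environment re-creatable, but it records what a later reproduction attempt must match, which the study found was missing in practice.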
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.