The growth of preprints in the life sciences has been widely reported and is driving policy changes for journals and funders, but little quantitative information has been published about preprint usage. Here, we report how we collected and analyzed data on all 37,648 preprints uploaded to bioRxiv.org, the largest biology-focused preprint server, in its first five years. The rate of preprint uploads to bioRxiv continues to grow (exceeding 2,100 in October 2018), as does the number of downloads (1.1 million in October 2018). We also find that two-thirds of preprints posted before 2017 were later published in peer-reviewed journals, and that a preprint's download count is related to the impact factor of the journal in which it is published. Finally, we describe Rxivist.org, a web application that provides multiple ways to interact with preprint metadata.
Background: Preprint usage is growing rapidly in the life sciences; however, questions remain about the quality of preprints relative to published articles. An objective, readily measurable dimension of quality is completeness of reporting, as transparency can improve the reader's ability to independently interpret data and reproduce findings.
Methods: In this observational study, we first compared independent samples of articles published on bioRxiv and in PubMed-indexed journals in 2016 using a quality-of-reporting questionnaire. We then performed paired comparisons between bioRxiv preprints and their own peer-reviewed journal versions.
Results: Peer-reviewed articles had, on average, higher quality of reporting than preprints, although the difference was small: absolute differences of 5.0% [95% CI 1.4, 8.6] and 4.7% [95% CI 2.4, 7.0] of reported items in the independent-samples and paired-sample comparisons, respectively. Larger differences favoring peer-reviewed articles appeared in subjective ratings of how clearly titles and abstracts presented the main findings and how easy it was to locate relevant reporting information. Changes in reporting from preprint to peer-reviewed version did not correlate with the impact factor of the publication venue or with the time lag from bioRxiv posting to journal publication.
Conclusions: Our results suggest that, on average, publication in a peer-reviewed journal is associated with improved quality of reporting. They also show that the quality of reporting in life-science preprints is within a similar range to that of peer-reviewed articles, albeit slightly lower on average, supporting the idea that preprints should be considered valid scientific contributions.
The importance of sampling from globally representative populations has been well established in human genomics. In human microbiome research, however, we lack a full understanding of the global distribution of sampling in research studies. This information is crucial for understanding global patterns of microbiome-associated diseases and for extending the health benefits of this research to all populations. Here, we analyze the country of origin of all 444,829 human microbiome samples available from the world's three largest genomic data repositories, including the Sequence Read Archive (SRA). The samples come from 2,592 studies of 19 body sites, including 220,017 samples of the gut microbiome. We show that more than 71% of samples with a known origin come from Europe, the United States, and Canada, including 46.8% from the US alone, even though that country represents only 4.3% of the global population. We also find that central and southern Asia is the most underrepresented region: countries such as India, Pakistan, and Bangladesh account for more than a quarter of the world population but contribute only 1.8% of human microbiome samples. These results demonstrate a critical need to ensure broader global representation of participants in microbiome studies.
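The under- and overrepresentation figures above amount to comparing a region's share of samples against its share of the world population. A minimal sketch of that comparison is below; all numbers in the example are invented for illustration and are not the study's data.

```python
# Toy sketch of a representation comparison: a region's share of
# microbiome samples divided by its share of the world population.
def representation_ratio(region_samples, total_samples,
                         region_population, world_population):
    """Ratio > 1 means overrepresented relative to population size;
    ratio < 1 means underrepresented."""
    sample_share = region_samples / total_samples
    population_share = region_population / world_population
    return sample_share / population_share

# Invented example: a region contributing 10% of samples
# while holding 25% of the world population.
ratio = representation_ratio(10_000, 100_000,
                             2_000_000_000, 8_000_000_000)
print(ratio)  # 0.4, i.e. underrepresented
```

By this measure, a country with 46.8% of samples but 4.3% of the population would score far above 1, while one with 1.8% of samples and over 25% of the population would score far below it.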
Researchers in the life sciences are posting their work to preprint servers at an unprecedented and increasing rate, sharing papers online before (or instead of) publication in peer-reviewed journals. Though the popularity and practical benefits of preprints are driving policy changes at journals and funding organizations, there is little bibliometric data available to measure trends in their usage. Here, we collected and analyzed data on all 37,648 preprints that were uploaded to bioRxiv.org, the largest biology-focused preprint server, in its first five years. We find that preprints on bioRxiv are being read more than ever before (1.1 million downloads in October 2018 alone) and that the rate of preprints being posted has increased to a recent high of more than 2,100 per month. We also find that two-thirds of bioRxiv preprints posted in 2016 or earlier were later published in peer-reviewed journals, and that the majority of published preprints appeared in a journal less than six months after being posted. We evaluate which journals have published the most preprints, and find that preprints with more downloads are likely to be published in journals with a higher impact factor. Lastly, we developed Rxivist.org, a website for downloading and interacting programmatically with indexed metadata on bioRxiv preprints.

[…] 37 full-length research articles. PLOS Biology published 19. Genetics published 23. Cell published 35. bioRxiv had posted more articles than all four combined by the end of September (Table S1). bioRxiv (pronounced "Bio Archive") is a preprint server, a repository to which researchers can post their papers directly to bypass the months-long turnaround time of the publishing process and share their findings with the community more quickly (Berg et al. 2016).
Though the idea of preprints is far from new (Cobb 2017), researchers have become vocally frustrated about the lengthy process of distributing research through the conventional pipelines (Powell 2016), and numerous public laments have been published decrying increasingly impractical demands of journals and reviewers […]. Against this backdrop, preprints have become a steady source of the most recent research in biology, providing a valuable way to learn about exciting, relevant and high-impact findings, for free, months or years before that research will appear anywhere else, if at all (Kaiser 2017). It's a practice long familiar to physicists, who began submitting preprints to arXiv, one of the earliest preprint servers, in 1991 (Verma 2017).
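The abstract above describes Rxivist.org as a way to interact programmatically with indexed preprint metadata. As a loose illustration of that kind of analysis, the sketch below computes two of the quantities reported in the paper (publication fraction and total downloads) over records in a JSON shape of our own invention; the field names `title`, `downloads`, and `journal` are assumptions for this example, not the actual Rxivist API schema.

```python
import json

# Illustrative preprint records; a "journal" of null means the
# preprint has not (yet) appeared in a peer-reviewed journal.
sample = json.loads("""
[
  {"title": "Preprint A", "downloads": 1500, "journal": "eLife"},
  {"title": "Preprint B", "downloads": 300,  "journal": null},
  {"title": "Preprint C", "downloads": 900,  "journal": "Genetics"}
]
""")

def published_fraction(papers):
    """Fraction of preprints later published in a peer-reviewed journal."""
    published = [p for p in papers if p["journal"] is not None]
    return len(published) / len(papers)

def total_downloads(papers):
    """Total downloads across a set of preprints."""
    return sum(p["downloads"] for p in papers)

print(published_fraction(sample))  # 2 of 3 records are published
print(total_downloads(sample))     # 2700
```

Run against the full corpus of indexed metadata, the same two aggregations yield the headline figures in the abstract (the two-thirds publication rate and the monthly download totals).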
Developing new software tools for the analysis of large-scale biological data is a key component of advancing modern biomedical research. Reproducing published findings requires running computational tools on the data generated by such studies, yet little attention is currently paid to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence of 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through the uniform resource locators (URLs) published in the papers in which they first appeared. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% failed to install at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations was significantly higher when authors provided an easy installation process. We propose several practical solutions, for incorporation into journal policy, to increase the installability and archival stability of published bioinformatics software.