2021
DOI: 10.1101/2021.03.04.433874
Preprint

Linguistic Analysis of the bioRxiv Preprint Landscape

Abstract: Preprints allow researchers to make their findings available to the scientific community before they have undergone peer review. Studies on preprints within bioRxiv have been largely focused on article metadata and how often these preprints are downloaded, cited, published, and discussed online. A missing element that has yet to be examined is the language contained within the bioRxiv preprint repository. We sought to compare and contrast linguistic features within bioRxiv preprints to published biomedical tex…

Cited by 8 publications (13 citation statements)
References 61 publications
“…The changes between the first version of the preprint (which we analysed) and the final journal publication may result from journal peer review, comments on the preprint, feedback from colleagues outside of the context of the preprint, and additional development by the authors independent of these sources. Perhaps as a result of these factors, we found an association between the degree of change and delay between preprint posting and journal publication, although only for non-COVID-19 articles, in agreement with Nicholson and colleagues [14]. COVID-19 articles appear to have consistently been expedited through publication processes, regardless of degree of changes during peer review.…”
Section: Discussion (supporting)
confidence: 92%
“…Several studies have assessed such differences. For example, Klein and colleagues used quantitative measures of textual similarity to compare preprints from arXiv and bioRxiv with their published versions [13], concluding that papers change “very little.” Recently, Nicholson and colleagues employed document embeddings to show that preprints with greater textual changes compared with the journal versions took longer to be published and were updated more frequently [14]. However, changes in the meaning of the content may not be directly related to changes in textual characters, and vice versa (e.g., a major rearrangement of text or figures might simply represent formatting changes, while the position of a single decimal point could significantly alter conclusions).…”
Section: Introduction (mentioning)
confidence: 99%
“…The changes between the first version of the preprint (which we analysed) and the final journal publication may result from journal peer review, comments on the preprint, feedback from colleagues outside of the context of the preprint, and additional development by the authors independent of these sources. Perhaps as a result of these factors, we found an association between the degree of change and delay between preprint posting and journal publication, though only for non-COVID-19 articles, in agreement with Nicholson et al [14]. COVID-19 articles appear to have consistently been expedited through publication processes, regardless of degree of changes during peer review.…”
Section: Discussion (supporting)
confidence: 91%
“…This demonstrates the necessity that future studies will focus on more semantic natural language processing approaches when comparing manuscripts that go beyond shallow differences between strings of texts [37]. Recent research has begun to explore the potential of word embeddings for this task (see for instance [14], and Knoth and Herrmannova have even coined the term “Semantometrics” [32] to describe the intersection of NLP and Scientometrics. Nevertheless, the difficulty when dealing with such complex semantic phenomena is that different assessors may annotate changes differently.…”
Section: Discussion (mentioning)
confidence: 99%