2021
DOI: 10.1002/asi.24454

Softcite dataset: A dataset of software mentions in biomedical and economic research publications

Abstract: Software contributions to academic research are relatively invisible, especially to the formalized scholarly reputation system based on bibliometrics. In this article, we introduce a gold‐standard dataset of software mentions from the manual annotation of 4,971 academic PDFs in biomedicine and economics. The dataset is intended to be used for automatic extraction of software mentions from PDF format research publications by supervised learning at scale. We provide a description of the dataset and an extended d…

Citations: cited by 21 publications (35 citation statements)
References: 52 publications
“…Software mentions in scientific articles have been analyzed for several reasons including mapping the landscape of available scientific software, analyses of software citation practices and measuring the impact of software in science (Krüger & Schindler, 2020). This includes manual analyses based on high quality data, such as Howison & Bullard (2016), Du et al (2021), Nangia & Katz (2017) and Schindler et al (2021b) but also automatic analyses such as Pan et al (2015), Duck et al (2016) and Schindler, Zapilko & Krüger (2020). While manual analyses provide highly reliable data, results often only provide a small excerpt and do not generalize due to small sample size.…”
Section: Related Work (mentioning)
confidence: 99%
“…Howison & Bullard (2016), for instance, analyzed software mentions in science by content analysis in 90 articles. The main objective of Du et al (2021) and Schindler et al (2021b) was to create annotated corpora of high quality for supervised learning of software mentions in scientific articles. Du et al (2021) provide labels for software, version, developer, and URL for articles from PMC, which is multidisciplinary but strongly skewed towards Medicine (see Table A11) and Economics.…”
Section: Related Work (mentioning)
confidence: 99%
“…Tracking the use of software via text-mentions introduces some methodological challenges, which might limit the identification of software names in large texts (Du et al, 2021). First, there may be different ways to invoke the same software, a software project name (e.g., in GitHub), the URL of the software's official website, the URL to the repository where it is hosted, mentions to unpublished manuscripts about the software, users' manuals, etc.…”
Section: Textual Approaches To Track Academic Software Usage (mentioning)
confidence: 99%
“…Figure 5 shows the demo of a new feature that informs users how the searched-for software has been mentioned in existing research publications. This feature will be supported by a software knowledge base built on a gold-standard dataset of nearly 5,000 software mentions [3] that recognizes software mentioned in the research literature as well as its authorship, programming language, operating environment, license, etc. This feature can further support research software developers to understand how effective their requests for citation have been and to make an evidenced argument for credit in their local institutional environment.…”
Section: Future Work (mentioning)
confidence: 99%