Knowledge Discovery and Data Design Innovation 2017
DOI: 10.1142/9789813234482_0004

Automatic Identification of Research Articles Containing Data Usage Statements

Abstract: Modern scientific research is characterized by the sharing of datasets and the reuse of data to develop new models and theories. This paper describes a study to identify research articles that contain data use and reuse information. Applying a bootstrapping-based unsupervised training strategy, we automatically derived text patterns from a large training collection of research articles. These patterns were then used to distinguish articles with data use and reuse from those without data usage. Our experiments…
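
The abstract does not spell out how the patterns are bootstrapped, so the following is only a minimal sketch of one generic bootstrapping loop, not the authors' method. It assumes a pattern is a two-word context, that the word following a known pattern is treated as a candidate data term, and that contexts seen at least `min_count` times before such terms become new patterns. The names `bootstrap_patterns`, `has_data_usage`, the seed pattern, and the example sentences are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bootstrap_patterns(sentences, seed_patterns, rounds=2, min_count=2):
    """Grow a set of two-word patterns from a few seeds (illustrative only)."""
    patterns = set(seed_patterns)  # each pattern is a tuple of two words
    terms = set()                  # candidate data terms harvested so far
    tokenized = [tokenize(s) for s in sentences]
    for _ in range(rounds):
        # Step 1: harvest the word that directly follows any known pattern.
        for toks in tokenized:
            for i in range(2, len(toks)):
                if (toks[i - 2], toks[i - 1]) in patterns:
                    terms.add(toks[i])
        # Step 2: count the two-word contexts that precede harvested terms.
        counts = Counter()
        for toks in tokenized:
            for i in range(2, len(toks)):
                if toks[i] in terms:
                    counts[(toks[i - 2], toks[i - 1])] += 1
        # Step 3: promote frequent contexts to patterns; stop when nothing is new.
        new = {p for p, c in counts.items() if c >= min_count}
        if new <= patterns:
            break
        patterns |= new
    return patterns


def has_data_usage(article_text, patterns):
    """Flag an article if any learned pattern occurs anywhere in its text."""
    toks = tokenize(article_text)
    return any((toks[i], toks[i + 1]) in patterns for i in range(len(toks) - 1))


# Hypothetical toy corpus and seed, for illustration only.
sentences = [
    "We used the MNIST dataset for training.",
    "Experiments were run on MNIST images.",
    "Models were run on MNIST again.",
]
patterns = bootstrap_patterns(sentences, {("used", "the")})
```

In this toy run the seed context "used the" yields the term "mnist", and the context "run on" is then promoted because it precedes that term twice. A real system would need far stricter pattern scoring and filtering than this frequency threshold.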

Citations: cited by 2 publications (2 citation statements)
References: 26 publications (29 reference statements)
“…This heterogeneity in the data set mentions translates into a difference in the performance of the empirical strategies used to solve the problem, according to the scientific field in which it is used. Zhang et al. (2016, 2018) describe implementing a bootstrapping-based unsupervised training strategy based on previous work to distinguish articles with data use and reuse from those without data usage.…”
Section: Data Set Mentions Are Domain-specific
Citation type: mentioning
confidence: 99%
“…Several methods, such as weakly supervised (Hoffmann et al., 2011) and unsupervised learning (Zhang and Elhadad, 2013), have been proposed to address training corpus acquisition. Zhang et al. (2017) proposed an unsupervised approach based on pattern lists to identify data usage at the article level. By applying a bootstrapping strategy to generate text patterns automatically, their method can achieve an F-measure of 85% in determining whether a data usage statement is included in computer science literature.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%