2019
DOI: 10.3389/frma.2018.00036

The NLP4NLP Corpus (I): 50 Years of Publication, Collaboration and Citation in Speech and Language Processing

Abstract: This paper introduces the NLP4NLP corpus, which contains articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965-2015), comprising 65,000 documents, gathering 50,000 authors, including 325,000 references and representing ∼270 million words. Most of these publications are in English, some are in French, German, or Russian. Some are open access, others have been provided by the publishers. In order to constitute and analyze th…

Cited by 19 publications (21 citation statements)
References 18 publications
“…This work is inspired by a vast amount of past research, including that on Google Scholar (Khabsa and Giles, 2014; Howland, 2010; Orduña-Malea et al., 2014; Martín-Martín et al., 2018), on the analysis of NLP papers (Radev et al., 2016; Anderson et al., 2012; Bird et al., 2008; Schluter, 2018; Mariani et al., 2018; Qazvinian et al., 2013; Teich, 2010; Saggion et al., 2017), on citation intent (Aya et al., 2005; Teufel et al., 2006; Pham and Hoffmann, 2003; Nanba et al., 2011; Mohammad et al., 2009; Zhu et al., 2015), and on measuring scholarly impact (Ravenscroft et al., 2017; Priem and Hemminger, 2010; Bulaitis, 2017; Bos and Nitza, 2019; Ioannidis et al., 2019; Yogatama et al., 2011; Mishra et al., 2018).…”
Section: Related Work
mentioning
confidence: 99%
“…The analysis in this paper is based on a subset of articles from the ACL Anthology. While corpora of NLP publications, including the ACL Anthology, already exist (Bird et al., 2008; Radev et al., 2009; Mariani et al., 2019a), none of them include publications newer than 2015. We compiled our own dataset because we are mostly interested in the papers published in recent years.…”
Section: Data
mentioning
confidence: 99%
“…Scientific progress benefits from researchers "standing on the shoulders of giants" and one way for researchers to recognise those shoulders is by citing articles that influence and inform their work. The nature of citations in NLP publications has previously been analysed with regards to topic areas (Anderson et al., 2012; Gollapalli and Li, 2015; Mariani et al., 2019b), semantic relations (Gábor et al., 2016), gender issues (Vogel and Jurafsky, 2012; Schluter, 2018), the role of sharing software (Wieling et al., 2018), and citation and collaboration networks (Radev et al., 2016; Mariani et al., 2019a). Mohammad (2019) provides the most recent analysis of the ACL Anthology, looking at demographics, topic areas, and research impact via citation analysis.…”
mentioning
confidence: 99%
“…In the previous paper (Mariani et al., 2018b), we introduced the NLP4NLP corpus. This corpus contains articles published in 34 major conferences and journals in the field of speech and natural language processing over a period of 50 years (1965-2015), comprising 65,000 documents, gathering 50,000 authors, including 325,000 references and representing ∼270 million words.…”
Section: The NLP4NLP Corpus
mentioning
confidence: 99%
“…The results of this study are presented in two companion papers. The former one (Mariani et al., 2018b) introduces the corpus with various analyses: evolution over time of the number of papers and authors, including their distribution by gender, as well as collaboration among authors and citation patterns among authors and papers. In the present paper, we will consider the evolution of research topics over time and identify the authors who introduced and mainly contributed to key innovative topics, the use of Language Resources over time and the reuse of papers and plagiarism within and across publications.…”
Section: Introduction Preliminary Remarks
mentioning
confidence: 99%