Proceedings of the First Workshop on Economics and Natural Language Processing 2018
DOI: 10.18653/v1/w18-3103

A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing

Abstract: We introduce JOCO, a novel text corpus for NLP analytics in the field of economics, business and management. This corpus is composed of corporate annual and social responsibility reports of the top 30 US, UK and German companies in the major (DJIA, FTSE 100, DAX), mid-sized (S&P 500, FTSE 250, MDAX) and technology (NASDAQ, FTSE AIM 100, TECDAX) stock indices, respectively. Altogether, this adds up to 5,000 reports from 270 companies headquartered in three of the world's most important economies. The corpus s…
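The company count in the abstract follows from the sampling design it describes. There are three countries (US, UK, Germany), three index tiers per country (major, mid-sized, technology), and the top 30 companies from each of the resulting nine indices. The worked arithmetic below treats the nine top-30 lists as non-overlapping. The abstract does not state this explicitly, but it is consistent with the reported total.

```latex
\underbrace{3}_{\text{countries}} \times \underbrace{3}_{\text{index tiers}} \times \underbrace{30}_{\text{companies per index}} = 270 \ \text{companies}
```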

Cited by 10 publications (3 citation statements)
References 78 publications

“…There are three major approaches for natural language processing (NLP), i.e., (1) the thesaurus-based approach [16], (2) the count-based approach (see [6] as a review), and (3) the inference-based approach. In this study, we adopt the inference-based approach as it involves deep learning, reported in previous studies as one of the most promising methods.…”
Section: Methods (mentioning, confidence: 99%)

“…Previously released datasets typically comprise news articles or press releases annotated for sentiment analysis (Malo et al, 2014), event extraction (Jacobs and Hoste, 2022;Lee et al, 2022;Han et al, 2022), opinion analysis (Hu and Paroubek, 2021) or causality detection in finance (Mariko et al, 2020). In addition to news articles, corporate reports (Loukas et al, 2021;Händschke et al, 2018) are text corpora from economics and business, but lack token-level annotations. Our work fills two gaps by (i) being the first to address information extraction from scientific economic content and releasing a neural language model pretrained in that domain and (ii) defining a NER annotation scheme and releasing an annotated dataset for causal entities of economic impact evaluation.…”
Section: NLP for Economics (mentioning, confidence: 99%)

“…In more detail, as text-mining tools uncover and investigate patterns within texts, the preparation of the text corpus is highly relevant (see e.g., Händschke et al 2018). For instance, duplicates must be removed from text corpora to avoid biasing results.…”
Section: Transparency and Reproducibility (mentioning, confidence: 99%)
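The statement above notes that duplicates must be removed from a text corpus before mining it. Below is a minimal sketch of one common way to do that: exact-duplicate removal by hashing a normalized form of each document. It is not presented as the method used by Händschke et al. (2018). The function names `normalize` and `deduplicate` and the sample `reports` list are hypothetical illustrations. This approach catches only documents that are identical after normalization. Near-duplicates, such as a report republished with a changed cover page, would need a fuzzy technique like MinHash instead.

```python
import hashlib
import re
import unicodedata


def normalize(text: str) -> str:
    """Hypothetical helper: NFKC-normalize, lowercase and collapse whitespace,
    so trivial formatting differences do not hide duplicates."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(docs):
    """Hypothetical helper: drop exact duplicates (after normalization),
    keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for doc in docs:
        # Hash the normalized text so each document is compared once via a set lookup.
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Illustrative input only; these strings are not taken from the JOCO corpus.
reports = [
    "Annual Report 2017:  Revenue grew.",
    "annual report 2017: revenue grew.",
    "Sustainability Report 2017",
]
print(deduplicate(reports))
```

Here the second string differs from the first only in case and spacing, so it is dropped, and the first occurrence is kept.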