Proceedings of the Third Workshop on Economics and Natural Language Processing 2021
DOI: 10.18653/v1/2021.econlp-1.2

EDGAR-CORPUS: Billions of Tokens Make The World Go Round

Abstract: We release EDGAR-CORPUS, a novel corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years. To the best of our knowledge, EDGAR-CORPUS is the largest financial NLP corpus available to date. All the reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format. We use EDGAR-CORPUS to train and release EDGAR-W2V, which are word2vec embeddings for the financial domain. We employ these embedd…

Cited by 16 publications (22 citation statements)
References: 9 publications

“…In this section, we shortly review the key points of KPI-BERT, a model tailored for key performance indicator extraction, introduced and illustrated in much greater detail in [2]. Thereafter, one span-level approach is briefly touched upon, namely the SpERT model introduced by [16]. We then provide two further baselines building on EDGAR-W2V [5] and GloVe [43], leveraging a similar setup like [2]. These four models will be the baselines we provide to other researchers to benchmark their model against on KPI-EDGAR.…”
Section: Methods (mentioning)
confidence: 99%
“…Due to EDGAR's popularity, many researchers ([3], [4], [6]) have developed methods to extract data from EDGAR and used it in their own research. [5] even released a comprehensive corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years and accompanied it with a word2vec [37] model titled EDGAR-W2V, which we will also use as a baseline in our experiments. Furthermore, [38], [39], and [40] have applied machine learning methods to EDGAR data and have provided useful results that can be used in real-world financial applications.…”
Section: Related Work (mentioning)
confidence: 99%