2016
DOI: 10.1145/2948072
Discovering Finance Keywords via Continuous-Space Language Models

Abstract: The growing amount of public financial data makes it increasingly important to learn how to discover valuable information for financial decision making. This article proposes an approach to discovering financial keywords from a large number of financial reports. In particular, we apply the continuous bag-of-words (CBOW) model, a well-known continuous-space language model, to the textual information in 10-K financial reports to discover new finance keywords. In order to capture word meanings to better locate fi…
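The abstract's core idea — discovering new finance keywords as nearest neighbors of seed words in a CBOW embedding space — can be illustrated with a minimal sketch. The 4-dimensional vectors below are invented for illustration; in the article they would come from a CBOW model trained on 10-K filings.

```python
from math import sqrt

# Toy embeddings standing in for CBOW vectors trained on 10-K filings.
# The 4-dimensional values are invented for illustration only.
embeddings = {
    "loss":       [0.9, 0.1, 0.0, 0.2],
    "impairment": [0.8, 0.2, 0.1, 0.3],
    "writedown":  [0.7, 0.3, 0.1, 0.2],
    "revenue":    [0.1, 0.9, 0.6, 0.1],
    "dividend":   [0.0, 0.8, 0.7, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def expand(seed, k=2):
    """Return the k words whose vectors are closest to the seed word."""
    scores = [(w, cosine(embeddings[seed], vec))
              for w, vec in embeddings.items() if w != seed]
    scores.sort(key=lambda wv: wv[1], reverse=True)
    return [w for w, _ in scores[:k]]

print(expand("loss"))  # → ['impairment', 'writedown']
```

Candidate keywords are ranked purely by embedding proximity here; as the citation statements below note, this proximity is semantic, not sentimental.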

Cited by 25 publications (24 citation statements)
References 22 publications
“…However, concern can be replaced with pertain only if it does not have any sentiment polarity. It can be seen that expanding the lexicon using word embeddings, like previous works did (Tsai and Wang, 2014; Tsai et al., 2016; Rekabsaz et al., 2017), can be problematic and may end up with a lexicon expansion containing semantically close but sentimentally far words. Another interesting word in the list is DMAA.…”
Section: Discussion
confidence: 96%
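The pitfall this statement describes — embedding neighbors that are semantically close but sentimentally far — suggests filtering expansion candidates by polarity. A minimal sketch, with a hypothetical polarity lexicon whose values are invented:

```python
# Hypothetical polarity lexicon; values invented for illustration.
polarity = {"concern": -1, "pertain": 0, "worry": -1, "growth": +1}

def filtered_expansion(seed, neighbors):
    """Keep only embedding neighbors whose polarity matches the seed's.

    Unfiltered embedding expansion would admit 'pertain' as a neighbor
    of 'concern' even though it carries no negative sentiment.
    """
    seed_pol = polarity.get(seed, 0)
    return [w for w in neighbors if polarity.get(w, 0) == seed_pol]

print(filtered_expansion("concern", ["pertain", "worry", "growth"]))
# → ['worry']  (same negative polarity as 'concern')
```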
“…Stemming decreases the vocabulary size of the word embeddings and thus reduces the parameters of the model. Stemming is also required in order to use the word vectors trained by Tsai et al. (2016), since the corpus used to train those word embeddings consists of stemmed reports.…”
Section: Preprocessing
confidence: 99%
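The requirement this statement describes — stemming query tokens so they match embeddings keyed by stems — can be sketched as follows. The suffix-stripping stemmer and the stemmed-key vector table are both invented stand-ins (a real pipeline would use, e.g., the Porter stemmer and the pretrained vectors themselves):

```python
# Minimal suffix-stripping stemmer, a toy stand-in for a real stemmer
# such as Porter's.
def stem(word):
    for suffix in ("ments", "ment", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Pretrained vectors keyed by stems, mirroring embeddings trained on a
# corpus of stemmed reports; the values are invented for illustration.
stemmed_vectors = {"impair": [0.8, 0.2], "borrow": [0.1, 0.7]}

def lookup(word):
    """Stem the raw token first; an unstemmed key would miss the table."""
    return stemmed_vectors.get(stem(word))

print(lookup("impairments"))  # → [0.8, 0.2], found only via the stem
print(lookup("dividend"))     # → None, stem not in the table
```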
“…They found that doing so improves the performance of both a Ranking Support Vector Machine (SVMrank) and a Support Vector Regression (SVR) model with bag-of-words vectors as features and stock return volatility in the year after the filing date as the label. Following up, Tsai et al. [33] show that such an expanded dictionary can effectively be used to predict not only return volatility but also post-event volatility (estimated with the Fama-French 3-factor model [7]) in the following year. However, the authors acknowledge that the regression on post-event volatility is sensitive to the number of added candidates k [33, cf.…”
Section: Natural Language Processing Literature
confidence: 99%
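The setup this statement describes — text features from filings predicting next-year return volatility — can be reduced to a toy sketch. Here a single feature (the count of expanded-lexicon words per filing) is fit with closed-form ordinary least squares standing in for the SVR of the cited work; all filings, counts, and volatility values are invented:

```python
# Toy data: (filing text, next-year return volatility); values invented.
filings = [
    ("... loss impairment writedown ...", 0.42),
    ("... loss ...",                      0.25),
    ("... revenue growth dividend ...",   0.10),
    ("... revenue dividend loss ...",     0.20),
]
lexicon = {"loss", "impairment", "writedown"}  # expanded dictionary

def lexicon_count(text):
    """Feature: number of expanded-lexicon tokens in the filing."""
    return sum(1 for tok in text.split() if tok in lexicon)

xs = [lexicon_count(text) for text, _ in filings]
ys = [vol for _, vol in filings]

# Closed-form univariate OLS: slope = cov(x, y) / var(x).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
print(round(slope, 3), round(intercept, 3))
```

A positive slope on this toy data reflects the intuition: filings with more risk-lexicon words are associated with higher subsequent volatility.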