2020
DOI: 10.2139/ssrn.3720213

Textual Information and IPO Underpricing: A Machine Learning Approach

Abstract: This study examines the predictive power of textual information from S-1 filings in explaining IPO underpricing. Our empirical approach differs from previous research, as we utilize several machine learning algorithms to predict whether an IPO will be underpriced or not. We analyze a large sample of 2,481 U.S. IPOs from 1997 to 2016, and we find that textual information can effectively complement traditional financial variables in terms of prediction accuracy. In fact, models that use both textual data and fi…

Cited by 6 publications (8 citation statements)
References 97 publications (173 reference statements)
“…This means that the TF-IDF weighting scheme produces a set of weights for our textual features that enhance the ability of our models to classify bidders from non-involved banks. This result is consistent with previous findings, as the TF-IDF approach tends to perform better in many NLP tasks compared to simple proportional weighting (Loughran and McDonald, 2011; Loughran and McDonald, 2016; Katsafados et al., 2020). Second, the use of our finance word embeddings increases the performance of both the TF and the TF-IDF centroid embedding model, compared to using generic word embeddings.…”
Section: Combination Of Financial Variables With Bag Of Words Textual Features
Citation type: supporting
confidence: 91%
“…We plan two types of tree-based machine learning models, namely, Random Forest Regressor (Baba and Sevil, 2020; Katsafados et al., 2020; Quintana et al., 2017) and Gradient Boosting Regressor (Muditomo and Broto, 2021). Both methods are very popular prediction algorithms.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…For example, Goel and Gangolly (2012), Purda and Skillicorn (2015), and Goel and Uzuner (2016) perform textual analysis on Item 7 to detect corporate fraud. Katsafados et al. (2020) combine Item 7 and Item 4 We use Beautiful Soup (https://beautiful-soup-4.readthedocs.io/en/latest).…”
Section: Data and Toolkit
Citation type: mentioning
confidence: 99%