Keywords: Finance; Social media data; Factorization Machine; Overnight information; Statistical arbitrage; High-frequency trading.
JEL Classification: C2, C5, C6, G1, G12.

This paper develops a statistical arbitrage strategy based on overnight social media data and applies it to high-frequency data of the S&P 500 constituents from January 2014 to December 2015. The trading framework predicts future price changes using Factorization Machines, a state-of-the-art algorithm for high-dimensional data in very sparse settings. Specifically, we implement and analyze the effectiveness of support vector machines (SVM), second-order Factorization Machines (SFM), third-order Factorization Machines (TFM), and adaptive-order Factorization Machines (AFM). In the back-testing study, we demonstrate the efficiency of Factorization Machines in general and show that increasing the complexity of Factorization Machines yields higher profitability: annualized returns after transaction costs range from 5.96 percent for SVM to 13.52 percent for AFM, compared to 5.63 percent for a naive buy-and-hold strategy of the S&P 500 index. The corresponding Sharpe ratios range from 1.00 for SVM to 2.15 for AFM. Varying profitability during the opening minutes can be explained by the effects of market efficiency and trading turmoil. Additionally, the AFM approach achieves the highest accuracy rate and generates statistically and economically remarkable returns after transaction costs without loading on any systematic risk exposure.

Contribution/Originality: This study contributes to the existing literature by predicting financial markets based on overnight social media data. For this purpose, we observe tweets about the S&P 500 companies during the time span in which stock markets are closed and forecast future price changes based on the collected information.

stock market by applying support vector machines. Jin et al. (2013) made forecasts by deploying a linear regression model based on news articles, historical stock indices, and currency exchange values. Chatrath et al.
(2014) examined the impact of macro news on currency jumps using a stepwise multivariate regression in a Probit model. None of these studies considers the effect of overnight textual data on future price changes, an obvious deficit since information in social media, news, blogs, forums, and announcements is published 24 hours a day, 7 days a week.
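
To make the second-order Factorization Machine (SFM) named in the abstract more concrete, the following is a minimal sketch that assumes the standard formulation: a global bias, a linear term, and pairwise interactions whose weights are inner products of latent factor vectors. The names `fm_predict`, `fm_predict_naive`, `w0`, `w`, and `V` are illustrative and are not taken from this paper, and the toy feature vector stands in for the sparse tweet-based features. The sketch shows only the prediction step, not the training procedure or the higher-order and adaptive-order variants.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order FM prediction for one feature vector.

    x  : (n,) feature vector, typically sparse
    w0 : scalar global bias
    w  : (n,) linear weights
    V  : (n, k) latent factors; row i is the factor vector of feature i

    Computes w0 + <w, x> + sum_{i<j} <V[i], V[j]> * x[i] * x[j]
    with the usual O(n * k) reformulation of the pairwise sum.
    """
    linear = w0 + w @ x
    xv = V.T @ x                    # (k,) sum_i V[i, f] * x[i]
    x2v2 = (V ** 2).T @ (x ** 2)    # (k,) sum_i V[i, f]^2 * x[i]^2
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference version with the explicit double loop, O(n^2 * k)."""
    n = x.shape[0]
    total = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


# Toy example: three features, two latent dimensions (hypothetical values).
x = np.array([1.0, 0.0, 2.0])
w0 = 0.5
w = np.array([1.0, 2.0, 3.0])
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

y_fast = fm_predict(x, w0, w, V)
y_slow = fm_predict_naive(x, w0, w, V)
```

Because each interaction weight is an inner product of two low-dimensional vectors rather than a free parameter, interactions between features that rarely co-occur can still be estimated. This is the property that makes the model suitable for very sparse inputs.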