The majority of electronic markets worldwide employ limit order books, and the recently emerging exchanges for cryptocurrencies are no exception. In this work, we analyze whether empirical properties commonly observed on established limit order exchanges transfer to the cryptocurrency domain. Based on the literature, we establish a structured methodological framework to conduct the analyses in a systematic and comprehensive way. We then present results from a unique and extensive limit order data set acquired from major cryptocurrency exchanges for the currency pair Bitcoin to US Dollar. We recover many observations from mature markets, such as a symmetry between the average ask and the average bid side of the order book, autocorrelation in returns on the smallest time scales only, volatility clustering, and the timing of large trades. We also observe some idiosyncrasies: the distributions of trade size and limit order prices deviate from commonly observed patterns. In addition, we find limit order books to be relatively shallow and liquidity costs to be relatively high compared to established markets.
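Two of the properties named above, autocorrelation in returns and volatility clustering, are commonly checked with the sample autocorrelation of returns and of absolute returns. The following is a minimal sketch of that check, not the authors' framework; the function names `log_returns` and `autocorr` and the toy price series are assumptions made for illustration.

```python
import math

def log_returns(prices):
    """Log returns between consecutive prices (hypothetical helper)."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def autocorr(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

# Toy prices, purely illustrative (not data from the paper).
prices = [100.0, 100.5, 100.2, 100.9, 99.0, 101.5, 98.7, 101.0, 100.8, 100.9]
r = log_returns(prices)
abs_r = [abs(v) for v in r]

# Autocorrelation of returns probes linear predictability;
# autocorrelation of absolute returns probes volatility clustering.
for lag in (1, 2, 3):
    print(lag, round(autocorr(r, lag), 3), round(autocorr(abs_r, lag), 3))
```

On real data, one would expect the first column of values to vanish quickly with the lag while the second decays slowly, which is the pattern the abstract reports as recovered from mature markets.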
Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the predictive value of more than nine million tweets for intraday returns of the S&P 500 constituents. For this purpose, we design a machine learning pipeline addressing specific challenges inherent to this task. First, we engineer domain-specific features along three categories, i.e., directional indicators, relevance indicators, and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil sources of profitability: First, results are both driven and limited by the temporal clustering of tweets, i.e., the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of included features follows an economic rationale, e.g., tweets with positive sentiment tend to yield positive returns and vice versa. Third, we find that stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance.
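For readers unfamiliar with the performance figures quoted above, an annualized Sharpe ratio can be computed from a series of periodic strategy returns roughly as follows. This is a sketch under stated assumptions: daily returns, 252 trading days per year, and a zero risk-free rate. The abstract does not state the exact convention used, and the function name `annualized_sharpe` is hypothetical.

```python
import math
from statistics import mean, stdev

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of periodic returns.

    Assumptions (not taken from the paper): zero risk-free rate and
    252 return periods per year.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    sd = stdev(returns)  # sample standard deviation
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return mean(returns) / sd * math.sqrt(periods_per_year)

# Toy daily returns after costs, purely illustrative.
daily = [0.001, -0.0005, 0.002, 0.0, 0.0015, -0.001]
print(round(annualized_sharpe(daily), 2))
```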
In this paper, we demonstrate how a well-established machine learning-based statistical arbitrage strategy can be successfully transferred from equity to futures markets. First, we preprocess futures time series composed of front months to render them suitable for our returns-based trading framework and compile a data set comprising 60 futures and covering nearly 10 trading years. Next, we train several machine learning models to predict whether the h-day-ahead return of each future out- or underperforms the corresponding cross-sectional median return. We then enter long/short positions for the top/flop-k futures for a duration of h days and assess the financial performance of the resulting portfolio in an out-of-sample testing period. We find the machine learning models to yield statistically significant out-of-sample break-even transaction costs of 6.3 bp, a clear challenge to the semi-strong form of market efficiency. Finally, we discuss sources of profitability and the robustness of our findings.
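The labeling and portfolio-formation steps described above can be sketched as follows. This is an illustrative outline, not the authors' implementation: the function names, the equal weighting of positions, and the toy numbers are assumptions, and the model scores are supplied by hand rather than by a trained model.

```python
from statistics import median

def cross_sectional_labels(returns):
    """Label each future 1 if its h-day-ahead return beats the
    cross-sectional median of all futures on that date, else 0."""
    med = median(returns.values())
    return {name: int(r > med) for name, r in returns.items()}

def top_flop_k(scores, k):
    """Pick the k highest-scored futures (long) and the k lowest (short)."""
    if k <= 0 or 2 * k > len(scores):
        raise ValueError("k must be positive and at most half the universe")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]

def long_short_return(realized, longs, shorts):
    """Return of an equal-weighted long/short portfolio (assumed weighting),
    before transaction costs."""
    long_leg = sum(realized[n] for n in longs) / len(longs)
    short_leg = sum(realized[n] for n in shorts) / len(shorts)
    return long_leg - short_leg

# Toy h-day-ahead returns and model scores, purely illustrative.
realized = {"A": 0.02, "B": -0.01, "C": 0.005, "D": -0.03}
scores = {"A": 0.70, "B": 0.40, "C": 0.55, "D": 0.20}

labels = cross_sectional_labels(realized)   # training targets
longs, shorts = top_flop_k(scores, k=1)     # positions held for h days
print(labels, longs, shorts, long_short_return(realized, longs, shorts))
```

In a full study, the labels would serve as training targets on an in-sample period, and the scores would be the model's predicted probabilities of outperforming the median on each out-of-sample date.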