The increasing availability of data demands techniques to filter information in large, complex networks of interactions. A number of approaches have been proposed to extract network backbones by assessing the statistical significance of links against null hypotheses of random interaction. Yet, it is well known that the growth of most real-world networks is non-random, as past interactions between nodes typically increase the likelihood of further interaction. Here, we propose a filtering methodology inspired by the Pólya urn, a combinatorial model driven by a self-reinforcement mechanism, which relies on a family of null hypotheses that can be calibrated to assess which links are statistically significant with respect to a given network’s own heterogeneity. We provide a full characterization of the filter, and show that it selects links based on a non-trivial interplay between their local importance and the importance of the nodes they belong to.
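As a concrete illustration, here is a minimal Python sketch of such an urn-based filter, assuming the weight of a link attached to a node of degree k and integer strength s follows, under the null, a beta-binomial distribution with shape parameters 1/a and (k-1)/a, where a is the urn's reinforcement parameter. The parameterization and the two-endpoint significance rule below are illustrative assumptions, not the paper's exact prescription.

```python
import numpy as np
from scipy.stats import betabinom

def polya_pvalue(w, k, s, a):
    """P-value of a link of integer weight w attached to a node of degree k
    and integer strength s, under a Polya urn null with reinforcement a.
    Assumed null: w ~ BetaBinomial(n=s, alpha=1/a, beta=(k-1)/a)."""
    if k <= 1:
        return 1.0  # a degree-1 node carries no filtering information
    return betabinom.sf(w - 1, s, 1.0 / a, (k - 1) / a)

def keep_link(w, k_i, s_i, k_j, s_j, a=1.0, alpha=0.01):
    """Illustrative rule: keep a link if it is significant at level alpha
    from the viewpoint of at least one of its two endpoints."""
    return min(polya_pvalue(w, k_i, s_i, a),
               polya_pvalue(w, k_j, s_j, a)) < alpha
```

Note how the null adapts to each node: the same weight w can be significant for a low-strength node yet entirely typical for a hub, which is the local/global interplay the filter exploits.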
We introduce the first end-to-end Deep Reinforcement Learning (DRL)-based framework for active high-frequency trading. We train DRL agents to trade one unit of Intel Corporation stock using the Proximal Policy Optimization (PPO) algorithm. The training is performed on three contiguous months of high-frequency Limit Order Book (LOB) data, of which the last month constitutes the validation set. To maximise the signal-to-noise ratio in the training data, we compose the latter by selecting only the training samples with the largest price changes. Testing is then carried out on the following month of data. Hyperparameters are tuned using Sequential Model-Based Optimization. We consider three different state characterizations, which differ in their LOB-based meta-features. Analysing the agents' performance on test data, we argue that the agents are able to create a dynamic representation of the underlying environment. They identify occasional regularities present in the data and exploit them to create long-term profitable trading strategies. Indeed, the agents learn trading strategies that produce stable positive returns in spite of the highly stochastic and non-stationary environment.
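A minimal sketch of this kind of training setup, using stable-baselines3's PPO on a toy Gymnasium environment. The `LOBTradingEnv` class, its mark-to-market reward, and the synthetic `features`/`mid_prices` arrays are illustrative stand-ins for the paper's LOB-based state characterizations, not its actual pipeline.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class LOBTradingEnv(gym.Env):
    """Toy single-asset environment: the agent holds -1, 0 or +1 units and
    is rewarded with the mark-to-market PnL of its position each step."""
    def __init__(self, features, mid_prices):
        super().__init__()
        self.features, self.mid = features, mid_prices
        self.action_space = gym.spaces.Discrete(3)  # sell / hold / buy
        self.observation_space = gym.spaces.Box(
            -np.inf, np.inf, shape=(features.shape[1],), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.pos = 0, 0
        return self.features[0].astype(np.float32), {}

    def step(self, action):
        self.pos = action - 1  # map {0, 1, 2} -> {-1, 0, +1}
        self.t += 1
        reward = self.pos * (self.mid[self.t] - self.mid[self.t - 1])
        done = self.t >= len(self.mid) - 1
        return self.features[self.t].astype(np.float32), reward, done, False, {}

# Synthetic placeholder data (stand-ins for real LOB-derived features):
T, d = 10_000, 8
rng = np.random.default_rng(0)
features = rng.normal(size=(T, d))
mid_prices = 50 + np.cumsum(rng.normal(scale=0.01, size=T))

model = PPO("MlpPolicy", LOBTradingEnv(features, mid_prices), verbose=0)
model.learn(total_timesteps=200_000)
```

With random features the agent can learn nothing, of course; the point is only to show where the LOB-based meta-features and the PPO training loop plug together.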
By synchronising a database of stock-specific news with five years' worth of order book data on 300 stocks, we show that abnormal price movements following news releases (exogenous) exhibit markedly different dynamical features from those arising spontaneously (endogenous). On average, large volatility fluctuations induced by exogenous events occur abruptly and are followed by a decaying power-law relaxation, while endogenous price jumps are characterized by a progressively accelerating growth of volatility, also followed by a power-law relaxation, but a slower one than for exogenous jumps. Remarkably, our results are reminiscent of what is observed in different contexts, namely Amazon book sales and YouTube views. Finally, we show that fitting power laws to individual volatility profiles allows one to classify large events into endogenous and exogenous dynamical classes, without relying on the news feed.
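A hedged sketch of how one might fit a power-law relaxation of the form σ(t) ~ A (t + δ)^(−β) to a post-jump volatility profile with scipy. The functional form, parameter names, and initial guesses are assumptions for illustration, not the paper's exact fitting procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def relaxation(t, A, beta, delta):
    """Regularized power-law decay: A * (t + delta)^(-beta), with delta > 0
    avoiding the singularity at the jump time itself."""
    return A * (t + delta) ** (-beta)

def fit_relaxation(vol_after_jump):
    """vol_after_jump: 1-D array of volatility sampled at t = 1, 2, ...
    after the jump. Returns the fitted relaxation exponent beta."""
    t = np.arange(1, len(vol_after_jump) + 1, dtype=float)
    popt, _ = curve_fit(relaxation, t, vol_after_jump,
                        p0=(vol_after_jump[0], 0.5, 1.0), maxfev=10_000)
    return popt[1]
```

In the spirit of the abstract, a faster fitted relaxation (larger β) would point toward an exogenous, news-driven event, while a slower one would suggest endogenous dynamics; the actual classification thresholds are the paper's, not shown here.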
Natural and social multivariate systems are commonly studied through sets of simultaneous, time-spaced measurements of the observables that drive their dynamics, i.e., through sets of time series. Typically, this is done via hypothesis testing: the statistical properties of the empirical time series are tested against those expected under a suitable null hypothesis. This is a very challenging task in complex interacting systems, where statistical stability is often poor due to lack of stationarity and ergodicity. Here, we describe an unsupervised, data-driven framework to perform hypothesis testing in such situations. It consists of a statistical-mechanical approach, analogous to the configuration model for networked systems, for ensembles of time series designed to preserve, on average, some of the statistical properties observed in an empirical set of time series. We showcase its possible applications with a case study on financial portfolio selection.

Hypothesis testing lies at the very core of the scientific method. In its general formulation, it hinges upon contrasting the observed statistical properties of a system with those expected under a null hypothesis. In particular, hypothesis testing allows one to discard potential models of a system when empirical measurements are made that would be exceedingly unlikely under them. However, there is often no theory to guide the investigation of a system's dynamics. What is worse, in many practical situations one may be given a single (and possibly unreproducible) set of experimental data. This is indeed the case when dealing with most complex systems, whose collective dynamics are often markedly non-stationary, ranging from climate [1,2] to brain activity [3] and financial markets [4-6]. This, in turn, makes hypothesis testing in complex systems a very challenging task, one that potentially prevents us from assessing which properties observed in a given data sample are "untypical", i.e., unlikely to be observed again in a sample collected at a different point in time.

This issue is usually tackled by constructing ensembles of artificial time series sharing some characteristics with those generated by the dynamics of the system under study. This can be done either via modelling or in a purely data-driven way. In the latter case, the technique most frequently used by both researchers and practitioners is bootstrapping [7,8], which amounts to generating partially randomised versions of the available data via resampling, which can then be used as a null benchmark to perform hypothesis testing. Depending on how it is implemented, bootstrapping can account for autocorrelations and cross-correlations in time series sampled from multivariate systems. However, it relies on assumptions, such as sample independence and some form of stationarity [9], which limit its power when dealing with complex systems. As far as model-driven approaches are concerned, the literature is extremely vast [10]. Broadly speaking, modelling approaches rely on a priori structural assumptions for the system's dynamics, and on i...
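To make the ensemble-based testing logic concrete, here is a generic sketch that uses a circular block bootstrap as the surrogate generator. This is a stand-in for illustration only (the paper's statistical-mechanical ensemble is not reproduced here), and all function names are hypothetical.

```python
import numpy as np

def block_bootstrap_column(x, block_len, rng):
    """Circular block bootstrap of a single series: blocks of consecutive
    observations are resampled, roughly preserving short-range autocorrelation."""
    T = len(x)
    starts = rng.integers(0, T, size=-(-T // block_len))  # ceil(T / block_len)
    idx = np.concatenate([(s + np.arange(block_len)) % T for s in starts])[:T]
    return x[idx]

def surrogate(X, block_len, rng):
    """Resample each column independently: a null ensemble that keeps each
    series' temporal structure but destroys genuine cross-dependence."""
    return np.column_stack([block_bootstrap_column(X[:, j], block_len, rng)
                            for j in range(X.shape[1])])

def bootstrap_pvalue(X, statistic, block_len=20, n_surrogates=1000, seed=0):
    """Fraction of surrogate statistics at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    obs = statistic(X)
    null = np.array([statistic(surrogate(X, block_len, rng))
                     for _ in range(n_surrogates)])
    return (1 + np.sum(null >= obs)) / (1 + n_surrogates)

# Example statistic: the largest eigenvalue of the empirical correlation
# matrix, a common probe of collective modes in multivariate systems.
largest_eig = lambda X: np.linalg.eigvalsh(np.corrcoef(X.T))[-1]
```

The limitations discussed above apply directly to this sketch: the block bootstrap implicitly assumes some stationarity, which is precisely what the ensemble framework described in the abstract aims to relax.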