We introduce a technique to filter complex data sets by extracting a subgraph of representative links. The filtering can be tuned to any desired level by controlling the genus of the resulting graph. We show that this technique is especially suitable for correlation-based graphs, giving filtered graphs that preserve the hierarchical organization of the minimum spanning tree but contain a larger amount of information in their internal structure. In particular, in the case of planar filtered graphs (genus equal to 0), triangular loops and four-element cliques are formed. The application of this filtering procedure to 100 stocks in the U.S. equity markets shows that such loops and cliques have important and significant relationships with the market structure and properties.
Keywords: cluster analysis | complex networks | correlation analysis
Several complex systems have been investigated recently from the perspective of the (weighted) networks that link the different elements comprising them (1-4). Indeed, complex systems are in general made of several interacting elements, and it is rather natural to associate a node with each element and a link with each interaction, yielding a graph. Examples include food webs (5), scientific citations (6), social networks (7, 8), communication networks (9), sexual contacts among individuals (10), company links in a stock portfolio (11), the Internet (12), and the World Wide Web (13). The properties of such graphs have been studied with the aim of capturing basic features of the investigated systems (14-16). However, the complexity of the system is generally reflected in the associated graph, which results in an intricate, interwoven, and densely connected structure. There is therefore a general need to find methods that are able to single out the key information by filtering such a complex graph into a simpler relevant subgraph.
Such a filtering is especially essential for correlation-based graphs where, in the absence of any filtering procedure, all links among elements are present. In this work, we introduce a filtering procedure that extracts a representative subgraph with a controlled complexity and maximal information content out of the graph describing the system. To illustrate the method, we present a concrete example dealing with 100 stocks belonging to a U.S. equity portfolio. In the modeling of equity portfolios, a natural starting point is the investigation of cross-correlation among time series of returns of stock pairs. The correlation provides a similarity measure among the behavior of different elements in the system. It was shown by one of us that a powerful method to investigate financial systems consists of extracting a minimal set of relevant interactions associated with the strongest correlations belonging to the minimum spanning tree (MST) (11). However, the reduction to a minimal skeleton of links is necessarily very drastic in filtering correlation-based networks and therefore loses valuable information. The necessity of a less drastic filtering ...
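As a point of reference for the MST mentioned in this excerpt, the following is a minimal sketch of extracting a minimum spanning tree from a correlation matrix with Kruskal's algorithm. The mapping of a correlation to the distance sqrt(2(1 - rho)) and the toy 4x4 matrix are assumptions for illustration; neither is given in the text. The planar filtering described above would replace the no-cycle condition with a constraint on the genus of the graph, which is not implemented here.

```python
import math

def correlation_to_distance(rho):
    """Assumed mapping from a correlation coefficient to a distance."""
    return math.sqrt(2.0 * (1.0 - rho))

def minimum_spanning_tree(corr):
    """Kruskal's algorithm on the complete graph defined by a correlation matrix.

    corr is a symmetric list of lists. Returns the n - 1 tree links
    as (i, j) pairs with i < j, in the order they were accepted.
    """
    n = len(corr)
    # Shortest distances (strongest correlations) are considered first.
    edges = sorted(
        (correlation_to_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # accept the link only if it closes no cycle
            parent[root_i] = root_j
            tree.append((i, j))
    return tree

# Toy correlation matrix (illustrative values, not data from the paper).
corr = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
mst_links = minimum_spanning_tree(corr)
```

Because the distance is a decreasing function of the correlation, any other decreasing mapping would select the same tree.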
Many complex systems present an intrinsic bipartite structure where elements of one set link to elements of the second set. In these complex systems, such as the system of actors and movies, elements of one set are qualitatively different from elements of the other set. The properties of these complex systems are typically investigated by constructing and analyzing a projected network on one of the two sets (for example the actor network or the movie network). Complex systems are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set, and this heterogeneity makes it very difficult to discriminate links of the projected network that merely reflect the system's heterogeneity from links that are relevant to unveiling the properties of the system. Here we introduce an unsupervised method to statistically validate each link of a projected network against a null hypothesis that takes into account system heterogeneity. We apply the method to a biological, an economic, and a social complex system. The method we propose is able to detect network structures that are very informative about the organization and specialization of the investigated systems, and it identifies those relationships between elements of the projected network that cannot be explained simply by system heterogeneity. We also show that our method applies to bipartite systems in which different relationships might have a different qualitative nature, generating statistically validated networks in which such differences are preserved.
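The abstract does not spell out the null hypothesis. One common choice that accounts for degree heterogeneity, assumed here for illustration, is a hypergeometric test on the number of neighbours two elements share, followed by a multiple-test correction. In the sketch below the function name, the Bonferroni correction, and all the numbers are hypothetical.

```python
from math import comb

def hypergeom_pvalue(n_total, n_a, n_b, n_ab):
    """Probability of observing at least n_ab shared neighbours by chance.

    n_total: size of the opposite set (e.g. number of movies)
    n_a, n_b: degrees of the two elements being compared (e.g. two actors)
    n_ab: number of neighbours they actually share
    """
    upper = min(n_a, n_b)
    # Count favourable configurations as integers, then divide once.
    favourable = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(n_ab, upper + 1)
    )
    return favourable / comb(n_total, n_b)

# Hypothetical example: 20 movies, two actors with 5 movies each, 4 in common.
p_value = hypergeom_pvalue(20, 5, 5, 4)

# Assumed Bonferroni correction over all pairs among 10 actors (45 tests).
alpha, n_tests = 0.01, 45
is_validated = p_value < alpha / n_tests
```

Two high-degree elements share many neighbours by chance alone, so a test of this kind sets a stricter bar for them than a fixed co-occurrence threshold would.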
What are the dominant stocks which drive the correlations present among stocks traded in a stock market? Can a correlation analysis provide an answer to this question? In the past, correlation based networks have been proposed as a tool to uncover the underlying backbone of the market. Correlation based networks represent the stocks and their relationships, which are then investigated using different network theory methodologies. Here we introduce a new concept to tackle the above question—the partial correlation network. Partial correlation is a measure of how the correlation between two variables, e.g., stock returns, is affected by a third variable. By using it we define a proxy of stock influence, which is then used to construct partial correlation networks. The empirical part of this study is performed on a specific financial system, namely the set of 300 highly capitalized stocks traded at the New York Stock Exchange, in the time period 2001–2003. By constructing the partial correlation network, unlike the case of standard correlation based networks, we find that stocks belonging to the financial sector and, in particular, to the investment services sub-sector, are the most influential stocks affecting the correlation profile of the system. Using a moving window analysis, we find that the strong influence of the financial stocks is conserved across time for the investigated trading period. Our findings shed a new light on the underlying mechanisms and driving forces controlling the correlation profile observed in a financial market.
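The first-order partial correlation referred to above has a standard closed form in terms of the three pairwise correlations. The sketch below computes it for one triplet; the function name and the numerical values are illustrative only, and the influence proxy that the authors build on top of it is not reproduced here.

```python
import math

def partial_correlation(rho_ij, rho_ik, rho_jk):
    """Correlation between i and j after removing the linear effect of k."""
    numerator = rho_ij - rho_ik * rho_jk
    denominator = math.sqrt((1.0 - rho_ik ** 2) * (1.0 - rho_jk ** 2))
    return numerator / denominator

# Illustrative values: i and j look correlated, but mostly through k.
rho_ij, rho_ik, rho_jk = 0.5, 0.6, 0.8
rho_ij_given_k = partial_correlation(rho_ij, rho_ik, rho_jk)
```

When the partial correlation is much smaller than the plain correlation, as in this example, most of the co-movement of i and j is accounted for by k.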
We discuss some methods to quantitatively investigate the properties of correlation matrices. Correlation matrices play an important role in portfolio optimization and in several other quantitative descriptions of asset price dynamics in financial markets. Here, we discuss how to define and obtain hierarchical trees, correlation based trees and networks from a correlation matrix. The hierarchical clustering and other procedures performed on the correlation matrix to detect statistically reliable aspects of it are seen as filtering procedures of the correlation matrix. We also discuss a method to associate a hierarchically nested factor model to a hierarchical tree obtained from a correlation matrix. The information retained in filtering procedures and its stability with respect to statistical fluctuations is quantified by using the Kullback–Leibler distance.
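For reference, if the two matrices being compared are treated as covariance matrices of zero-mean multivariate Gaussian variables of dimension n (an assumption made here, since the abstract does not state the underlying model), the Kullback–Leibler distance has a closed form. The symbols \Sigma_1, \Sigma_2 and n are notation introduced for this sketch.

```latex
K(\Sigma_1, \Sigma_2)
  = \frac{1}{2}\left[
      \ln\frac{\det \Sigma_2}{\det \Sigma_1}
      + \operatorname{tr}\!\left(\Sigma_2^{-1}\Sigma_1\right)
      - n
    \right]
```

The expression vanishes when the two matrices coincide and is not symmetric in its arguments.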
We investigate the planar maximally filtered graphs of the portfolio of the 300 most capitalized stocks traded at the New York Stock Exchange during the time period 2001-2003. Topological properties such as the average length of shortest paths, the betweenness and the degree are investigated on different planar maximally filtered graphs generated by sampling the returns at different time horizons ranging from 5 min up to one trading day. This investigation confirms that the selected stocks compose a hierarchical system progressively structuring as the sampling time horizon increases. Finally, a cluster formation, associated to economic sectors, is quantitatively investigated.
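Two of the topological properties named above, the degree and the average length of shortest paths, can be computed on any unweighted graph with a breadth-first search. The following is a minimal sketch on a hypothetical four-node graph, not on the filtered graphs of the study; betweenness is left out for brevity.

```python
from collections import deque

def degrees(adjacency):
    """Number of links attached to each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}

def average_shortest_path_length(adjacency):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        # Breadth-first search gives hop distances from the source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical graph: a triangle 0-1-2 with node 3 attached to node 2.
adjacency = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2],
}
node_degrees = degrees(adjacency)
mean_path = average_shortest_path_length(adjacency)
```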
We introduce a new technique to associate a spanning tree to the average linkage cluster analysis. We term this tree the Average Linkage Minimum Spanning Tree. We also introduce a technique to associate a value of reliability to links of correlation based graphs by using bootstrap replicas of data. Both techniques are applied to the portfolio of the 300 most capitalized stocks traded at the New York Stock Exchange during the time period 2001-2003. We show that the Average Linkage Minimum Spanning Tree recognizes economic sectors and sub-sectors as communities in the network slightly better than the Minimum Spanning Tree does. We also show that the average reliability of links in the Minimum Spanning Tree is slightly greater than the average reliability of links in the Average Linkage Minimum Spanning Tree.
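The bootstrap idea can be sketched as follows: resample the time index with replacement, rebuild the graph on each replica, and record how often each link of the original graph reappears. The code below applies this to a plain minimum spanning tree on synthetic data. The function names, the number of replicas, and the data generator are all assumptions for illustration, and the Average Linkage Minimum Spanning Tree itself is not implemented.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if degenerate)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mst_links(series):
    """Links of the correlation-based MST, strongest correlations first."""
    n = len(series)
    edges = sorted(
        (-pearson(series[i], series[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    links = set()
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            links.add((i, j))
    return links

def bootstrap_link_reliability(series, n_replicas=100, seed=0):
    """Fraction of bootstrap replicas in which each original MST link appears."""
    rng = random.Random(seed)
    length = len(series[0])
    reference = mst_links(series)
    counts = {link: 0 for link in reference}
    for _ in range(n_replicas):
        # Resample time indices with replacement, jointly for all series.
        idx = [rng.randrange(length) for _ in range(length)]
        replica = [[s[k] for k in idx] for s in series]
        replica_links = mst_links(replica)
        for link in reference:
            if link in replica_links:
                counts[link] += 1
    return {link: c / n_replicas for link, c in counts.items()}

# Synthetic data: series 0 and 1 share one factor, series 2 and 3 another.
gen = random.Random(1)
f1 = [gen.gauss(0, 1) for _ in range(200)]
f2 = [gen.gauss(0, 1) for _ in range(200)]
series = [
    [a + 0.5 * gen.gauss(0, 1) for a in f1],
    [a + 0.5 * gen.gauss(0, 1) for a in f1],
    [b + 0.5 * gen.gauss(0, 1) for b in f2],
    [b + 0.5 * gen.gauss(0, 1) for b in f2],
]
reliability = bootstrap_link_reliability(series)
```

In this toy setup the two within-factor links should be reproduced in nearly every replica, while the single link bridging the two groups is expected to be less stable.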
We investigate the daily correlation present among market indices of stock exchanges located all over the world in the time period Jan 1996 - Jul 2009. We find that the correlation among market indices presents both a fast and a slow dynamics. The slow dynamics reflects the development and consolidation of globalization. The fast dynamics is associated with critical events that originate in a specific country or region of the world and rapidly affect the global system. We provide evidence that the short term timescale of correlation among market indices is less than 3 trading months (about 60 trading days). The average values of the non-diagonal elements of the correlation matrix, correlation based graphs, and the spectral properties of the largest eigenvalues and eigenvectors of the correlation matrix carry information about the fast and slow dynamics of correlation of market indices. We introduce a measure of mutual information based on link co-occurrence in networks, in order to detect the fast dynamics of successive changes of correlation based graphs in a quantitative way.
We show how to achieve a statistical description of the hierarchical structure of a multivariate data set. Specifically we show that the similarity matrix resulting from a hierarchical clustering procedure is the correlation matrix of a factor model, the hierarchically nested factor model. In this model, factors are mutually independent and hierarchically organized. Finally, we use a bootstrap based procedure to reduce the number of factors in the model with the aim of retaining only those factors significantly robust with respect to the statistical uncertainty due to the finite length of data records.
PACS numbers: 02.50.Sk, 89.65.Gh
Many complex systems observed in the physical, biological and social sciences are organized in a nested hierarchical structure, i.e. the elements of the system can be partitioned in clusters which in turn can be partitioned in subclusters and so on up to a certain level [1,2]. Several examples of hierarchically organized physical [3,4], biological [5,6,7] and social [8,9,10,11] systems have been investigated in the literature. The hierarchical structure of interactions among elements strongly affects the dynamics of complex systems. Therefore, a quantitative description of hierarchical properties of the system is a key step in the modeling of complex systems. In this letter, we address the problem of inferring a factor model from a multivariate data set. A factor model is a mathematical model which attempts to explain the correlation between a large set of variables in terms of a small number of underlying factors. A major assumption of factor analysis is that it is not possible to observe these factors directly; the variables depend upon the factors but are also subject to random errors [12]. We show that the factor model we introduce fully describes the hierarchical structure of interactions among elements of the complex system. Such a structure is elicited by hierarchical clustering of multivariate data.
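To make the idea of a similarity matrix produced by hierarchical clustering concrete, the sketch below computes it for single linkage, which is an assumption made here because the excerpt does not fix the linkage rule. Under single linkage, the filtered similarity of two elements is the level at which they first join the same cluster, which equals the best "weakest link" over all paths connecting them. The function name and the toy matrix are illustrative.

```python
def single_linkage_filtered_matrix(corr):
    """Filtered similarity matrix associated with single linkage clustering.

    Entry (i, j) is the largest value, over all paths from i to j, of the
    smallest correlation met along the path.
    """
    n = len(corr)
    filtered = [row[:] for row in corr]
    # Floyd-Warshall style update for bottleneck (maximin) paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = min(filtered[i][k], filtered[k][j])
                if via_k > filtered[i][j]:
                    filtered[i][j] = via_k
    return filtered

# Toy correlation matrix (illustrative values only).
corr = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.1],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
filtered = single_linkage_filtered_matrix(corr)
```

In this example elements 0 and 1 merge at 0.8, elements 2 and 3 merge at 0.7, and every pair taken across the two groups receives the same value, 0.4, at which the groups join.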
The analysis of multivariate data provides crucial information in the investigation of a wide variety of systems. Multivariate analysis methods are designed to extract information both on the number of main factors characterizing the dynamics of the investigated system and on the composition of the groups (clusters) in which the system is intrinsically organized. Recently, physicists started to contribute to the development of new multivariate techniques (e.g. [11,13,14,15,16,17,18]). Among multivariate techniques, natural candidates for detecting the hierarchical structure of a set of data are hierarchical clustering methods [19]. These methods associate a dendrogram with a correlation matrix (or more generally with a similarity matrix), i.e. they give a schematic description of hierarchies. It is worth pointing out that the whole information contained in the dendrogram can be stored in a filtered similarity matrix $C^{<}$ [19]. The matrix $C^{<}$ has well defined metric properties. When the matrix $C^{<}$ of elements $\rho^{<}_{ij}$ is obtained by starting from a correlation matrix, the...