Abstract: A nonlinear dynamics approach can be used to quantify complexity in written texts. As a first step, a one-dimensional system is examined: two written texts by one author (Lewis Carroll), together with one translation into an artificial language (Esperanto), are mapped into time series. Their corresponding shuffled versions are used for obtaining a "base line". Two different one-dimensional time series are used here: (i) one based on word lengths (LTS), (ii) the other on word freque…
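The mapping of a text into a word-length time series (LTS) and its shuffled "base line" can be sketched as follows. This is a minimal illustration and not the authors' code: the function names `word_length_series` and `shuffled_baseline`, the letters-only tokenization rule, and the choice to shuffle at the word level are all assumptions made here.

```python
import random
import re


def word_length_series(text):
    """Map a text to a series of word lengths, one value per word.

    Assumption: a word is a maximal run of Unicode letters, so digits and
    punctuation act as separators (and "don't" counts as two words).
    """
    words = re.findall(r"[^\W\d_]+", text)
    return [len(word) for word in words]


def shuffled_baseline(series, seed=0):
    """Return a randomly permuted copy of the series.

    Shuffling keeps the distribution of values but destroys their order,
    which is what makes it usable as a reference for ordering effects.
    """
    rng = random.Random(seed)
    shuffled = list(series)
    rng.shuffle(shuffled)
    return shuffled


lts = word_length_series("Alice was beginning to get very tired")
baseline = shuffled_baseline(lts)
```

Any measure computed on `lts` can then be compared with the same measure on `baseline` to see how much of it depends on word order rather than on the length distribution alone.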
“…Pearson correlation coefficient |r| between F (X) (j=0) and the most informative measurements found for W = 1300 (see table 3). Because all correlations assume low values, the information conveyed by F (X) (2) differs from the simple average X = F (X) (0) .…”
Section: Stylistic Variation Among Books (mentioning)
“…To construct the tree, the C4.5 algorithm was employed in subtexts comprising W = 1300 tokens. Note that the second component of the average shortest path length and vocabulary size (F ( l ) (2) and F ( M ) (2) ) are relevant as they appear at the top of the tree. Table 4.…”
Section: Stylistic Variation Among Books (mentioning)
“…The application of concepts from physics in textual analysis has increasingly become widespread [1][2][3][4][5][6][7]. The use of entropy concepts is perhaps one of the most known examples of adapting methods from physics in language-based models [8].…”
Abstract. Statistical methods have been widely employed in many practical natural language processing applications. More specifically, complex network concepts and methods from dynamical systems theory have been successfully applied to recognize stylistic patterns in written texts. Despite the large number of studies devoted to representing texts with physical models, only a few studies have assessed the relevance of attributes derived from the analysis of stylistic fluctuations. Because fluctuations represent a pivotal factor for characterizing a myriad of real systems, this study focused on the analysis of the properties of stylistic fluctuations in texts via topological analysis of complex networks and intermittency measurements. The results showed that different authors display distinct fluctuation patterns. In particular, it was found that it is possible to identify the authorship of books using the intermittency of specific words. Taken together, the results described here suggest that the patterns found in stylistic fluctuations could be used to analyze other related complex systems. Furthermore, the discovery of novel patterns related to textual stylistic fluctuations indicates that these patterns could be useful to improve the state of the art of many stylistic-based natural language processing tasks.
“…Among the reasons behind the popularity of DFA was its ability of detecting fractal character of signals, which was subsequently extended to the multifractal case (the MFDFA method [17]), which also proved very useful if applied to empirical data [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36], especially owing to its superior reliability if compared to other methods [37]. DCCA was also generalized in order to be applicable to signals with multifractal cross-correlations and the resulting MFDCCA/MFDXA algorithm [38] also attracted some attention [39][40][41][42].…”
The detrended cross-correlation coefficient ρDCCA has recently been proposed to quantify the strength of cross-correlations on different temporal scales in bivariate, non-stationary time series. It is based on the detrended cross-correlation and detrended fluctuation analyses (DCCA and DFA, respectively) and can be viewed as an analogue of the Pearson coefficient in the case of the fluctuation analysis. The coefficient ρDCCA works well in many practical situations but by construction its applicability is limited to detection of whether two signals are generally cross-correlated, without possibility to obtain information on the amplitude of fluctuations that are responsible for those cross-correlations. In order to introduce some related flexibility, here we propose an extension of ρDCCA that exploits the multifractal versions of DFA and DCCA: MFDFA and MFCCA, respectively. The resulting new coefficient ρq not only is able to quantify the strength of correlations, but also it allows one to identify the range of detrended fluctuation amplitudes that are correlated in two signals under study. We show how the coefficient ρq works in practical situations by applying it to stochastic time series representing processes with long memory: autoregressive and multiplicative ones. Such processes are often used to model signals recorded from complex systems and complex physical phenomena like turbulence, so we are convinced that this new measure can successfully be applied in time series analysis. In particular, we present an example of such application to highly complex empirical data from financial markets. The present formulation can straightforwardly be extended to multivariate data in terms of the q-dependent counterpart of the correlation matrices and then to the network representation.
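As a rough illustration of the baseline coefficient ρDCCA that the abstract extends, the sketch below computes it at a single scale as the detrended covariance of two signals divided by the product of their DFA fluctuation functions. It is not the authors' implementation and it does not cover the q-dependent coefficient ρq. The function names, the use of non-overlapping boxes, and the restriction to linear detrending are simplifying assumptions made here.

```python
import math


def profile(series):
    """Cumulative sum of the mean-subtracted series (the DFA 'profile')."""
    mean = sum(series) / len(series)
    out, total = [], 0.0
    for value in series:
        total += value - mean
        out.append(total)
    return out


def linear_residuals(segment):
    """Residuals of a least-squares straight-line fit to one box."""
    n = len(segment)
    t_mean = (n - 1) / 2
    s_mean = sum(segment) / n
    var_t = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (s - s_mean) for t, s in enumerate(segment)) / var_t
    return [s - (s_mean + slope * (t - t_mean)) for t, s in enumerate(segment)]


def rho_dcca(x, y, scale):
    """Detrended cross-correlation coefficient at one box size `scale`.

    Assumptions: non-overlapping boxes and linear detrending only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if scale < 3 or scale > len(x):
        raise ValueError("scale must satisfy 3 <= scale <= len(x)")
    px, py = profile(x), profile(y)
    f_xy = f_xx = f_yy = 0.0
    for start in range(0, len(px) - scale + 1, scale):
        rx = linear_residuals(px[start:start + scale])
        ry = linear_residuals(py[start:start + scale])
        f_xy += sum(a * b for a, b in zip(rx, ry))  # detrended covariance
        f_xx += sum(a * a for a in rx)              # DFA variance of x
        f_yy += sum(b * b for b in ry)              # DFA variance of y
    return f_xy / math.sqrt(f_xx * f_yy)
```

Evaluating `rho_dcca` over a range of `scale` values gives the scale dependence of the cross-correlation; by the Cauchy-Schwarz inequality the result always lies between -1 and 1.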
“…We should really distinguish two important movements. First, the 'Econophysics' movement which applies formalisms from statistical mechanics to the social sciences, and championed by Eugene Stanley and others (for instance [1][2][3][4][5][6]). Second, the movement which applies the mathematical apparatus from quantum information to the cognitive and social sciences, and championed by Andrei Khrennikov and others (see [7][8][9][10][11][12]).…”