In recent years a new type of tradable asset has appeared, generically known as cryptocurrencies. Among them, the most widespread is Bitcoin. Given its novelty, this paper investigates some statistical properties of the Bitcoin market. The study compares the dynamics of Bitcoin and standard currencies and focuses on the analysis of returns at different time scales. We test for the presence of long memory in return time series from 2011 to 2017, using transaction data from one Bitcoin platform. We compute the Hurst exponent by means of the Detrended Fluctuation Analysis method, using a sliding window in order to measure long-range dependence. We find that the Hurst exponent changes significantly during the first years of Bitcoin's existence and tends to stabilize in recent times. Additionally, multiscale analysis shows a similar behavior of the Hurst exponent, implying a self-similar process.
Comment: 17 pages, 6 figures. arXiv admin note: text overlap with arXiv:1605.0670
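The sliding-window estimate described in the abstract above can be sketched roughly as follows. This is a minimal illustration of first-order Detrended Fluctuation Analysis, not the authors' implementation: the function names `dfa_hurst` and `sliding_hurst`, the choice of scales, the window length, and the step are all assumptions made for the example, and synthetic noise stands in for the Bitcoin returns.

```python
import numpy as np


def dfa_hurst(x, scales=None, order=1):
    """Estimate the DFA scaling exponent of a series (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Step 1: integrate the mean-centred series to obtain the profile.
    profile = np.cumsum(x - x.mean())
    if scales is None:
        # Assumed scale grid: 12 log-spaced box sizes between 8 and n // 4.
        scales = np.unique(
            np.logspace(np.log10(8), np.log10(n // 4), 12).astype(int)
        )
    flucts = []
    for s in scales:
        n_boxes = n // s
        # Step 2: cut the profile into non-overlapping boxes of length s.
        boxes = profile[: n_boxes * s].reshape(n_boxes, s).T  # shape (s, n_boxes)
        t = np.arange(s, dtype=float)
        # Step 3: fit and remove a polynomial trend in every box at once.
        coeffs = np.polyfit(t, boxes, order)          # shape (order + 1, n_boxes)
        trend = np.vander(t, order + 1) @ coeffs      # shape (s, n_boxes)
        # Step 4: root-mean-square fluctuation around the local trends.
        flucts.append(np.sqrt(np.mean((boxes - trend) ** 2)))
    # Step 5: the exponent is the slope of log F(s) against log s.
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope


def sliding_hurst(x, window, step):
    """Apply dfa_hurst to overlapping windows of the series."""
    x = np.asarray(x, dtype=float)
    starts = range(0, x.size - window + 1, step)
    return np.array([dfa_hurst(x[i:i + window]) for i in starts])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    returns = rng.standard_normal(4000)  # synthetic stand-in for returns
    print(sliding_hurst(returns, window=1000, step=500))
```

For uncorrelated noise the exponent should sit near 0.5, while values persistently above or below 0.5 would point to long-range dependence. The window and step sizes trade temporal resolution against estimation noise and would need tuning for real data.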
Broad‐scale spatial patterns in species richness have been widely investigated with spatial statistics tools in the past few years. The primary goal of these investigations has been to understand the ecological and evolutionary processes underlying such patterns. Nevertheless, most of the current (climate) explanations for these patterns actually rely on the geographical range limits of species, so that a better understanding of such processes may be achieved by coupling richness and distribution (niche) models. We analysed the geographical ranges and richness patterns for 115 triatomine species in the Neotropics, modelled as a function of 12 environmental variables expressing alternative hypotheses that have been used to explain richness gradients. These analyses were based on spatial [spatial eigenvector mapping (SEVM)] and non‐spatial ordinary least‐squares multiple regression models. The geographical ranges of species were also individually analysed using a general linear model (GLM). The coefficients of the regression models for richness and distribution were then compared. Spatial analyses revealed that the unique contributions of spatial eigenvectors and environmental variables to richness were, respectively, equal to 24.2% and 12.2%, with high coefficient values for temperature, actual evapotranspiration, and seasonality. Similar results were obtained using a GLM, and the mean GLM coefficients had a relatively high correlation with those obtained with SEVM (r = 0.586; P < 0.05). Our analyses show that the drivers of Neotropical Triatominae richness and of its species ranges show a high correlation, although the differences among the drivers may be important for understanding the emergent properties (historical processes and species‐specific environmental drivers) that explain richness patterns. 
Moreover, although our analyses identified an important role for temperature and temperature seasonality in explaining both species richness and distributions, other spatially structured environmental variables and historical factors may explain a large part of the variation in diversity patterns.
Addressing the huge amount of data continuously generated is an important challenge in the Machine Learning field. The need to adapt the traditional techniques or create new ones is evident. To do so, distributed technologies have to be used to deal with the significant scalability constraints due to the Big Data context. In many Big Data applications for classification, there are some classes that are highly underrepresented, leading to what is known as the imbalanced classification problem. In this scenario, learning algorithms are often biased towards the majority classes, treating minority ones as outliers or noise. Consequently, preprocessing techniques to balance the class distribution were developed. This can be achieved by suppressing majority instances (undersampling) or by creating minority examples (oversampling). Regarding the oversampling methods, one of the most widespread is the SMOTE algorithm, which creates artificial examples according to the neighborhood of each minority class instance. In this work, our objective is to analyze the behavior of SMOTE in Big Data as a function of some key aspects such as the oversampling degree, the neighborhood value and, especially, the type of distributed design (local vs. global).
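To make the oversampling step concrete, the following is a minimal single-machine sketch of SMOTE-style interpolation, together with one possible reading of the "local" design in which each partition only sees its own minority instances. It is an illustration under stated assumptions, not the implementation studied in the work: the names `smote` and `smote_local`, the brute-force neighbor search, and the use of `np.array_split` as a stand-in for distributed partitions are all hypothetical.

```python
import numpy as np


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority instances (sketch).

    Assumes at least two minority instances are available.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)  # cannot have more neighbors than other points
    # Brute-force squared distances; fine for a sketch, not for Big Data.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    # Pick a base instance and one of its k nearest neighbors at random.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random position on the segment between the two.
    gap = rng.random((n_new, 1))
    return X[base] + gap * (X[picked] - X[base])


def smote_local(minority, n_new, n_partitions, k=5, seed=0):
    """'Local' variant (assumed reading): neighbors are searched per partition."""
    parts = np.array_split(np.asarray(minority, dtype=float), n_partitions)
    # Spread the requested number of synthetic points over the partitions.
    counts = [n_new // n_partitions + (i < n_new % n_partitions)
              for i in range(n_partitions)]
    return np.vstack([smote(p, c, k=k, seed=seed + i)
                      for i, (p, c) in enumerate(zip(parts, counts))])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    minority = rng.normal(size=(40, 2))
    print(smote(minority, n_new=10).shape)
    print(smote_local(minority, n_new=10, n_partitions=4).shape)
```

In this sketch the "global" design corresponds to calling `smote` on all minority instances at once, so every instance can be paired with its true nearest neighbors, whereas the local variant may pair instances with more distant neighbors when the partitions are small.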
The volume of data in today's applications has changed the way Machine Learning problems are addressed. Indeed, the Big Data scenario involves scalability constraints that can only be met through intelligent model design and the use of distributed technologies. In this context, solutions based on the Spark platform have established themselves as a de facto standard. In this contribution, we focus on a very important framework within Big Data Analytics, namely classification with imbalanced datasets. The main characteristic of this problem is that one of the classes is underrepresented, and therefore it is usually more difficult to find a model that identifies it correctly. For this reason, it is common to apply preprocessing techniques such as oversampling to balance the distribution of examples across classes. In this work we present SMOTE-BD, a fully scalable preprocessing approach for imbalanced classification in Big Data. It is based on one of the most widespread preprocessing solutions for imbalanced classification, namely the SMOTE algorithm, which creates new synthetic instances according to the neighborhood of each example of the minority class. Our novel development is designed to be independent of the number of partitions or processes created to achieve a higher degree of efficiency. Experiments conducted on different standard and Big Data datasets show the quality of the proposed design and implementation.
among others, to identify the studies that have applied data mining techniques to extract and analyze Twitter data in higher education; and (2) to highlight the pedagogical practices that have incorporated Twitter and data mining to improve educational processes. Of the 315 articles retrieved, 65 that met the inclusion criteria were selected. The main results indicate that: (1) the most widely used data mining techniques are predictive ones with classification tasks; (2) Twitter is mainly used to: (a) determine student perception; (b) share information, material and resources; (c) generate communication and participation; (d) foster skills; and (e) improve oral expression and academic performance; (3) the United States is the country with the largest number of studies, whereas findings in Latin American countries are few, which opens up a field of research in this region; and (4) the studies included models, methods, strategies, theories or instruments as pedagogical practice, so there is no consensus on how data extracted from Twitter could be incorporated into higher education to improve teaching and learning processes.