Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This article presents a knowledge discovery approach for studying manifestations of the lack of modularity support in such languages. The study focuses on Matlab, a well‐established representative of those languages. We present a technique for the automatic detection and quantification of concerns in Matlab and their exploration in a code base. The Ubiquitous Self Organizing Map (UbiSOM) is used to perform exploratory data analysis over concerns detected in a possibly changing repository of Matlab files. The UbiSOM is quite effective in detecting patterns of co‐occurrence of multiple concerns. To illustrate the technique, a repository comprising over 35,000 Matlab files is analysed. The results show that the use of Token Density metrics in conjunction with UbiSOM enables the detection of patterns of co‐occurrence of multiple concerns in m‐files.
Abstract

The Internet of Things promises a continuous flow of data where traditional database and data-mining methods cannot be applied. This paper presents improvements on the Ubiquitous Self-Organized Map (UbiSOM), a novel variant of the well-known Self-Organized Map (SOM), tailored for streaming environments. This approach allows ambient intelligence solutions using multidimensional clustering over a continuous data stream to provide continuous exploratory data analysis. The average quantization error and average neuron utility over time are proposed and used to estimate the learning parameters, allowing the model to retain an indefinite plasticity and to cope with changes within a multidimensional data stream. We perform parameter sensitivity analysis, and our experiments show that UbiSOM outperforms existing proposals in continuously modeling possibly non-stationary data streams, converging faster to stable models when the underlying distribution is stationary and reacting accordingly to the nature of the change in continuous real-world data streams.

Introduction

At present, all kinds of stream data processing based on instantaneous data have become critical issues for the Internet, the Internet of Things (ubiquitous computing), social networking and other technologies. The massive amounts of data generated in all these environments push the need for algorithms that can readily extract knowledge.

Within this increasingly important field of research, the application of artificial neural networks to such tasks remains a fairly unexplored path. The self-organizing map (SOM) [1] is an unsupervised neural-network algorithm with topology preservation. Over the years, the SOM has been applied extensively within fields ranging from engineering sciences to medicine, biology, and economics [2]. The powerful visualization techniques for SOM models result from the useful and unique ability of the SOM to detect emergent complex cluster structures and non-linear relationships in the feature space [3]. The SOM can be visualized as a sheet-like neural network array, whose neurons become specifically tuned to various input vectors (examples) in an orderly fashion. For instance, the SOM and K-means both represent data in a similar way through prototypes of data, i.e., centroids in K-means and neuron weights in SOM, and their relation and different usages have already been studied [4]. However, it is the topological ordering of these prototypes in large SOM networks that allows the application of exploratory visualization techniques.

This paper is an extended version of work published in [5], introducing a novel variant of SOM, called the ubiquitous self-organizing map (UbiSOM), specially tailored for streaming and big data. We extend our previous work by improving the overall algorithm...
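
As a rough illustration of the classic SOM described above (not the UbiSOM variant, whose parameter estimation is not detailed in this excerpt), the following Python sketch trains a small map online and computes the average quantization error, i.e., the mean distance between each input and the weight vector of its best matching neuron. The function names, grid size, and linearly decaying learning rate and neighborhood width are assumptions chosen for illustration only.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Classic online SOM training on a rows x cols grid (illustrative defaults)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Each neuron holds a prototype (weight vector) in the input space.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed linear decay, floored
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighborhood measured on the grid, not in input space.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def average_quantization_error(weights, data):
    """Mean distance from each input to the prototype of its best matching unit."""
    total = 0.0
    for x in data:
        r, c = best_matching_unit(weights, x)
        total += math.dist(weights[r][c], x)
    return total / len(data)
```

Because neighboring neurons on the grid are pulled toward the same inputs, nearby prototypes end up representing nearby regions of the input space; this topological ordering is what distinguishes the SOM from K-means centroids. In this fixed-schedule sketch the learning rate decays to zero, so the map stops adapting after training, which is the kind of limitation a streaming variant has to avoid.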