Abstract. Self-organizing maps (SOM) have been recognized as a powerful tool in data exploratoration, especially for the tasks of clustering on high dimensional data. However, clustering on categorical data is still a challenge for SOM. This paper aims to extend standard SOM to handle feature values of categorical type. A batch SOM algorithm (NCSOM) is presented concerning the dissimilarity measure and update method of map evolution for both numeric and categorical features simultaneously.
Matrix and data manipulation programming languages are an essential tool for data analysts. However, these languages are often unstructured and lack modularity mechanisms. This article presents a knowledge discovery approach for studying manifestations of the lack of modularity support in that sort of languages. The study is focused on Matlab, as a well‐established representative of those languages. We present a technique for the automatic detection and quantification of concerns in Matlab and their exploration in a code base. The Ubiquitous Self Organizing Map (UbiSOM) is used to perform exploratory data analysis over concerns detected in a, possibly changing, repository of Matlab files. The UbiSOM is quite effective in detecting patterns of co‐occurrence of multiple concerns. To illustrate the technique, a repository comprising over 35,000 Matlab files is analysed. The results show that the use of Token Density metrics in conjunction with UbiSOM enables the detection of patterns of co‐occurrence of multiple concerns in m‐files.
Many children with speech sound disorders cannot pronounce the sibilant consonants correctly. We have developed a serious game, which is controlled by the children's voices in real time, with the purpose of helping children on practicing the production of European Portuguese (EP) sibilant consonants. For this, the game uses a sibilant consonant classifier. Since the game does not require any type of adult supervision, children can practice producing these sounds more often, which may lead to faster improvements of their speech. Recently, the use of deep neural networks has given considerable improvements in the classification of a variety of use cases, from image classification to speech and language processing. Here, we propose to use deep convolutional neural networks to classify sibilant phonemes of EP in our serious game for speech and language therapy. We compared the performance of several different artificial neural networks that used Mel frequency cepstral coefficients or log Mel filterbanks. Our best deep learning model achieves classification scores of 95.48% using a 2D convolutional model with log Mel filterbanks as input features. Such results are then further improved for specific classes with simple binary classifiers.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.