An open-source Mandarin speech corpus, AISHELL-1, is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including the audio capture devices and recording environments, is presented in detail. The preparation of the related resources, including the transcriptions and the lexicon, is described. The corpus is released together with a Kaldi recipe. Experimental results imply that the quality of the audio recordings and transcriptions is promising.
Recognition of motor imagery intention is one of the current research focuses of brain-computer interface (BCI) studies. It can help patients with physical dyskinesia convey their movement intentions. In recent years, breakthroughs have been made in recognizing motor imagery tasks with deep learning, but ignoring important features related to motor imagery can degrade the recognition performance of an algorithm. This paper proposes a new deep multi-view feature learning method for the classification of motor imagery electroencephalogram (EEG) signals. To obtain more representative motor imagery features from EEG signals, we introduce a multi-view feature representation based on the characteristics of EEG signals and the differences between features. Different feature extraction methods are used to extract the time-domain, frequency-domain, time-frequency-domain and spatial features of EEG signals, so that they cooperate and complement one another. Then, a deep restricted Boltzmann machine (RBM) network improved by t-distributed stochastic neighbor embedding (t-SNE) is adopted to learn the multi-view features, so that the algorithm removes feature redundancy while accounting for the global characteristics of the multi-view feature sequence, reduces the dimensionality of the multi-view features, and enhances their discriminability. Finally, a support vector machine (SVM) is chosen to classify the deep multi-view features. Applying the proposed method to the BCI Competition IV 2a dataset, we obtained excellent classification results, which show that deep multi-view feature learning further improves the classification accuracy of motor imagery tasks.
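The multi-view representation described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact feature pipeline: the sampling rate, frequency bands, and the choice of per-channel statistics for the time-domain view and channel covariances for the spatial view are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic motor-imagery-like epochs: (trials, channels, samples) at an
# assumed 250 Hz sampling rate over 2-second windows.
fs = 250
epochs = rng.standard_normal((10, 4, fs * 2))

def time_view(x):
    # Time-domain view: per-channel mean and variance.
    return np.concatenate([x.mean(axis=-1), x.var(axis=-1)], axis=-1)

def freq_view(x, fs, bands=((8, 13), (13, 30))):
    # Frequency-domain view: mean mu- and beta-band power of the spectrum.
    spec = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    feats = []
    for lo, hi in bands:
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(spec[..., mask].mean(axis=-1))
    return np.concatenate(feats, axis=-1)

def spatial_view(x):
    # Spatial view: upper triangle of the channel covariance matrix.
    n_ch = x.shape[1]
    iu = np.triu_indices(n_ch)
    return np.stack([np.cov(trial)[iu] for trial in x])

# Concatenate the views into one multi-view feature vector per trial;
# a downstream learner (the paper uses an RBM + SVM) consumes this vector.
multi_view = np.concatenate(
    [time_view(epochs), freq_view(epochs, fs), spatial_view(epochs)], axis=-1
)
print(multi_view.shape)
```

The point of the concatenation is that each view captures a different property of the same trial, so the downstream dimensionality reduction can exploit their complementarity.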
Various types of knowledge and features have been explored for level set-based segmentation. On the ground, prior knowledge and carefully designed features identify the foreground-background contrast well, which improves segmentation performance on complicated and distorted data. However, this is not the case underwater, since features that work on the ground are not suitable for challenging underwater environments. Thus, underwater image segmentation currently lags behind ground-based segmentation. In this paper, novel cues and a suitable model formulation for object segmentation in underwater images are proposed. We consider the special haze effect in underwater images and extract an informative transmission feature from the haze formation process. A saliency feature is also used for underwater object segmentation. Consequently, in our method, the object-background difference is represented by these features on two levels, i.e., the edge-level transmission feature and the region-level saliency feature. The two types of features are integrated into a unified level set formulation that handles the challenging issues in underwater object segmentation. Experimental comparisons with other methods comprehensively demonstrate the satisfactory performance of our method.
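To make the transmission feature concrete, here is a minimal sketch of one common way to estimate a transmission map from the haze imaging model I = J·t + A·(1 − t), using a dark-channel-style local minimum. This is an illustrative assumption, not necessarily the estimator used in the paper; the patch size, omega, and atmospheric light value are placeholders.

```python
import numpy as np

def dark_channel(img, patch=3):
    # Minimum over the color channels, then a local minimum filter over
    # patch x patch windows (plain loop; fine for small images).
    min_rgb = img.min(axis=2)
    h, w = min_rgb.shape
    r = patch // 2
    padded = np.pad(min_rgb, r, mode="edge")
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def transmission_map(img, atmos=1.0, omega=0.95, patch=3):
    # Haze model I = J*t + A*(1 - t); a dark-channel estimate of the
    # transmission is t = 1 - omega * dark_channel(I / A).
    return 1.0 - omega * dark_channel(img / atmos, patch)

# Toy image: left half clear (dark scene), right half hazy (bright veil).
img = np.zeros((8, 8, 3))
img[:, 4:, :] = 0.8
t = transmission_map(img)
# Transmission is high where the scene is clear, low where it is hazy,
# which is what makes it usable as an edge-level segmentation cue.
print(t[:, :3].min(), t[:, 5:].max())
```

Because the transmission drops sharply at haze boundaries, its gradients provide the edge-level cue the abstract refers to.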
Recently, many algorithms based on local graph embedding have been proposed for dimensionality reduction of nonlinear data. However, these algorithms are not effective when dealing with face images affected by variations in illumination, pose, perspective or facial expression. As a result, distant data points are not de-emphasized efficiently by local graph embedding algorithms, which may degrade classification performance. To solve this problem, this paper proposes a new, efficient dimensionality reduction method: local graph embedding based on the maximum margin criterion via fuzzy sets for face recognition. First, the algorithm constructs a fuzzy intrinsic graph and a fuzzy penalty graph so that locality is preserved under the nearest-neighbor premise. Second, two novel fuzzy Laplacian scatter matrices are calculated using the Fuzzy K-Nearest Neighbor (FKNN) algorithm. Finally, the Maximum Margin Criterion (MMC) is used to avoid the "small sample size" problem. Face recognition experiments on the ORL, YALE and AR face databases demonstrate the effectiveness of the proposed method.
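The FKNN step above assigns each training sample soft class memberships rather than a hard label. A minimal sketch of the classic fuzzy k-NN membership assignment is shown below; the 0.51/0.49 split between a sample's own label and its neighbors' labels is the common convention and is assumed here, not taken from the paper.

```python
import numpy as np

def fknn_memberships(X, y, k=3):
    # Fuzzy class memberships in the style of fuzzy k-NN: each sample
    # keeps 0.51 weight on its own label and distributes the remaining
    # 0.49 according to the labels of its k nearest neighbors.
    n = len(y)
    classes = np.unique(y)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude the point itself
    U = np.zeros((n, len(classes)))
    for i in range(n):
        nn = np.argsort(dists[i])[:k]
        for c_idx, c in enumerate(classes):
            frac = np.mean(y[nn] == c)
            U[i, c_idx] = 0.49 * frac + (0.51 if y[i] == c else 0.0)
    return U

# Two well-separated toy clusters; memberships per row sum to 1.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])
U = fknn_memberships(X, y, k=2)
print(U.round(2))
```

In the proposed method these membership values would weight the entries of the fuzzy Laplacian scatter matrices, softening the influence of ambiguous samples near class boundaries.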