In this paper, we propose discriminative multiple canonical correlation analysis (DMCCA) for multimodal information analysis and fusion. DMCCA extracts more discriminative characteristics from multimodal information representations. Specifically, it finds the projection directions that simultaneously maximize the within-class correlation and minimize the between-class correlation, leading to better utilization of the multimodal information. In the process, we analytically demonstrate that the optimal projection dimension of DMCCA can be accurately predicted, leading to both superior performance and a substantial reduction in computational cost. We further verify that canonical correlation analysis (CCA), multiple canonical correlation analysis (MCCA), and discriminative canonical correlation analysis (DCCA) are special cases of DMCCA, thus establishing a unified framework for canonical correlation analysis. We implement a prototype of DMCCA to demonstrate its performance in handwritten digit recognition and human emotion recognition. Extensive experiments show that DMCCA outperforms the traditional methods of serial fusion, CCA, MCCA, and DCCA.
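A minimal two-view sketch of this objective, assuming a simple same-class/different-class pair weighting and a single trade-off parameter `eta` (both illustrative; the paper's formulation handles more than two feature sets and defines its own weighting):

```python
import numpy as np

def dmcca_style_projections(X, Y, labels, eta=1.0, dim=2, reg=1e-6):
    """Two-view sketch: maximize within-class minus eta * between-class
    cross-correlation between the projected views, as in discriminative CCA."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Pairwise class indicator: +1 for same-class pairs, -eta otherwise
    # (an illustrative weighting, not necessarily the paper's).
    A = np.where(labels[:, None] == labels[None, :], 1.0, -eta)
    Cxy = X.T @ A @ Y / n                    # within-minus-between cross-correlation
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    # Whiten each view, then read the directions off an SVD, as in classical CCA.
    Lx, Ly = np.linalg.cholesky(Sxx), np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, _, Vt = np.linalg.svd(K)
    Wx = np.linalg.solve(Lx.T, U[:, :dim])   # projection for view X
    Wy = np.linalg.solve(Ly.T, Vt[:dim].T)   # projection for view Y
    return Wx, Wy                            # fuse e.g. by concatenating X @ Wx, Y @ Wy
```

With the pair weighting `A` replaced by the identity matrix, the same code reduces to classical CCA, consistent with the unified-framework claim above.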
The objective of multimodal information fusion is to mathematically analyze the information carried in different sources and create a new representation that can be utilized more effectively in pattern recognition and other multimedia information processing tasks. In this paper, we introduce a new method for multimodal information fusion and representation based on Labeled Multiple Canonical Correlation Analysis (LMCCA). By incorporating the class labels of the training samples, the proposed LMCCA ensures that the fused features carry the discriminative characteristics of the multimodal information representations and can provide superior recognition performance. We implement a prototype of LMCCA to demonstrate its effectiveness on handwritten digit recognition, face recognition, and object recognition utilizing multiple features, as well as bimodal human emotion recognition involving information from both the audio and visual domains. The generic nature of LMCCA allows it to take as input features extracted by any means, including deep learning (DL) methods. Experimental results show that the proposed method enhances the performance of both statistical machine learning (SML) methods and methods based on DL.
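A hedged sketch of the generic fusion pipeline this implies, with scikit-learn's plain CCA standing in for the label-informed projection that the paper defines; the feature matrices, dimensions, and classifier are all illustrative placeholders:

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 10, size=n)       # e.g. 10 digit classes
X_audio = rng.normal(size=(n, 64))         # modality-1 features (any extractor)
X_visual = rng.normal(size=(n, 128))       # modality-2 features (e.g. CNN embeddings)

# Correlation-based projection of both views (LMCCA would use the labels here).
cca = CCA(n_components=16).fit(X_audio, X_visual)
Zx, Zy = cca.transform(X_audio, X_visual)
fused = np.concatenate([Zx, Zy], axis=1)   # fused representation of the two views

# Any downstream classifier can consume the fused features.
clf = make_pipeline(StandardScaler(), SVC()).fit(fused, labels)
print(clf.score(fused, labels))
```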
Methods based on two-stream networks have achieved great success in video action recognition. However, most existing methods employ the same structure for both the spatial and temporal networks, leading to unsatisfactory performance. In this paper, we propose a spatiotemporal heterogeneous two-stream network, which employs two different network structures for spatial and temporal information, respectively. Specifically, the Residual Network (ResNet) and BN-Inception are utilized as the base networks to represent the spatiotemporal characteristics of different human actions. In addition, a segmental architecture is employed to model long-range temporal structure over video sequences, so as to better distinguish similar actions that share sub-actions. Moreover, combined with a data augmentation strategy, a modified cross-modal pre-training strategy is proposed and applied to the spatiotemporal heterogeneous network to improve the final performance of human action recognition. Experiments on the UCF101 and HMDB51 datasets demonstrate that the proposed spatiotemporal heterogeneous two-stream network outperforms spatiotemporally isomorphic networks and other related methods.
INDEX TERMS: Action recognition, spatiotemporal heterogeneous networks, two-stream networks, ResNet, long-range temporal structure, training strategies.
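A hedged PyTorch sketch of the heterogeneous two-stream idea. torchvision's GoogLeNet stands in for BN-Inception (which torchvision does not ship), the 10-frame flow stack (20 channels) is an assumed temporal input, and the first-conv weight averaging mirrors the usual cross-modal pre-training trick rather than the paper's modified version:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, googlenet

# Spatial stream: a ResNet on single RGB frames.
spatial = resnet18(weights="IMAGENET1K_V1").eval()

# Temporal stream: an Inception-style network on stacked optical flow.
temporal = googlenet(weights="IMAGENET1K_V1")
temporal.transform_input = False  # disable the built-in 3-channel input normalization

# Cross-modal pre-training trick: average the pretrained first-conv RGB kernels
# over the channel axis and replicate them across the 20 flow channels, so that
# ImageNet weights initialize the flow stream.
old = temporal.conv1.conv
w = old.weight.data.mean(dim=1, keepdim=True).repeat(1, 20, 1, 1)
new = nn.Conv2d(20, old.out_channels, kernel_size=old.kernel_size,
                stride=old.stride, padding=old.padding, bias=False)
new.weight.data.copy_(w)
temporal.conv1.conv = new
temporal.eval()

rgb = torch.randn(2, 3, 224, 224)    # one sampled RGB frame per segment
flow = torch.randn(2, 20, 224, 224)  # x/y flow stacked over 10 frames
with torch.no_grad():
    # Late fusion of the two streams' class scores.
    fused = (spatial(rgb).softmax(-1) + temporal(flow).softmax(-1)) / 2
```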
In rapidly time-varying channel environments, the performance of the traditional MIMO-OFDM system deteriorates due to intercarrier interference. In this paper, a novel MIMO-OFDM system is proposed, in which the modulation and demodulation of the symbols are implemented by the fractional Fourier transform instead of the traditional Fourier transform. By selecting the optimal order of the fractional Fourier transform, the modulated signals can be matched to the time-varying channel characteristics, which mitigates the intercarrier interference. Furthermore, an algorithm is presented for selecting the optimal order of the fractional Fourier transform, and the impact of system parameters on the optimal order is analyzed. Simulation results show that the proposed system can effectively concentrate the power of the desired signal and improve performance over rapidly time-varying channels with respect to the traditional MIMO-OFDM system.
INDEX TERMS: fractional Fourier transform, MIMO-OFDM, intercarrier interference.
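A minimal sketch of swapping the OFDM (I)DFT pair for a fractional-order pair on one antenna branch. The matrix-power construction of the discrete fractional Fourier transform and the order a = 0.9 are illustrative assumptions; the paper's discretization and order-selection algorithm are not reproduced here:

```python
import numpy as np
from scipy.linalg import fractional_matrix_power

N = 64                                   # number of subcarriers (assumed)
a = 0.9                                  # fractional order; a = 1 recovers ordinary OFDM
F = np.fft.fft(np.eye(N), norm="ortho")  # unitary DFT matrix
Fa = fractional_matrix_power(F, a)       # demodulator: fractional "DFT"
Fa_inv = Fa.conj().T                     # modulator: its unitary inverse

# QPSK symbols placed on the N fractional subcarriers.
rng = np.random.default_rng(0)
symbols = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

tx = Fa_inv @ symbols                    # fractional-domain modulation
rx = Fa @ tx                             # fractional-domain demodulation
print(np.allclose(rx, symbols))          # True: the pair is (numerically) unitary
```

Tuning `a` is what lets the transmitted chirp-like basis track the channel's time variation; the back-to-back check above only verifies that the modulator/demodulator pair is consistent.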
Edge detection is a fundamental task in many computer vision applications. In this paper, we propose a novel multiscale edge detection approach based on the nonsubsampled contourlet transform (NSCT): a fully shift-invariant, multiscale, and multidirectional transform. Unlike traditional wavelets, contourlets can fully capture directional and other geometrical features of images with edges. First, the NSCT of the input image is computed. Second, the K-means clustering algorithm is applied at each level of the NSCT to distinguish noise from edges. Third, the edge point candidates of the input image are selected by identifying the NSCT modulus maxima at each scale. Finally, a coarse-to-fine edge tracking algorithm is proposed to improve robustness against spurious responses and accuracy in locating the edges. Experimental results show that the proposed method achieves better edge detection performance than typical methods. Furthermore, the proposed method also works well for noisy images.
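A hedged sketch of this pipeline's shape only. The undecimated wavelet transform (pywt.swt2) stands in for the NSCT, which has no standard Python implementation, and requiring agreement across all scales is a crude stand-in for coarse-to-fine tracking; the input image and wavelet choice are placeholders:

```python
import numpy as np
import pywt
from sklearn.cluster import KMeans

img = np.random.rand(128, 128)                     # placeholder input image
levels = pywt.swt2(img, "db2", level=3)            # shift-invariant multiscale transform

edge_masks = []
for approx, (h, v, d) in levels:
    mag = np.sqrt(h**2 + v**2)                     # gradient-like coefficient modulus
    # K-means with two clusters separates low-magnitude (noise) from
    # high-magnitude (edge) coefficients at this scale.
    km = KMeans(n_clusters=2, n_init=10).fit(mag.reshape(-1, 1))
    edge_cluster = np.argmax(km.cluster_centers_)  # the higher-magnitude cluster
    edge_masks.append((km.labels_ == edge_cluster).reshape(mag.shape))

# Keep a pixel as an edge candidate only if it is flagged at every scale,
# a crude substitute for the paper's coarse-to-fine tracking step.
candidates = np.logical_and.reduce(edge_masks)
```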
The detection and recognition of moving objects in image sequences involves many areas, including pattern recognition, image processing, and computer vision. The main difficulties in target detection and recognition are complex background interference, local occlusion, real-time recognition, illumination changes, changes in target size and type, etc., and these problems are very difficult to solve in practical applications. This article introduces pre-processing for image sequences: we selectively highlight the visually salient features that help target detection, weaken the image background and features unrelated to the target, and improve the quality of the image sequence. A multi-information probability density estimation kernel integrating gray scale, spatial relationship, and local standard deviation information is designed, and this integrated kernel is used to extract the features of the moving target. For moving target recognition, Naive Bayes is used as the weak learner; to avoid over-fitting of the classifier caused by high-noise moving image sequence features, a regularized AdaBoost model is introduced as the moving target recognition classifier. To completely separate the target from the background, we propose a moving target extraction method based on multi-information kernel density estimation and input the resulting target feature description vectors into the regularized AdaBoost-based recognition framework. Robust target recognition performance is obtained, and the reliability of target recognition under high-noise data is improved.
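A hedged sketch of the recognition stage only: scikit-learn's AdaBoost with Gaussian Naive Bayes weak learners, with shrinkage (learning_rate < 1) standing in for the paper's regularization. The feature vectors are random placeholders for the multi-information kernel descriptors:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                 # stand-in target feature descriptors
y = (X[:, :3].sum(axis=1)                      # synthetic target/background labels
     + 0.5 * rng.normal(size=300) > 0).astype(int)

# Naive Bayes weak learners boosted by AdaBoost; a learning rate below 1
# shrinks each learner's contribution, a common guard against over-fitting
# on noisy features (the paper's regularizer may differ).
clf = AdaBoostClassifier(estimator=GaussianNB(),
                         n_estimators=50,
                         learning_rate=0.5)
clf.fit(X, y)
print(clf.score(X, y))
```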