We present a new approach to deal with visual tracking target tasks. This method uses a convolutional neural network able to rank a set of patches depending on how well the target is framed (centered). To cover the possible interferences our proposal is to feed the network with patches located in the surroundings of the object detected in the previous frame, and with different sizes, thus taking into account eventual changes of scale. In order to train the network, we had to create an ad-hoc large dataset with positive and negative examples of framed objects extracted from the Imagenet detection database. The positive examples were those containing the object in a correct frame, while the negative ones were the incorrectly framed. Finally, we select the most promising patch, using a matching function based on the deep features provided by the well-known AlexNet network. All the training stage of this method is offline, so it is fast and useful for real-time visual tracking. Experimental results show that the method is very competitive with respect to state-of-the-art algorithms, being also very robust against typical interferences during the visual target tracking process.
We proposed a new method of gist feature extraction for building recognition and named the feature extracted by this method as the histogram of oriented gradient based gist (HOG-gist). The proposed method individually computes the normalized histograms of multiorientation gradients for the same image with four different scales. The traditional approach uses the Gabor filters with four angles and four different scales to extract orientation gist feature vectors from an image. Our method, in contrast, uses the normalized histogram of oriented gradient as orientation gist feature vectors of the same image. These HOG-based orientation gist vectors, combined with intensity and color gist feature vectors, are the proposed HOG-gist vectors. In general, the HOG-gist contains four multiorientation histograms (four orientation gist feature vectors), and its texture description ability is stronger than that of the traditional gist using Gabor filters with four angles. Experimental results using Sheffield Buildings Database verify the feasibility and effectiveness of the proposed HOG-gist.
With the advent of biomedical imaging technology, the number of captured and stored biomedical images is rapidly increasing day by day in hospitals, imaging laboratories and biomedical institutions. Therefore, more robust biomedical image analysis technology is needed to meet the requirement of the diagnosis and classification of various kinds of diseases using biomedical images. However, the current biomedical image classification methods and general non-biomedical image classifiers cannot extract more compact biomedical image features or capture the tiny differences between similar images with different types of diseases from the same category. In this paper, we propose a novel fused convolutional neural network to develop a more accurate and highly efficient classifier for biomedical images, which combines shallow layer features and deep layer features from the proposed deep neural network architecture. In the analysis, it was observed that the shallow layers provided more detailed local features, which could distinguish different diseases in the same category, while the deep layers could convey more high-level semantic information used to classify the diseases among the various categories. A detailed comparison of our approach with traditional classification algorithms and popular deep classifiers across several public biomedical image datasets showed the superior performance of our proposed method for biomedical image classification. In addition, we also evaluated the performance of our method in modality classification of medical images using the ImageCLEFmed dataset. Graphical abstract The graphical abstract shows the fused, deep convolutional neural network architecture proposed for biomedical image classification. In the architecture, we can clearly see the feature-fusing process going from shallow layers and the deep layers.
Numerous human actions such as “Phoning,” “PlayingGuitar,” and “RidingHorse” can be inferred by static cue-based approaches even if their motions in video are available considering one single still image may already sufficiently explain a particular action. In this research, we investigate human action recognition in still images and utilize deep ensemble learning to automatically decompose the body pose and perceive its background information. Firstly, we construct an end-to-end NCNN-based model by attaching the nonsequential convolutional neural network (NCNN) module to the top of the pretrained model. The nonsequential network topology of NCNN can separately learn the spatial- and channel-wise features with parallel branches, which helps improve the model performance. Subsequently, in order to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on the weight optimization (DELWO) model. It contributes to fusing the deep information derived from multiple models automatically from the data. Finally, we design the deep ensemble learning based on voting strategy (DELVS) model to pool together multiple deep models with weighted coefficients to obtain a better prediction. More importantly, the model complexity can be reduced by lessening the number of trainable parameters, thereby effectively mitigating overfitting issues of the model in small datasets to some extent. We conduct experiments in Li’s action dataset, uncropped and 1.5x cropped Willow action datasets, and the results have validated the effectiveness and robustness of our proposed models in terms of mitigating overfitting issues in small datasets. Finally, we open source our code for the model in GitHub (https://github.com/yxchspring/deep_ensemble_learning) in order to share our model with the community.
In a real world application, we seldom get all images at one time. Considering this case, if a company hired an employee, all his images information needs to be recorded into the system; if we rerun the face recognition algorithm, it will be time consuming. To address this problem, In this paper, firstly, we proposed a novel subspace incremental method called incremental graph regularized nonnegative matrix factorization (IGNMF) algorithm which imposes manifold into incremental nonnegative matrix factorization algorithm (INMF); thus, our new algorithm is able to preserve the geometric structure in the data under incremental study framework; secondly, considering we always get many face images belonging to one person or many different people as a batch, we improved our IGNMF algorithms to Batch-IGNMF algorithms (B-IGNMF), which implements incremental study in batches. Experiments show that (1) the recognition rate of our IGNMF and B-IGNMF algorithms is close to GNMF algorithm while it runs faster than GNMF. (2) The running times of our IGNMF and B-IGNMF algorithms are close to INMF while the recognition rate outperforms INMF. (3) Comparing with other popular NMF-based face recognition incremental algorithms, our IGNMF and B-IGNMF also outperform then both the recognition rate and the running time.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.