Convolutional neural networks (CNNs) are widely used in computer vision and medical image analysis as the state-of-the-art technique. In CNNs, pooling layers are included mainly for downsampling the feature maps by aggregating features from local regions. Pooling can help CNNs to learn invariant features and reduce computational complexity. Although max and average pooling are the most widely used, various other pooling techniques have also been proposed for different purposes, including techniques to reduce overfitting, to capture higher-order information such as correlation between features, and to capture spatial or structural information. As not all of these pooling techniques are well explored for medical image analysis, this paper provides a comprehensive review of various pooling techniques proposed in the literature of computer vision and medical image analysis. In addition, an extensive set of experiments is conducted to compare a selected set of pooling techniques on two different medical image classification problems, namely HEp-2 cells and diabetic retinopathy image classification. Experiments suggest that the most appropriate pooling mechanism for a particular classification task is related to the scale of the class-specific features with respect to the image size. As this is the first work focusing on pooling techniques for the application of medical image analysis, we believe that this review and the comparative study will provide a guideline for the choice of pooling mechanisms for various medical image analysis tasks. In addition, by carefully choosing the pooling operations with the standard ResNet architecture, we show new state-of-the-art results on both HEp-2 cells and diabetic retinopathy image datasets.
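As a rough illustration of the downsampling described in the abstract above, the following sketch applies max and average pooling to a small feature map. It is a minimal example under stated assumptions, not code from the paper: the names `pool2d` and `mean`, the non-overlapping 2 x 2 window, and the toy values are all hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def pool2d(feature_map: Matrix, size: int,
           reduce: Callable[[List[float]], float]) -> Matrix:
    """Downsample a 2-D feature map with non-overlapping size x size windows.

    `reduce` aggregates the values of one window (e.g. max or mean).
    Rows and columns that do not fill a whole window are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the local region that this output cell summarises.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty window."""
    return sum(values) / len(values)


# Toy 4 x 4 feature map (hypothetical values).
feature_map = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 0.0, 5.0, 6.0],
    [0.0, 2.0, 7.0, 8.0],
]

max_pooled = pool2d(feature_map, 2, max)   # [[4.0, 1.0], [2.0, 8.0]]
avg_pooled = pool2d(feature_map, 2, mean)  # [[2.5, 0.5], [0.5, 6.5]]
```

Both variants reduce the 4 x 4 map to 2 x 2. Max pooling keeps the strongest response in each region, while average pooling smooths over it.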
The codebook model-based approach, while ignoring any structural aspect in vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of such a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Thus, the construction of a codebook is an important step, which is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. This is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure that slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches that have been proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, and choice of classifiers in recognising objects, as well as the datasets that were used in evaluating the proposed methods.
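The mapping from low-level features to a fixed-length histogram described above can be sketched as follows. This is an illustrative example only: the function names, the Euclidean nearest-codeword assignment, the normalisation, and the two-dimensional toy descriptors are assumptions, not details taken from the reviewed methods.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(feature: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `feature` (Euclidean distance)."""
    distances = [math.dist(feature, word) for word in codebook]
    return distances.index(min(distances))


def encode_histogram(features: Sequence[Vector],
                     codebook: Sequence[Vector]) -> List[float]:
    """Map any number of features to one vector of length len(codebook)."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts)
    # Normalise so that images with different feature counts are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical codebook of three visual words and four image descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

histogram = encode_histogram(features, codebook)  # [0.25, 0.5, 0.25]
```

Whatever the number of descriptors extracted from an image, the output length equals the codebook size, which is why a standard classifier can consume it directly.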
The frequency of occurrence of low-level image features is the representation of choice in the design of state-of-the-art visual object recognition systems. A crucial step in this process is the construction of a codebook of visual features, which is usually done by cluster analysis of a large number of low-level image features detected as interest points. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Here we extend our recent work on a one-pass discriminant codebook design procedure inspired by the resource allocating network model from the artificial neural networks literature. Unlike clustering, this approach retains data spread out more widely in the input space, thereby including rare low-level features in the codebook. It simultaneously achieves increased discrimination and a drastic reduction in the computational needs. We illustrate some properties of our method and compare it to a closely related approach.
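One way to picture a one-pass codebook that retains widely spread features is the sketch below. The abstract does not state the allocation rule, so the rule used here is an assumption: a feature becomes a new codeword only when it lies farther than a fixed `radius` from every existing codeword, in the spirit of a novelty criterion. The function name, the radius value, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def one_pass_codebook(features: Sequence[Vector],
                      radius: float) -> List[Tuple[float, ...]]:
    """Build a codebook in a single pass over the features.

    Assumed rule: allocate a new codeword whenever a feature is farther
    than `radius` from all codewords collected so far.
    """
    codebook: List[Tuple[float, ...]] = []
    for feature in features:
        # all() over an empty codebook is True, so the first feature is kept.
        if all(math.dist(feature, word) > radius for word in codebook):
            codebook.append(tuple(feature))
    return codebook


# A dense group near the origin plus one rare, distant feature (toy data).
features = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
            (0.0, 0.2), (10.0, 10.0), (0.1, 0.0)]

codebook = one_pass_codebook(features, radius=1.0)
# codebook == [(0.0, 0.0), (10.0, 10.0)]
```

Under this assumed rule the dense group contributes a single codeword while the rare feature gets its own entry, and each feature is visited once rather than iterated over repeatedly as in clustering.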
Single object tracking is a well-known and challenging research topic in computer vision. Over the last two decades, numerous researchers have proposed various algorithms to solve this problem and achieved promising results. Recently, Transformer-based tracking approaches have ushered in a new era in single object tracking due to their superior tracking robustness. Although several survey studies have been conducted to analyze the performance of trackers, there is a need for another survey study after the introduction of Transformers in single object tracking. In this survey, we aim to analyze the literature and performance of Transformer tracking approaches. To this end, we conduct an in-depth literature analysis of Transformer tracking approaches and evaluate their tracking robustness and computational efficiency on challenging benchmark datasets. In addition, we measure their performance in different tracking scenarios to find their strengths and weaknesses. Our survey provides insights into the underlying principles of Transformer tracking approaches, the challenges they face, and their future directions.