In this paper, we present a novel descriptor for human action recognition, called Motion of Oriented Magnitudes Patterns (MOMP), which considers the relationships between the local gradient distributions of neighboring patches taken from successive video frames. The proposed descriptor also characterizes how information changes across different orientations and is therefore very discriminative and robust. The major advantages of MOMP are its very fast computation and simple implementation. Our features are then combined with an effective coding scheme, VLAD (Vector of Locally Aggregated Descriptors), in the feature representation step and with an SVM (Support Vector Machine) classifier in order to better represent and classify the actions. In experiments on several common benchmarks, we obtain state-of-the-art results on the KTH dataset and performance comparable to the literature on the UCF Sport dataset.
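The VLAD encoding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a codebook of centroids has already been learned (for example by k-means), uses hard assignment and a single global L2 normalisation, and is written in plain Python for readability. The function name `vlad_encode` and the toy inputs are hypothetical; the abstract does not specify the codebook size or the normalisation used.

```python
import math


def vlad_encode(descriptors, codebook):
    """Encode a set of local descriptors as one VLAD vector.

    descriptors: list of D-dimensional lists (hypothetical local features).
    codebook: list of K D-dimensional centroids (assumed to be precomputed).
    """
    k, d = len(codebook), len(codebook[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda i: sum((a - c) ** 2 for a, c in zip(x, codebook[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that cluster.
        for j in range(d):
            residuals[nearest][j] += x[j] - codebook[nearest][j]
    # Concatenate the K residual sums into a single K*D vector.
    flat = [v for row in residuals for v in row]
    # Global L2 normalisation; an all-zero vector is returned unchanged.
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

For a video, `descriptors` would hold one vector per local patch, and the resulting fixed-length vector would be passed to the classifier.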
Here, the authors introduce a novel system which incorporates the discriminative motion of oriented magnitude patterns (MOMP) descriptor into simple yet efficient techniques. The descriptor both investigates the relations between the local gradient distributions of neighbours across consecutive image sequences and characterises how information changes across different orientations. The proposed system makes two main contributions: (i) the authors post-process the features with principal component analysis followed by vector of locally aggregated descriptors encoding, in order to de-correlate the MOMP descriptor and reduce its dimension, which speeds up the algorithm; (ii) the authors then apply feature selection (i.e. statistical dependency, mutual information, and minimal redundancy maximal relevance) to find the best feature subset, which improves performance and decreases the computational expense of classification with support vector machine techniques. Experimental results on four data sets, Weizmann (98.4%), KTH (96.3%), UCF Sport (82.0%), and HMDB51 (31.5%), demonstrate the efficiency of the authors' algorithm.
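Of the three selection criteria named in this abstract, mutual information is the simplest to sketch. The example below ranks features by their mutual information with the class labels. It assumes the features have already been discretised, which would be an extra step for continuous PCA/VLAD features. It covers only the relevance term, not the redundancy term of minimal redundancy maximal relevance. The function names are hypothetical and this is not the authors' implementation.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y), in bits, between two discrete sequences."""
    n = len(labels)
    px, py = Counter(feature), Counter(labels)
    pxy = Counter(zip(feature, labels))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), expressed with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_top_features(columns, labels, k):
    """Return the indices of the k feature columns most informative about the labels."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:k]
```

Here each entry of `columns` is one feature observed over all samples; only the selected indices would be kept before training the classifier.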
In this paper, we propose a Handcrafted Normalized Convolution Network (NmzNet) for efficient texture classification. NmzNet is implemented as a three-layer normalized convolution network, which computes successive normalized convolutions with a predefined filter bank (a Gabor filter bank) and modulus non-linearities. Coefficients from the different layers are aggregated by Fisher Vector aggregation to form the final discriminative features. Experimental evaluation on three texture datasets, UIUC, KTH-TIPS-2a, and KTH-TIPS-2b, indicates that the proposed approach achieves a good classification rate compared with other handcrafted methods. The results additionally indicate that only a marginal difference exists between the best classification rate of recent frontier CNNs and that of the proposed method on these datasets.
In this paper, we propose a combined feature approach which takes full advantage of both local structure information and more global information to improve texture image classification results. Specifically, Local Binary Pattern is used for extracting local features, whilst the Scattering Transform feature plays the role of a global descriptor. Extensive experiments conducted on many texture benchmarks such as ALOT, CUReT, KTH-TIPS2-a, KTH-TIPS2-b, and OUTEX show that the combined method outperforms each component used alone in terms of classification accuracy. Our method also outperforms many others, whilst it is comparable to the state of the art on these datasets.
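The local part of this combination can be illustrated with a minimal sketch of the basic 8-neighbour Local Binary Pattern histogram. The bit ordering and the `>=` comparison follow the common textbook definition and are assumptions here. The abstract does not say which LBP variant (for example uniform or rotation-invariant) is used, and the Scattering Transform part is not shown. The function name `lbp_histogram` is hypothetical.

```python
def lbp_histogram(image):
    """Histogram of basic 3x3 LBP codes over the interior pixels of a
    2-D grey-level image given as a list of rows."""
    # Neighbour offsets (row, column), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright as the centre.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    return hist
```

The 256-bin histogram is the local descriptor; in a combined approach it would be joined with the global feature before classification.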