For the image classification task, the color histogram is widely used as an important color feature indicating the content of the image. However, high-resolution color histograms are usually of high dimension and contain much redundant information unrelated to the image content, while low-resolution histograms cannot provide adequate discriminative information for image classification. In this paper, a new color feature representation is proposed which not only takes into account the correlation among neighbouring components of the conventional color histogram but also removes the redundant information. A high-resolution, uniformly quantized color histogram is first obtained from the image. The redundant bins are then removed, and some neighbouring bins are combined to generate new feature components that maximize the discriminative ability. Mutual information is adopted to evaluate the discriminative power of a specific feature set, and an iterative algorithm derives the histogram quantization and the corresponding feature generation. To illustrate the effectiveness of the proposed feature representation, an application of detecting adult images, i.e., image classification between erotic and benign images, is carried out. Two widely used classification techniques, SVM and AdaBoost, are employed as classifiers. Experimental results show the superior performance of our color representation compared with the conventional color histogram in image classification.
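The core idea of the abstract above, merging neighbouring histogram bins while preserving as much class-discriminative information as possible, can be illustrated with a minimal sketch. This is not the paper's exact iterative algorithm (which also removes redundant bins); it is a simplified greedy variant in which, at each step, the pair of adjacent bins whose merge retains the highest mutual information with the class labels is combined. All function names here are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in nats, for discrete arrays x, y."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)          # joint histogram of (x, y) pairs
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def greedy_merge(bins, labels, n_bins, target):
    """Greedily merge adjacent histogram bins down to `target` groups,
    at each step keeping the merge that loses the least I(feature; class).
    Returns an array mapping each original bin index to its merged group."""
    # edges[i] is the original-bin index where merged group i starts
    edges = list(range(n_bins))

    def assign(edges):
        a = np.zeros(n_bins, dtype=int)
        for i, start in enumerate(edges):
            end = edges[i + 1] if i + 1 < len(edges) else n_bins
            a[start:end] = i
        return a

    while len(edges) > target:
        best_mi, best_j = -np.inf, None
        for j in range(1, len(edges)):       # try removing each internal edge
            trial = edges[:j] + edges[j + 1:]
            mi = mutual_information(assign(trial)[bins], labels)
            if mi > best_mi:
                best_mi, best_j = mi, j
        edges = edges[:best_j] + edges[best_j + 1:]
    return assign(edges)
```

For example, if original bins 0–1 occur only in one class and bins 2–3 only in the other, merging from four bins down to two groups keeps the class-separating boundary intact.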
It is well known that visual cues of lip movement contain important speech-relevant information. This paper presents an automatic lipreading system for small-vocabulary speech recognition tasks. Using the lip segmentation and modeling techniques we developed earlier, we obtain a visual feature vector composed of outer and inner mouth features from the lip image sequence for recognition. A spline representation is employed to transform the discrete-time sampled features from the video frames into the continuous domain. The spline coefficients in the same word class are constrained to have similar expressions and are estimated from the training data by the EM algorithm. For the multiple-speaker/speaker-independent recognition task, an adaptive multimodel approach is proposed to handle the variations caused by various talking styles. After building the appropriate word models from the spline coefficients, a maximum likelihood classification approach is taken for the recognition. Lip image sequences of English digits from 0 to 9 have been collected for the recognition test. Two widely used classification methods, HMM and RDA, have been adopted for comparison, and the results demonstrate that the proposed algorithm delivers the best performance among these methods.
Index Terms—Lipreading, visual feature extraction, visual speech recognition.
Speech recognition solely based on visual information such as the lip shape and its movement is referred to as lipreading. This paper presents an automatic lipreading technique for speaker-dependent (SD) and speaker-independent (SI) speech recognition tasks. Since the visual features are derived according to the frame rate of the video sequence, a spline representation is employed to translate the discrete-time sampled visual features into the continuous domain. The spline coefficients in the same word class are constrained to have similar expressions and can be estimated from the training data by the EM algorithm. In addition, an adaptive multi-model approach is proposed to overcome the variation caused by different speaking styles in the speaker-independent recognition task. Experiments are carried out to recognize the ten English digits, and accuracies of 96% for speaker-dependent recognition and 88% for speaker-independent recognition have been achieved, which shows the superiority of our approach compared with the other classifiers investigated.
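The pipeline described in these two abstracts, mapping variable-length discrete-time feature sequences to fixed-length curve coefficients and classifying by maximum likelihood, can be sketched as follows. This is a simplified stand-in, not the papers' method: a least-squares polynomial basis replaces the spline basis, and a per-word Gaussian with a shared spherical variance replaces the EM-estimated constrained spline models. All names are hypothetical.

```python
import numpy as np

def fit_curve_coeffs(features, degree=3):
    """Map a variable-length feature sequence (T x D array) to a fixed-length
    coefficient vector by least-squares fitting each feature dimension over
    normalized time in [0, 1]. Sequences of different lengths T thus become
    directly comparable."""
    T, D = features.shape
    t = np.linspace(0.0, 1.0, T)
    basis = np.vander(t, degree + 1)              # T x (degree+1) design matrix
    coeffs, *_ = np.linalg.lstsq(basis, features, rcond=None)
    return coeffs.ravel()                          # (degree+1) * D values

def train_word_models(sequences_by_word, degree=3):
    """Fit a simple Gaussian (mean vector, scalar variance) over the curve
    coefficients of each word class's training sequences."""
    models = {}
    for word, seqs in sequences_by_word.items():
        C = np.stack([fit_curve_coeffs(s, degree) for s in seqs])
        models[word] = (C.mean(axis=0), C.var(axis=0).mean() + 1e-6)
    return models

def classify(sequence, models, degree=3):
    """Maximum-likelihood decision: the word whose Gaussian model gives the
    highest log-likelihood for the sequence's coefficients wins."""
    c = fit_curve_coeffs(sequence, degree)
    def loglik(mu, var):
        return -0.5 * np.sum((c - mu) ** 2) / var - 0.5 * len(c) * np.log(var)
    return max(models, key=lambda w: loglik(*models[w]))
```

Fitting in a common coefficient space is what lets sequences recorded at different lengths (or frame rates) share one word model, which is the role the spline representation plays in the papers.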