We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that, among the discussed models, there is one approach that combines simplicity of implementation, low-computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54 percent on the MNIST benchmark is achieved, as well as the lowest reported error rate, specifically 12.6 percent, in the 2005 international ImageCLEF evaluation of medical image categorization.
We evaluate different two-dimensional non-linear deformation models for handwritten character recognition. Starting from a true two-dimensional model, we derive pseudo-two-dimensional and zero-order deformation models. Experiments show that it is most important to include suitable representations of the local image context of each pixel to increase performance. With these methods, we achieve very competitive results across five different tasks, in particular 0.5% error rate on the MNIST task.
Audio segmentation is an essential preprocessing step in several audio processing applications with a significant impact e.g. on speech recognition performance. We introduce a novel framework which combines the advantages of different well known segmentation methods. An automatically estimated log-linear segment model is used to determine the segmentation of an audio stream in a holistic way by a maximum a posteriori decoding strategy, instead of classifying change points locally. A comparison to other segmentation techniques in terms of speech recognition performance is presented, showing a promising segmentation quality of our approach.
This paper focuses on confidence scores for use in acoustic model adaptation. Frame-based confidence estimates are used in linear transform (CMLLR and MLLR) and MAP adaptation. We show that adaptation approaches with a limited number of free parameters such as linear transform-based approaches are robust in the face of frame labeling errors whereas adaptation approaches with a large number of free parameters such as MAP are sensitive to the quality of the supervision and hence benefit most from use of confidences. Different approaches for using confidence information in adaptation are investigated. This analysis shows that a thresholding approach is effective in that it improves the frame labeling accuracy with little detrimental effect on frame recall. Experimental results show an absolute WER reduction of 2.1% over a CMLLR adapted system on a video transcription task.
We present a writer adaptive training and writer clustering approach for an HMM based Arabic handwriting recognition system to handle different handwriting styles and their variations. Additionally, a writing variant model refinement for specific writing variants is proposed.Current approaches try to compensate the impact of different writing styles during preprocessing and normalization steps.Writer adaptive training with a CMLLR based feature adaptation is used to train writer dependent models. An unsupervised writer clustering with Bayesian information criterion based stopping condition for a CMLLR based feature adaptation during a two-pass decoding process is used to cluster different handwriting styles of unknown test writers.The proposed methods are evaluated on the IFN/ENIT Arabic handwriting database.
We propose the application of two-dimensional distortion models for comparisons of medical images in a distance-based classifier. We extend a simple zero-order distortion model by using local context within the compared image parts. Vertical and horizontal image gradients as well as small sub images are used as local context. Taking into account dependencies within the displacement field of the distortion by using a pseudo two-dimensional hidden Markov model with additional distortion possibilities further improves the error rate. Using the methods presented in this work, the previous best error rate of 8.0% on the used medical data could be considerably reduced by about one third to 5.3%.
This paper describes the RWTH speech recognition system for Arabic. Several design aspects of the system, including cross-adaptation, multiple system design and combination, are analyzed. We summarize the semi-automatic lexicon generation for Arabic using a statistical approach to grapheme-tophoneme conversion and pronunciation statistics. Furthermore, a novel ASR-based audio segmentation algorithm is presented. Finally, we discuss practical approaches for parallelized acoustic training and memory efficient lattice rescoring. Systematic results are reported on recent GALE evaluation corpora.
This paper describes the rapid development of a Polish language speech recognition system. The system development was performed without access to any transcribed acoustic training data. This was achieved through the combined use of cross-language bootstrapping and confidence based unsupervised acoustic model training. A Spanish acoustic model was ported to Polish, through the use of a manually constructed phoneme mapping. This initial model was refined through iterative recognition and retraining of the untranscribed audio data.The system was trained and evaluated on recordings from the European Parliament, and included several state-of-the-art speech recognition techniques in addition to the use of unsupervised model training. Confidence based speaker adaptive training using features space transform adaptation, as well as vocal tract length normalization and maximum likelihood linear regression, was used to refine the acoustic model. Through the combination of the different techniques, good performance was achieved on the domain of parliamentary speeches.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.