John S. Mason scite author profile

A review of speech-based bimodal recognition

Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.

A comparative assessment of three approaches to pixel-level human skin-detection

Brand

Mason

157

This paper assesses the merits of three different approaches to pixel-level human skin detection. The basis for the three approaches has been reported recently in the literature. The first two approaches [1, 2] use simple ratios and colour space transforms respectively, whereas the third is a numerically efficient approach based on a 3-D RGB probability map, first implemented by Rehg [3]. The Bayesian probabilities can be computed only when a large, appropriately labeled database is available. Over 12,000 images from the Compaq skin and non-skin databases [4] are used to quantitatively assess the three approaches. Thresholds are determined empirically to detect 95% of all skin-associated pixels, and assessment is then made in terms of the percentage of non-skin pixels incorrectly accepted. The lowest of these false acceptance rates is found to be about 20%, given by the 3-D probability map.
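
The evaluation protocol described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the bin count, the likelihood-ratio form of the score, and all function names are assumptions made for illustration. It builds 3-D RGB histograms from labelled skin and non-skin pixels, picks the threshold that still detects a target fraction of skin pixels, and reports the resulting false acceptance rate on non-skin pixels.

```python
import math
from collections import Counter

BINS = 32  # assumed number of bins per colour channel (illustrative choice)


def bin_index(pixel, bins=BINS):
    """Map an 8-bit (r, g, b) pixel to its cell in the 3-D histogram."""
    step = 256 // bins
    r, g, b = pixel
    return (r // step, g // step, b // step)


def build_histogram(pixels, bins=BINS):
    """Count labelled pixels per RGB cell."""
    return Counter(bin_index(p, bins) for p in pixels)


def likelihood_ratio(pixel, skin_hist, nonskin_hist, n_skin, n_nonskin, eps=1e-9):
    """Score a pixel as P(rgb | skin) / P(rgb | non-skin); eps avoids division by zero."""
    cell = bin_index(pixel)
    p_skin = skin_hist[cell] / n_skin
    p_nonskin = nonskin_hist[cell] / n_nonskin
    return p_skin / (p_nonskin + eps)


def threshold_for_detection_rate(skin_scores, target=0.95):
    """Largest threshold that still accepts at least `target` of the skin pixels."""
    ordered = sorted(skin_scores, reverse=True)
    k = max(1, math.ceil(target * len(ordered)))
    return ordered[k - 1]


def false_acceptance_rate(nonskin_scores, threshold):
    """Fraction of non-skin pixels whose score reaches the threshold."""
    return sum(s >= threshold for s in nonskin_scores) / len(nonskin_scores)
```

In use, both histograms would be built from a labelled training set, every held-out pixel scored with `likelihood_ratio`, and the threshold fixed on the skin scores before the false acceptance rate is measured on the non-skin scores.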

Comparative Analysis and Fusion of Spatiotemporal Information for Footstep Recognition

Vera-Rodríguez

Mason

Fiérrez

et al. 2013

IEEE Trans. Pattern Anal. Mach. Intell.

Footstep recognition is a relatively new biometric, which aims to discriminate persons using walking characteristics extracted from floor-based sensors. This paper reports for the first time a comparative assessment of the spatio-temporal information contained in the footstep signals for person recognition. Experiments are carried out on the largest footstep database collected to date, with almost 20,000 valid footstep signals and more than 120 persons. Results show very similar performance for both spatial and temporal approaches (5% to 15% EER depending on the experimental setup), and a significant improvement is achieved for their fusion (2.5% to 10% EER). The assessment protocol is focused on the influence of the quantity of data used in the reference models, which serves to simulate conditions of different potential applications such as smart homes or security access scenarios.
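
The equal error rate (EER) quoted in this abstract is the operating point where the false acceptance rate equals the false rejection rate. The sketch below shows one way to estimate it from lists of genuine and impostor scores, together with a simple sum-rule fusion of two systems. The abstract does not say which fusion rule the authors used, so the averaging here is an assumption, as are all function names.

```python
def error_rates(genuine, impostor, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: scan the observed scores as candidate thresholds
    and return the mean of FAR and FRR where the two are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def fuse(scores_a, scores_b):
    """Sum-rule fusion (assumed): average the per-trial scores of two systems.
    Both score lists must cover the same trials and share a comparable range."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]
```

Fusion helps when the two systems make different mistakes: a trial that one system scores poorly can be rescued by the other, which pushes the genuine and impostor score distributions further apart.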

Robust voice activity detection using cepstral features

Haigh

Mason

141

On the Results of the First Mobile Biometry (MOBIO) Face and Speaker Verification Evaluation

Marcel

McCool

Matějka

et al. 2010

This paper evaluates the performance of face and speaker verification techniques in the context of a mobile environment. The mobile environment was chosen because it provides a realistic and challenging test-bed for biometric person verification techniques. For instance, the audio environment is quite noisy and there is limited control over the illumination conditions and the pose of the subject for the video. To conduct this evaluation, a part of a database captured during the "Mobile Biometry" (MOBIO) European Project was used. In total, nine participants in the evaluation submitted a face verification system and five submitted speaker verification systems. The results show that the best performing face and speaker verification systems obtained the same level of performance, with an HTER of 10.9% and 10.6% respectively.
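
The half total error rate (HTER) reported in this abstract is the average of the false acceptance rate and the false rejection rate at a fixed decision threshold. A minimal sketch follows; the function name and the convention that higher scores mean "accept" are assumptions for illustration.

```python
def half_total_error_rate(genuine, impostor, threshold):
    """HTER = (FAR + FRR) / 2 at a threshold chosen beforehand.

    genuine  : scores of trials where the claimed identity is correct
    impostor : scores of trials where the claimed identity is false
    Scores >= threshold are accepted (assumed convention).
    """
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return (far + frr) / 2
```

The threshold is normally fixed on a separate development set and then applied unchanged to the test scores, so the HTER reflects performance at a threshold that was not tuned on the test data.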

State-of-the-Art Performance in Text-Independent Speaker Verification Through Open-Source Software

Fauve

Matrouf

Scheffer

et al. 2007

IEEE Trans. Audio Speech Lang. Process.

Optimisation of neural models for speaker identification

Oglesby

Mason

A quantitative assessment of the relative speaker discriminating properties of phonemes

Eatock

Mason
