Cross-lingual Speaker Verification: Evaluation on X-Vector Method

Mandalapu, Hareesh; Elbo, Thomas Møller; Ramachandra, Raghavendra; Busch, Christoph

doi:10.1007/978-3-030-71711-7_18

“…The dependency of speaker recognition on the speaker's language has been observed in the recent works [82]. The mismatch of languages of speech samples in training, enrolling, and testing is a challenging problem in AV biometrics.…”

Section: Multi-lingual Speaker Recognition (mentioning)

confidence: 98%

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

Mandalapu, N, Ramachandra et al. 2021

Self Cite

Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meantime, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the system's performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper made a comprehensive survey on existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.

INDEX TERMS: Biometrics, audio-visual person recognition, presentation attack detection.

“…The dependency of speaker recognition on the speaker's language has been observed in the recent works [83]. The mismatch of languages of speech samples in training, enrolling, and testing is a challenging problem in AV biometrics.…”

Section: Multi-lingual Speaker Recognition (mentioning)

confidence: 98%

Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

Mandalapu, Reddy, Ramachandra et al. 2021

Preprint

Self Cite

Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meantime, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the system's performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper made a comprehensive survey on existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.

“…The degradation of biometric recognition due to language mismatch is presented in some previous works [21], [16], [17]. Our dataset comprises of the same subjects speaking three different languages, therefore, providing scope for inter-language speaker recognition evaluation.…”

Section: Inter-language Speaker Recognition (mentioning)

confidence: 99%

Multilingual Audio-Visual Smartphone Dataset And Evaluation

Mandalapu, N, Ramachandra et al. 2021

Preprint

Self Cite

Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to the usability and also it will be challenging to spoof because of multi-modal nature. In this work, we present an audiovisual smartphone dataset captured in five different recent smartphones. This new dataset contains 103 subjects captured in three different sessions considering the different real-world scenarios. Three different languages are acquired in this dataset to include the problem of language dependency of the speaker recognition systems. These unique characteristics of this dataset will pave the way to implement novel state-of-the-art unimodal or audio-visual speaker recognition systems. We also report the performance of the bench-marked biometric verification systems on our dataset. The robustness of biometric algorithms is evaluated towards multiple dependencies like signal noise, device, language and presentation attacks like replay and synthesized signals with extensive experiments. The obtained results raised many concerns about the generalization properties of state-of-the-art biometrics methods in smartphones.

Cross-lingual Speaker Verification: Evaluation on X-Vector Method

Cited by 6 publications

References 16 publications
