Abstract—This work presents a robust beamformer for binaural hearing aid applications. The worst-case performance optimization method is used to increase the robustness of the minimum variance distortionless response (MVDR) beamformer with respect to errors in its parameters. Computer simulations indicate an improvement in speech quality of up to 1.1 MOS-WPESQ and in acoustic comfort of up to 6.2 dB in terms of signal-to-noise ratio (SNR). The proposed method is especially effective for input SNRs between 5 dB and 15 dB. Keywords—Hearing aid, beamforming, worst-case performance optimization, interaural level difference.
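The abstract above builds on the MVDR beamformer. As a point of reference, here is a minimal sketch of the nominal (non-robust) MVDR weight computation, assuming a known noise covariance matrix `R` and steering vector `d`; the names are illustrative, and the worst-case robust variant described in the abstract additionally models an uncertainty set around `d` and solves a second-order cone program, which is omitted here.

```python
import numpy as np

def mvdr_weights(R, d):
    """Nominal MVDR weights: w = R^{-1} d / (d^H R^{-1} d).

    R : (M, M) noise(-plus-interference) covariance matrix
    d : (M,)  steering vector toward the desired source
    The constraint w^H d = 1 keeps the look direction distortionless
    while the noise output power w^H R w is minimized.
    """
    Rinv_d = np.linalg.solve(R, d)          # R^{-1} d without an explicit inverse
    return Rinv_d / (d.conj() @ Rinv_d)     # normalize to satisfy w^H d = 1

# Toy check: 4-microphone array, white noise (R = I), broadside steering.
R = np.eye(4)
d = np.ones(4)
w = mvdr_weights(R, d)
```

With white noise the weights reduce to uniform averaging (`w = d / M`), and in all cases the distortionless constraint `w^H d = 1` holds by construction.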
This paper compares the performance of three text-to-speech (TTS) models released between June 2021 and January 2022 in order to establish a baseline for Brazilian Portuguese. The models were trained using datasets for Brazilian Portuguese. The experimental setup uses the tts-portuguese dataset to fine-tune the following TTS models: the VITS end-to-end model, and the glowtts and gradtts acoustic models, both paired with the hifi-gan vocoder. Performance metrics are divided into objective and subjective metrics. As subjective metrics, naturalness and intelligibility are measured using the mean opinion score (MOS). Results show that the gradtts+hifi-gan model achieved a naturalness of 4.07 MOS, close to the performance of current commercial models.
This paper documents the development of a special case of multilingual Automatic Speech Recognition model, specifically tailored to serve two languages spoken by the majority of Latin America: Portuguese and Spanish. The bilingual model combines Language Identification and Speech Recognition, is built on the Wav2Vec2.0 architecture, and is trained on several open and private speech datasets. In this model, the feature encoder is trained jointly for all tasks, while a separate context encoder is trained for each task. The model is evaluated separately on two tasks: language identification and speech recognition. The results indicate that the model achieves good performance on speech recognition and average performance on language identification, despite being trained on a small quantity of speech material. The average accuracy of the language identification module on the MLS dataset is 66.75%. The average Word Error Rate in the same scenario is 13.89%, better than the average of 22.58% achieved by the commercial speech recognizer developed by Google.
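The abstract above reports results in Word Error Rate (WER), the standard ASR metric. For readers unfamiliar with it, the following self-contained sketch computes WER as the word-level Levenshtein distance between a reference and a hypothesis transcript, normalized by the reference length; the function name and example sentences are illustrative, not taken from the paper.

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions) / ref words.

    Computed with a classic dynamic-programming edit-distance table
    over word tokens.
    """
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # i deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(r)][len(h)] / len(r)

# One substitution out of three reference words -> WER ~ 0.333
score = wer("o gato preto", "o gato prato")
```

A reported WER of 13.89% thus means roughly one word error per seven reference words, on average.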