Zied Mnasri scite author profile

Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic

This paper investigates the use of hidden Markov models (HMMs) for Modern Standard Arabic speech synthesis. HMM-based speech synthesis systems require each speech unit to be described by a set of contextual features that specify phonetic, phonological and linguistic aspects. To apply this method to Arabic, the particularities of the language were studied to extract suitable contextual features. Two phenomena are highlighted: vowel quantity and gemination. This work focuses on how to model geminated consonants (resp. long vowels): either as fully-fledged phonemes, or as the same phonemes as their simple (resp. short) counterparts but with a different duration. Four modelling approaches are proposed for this purpose. Subjective and objective evaluations show no important difference between modelling geminated consonants (resp. long vowels) with units distinct from those of simple consonants (resp. short vowels) and merging the units, as long as gemination and vowel quantity information is included in the feature set.

Anomalous sound event detection: A survey of machine learning based methods and applications

Mnasri

Rovetta

Masulli

2021

Multimed Tools Appl

Duration modeling using DNN for Arabic speech synthesis

Zangar

Mnasri

Colotte

et al. 2018

Duration modeling is a key task for every parametric speech synthesis system. Although such parametric systems have been adapted to many languages, no special attention has been paid to explicitly handling Arabic speech characteristics. In Arabic, phoneme duration has a distinctive role because of consonant gemination and vowel quantity, so precise modeling of sound durations is critical. In this paper we compare several approaches to modeling phoneme durations (including duration modeling by the HTS and MERLIN toolkits), and we propose a new approach that relies on a set of models, each one being optimal for a given phoneme class (e.g., simple consonants, geminated consonants, short vowels, and long vowels). An objective evaluation carried out on a set of test sentences shows that the proposed approach leads to more accurate modeling of phoneme durations.

Audio surveillance of roads using deep learning and autoencoder-based sample weight initialization

Mnasri

Rovetta

Masulli

2020

Duration modelling and evaluation for Arabic statistical parametric speech synthesis

Zangar

Mnasri

Colotte

et al. 2020

Multimed Tools Appl

Sound duration is responsible for rhythm and speech rate. Furthermore, in some languages phoneme length is an important phonetic and prosodic factor. In Arabic, for example, gemination and vowel quantity are two important characteristics of the language. Therefore, accurate duration modelling is crucial for Arabic TTS systems. This paper aims to improve the modelling of phone duration for Arabic statistical parametric speech synthesis using DNN-based models. In recent years, DNNs have frequently been used for parametric speech synthesis instead of HMMs, so several variants of DNN-based duration models for Arabic are investigated. The novelty consists in training a specific DNN model for each class of sounds, i.e. short vowels, long vowels, simple consonants and geminated consonants. This choice is motivated by the improvement already achieved in the quality of Arabic parametric speech synthesis by introducing two features specific to Arabic, gemination and vowel quantity, into the standard HTS feature set. Both objective and subjective evaluations show that using a specific model for each class of sounds leads to more accurate modelling of phone duration in Arabic parametric speech synthesis, outperforming state-of-the-art duration modelling systems.

Detection of Hazardous Road Events From Audio Streams: An Ensemble Outlier Detection Approach

Rovetta

Mnasri

Masulli

2020

DNN-based grapheme-to-phoneme conversion for Arabic text-to-speech synthesis

Ali

Mnasri

Lachiri

2020

Int J Speech Technol

High quality Arabic text-to-speech synthesis using unit selection

Abdelmalek

Mnasri

2016
