We consider the task of audio source localization using a microphone array on a mobile robot. Active localization algorithms proposed in the literature can estimate the 3D position of a source by fusing measurements taken at different poses of the robot. However, the robot movements are typically fixed, or they follow heuristic strategies, such as turning the head and moving towards the source, which may be suboptimal. In this paper, we propose to control the robot movements so as to localize the source as quickly as possible. We represent the belief about the source position on a discrete grid, and we introduce a dynamic programming algorithm that finds the optimal robot motion minimizing the entropy of the grid. We report initial results in a real environment.
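The core quantities in this approach, a grid belief, its Shannon entropy, and the Bayesian fusion of a new measurement, can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the grid size and the peaked likelihood standing in for a microphone-array measurement are assumptions.

```python
import numpy as np

def entropy(belief):
    """Shannon entropy (in nats) of a discrete belief grid."""
    p = belief[belief > 0]
    return float(-np.sum(p * np.log(p)))

def bayes_update(belief, likelihood):
    """Fuse one measurement likelihood into the grid belief (Bayes' rule)."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# A uniform belief over a 4x4 grid has maximal entropy log(16).
belief = np.full((4, 4), 1 / 16)
print(entropy(belief))  # close to np.log(16)

# A likelihood peaked at one cell (a hypothetical informative
# measurement) concentrates the belief and lowers its entropy.
likelihood = np.ones((4, 4))
likelihood[1, 2] = 10.0
updated = bayes_update(belief, likelihood)
print(entropy(updated) < entropy(belief))  # True
```

An entropy-minimizing controller would score each candidate robot motion by the expected entropy of the updated grid and pick the motion with the lowest score.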
This work deals with a case of L1-L2 interference in language learning: Germans learning French as a second language frequently produce unvoiced fricatives in word-final position instead of the expected voiced fricatives. We investigated the production of French fricatives by 16 non-native speakers (8 beginners and 8 advanced learners) and 8 native speakers, and designed auditory feedback to help learners produce the correct voicing feature. The productions of all speakers were categorized as either voiced or unvoiced by experts. The same fricatives were also evaluated by non-experts in a perception experiment targeting VCs. We compare the expert and non-expert ratings with a feature-based analysis: we measured the ratio of locally unvoiced frames in the consonantal segment, as well as the ratio of consonantal duration to V1 duration. Acoustic cues from neighboring sounds and pitch-based features play a significant role in the voicing judgment. As expected, we found that beginners have more difficulty producing voiced fricatives than advanced learners. Production also becomes easier for the learners, especially the beginners, when they practice repetition after a native speaker. We use these findings to design and develop feedback based on the speech analysis/synthesis technique TD-PSOLA, using the learner's own voice.
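The two measures named above are simple ratios once frame-level voicing decisions and segment durations are available. The sketch below assumes per-frame boolean voicing flags (e.g., from a pitch tracker) and durations in frames; the function names and inputs are illustrative, not the paper's implementation.

```python
import numpy as np

def unvoiced_ratio(voicing_flags):
    """Fraction of locally unvoiced frames within the consonantal segment.

    voicing_flags: per-frame booleans, True where the frame is voiced.
    """
    flags = np.asarray(voicing_flags, dtype=bool)
    return float((~flags).sum() / flags.size)

def c_to_v1_ratio(c_duration, v1_duration):
    """Ratio of consonantal duration to the preceding vowel (V1) duration."""
    return c_duration / v1_duration

# Hypothetical word-final fricative: 6 of 8 frames unvoiced.
flags = [True, True, False, False, False, False, False, False]
print(unvoiced_ratio(flags))      # 0.75
print(c_to_v1_ratio(12.0, 20.0))  # 0.6
```

A high unvoiced ratio combined with a long consonant relative to V1 would flag a production as likely devoiced, which is the pattern the feedback targets.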
The fundamental frequency is one of the main prosodic parameters, and many algorithms have been developed for estimating the fundamental frequency of speech signals. Most of them provide good results on good-quality speech, but their performance degrades on noisy signals. Moreover, although some provide a probability for the voicing decision, none of them indicate how reliable the estimated fundamental frequency value is. In this paper, we investigate the computation of a confidence (or reliability) measure on the estimated fundamental frequency values. We propose a neural network based approach for computing the posterior probability that the estimated fundamental frequency is correct. Experiments are conducted on the PTDB-TUG pitch-tracking database, using three fundamental frequency estimation algorithms.
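The final stage of such a confidence estimator is a binary classifier whose output is squashed into a posterior probability. The sketch below shows only that output stage as a logistic unit over a feature vector; the actual network architecture and feature set used in the paper are not specified here, and the weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def f0_confidence(features, weights, bias):
    """Posterior probability that the estimated F0 value is correct.

    features: per-frame feature vector (hypothetical, e.g., derived from
    the signal and the F0 estimator's internal scores).
    weights, bias: learned parameters of the output unit.
    """
    return float(sigmoid(features @ weights + bias))

# With zero weights and bias, the classifier is maximally uncertain.
print(f0_confidence(np.zeros(3), np.zeros(3), 0.0))  # 0.5
```

In practice the network would be trained against ground-truth pitch (e.g., laryngograph-based references such as those in PTDB-TUG), labeling each frame's estimate as correct or not.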
To have more control over Text-to-Speech (TTS) synthesis and to improve expressivity, it is necessary to disentangle the prosodic information carried by the speaker's voice identity from that belonging to linguistic properties. In this paper, we propose to analyze how information related to speaker voice identity affects a Deep Neural Network (DNN) based multi-speaker speech synthesis model. To do so, we feed the network with a vector encoding speaker information in addition to a set of basic linguistic features. We then compare three main speaker coding configurations: a) a simple one-hot vector encoding the speaker's gender and identifier; b) an embedding vector extracted from a pre-trained speaker recognition model; c) a prosodic vector summarizing information such as melody, intensity, and duration. To measure the impact of the input feature vector, we investigate the representation of the latent space at the output of the first layer of the network, giving an overview of the data representation and model behavior. Furthermore, we conducted a subjective assessment to validate the results. The results show that the model captures the prosodic identity of the speaker, and therefore allows the user to control synthesis more precisely.
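Configuration (a) and the way a speaker vector is combined with linguistic features can be sketched as below. The exact encoding (e.g., appending a gender bit to the one-hot identifier) is an assumption for illustration, not the paper's specification.

```python
import numpy as np

def speaker_one_hot(speaker_id, n_speakers, is_female):
    """Configuration (a): one-hot speaker identifier plus a gender bit.

    The gender-bit layout is a hypothetical choice for this sketch.
    """
    vec = np.zeros(n_speakers + 1)
    vec[speaker_id] = 1.0
    vec[-1] = 1.0 if is_female else 0.0
    return vec

def network_input(linguistic_features, speaker_vec):
    """Concatenate the speaker coding with the linguistic features
    to form the input vector fed to the DNN."""
    return np.concatenate([linguistic_features, speaker_vec])

spk = speaker_one_hot(speaker_id=2, n_speakers=5, is_female=True)
x = network_input(np.random.rand(20), spk)
print(x.shape)  # (26,)
```

Configurations (b) and (c) would simply substitute a different `speaker_vec` (a recognition embedding or a prosodic summary vector) while keeping the concatenation unchanged, which is what makes the three codings directly comparable.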