Our goal is to automatically identify faces in TV content without a pre-defined dictionary of identities. Most methods are based on identity detection (from OCR and ASR) and require a propagation strategy based on visual clustering. In TV content, people appear with many variations, making the clustering very difficult. In this case, speaker identification can provide a reliable link for identifying faces. In this work, we propose to combine reliable unsupervised face and speaker identification systems through talking-face detection in order to improve face identification results. First, OCR and ASR results are combined to extract identities locally. Then, reliable visual associations are used to propagate those identities locally. The reliably identified faces are used as unsupervised models to identify similar faces. Finally, speaker identities are propagated to faces when lip activity is detected. Experiments performed on the REPERE database show an improvement in recall of +5% compared to the baseline, without degrading precision.
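A minimal sketch of the final propagation step described above: assigning the current speaker's identity to an on-screen face when lip activity is detected. The class names, the lip-activity score, and the threshold are hypothetical illustrations, not the authors' actual implementation.

```python
# Hypothetical sketch: propagate the speaker identity to talking faces.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FaceTrack:
    track_id: int
    lip_activity: float          # assumed normalized lip-motion energy in [0, 1]
    identity: Optional[str] = None

def propagate_speaker_identity(faces, speaker_name, lip_threshold=0.5):
    """Label unidentified faces with the current speaker's name when their
    lip activity exceeds a threshold (a proxy for 'talking face')."""
    for face in faces:
        if face.identity is None and face.lip_activity >= lip_threshold:
            face.identity = speaker_name
    return faces

# Example: two co-occurring face tracks while "Jane Doe" is speaking.
tracks = [FaceTrack(0, lip_activity=0.82), FaceTrack(1, lip_activity=0.10)]
propagate_speaker_identity(tracks, "Jane Doe")
print([(t.track_id, t.identity) for t in tracks])
# [(0, 'Jane Doe'), (1, None)]
```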
Person role recognition in video broadcasts consists in classifying people into roles such as anchor, journalist, or guest. Existing approaches mostly consider one modality, either audio (speaker role recognition) or image (shot role recognition), firstly because of the lack of synchrony between the two modalities, and secondly because of the lack of a video corpus annotated in both modalities. Deep Neural Network (DNN) approaches offer the ability to learn feature representations (embeddings) and classification functions simultaneously. This paper presents a multimodal fusion of audio, text, and image embedding spaces for speaker role recognition in asynchronous data. Monomodal embeddings are trained on exogenous data and fine-tuned with a DNN on a 70-hour French broadcast corpus for the target task. Experiments on the REPERE corpus show the benefit of embedding-level fusion compared to the monomodal embedding systems and to the standard late fusion method.
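A minimal sketch, assuming a PyTorch implementation, of the embedding-level fusion idea: pre-computed monomodal audio, text, and image embeddings are concatenated and fed to a small DNN that predicts the role. The embedding dimensions, layer sizes, and number of roles are illustrative placeholders, not the paper's exact architecture.

```python
# Hypothetical embedding-level fusion network for role recognition.
import torch
import torch.nn as nn

class RoleFusionNet(nn.Module):
    def __init__(self, audio_dim=128, text_dim=300, image_dim=512, n_roles=5):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + text_dim + image_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_roles),   # anchor, journalist, guest, ...
        )

    def forward(self, audio_emb, text_emb, image_emb):
        # Fusion happens in embedding space, before classification,
        # rather than by combining per-modality decisions (late fusion).
        fused = torch.cat([audio_emb, text_emb, image_emb], dim=-1)
        return self.classifier(fused)

# Example forward pass on a batch of 4 pre-computed monomodal embeddings.
net = RoleFusionNet()
logits = net(torch.randn(4, 128), torch.randn(4, 300), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 5])
```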
Audiovisual identity verification exploits both image and audio information to improve the performance of the identification system. Unfortunately, both image and audio systems are sensitive to signal quality. In this paper, we propose a method to combine classifier outputs based on both image and audio quality measures. We define classes of signal degradation within which we estimate the fusion weights and normalization parameters. Experiments on the BANCA database show that quality-based fusion improves verification performance by 25% compared to the baseline fusion method.
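A minimal sketch of quality-conditioned score fusion in the spirit of the abstract: each trial is bucketed into a degradation class from its quality measures, and class-specific normalization parameters and fusion weights are applied. The class boundaries, z-norm statistics, and weights below are hypothetical placeholders that would in practice be estimated on development data.

```python
# Hypothetical quality-conditioned fusion of face and speaker scores.
def degradation_class(image_quality, audio_snr_db):
    """Bucket a trial into a coarse degradation class (assumed thresholds)."""
    img = "good_img" if image_quality >= 0.5 else "bad_img"
    aud = "good_aud" if audio_snr_db >= 15.0 else "bad_aud"
    return (img, aud)

# Per-class (mean, std) z-normalization stats and fusion weight w for the
# face score; the speaker score gets weight (1 - w). Values are illustrative.
PARAMS = {
    ("good_img", "good_aud"): {"face": (0.0, 1.0), "spk": (0.0, 1.0), "w": 0.5},
    ("good_img", "bad_aud"):  {"face": (0.1, 1.2), "spk": (-0.3, 1.5), "w": 0.8},
    ("bad_img", "good_aud"):  {"face": (-0.4, 1.6), "spk": (0.1, 1.1), "w": 0.2},
    ("bad_img", "bad_aud"):   {"face": (-0.2, 1.4), "spk": (-0.2, 1.4), "w": 0.5},
}

def fuse(face_score, spk_score, image_quality, audio_snr_db):
    p = PARAMS[degradation_class(image_quality, audio_snr_db)]
    face_z = (face_score - p["face"][0]) / p["face"][1]
    spk_z = (spk_score - p["spk"][0]) / p["spk"][1]
    return p["w"] * face_z + (1.0 - p["w"]) * spk_z

# Noisy audio, clean image: the fused score leans on the face classifier.
print(fuse(face_score=1.2, spk_score=0.3, image_quality=0.8, audio_snr_db=8.0))
```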