Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work at the short-utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments, which are then averaged to obtain an utterance-level speaker representation. In this work we propose the use of an attention mechanism to obtain a discriminative speaker embedding from variable-length speech utterances. Our system is based on a Convolutional Neural Network (CNN) that encodes short-term speaker features from the spectrogram and a self multi-head attention model that maps these representations into a long-term speaker embedding. The attention model we propose produces multiple alignments from different subsegments of the CNN-encoded states over the sequence. This mechanism therefore works as a pooling layer that selects the most discriminative features over the sequence to obtain an utterance-level representation. We have tested this approach on the speaker verification task of the VoxCeleb1 dataset. The results show that self multi-head attention outperforms both temporal and statistical pooling methods, with an 18% relative improvement in EER, and a 58% relative improvement in EER compared to an i-vector+PLDA baseline.
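The pooling idea described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a simple per-head dot-product scoring between each frame's sub-vector and a learnable head vector, and all names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_mha_pooling(H, W):
    """Self multi-head attention pooling (illustrative sketch).

    H: (T, D) sequence of CNN-encoded frame-level features.
    W: (heads, D // heads) learnable attention parameters, one
       vector per head (hypothetical parameterization).
    Returns a fixed-length (D,) utterance-level embedding.
    """
    heads, d_head = W.shape
    T, D = H.shape
    assert D == heads * d_head
    # Split each frame vector into per-head sub-vectors: (T, heads, d_head)
    Hh = H.reshape(T, heads, d_head)
    # One scalar alignment score per frame and head: (T, heads)
    scores = np.einsum('thd,hd->th', Hh, W)
    # Normalize over time so each head attends over the whole sequence
    alpha = softmax(scores, axis=0)
    # Weighted sum over time per head, then concatenate the heads
    context = np.einsum('th,thd->hd', alpha, Hh)
    return context.reshape(D)

# Toy usage: 50 frames of 64-dim features pooled with 4 heads
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 64))
W = rng.standard_normal((4, 16))
emb = self_mha_pooling(H, W)
```

Regardless of the number of frames T, the output has a fixed dimension, which is what allows variable-length utterances to be compared directly.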
Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as the RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. A global model, referred to as the Universal RBM (URBM), is trained taking advantage of the unsupervised learning capabilities of RBMs. This URBM is then adapted to the data of each speaker in the development, enrolment and evaluation datasets. The connection weights of the adapted RBMs are concatenated and subjected to a whitening and dimension reduction stage to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve a 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. Score fusion with i-vectors attains more than a 24% relative improvement. The interest of this fusion result lies in the fact that both vectors are produced in an unsupervised fashion and can be used instead of the i-vector/PLDA approach when no data labels are available. Results obtained with the RBM-vector/PLDA framework are comparable to those of i-vector/PLDA, and their score fusion achieves a 14% relative improvement over i-vector/PLDA.
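The final two stages of the pipeline above, whitening with dimension reduction over the concatenated adapted-RBM weights and cosine scoring of the resulting vectors, can be sketched as below. This is an illustrative NumPy sketch assuming PCA-based whitening; the exact whitening method, function names, and dimensions are assumptions, not the paper's specification.

```python
import numpy as np

def whiten_reduce(X, dim):
    """PCA whitening with dimension reduction (illustrative sketch).

    X: (n_speakers, n_params) matrix whose rows are the concatenated
    connection weights of each speaker-adapted RBM. Returns the
    projected rows, which play the role of speaker vectors here.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD of the centered data gives the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Keep the top `dim` directions and rescale them to unit variance
    P = Vt[:dim].T * (np.sqrt(len(X) - 1) / S[:dim])
    return Xc @ P

def cosine_score(a, b):
    """Cosine similarity between two speaker vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy usage: 20 "speakers" with 100 concatenated weights each, reduced to 5 dims
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))
vectors = whiten_reduce(X, dim=5)
score = cosine_score(vectors[0], vectors[1])
```

After whitening, the reduced dimensions are decorrelated with unit variance, so plain cosine scoring is not dominated by a few high-variance weight directions.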
In this paper, we propose to discriminatively model target and impostor spectral features using Deep Belief Networks (DBNs) for speaker recognition. At the feature level, the number of impostor samples is considerably larger than in previous works based on i-vectors, so those i-vector-based impostor selection algorithms are not computationally practical. Moreover, the number of samples differs from one target speaker to another, which makes the training process more difficult. In this work, we take advantage of DBN unsupervised learning to train a global model, which will be referred to as the Universal DBN (UDBN), and then adapt this UDBN to the data of each target speaker. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that the proposed architecture achieves more than an 8% relative improvement in comparison to a conventional Multilayer Perceptron (MLP).
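The adaptation step, further training a globally pre-trained layer on one target speaker's data, is commonly done with contrastive divergence in RBM/DBN models. The sketch below shows one CD-1 update in NumPy purely as an illustration of that kind of update; it is not the paper's training procedure, and the binary-unit assumption, learning rate, and function names are all hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_adapt(W, bv, bh, v0, lr=0.01):
    """One contrastive-divergence (CD-1) update of a binary RBM layer
    on a batch of a target speaker's feature vectors (illustrative).

    W: (n_visible, n_hidden) weights; bv, bh: visible/hidden biases;
    v0: (batch, n_visible) speaker data. Returns updated parameters.
    """
    # Positive phase: hidden activations driven by the speaker data
    h0 = sigmoid(v0 @ W + bh)
    # One Gibbs step: reconstruct the visibles, then the hiddens again
    v1 = sigmoid(h0 @ W.T + bv)
    h1 = sigmoid(v1 @ W + bh)
    # Gradient approximation: data statistics minus model statistics
    W = W + lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    bv = bv + lr * (v0 - v1).mean(axis=0)
    bh = bh + lr * (h0 - h1).mean(axis=0)
    return W, bv, bh

# Toy usage: adapt a 20x8 layer on a batch of 16 binarized frames
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((20, 8))
bv, bh = np.zeros(20), np.zeros(8)
v0 = (rng.random((16, 20)) > 0.5).astype(float)
W, bv, bh = cd1_adapt(W, bv, bh, v0)
```

Starting each speaker's model from the shared global parameters rather than from scratch is what lets speakers with few samples still be modeled reasonably.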
Most state-of-the-art Deep Learning systems for text-independent speaker verification are based on speaker embedding extractors. These architectures typically combine a feature-extractor front-end with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention (MHA) pooling, which extends our previous approach based on Self MHA. An additional self-attention layer is added to the pooling layer to summarize the context vectors produced by MHA into a single speaker representation. This method enhances the pooling mechanism by weighting the information captured by each head, resulting in more discriminative speaker embeddings. We have evaluated our approach on the VoxCeleb2 dataset. Our results show 6.09% and 5.23% relative improvements in terms of EER compared to Self Attention pooling and Self MHA, respectively. These results indicate that Double MHA is an effective approach to efficiently select the most relevant features captured by CNN-based front-ends from the speech signal.
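The two stacked attention layers described above can be sketched as follows: the first layer attends over time within each head, and the second attends over the resulting head context vectors. Again a minimal NumPy illustration with an assumed dot-product parameterization, not the authors' implementation; all names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def double_mha_pooling(H, W_heads, u):
    """Double multi-head attention pooling (illustrative sketch).

    H: (T, D) frame-level encoder outputs.
    W_heads: (heads, D // heads) per-head attention vectors (first layer).
    u: (D // heads,) attention vector of the second, head-level layer.
    Returns a (D // heads,) utterance-level embedding.
    """
    heads, d = W_heads.shape
    T, D = H.shape
    Hh = H.reshape(T, heads, d)
    # First layer: self MHA over time, yielding one context vector per head
    alpha = softmax(np.einsum('thd,hd->th', Hh, W_heads), axis=0)
    context = np.einsum('th,thd->hd', alpha, Hh)   # (heads, d)
    # Second layer: self attention over the head context vectors,
    # weighting how much each head contributes to the final embedding
    beta = softmax(context @ u)                    # (heads,)
    return beta @ context                          # (d,)

# Toy usage: 50 frames, 64-dim features, 4 heads of size 16
rng = np.random.default_rng(0)
H = rng.standard_normal((50, 64))
W_heads = rng.standard_normal((4, 16))
u = rng.standard_normal(16)
emb = double_mha_pooling(H, W_heads, u)
```

The second softmax is what lets the model down-weight heads whose context vectors carry little speaker-discriminative information for a given utterance.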
Machine learning (ML)-assisted solutions for quality of transmission (QoT) estimation or classification have received significant attention in recent years. However, due to the unavailability of large and well-structured datasets, individual research groups need to create and use their own datasets to validate their proposed solutions. The reported results, obtained on different datasets, are therefore difficult to reproduce and hardly comparable. Beyond this limitation, the lack of a common technique for dataset explainability shared across research groups makes it even harder to validate the developed ML-assisted solutions across different papers. In this work, we present a publicly available dataset collection to open the problem of data-driven QoT estimation to the ML community. The dataset collection allows the various solutions presented by different research groups to be compared. Furthermore, we present techniques to visualize and evaluate datasets for QoT estimation. The presented visualizations can also deliver deep insight into the error analysis of ML models. We apply these new methods to evaluate an artificial neural network on different datasets. The results show the relevance of the presented visualizations for comparing different approaches and different datasets. The proposed methods enable the comparison and validation of different ML-based solutions and published datasets.