Abstract: The paper proposes a robust approach to automatic segmentation of the leukocyte's nucleus from microscopic blood smear images, under both normal and noisy conditions, by employing a new exponential intuitionistic fuzzy divergence based thresholding technique. The algorithm searches for the final threshold by minimizing the divergence between the actual image and the ideally thresholded image. A new divergence formula based on exponential intuitionistic fuzzy entropy has been proposed. Further, to increase its noise-handling capacity, a neighborhood-based membership function for the image pixels has been designed. The proposed scheme has been applied to 110 normal and 54 leukemia-affected (chronic myelogenous leukemia) blood samples. The nucleus segmentation results have been validated by three expert haematologists. The algorithm achieves an average segmentation accuracy of 98.52% in a noise-free environment and also outperforms the competitor algorithms on several other metrics. Under noisy conditions, the proposed scheme with the neighborhood-based membership function outperforms the competitor algorithms in terms of segmentation accuracy, achieving 93.90% and 94.93% for Speckle and Gaussian noise, respectively. The average area under the ROC curves comes out to be 0.9514 in noisy conditions, which demonstrates the robustness of the proposed algorithm.
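The threshold search described above can be sketched as follows. The abstract does not give the paper's actual divergence formula or membership function, so this is only a minimal illustration: it uses a Chaira-style exponential fuzzy divergence as a stand-in for the proposed exponential intuitionistic fuzzy divergence, and a hypothetical class-mean membership function. The divergence is measured against the ideally thresholded image, whose pixels belong fully to their class (membership 1), and the threshold minimizing it is selected.

```python
import numpy as np

def membership(image, t, L=255.0):
    """Hypothetical membership: closeness of each pixel to its class mean.

    Pixels >= t form the foreground, the rest the background; a pixel's
    membership is 1 minus its normalized distance from its class mean.
    """
    fg = image >= t
    bg = ~fg
    m_fg = image[fg].mean() if fg.any() else float(t)
    m_bg = image[bg].mean() if bg.any() else float(t)
    mu = np.empty(image.shape, dtype=float)
    mu[fg] = 1.0 - np.abs(image[fg] - m_fg) / L
    mu[bg] = 1.0 - np.abs(image[bg] - m_bg) / L
    return mu

def divergence_from_ideal(mu):
    """Chaira-style exponential fuzzy divergence (illustrative form) between
    the thresholded image and the ideal image with membership 1 everywhere.

    Per-pixel term is 0 when mu == 1 and positive otherwise, so the total
    is minimized by a threshold that makes class memberships crisp.
    """
    return np.sum(2.0 - (2.0 - mu) * np.exp(mu - 1.0) - mu * np.exp(1.0 - mu))

def best_threshold(image):
    """Exhaustively search gray levels for the divergence-minimizing threshold."""
    scores = {t: divergence_from_ideal(membership(image, t))
              for t in range(1, 255)}
    return min(scores, key=scores.get)
```

For a clearly bimodal image, the search lands between the two modes, since any threshold that separates them makes both class memberships exactly 1 and drives the divergence to zero.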
Learning speaker-specific features is vital in many applications such as speaker recognition, diarization, and speech recognition. This paper presents a novel approach, termed Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that may even contain non-speech events and multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker; thus a common representation encoding the commonalities of the two segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce "speaker embeddings" by learning to separate 'same' vs 'different' speaker pairs generated from unlabeled audio streams. Two sets of experiments in different scenarios evaluate the strength of NPC embeddings and compare them with state-of-the-art in-domain supervised methods. First, two speaker identification experiments with different context lengths are performed in a scenario with comparatively limited within-speaker channel variability. NPC embeddings perform best in the short-duration experiment and provide information complementary to i-vectors in the full-utterance experiment. Second, a large-scale speaker verification task with a wide range of within-speaker channel variability is adopted as an upper-bound experiment, where comparisons are drawn with in-domain supervised methods.
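The pair-generation step implied by the short-term active-speaker stationarity hypothesis can be sketched as below. This is a hypothetical helper, not the paper's implementation: 'same' pairs (label 1) are two temporally close segments drawn from one unlabeled stream, and 'different' pairs (label 0) are segments drawn from two different streams; segment length and the maximum temporal gap are assumed parameters.

```python
import numpy as np

def make_pairs(streams, seg_len, max_gap, n_pairs, rng=None):
    """Generate 'same'/'different' training pairs from unlabeled audio streams.

    streams : list of 1-D sample arrays (each one audio stream)
    seg_len : segment length in samples
    max_gap : maximum offset (samples) between the two segments of a 'same' pair
    Returns (pairs, labels): list of (seg_a, seg_b) tuples and 1/0 labels.
    """
    rng = rng or np.random.default_rng(0)
    pairs, labels = [], []
    for _ in range(n_pairs):
        if rng.random() < 0.5:
            # 'same' pair: two temporally close segments from one stream,
            # assumed (by the stationarity hypothesis) to share a speaker
            s = streams[rng.integers(len(streams))]
            start = rng.integers(0, len(s) - seg_len - max_gap)
            gap = rng.integers(1, max_gap)
            pairs.append((s[start:start + seg_len],
                          s[start + gap:start + gap + seg_len]))
            labels.append(1)
        else:
            # 'different' pair: one segment from each of two distinct streams
            i, j = rng.choice(len(streams), size=2, replace=False)
            a, b = streams[i], streams[j]
            pairs.append((a[rng.integers(0, len(a) - seg_len):][:seg_len],
                          b[rng.integers(0, len(b) - seg_len):][:seg_len]))
            labels.append(0)
    return pairs, labels
```

The resulting labeled pairs would then feed a siamese network trained with a same/different objective; note the 'different' labels are only approximately correct, since two streams could in principle share a speaker.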
In this paper, we address the problem of speaker recognition in challenging acoustic conditions using a novel method to extract robust speaker-discriminative speech representations. We adopt a recently proposed unsupervised adversarial invariance architecture to train a network that maps speaker embeddings extracted using a pretrained model onto two lower dimensional embedding spaces. The embedding spaces are learnt to disentangle speaker-discriminative information from all other information present in the audio recordings, without supervision about the acoustic conditions. We analyze the robustness of the proposed embeddings to various sources of variability present in the signal for speaker verification and unsupervised clustering tasks on a large-scale speaker recognition corpus. Our analyses show that the proposed system substantially outperforms the baseline in a variety of challenging acoustic scenarios. Furthermore, for the task of speaker diarization on a real-world meeting corpus, our system shows a relative improvement of 36% in the diarization error rate compared to the state-of-the-art baseline.