Speaker gender, as a biometric feature, plays an important role in numerous voice-based services. In this work we analyze the accuracy of a gender recognition system in different acoustic environments (indoor and outdoor auditory scenes). At the evaluation stage, each sentence was mixed with several types of background noise at various signal-to-noise ratio (SNR) levels. The voiced parts of speech were then extracted and parametrized using features based on filter banks and vocal-tract properties. The resulting feature trajectories were smoothed non-linearly to minimize the influence of adverse conditions on the spoken sentences. The observed accuracy is acceptable for voice-based tasks in which gender information can improve performance.
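Two of the preprocessing steps described above, mixing a sentence with noise at a target SNR and non-linearly smoothing a feature trajectory, can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation. It assumes that SNR is defined over whole-utterance average power and that a running median serves as the non-linear smoother. The abstract does not name the smoother, and the function names are hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that the speech-to-noise power ratio equals `snr_db`, then add it.

    Assumption: SNR is computed from average power over the whole utterance.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    # The gain g satisfies p_speech / (g^2 * p_noise) = 10^(snr_db / 10)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def median_smooth(trajectory, width=5):
    """Non-linear smoothing of a 1-D feature trajectory with a running median.

    Edges are handled by repeating the first and last values; `width` must be odd.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd")
    if not trajectory:
        return []
    half = width // 2
    padded = [trajectory[0]] * half + list(trajectory) + [trajectory[-1]] * half
    out = []
    for i in range(len(trajectory)):
        window = sorted(padded[i:i + width])
        out.append(window[half])  # middle element of the sorted window
    return out
```

A running median removes isolated outliers (for example, a single frame corrupted by a noise burst) while preserving step changes in a trajectory. For instance, `median_smooth([1, 1, 9, 1, 1], 3)` suppresses the spike at the centre. A linear moving average would instead spread it over neighbouring frames.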
This paper presents an analysis of three sets of feature vectors used in speaker identification (ID) systems for speech signals that have passed through the encoding-decoding process of the AMR, SPEEX and MELP coders. We analyzed the feature sets for various speech coding bit rates using an SVM-based speaker ID system. The results were compared with the identification accuracy obtained with vectors in which the fundamental frequency was included as an additional feature. The experiments show that, in most cases, this additional feature contributes more to identification accuracy for coded speech than for uncoded speech.
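The abstract does not say how the fundamental frequency (F0) was estimated or how it was combined with the other features. As an illustration only, the sketch below assumes a simple autocorrelation pitch estimator applied to a single voiced frame. The estimate is appended to an existing per-frame feature vector before that vector would be passed to a classifier such as an SVM. The function names and the 60 to 400 Hz search range are hypothetical choices.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the F0 of a voiced frame from the autocorrelation peak.

    The search is restricted to lags that correspond to [f0_min, f0_max] Hz.
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]  # remove DC so it does not dominate the autocorrelation
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(x) - 1)
    if lag_min > lag_max:
        raise ValueError("frame too short for the requested F0 range")
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


def augment_with_f0(features, frame, sample_rate):
    """Return a new feature vector with the F0 estimate appended as one extra dimension."""
    return list(features) + [estimate_f0(frame, sample_rate)]
```

The raw (biased) autocorrelation is used here deliberately. Its magnitude shrinks as the lag grows, which tends to favour the true period over its multiples. In practice, F0 is often normalized or log-scaled before being concatenated with cepstral features, so that its range is comparable to the other dimensions.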
This study investigates the properties of auditory-based features for audio change point detection. Two popular techniques were used in the analysis: a metric-based approach and the ∆BIC scheme. The efficiency of change point detection depends on the type and size of the feature space; we therefore compared two auditory-based feature sets (MFCC and GTEAD) in both detection schemes. We also propose a new technique based on multiscale analysis to determine changes of content in audio data. The two typical change point detection techniques, each with the two feature spaces, were compared on a set of acoustic scenes containing a single change point. The results show that the accuracy of the detected positions depends on the feature type, the feature space dimensionality, the detection technique and the type of audio data. With the ∆BIC approach, the MFCC feature space gave better accuracy in most cases; however, change point detection with this feature results in a lower detection ratio than with the GTEAD feature. The proposed multiscale metric-based technique was then evaluated using the same criteria as for ∆BIC, and in this case the GTEAD feature space led to better accuracy. We have shown that the proposed multiscale change point detection scheme is competitive with the ∆BIC scheme using the MFCC feature space.
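The ∆BIC scheme mentioned above is commonly formulated as a model comparison. A window of N feature frames is modelled either by one Gaussian or by two Gaussians split at a candidate frame i, and a change is declared where ∆BIC(i) = ½(N log|Σ| − N₁ log|Σ₁| − N₂ log|Σ₂|) − λP is maximal and positive. The following is a minimal sketch of that idea and makes several assumptions. It uses diagonal covariance matrices, whereas the usual formulation uses full covariances; with diagonal covariances the penalty becomes P = ½·2d·log N. It also assumes λ = 1 and a single change point per window. The frames could be MFCC or GTEAD vectors. All function names are hypothetical, and the sketch does not implement the authors' multiscale metric-based technique.

```python
import math


def _log_det_diag(frames):
    """Log-determinant of the diagonal maximum-likelihood covariance of `frames`."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(var + 1e-10)  # small floor avoids log(0) on constant dimensions
    return total


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for splitting `frames` (a list of d-dimensional vectors) at index i."""
    n, d = len(frames), len(frames[0])
    left, right = frames[:i], frames[i:]
    # The two-Gaussian hypothesis has 2d extra parameters (d means + d variances)
    penalty = 0.5 * (2 * d) * math.log(n)
    return (0.5 * n * _log_det_diag(frames)
            - 0.5 * len(left) * _log_det_diag(left)
            - 0.5 * len(right) * _log_det_diag(right)
            - lam * penalty)


def detect_change(frames, min_seg=10, lam=1.0):
    """Scan all candidate splits; return (index, score) of the best positive Delta-BIC, else None."""
    best = None
    for i in range(min_seg, len(frames) - min_seg + 1):
        score = delta_bic(frames, i, lam)
        if best is None or score > best[1]:
            best = (i, score)
    if best is not None and best[1] > 0:
        return best
    return None
```

The penalty weight λ trades missed detections against false alarms. Because the penalty grows with the feature dimensionality d, the size of the feature space directly affects the detection ratio, which is consistent with the dependence on feature type and dimensionality reported above.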