Phonetic segmentation plays a key role in developing various speech applications. In this work, we propose to use various features for the automatic phonetic segmentation task via forced Viterbi alignment and compare their effectiveness. We propose novel multiscale fractal dimension-based features concatenated with Mel-Frequency Cepstral Coefficients (MFCC). The novel features are expected to capture additional nonlinearities in speech production, which should improve the performance of the segmentation task. However, evaluating the effectiveness of these segmentation algorithms requires accurate manually labeled phoneme-level data, which is not available for low-resource languages such as Gujarati (one of the official languages of India). In order to measure the effectiveness of the various segmentation algorithms, an HMM-based speech synthesis system (HTS) for Gujarati has been built. From the subjective and objective evaluations, it is observed that the FD-based features work moderately better for segmentation than other state-of-the-art features such as MFCC, Perceptual Linear Prediction Cepstral Coefficients (PLP-CC), Cochlear Filter Cepstral Coefficients (CFCC), and RelAtive SpecTrAl (RASTA)-based PLP-CC. The Mean Opinion Score (MOS) and the Degraded-MOS, which are measures of naturalness, indicate an improvement of 9.69% with the proposed features over the MFCC-based features (which are found to be the best among the other features).
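The abstract does not specify which fractal dimension estimator is used, so as a purely illustrative sketch, the following computes Higuchi's fractal dimension of a signal frame; a multiscale FD feature vector could be built by applying such an estimator over windows at several scales and concatenating the result with the MFCC vector. The function name and parameters are assumptions, not the paper's implementation.

```python
import numpy as np

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D signal via Higuchi's method.

    Hypothetical illustration of an FD-based feature; the paper's exact
    multiscale estimator may differ.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    lengths = []
    for k in ks:
        lm = []
        for m in range(k):
            idx = np.arange(m, n, k)
            if len(idx) < 2:
                continue
            # curve length of the subsampled series, normalized for the
            # number of retained samples and the step size k
            s = np.abs(np.diff(x[idx])).sum()
            lm.append(s * (n - 1) / ((len(idx) - 1) * k * k))
        lengths.append(np.mean(lm))
    # L(k) ~ k^(-D): the slope of log L(k) against log(1/k) estimates D
    return np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)[0]
```

A smooth ramp yields an FD near 1, while white noise yields an FD near 2, which is the extra "roughness" information such features add on top of spectral-envelope descriptors like MFCC.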
In recent times, BERT-based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain, with a multitude of models achieving state-of-the-art results by using audio transformer models to encode speech. This begs the question: what are these audio transformer models learning? Moreover, although the standard methodology is to choose the last-layer embedding for any downstream task, is it the optimal choice? We try to answer these questions for two recent audio transformer models, Mockingjay and wav2vec 2.0. We compare them on a comprehensive set of language delivery and structure features, including audio, fluency, and pronunciation features. Additionally, we probe the audio models' understanding of textual surface, syntax, and semantic features and compare them to BERT. We do this over exhaustive settings for native, non-native, synthetic, read, and spontaneous speech datasets.
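The layer-choice question above is typically investigated with linear probes: fit a simple classifier on each layer's embeddings and see which layer predicts the property best. A minimal numpy sketch, assuming per-layer embedding matrices are already extracted (the probe design here is illustrative, not the papers' exact setup):

```python
import numpy as np

def probe_layers(layer_embs, labels, train_frac=0.8):
    """Fit a linear probe on each layer's embeddings and return held-out
    accuracy per layer, so the most informative layer (not necessarily
    the last) can be selected for a downstream task.

    layer_embs: list of (n_samples, dim) arrays, one per transformer layer.
    labels: (n_samples,) array of 0/1 labels for the probed property.
    """
    n = len(labels)
    split = int(train_frac * n)
    y = np.where(labels[:split] == 1, 1.0, -1.0)
    accs = []
    for X in layer_embs:
        Xtr, Xte = X[:split], X[split:]
        # ridge-regularized least-squares probe (closed form)
        lam = 1e-2
        w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ y)
        pred = (Xte @ w > 0).astype(int)
        accs.append(float((pred == labels[split:]).mean()))
    return accs
```

Running `np.argmax(probe_layers(...))` then identifies the layer that encodes the probed feature most linearly, which is how one can find that an intermediate layer beats the last one.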
Bioassay data classification is an important task in drug discovery. However, the data used in classification is highly imbalanced, leading to inaccurate classification of the minority class. We propose a novel approach for classification in which we train separate models using different features derived by training stacked autoencoders (SAE). Experiments are performed on 7 bioassay datasets, in which each data file consists of feature descriptors for every compound, along with a class label indicating whether the compound is active or inactive. We first perform data cleaning using the borderline synthetic minority oversampling technique (SMOTE) followed by removal of Tomek links, and then learn different features hierarchically based on the cleaned data or feature vectors. We then train separate cost-sensitive feed-forward neural network (FNN) classifiers using the hierarchical features in order to obtain the final classification. To increase the True Positive Rate (TPR), a test sample is labeled as active if at least one classifier predicts it as active. In this paper, we demonstrate that data cleaning and learning separate classifiers improve the TPR and F1 score compared to other machine learning approaches. To the best of our knowledge, this is the first attempt to use SAE and FNN for classifying bioassay data.
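The OR-rule aggregation described above (active if at least one classifier says active) is simple to state in code. A minimal sketch of just the aggregation and the TPR metric; the classifiers themselves (cost-sensitive FNNs on SAE features) are not reproduced, and the function names are illustrative:

```python
import numpy as np

def or_rule_label(predictions):
    """Aggregate binary predictions from several classifiers: a sample is
    labeled active (1) if at least one classifier predicts active.

    predictions: (n_classifiers, n_samples) array of 0/1 labels.
    """
    return (np.asarray(predictions).max(axis=0) > 0).astype(int)

def true_positive_rate(y_true, y_pred):
    """Fraction of truly active samples that are predicted active."""
    pos = np.asarray(y_true) == 1
    return float((np.asarray(y_pred)[pos] == 1).mean()) if pos.any() else 0.0
```

By construction the OR rule can only add positive predictions, so its TPR is at least that of the best individual classifier; the trade-off is a possible rise in false positives, which the cost-sensitive training is meant to keep in check.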
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.