Determination of an emotional state through speech increases the amount of information associated with a speaker. It is therefore important to be able to detect and identify a speaker's emotional state or state of stress. The paper proposes an approach based on genetic algorithms to determine a set of features that will allow robust classification of emotional states. Starting from a vector of 462 features, a subset of features is obtained providing a good discrimination between neutral, angry, loud and Lombard states for the SUSAS simulated domain and between neutral and stressed states for the SUSAS actual domain.
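The feature-selection idea in this abstract can be illustrated with a minimal genetic algorithm over bit masks, where each bit marks one feature as kept or dropped. This is only a sketch under stated assumptions: the abstract does not give the operators, parameters, or fitness function used, so `ga_select`, `toy_fitness`, the tournament selection, the one-point crossover, and all numeric settings are hypothetical. In a real setting the fitness would presumably be a classification score obtained with the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=30, generations=40,
              p_mut=0.02, seed=0):
    """Search for a feature mask (list of bools) that maximizes `fitness`.

    Hypothetical GA: binary tournament selection, one-point crossover,
    bit-flip mutation, and one elite individual carried over unchanged.
    Requires n_features >= 2.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: keep the best mask found so far
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g
                     for g in child]                  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best, fitness(best)

# Toy stand-in for a classifier score (an assumption, not the paper's):
# features 0, 3 and 7 help, every other selected feature costs 0.5.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(1.0 if i in INFORMATIVE else -0.5
               for i, keep in enumerate(mask) if keep)

mask, score = ga_select(12, toy_fitness)
selected = [i for i, keep in enumerate(mask) if keep]
```

With a 462-dimensional feature vector as in the abstract, `n_features` would be 462 and `fitness` would train and evaluate a classifier on the masked features, which is far more expensive than the toy function used here.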
Discontinuous transmission based on speech/pause detection represents a valid solution to improve the spectral efficiency of new-generation wireless communication systems. In this context, robust Voice Activity Detection (VAD) algorithms are required, as traditional solutions present a high misclassification rate in the presence of the background noise typical of mobile environments. This paper presents a voice detection algorithm which is robust to noisy environments thanks to a new methodology adopted for the matching process. More specifically, the VAD proposed is based on a pattern recognition approach in which the matching phase is performed by a set of six fuzzy rules trained by means of a new hybrid learning tool. A series of objective tests performed on a large speech database, varying the signal-to-noise ratio, the types of background noise and the input signal level, showed that, as compared with the VAD recently standardized by ITU-T in Rec. G.729 Annex B, the Fuzzy VAD on average achieves an improvement in reduction both of the activity factor of about 25 % and of the clipping introduced of about 43 %. Informal listening tests also confirm an improvement in the perceived speech quality.
The paper presents a new low-complexity algorithm for silence suppression in adverse acoustic environments. The algorithm uses a single time-domain input parameter (signal power) given to a simple matching block. The decision module adapts a series of thresholds depending on the current estimated signal-to-noise ratio (SNR) of the signal. A series of tests carried out using a large speech database confirm a 10% improvement in pause detection performance as compared with the AMR VAD option 1 recently adopted by ETSI for 3rd-generation mobile systems.
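As a rough illustration of a power-based detector with an SNR-dependent threshold, the sketch below classifies each frame by comparing its power with a tracked noise floor plus a margin. It is not the algorithm of the paper: the abstract does not specify the thresholds or the SNR estimator, so the function names, the smoothing factor `alpha`, and the 3 dB to 10 dB margin mapping are all assumptions made for the example.

```python
import math

def frame_power_db(frame):
    """Mean power of one frame in dB, floored to avoid log(0)."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))

def simple_power_vad(frames, init_noise_frames=5, alpha=0.9):
    """Return one bool per frame: True = speech, False = pause.

    Hypothetical scheme: the decision margin above the noise floor grows
    with the estimated SNR (half the SNR, clamped to 3..10 dB).
    Assumes the first `init_noise_frames` frames contain only noise.
    """
    powers = [frame_power_db(f) for f in frames]
    if not powers:
        return []
    n = min(init_noise_frames, len(powers))
    noise_db = sum(powers[:n]) / n   # initial noise-floor estimate
    speech_db = noise_db             # no speech observed yet
    decisions = []
    for p in powers:
        snr_db = max(speech_db - noise_db, 0.0)
        margin = min(max(0.5 * snr_db, 3.0), 10.0)
        is_speech = p > noise_db + margin
        # Update the smoothed level of whichever class the frame fell in.
        if is_speech:
            speech_db = alpha * speech_db + (1 - alpha) * p
        else:
            noise_db = alpha * noise_db + (1 - alpha) * p
        decisions.append(is_speech)
    return decisions

# Synthetic usage: quiet frames, a loud burst, then quiet frames again.
quiet = [0.01, -0.01] * 80
loud = [1.0, -1.0] * 80
flags = simple_power_vad([quiet] * 10 + [loud] * 10 + [quiet] * 10)
```

Using only frame power keeps the per-frame cost to one sum of squares and a logarithm, which matches the low-complexity goal stated in the abstract. A practical detector would also need hangover logic to avoid clipping word endings.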
This paper proposes a background noise classifier based on a new, computationally simple, robust set of acoustic features. Complementary to a previous work [1], reporting on the first studies carried out by the authors on background noise classification, this paper mainly presents: 1) a criterion to group a large range of environmental noise into a reduced set of classes of noise with similar acoustic characteristics; 2) a larger set of background noise together with a new multilevel classification architecture; 3) a new set of robust acoustic parameters. We have maintained the pattern recognition approach proposed in [1], in which the matching phase is performed using a set of trained fuzzy rules. The improved version of the Fuzzy Noise Classifier (FNC) has been assessed in terms of misclassification percentage and compared with a Quadratic Gaussian Classifier (QGC).