The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: in the Addressee Sub-Challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold Sub-Challenge, speech affected by a cold has to be told apart from 'healthy' speech; and in the Snoring Sub-Challenge, four different types of snoring have to be classified. In this paper, we describe these Sub-Challenges, their conditions, and the baseline feature extraction and classifiers, which include, for the first time in the challenge series, data-learnt feature representations obtained by end-to-end learning with convolutional and recurrent neural networks, as well as bag-of-audio-words features.
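As a brief, hedged illustration of the bag-of-audio-words idea mentioned above: frame-level acoustic descriptors are quantised against a learned codebook, and each recording is summarised as a histogram of codeword counts. The sketch below uses scikit-learn k-means over random stand-in features; the codebook size and feature dimensionality are illustrative assumptions, and the official challenge baseline is computed with a dedicated toolkit rather than this code.

```python
# Minimal bag-of-audio-words (BoAW) sketch; not the challenge baseline code.
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(frames, n_words=100, seed=0):
    """Cluster frame-level descriptors (e.g., MFCCs) into audio words."""
    return KMeans(n_clusters=n_words, random_state=seed).fit(frames)

def boaw_histogram(codebook, frames):
    """Assign each frame to its nearest codeword and return a
    normalised term-frequency histogram for the whole recording."""
    words = codebook.predict(frames)
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(0)
codebook = learn_codebook(rng.normal(size=(5000, 39)))  # stand-in MFCC frames
recording = rng.normal(size=(800, 39))                  # one recording
features = boaw_histogram(codebook, recording)          # fixed-length vector
```

The fixed-length histogram can then be fed to any static classifier, which is what makes the representation attractive for variable-length audio.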
The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: in the Atypical Affect Sub-Challenge, four basic emotions annotated in the speech of handicapped subjects have to be classified; in the Self-Assessed Affect Sub-Challenge, valence scores given by the speakers themselves are used for a three-class classification problem; in the Crying Sub-Challenge, three types of infant vocalisations have to be told apart; and in the Heart Beats Sub-Challenge, three different types of heart beats have to be distinguished. We describe the Sub-Challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt (supervised) feature representations obtained by end-to-end learning, the 'usual' ComParE and BoAW features, and, for the first time in the challenge series, deep unsupervised representation learning using the AUDEEP toolkit.
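The deep unsupervised representation learning mentioned above can be pictured as a recurrent sequence-to-sequence autoencoder over spectrogram frames, whose compressed encoder state serves as the utterance-level feature. The PyTorch sketch below is a simplified stand-in under that assumption, not the AUDEEP toolkit itself; all layer sizes are illustrative.

```python
# Hedged sketch of unsupervised sequence-to-sequence representation
# learning over mel-spectrogram time steps (in the spirit of a
# recurrent autoencoder); not the AUDEEP implementation.
import torch
import torch.nn as nn

class SpectrogramAutoencoder(nn.Module):
    def __init__(self, n_mels=128, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.decoder = nn.GRU(n_mels, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, x):
        # x: (batch, time, n_mels); the final encoder state summarises
        # the whole spectrogram and later serves as the feature vector.
        _, h = self.encoder(x)
        # Teacher-forced reconstruction: decode from a shifted copy.
        dec_in = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        y, _ = self.decoder(dec_in, h)
        return self.out(y), h.squeeze(0)

model = SpectrogramAutoencoder()
spec = torch.randn(4, 100, 128)             # four 100-frame spectrograms
recon, features = model(spec)               # features: (4, 256)
loss = nn.functional.mse_loss(recon, spec)  # unsupervised training signal
```

After training, the encoder state `features` replaces hand-crafted descriptors as input to a downstream classifier.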
The COVID-19 outbreak was declared a global pandemic by the World Health Organisation in March 2020 and has since affected a rapidly growing number of people. In this context, advanced artificial intelligence techniques have been brought to the fore to help fight this global health crisis and reduce its impact. In this study, we focus on developing potential use-cases of intelligent speech analysis for patients diagnosed with COVID-19. In particular, by analysing speech recordings from these patients, we construct audio-only models to automatically categorise their health state along four dimensions: severity of illness, sleep quality, fatigue, and anxiety. For this purpose, two established acoustic feature sets and support vector machines are utilised. Our experiments show that an average accuracy of .69 is obtained when estimating the severity of illness, which is derived from the number of days in hospitalisation. We hope that this study can foster an extremely fast, low-cost, and convenient way to automatically detect COVID-19.
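A minimal sketch of the described classification setup, assuming utterance-level acoustic features (e.g., from one of the established feature sets) have already been extracted as one fixed-length vector per recording. The data below are random placeholders, not patient recordings, and the feature dimensionality, class count, and kernel are assumptions.

```python
# Hedged sketch: linear SVM over utterance-level acoustic features.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 88))    # e.g., 88-dim features per recording
y = rng.integers(0, 3, size=120)  # three severity classes (illustrative)

# Standardise features, then classify; report cross-validated accuracy.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
print(cross_val_score(clf, X, y, cv=5).mean())
```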
At the time of writing this article, the world population has suffered more than 2 million registered deaths induced by the COVID-19 epidemic since the outbreak of the coronavirus, now officially known as SARS-CoV-2. Nonetheless, tremendous efforts have been made worldwide to counter and control the epidemic, by now labelled a pandemic. In this contribution, we provide an overview of the potential of computer audition (CA), i.e., the use of speech and sound analysis by artificial intelligence, to help in this scenario. We first survey which types of related or contextually significant phenomena can be assessed automatically from speech or sound. These include the automatic recognition and monitoring of COVID-19 directly, or of its symptoms, such as breathing, dry and wet coughing or sneezing sounds, speech under cold, eating behaviour, sleepiness, or pain, to name but a few. Then, we consider potential use-cases for exploitation. These include risk assessment and diagnosis based on symptom histograms and their development over time, as well as monitoring of spread, social distancing and its effects, treatment and recovery, and patient well-being. We further outline the challenges that need to be faced for real-life usage, as well as the limitations in comparison with non-audio solutions. We conclude that CA appears ready for the implementation of (pre-)diagnosis and monitoring tools and, more generally, offers rich and significant, yet so far untapped, potential in the fight against the spread of COVID-19.
We measure the total benefits from rice varietal improvement research in China and India, using variety adoption and performance data over the last two decades. Genetic or pedigree information is used to partition the total benefits between these two countries and the International Rice Research Institute (IRRI). Finally, reported elasticities of poverty reduction with respect to agricultural output growth are used to assess the effects of national and international research on poverty reduction in rural India and China. The results indicate that rice varietal improvement research has contributed tremendously to the increase in rice production, accounting for 14% to 24% of the total production value over the last two decades in both countries. Rice research has also helped lift large numbers of the rural poor out of poverty, and IRRI played a crucial role in these successes. In 1999, for every US$1 million invested at IRRI, more than 800 and 15,000 rural poor were lifted above the poverty line in China and India, respectively. These poverty reduction effects were even larger in the earlier years.
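As a stylised illustration (not the paper's exact specification) of how an elasticity of poverty reduction with respect to output growth translates research-attributed production gains into headcount estimates:

```latex
% Stylised sketch: symbols are illustrative, not the paper's notation.
\[
  \Delta N_{\text{poor}} \;\approx\; \varepsilon \cdot \frac{\Delta Q}{Q} \cdot N_{\text{poor}},
\]
% where $\varepsilon$ is the (negative) elasticity of the poverty headcount
% with respect to agricultural output, $\Delta Q / Q$ is the share of output
% growth attributed to varietal improvement, and $N_{\text{poor}}$ is the
% rural poor population. Dividing $\Delta N_{\text{poor}}$ by research
% spending yields figures of the form ``rural poor lifted above the poverty
% line per US\$1 million invested''.
```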
Spectrogram representations of acoustic scenes have achieved competitive performance for acoustic scene classification. Yet, the spectrogram alone does not capture a substantial amount of relevant time-frequency information. In this study, we present an approach for exploring the benefits of deep scalogram representations, extracted in segments from an audio stream. The approach presented firstly transforms the segmented acoustic scenes into bump and Morse scalograms, as well as spectrograms; secondly, the spectrograms or scalograms are fed into pre-trained convolutional neural networks; thirdly, the features extracted from a subsequent fully connected layer are fed into (bidirectional) gated recurrent neural networks, which are followed by a single highway layer and a softmax layer; finally, the predictions from these three systems are fused by a margin sampling value strategy. We evaluate the proposed approach on the acoustic scene classification data set of the 2017 IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). On the evaluation set, an accuracy of 64.0% from bidirectional gated recurrent neural networks is obtained when fusing the spectrogram and the bump scalogram, an improvement over the 61.0% baseline result provided by the DCASE 2017 organisers. This result shows that the extracted bump scalograms are capable of improving the classification accuracy when fused with a spectrogram-based system.
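A hedged sketch of the front end of such a pipeline: an audio segment is turned into a scalogram image, and deep features are extracted from a pre-trained image CNN. The paper uses bump and Morse wavelets; here a Morlet wavelet via PyWavelets and a VGG16 backbone stand in, so the wavelet choice, network, and sizes are all assumptions.

```python
# Hedged sketch: scalogram extraction plus deep feature extraction
# from a pre-trained image CNN; an approximation of the pipeline.
import numpy as np
import pywt
import torch
from torchvision.models import vgg16

audio = np.random.randn(16000)                    # 1 s segment stand-in
scales = np.arange(1, 128)
coefs, _ = pywt.cwt(audio[::16], scales, "morl")  # downsampled for speed
scalogram = np.abs(coefs)                         # (scales, time)

# Tile the single-channel scalogram to the 3-channel 224x224 input
# that a pre-trained image CNN expects.
img = torch.tensor(scalogram, dtype=torch.float32)[None, None]
img = torch.nn.functional.interpolate(img, size=(224, 224))
img = img.repeat(1, 3, 1, 1)

cnn = vgg16(weights="IMAGENET1K_V1").eval()
with torch.no_grad():
    feats = cnn.features(img)                     # conv feature maps
    deep = cnn.classifier[:4](torch.flatten(cnn.avgpool(feats), 1))
print(deep.shape)                                 # (1, 4096) fc features
```

Per-segment feature vectors of this kind would then form the sequence fed to the (bidirectional) gated recurrent network.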
Machine learning based heart sound classification represents an efficient technology that can help reduce the burden of manual auscultation through the automatic detection of abnormal heart sounds. In this regard, we investigate the efficacy of using Convolutional Neural Networks (CNNs) pre-trained on large-scale image data for the classification of Phonocardiogram (PCG) signals by learning deep PCG representations. First, the PCG files are segmented into chunks of equal length. Then, we extract a scalogram image from each chunk using a wavelet transformation. Next, the scalogram images are fed into either a pre-trained CNN, or the same network fine-tuned on heart sound data. Deep representations are then extracted from a fully connected layer of each network, and classification is achieved by a static classifier. Alternatively, the scalogram images are fed into an end-to-end CNN formed by adapting a pre-trained network via transfer learning. Key results indicate that our deep PCG representations extracted from a fine-tuned CNN perform the strongest, at 56.2% mean accuracy, on our heart sound classification task. Compared to a baseline accuracy of 46.9%, gained using conventional audio processing features and a support vector machine, this is a significant relative improvement of 19.8% (p < .001 by a one-tailed z-test).
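A minimal sketch of the transfer-learning variant described above: a pre-trained image CNN is adapted by replacing its final layer and fine-tuning on scalogram images. The backbone, class count, and learning rate below are illustrative assumptions rather than the paper's exact settings.

```python
# Hedged sketch: adapt a pre-trained image CNN to heart sound classes.
import torch
import torch.nn as nn
from torchvision.models import resnet18

n_classes = 3                                   # illustrative class count
model = resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, n_classes)  # new output head

# Fine-tune the whole network with a small learning rate; alternatively,
# freeze the backbone and train only the new head.
optim = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

batch = torch.randn(8, 3, 224, 224)             # scalogram-image stand-ins
labels = torch.randint(0, n_classes, (8,))
optim.zero_grad()
loss = criterion(model(batch), labels)
loss.backward()
optim.step()
```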