Amr El-Desoky Mousa scite author profile

Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.


A randomized controlled trial to assess the safety and efficacy of silymarin on symptoms, signs and biomarkers of acute hepatitis

El-Kamary¹,

Shardell²,

Abdel‐Hamid³

et al. 2009

Phytomedicine


Purpose: Milk thistle or its purified extract, silymarin (Silybum marianum), is widely used in treating acute or chronic hepatitis. Although silymarin is hepatoprotective in animal experiments and some human hepatotoxic exposures, its efficacy in ameliorating the symptoms of acute clinical hepatitis remains inconclusive. In this study, our purpose was to determine whether silymarin improves symptoms, signs and laboratory test results in patients with acute clinical hepatitis, regardless of etiology. Methods: This is a randomized, placebo-controlled trial in which participants, treating physicians and data management staff were blinded to treatment group. The study was conducted at two fever hospitals in Tanta and Banha, Egypt, where patients with symptoms compatible with acute clinical hepatitis and serum alanine aminotransferase (ALT) levels > 2.5 times the upper limit of normal were enrolled. The intervention consisted of three times daily ingestion of either a standard recommended dose of 140 mg of silymarin (Legalon®, MADAUS GmbH, Cologne, Germany), or a vitamin placebo for four weeks with an additional four-week follow-up. The primary outcomes were symptoms and signs of acute hepatitis and results of liver function tests on days 2, 4 and 7 and weeks 2, 4, and 8. Side-effects and adverse events were ascertained by self-report. Phytomedicine. Author manuscript; available in PMC 2010 May 1.


Contextual Bidirectional Long Short-Term Memory Recurrent Neural Network Language Models: A Generative Approach to Sentiment Analysis

Mousa

Schuller

2017


Traditional learning-based approaches to sentiment analysis of written text use the concept of bag-of-words or bag-of-ngrams, where a document is viewed as a set of terms or short combinations of terms disregarding grammar rules or word order. Novel approaches de-emphasize this concept and view the problem as a sequence classification problem. In this context, recurrent neural networks (RNNs) have achieved significant success. The idea is to use RNNs as discriminative binary classifiers to predict a positive or negative sentiment label at every word position, then perform a type of pooling to get a sentence-level polarity. Here, we investigate a novel generative approach in which a separate probability distribution is estimated for every sentiment using language models (LMs) based on long short-term memory (LSTM) RNNs. We introduce a novel type of LM using a modified version of bidirectional LSTM (BLSTM) called contextual BLSTM (cBLSTM), where the probability of a word is estimated based on its full left and right contexts. Our approach is compared with a BLSTM binary classifier. Significant improvements are observed in classifying the IMDB movie review dataset. Further improvements are achieved via model combination.
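The generative idea in this abstract (one language model per sentiment, with the label chosen by whichever model assigns the text the higher probability) can be illustrated with a minimal sketch. This is not the authors' cBLSTM: an add-one smoothed unigram model is swapped in as a stand-in LM, equal class priors are assumed, and `UnigramLM`, `train`, `classify` and `TOY_DATA` are hypothetical names introduced only for this illustration.

```python
import math
from collections import Counter


class UnigramLM:
    """Add-one smoothed unigram LM (a simple stand-in for an LSTM-based LM)."""

    def __init__(self, documents, vocab):
        self.vocab = vocab
        self.counts = Counter(w for doc in documents for w in doc.split())
        self.total = sum(self.counts.values())

    def log_prob(self, text):
        # +1 reserves probability mass for words outside the vocabulary.
        v = len(self.vocab) + 1
        return sum(
            math.log((self.counts[w] + 1) / (self.total + v))
            for w in text.split()
        )


def train(labelled_docs):
    """Fit one language model per sentiment label."""
    vocab = {w for text, _ in labelled_docs for w in text.split()}
    by_label = {}
    for text, label in labelled_docs:
        by_label.setdefault(label, []).append(text)
    return {label: UnigramLM(docs, vocab) for label, docs in by_label.items()}


def classify(models, text):
    """Pick the label whose LM gives the text the highest log-probability.

    Assumes equal class priors, so the prior term is omitted.
    """
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy data invented for this sketch; not taken from the IMDB dataset.
TOY_DATA = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("a dull boring film", "neg"),
    ("terrible acting and a weak story", "neg"),
]
models = train(TOY_DATA)
print(classify(models, "great film"))
print(classify(models, "boring story"))
```

Replacing `UnigramLM` with a stronger model only changes how `log_prob` is computed; the decision rule in `classify` stays the same.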


I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance

et al. 2016


We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start with demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing of low-level acoustic features as well as higher-level features related to intelligibility, obtained from an Automatic Speech Recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, which reaches up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food, as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with up to 56.2% determination coefficient.
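The leave-one-speaker-out protocol mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: "average recall" is taken here to mean the unweighted mean of per-class recalls, the SVM is replaced by a toy 1-D nearest-class-mean classifier, and all function names and data values are hypothetical.

```python
from collections import defaultdict


def leave_one_speaker_out(speakers):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker."""
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield held_out, train, test


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def nearest_mean(train_x, train_y, test_x):
    """Toy 1-D nearest-class-mean classifier standing in for the SVM."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(train_x, train_y):
        sums[y] += x
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in sums}
    return [min(means, key=lambda y: abs(x - means[y])) for x in test_x]


def evaluate(features, labels, speakers, fit_predict):
    """Pool predictions over all held-out speakers, then score them once."""
    y_true, y_pred = [], []
    for _, train, test in leave_one_speaker_out(speakers):
        preds = fit_predict(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in test],
        )
        y_true += [labels[i] for i in test]
        y_pred += list(preds)
    return unweighted_average_recall(y_true, y_pred)


# Invented toy data: one scalar feature per utterance, three speakers.
features = [0.1, 0.9, 0.2, 0.8, 0.0, 1.0]
labels = ["no", "eat", "no", "eat", "no", "eat"]
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
print(evaluate(features, labels, speakers, nearest_mean))
```

Holding out all utterances of one speaker at a time keeps the classifier from being tested on a voice it has already seen during training.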
