Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. They typically require large amounts of high-quality speech data for training, which gives an undue advantage to large organizations that hold vast stores of private data. We investigated whether speech data obtained from publicly available sources can be enhanced to train better speech recognition models. We begin with noisy/contaminated speech data, apply speech enhancement to produce a 'cleaned' version, and use both versions to train the ASR model. We found that using speech enhancement yields a 9.5% better word error rate than training on the original noisy data alone and 9% better than training on the ground-truth 'clean' data alone. Its performance is also comparable to the ideal scenario of training on the noisy data together with its ground-truth 'clean' version.
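The data pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `enhance` and `build_training_set` are hypothetical, and the placeholder `enhance` stands in for whatever speech-enhancement model is used.

```python
def enhance(noisy_utterance: str) -> str:
    # Placeholder for a real speech-enhancement model (e.g., a denoising
    # network run over the audio). Here we just tag the utterance id.
    return noisy_utterance + "_enhanced"

def build_training_set(noisy_utterances):
    """Pair each noisy utterance with its enhanced 'cleaned' version,
    so the ASR model trains on both, as the abstract describes."""
    training_set = []
    for utt in noisy_utterances:
        training_set.append(utt)           # original noisy audio
        training_set.append(enhance(utt))  # 'cleaned' counterpart
    return training_set

print(build_training_set(["utt1", "utt2"]))
# each noisy utterance appears alongside its enhanced counterpart
```

The key design point is that the noisy and enhanced versions are used together rather than one replacing the other, which is what the reported word-error-rate gains are measured against.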
For machine learning (ML) to work well, large amounts of good-quality training data are needed. Obtaining such data is often the key bottleneck in the entire ML development process. Explicit collection by humans has been the main approach, but it tends to be expensive and time-consuming. There is therefore significant interest in alternative data collection techniques, which we explore in this paper in the context of speech data. We were initially motivated by the problem of wake word engine training, which requires a large number of utterances of specific wake words. Given that large public repositories of media data already exist (e.g., YouTube, DailyMotion), we were curious how feasible it is to find the utterances we need. Our results are encouraging: many different types of words can readily be found and downloaded in the quantity and quality needed to create training corpora for deep learning. Usually, more than 30% of the found words are suitable for corpus creation. More than 80% of the top 10,000 ranked words and more than 50% of the top 20,000 words we selected easily produced over 5,000 found words, which is sufficient to train a high-quality wake word engine. Beyond general words, we specifically searched for words used in wake word engine construction, such as names, places, and product names. Here, again, we find that most common names/places/products return more than enough instances for corpus creation; only uncommon names and places (like Atticus or Maximus) are difficult to find in sufficient quantities. We demonstrate that a wake word engine trained on words found on YouTube performs equivalently to one trained on traditionally human-collected words. Although we focused on wake words, our approach is general and can be applied to create speech corpora for a variety of purposes.
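The corpus-suitability filter implied above can be sketched as follows. The 5,000-clip threshold comes from the abstract; the function name `suitable_words` and the example counts are illustrative assumptions, not data from the paper.

```python
# Abstract: > 5000 found instances suffice to train a high-quality
# wake word engine.
MIN_CLIPS = 5000

def suitable_words(found_counts):
    """Given a mapping of candidate word -> number of usable clips found
    in public media sources, return the words with enough clips to
    build a training corpus."""
    return sorted(w for w, n in found_counts.items() if n > MIN_CLIPS)

# Illustrative counts only; "atticus" mirrors the abstract's example of
# an uncommon name that is hard to find in sufficient quantity.
found_counts = {"alexa": 12000, "atticus": 300, "seattle": 8000}
print(suitable_words(found_counts))  # ['alexa', 'seattle']
```

In practice the counts would come from searching and transcribing public media; the filter step itself is this simple thresholding.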