The conventional approach in text-based machine translation (MT) is to translate complete sentences, which are conveniently indicated by sentence boundary markers. However, since such boundary markers are not available for speech, new methods are required that define an optimal unit for translation. Our experimental results show that with a segment length optimized for a particular MT system, intrasentence segmentation can improve translation performance (measured in BLEU) by up to 11% for Arabic Broadcast Conversation (BC) and 6% for Arabic Broadcast News (BN). We show that acoustic segmentation that minimizes Word Error Rate (WER) may not give the best translation performance. We improve upon it by automatically resegmenting the ASR output in a way that is optimized for translation and argue that it might be necessary for different stages of a Spoken Language Translation (SLT) system to define their own optimal units.
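A minimal sketch of the underlying idea, not the paper's actual system: the ASR token stream is re-split into candidate segments of varying maximum length, each candidate segmentation is translated, and the length that maximizes BLEU for the given MT system is kept. `translate` and `bleu` are hypothetical stand-ins for an MT engine and a BLEU scorer.

```python
# Sketch: sweep candidate segment lengths for re-segmenting ASR output
# before translation; pick the length that maximizes BLEU for one MT system.

def resegment(tokens, max_len):
    """Split a flat ASR token stream into chunks of at most `max_len` words."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def tune_segment_length(asr_tokens, reference, translate, bleu,
                        lengths=range(5, 40, 5)):
    """Return the segment length giving the best BLEU for this MT system."""
    best_len, best_score = None, float("-inf")
    for max_len in lengths:
        segments = resegment(asr_tokens, max_len)
        hypothesis = " ".join(translate(" ".join(seg)) for seg in segments)
        score = bleu(hypothesis, reference)
        if score > best_score:
            best_len, best_score = max_len, score
    return best_len, best_score
```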
This paper proposes a new speech enhancement framework, based on speech presence uncertainty, to improve the quality of speech recorded in adverse acoustic environments. Because uncertainty evaluation provides a clearer discrimination between speech and noise, we propose a new uncertainty evaluation mechanism as a preprocessing step for noise suppression methods. The mechanism is based on the energies of the noisy speech signal and classifies speech and noise segments more accurately. In addition to improving enhancement quality, the approach reduces unnecessary computational load on the speech processing system. Extensive simulations are carried out on speech signals corrupted by different types of non-stationary noise, including babble, exhibition, restaurant, and train station noise, and performance is measured using output SNR, average segmental SNR (AvgSegSNR), PESQ, and COMP. A comparative analysis shows that the proposed approach outperforms conventional approaches in all noise environments.
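A minimal sketch of how such an energy-based speech/noise decision could act as a preprocessing gate, so that the more expensive noise-suppression stage runs only on frames likely to contain speech. This is not the paper's exact mechanism; the frame length, hop size, percentile-based noise-floor estimate, and 6 dB margin are illustrative assumptions.

```python
import numpy as np

def frame_energies(signal, frame_len=512, hop=256):
    """Energy (in dB) of each overlapping frame of a 1-D noisy speech signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([10.0 * np.log10(np.sum(f.astype(float) ** 2) + 1e-10)
                     for f in frames])

def speech_presence_mask(energies_db, margin_db=6.0):
    """Flag frames whose energy exceeds an estimated noise floor by a margin."""
    noise_floor = np.percentile(energies_db, 20)  # assume quietest 20% are noise-only
    return energies_db > noise_floor + margin_db
```

Frames where the mask is False would be passed through (or attenuated cheaply) without invoking the full noise-suppression pipeline, which is where the computational saving comes from.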
Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations based on similarity is essential to retrieve relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Due to inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about recipes to determine similarity. In contrast, the recipe title, ingredients, and cooking actions provide detailed knowledge about a recipe and are better determinants of similar recipes. In this study, we utilized this additional knowledge of recipes, such as ingredients and recipe title, to identify similar recipes, with particular attention to rare ingredients. To incorporate this knowledge, we propose a knowledge-infused multimodal cooking representation learning network, Ki-Cook, built on the procedural attribute of the cooking process. To the best of our knowledge, this is the first study to adopt a comprehensive recipe similarity determinant to identify and cluster similar recipe representations. The proposed network also incorporates ingredient images to learn multimodal cooking representations. Since the motivation for clustering similar recipes is to retrieve relevant information for an unknown food image, we evaluated the ingredient retrieval task. We performed an empirical analysis showing that our proposed model improves the Coverage of Ground Truth by 12% and the Intersection Over Union by 10% compared to the baseline models. On average, the representations learned by our model contain an additional 15.33% of rare ingredients compared to the baseline models. Owing to this difference, our qualitative evaluation shows a 39% improvement in clustering similar recipes in the latent space compared to the baseline models, with an inter-annotator agreement (Fleiss' kappa) of 0.35.
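For reference, the two retrieval metrics named above can be computed over ingredient sets as in the sketch below, assuming ingredients are compared as exact string matches (the paper's own matching rules may differ).

```python
def coverage_of_ground_truth(retrieved, ground_truth):
    """Fraction of ground-truth ingredients that appear in the retrieved set."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    return len(retrieved & ground_truth) / len(ground_truth) if ground_truth else 0.0

def intersection_over_union(retrieved, ground_truth):
    """Jaccard overlap between retrieved and ground-truth ingredient sets."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    union = retrieved | ground_truth
    return len(retrieved & ground_truth) / len(union) if union else 0.0

# Example: a retrieval that recovers 2 of 3 ground-truth ingredients.
print(coverage_of_ground_truth(["flour", "egg"], ["flour", "egg", "saffron"]))  # ~0.67
print(intersection_over_union(["flour", "egg"], ["flour", "egg", "saffron"]))   # ~0.67
```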
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.