The conventional approach in text-based machine translation (MT) is to translate complete sentences, which are conveniently indicated by sentence boundary markers. However, since such boundary markers are not available for speech, new methods are required that define an optimal unit for translation. Our experimental results show that with a segment length optimized for a particular MT system, intrasentence segmentation can improve translation performance (measured in BLEU) by up to 11% for Arabic Broadcast Conversation (BC) and 6% for Arabic Broadcast News (BN). We show that acoustic segmentation that minimizes Word Error Rate (WER) may not give the best translation performance. We improve upon it by automatically resegmenting the ASR output in a way that is optimized for translation and argue that it might be necessary for different stages of a Spoken Language Translation (SLT) system to define their own optimal units.
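A minimal sketch of the underlying idea, not the paper's actual system: the ASR token stream is re-split into candidate segments of varying maximum length, each candidate segmentation is translated, and the length that maximizes BLEU for the given MT system is kept. `translate` and `bleu` are hypothetical stand-ins for an MT engine and a BLEU scorer.

```python
# Sketch: sweep candidate segment lengths for re-segmenting ASR output
# before translation; pick the length that maximizes BLEU for one MT system.

def resegment(tokens, max_len):
    """Split a flat ASR token stream into chunks of at most `max_len` words."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

def tune_segment_length(asr_tokens, reference, translate, bleu,
                        lengths=range(5, 40, 5)):
    """Return the segment length giving the best BLEU for this MT system."""
    best_len, best_score = None, float("-inf")
    for max_len in lengths:
        segments = resegment(asr_tokens, max_len)
        hypothesis = " ".join(translate(" ".join(seg)) for seg in segments)
        score = bleu(hypothesis, reference)
        if score > best_score:
            best_len, best_score = max_len, score
    return best_len, best_score
```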
This paper proposes a new speech enhancement framework, based on speech presence uncertainty, to improve the quality of speech recorded in adverse acoustic environments. Because uncertainty evaluation provides a clearer discrimination between speech and noise, we propose a new uncertainty evaluation mechanism as a preprocessing step for noise suppression methods. The mechanism is based on the energies of the noisy speech signal and classifies speech and noise segments more accurately. In addition to improving enhancement quality, the approach reduces unnecessary computational load on the speech processing system. Extensive simulations are carried out on speech signals corrupted by different types of non-stationary noise, including babble, exhibition, restaurant, and train station noise, and performance is measured using output SNR, average segmental SNR (AvgSegSNR), PESQ, and COMP. A comparative analysis shows that the proposed approach outperforms conventional approaches in all noise environments.
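A minimal sketch of how such an energy-based speech/noise decision could act as a preprocessing gate, so that the more expensive noise-suppression stage runs only on frames likely to contain speech. This is not the paper's exact mechanism; the frame length, hop size, percentile-based noise-floor estimate, and 6 dB margin are illustrative assumptions.

```python
import numpy as np

def frame_energies(signal, frame_len=512, hop=256):
    """Energy (in dB) of each overlapping frame of a 1-D noisy speech signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([10.0 * np.log10(np.sum(f.astype(float) ** 2) + 1e-10)
                     for f in frames])

def speech_presence_mask(energies_db, margin_db=6.0):
    """Flag frames whose energy exceeds an estimated noise floor by a margin."""
    noise_floor = np.percentile(energies_db, 20)  # assume quietest 20% are noise-only
    return energies_db > noise_floor + margin_db
```

Frames where the mask is False would be passed through (or attenuated cheaply) without invoking the full noise-suppression pipeline, which is where the computational saving comes from.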
Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations based on similarity is essential to retrieve relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Due to inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about recipes to determine similarity. In contrast, the recipe title, ingredients, and cooking actions provide detailed knowledge about a recipe and are better determinants of similar recipes. In this study, we utilized this additional knowledge of recipes, such as ingredients and recipe title, to identify similar recipes, with particular attention to rare ingredients. To incorporate this knowledge, we propose a knowledge-infused multimodal cooking representation learning network, Ki-Cook, built on the procedural attribute of the cooking process. To the best of our knowledge, this is the first study to adopt a comprehensive recipe similarity determinant to identify and cluster similar recipe representations. The proposed network also incorporates ingredient images to learn multimodal cooking representations. Since the motivation for clustering similar recipes is to retrieve relevant information for an unknown food image, we evaluated the ingredient retrieval task. We performed an empirical analysis showing that our proposed model improves the Coverage of Ground Truth by 12% and the Intersection Over Union by 10% compared to the baseline models. On average, the representations learned by our model contain an additional 15.33% of rare ingredients compared to the baseline models. Owing to this difference, our qualitative evaluation shows a 39% improvement in clustering similar recipes in the latent space compared to the baseline models, with an inter-annotator agreement (Fleiss' kappa) of 0.35.
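For reference, the two retrieval metrics named above can be computed over ingredient sets as in the sketch below, assuming ingredients are compared as exact string matches (the paper's own matching rules may differ).

```python
def coverage_of_ground_truth(retrieved, ground_truth):
    """Fraction of ground-truth ingredients that appear in the retrieved set."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    return len(retrieved & ground_truth) / len(ground_truth) if ground_truth else 0.0

def intersection_over_union(retrieved, ground_truth):
    """Jaccard overlap between retrieved and ground-truth ingredient sets."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    union = retrieved | ground_truth
    return len(retrieved & ground_truth) / len(union) if union else 0.0

# Example: a retrieval that recovers 2 of 3 ground-truth ingredients.
print(coverage_of_ground_truth(["flour", "egg"], ["flour", "egg", "saffron"]))  # ~0.67
print(intersection_over_union(["flour", "egg"], ["flour", "egg", "saffron"]))   # ~0.67
```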
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.