Anusha Prakash scite author profile

Anusha Prakash

5 Publications

63 Citation Statements Received

86 Citation Statements Given

Affiliations

Indian Institute of Technology Madras, National Institute of Technology Karnataka, Bangalore University

Publications

Building Multilingual End-to-End Speech Synthesisers for Indian Languages

Prakash¹, Thomas², Umesh³ et al. 2019

Building text-to-speech (TTS) synthesisers is a difficult task, especially for low resource languages. Language-specific modules need to be developed for system building. End-to-end speech synthesis has become a popular paradigm as a TTS can be trained using only (text, audio) pairs. However, end-to-end speech synthesis is not scalable in a multilanguage scenario, as the vocabulary increases with the number of different scripts. In this paper, TTSes are trained for Indian languages using two text representations: character-based and phone-based. For the character-based approach, a multi-language character map (MLCM) is proposed to easily train Indic speech synthesisers. The phone-based approach uses the common label set (CLS) representation for Indian languages. Both approaches leverage the similarities that exist among the languages. The advantage is a compact representation across multiple languages. Experiments are conducted by building TTSes using monolingual data and by pooling data across two languages. The ability to synthesise code-mixed text using the phone-based approach is also assessed. Subjective evaluations indicate that reasonably good quality Indic TTSes can be developed using both approaches. This emphasises the need to incorporate multilingual text processing in the end-to-end framework.

Generic Indic Text-to-Speech Synthesisers with Rapid Adaptation in an End-to-End Framework

Prakash¹, Murthy² 2020

Code-switching in Indic Speech Synthesisers

Thomas, Prakash, Baby et al. 2018

Most Indians are inherently bilingual or multilingual owing to the diverse linguistic culture in India. As a result, code-switching is quite common in conversational speech. The objective of this work is to train good quality text-to-speech (TTS) synthesisers that can seamlessly handle code-switching. To achieve this, bilingual TTSes that are capable of handling phonotactic variations across languages are trained using combinations of monolingual data in a unified framework. In addition to segmenting Indic speech data using signal processing cues in tandem with hidden Markov model-deep neural network (HMM-DNN), we propose to segment Indian English data using the same approach after NIST syllabification. Then, bilingual HTS-STRAIGHT based systems are trained by randomizing the order of data so that the systematic interactions between the two languages are captured better. Experiments are conducted by considering three language pairs: Hindi+English, Tamil+English and Hindi+Tamil. The code-switched systems are evaluated on monolingual, code-mixed and code-switched texts. Degradation mean opinion score (DMOS) for monolingual sentences shows marginal degradation over that of an equivalent monolingual TTS system, while the DMOS for bilingual sentences is significantly better than that of the corresponding monolingual TTS systems.

Acoustic Analysis of Syllables Across Indian Languages

Prakash, Murthy 2016

Indian languages are broadly classified as Indo-Aryan or Dravidian. The basic set of phones is more or less the same, varying mostly in the phonotactics across languages. There has also been borrowing of sounds and words across languages over time due to intermixing of cultures. Since syllables are fundamental units of speech production and Indian languages are characterised by syllable-timed rhythm, acoustic analysis of syllables has been carried out. In this paper, instances of common and most frequent syllables in continuous speech have been studied across six Indian languages, from both Indo-Aryan and Dravidian language groups. The distributions of acoustic features have been compared across these languages. This kind of analysis is useful for developing speech technologies in a multilingual scenario. Owing to similarities in the languages, text-to-speech (TTS) synthesisers have been developed by segmenting speech data at the phone level using hidden Markov models (HMM) from other languages as initial models. Degradation mean opinion scores and word error rates indicate that the quality of synthesised speech is comparable to that of TTSes developed by segmenting the data using language-specific HMMs.

Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds

Labutov, Yang, Prakash et al. 2018

Question Answering (QA), as a research field, has primarily focused on either knowledge bases (KBs) or free text as a source of knowledge. These two sources have historically shaped the kinds of questions that are asked over these sources, and the methods developed to answer them. In this work, we look towards a practical use-case of QA over user-instructed knowledge that uniquely combines elements of both structured QA over knowledge bases, and unstructured QA over narrative, introducing the task of multi-relational QA over personal narrative. As a first step towards this goal, we make three key contributions: (i) we generate and release TEXTWORLDSQA, a set of five diverse datasets, where each dataset contains dynamic narrative that describes entities and relations in a simulated world, paired with variably compositional questions over that knowledge, (ii) we perform a thorough evaluation and analysis of several state-of-the-art QA models and their variants at this task, and (iii) we release a lightweight Python-based framework we call TEXTWORLDS for easily generating arbitrary additional worlds and narrative, with the goal of allowing the community to create and share a growing collection of diverse worlds as a test-bed for this task.
