The end-to-end (E2E) framework has emerged as a viable alternative to conventional hybrid systems in automatic speech recognition (ASR) domain. Unlike the monolingual case, the challenges faced by an E2E system in code-switching ASR task include (i) the expansion of target set to account for multiple languages involved, (ii) the requirement of a robust target-to-word (T2W) transduction, and (iii) the need for more effective context modeling. In this paper, we aim to address those challenges for reliable training of the E2E ASR system on a limited amount of code-switching data. The main contribution of this work lies in the E2E target set reduction by exploiting the acoustic similarity and the proposal of a novel context-dependent T2W transduction scheme. Additionally, a novel textual feature has been proposed to enhance the context modeling in the case of code-switching data. The experiments are performed on a recently created Hindi-English code-switching corpus. For contrast purposes, the existing combined target set based system is also evaluated. The proposed system outperforms the existing one and yields a target error rate of 18.1% along with a word error rate of 29.79%. INDEX TERMS Code-switching, speech recognition, end-to-end system, factored language model, targetto-word transduction.
End-to-end (E2E) systems are fast replacing the conventional systems in the domain of automatic speech recognition. As the target labels are learned directly from speech data, the E2E systems need a bigger corpus for effective training. In the context of code-switching task, the E2E systems face two challenges: (i) the expansion of the target set due to multiple languages involved, and (ii) the lack of availability of sufficiently large domain-specific corpus. Towards addressing those challenges, we propose an approach for reducing the number of target labels for reliable training of the E2E systems on limited data. The efficacy of the proposed approach has been demonstrated on two prominent architectures, namely CTCbased and attention-based E2E networks. The experimental validations are performed on a recently created Hindi-English code-switching corpus. For contrast purpose, the results for the full target set based E2E system and a hybrid DNN-HMM system are also reported.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.