This work is an effort towards building Neural Speech Recognizers system for Quranic recitations that can be effectively used by anyone regardless of their gender and age. Despite having a lot of recitations available online, most of them are recorded by professional male adult reciters, which means that an ASR system trained on such datasets would not work for female/ child reciters. We address this gap by adopting a benchmark dataset of audio records of Quranic recitations that consists of recitations by both genders from different ages. Using this dataset, we build several speaker-independent NSR systems based on the DeepSpeech model and use word error rate (WER) for evaluating them. The goal is to show how an NSR system trained and tuned on a dataset of a certain gender would perform on a test set from the other gender. Unfortunately, the number of female recitations in our dataset is rather small while the number of male recitations is much larger. In the first set of experiments, we avoid the imbalance issue between the two genders and down-sample the male part to match the female part. For this small subset of our dataset, the results are interesting with 0.968 WER when the system is trained on male recitations and tested on female recitations. The same system gives 0.406 WER when tested on male recitations. On the other hand, training the system on female recitations and testing it on male recitation gives 0.966 WER while testing it on female recitations gives 0.608 WER.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.