Native language identification (NLI) is the task of identifying a user's first language from their speech or written text in a second language. In this paper, we propose the use of spectrogram- and cochleagram-based features extracted from very short speech utterances (0.8 s on average) to infer the native language of an Urdu speaker. Bidirectional long short-term memory (BLSTM) neural networks are adopted to classify utterances among the native languages. A set of experiments is carried out for the network architecture search, and the system's accuracy is evaluated on the validation data set. Overall accuracies of 74.81% and 71.61% are achieved using Mel-frequency cepstral coefficients (MFCC) and Gammatone frequency cepstral coefficients (GFCC), respectively. Moreover, the optimized MFCC feature-based BLSTM network and GFCC feature-based BLSTM network are merged to take advantage of both feature sets. The experiments show that the merged network outperforms the individual BLSTM networks, achieving an accuracy of 75.69% on the evaluation data. The effect of test data duration is also analyzed (from 0.27 s to 1.5 s); it is observed that even with utterances as short as 0.4 s, an accuracy of over 50% can be achieved.
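The idea of combining the MFCC-based and GFCC-based classifiers can be illustrated with a small sketch. The abstract does not say how the two BLSTM networks are merged, so the example below swaps in a simpler, well-known technique: score-level (late) fusion, which averages the class posteriors of two classifiers. The label names, the raw scores, and the `weight` parameter are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_posteriors(p_mfcc, p_gfcc, weight=0.5):
    """Weighted average of two posterior vectors (late fusion).

    Assumption: this stands in for the paper's merged network, whose
    actual merging mechanism is not described in the abstract.
    """
    if len(p_mfcc) != len(p_gfcc):
        raise ValueError("posterior vectors must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_mfcc, p_gfcc)]


def predict(labels, posteriors):
    """Return the label with the highest posterior probability."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return labels[best]


# Hypothetical native-language labels and raw scores for one utterance.
LABELS = ["L1_a", "L1_b", "L1_c"]
mfcc_scores = [2.0, 1.0, 0.1]  # made-up output of an MFCC-based classifier
gfcc_scores = [0.5, 2.5, 0.3]  # made-up output of a GFCC-based classifier

p_mfcc = softmax(mfcc_scores)
p_gfcc = softmax(gfcc_scores)
fused = fuse_posteriors(p_mfcc, p_gfcc)

print("MFCC only:", predict(LABELS, p_mfcc))
print("GFCC only:", predict(LABELS, p_gfcc))
print("Fused:    ", predict(LABELS, fused))
```

In this toy case the two classifiers disagree, and the fused decision follows the more confident one. A merged neural network would more likely join the two branches at a hidden layer and train them jointly, which this sketch does not attempt to reproduce.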