Since the emergence of deep learning-based chatbots for knowledge services, numerous research and development projects have been conducted in various industries. A high demand for chatbots has drastically increased the global market size; however, the limited functional scalability of open-domain chatbots is a challenge to their application to industries. Moreover, as most chatbot frameworks employ English, it is necessary to create chatbots customized for other languages. To address this problem, this paper proposes KoRASA as a pipeline-optimization method, which uses a deep learning-based open-source chatbot framework to understand the Korean language. KoRASA is a closed-domain chatbot that is applicable across a wide range of industries in Korea. KoRASA’s operation consists of four stages: tokenization, featurization, intent classification, and entity extraction. The accuracy and F1-score of KoRASA were measured based on datasets taken from common tasks carried out in most industrial fields. The algorithm for intent classification and entity extraction was optimized. The accuracy and F1-score were 98.2% and 98.4% for intent classification and 97.4% and 94.7% for entity extraction, respectively. Furthermore, these results are better than those achieved by existing models. Accordingly, KoRASA can be applied to various industries, including mobile services based on closed-domain chatbots using Korean, robotic process automation (RPA), edge computing, and Internet of Energy (IoE) services.
Deep learning chatbot research and development is exploding recently to offer customers in numerous industries personalized services. However, human resources are used to create a learning dataset for a deep learning chatbot. In order to augment this, the idea of neural question generation (NQG) has evolved, although it has restrictions on how questions can be expressed in different ways and has a finite capacity for question generation. In this paper, we propose an ensemble-type NQG model based on the text-to-text transfer transformer (T5). Through the proposed model, the number of generated questions for each single NQG model can be greatly increased by considering the mutual similarity and the quality of the questions using the soft-voting method. For the training of the soft-voting algorithm, the evaluation score and mutual similarity score weights based on the context and the question–answer (QA) dataset are used as the threshold weight. Performance comparison results with existing T5-based NQG models using the SQuAD 2.0 dataset demonstrate the effectiveness of the proposed method for QG. The implementation of the proposed ensemble model is anticipated to span diverse industrial fields, including interactive chatbots, robotic process automation (RPA), and Internet of Things (IoT) services in the future.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.