Text recognition transcribes text from images and is relevant to domains such as document digitization, content moderation, scene text translation, autonomous driving, and scene understanding. Conventional text recognition techniques focus mainly on recognizing characters seen during training. However, two factors are not yet well covered by the training sets of these methods: novel character categories and out-of-vocabulary (OOV) samples. Samples containing new characters are often linked with OOV samples. However, these methods may attend to seen characters only, without handling novel combinations or contexts. Novel character categories are mainly encountered in 1) internet-based environments, where unseen ligatures such as emoticons and characters from unseen languages appear, 2) scene-text recognition environments, and 3) text containing characters from foreign and region-specific languages. In digitization profiling, previously unseen characters may likewise not be covered. Because heterogeneous language formats are difficult to balance, linguistic statistics (e.g., n-grams, context) can gradually become biased toward the training data, which is