“…Typically, in the biology domain, the comparison units are characters, specifically the four nucleotide bases: adenine (A), guanine (G), thymine (T), and cytosine (C). In the natural language domain, however, the comparison unit must be chosen carefully, for example character, character-span, or word-level units [1]. In this study, we added a sub-word level comparison unit.…”
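
To make the difference between comparison units concrete, the following minimal sketch splits the same input at the character, word, and sub-word levels. It is only an illustration under stated assumptions: the function name `to_units`, the toy vocabulary, and the greedy longest-match sub-word rule are all hypothetical, since the excerpt does not say which sub-word scheme the study uses.

```python
def to_units(text, level, vocab=None):
    """Split text into comparison units at the requested level."""
    if level == "character":
        # One unit per character, as with nucleotide bases in a DNA string.
        return list(text)
    if level == "word":
        # One unit per whitespace-delimited token.
        return text.split()
    if level == "subword":
        # Greedy longest-match against a toy vocabulary. This rule is an
        # assumption for illustration, not the scheme used in the study.
        vocab = vocab or set()
        units = []
        for word in text.split():
            start = 0
            while start < len(word):
                end = len(word)
                # Shrink the candidate until it is in the vocabulary,
                # falling back to a single character.
                while end > start + 1 and word[start:end] not in vocab:
                    end -= 1
                units.append(word[start:end])
                start = end
        return units
    raise ValueError(f"unknown level: {level}")


toy_vocab = {"un", "believ", "able", "result", "s"}  # hypothetical vocabulary
dna_units = to_units("GATTACA", "character")
word_units = to_units("unbelievable results", "word")
subword_units = to_units("unbelievable results", "subword", vocab=toy_vocab)
```

With this toy vocabulary, the word-level split yields two units, while the sub-word split yields five, so an alignment over sub-words can match shared pieces such as `result` even when the full words differ.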