“…These writing habits, together with the well-known OOV problems in NAMEX extraction, seriously lower the performance of the morphological analyzer. To resolve this problem, the proposed system uses a statistical model based on character n-grams, because character n-gram models are generally known to be a good solution to word boundary detection in languages with no spacing between words (Goh et al., 2003; Ha et al., 2004). To perform instance boundary detection and category assignment at the same time, we first defined nine labels that represent the boundaries of named instance candidates by adopting a 'begin, inner, and outer (BIO)' annotation scheme, as shown in Table 1 (Shen and Sarkar, 2005; Uchimoto et al., 2000).…”
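
The character-level BIO labeling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the category names (`PER`, `LOC`, `ORG`, `MISC`), the helper names, and the reading of "nine labels" as four categories times two boundary tags plus one outer tag are all assumptions made for illustration, since Table 1 is not reproduced here and the actual label set may differ. The sketch assigns one label per character, so boundary detection and category assignment happen in a single tagging step, and it extracts the character n-grams around a position as features a statistical model might use.

```python
from typing import List, Tuple

# Hypothetical categories; the label set in the source's Table 1 may differ.
CATEGORIES = ["PER", "LOC", "ORG", "MISC"]


def build_labels(categories: List[str]) -> List[str]:
    """Return the BIO label set: one 'O' plus a B- and I- label per category."""
    labels = ["O"]
    for cat in categories:
        labels += [f"B-{cat}", f"I-{cat}"]
    return labels


def bio_encode(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Assign one BIO label to each character of `text`.

    `spans` holds (start, end, category) with character offsets, end exclusive.
    """
    labels = ["O"] * len(text)
    for start, end, cat in spans:
        labels[start] = f"B-{cat}"          # first character of the instance
        for i in range(start + 1, end):
            labels[i] = f"I-{cat}"          # remaining characters of the instance
    return labels


def char_ngrams(text: str, i: int, n: int = 2, pad: str = "_") -> Tuple[str, str]:
    """Return the character n-gram ending at position i and the one starting at i."""
    padded = pad * (n - 1) + text + pad * (n - 1)
    ending_at_i = padded[i : i + n]
    starting_at_i = padded[i + n - 1 : i + 2 * n - 1]
    return ending_at_i, starting_at_i


if __name__ == "__main__":
    # Toy unspaced string standing in for text written without word spacing.
    text = "AnnvisitsRome"
    spans = [(0, 3, "PER"), (9, 13, "LOC")]
    for ch, label in zip(text, bio_encode(text, spans)):
        print(ch, label)
```

Under these assumptions, `build_labels(CATEGORIES)` yields nine labels, and each character's label together with its surrounding n-grams forms one training example for a sequence model.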