An evaluation to detect and correct erroneous characters wrongly substituted, deleted and inserted in Japanese and English sentences using Markov models

Araki, Tetsuo; Ikehara, Satoru; Tsukahara, Nobuyuki; Komatsu, Yasunori

doi:10.3115/991886.991918

Cited by 9 publications

(12 citation statements)

References 7 publications


“…Up to now, the methods to detect and correct erroneous characters wrongly substituted, deleted, or inserted at the inner position in Japanese sentences using mth‐order Markov chain model for Japanese ‘kanji‐kana’ characters, have been known to be useful to detect and correct these erroneous characters [11–18]. For an example, the value of the second‐order Markov probability for each character of the erroneous chain $\Gamma^{(2)}_S$ or $\Gamma^{(2)}_I$ remains smaller value than the critical value T just four times.…”

Section: A New Methods Of Error Detection Using Cmcp and Smcp (mentioning)

confidence: 99%

“…In order to solve this problem, by using the relation between the types of errors and the length of a chain in which the values of Markov joint probability remain small, a new method has been proposed to judge the three types of the errors, which are characters wrongly substituted, deleted, or inserted in Japanese sentences and ‘bunsetsu’s; to find the locations and the lengths of these erroneous characters; and to correct these errors in Japanese ‘kanji‐kana’ chains using mth‐order Markov chain model [11–18].…”

Section: Introduction (mentioning)

confidence: 99%


A revised method to detect erroneous characters wrongly substituted, deleted, and inserted at the end position in Japanese sentences and ‘bunsetsu’s

Araki

Mori

Taniguchi

2011

IEEJ Transactions Elec Engng


A method to detect the erroneous characters wrongly substituted, deleted, and inserted at the interior location of Japanese sentences and ‘bunsetsu’s using mth‐order Markov chain model has been proposed earlier and was found to be useful in detecting these erroneous characters. However, with this method it is difficult to detect erroneous characters at the end position of Japanese sentences and ‘bunsetsu’s, because the Markov chain probabilities of erroneous characters at the end position of sentences and ‘bunsetsu’s, do not remain smaller than the critical value T the same number of times. This paper proposes a method to detect erroneous characters located at the end position of sentences and ‘bunsetsu’s using the ‘skipped Markov chain model’ in addition to the ‘connected Markov chain model’. From experiments with newspaper articles, the proposed method is shown to be useful to correct erroneous characters located at the end position of sentences and ‘bunsetsu’s. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
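The thresholding idea described in this abstract and in the citing statements (Markov chain probabilities that stay below a critical value T for a number of consecutive positions) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `^` padding symbol, the add-one smoothing, the toy corpus and the threshold value are all assumptions made for illustration. The sketch covers only the ordinary (connected) second-order model. It does not implement the skipped Markov chain model and does not map run lengths to error types.

```python
from collections import Counter

PAD = "^"  # hypothetical start-of-sentence padding symbol (assumption)


def train_counts(corpus):
    """Count character trigrams and their two-character contexts."""
    tri, ctx = Counter(), Counter()
    for sentence in corpus:
        padded = PAD * 2 + sentence
        for i in range(2, len(padded)):
            ctx[padded[i - 2:i]] += 1
            tri[padded[i - 2:i + 1]] += 1
    return tri, ctx


def second_order_probs(sentence, tri, ctx, vocab_size, alpha=1.0):
    """Return P(c_i | c_{i-2}, c_{i-1}) for every character of the sentence.

    Add-alpha smoothing is an assumption of this sketch, used so that
    unseen trigrams get a small non-zero probability.
    """
    padded = PAD * 2 + sentence
    probs = []
    for i in range(2, len(padded)):
        num = tri[padded[i - 2:i + 1]] + alpha
        den = ctx[padded[i - 2:i]] + alpha * vocab_size
        probs.append(num / den)
    return probs


def low_probability_runs(probs, threshold):
    """Return (start_index, run_length) for each maximal run below threshold."""
    runs, start = [], None
    for i, p in enumerate(probs):
        if p < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(probs) - start))
    return runs


if __name__ == "__main__":
    # Toy English corpus and threshold, chosen only to make the example small.
    tri, ctx = train_counts(["the cat sat", "the cat ran"])
    probs = second_order_probs("the cxt sat", tri, ctx, vocab_size=10)
    print(low_probability_runs(probs, threshold=0.15))
```

The start index and length of each run are what a detector of this kind would inspect. How run lengths correspond to substitution, deletion or insertion errors is defined in the cited papers and is not reproduced here.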


“…An example of 2nd-order Markov chain models to skip one character, is shown in Fig.3. The precise definitions of the error types, the "Relevance Factor" P and the "Recall Factor" R are given in [2].…”

Section: Basic Definitions (mentioning)

confidence: 99%

Detection and correction of mutually interfered erroneous characters in Japanese texts

Araki

Ikehara

Komatsu

1999

Proceedings of the 1999 ACM Symposium on Applied Computing

Self Cite


“…Araki et al [1] tried to correct not only substitution errors but also insertion and deletion errors in Japanese text using character trigram statistics. In insertion errors, some wrong characters are inserted into the original text, and in deletion errors, some characters are lost from the original text.…”

Section: Introduction (mentioning)

confidence: 99%

“…Instead, it has been shown that character n-gram statistics are effective in detecting and correcting erroneous Japanese text [1,7,8].…”

Section: Introduction (mentioning)

confidence: 99%

Japanese OCR Error Correction Using Stochastic Morphological Analyzer and Probabilistic Word N-gram Model

Takeuchi

Matsumoto

2000

Int. J. Comp. Proc. Lang.


While the accuracy of current OCR systems is getting very high, they are still error-prone. In this paper, we clarify how much of recognition errors in text can be corrected using linguistic information from on-line texts. We present an OCR error correction method which uses character trigram, stochastic morphological analysis and word trigram models. These models are trained on a large untagged text. The proposed method does not use any graphical information about characters. Therefore the method can be applied to any domain that has a large on-line text corpus. When our method is applied to text which include random character substitution, it improves a text of 90% correct character rate into that of 94.3% correct rate and a 95% correct text into a 96.9% correct one.
