A Semi-supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort

Vajda, Szilárd; Junaidi, Akmal; Fink, Gernot A.

doi:10.1109/icdar.2011.60

Cited by 26 publications

(19 citation statements)

References 9 publications


“…For this experiment, we used the same dataset as described in [9], [10] but with a different composition. We did not work in a class-wise manner but rather document-wise.…”

Section: A Dataset (mentioning)

confidence: 99%


Statistical Modeling of the Relation between Characters and Diacritics in Lampung Script

Junaidi

Grzeszick

Fink

et al. 2013

2013 12th International Conference on Document Analysis and Recognition

Self Cite


Lampung Script is a non-cursive script where a rich set of diacritics is used to modify the syllable denoted by a character symbol. Consequently, the analysis of the relation between characters and the diacritic marks associated with them plays an important role in the recognition process. As diacritics can appear in three different relative positions with respect to a character (top, bottom, and right), associating them correctly with a character is a challenging problem. In this paper we propose a novel approach for modeling the relations between characters and diacritics in handwritten Lampung documents. First, a document is segmented into characters and diacritic marks. Then every character defines a normalized coordinate system into which nearby diacritics can be mapped. The relation between a diacritic mark and its associated character can then be described by a statistical model. In a writer-independent experimental evaluation we investigate models with different degrees of specialization with respect to their capability of predicting the correct character-to-diacritic associations. We achieve significant error rate reductions with respect to a naive association model using a nearest-neighbor criterion.
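The abstract above does not say how the per-character coordinate system is normalized or how the nearest-neighbor baseline measures distance. As a rough sketch only, the snippet below assumes axis-aligned bounding boxes, places the origin at the character's box center, scales by half the box width and height, and uses center-to-point Euclidean distance for the naive baseline. The function names `to_character_frame` and `nearest_character` are hypothetical and do not come from the paper.

```python
import math

def to_character_frame(char_box, point):
    """Map an image point into a character-centred, size-normalized frame.

    Assumption (not from the paper): char_box is (x_min, y_min, x_max, y_max)
    with non-zero width and height, and the frame is centred on the box.
    """
    x_min, y_min, x_max, y_max = char_box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    half_w, half_h = (x_max - x_min) / 2, (y_max - y_min) / 2
    px, py = point
    return ((px - cx) / half_w, (py - cy) / half_h)

def nearest_character(char_boxes, point):
    """Naive baseline: index of the character whose box centre is closest."""
    def distance(box):
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        return math.hypot(point[0] - cx, point[1] - cy)
    return min(range(len(char_boxes)), key=lambda i: distance(char_boxes[i]))

# Toy layout: two characters side by side, one diacritic to the right of the first.
char_boxes = [(0, 0, 10, 20), (30, 0, 40, 20)]
diacritic = (15, 10)
owner = nearest_character(char_boxes, diacritic)
relative = to_character_frame(char_boxes[owner], diacritic)
```

In a statistical model of the kind the abstract describes, normalized positions such as `relative` would be the observations being modeled, in place of raw pixel distances.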



“…In our previous work, particular research on the Lampung handwritten character recognition has been addressed for semiautomatic labeling [9] and recognition [10]. In the first work, we manually assigned labels to only 0.5% of the training data, the rest of the labels were inferred automatically by the proposed method.…”

Section: Related Work (mentioning)

confidence: 99%

Statistical Modeling of the Relation between Characters and Diacritics in Lampung Script

Junaidi

Grzeszick

Fink

et al. 2013

2013 12th International Conference on Document Analysis and Recognition

Self Cite


“…Our work therefore focuses on grouping the handwritten scripts into several clusters, and then labeling them manually. A similar offline handwriting annotation system, Vajda et al (2011), proposes the idea of labeling a large number of isolated characters by clustering them into several clusters of characters and labeling the clusters in order to reduce the human effort. This work shows that over 80% of the symbol labeling workload has been saved.…”

Section: Reducing Annotation Workload (mentioning)

confidence: 99%

An annotation assistance system using an unsupervised codebook composed of handwritten graphical multi-stroke symbols

Li

Mouchère

Viard-Gaudin

2014

Pattern Recognition Letters


“…Our preliminary work (Vajda et al, 2011; Richarz et al, 2014) proposed an analogous scheme, but using far fewer feature spaces and an unsupervised clustering mechanism that relied only on k-means. In this paper, we extended the number of feature spaces considered for unsupervised clustering, and the clustering methods.…”

Section: Related Work (mentioning)

confidence: 99%

“…In addition, they are evaluated at two levels: the clustering method performance, and the effect of this performance on the classification of the test data set using k-nn. Instead of limiting the input features to the pixel values of the raw images in gray level (Vajda et al, 2011), more sophisticated and lower dimensionality features such as profiles, local binary patterns (Pietikäinen et al, 2011), and Radon transform (Miciak, 2010; Cecotti and Vajda, 2013) were considered to better exploit the advantage of the original method (Vajda et al, 2011). Currently, each image is projected in five different feature spaces.…”

Section: Introduction (mentioning)

confidence: 99%

Semi-automatic ground truth generation using unsupervised clustering and limited manual labeling: Application to handwritten character recognition

Vajda

Rangoni

Cecotti

2015

Pattern Recognition Letters


For training supervised classifiers to recognize different patterns, large data collections with accurate labels are necessary. In this paper, we propose a generic, semi-automatic labeling technique for large handwritten character collections. In order to speed up the creation of a large-scale ground truth, the method combines unsupervised clustering and minimal expert knowledge. To exploit the potential discriminant complementarities across features, each character is projected into five different feature spaces. After clustering the images in each feature space, the human expert labels the cluster centers. Each data point inherits the label of its cluster’s center. A majority (or unanimity) vote decides the label of each character image. The amount of human involvement (labeling) is strictly controlled by the number of clusters produced by the chosen clustering approach. To test the efficiency of the proposed approach, we have compared and evaluated three state-of-the-art clustering methods (k-means, self-organizing maps, and growing neural gas) on the MNIST digit data set and a Lampung Indonesian character data set, respectively. Considering a k-nn classifier, we show that manually labeling only 1.3% (MNIST) and 3.2% (Lampung) of the training data provides the same range of performance as a completely labeled data set would.
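The label-inheritance and voting steps described in this abstract can be illustrated with a small sketch. It assumes clustering has already been run in each feature space, so every sample has a cluster index per space and the expert has labeled every cluster center. The toy data uses three feature spaces, not the five in the paper. The names `inherit_labels` and `vote` are hypothetical, and rejecting ties (returning `None`) is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter

def inherit_labels(assignments, center_labels):
    """Give each sample the expert label of the cluster it was assigned to."""
    return [center_labels[cluster] for cluster in assignments]

def vote(labels_per_space, unanimous=False):
    """Combine the per-feature-space labels of each sample into one label.

    Returns None for a sample when the vote is rejected: either unanimity
    was required and not reached, or two labels tie for first place
    (the tie rule is an assumption, not taken from the paper).
    """
    decided = []
    for votes in zip(*labels_per_space):
        ranked = Counter(votes).most_common()
        top_label, top_count = ranked[0]
        if unanimous and top_count != len(votes):
            decided.append(None)
        elif len(ranked) > 1 and ranked[1][1] == top_count:
            decided.append(None)
        else:
            decided.append(top_label)
    return decided

# Toy example: 4 samples, 3 feature spaces, 2 clusters per space.
assignments = [
    [0, 0, 1, 1],  # cluster index of each sample in feature space 0
    [1, 1, 0, 1],  # feature space 1
    [0, 0, 1, 0],  # feature space 2
]
center_labels = [
    {0: "a", 1: "b"},  # expert labels for the cluster centers of space 0
    {0: "b", 1: "a"},
    {0: "a", 1: "b"},
]
per_space = [inherit_labels(a, c) for a, c in zip(assignments, center_labels)]
majority = vote(per_space)                # ['a', 'a', 'b', 'a']
strict = vote(per_space, unanimous=True)  # ['a', 'a', 'b', None]
```

Only the entries of `center_labels` require human input, which is why the number of clusters bounds the manual effort. The unanimity vote labels fewer samples than the majority vote because it rejects any sample on which the feature spaces disagree.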

