Boosting over decision-stumps proved its efficiency in Natural Language Processing essentially with symbolic features, and its good properties (fast, few and not critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification processing, for the discrimination of handwritten/printed text. Then, we conducted experiments to compare it to usual SVM-based classification revealing convincing results with very close performance, but with faster predictions and behaving far less as a black-box. Those promising results tend to make use of this classifier in more complex recognition tasks like multiclass problems.
The document layout analysis is a complex task in the context of heterogeneous documents. It is still a challenging problem. In this paper, we present our contribution for the layout analysis competition of the international Maurdor Campaign. Our method is based on a grammatical description of the content of elements. It consists in iteratively finding and then removing the most structuring elements of documents. This method is based on notions of perceptive vision: a combination of points of view of the document, and the analysis of salient contents. Our description is generic enough to deal with a very wide range of heterogeneous documents. This method obtained the second place in Run 2 of Maurdor Campaign (on 1000 documents), and the best results in terms of pixel labeling for text blocs and graphic regions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.