We study bisimulation and context equivalence in a probabilistic λ-calculus. The contributions of this paper are threefold. Firstly we show a technique for proving congruence of probabilistic applicative bisimilarity. While the technique follows Howe's method, some of the technicalities are quite different, relying on non-trivial "disentangling" properties for sets of real numbers. Secondly we show that, while bisimilarity is in general strictly finer than context equivalence, coincidence between the two relations is attained on pure λ-terms. The resulting equality is that induced by Levy-Longo trees, generally accepted as the finest extensional equivalence on pure λ-terms under a lazy regime. Finally, we derive a coinductive characterisation of context equivalence on the whole probabilistic language, via an extension in which terms akin to distributions may appear in redex position. Another motivation for the extension is that its operational semantics allows us to experiment with a different congruence technique, namely that of logical bisimilarity.
This paper introduces a new way for text-line extraction by integrating deep-learning based pre-classification and state-of-the-art segmentation methods. Text-line extraction in complex handwritten documents poses a significant challenge, even to the most modern computer vision algorithms. Historical manuscripts are a particularly hard class of documents as they present several forms of noise, such as degradation, bleedthrough, interlinear glosses, and elaborated scripts. In this work, we propose a novel method which uses semantic segmentation at pixel level as intermediate task, followed by a text-line extraction step. We measured the performance of our method on a recent dataset of challenging medieval manuscripts and surpassed state-of-the-art results by reducing the error by 80.7%. Furthermore, we demonstrate the effectiveness of our approach on various other datasets written in different scripts. Hence, our contribution is two-fold. First, we demonstrate that semantic pixel segmentation can be used as strong denoising pre-processing step before performing text line extraction. Second, we introduce a novel, simple and robust algorithm that leverages the highquality semantic segmentation to achieve a text-line extraction performance of 99.42% line IU on a challenging dataset.
Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of humanannotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In the current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pretraining for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at pixel-level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.
In this paper, we present a novel approach for initializing deep neural networks, i.e., by turning Principal Component Analysis (PCA) into neural layers. Usually, the initialization of the weights of a deep neural network is done in one of the three following ways: 1) with random values, 2) layer-wise, usually as Deep Belief Network or as auto-encoder, and 3) re-use of layers from another network (transfer learning). Therefore, typically, many training epochs are needed before meaningful weights are learned, or a rather similar dataset is required for seeding a fine-tuning of transfer learning. In this paper, we describe how to turn a PCA into an auto-encoder, by generating an encoder layer of the PCA parameters and furthermore adding a decoding layer. We analyze the initialization technique on real documents. First, we show that a PCA-based initialization is quick and leads to a very stable initialization. Furthermore, for the task of layout analysis we investigate the effectiveness of PCAbased initialization and show that it outperforms state-of-the-art random weight initialization methods.
e point of this paper is to question typical assumptions in deep learning and suggest alternatives. A particular contribution is to prove that even if a Stacked Convolutional Auto-Encoder (SCAE) is good at reconstructing pictures, it is not necessarily good at discriminating their classes. When using Auto-Encoders, intuitively one assumes that features which are good for reconstruction will also lead to high classi cation accuracy. Indeed, it became research practice and is a suggested strategy by introductory books. However, we prove that this is not always the case. We thoroughly investigate the quality of features produced by SCAEs when trained to reconstruct their input. In particular, we analyze the relation between the reconstruction and classi cation capabilities of the network, if we were to use the same features for both tasks. Experimental results suggest that in fact, there is no correlation between the reconstruction score and the quality of features for a classi cation task. is means, more formally, that the sub-dimension representation space learned from the SCAE (while being trained for input reconstruction) is not necessarily be er separable than the initial input space. Furthermore, we show that the reconstruction error is not a good metric to assess the quality of features, because it is biased by the decoder quality. We do not question the usefulness of pre-training, but we conclude that aiming for the lowest reconstruction error is not necessarily a good idea if a erwards one performs a classi cation task.
We study bisimulation and context equivalence in a probabilistic lambda-calculus. The contributions of this paper are threefold. Firstly we show a technique for proving congruence of probabilistic applicative bisimilarity. While the technique follows Howe's method, some of the technicalities are quite different, relying on non-trivial "disentangling" properties for sets of real numbers. Secondly we show that, while bisimilarity is in general strictly finer than context equivalence, coincidence between the two relations is attained on pure lambda-terms. The resulting equality is that induced by Levy-Longo trees, generally accepted as the finest extensional equivalence on pure lambda-terms under a lazy regime. Finally, we derive a coinductive characterisation of context equivalence on the whole probabilistic language, via an extension in which terms akin to distributions may appear in redex position. Another motivation for the extension is that its operational semantics allows us to experiment with a different congruence technique, namely that of logical bisimilarity.
We introduce DeepDIVA: an infrastructure designed to enable quick and intuitive setup of reproducible experiments with a large range of useful analysis functionality. Reproducing scientific results can be a frustrating experience, not only in document image analysis but in machine learning in general. Using DeepDIVA a researcher can either reproduce a given experiment with a very limited amount of information or share their own experiments with others. Moreover, the framework offers a large range of functions, such as boilerplate code, keeping track of experiments, hyper-parameter optimization, and visualization of data and results. To demonstrate the effectiveness of this framework, this paper presents case studies in the area of handwritten document analysis where researchers benefit from the integrated functionality. DeepDIVA is implemented in Python and uses the deep learning framework PyTorch. It is completely open source 1 , and accessible as Web Service through DIVAServices 2 .
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.