Nowadays, there is growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or the conversion of semi-structured data (e.g. XML documents). Many applications in these domains require computing similarities between pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years, with the aim of improving its computational efficiency. However, in its classical form, the tree ED requires edit costs fixed a priori, which are often difficult to tune and leave little room for tackling complex problems. In this paper, to overcome this drawback, we focus on automatically learning a non-parametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, thereby providing a discriminative model. To tackle these tasks, we present an adaptation of the Expectation-Maximization algorithm for learning these distributions over the primitive edit costs. Two experiments are conducted. The first, on artificial data, confirms the benefit of learning a tree ED rather than imposing edit costs a priori; the second is applied to a pattern recognition task that aims to classify handwritten digits.
⋆ This work is part of the ongoing ARA Marmota research project.
Email addresses: marc.bernard@univ-st-etienne.fr (Marc Bernard), laurent.boyer@univ-st-etienne.fr (Laurent Boyer), amaury.habrard@lif.univ-mrs.fr (Amaury Habrard), marc.sebban@univ-st-etienne.fr (Marc Sebban).
Preprint submitted to Elsevier, 30 October 2007.
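The generative and discriminative approaches differ mainly in how the expected counts of edit operations are normalized in the M-step of EM. The following minimal sketch assumes those expected counts are already available. The E-step over pairs of trees, which is the tree-specific part, is not shown, and the names `m_step_joint`, `m_step_conditional`, `to_costs` and `EMPTY` are hypothetical. The joint version normalizes over all operations at once. The conditional version is a simplified per-input normalization, and neither includes the termination probabilities that a complete stochastic model would also need. Edit costs are then derived as negative log-probabilities.

```python
import math
from collections import defaultdict

# Hypothetical marker for the empty label: (a, EMPTY) is a deletion of a,
# (EMPTY, b) an insertion of b, and (a, b) a substitution of a by b.
EMPTY = ""


def m_step_joint(counts):
    """Generative model: one joint distribution p(a, b) over all edit operations."""
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}


def m_step_conditional(counts):
    """Simplified discriminative model: p(b | a), normalized per input label a.

    Insertions are treated as outputs of the empty input label EMPTY.
    """
    per_input = defaultdict(float)
    for (a, _), c in counts.items():
        per_input[a] += c
    return {(a, b): c / per_input[a] for (a, b), c in counts.items()}


def to_costs(probs):
    """Turn learned probabilities into edit costs: cost = -log p."""
    return {op: -math.log(p) for op, p in probs.items()}


# Toy expected counts, as an E-step might produce them (illustrative numbers only).
counts = {("a", "a"): 6.0, ("a", "b"): 2.0, ("a", EMPTY): 2.0,
          (EMPTY, "b"): 2.0, ("b", "b"): 8.0}
joint = m_step_joint(counts)              # e.g. p("b", "b") = 8 / 20
conditional = m_step_conditional(counts)  # e.g. p("a" | "a") = 6 / 10
costs = to_costs(joint)                   # frequent operations become cheap
```

Under the joint model, frequent operations receive low costs relative to all others. Under the conditional model, costs are compared only among the possible outputs of the same input label.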
This paper, based on a cross-sectional empirical study of information system (IS) architectures within 143 small to medium enterprises (SMEs) in France, reports findings on how SMEs design their architectures to achieve IS integration and interoperability. The research provides an empirically derived taxonomy of the enterprise architecture variants often described in the literature for large firms. The study finds indications that, for SMEs, the immediate goal of interoperability prevailed over fuller and more formal system integration. The most common way of approaching enterprise architecture and any form of integration is the construction of software bridges and interfaces. Partially standardized architectures based on Enterprise Systems (ERP) are the next most common type. Hybrid architectures (mixing Enterprise Application Integration and ERP) are the third most common. The contribution of this paper lies not in identifying the three types but (1) in describing their distribution in SMEs; (2) in the observed absence of other integration/interoperability types in this population; and (3), most importantly, in interpreting the organizational and historical rationale that explains the emergence of these types in this organizational context.
Abstract. Trees provide a suitable structural representation for complex tasks such as web information extraction, RNA secondary structure prediction, or the conversion of tree-structured documents. In this context, many applications require computing similarities between pairs of trees. The most studied such distance is probably the tree edit distance (ED), whose computational complexity has been improved during the last decade. However, this classic ED usually relies on edit costs fixed a priori, which are often difficult to tune and leave little room for tackling complex problems. In this paper, we focus on learning a stochastic tree ED. We use an adaptation of the Expectation-Maximization algorithm to learn the primitive edit costs. We carried out a series of experiments that confirm the benefit of learning a tree ED rather than imposing edit costs a priori.
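For reference, the classic setting with edit costs fixed a priori can be sketched with Zhang and Shasha's dynamic-programming algorithm. This is a standard tree ED algorithm and not necessarily the one used in this work. In this sketch the delete, insert and rename costs are plain function parameters. Hand-tuned unit costs can therefore be swapped for learned ones, for instance negative log-probabilities. The names `Node`, `tree_edit_distance` and the `unit_*` cost functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def _index(root):
    """Post-order labels, leftmost-leaf index of each node, and the keyroots."""
    labels, leftmost = [], []

    def visit(node):
        first_leaf = None
        for child in node.children:
            leaf = visit(child)
            if first_leaf is None:
                first_leaf = leaf
        labels.append(node.label)
        idx = len(labels) - 1
        leftmost.append(idx if first_leaf is None else first_leaf)
        return leftmost[idx]

    visit(root)
    # A keyroot is the highest-numbered node sharing a given leftmost leaf.
    last = {leaf: i for i, leaf in enumerate(leftmost)}
    return labels, leftmost, sorted(last.values())


def tree_edit_distance(t1: Node, t2: Node,
                       delete: Callable[[str], float],
                       insert: Callable[[str], float],
                       rename: Callable[[str, str], float]) -> float:
    """Zhang-Shasha tree edit distance with caller-supplied edit costs."""
    lab1, l1, keys1 = _index(t1)
    lab2, l2, keys2 = _index(t2)
    td = [[0.0] * len(lab2) for _ in lab1]  # td[a][b]: distance between subtrees a and b

    for i in keys1:
        for j in keys2:
            li, lj = l1[i], l2[j]
            # fd[x][y]: distance between post-order forests li..li+x-1 and lj..lj+y-1
            fd = [[0.0] * (j - lj + 2) for _ in range(i - li + 2)]
            for x in range(1, i - li + 2):
                fd[x][0] = fd[x - 1][0] + delete(lab1[li + x - 1])
            for y in range(1, j - lj + 2):
                fd[0][y] = fd[0][y - 1] + insert(lab2[lj + y - 1])
            for x in range(1, i - li + 2):
                for y in range(1, j - lj + 2):
                    a, b = li + x - 1, lj + y - 1
                    drop = fd[x - 1][y] + delete(lab1[a])
                    add = fd[x][y - 1] + insert(lab2[b])
                    if l1[a] == li and l2[b] == lj:
                        # The prefixes are exactly the subtrees rooted at a and b.
                        fd[x][y] = min(drop, add, fd[x - 1][y - 1] + rename(lab1[a], lab2[b]))
                        td[a][b] = fd[x][y]
                    else:
                        # Reuse the previously computed distance between subtrees a and b.
                        fd[x][y] = min(drop, add, fd[l1[a] - li][l2[b] - lj] + td[a][b])
    return td[-1][-1]


def unit_delete(label):
    return 1.0


def unit_insert(label):
    return 1.0


def unit_rename(a, b):
    return 0.0 if a == b else 1.0


t1 = Node("a", [Node("b"), Node("c")])
t2 = Node("a", [Node("b"), Node("d")])
print(tree_edit_distance(t1, t2, unit_delete, unit_insert, unit_rename))  # 1.0
```

Because the costs are arguments rather than constants, the same routine serves both the a-priori setting and a setting where costs come from a learned distribution.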
The goal of passive source localization is to detect noise-emitting objects acoustically, using multiple sensors (e.g. microphones, hydrophones), and to estimate their positions from the sound information alone. Over the last four decades, much work has addressed how best to measure the time differences of arrival (TDOAs) and how to find an optimal location estimator, but relatively little work addresses how best to place the sensors. Yet the performance of such estimators depends strongly on the sensor configuration. We therefore propose a procedure for finding an optimal sensor setup that minimizes the condition numbers of an analytic linear least-squares (LLS) estimator and of an iterative, linearized model (LM) estimator. An advantage of using the condition number as the cost function is that, unlike the Cramér-Rao lower bound, it defines an upper bound on the estimation error. Furthermore, no assumptions about the disturbance noise need to be made, and the resulting robust sensor configuration is invariant to rotation and dilation. The condition numbers of the two passive source localization algorithms presented are independent of the number of sensors. However, it will be shown that the estimation error decreases in proportion to the inverse of the square root of the number of sensors. Some analytical forms of optimal sensor configurations will be derived, which either attain the global minimum of the condition number of the LLS estimator or minimize the condition number of the LM estimator. Furthermore, a sensor geometry using a minimum number of sensors is derived, which forces the condition numbers of both estimators to equal one. The interest of such a setup lies in the possibility of combining both estimators: the LM estimator can then be initialized with the position estimate found by the LLS estimator. A variety of alternative estimators are closely related to the LLS estimator.
Their performance will be compared, and it will be shown that the optimal sensor geometry derived specifically for the LLS estimator also increases their accuracy.
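The abstract does not give the matrix of the LLS estimator, so the sketch below uses a hypothetical, simplified stand-in. In 2-D, it takes a matrix whose rows are the sensor positions relative to a reference sensor, which is the geometry-dependent part of a common linearized TDOA formulation. It computes the 2-norm condition number from the eigenvalues of AᵀA and illustrates two properties mentioned above: invariance to rotation and dilation, and a condition number of one, obtained here for sensors evenly spaced on a circle around the reference. It is an illustration under these assumptions, not the optimal configurations derived in the paper. All function names are hypothetical.

```python
import math


def geometry_matrix(sensors, ref=0):
    """Rows s_i - s_ref for every sensor i != ref (simplified stand-in matrix A)."""
    x0, y0 = sensors[ref]
    return [(x - x0, y - y0) for i, (x, y) in enumerate(sensors) if i != ref]


def condition_number(rows):
    """2-norm condition number of an n x 2 matrix, via the eigenvalues of A^T A."""
    a = sum(u * u for u, _ in rows)
    b = sum(u * v for u, v in rows)
    c = sum(v * v for _, v in rows)
    mean, half_gap = (a + c) / 2, math.hypot((a - c) / 2, b)
    lam_max, lam_min = mean + half_gap, mean - half_gap
    if lam_min <= 1e-12 * lam_max:
        return math.inf  # rank-deficient geometry, e.g. collinear sensors
    # Singular values of A are the square roots of the eigenvalues of A^T A.
    return math.sqrt(lam_max / lam_min)


def circle_layout(n, radius=1.0):
    """Reference sensor at the origin plus n sensors evenly spaced on a circle."""
    return [(0.0, 0.0)] + [(radius * math.cos(2 * math.pi * k / n),
                            radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def rotate_and_scale(sensors, angle, scale):
    """Apply a rotation by `angle` (radians) and a dilation by `scale`."""
    cs, sn = math.cos(angle), math.sin(angle)
    return [(scale * (cs * x - sn * y), scale * (sn * x + cs * y)) for x, y in sensors]


irregular = [(0.0, 0.0), (1.0, 0.0), (0.3, 2.0), (-1.0, 0.5)]
print(condition_number(geometry_matrix(circle_layout(4))))  # approximately 1.0
print(condition_number(geometry_matrix(irregular)))
print(condition_number(geometry_matrix(rotate_and_scale(irregular, 0.7, 3.0))))
```

For this stand-in matrix, evenly spaced sensors give AᵀA = (n/2)·I for any n ≥ 3, so the condition number stays at one regardless of how many sensors are used. Rotating or dilating any configuration leaves it unchanged.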