the document classification is one of the classical task of information retrieval and it has involved numerous studies. In this paper, we are presenting a learning model for XML document classification based on Bayesian networks. This latter is a probabilistical reasoning formalism. It permits to represent depending relationships between the random variables in order to describe a problem or a phenomenon. In this article, we are proposing a model which simplifies the arborescent representation of the XML document that we have, named coupled model and we will see that this approach improves the response time and keeps the same performances of the classification.
International audience
In this paper, we are presenting a learning model for XML document classification based on Bayesian networks. Then, we are proposing a model which simplifies the arborescent representation of the XML document that we have, named coupled model and we will see that this approach improves the response time and keeps the same performances of the classification. Then, we will study an extension of this generative model to the discriminating model thanks to the formalism of the Fisher’s kernel. At last, we have applied a ponderation of the structure components of the Fisher’s vector. We finish by presenting the obtained results on the XML collection by using the CBS and SVM methods
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.