We propose a new statistical model for the classification of structured documents and consider its use for multimedia document classification. Its main originality is its ability to simultaneously take into account the structural and the content information present in a structured document, and also to cope with different types of content (text, image, etc). We present experiments on the classification of multilingual pornographic HTML pages using text and image data. The system accurately classifies porn sites from 8 European languages. This corpus has been developed by EADS company in the context of a large Web site filtering application.
This paper describes a work currently in progress whose aim is to design. develop and evaluate a Multi-Agent Framework for Data Fusion (DFMAF). This is being done with the support of a battlefield surveillance demonstrator application, named TA-b [3.61. Through the following chapters. we will describe the benefits of using such a framework for data fusion problems. Firstly. we will briefly present the multi-agent research domain. Then, we will go into further details to describe DFMAF. the multi-agent framework designed to help solving data fusion problems. The appropriateness of DFMAF to data fusion problems will also he pointed out. Next, the implementation and use of DFMAF in the support application will he detailed as well as the assessment procedure followed. Finally, we will conclude and expose the future work which will he done.
Purpose
Most of the existing literature on online social networks (OSNs) either focuses on community detection in graphs without considering the topic of the messages exchanged, or concentrates exclusively on the messages without taking into account the social links. The purpose of this paper is to characterise the semantic cohesion of such groups through the introduction of new measures.
Design/methodology/approach
A theoretical model for social links and salient topics on Twitter is proposed. Also, measures to evaluate the topical cohesiveness of a group are introduced. Inspired from precision and recall, the proposed measures, called expertise and representativeness, assess how a set of groups match the topic distribution. An adapted measure is also introduced when a topic similarity can be computed. Finally, a topic relevance measure is defined, similar to tf.idf (term-frequency, inverse document frequency).
Findings
The measures yield interesting results, notably on a large tweet corpus: the metrics accurately describe the topics discussed in the tweets and enable to identify topic-focused groups. Combined with topological measures, they provide a global and concise view of the detected groups.
Originality/value
Many algorithms, applied on OSN, detect communities which often lack of meaning and internal semantic cohesion. This paper is among the first to quantify this aspect, and more precisely the topical cohesion and topical relevance of a group. Moreover, the proposed indicators can be exploited for social media monitoring, to investigate the impact of a group of people: for instance, they could be used for journalism, marketing and security purposes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.