Spammers use a wide range of content generation techniques to produce low-quality pages, known as content spam, to achieve their goals. We argue that content spam must be tackled using a wide range of content quality features. In this paper, we propose novel sentence-level diversity features based on a probabilistic topic model. We combine them with other content features to build a content spam classifier. Our experiments show that our method outperforms conventional methods.
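As a rough illustration of what a sentence-level diversity feature could look like, the sketch below scores a page by the mean pairwise Jensen-Shannon divergence between per-sentence topic distributions. The abstract does not specify the actual features, so the choice of divergence, the function names, and the assumption that a topic model has already produced one topic distribution per sentence are all hypothetical.

```python
import math
from itertools import combinations


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2


def sentence_diversity(topic_dists):
    """Mean pairwise JS divergence over per-sentence topic distributions.

    `topic_dists` is assumed to hold one topic distribution per sentence,
    inferred beforehand by some probabilistic topic model (hypothetical input).
    """
    pairs = list(combinations(topic_dists, 2))
    if not pairs:
        return 0.0  # zero or one sentence: no diversity to measure
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)
```

A score like this could be one column in the feature vector passed to a classifier alongside other content features. Whether low or high diversity signals spam would have to be learned from labeled data.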
Database management systems for multimedia data retrieval are becoming more important as digital videos and cameras increase in popularity. An important feature of multimedia data retrieval is that users rarely specify their first queries exactly; they must clarify what they want by browsing the query results and refining their query by trial and error. It is therefore desirable for a multimedia database management system (DBMS) to produce a rough query result quickly and refine it over time. This paper describes a hierarchical space model for multimedia data retrieval that is similar to the human memory hierarchy. The aim of the hierarchical space model is to improve the performance of similarity retrieval with little loss in query result quality. We implemented the hierarchical space model on the ORDBMS LiteObject and applied it to an image retrieval application. The results of this test demonstrated the efficiency of our hierarchical space model.
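The idea of returning a rough result quickly and refining it afterwards can be sketched as a generic coarse-to-fine similarity search. This is not the paper's hierarchical space model, whose details the abstract does not give. The averaging-based `coarsen` step, the function names, and the parameters `factor`, `shortlist`, and `k` are assumptions made for illustration.

```python
def coarsen(vec, factor):
    """Average consecutive groups of `factor` components into a shorter vector."""
    groups = [vec[i:i + factor] for i in range(0, len(vec), factor)]
    return [sum(g) / len(g) for g in groups]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def progressive_search(query, items, factor=2, shortlist=3, k=1):
    """Yield a rough top-k answer first, then a refined top-k answer.

    `items` maps a name to its full feature vector. A real system would
    precompute and store the coarse vectors instead of rebuilding them here.
    """
    coarse_q = coarsen(query, factor)
    # Pass 1: rank everything using only the cheap low-resolution vectors.
    rough = sorted(
        items, key=lambda name: sq_dist(coarse_q, coarsen(items[name], factor))
    )[:shortlist]
    yield rough[:k]
    # Pass 2: re-rank only the shortlist using the full-resolution vectors.
    refined = sorted(rough, key=lambda name: sq_dist(query, items[name]))
    yield refined[:k]
```

Because the function is a generator, a caller can display the first yielded list immediately and replace it when the second arrives. The rough answer can differ from the refined one, which is the trade-off between speed and result quality that the abstract describes.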