The recently initiated and rapidly developing research field of 'human interactive proofs' (HIPs) and its implications for the document image analysis (DIA) research field are described. Over the last five years, efforts to defend Web services against abuse by programs ('bots') have led to a new family of security protocols able to distinguish between human and machine users. AltaVista pioneered this technology in 1997 [Bro01, LBBB01]. By the summer of 2000, Yahoo! and PayPal were using similar methods. In the Fall of 2000, Prof. Manuel Blum of Carnegie-Mellon University and his team, stimulated by Udi Manber of Yahoo!, were studying these and related problems [BAL00]. Soon thereafter a collaboration between the University of California at Berkeley and the Palo Alto Research Center (PARC) built a tool based on systematically generated image degradations [CBF01]. In January 2002, Prof. Blum and the present authors ran the first workshop (at PARC) on HIPs, defined broadly as a class of challenge/response protocols which allow a human to authenticate herself as a member of a given group-e.g. human (vs. machine), herself (vs. anyone else), an adult (vs. a child), etc. All commercial uses of HIPs known to us exploit the gap in ability between human and machine vision systems in reading images of machine printed text. Many technical issues that have been systematically studied by the DIA community are relevant to the HIP research program. This paper describes the evolution of HIP R&D, applications of HIPs now and on the horizon, highlights of the first HIP workshop, and proposals for a DIA research agenda to advance the state of the art of HIPs.
We develop, analyze, and apply a specific form of mixture modeling for density estimation within the context of image and texture processing. The technique captures much of the higher order, nonlinear statistical relationships present among vector elements by combining aspects of kernel estimation and cluster analysis. Experimental results are presented in the following applications: image restoration, image and texture compression, and texture classification.
Documents are commonly categorized into hierarchies of topics, such as the ones maintained by Yahoo! and the Open Directory project, in order to facilitate browsing and other interactive forms of information retrieval. In addition, topic hierarchies can be utilized to overcome the sparseness problem in text categorization with a large number of categories, which is the main focus of this paper. This paper presents a hierarchical mixture model which extends the standard naive Bayes classifier and previous hierarchical approaches. Improved estimates of the term distributions are made by differentiation of words in the hierarchy according to their level of generality/specificity. Experiments on the Newsgroups and the Reuters-21578 dataset indicate improved performance of the proposed classifier in comparison to other state-of-the-art methods on datasets with a small number of positive examples.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.