The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains to be a very challenging problem. In this paper, we emphasise this issue and propose a novel SelfTaught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the l-bit code for any query document unseen before. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) outperforms stateof-the-art techniques significantly.
Abstract.A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes within a short Hamming distance. In this paper, we introduce the novel problem of co-hashing where both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore, we propose a novel algorithm Laplacian Co-Hashing (LCH) to solve this problem which directly optimises the Hamming distance.
Abstract. In this paper, we propose a novel scheme for cell nucleus segmentation which is multi-scale space level set method. Under this scheme, all nuclei of interest in a microscopic image can be segmented simultaneously. The procedure includes three stages. Firstly, the mathematical morphology method is used to search seed points to localize interested nuclei. Secondly, based on the distribution of these seed points, a level set function is initialized. Finally, the level set function evolves and eventually stops zero level set contours at the boundaries of nuclei labeled by seed points. The evolution in the last stage is a three phase evolution. In each phase, information of different scale spaces is employed. This method was tested by truthful microscope images of lymphocyte, which proved its robustness and efficiency.
Abstract.We address the problem of online term recurrence prediction: for a stream of terms, at each time point predict what term is going to recur next in the stream given the term occurrence history so far. It has many applications, for example, in Web search and social tagging. In this paper, we propose a time-sensitive language modelling approach to this problem that effectively combines term frequency and term recency information, and describe how this approach can be implemented efficiently by an online learning algorithm. Our experiments on a real-world Web query log dataset show significant improvements over standard language modelling.
Abstract. The research on computational advertising so far has focused on finding the single best ad. However, in many real situations, more than one ad can be presented. Although it is possible to address this problem myopically by using a single-ad optimisation technique in serial-mode, i.e., one at a time, this approach can be ineffective and inefficient because it ignores the correlation between ads. In this paper, we make a leap forward to address the problem of finding the best ads in batch-mode, i.e., assembling the optimal set of ads to be presented altogether. The key idea is to achieve maximum revenue while controlling the level of risk by diversifying the set of ads. We show how the Modern Portfolio Theory can be applied to this problem to provide elegant solutions and deep insights.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.