Jinsong Lu scite author profile

Self-Taught Hashing for Fast Similarity Search

Abstract. The ability to perform fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes (within a short Hamming distance). Although some recently proposed techniques are able to generate high-quality codes for documents known in advance, obtaining the codes for previously unseen documents remains a very challenging problem. In this paper, we emphasise this issue and propose a novel Self-Taught Hashing (STH) approach to semantic hashing: we first find the optimal l-bit binary codes for all documents in the given corpus via unsupervised learning, and then train l classifiers via supervised learning to predict the l-bit code for any previously unseen query document. Our experiments on three real-world text datasets show that the proposed approach using binarised Laplacian Eigenmap (LapEig) and linear Support Vector Machine (SVM) significantly outperforms state-of-the-art techniques.
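The two-stage idea above (unsupervised codes for the corpus, then one classifier per bit for unseen queries) can be sketched in Python with NumPy. This is a minimal sketch under assumptions the abstract does not state: the document graph is a symmetric k-nearest-neighbour graph with cosine weights, each Laplacian eigenvector is binarised at its median, and the linear SVM is replaced by a small Pegasos-style hinge-loss solver. The helper names (`knn_affinity`, `binarised_lapeig`, `train_linear_svm`, `self_taught_hashing`) are hypothetical.

```python
import numpy as np


def knn_affinity(X, k):
    """Symmetric k-NN graph weighted by cosine similarity (assumed construction)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)             # exclude self-matches
    nn = np.argsort(-S, axis=1)[:, :k]       # k most similar documents per row
    rows = np.repeat(np.arange(len(X)), k)
    W = np.zeros_like(S)
    W[rows, nn.ravel()] = np.clip(S[rows, nn.ravel()], 0.0, None)
    return np.maximum(W, W.T)                # symmetrise


def binarised_lapeig(W, l):
    """Solve L y = lambda D y via the normalised Laplacian, then binarise at the median."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * vecs[:, 1:l + 1]  # drop the trivial eigenvector
    return (Y > np.median(Y, axis=0)).astype(int)


def train_linear_svm(X, labels01, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (stand-in for an SVM solver)."""
    rng = np.random.default_rng(seed)
    y = 2 * labels01 - 1                     # {0, 1} -> {-1, +1}
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1  # margin check uses the current w
            w *= 1.0 - eta * lam
            if violated:
                w += eta * y[i] * Xb[i]
    return w


def self_taught_hashing(X, l, k):
    """Stage 1: unsupervised codes for the corpus; stage 2: one linear classifier per bit."""
    codes = binarised_lapeig(knn_affinity(X, k), l)
    weights = [train_linear_svm(X, codes[:, j]) for j in range(l)]

    def encode(q):
        qb = np.append(q, 1.0)
        return np.array([int(qb @ w > 0) for w in weights])

    return codes, encode
```

`self_taught_hashing(X, l, k)` returns the corpus codes plus an `encode` function for unseen queries; a production setup would use a dedicated SVM library instead of the toy solver.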


Laplacian Co-hashing of Terms and Documents

Zhang, Wang, Lu (2010)


Abstract. A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes within a short Hamming distance. In this paper, we introduce the novel problem of co-hashing, where both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore, we propose a novel algorithm, Laplacian Co-Hashing (LCH), to solve this problem, which directly optimises the Hamming distance.
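The abstract does not give the LCH objective, so the sketch below is not LCH itself. It shows a simpler related baseline for co-hashing: a spectral relaxation on the bipartite term-document graph (in the style of bipartite spectral co-clustering), where terms and documents are embedded in one space via an SVD of the degree-normalised matrix and then binarised at a shared median, so term codes and document codes are comparable in Hamming space. The function name `spectral_cohash` and the median threshold are assumptions.

```python
import numpy as np


def spectral_cohash(A, l):
    """Spectral co-hashing baseline on a non-negative terms x documents matrix A.

    Returns (term_codes, doc_codes), each with l bits per row.
    """
    d_terms = A.sum(axis=1)
    d_docs = A.sum(axis=0)
    # Degree-normalised matrix D1^{-1/2} A D2^{-1/2} of the bipartite graph.
    An = A / np.sqrt(d_terms)[:, None] / np.sqrt(d_docs)[None, :]
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    # Skip the leading (trivial) singular pair, keep the next l, undo the degree scaling.
    Z_terms = U[:, 1:l + 1] / np.sqrt(d_terms)[:, None]
    Z_docs = Vt[1:l + 1].T / np.sqrt(d_docs)[:, None]
    Z = np.vstack([Z_terms, Z_docs])
    # One shared median per bit so terms and documents live in the same Hamming space.
    bits = (Z > np.median(Z, axis=0)).astype(int)
    return bits[:len(d_terms)], bits[len(d_terms):]
```

Terms and documents that co-occur heavily end up with the same bits, which is the property co-hashing asks for; unlike LCH as described above, this relaxation does not optimise the Hamming distance directly.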


Robust Automatic Segmentation of Cell Nucleus Using Multi-scale Space Level Set Method

Duan, Bao, et al.


Abstract. In this paper, we propose a novel scheme for cell nucleus segmentation: a multi-scale space level set method. Under this scheme, all nuclei of interest in a microscopic image can be segmented simultaneously. The procedure has three stages. First, mathematical morphology is used to search for seed points that localize the nuclei of interest. Second, a level set function is initialized based on the distribution of these seed points. Finally, the level set function evolves until its zero level set contours stop at the boundaries of the nuclei labeled by the seed points. The evolution in this last stage has three phases, and in each phase information from a different scale space is employed. The method was tested on real microscope images of lymphocytes, which demonstrated its robustness and efficiency.
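The first two stages (morphological seed search and level set initialization) can be illustrated with NumPy. This sketch assumes the nuclei appear bright, uses regional maxima of a 3x3 grey-scale dilation as a stand-in for the paper's morphology step, and initializes the level set as a signed distance to small circles around the seeds (negative inside). The multi-scale, three-phase evolution is not shown; `find_seeds`, `init_level_set`, `thresh` and `radius` are hypothetical names.

```python
import numpy as np


def grey_dilate3(img):
    """3x3 grey-scale dilation (max filter) with edge padding."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    shifted = [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)


def find_seeds(img, thresh):
    """Seed points = regional maxima above `thresh` (assumes bright nuclei)."""
    mask = (img == grey_dilate3(img)) & (img > thresh)
    return [(int(r), int(c)) for r, c in zip(*np.nonzero(mask))]


def init_level_set(shape, seeds, radius):
    """Signed distance to circles of `radius` around each seed: phi < 0 inside."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    dists = np.stack([np.hypot(yy - r, xx - c) for r, c in seeds])
    return dists.min(axis=0) - radius
```

Because one function holds all circles, every labeled nucleus starts from its own zero-level contour, which matches the idea of segmenting all nuclei simultaneously. Plateaus in a real image would yield several adjacent seeds, so a practical version would merge them.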


Time-Sensitive Language Modelling for Online Term Recurrence Prediction

Zhang, Mao, et al. (2009)


Abstract. We address the problem of online term recurrence prediction: for a stream of terms, at each time point, predict which term is going to recur next in the stream given the term occurrence history so far. This problem has many applications, for example in Web search and social tagging. In this paper, we propose a time-sensitive language modelling approach to this problem that effectively combines term frequency and term recency information, and describe how this approach can be implemented efficiently by an online learning algorithm. Our experiments on a real-world Web query log dataset show significant improvements over standard language modelling.
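One simple way to combine frequency and recency in an online model is to mix a maximum-likelihood frequency estimate with an exponentially decayed count, where decay is applied lazily so each observation costs O(1). This is an assumed stand-in, not the paper's model; the class name `TimeSensitiveTermModel`, the mixing weight `mix` and the per-step `decay` factor are hypothetical.

```python
from collections import defaultdict


class TimeSensitiveTermModel:
    """Mixture of a frequency model and an exponentially decayed recency model (assumed form)."""

    def __init__(self, mix=0.5, decay=0.9):
        self.mix = mix          # weight of the frequency component
        self.decay = decay      # per-step decay of the recency component
        self.t = 0              # number of terms observed so far
        self.counts = defaultdict(int)
        self.recency = {}       # term -> (decayed score, time of last update)
        self.total_recency = 0.0

    def observe(self, term):
        """O(1) online update: decay the term's score lazily, then add one occurrence."""
        self.t += 1
        self.counts[term] += 1
        score, last = self.recency.get(term, (0.0, self.t))
        self.recency[term] = (score * self.decay ** (self.t - last) + 1.0, self.t)
        # Every term decays by one step and exactly one occurrence is added.
        self.total_recency = self.total_recency * self.decay + 1.0

    def prob(self, term):
        if self.t == 0:
            return 0.0
        p_freq = self.counts.get(term, 0) / self.t
        score, last = self.recency.get(term, (0.0, self.t))
        p_rec = score * self.decay ** (self.t - last) / self.total_recency
        return self.mix * p_freq + (1.0 - self.mix) * p_rec

    def predict(self):
        """Most probable recurring term (a linear scan over the vocabulary)."""
        return max(self.counts, key=self.prob) if self.counts else None
```

For the stream `a, a, a, b` with `decay=0.5`, the frequency component favours `a` while the recency component favours `b`; `mix` controls which signal wins.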


Batch-Mode Computational Advertising Based on Modern Portfolio Theory

Zhang (2009)


Abstract. Research on computational advertising has so far focused on finding the single best ad. However, in many real situations, more than one ad can be presented. Although it is possible to address this problem myopically by applying a single-ad optimisation technique in serial mode, i.e., one ad at a time, this approach can be ineffective and inefficient because it ignores the correlation between ads. In this paper, we make a leap forward and address the problem of finding the best ads in batch mode, i.e., assembling the optimal set of ads to be presented together. The key idea is to achieve maximum revenue while controlling the level of risk by diversifying the set of ads. We show how Modern Portfolio Theory can be applied to this problem to provide elegant solutions and deep insights.
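A mean-variance view of batch ad selection can be sketched as follows: each ad has an expected revenue, the ads' revenues have a covariance matrix, and a candidate set is scored by its total expected revenue minus a risk-aversion weight times its revenue variance. The exhaustive search over k-subsets, the equal weighting of ads, and the names `select_ad_batch` and `risk_aversion` are assumptions for illustration; the paper's actual formulation may differ.

```python
import itertools

import numpy as np


def select_ad_batch(mu, sigma, k, risk_aversion):
    """Pick k ads maximising expected revenue - risk_aversion * variance (exhaustive search)."""
    best_set, best_score = None, -np.inf
    for subset in itertools.combinations(range(len(mu)), k):
        idx = list(subset)
        mean = mu[idx].sum()                  # expected revenue of the set
        var = sigma[np.ix_(idx, idx)].sum()   # variance includes pairwise covariances
        score = mean - risk_aversion * var
        if score > best_score:
            best_set, best_score = subset, score
    return best_set, best_score
```

With `risk_aversion=0` the two highest-revenue ads win even if their revenues are perfectly correlated; a positive value favours a diversified set. Exhaustive search is only practical for small candidate pools.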
