Cam-Tu Nguyen scite author profile

This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to the lack of shared words and contexts among documents while the latter are big linguistic obstacles in natural language processing (NLP) and information retrieval (IR). The underlying idea of the framework is that common hidden topics discovered from large external data sets (universal data sets), when included, can make short documents less sparse and more topic-oriented. Furthermore, hidden topics from universal data sets help handle unseen data better. The proposed framework can also be applied for different natural languages and data domains. We carefully evaluated the framework by carrying out two experiments for two important online applications (Web search result classification and matching/ranking for contextual advertising) with large-scale universal data sets and we achieved significant results.

Dave the debater: a retrieval-based and generative argumentative dialogue agent

Le

Nguyen

Nguyen

2018

Web Search Clustering and Labeling with Hidden Topics

Nguyen

Phan

Horiguchi

et al. 2009

ACM Transactions on Asian Language Information Processing

Web search clustering is a solution to reorganize search results (also called "snippets") in a more convenient way for browsing. There are three key requirements for such post-retrieval clustering systems: (1) the clustering algorithm should group similar documents together; (2) clusters should be labeled with descriptive phrases; and (3) the clustering system should provide high-quality clustering without downloading the whole Web page. This article introduces a novel framework for clustering Web search results in Vietnamese which targets the three above issues. The main motivation is that by enriching short snippets with hidden topics from huge resources of documents on the Internet, it is able to cluster and label such snippets effectively in a topic-oriented manner without concerning whole Web pages. Our approach is based on recent successful topic analysis models, such as Probabilistic-Latent Semantic Analysis, or Latent Dirichlet Allocation. The underlying idea of the framework is that we collect a very large external data collection called "universal dataset," and then build a clustering system on both the original snippets and a rich set of hidden topics discovered from the universal data collection. This can be seen as a richer representation of snippets to be clustered. We carry out careful evaluation of our method and show that our method can yield impressive clustering quality.

SilentTalk: Lip reading through ultrasonic sensing on mobile phones

Tan

Nguyen

Wang

2017

A feature-word-topic model for image annotation

Nguyen

Kaothanthong

Phan

et al. 2010

SilentKey

Tan

Wang

Nguyen

et al. 2018

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.

This paper presents SilentKey, a new authentication framework to identify mobile device users through ultrasonic-based lip reading. The main idea is to generate ultrasonic signals from a mobile phone and analyze the fine-grained impact of mouth motions on the reflected signal. The new framework is effective since people have unique characteristics when performing mouth motions, which represent not only what people input, but also how they input. SilentKey is robust against attacks since the input cannot be recorded or imitated. We implement a prototype and demonstrate the effectiveness of the system by fifty volunteers. Such a non-intrusive identification mechanism provides a natural user interface which can also be applied by people with speaking or viewing difficulties.

A feature-word-topic model for image annotation and retrieval

et al. 2013

Image annotation is a process of finding appropriate semantic labels for images in order to obtain a more convenient way for indexing and searching images on the Web. This article proposes a novel method for image annotation based on combining feature-word distributions, which map from visual space to word space, and word-topic distributions, which form a structure to capture label relationships for annotation. We refer to this type of model as Feature-Word-Topic models. The introduction of topics allows us to efficiently take word associations, such as {ocean, fish, coral} or {desert, sand, cactus}, into account for image annotation. Unlike previous topic-based methods, we do not consider topics as joint distributions of words and visual features, but as distributions of words only. Feature-word distributions are utilized to define weights in computation of topic distributions for annotation. By doing so, topic models in text mining can be applied directly in our method. Our Feature-word-topic model, which exploits Gaussian Mixtures for feature-word distributions, and probabilistic Latent Semantic Analysis (pLSA) for word-topic distributions, shows that our method is able to obtain promising results in image annotation and retrieval.

An Efficient Walking Safety Service for Distracted Mobile Users

Tang

Nguyen

Wang

et al. 2016

A Hidden Topic-Based Framework toward Building Applications with Short Web Documents
