Kyeongpil Kang scite author profile

Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations

Short texts are a prevalent form of social communication on the Internet, and billions of them are generated every day. Discovering knowledge from them has gained a lot of interest from both industry and academia. Short texts carry limited contextual information and are sparse, noisy and ambiguous; hence, automatically learning topics from them remains an important challenge. To tackle this problem, in this paper, we propose a semantics-assisted non-negative matrix factorization (SeaNMF) model to discover topics for short texts. It effectively incorporates word-context semantic correlations into the model, where the semantic relationships between words and their contexts are learned from the skip-gram view of the corpus. The SeaNMF model is solved using a block coordinate descent algorithm. We also develop a sparse variant of the SeaNMF model that achieves better model interpretability. Extensive quantitative evaluations on various real-world short-text datasets demonstrate the superior performance of the proposed models over several other state-of-the-art methods in terms of topic coherence and classification accuracy. Qualitative semantic analysis demonstrates the interpretability of our models by discovering meaningful and consistent topics. With its simple formulation and superior performance, SeaNMF can be an effective standard topic model for short texts.
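The abstract describes SeaNMF only at a high level. What follows is a minimal sketch under our own assumptions, not the authors' implementation. It factors a word-document matrix A ~ W H^T and a word-context correlation matrix S ~ W C^T. The word-topic matrix W is shared by both terms, and the sketch minimizes ||A - W H^T||^2 + alpha * ||S - W C^T||^2 by block coordinate descent with column-wise (HALS-style) updates. S is built here as positive PMI over whole-text co-occurrence windows. The paper's exact correlation matrix, regularizers and sparse variant may differ. The names `ppmi_from_docs`, `seanmf_like` and `alpha` are hypothetical.

```python
import numpy as np


def ppmi_from_docs(docs, vocab_size):
    """Positive PMI between words co-occurring in the same short text.

    Assumption: each whole short text is treated as one context window.
    """
    counts = np.zeros((vocab_size, vocab_size))
    for doc in docs:
        ids = sorted(set(doc))
        for i in ids:
            for j in ids:
                if i != j:
                    counts[i, j] += 1.0
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts or words that never co-occur
    return np.maximum(pmi, 0.0)


def seanmf_like(A, S, k, alpha=1.0, iters=100, seed=0):
    """Minimize ||A - W H^T||^2 + alpha * ||S - W C^T||^2 over W, H, C >= 0.

    A: (V, D) word-document counts; S: (V, V) word-context correlations.
    The shared W (V, k) is how the word-context correlations shape the topics.
    Each block is updated column by column (HALS-style block coordinate descent).
    """
    rng = np.random.default_rng(seed)
    V, D = A.shape
    W, H, C = rng.random((V, k)), rng.random((D, k)), rng.random((V, k))
    eps, tiny = 1e-10, 1e-12  # positivity floor and divide-by-zero guard
    losses = []
    for _ in range(iters):
        # Block 1: W, holding H and C fixed.
        AH, HtH = A @ H, H.T @ H
        SC, CtC = S @ C, C.T @ C
        for j in range(k):
            step = AH[:, j] - W @ HtH[:, j] + alpha * (SC[:, j] - W @ CtC[:, j])
            denom = HtH[j, j] + alpha * CtC[j, j] + tiny
            W[:, j] = np.maximum(eps, W[:, j] + step / denom)
        # Blocks 2 and 3: H and C, holding the updated W fixed.
        WtW, AtW, StW = W.T @ W, A.T @ W, S.T @ W
        for j in range(k):
            denom = WtW[j, j] + tiny
            H[:, j] = np.maximum(eps, H[:, j] + (AtW[:, j] - H @ WtW[:, j]) / denom)
            C[:, j] = np.maximum(eps, C[:, j] + (StW[:, j] - C @ WtW[:, j]) / denom)
        losses.append(np.linalg.norm(A - W @ H.T) ** 2
                      + alpha * np.linalg.norm(S - W @ C.T) ** 2)
    return W, H, C, losses


# Toy corpus of word-id lists (hypothetical data).
docs = [[0, 1, 2], [1, 2], [0, 2], [3, 4], [3, 4, 5], [4, 5]]
V = 6
A = np.zeros((V, len(docs)))
for d, doc in enumerate(docs):
    for w in doc:
        A[w, d] += 1.0
S = ppmi_from_docs(docs, V)
W, H, C, losses = seanmf_like(A, S, k=2, alpha=0.5)
topic_of_word = W.argmax(axis=1)  # dominant topic for each word id
```

Each column update is the projected minimizer of a separable quadratic, so the recorded loss should not increase from one sweep to the next. That makes it a convenient sanity check when experimenting with `alpha`.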

TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

Kim, Kang, Park, et al. (2017). IEEE Trans. Visual. Comput. Graphics.

Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual analytics workflow. Instead, most such systems are limited to utilizing a fixed, initial set of topics. Motivated by this gap in the literature, we propose a novel interaction technique called TopicLens that allows a user to dynamically explore data through a lens interface where topic modeling and the corresponding 2D embedding are efficiently computed on the fly. To support this interaction in real time while maintaining view consistency, we propose a novel efficient topic modeling method and a semi-supervised 2D embedding algorithm. Our work is based on improving state-of-the-art methods such as nonnegative matrix factorization and t-distributed stochastic neighbor embedding. Furthermore, we have built a web-based visual analytics system integrated with TopicLens. We use this system to measure the performance and the visualization quality of our proposed methods. We provide several scenarios showcasing the capability of TopicLens using real-world datasets.
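TopicLens itself relies on improved NMF and semi-supervised t-SNE variants that the abstract does not spell out. The sketch below illustrates only the lens interaction. It selects the documents whose 2D embedding falls inside a circular lens and re-fits plain NMF on just those documents, using Lee-Seung multiplicative updates and warm-starting from the global topics. The warm start is our assumption about one way to keep local topics consistent with the overview. The 2D embedding is taken as given, and the helper names `docs_in_lens` and `nmf_mu` are hypothetical.

```python
import numpy as np


def docs_in_lens(coords, center, radius):
    """Indices of documents whose 2D embedding point lies inside a circular lens."""
    dist = np.linalg.norm(coords - np.asarray(center, dtype=float), axis=1)
    return np.flatnonzero(dist <= radius)


def nmf_mu(A, W0, iters=100, eps=1e-10, seed=0):
    """Frobenius-norm NMF A ~ W @ H via Lee-Seung multiplicative updates.

    W0 warm-starts the (words x topics) matrix; H is initialized randomly.
    """
    rng = np.random.default_rng(seed)
    W = W0.astype(float)  # astype returns a copy, so W0 is not modified
    H = rng.random((W.shape[1], A.shape[1]))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data (hypothetical): 30 words x 200 documents, plus a 2D embedding.
rng = np.random.default_rng(1)
A = rng.random((30, 200))
coords = rng.normal(size=(200, 2))
W_global, _ = nmf_mu(A, rng.random((30, 4)))

# Re-fit topics only for documents under the lens, starting from the global
# topics so the local topics stay comparable to the overview.
idx = docs_in_lens(coords, center=(0.0, 0.0), radius=1.0)
W_local, H_local = nmf_mu(A[:, idx], W_global, iters=30)
```

In this sketch, restricting the factorization to the lens subset is what keeps each update cheap enough to run on the fly. The iteration count trades responsiveness against fit.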

Whose Opinion Matters? Analyzing Relationships Between Bitcoin Prices and User Groups in Online Community

Kang, Choo, Kim (2019). Social Science Computer Review.

Public interest in cryptocurrencies has consistently risen over the past decade. Owing to this rapid growth, cryptocurrency-related information is increasingly being shared online. Because a considerable portion of this information in online communities is noise, extracting meaningful information is important. It is therefore critical to judge whose opinions deserve more weight, that is, who the opinion leaders in an online community are. This study analyzed which topics, and in particular which user groups, carry meaningful information by investigating the correlation between topic weights and Bitcoin price changes. The proposed analysis method involves three steps:

(1) classifying users into groups with a hypertext-induced topic selection (HITS) algorithm;
(2) analyzing textual information through topic modeling;
(3) identifying the user groups with a high interest in the Bitcoin price, by measuring the correlation between the price and the topics and by measuring the topic similarity between each user group and all users to determine which group best represents the entire community.

By analyzing the information shared by users, we observed that most users are interested in price information, whereas users with social influence are interested not only in the price but also in other information.
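The abstract names HITS for grouping users and a correlation between topic weights and the price. The sketch below shows both computations under our own assumptions: a directed user-to-user reply graph, daily topic weights, and pairing each day's weights with that day's price change. The function names and toy data are hypothetical.

```python
import numpy as np


def hits(adj, iters=100, tol=1e-10):
    """Hub and authority scores (Kleinberg's HITS) by power iteration.

    adj[i, j] = 1 if user i links to user j, e.g. replies to or quotes j
    (the edge definition is an assumption).
    """
    n = adj.shape[0]
    hubs, auths = np.ones(n), np.ones(n)
    for _ in range(iters):
        new_auths = adj.T @ hubs
        new_auths /= np.linalg.norm(new_auths) or 1.0
        new_hubs = adj @ new_auths
        new_hubs /= np.linalg.norm(new_hubs) or 1.0
        done = (np.allclose(new_auths, auths, rtol=0.0, atol=tol)
                and np.allclose(new_hubs, hubs, rtol=0.0, atol=tol))
        hubs, auths = new_hubs, new_auths
        if done:
            break
    return hubs, auths


def topic_price_correlation(topic_weights, prices):
    """Pearson correlation of each topic's daily weight with the daily price change.

    topic_weights: (days, topics); prices: (days,). Weights on day t are paired
    with the change from day t-1 to day t (an assumed alignment).
    """
    changes = np.diff(prices)
    weights = topic_weights[1:]
    return np.array([np.corrcoef(weights[:, k], changes)[0, 1]
                     for k in range(weights.shape[1])])


# Toy reply graph among four users (hypothetical data).
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [1, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
hubs, auths = hits(adj)
top_authorities = np.argsort(-auths)[:2]  # most-referenced users

prices = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
topic_weights = np.array([[5.0, 5.0], [7.0, 3.0], [4.0, 6.0],
                          [9.0, 1.0], [4.0, 6.0]])
corr = topic_price_correlation(topic_weights, prices)
```

In the toy data, topic 0 rises and falls with the price while topic 1 moves against it. Users who are heavily linked to end up with high authority scores.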

System Architecture and Software Stack for GDDR6-AiM

Kwon, Vladimir, Kim, et al. (2022).

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Kang, Jin, Yang, et al. (2021).

Understanding voluminous historical records provides clues about the past in various aspects, such as social and political issues and even natural-science facts. However, it is generally difficult to fully utilize historical records, since most of the documents are not written in a modern language and parts of their contents have been damaged over time. As a result, restoring the damaged or unrecognizable parts and translating the records into modern languages are crucial tasks. In response, we present a multi-task learning approach to restore and translate historical documents based on a self-attention mechanism. We specifically use two Korean historical records, which are among the most voluminous historical records in the world. Experimental results show that our approach significantly improves translation accuracy compared with baselines without multi-task learning. In addition, we present an in-depth exploratory analysis of our translated results via topic modeling, uncovering several significant historical events.
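The abstract names self-attention and multi-task learning but not the architecture. The sketch below, which rests on our own assumptions, shows two pieces. The first is single-head scaled dot-product self-attention over one sentence's character embeddings. The second is a restoration loss computed only at positions marked as damaged. In a multi-task setup, this loss would be combined with a translation loss computed from the same shared encoder outputs, for example as a weighted sum; that weighting is our assumption. All sizes, weights and function names are hypothetical.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # shift for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) character embeddings of one sentence.
    Returns the attended outputs and the (seq_len, seq_len) attention weights.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(log_softmax(scores, axis=-1))  # each row sums to 1
    return weights @ V, weights


def restoration_loss(logits, targets, damaged):
    """Mean cross-entropy over the positions marked as damaged only."""
    nll = -log_softmax(logits)[np.arange(len(targets)), targets]
    return nll[damaged].mean()


# Toy setup (hypothetical sizes and random weights).
rng = np.random.default_rng(0)
seq_len, d_model, vocab = 6, 8, 20
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
H, attn = self_attention(X, Wq, Wk, Wv)

W_out = rng.normal(size=(d_model, vocab))  # restoration head: predict characters
targets = rng.integers(0, vocab, size=seq_len)
damaged = np.array([False, True, False, False, True, False])
loss = restoration_loss(H @ W_out, targets, damaged)
```

Scoring only the damaged positions mirrors the restoration task: the model sees the surrounding intact characters through attention and is penalized only for the characters it must fill in.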

Kyeongpil Kang

Short-Text Topic Modeling via Non-negative Matrix Factorization Enriched with Local Word-Context Correlations

TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

Whose Opinion Matters? Analyzing Relationships Between Bitcoin Prices and User Groups in Online Community

System Architecture and Software Stack for GDDR6-AiM

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation
