A sentence embedding vector can be obtained by attaching global average pooling (GAP) to a pre-trained language model. The problem with such a GAP-based sentence embedding is that every word in the sentence contributes with the same weight. We propose a novel sentence-embedding model, Token Attention-SentenceBERT (TA-SBERT), to address this problem. The rationale of TA-SBERT is to enhance sentence-embedding performance through three strategies. First, we convert words to their base forms while preprocessing the input sentence to reduce ambiguity. Second, we propose a novel Token Attention (TA) technique that emphasizes important words to produce more informative sentence vectors. Third, we increase the stability of fine-tuning and avoid catastrophic forgetting by adding a reconstruction loss on the word embedding vectors. Extensive ablation studies demonstrate that TA-SBERT outperforms the original SentenceBERT (SBERT) on sentence-vector evaluations using semantic textual similarity (STS) tasks and the SentEval toolkit.
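The contrast between GAP and attention-weighted pooling can be sketched as follows. This is a minimal illustration, not the paper's actual TA mechanism: the learned query vector `query` and the softmax scoring are assumptions standing in for whatever parameterization TA-SBERT uses.

```python
import numpy as np

def mean_pool(token_vecs):
    # GAP: every token contributes with the same weight 1/n.
    return token_vecs.mean(axis=0)

def attention_pool(token_vecs, query):
    # Illustrative token attention: score each token embedding against
    # a (hypothetical) learned query vector, softmax the scores, and
    # take the weighted average, so informative tokens dominate.
    scores = token_vecs @ query
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ token_vecs
```

With non-uniform scores, the two pooled vectors differ, which is exactly the degree of freedom GAP lacks.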
A well-known limitation of existing rule-based text augmentation is that it cannot be transferred to other languages because it depends on language-specific grammatical and structural characteristics. Moreover, most text Generative Adversarial Networks (GANs) are unstable during training due to inefficient generator optimization and rely on maximum-likelihood pre-training. This paper addresses these problems by proposing a novel augmentation method, Iterative Translation-based Data Augmentation (ITDA), built from a Sentence Generator (SG) and a Sentence Discriminator (SD). This paper makes three original contributions. First, the ITDA SG provides universal multi-language support by generating diverse augmented sentences through serial and parallel iterations of an existing translator, such as Google Translate. Second, because the quality of the generated sentences varies with the translation combination and the sentence type, the ITDA uses a discriminator, built on a text classifier, to select high-quality augmented data. Third, the ITDA can perform sentence augmentation for 109 languages using discriminators based on text classifiers trained for a specific language or data set. Extensive experiments evaluate the efficacy of the ITDA using a Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), CNN-BiLSTM, and self-attention. The results demonstrate that applying the ITDA to 480 sentence classification tasks increases average accuracy by 4.24%.
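The generate-then-filter loop described above can be sketched as follows. This is a hedged outline, not the paper's implementation: `translate(text, src, dst)` is a hypothetical stand-in for an external translator such as Google Translate, and `discriminator` stands in for the trained text classifier used by the SD.

```python
def iterative_translate(sentence, translate, chains, src_lang="en"):
    """SG sketch: run the sentence through several pivot-language chains
    (parallel iterations), each chain stepping through one or more pivot
    languages (serial iterations) before translating back to the source."""
    augmented = []
    for chain in chains:
        text, src = sentence, src_lang
        for pivot in chain:
            text = translate(text, src, pivot)  # serial translation step
            src = pivot
        augmented.append(translate(text, src, src_lang))  # back to source
    return augmented

def filter_augmented(candidates, discriminator, threshold=0.5):
    """SD sketch: keep only candidates the classifier scores at or above
    the threshold, discarding low-quality translations."""
    return [s for s in candidates if discriminator(s) >= threshold]
```

Different chains (e.g. `en→fr→en` vs `en→de→ja→en`) yield different paraphrases of varying quality, which is why the discriminator pass matters.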