2022
DOI: 10.1109/access.2022.3164769
TA-SBERT: Token Attention Sentence-BERT for Improving Sentence Representation

Abstract: A sentence embedding vector can be obtained by attaching a global average pooling (GAP) layer to a pre-trained language model. The problem with such a GAP-based sentence embedding vector is that it is generated with the same weight for every word appearing in the sentence. We propose a novel sentence embedding model, Token Attention Sentence-BERT (TA-SBERT), to address this problem. The rationale of TA-SBERT is to enhance sentence embedding performance by introducing three strategies. First, we conv…
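To make the contrast in the abstract concrete, here is a minimal sketch (not the paper's implementation) of the difference between GAP and attention-weighted pooling over the same token states; the bert-base-uncased encoder and the single learnable scoring layer are stand-ins assumed for illustration, not TA-SBERT's actual modules.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # stand-in encoder
encoder = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    token_states = encoder(**inputs).last_hidden_state          # (1, seq_len, hidden)

mask = inputs["attention_mask"].unsqueeze(-1).float()           # (1, seq_len, 1)

# Global average pooling (GAP): every non-padding token contributes equally.
gap_embedding = (token_states * mask).sum(dim=1) / mask.sum(dim=1)

# Token attention (illustrative): a learnable scorer reweights tokens before pooling.
scorer = torch.nn.Linear(token_states.size(-1), 1)              # hypothetical attention module
scores = scorer(token_states).masked_fill(mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=1)                          # sums to 1 over tokens
attention_embedding = (weights * token_states).sum(dim=1)
```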

Cited by 10 publications (8 citation statements) | References 37 publications
“…Subsequently, the recommended package's source files and the past bugs related to that specific package are input into this task. A text similarity technique, Sentence-BERT [22], then takes the structured text of the new bug report to find the most similar source code files and any old bugs attached to them. The output of the source code recommendation phase is a list of source code files sorted in descending order by their similarity to the new bug report that needs to be fixed.…”
Section: Source Code Recommendation Phase
Mentioning confidence: 99%
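As a hedged illustration of that ranking step, the sketch below uses the sentence-transformers library to score candidate source files against a new bug report and sort them in descending order of cosine similarity; the model checkpoint, file names, and texts are placeholders, not details from the cited system.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder checkpoint; the cited work only specifies "Sentence-BERT".
model = SentenceTransformer("all-MiniLM-L6-v2")

bug_report = "NullPointerException when saving a project with an empty name"  # hypothetical report
candidate_files = {  # hypothetical file name -> extracted text from the recommended package
    "ProjectSaver.java": "serializes project objects and writes them to disk",
    "NameValidator.java": "validates project names and rejects empty strings",
    "MenuRenderer.java": "draws the main application menu",
}

report_emb = model.encode(bug_report, convert_to_tensor=True)
file_embs = model.encode(list(candidate_files.values()), convert_to_tensor=True)

# Cosine similarity of the report against every candidate file, sorted descending.
scores = util.cos_sim(report_emb, file_embs)[0]
ranking = sorted(zip(candidate_files, scores.tolist()), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```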
“…The text similarity is applied between the new bug report text and all files related to the software package recommended by the previous phase. Following [22], Sentence-BERT is applied to fine-tune the BERT architecture for semantic similarity. As in Figure 5, the source code files and related data from the software package are fed into the pre-trained BERT.…”
Section: Source Code Recommendation Phase
Mentioning confidence: 99%
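A minimal sketch of what fine-tuning BERT for semantic similarity with Sentence-BERT can look like is given below; the training pairs and similarity labels are invented for illustration and do not come from the cited bug-localization data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # plain BERT wrapped with mean pooling

# Hypothetical (bug report, source file text) pairs with similarity labels in [0, 1].
train_examples = [
    InputExample(texts=["crash when saving empty project name", "validates project names"], label=0.9),
    InputExample(texts=["crash when saving empty project name", "draws the main menu"], label=0.1),
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# One short epoch; real fine-tuning would use far more pairs and a validation split.
model.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
```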
“…The learning process can be completed using an untagged corpus. To balance the roles of different words, Seo et al. [24] split sentences into tokens and combined an attention mechanism with Sentence-BERT, aiming to assign corresponding weights to different words through attention and thereby highlight keywords. This improves the accuracy of subsequent tasks, but the sentence decomposition and representation stage easily ignores the internal structural information of words, which weakens the representation of global semantics.…”
Section: Related Work
Mentioning confidence: 99%
“…To address these limitations, Shi [23] considered the feature-capturing ability of the Transformer and used it in sentence representation tasks to learn the generality of sentences. Seo et al. [11] combined an attention mechanism with the language model BERT to mitigate a limitation of pre-trained language models in sentence representation: when the language model generates sentence representations only from BERT's word weights, it easily ignores the sentence's global and contextual information. Kim et al. [24] proposed a contrastive learning method that uses self-guidance to improve the quality of BERT sentence representations, fine-tuning BERT in a self-supervised manner that improves sentence representations without relying on additional processing such as data augmentation.…”
Section: Related Work
Mentioning confidence: 99%
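To illustrate the self-guidance idea attributed to Kim et al. [24] at a high level, here is a simplified contrastive-loss sketch, assuming a trainable encoder and a frozen copy of the original BERT that supplies the positive view for each sentence; this is a generic NT-Xent-style objective, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def self_guided_contrastive_loss(tuned_emb: torch.Tensor,
                                 frozen_emb: torch.Tensor,
                                 temperature: float = 0.05) -> torch.Tensor:
    """Pair each sentence's embedding from the trainable encoder with the embedding of the
    same sentence from a frozen BERT copy (positive); the other sentences in the batch act
    as negatives. Simplified sketch, not Kim et al.'s exact objective."""
    tuned = F.normalize(tuned_emb, dim=-1)
    frozen = F.normalize(frozen_emb, dim=-1)
    logits = tuned @ frozen.t() / temperature                     # (batch, batch) similarities
    labels = torch.arange(logits.size(0), device=logits.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)

# Toy usage with random tensors standing in for encoder outputs.
loss = self_guided_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
```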
“…Although these methods improve the accuracy of sentence representation, they are prone to semantic ambiguity when the sentence is complex and ignore word context; that is, when representing a sentence, the role of the same word, and the semantics it expresses, may differ from sentence to sentence. The traditional sentence embedding methods used in SBert [10] and TA-SBert [11], in which all words appearing in a sentence have the same weight, do not take these limitations into account and, at the same time, ignore the correlations among these features. Therefore, to address this limitation, we combine sentence constructions and propose a Multi-directional Attention Interaction Construction-BERT Sentence Representation Framework (MAI-CBert), which attends to salient words from multiple directions and assigns more effective weights to produce sentence vectors with rich details.…”
Section: Introduction
Mentioning confidence: 99%