Traceability Support for Multi-Lingual Software Projects

Liu, Yalin; Lin, Jinfeng; Cleland‐Huang, Jane

doi:10.1145/3379597.3387440

Cited by 9 publications

(15 citation statements)

References 42 publications


“…One risk of mining links from commit message is that the link set may be incomplete. Liu et al partially addressed this problem by pruning the dataset and only retaining artifacts appearing in links set [38]. We adopted this process to construct our dataset and report results in Table I.…” (Footnote 2: OSS dataset, https://zenodo.org/record/4511291#.YB3tjyj0mbg. Table I: The size of software project leveraged in traceability experiment.)

Section: A. Data Collection (mentioning)

confidence: 99%


Traceability Transformed: Generating More Accurate Links with Pre-Trained BERT Models

Lin, Liu, Zeng, et al. 2021

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

Self Cite


Software traceability establishes and leverages associations between diverse development artifacts. Researchers have proposed the use of deep learning trace models to link natural language artifacts, such as requirements and issue descriptions, to source code; however, their effectiveness has been restricted by availability of labeled data and efficiency at runtime. In this study, we propose a novel framework called Trace BERT (T-BERT) to generate trace links between source code and natural language artifacts. To address data sparsity, we leverage a three-step training strategy to enable trace models to transfer knowledge from a closely related Software Engineering challenge, which has a rich dataset, to produce trace links with much higher accuracy than has previously been achieved. We then apply the T-BERT framework to recover links between issues and commits in Open Source Projects. We comparatively evaluated accuracy and efficiency of three BERT architectures. Results show that a Single-BERT architecture generated the most accurate links, while a Siamese-BERT architecture produced comparable results with significantly less execution time. Furthermore, by learning and transferring knowledge, all three models in the framework outperform classical IR trace models. On the three evaluated real-world OSS projects, the best T-BERT stably outperformed the VSM model with average improvements of 60.31% measured using Mean Average Precision (MAP). RNN severely underperformed on these projects due to insufficient training data, while T-BERT overcame this problem by using pretrained language models and transfer learning.
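The abstract reports its results in Mean Average Precision (MAP). As a minimal sketch of how MAP is commonly computed for trace link rankings (not the authors' evaluation code), the Python below scores each query artifact by averaging precision at every rank where a true link appears, then averages over queries. The function names, the artifact identifiers, and the choice to divide by the number of gold links per query are all assumptions made for illustration; the paper may use a different variant.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query: the mean of precision@k taken at
    each rank k where a true (gold) link appears in the ranked list."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, artifact in enumerate(ranked, start=1):
        if artifact in relevant:
            hits += 1
            precision_sum += hits / k
    # Assumed convention: normalize by the number of gold links.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """MAP over all queries that have at least one gold link."""
    queries = [q for q in rankings if gold.get(q)]
    if not queries:
        return 0.0
    total = sum(average_precision(rankings[q], gold[q]) for q in queries)
    return total / len(queries)


# Hypothetical example: two issues, each with a ranked list of candidate commits.
rankings = {
    "issue-1": ["c3", "c1", "c2"],
    "issue-2": ["c5", "c4"],
}
gold = {
    "issue-1": {"c1", "c2"},
    "issue-2": {"c5"},
}
map_score = mean_average_precision(rankings, gold)
```

In this toy data, the first issue finds its gold commits at ranks 2 and 3 and the second finds its only gold commit at rank 1, so a model that ranks true links earlier receives a higher score.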


“…We alleviate the impact of this phenomenon by adopting the data processing suggested by Liu et al. [38]. Another important threat is that while the SINGLE architecture, trained for code search problem, does not outperform CodeBERT, further improvements could be achieved using hyper parameter optimization.…”

Section: Threats to Validity (mentioning)

confidence: 99%


“…Figure 2 shows the number of papers per topic modeling technique. The total number (125) exceeds the number of papers reviewed (111), because ten papers experimented with more than one technique: Thomas et al (2013), De Lucia et al (2014), Binkley et al (2015), Tantithamthavorn et al (2018), Abdellatif et al (2019) and Liu et al (2020). The popularity of LDA in software engineering has also been discussed by others, e.g., Treude and Wagner (2019). LDA is a three-level hierarchical Bayesian model (Blei et al 2003b).…”

Section: Topic Modeling Techniques (mentioning)

confidence: 99%

“…Regarding the other two papers, Binkley et al (2015) compared LSI to Query likelihood LDA (QL-LDA) and other information extraction techniques to check the best model for locating features in source code; and Liu et al (2020) compared LSI and LDA to Generative Vector Space Model (GVSM), a deep learning technique, to select the best performer model for documentation traceability to source code in multilingual projects.…”

Section: Topic Modeling Techniques (mentioning)

confidence: 99%


Topic modeling in software engineering research

2021


Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modelled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducting names based on frequent words in a topic) is common.
