TagCombine: Recommending Tags to Contents in Software Information Sites

Wang, Xin Yu; Xia, Xin; Lo, David

doi:10.1007/s11390-015-1578-2

Cited by 26 publications

(14 citation statements)

References 37 publications

(80 reference statements)


“…Such as code search (e.g., [2,24,31,39]), clone detection (e.g., [7,18,19,64,67]), program repair (e.g., [10,45,60,66]), document (such as API and questions/answers/tags) recommendation (e.g., [22,25,26,55,63,65,69,70,76]).…”

Section: Machine/deep Learning On Software Engineering (mentioning)

confidence: 99%

Generating Question Titles for Stack Overflow from Mined Code Snippets

Gao, Xia, Grundy et al. 2020

Preprint

Self Cite


Stack Overflow has been heavily used by software developers as a popular way to seek programming-related information from peers via the internet. The Stack Overflow community recommends users to provide the related code snippet when they are creating a question to help others better understand it and offer their help. Previous studies have shown that a significant number of these questions are of low-quality and not attractive to other potential experts in Stack Overflow. These poorly asked questions are less likely to receive useful answers and hinder the overall knowledge generation and sharing process. Considering one of the reasons for introducing low-quality questions in SO is that many developers may not be able to clarify and summarize the key problems behind their presented code snippets due to their lack of knowledge and terminology related to the problem, and/or their poor writing skills, in this study we propose an approach to assist developers in writing high-quality questions by automatically generating question titles for a code snippet using a deep sequence-to-sequence learning approach. Our approach is fully data-driven and uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-words problem and a coverage mechanism to eliminate word repetition problem. We evaluate our approach on Stack Overflow datasets over a variety of programming languages (e.g., Python, Java, Javascript, C# and SQL) and our experimental results show that our approach significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets to facilitate other researchers to verify their ideas and inspire the follow up work.


“…In the recent years, several studies have been done to analyze posts on SO, which include analyzing developers' area of interest based on questions asked [5], analyzing and suggesting tags of the questions [2] [1] [6] [7], identifying difficulties faced by developers [8], identifying trending technological topics [9], and so on. Researchers have classified posts on SO based on the context by manually interviewing software developers.…”

Section: Related Work (mentioning)

confidence: 99%

“…Insofar as the development in methods of classification is concerned, the research community has progressed from significant manual studies to automating them using machine learning algorithms and NLP techniques. Contemporary tools such as EnTAGREC++ [6], TagCombine [7] have been developed to provide tag suggestions to users when they post questions on SO. These tools…”

Section: Related Work (mentioning)

confidence: 99%

SOTagger - Towards Classifying Stack Overflow Posts through Contextual Tagging (S)

Venigalla, Lakkundi, Chimalakonda 2019

International Conferences on Software Engineering and Knowledge Engineering


The use of Q&A websites such as Stack Overflow (SO) keeps growing, and so does the number of posts on them. These websites serve as knowledge sharing platforms where Subject Matter Experts (SMEs) and developers answer questions posted by other users. Because of the large volume of posts on the platform, it is effort intensive for developers to navigate to the right posts, despite the presence of existing tags that are based on technologies. Tagging these posts based on their context and purpose might help developers and SMEs in easily identifying questions they wish to answer and also in identifying contextually similar posts. To support this idea, we propose SOTagger as a prototype plug-in for Stack Overflow to tag questions contextually. We have considered SO data provided on SOTorrent and automated the identification of 6 categories of questions using Latent Dirichlet Allocation. We have also manually verified the relevance of these categories. Using these categories and dataset, we have built a classification model to classify a post into one of these six categories using Support Vector Machine. We have evaluated SOTagger by conducting a user survey with 32 developers. The preliminary results are promising, with about 80% of developers recommending the plugin to others.


“…The outcome revealed that the developed model gives 65 percent correct results in a situation where one tag prediction is needed on average. Besides, the work of Xia et al (2013) and Wang, Xia and Lo (2015) also focused on developing a technique called TagCombine, aimed to propose tags automatically which examine objects in software information websites. The output of the conducted experiments revealed that TagCombine outperformed the available tag recommendation methods.…”

Section: Mining So For Software Development (mentioning)

confidence: 99%

A survey on mining stack overflow: question and answering (Q&A) community

Ahmad, Feng, Shi et al. 2018

DTA


Purpose: Software developers extensively use Stack Overflow (SO) for knowledge sharing on software development. Thus, software engineering researchers have started mining the structured/unstructured data present in certain software repositories, including the Q&A software developer community SO, with the aim to improve software development. The purpose of this paper is to show how academics/practitioners can benefit from the valuable user-generated content shared on various online social networks, specifically from the Q&A community SO, for software development.

Design/methodology/approach: A comprehensive literature review was conducted, and 166 research papers on SO about software development, from the inception of SO till June 2016, were categorized.

Findings: Most of the studies revolve around a limited number of software development tasks; approximately 70 percent of the papers used millions of posts data, applied basic machine learning methods, and conducted semi-automatic investigations and quantitative studies. Thus, future research should focus on overcoming the existing identified challenges and gaps.

Practical implications: The work on SO is classified into two main categories: “SO design and usage” and “SO content applications.” These categories not only give insights to Q&A forum providers about the shortcomings in design and usage of such forums but also provide ways to overcome them in future. It also enables software developers to exploit such forums for the identified under-utilized tasks of software development.

Originality/value: The study is the first of its kind to explore the work on SO about software development and makes an original contribution by presenting a comprehensive review, design/usage shortcomings of Q&A sites, and future research challenges.
