2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) 2019
DOI: 10.1109/msr.2019.00046

Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?

Abstract: Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposed overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximatel…

Cited by 14 publications (12 citation statements)
References 15 publications

“…As might be expected given that collective recognition of the breadth of potential ethics issues is only relatively recent, we found some, but not extensive, discussion: only one mining challenge (Wilkie et al 2018) discussed ethics issues in some detail, and two papers (Soto-Valero et al 2018;Abric et al 2019) mentioned the anonymity of the underlying datasets in the research that built on them. 35 papers contained discussions of threats to validity (perhaps a section of a paper best suited to the discussion of ethics issues).…”
Section: Mining Challenges (mentioning)
confidence: 71%
“…The literature characterize duplicate posts on PCQA forums as: (1) questions that address the same topic, but are not necessarily identical copies (Silva et al, 2018;Wang et al, 2020); (2) questions conceptually equivalent to other questions previously posted (Abric et al, 2019); (3) questions that were already asked and answered before (Zhang et al, 2017;Wang et al, 2020); (4) questions asked to solve the same problem (Ahasanuzzaman et al, 2016); and (5) questions that 'express the same point' (Zhang et al, 2015). We can note the duplicates conceptualization is not a rigid definition; it can depend on human subjective criteria.…”
Section: Related Work (mentioning)
confidence: 99%
“…The authors adopted an official Stack Overflow dataset dump to train, optimize, and evaluate the approaches that predict and classify questions as duplicate or not, respectively. Wang et al (2020) Conversely, Abric et al (2019) analyzed how helpful duplicate questions are to the software development community. The authors argue that duplicates on Stack Overflow help the developer community by providing different formulations of the same problem or solution.…”
Section: Related Work (mentioning)
confidence: 99%