Abstract: Knowledge management plays a central role in many software development organizations. While much of the important technical knowledge can be captured in documentation, there often exists a gap between the information needs of software developers and the documentation structure. To help developers navigate documentation, we developed a technique for automatically extracting tasks from software documentation by conceptualizing tasks as specific programming actions that have been described in the documen…
“…In many of these cases, the background of the users seems to determine whether they understand a sentence or not. We found a similar situation in our previous work [36] when we asked developers to rate the meaningfulness of task descriptions that we had automatically extracted from their software documentation. In those cases, we argue that displaying such sentences does little harm if some users do not understand them while other users find them useful.…”
Section: Inter-rater Agreement (supporting)
confidence: 64%
“…We had developed a set of techniques for preprocessing software documentation in previous work [36,37]. We summarize them here for completeness.…”
Software developers need access to different kinds of information, which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow: sentences that are related to a particular API type and that provide insight not contained in the API documentation of that type. Based on a development set of 1,574 sentences, we compare the performance of two state-of-the-art summarization techniques as well as a pattern-based approach for insight sentence extraction. We then present SISE, a novel machine learning-based approach that uses as features the sentences themselves, their formatting, their question, their answer, and their authors, as well as part-of-speech tags and the similarity of a sentence to the corresponding API documentation. With SISE, we were able to achieve a precision of 0.64 and a coverage of 0.7 on the development set. In a comparative study with eight software developers, we found that SISE resulted in the highest number of sentences that were considered to add useful information not found in the API documentation. These results indicate that taking into account the metadata available on Stack Overflow as well as part-of-speech tags can significantly improve unsupervised extraction approaches when applied to Stack Overflow data.
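The evaluation numbers quoted above (precision of 0.64, coverage of 0.7) follow the standard definitions for this kind of extraction task. A minimal sketch, using invented sentence IDs and API type names rather than the paper's data:

```python
# Sketch of how precision and coverage could be computed for an
# insight-sentence classifier. All data here is hypothetical; SISE's actual
# features and pipeline are described in the paper, not reproduced here.

def precision(predicted: set, relevant: set) -> float:
    """Fraction of predicted insight sentences that are truly insightful."""
    if not predicted:
        return 0.0
    return len(predicted & relevant) / len(predicted)

def coverage(covered_types: set, all_types: set) -> float:
    """Fraction of API types for which at least one sentence was extracted."""
    if not all_types:
        return 0.0
    return len(covered_types & all_types) / len(all_types)

# Hypothetical example: sentence IDs predicted as insightful vs. gold labels.
predicted = {1, 2, 3, 5, 8}
relevant = {2, 3, 5, 7, 9, 11}
print(precision(predicted, relevant))  # 3 of 5 predictions correct -> 0.6

# Hypothetical example: API types covered by at least one extracted sentence.
covered = {"List", "Map", "Set"}
all_api_types = {"List", "Map", "Set", "Queue"}
print(coverage(covered, all_api_types))  # 3 of 4 types covered -> 0.75
```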
“…One threat to the validity of our results and an opportunity for future work lies in the fact that we used all four NLP libraries with their default settings 9 and without any specialized models. Also, the results are only reflecting the performance and accuracy of the current library versions which might change as the libraries are evolving.…”
Section: Threats To Validity (mentioning)
confidence: 99%
“…While it is common for researchers to rely on publicly available NLP libraries, some researchers develop their own tooling for specific tasks. For example, Allamanis et al [7] developed a customized system called Haggis for mining code idioms and in our own previous work, we added customizations to the Stanford NLP library to improve the accuracy of parsing natural language text authored by software developers [8], [9]. In this work, we aim to identify how the choice of using a particular publicly available NLP library could impact the results of any research that makes use of an NLP library.…”
Abstract: To uncover interesting and actionable information from natural language documents authored by software developers, many researchers rely on "out-of-the-box" NLP libraries. However, software artifacts written in natural language are different from other textual documents due to the technical language used. In this paper, we first analyze the state of the art through a systematic literature review in which we find that only a small minority of papers justify their choice of an NLP library. We then report on a series of experiments in which we applied four state-of-the-art NLP libraries to publicly available software artifacts from three different sources. Our results show low agreement between different libraries (only between 60% and 71% of tokens were assigned the same part-of-speech tag by all four libraries) as well as differences in accuracy depending on source: For example, spaCy achieved the best accuracy on Stack Overflow data with nearly 90% of tokens tagged correctly, while it was clearly outperformed by Google's SyntaxNet when parsing GitHub ReadMe files. Our work implies that researchers should make an informed decision about the particular NLP library they choose and that customizations to libraries might be necessary to achieve good results when analyzing software artifacts written in natural language.
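The token-level agreement figure above (60% to 71% of tokens receiving the same part-of-speech tag from all four libraries) can be computed as in the following sketch; the tag sequences are invented for illustration, not taken from the paper's corpus:

```python
# Sketch: token-level agreement between several POS taggers.
# The tags below are hypothetical; the paper compared four real NLP libraries.

def full_agreement(taggings: list[list[str]]) -> float:
    """Fraction of token positions where every tagger assigned the same tag.

    `taggings` holds one tag sequence per library, all aligned to the same
    tokenization of the same text.
    """
    n_tokens = len(taggings[0])
    assert all(len(t) == n_tokens for t in taggings), "taggings must align"
    agree = sum(1 for tags in zip(*taggings) if len(set(tags)) == 1)
    return agree / n_tokens

# Hypothetical tags for the tokens of "parse the README file" from 4 taggers.
lib_a = ["VB", "DT", "NN", "NN"]
lib_b = ["VB", "DT", "NNP", "NN"]
lib_c = ["VB", "DT", "NN", "NN"]
lib_d = ["VB", "DT", "NNP", "NN"]
print(full_agreement([lib_a, lib_b, lib_c, lib_d]))  # 3 of 4 positions -> 0.75
```

Disagreements of exactly this kind, where a code term such as "README" is tagged as a common noun by one library and a proper noun by another, are what drive the overall agreement down on software artifacts.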
“…Several tools have been developed that automatically process natural language documents produced by software developers, for example by inferring specification from documentation [29], linking information from bug tracking systems and mailing lists to source code methods [14], summarizing bug reports [19], or extracting tasks from documentation [25]. Many of these tools rely on natural language processing tools such as the Stanford natural language processing toolkit [13] to split sentences, detect words in a sentence, assign parts of speech to words (such as adjective, verb, or noun), and to detect grammatical dependencies between different parts of a sentence (such as subject or direct object).…”
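The pipeline stages the quoted passage lists (sentence splitting, token detection, part-of-speech tagging, dependency parsing) can be sketched for the first two stages with naive rules; real toolkits such as the Stanford NLP toolkit use trained statistical models and handle far more edge cases:

```python
import re

# Naive sketch of the first two NLP pipeline stages: sentence splitting and
# tokenization. Real libraries (Stanford CoreNLP, spaCy, etc.) use trained
# models; these regex rules are only illustrative.

def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence: str) -> list[str]:
    """Separate word characters from punctuation. Note that this splits
    code terms like 'parse()' apart, which is exactly the kind of weakness
    on developer text that motivates customizing NLP tooling."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Call parse() first. Then inspect the result."
sents = split_sentences(text)
print(sents)               # ['Call parse() first.', 'Then inspect the result.']
print(tokenize(sents[0]))  # ['Call', 'parse', '(', ')', 'first', '.']
```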
Many tools that automatically analyze, summarize, or transform software artifacts rely on natural language processing tooling for the interpretation of natural language text produced by software developers, such as documentation, code comments, commit messages, or bug reports. Processing natural language text produced by software developers is challenging because of unique characteristics not found in other texts, such as the presence of code terms and the systematic use of incomplete sentences. In addition, texts produced by Portuguese-speaking developers mix languages since many keywords and programming concepts are referred to by their English name. In this paper, we provide empirical insights into the challenges of analyzing software artifacts written in Portuguese. We analyzed 100 question titles from the Portuguese version of Stack Overflow with two Portuguese language tools and identified multiple problems which resulted in very few sentences being tagged completely correctly. Based on these results, we propose heuristics to improve the analysis of natural language text produced by software developers in Portuguese.
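The language mixing described above, where Portuguese sentences embed English keywords and programming concepts, can be illustrated with a tiny keyword-list heuristic. This is not the heuristic the paper proposes; both the term list and the function are hypothetical:

```python
# Hypothetical heuristic sketch: flag English programming terms inside an
# otherwise Portuguese sentence. The paper proposes its own heuristics; this
# keyword-list approach only illustrates the mixed-language problem.

ENGLISH_CODE_TERMS = {"string", "array", "loop", "thread", "null", "print"}

def code_terms(sentence: str) -> list[str]:
    """Return tokens that look like English programming vocabulary."""
    tokens = sentence.lower().replace("?", " ").replace(".", " ").split()
    return [t for t in tokens if t in ENGLISH_CODE_TERMS]

# Portuguese question title mixing in English code terms
# ("How do I convert an array to a string?").
title = "Como converter um array para string?"
print(code_terms(title))  # ['array', 'string']
```

A tagger trained only on standard Portuguese has no category for such embedded English tokens, which is one reason very few of the analyzed question titles were tagged completely correctly.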