Hideaki Hata scite author profile

Pseudo-code written in natural language can aid the comprehension of source code in unfamiliar programming languages. However, the great majority of source code has no corresponding pseudo-code, because pseudo-code is redundant and laborious to create. If pseudo-code could be generated automatically and instantly from given source code, we could allow for on-demand production of pseudo-code without human effort. In this paper, we propose a method to automatically generate pseudo-code from source code, specifically adopting the statistical machine translation (SMT) framework. SMT, which was originally designed to translate between two natural languages, allows us to automatically learn the relationship between source code/pseudo-code pairs, making it possible to create a pseudo-code generator with less human effort. In experiments, we generated English or Japanese pseudo-code from Python statements using SMT, and find that the generated pseudo-code is largely accurate, and aids code understanding.

show abstract

Bug prediction based on fine-grained module histories

Hata

Mizuno²,

Kikuno³

2012

108

106

View full text Add to dashboard Cite

Pandemic programming

et al. 2020

View full text Add to dashboard Cite

Context As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective This study investigates the effects of the pandemic on developers’ wellbeing and productivity. Method A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers’ wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support. Conclusions To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees’ home offices. Women, parents and disabled persons may require extra support.

show abstract

9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay

Hata

Treude

Kula

et al. 2019

View full text Add to dashboard Cite

Links are an essential feature of the World Wide Web, and source code repositories are no exception. However, despite their many undisputed benefits, links can suffer from decay, insufficient versioning, and lack of bidirectional traceability. In this paper, we investigate the role of links contained in source code comments from these perspectives. We conducted a large-scale study of around 9.6 million links to establish their prevalence, and we used a mixed-methods approach to identify the links' targets, purposes, decay, and evolutionary aspects. We found that links are prevalent in source code repositories, that licenses, software homepages, and specifications are common types of link targets, and that links are often included to provide metadata or attribution. Links are rarely updated, but many link targets evolve. Almost 10% of the links included in source code comments are dead. We then submitted a batch of link-fixing pull requests to open source software repositories, resulting in most of our fixes being merged successfully. Our findings indicate that links in source code comments can indeed be fragile, and our work opens up avenues for future work to address these problems.12 https://stackoverflow.com/q/312443 13 We used LWP::UserAgent and LWP::RobotUA.

show abstract

Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling

Pingclasai

Hata

Matsumoto

2013

View full text Add to dashboard Cite

Bug reports are widely used in several research areas such as bug prediction, bug triaging, and etc. The performance of these studies relies on the information from bug reports. Previous study showed that a significant number of bug reports are actually misclassified between bugs and nonbugs. However, classifying bug reports is a time-consuming task. In the previous study, researchers spent 90 days to classify manually more than 7,000 bug reports. To tackle this problem, we propose automatic bug report classification techniques. We apply topic modeling to the corpora of preprocessed bug reports of three open-source software projects with decision tree, naive Bayes classifier, and logistic regression. The performance in classification, measured in F-measure score, varies between 0.66-0.76, 0.65-0.77, and 0.71-0.82 for HTTPClient, Jackrabbit, and Lucene project respectively.

show abstract

Bug or Not? Bug Report Classification Using N-Gram IDF

Terdchanakul

Hata

Phannachitta

et al. 2017

View full text Add to dashboard Cite

Previous studies have found that a significant number of bug reports are misclassified between bugs and nonbugs, and that manually classifying bug reports is a timeconsuming task. To address this problem, we propose a bug reports classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts, these key terms can be used as the features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and topic modeling, which is widely used in various software engineering tasks. With a publicly available dataset, our results show that our N-gram IDF-based models have a superior performance than the topic-based models on all of the evaluated cases. Our models show promising results and have a potential to be extended to other software engineering tasks.

show abstract

Generating Pseudo-Code from Source Code Using Deep Learning

Alhefdhi¹,

Dam²,

Hata³

et al. 2018

View full text Add to dashboard Cite

A Dataset of High Impact Bugs: Manually-Classified Issue Reports

Ohira

Kashiwa

Yamatani

et al. 2015

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Hideaki Hata

Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation

Bug prediction based on fine-grained module histories

Pandemic programming

9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay

Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling

Bug or Not? Bug Report Classification Using N-Gram IDF

Generating Pseudo-Code from Source Code Using Deep Learning

A Dataset of High Impact Bugs: Manually-Classified Issue Reports

Contact Info

Product

Resources

About