Achyudh Ram scite author profile

Rethinking Complex Neural Network Architectures for Document Classification

Neural network models for many NLP tasks have grown increasingly complex in recent years, making training and deployment more difficult. A number of recent papers have questioned the necessity of such architectures and found that well-executed, simpler models are quite effective. We show that this is also the case for document classification: in a large-scale reproducibility study of several recent neural models, we find that a simple BiLSTM architecture with appropriate regularization yields accuracy and F1 that are either competitive with or exceed the state of the art on four standard benchmark datasets. Surprisingly, our simple model is able to achieve these results without attention mechanisms. While these regularization techniques, borrowed from language modeling, are not novel, to our knowledge we are the first to apply them in this context. Our work provides an open-source platform and the foundation for future work in document classification.

What makes a code change easier to review: an empirical investigation on code change reviewability

Ram, Sawant, Castelluccio et al. 2018

Peer code review is a practice widely adopted in software projects to improve the quality of code. In current code review practices, code changes are manually inspected by developers other than the author before these changes are integrated into a project or put into production. We conducted a study to obtain an empirical understanding of what makes a code change easier to review. To this end, we surveyed published academic literature and sources from gray literature (e.g., blogs and white papers), we interviewed ten professional developers, and we designed and deployed a reviewability evaluation tool that professional developers used to rate the reviewability of 98 changes. We find that reviewability is defined through several factors, such as the change description, size, and coherent commit history. We provide recommendations for practitioners and researchers. The preprint and data for this paper are publicly available. Preprint [

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT

Adhikari, Ram, Tang et al. 2020

Fine-tuned variants of BERT are able to achieve state-of-the-art accuracy on many natural language processing tasks, although at significant computational cost. In this paper, we verify BERT's effectiveness for document classification and investigate the extent to which BERT-level effectiveness can be obtained by different baselines, combined with knowledge distillation, a popular model compression method. The results show that BERT-level effectiveness can be achieved by a single-layer LSTM with at least 40× fewer FLOPS and only ∼3% of the parameters. More importantly, this study analyzes the limits of knowledge distillation as we distill BERT's knowledge all the way down to linear models, a relevant baseline for the task. We report substantial improvement in effectiveness for even the simplest models, as they capture the knowledge learnt by BERT.

Investigating type declaration mismatches in Python

Pascarella, Ram, Nadeem et al. 2018

Past research provided evidence that developers making code changes sometimes omit to update the related documentation, thus creating inconsistencies that may contribute to faults and crashes. In dynamically typed languages, such as Python, an inconsistency in the documentation may lead to a mismatch in type declarations only visible at runtime. With our study, we investigate how often the documentation is inconsistent in a sample of 239 methods from five Python open-source software projects. Our results highlight that more than 20% of the comments are either partially defined or entirely missing and that almost 1% of the methods in the analyzed projects contain type inconsistencies. Based on these results, we create a tool, PyID, to detect type mismatches in Python documentation early, and we evaluate its performance with our oracle.

Supervised Sentiment Classification with CNNs for Diverse SE Datasets

Ram, Nagappan 2018

Preprint
