Eddie Antonio Santos scite author profile

Developers summarize their changes to code in commit messages. When a message seems “unusual,” however, this puts doubt into the quality of the code contained in the commit. We trained \(n\)-gram language models and used cross-entropy as an indicator of commit message “unusualness” of over 120 000 commits from open source projects. Build statuses collected from Travis-CI were used as a proxy for code quality. We then compared the distributions of failed and successful commits with regards to the “unusualness” of their commit message. Our analysis yielded significant results when correlating cross-entropy with build status.

show abstract

How does docker affect energy consumption? Evaluating workloads in and out of Docker containers

Santos

McLean

Solinas

et al. 2018

Journal of Systems and Software

View full text Add to dashboard Cite

Context: Virtual machines provide isolation of services at the cost of hypervisors and more resource usage. This spurred the growth of systems like Docker that enable single hosts to isolate several applications, similar to VMs, within a low-overhead abstraction called containers. Motivation: Although containers tout low overhead performance, do they still have low energy consumption? Methodology: This work statistically compares (ttest, Wilcoxon) the energy consumption of three application workloads in Docker and on bare-metal Linux. Results: In all cases, there was a statistically significant (t-test and Wilcoxon p < 0.05) increase in energy consumption when running tests in Docker, mostly due to the performance of I/O system calls.

show abstract

The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software

Kühn¹,

Davis²,

Désilets³

et al. 2020

View full text Add to dashboard Cite

This paper describes the first, three-year phase of a project at the National Research Council of Canada that creates software to assist Indigenous communities in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, where collaboration with communities and fulfillment of their goals is central. Since many of the technologies we developed were in response to community needs, the project ended up as a collection of diverse subprojects, including the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen'kéha, in the Iroquoian language family), release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences and experiments with machine translation (MT) systems trained on this corpus, free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for speech recordings, software for implementing text prediction and read-along audiobooks for Indigenous languages, and several other subprojects. Sociolinguistic BackgroundThere are about 70 Indigenous languages from 10 distinct language families currently spoken in Canada (Rice, 2008). Most of these languages have complex morphology; they are polysynthetic or agglutinative. Commonly, a single word carries the meaning of an entire clause in Indo-European languages.

show abstract

The unreasonable effectiveness of traditional information retrieval in crash report deduplication

Campbell

Santos

Hindle

2016

Preprint

View full text Add to dashboard Cite

Organizations like Mozilla, Microsoft, and Apple are flooded with thousands of automated crash reports per day. Although crash reports contain valuable information for debugging, there are often too many for developers to examine individually. Therefore, in industry, crash reports are often automatically grouped together in buckets. Ubuntu’s repository contains crashes from hundreds of software systems available with Ubuntu. A variety of crash report bucketing methods are evaluated using data collected by Ubuntu’s Apport automated crash reporting system. The trade-off between precision and recall of numerous scalable crash 7 deduplication techniques is explored. A set of criteria that a crash deduplication method must meet is presented and several methods that meet these criteria are evaluated on a new dataset. The evaluations presented in this paper show that using off-the-shelf information retrieval techniques, that were not designed to be used with crash reports, outperform other techniques which are specifically designed for the task of crash bucketing at realistic industrial scales. This research indicates that automated crash bucketing still has a lot of room for improvement, especially in terms of identifier tokenization.

show abstract

Programming Is Hard - Or at Least It Used to Be

Becker

Denny

Finnie-Ansley

et al. 2023

View full text Add to dashboard Cite

12 3 4

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.