Leveraging the pull request model of social coding platforms, Open Source Software (OSS) integrators review developers' contributions, checking aspects like license, code quality, and testability. Some projects use bots to automate predefined, sometimes repetitive tasks, thereby assisting integrators' and contributors' work. Our research investigates the usage and impact of such bots. We sampled 351 popular projects from GitHub and found that 93 (26%) use bots. We classified the bots, collected metrics from before and after bot adoption, and surveyed 228 developers and integrators. Our results indicate that bots perform numerous tasks. Although integrators reported that bots are useful for maintenance tasks, we did not find a consistent, statistically significant difference between before and after bot adoption across the analyzed projects in terms of number of comments, commits, changed files, and time to close pull requests. Our survey respondents deem the current bots as not smart enough and provided insights into the bots' relevance for specific tasks, challenges, and potential new features. We discuss some of the raised suggestions and challenges in light of the literature in order to help GitHub bot designers reuse and test ideas and technologies already investigated in other contexts.
Apache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets.Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions being to address sustainability and limit power consumption. In this paper, we introduce HDFS H , a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFS H brings to middleware the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of the Hadoop project, representing different designs of the Hadoop middleware. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that in many cases storing only part of the data in SSDs results in significant energy savings and execution speedups.
ABSTRACTApache Hadoop has evolved significantly over the last years, with more than 60 releases bringing new features. By implementing the MapReduce programming paradigm and leveraging HDFS, its distributed file system, Hadoop has become a reliable and fault tolerant middleware for parallel and distributed computing over large datasets. Nevertheless, Hadoop may struggle under certain workloads, resulting in poor performance and high energy consumption. Users increasingly demand that high performance computing solutions being to address sustainability and limit power consumption. In this paper, we introduce HDFSH , a hybrid storage mechanism for HDFS, which uses a combination of Hard Disks and Solid-State Disks to achieve higher performance while saving power in Hadoop computations. HDFSH brings to middleware the best from HDs (affordable cost per GB and high storage capacity) and SSDs (high throughput and low energy consumption) in a configurable fashion, using dedicated storage zones for each storage device type. We implemented our mechanism as a block placement policy for HDFS, and assessed it over six recent releases of the Hadoop project, representing different designs of the Hadoop middleware. Results indicate that our approach increases overall job performance while decreasing the energy consumption under most hybrid configurations evaluated. Our results also showed that in many cases storing only part of the data in SSDs results in significant energy savings and execution speedups.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.