In this age of data-driven science and high-throughput biology, computational thinking is becoming an increasingly important skill for tackling both new and long-standing biological questions. However, despite its obvious importance and conspicuous integration into many areas of biology, computer science is still viewed as an obscure field that has, thus far, permeated only a few of the biology curricula across the nation. A national survey has shown that lack of computational literacy in environmental sciences is the norm rather than the exception [Valle & Berdanier (2012) Bulletin of the Ecological Society of America, 93, 373-389]. In this article, we introduce a few important concepts in computer science, with the aim of providing a context-specific introduction for research biologists. Our goal is to help biologists understand some of the most important mainstream computational concepts so that they can better appreciate bioinformatics methods and trade-offs that are not obvious to the uninitiated.
Introduction

Next-generation sequencing (NGS) technologies have substantially decreased the cost and complexity of generating sequence data and are allowing researchers to tackle questions that were previously out of reach. Along with this remarkable progress in data acquisition, parallel advances in the computational sciences, such as machine learning and high-performance computing, are allowing researchers to address complex biological problems using creative computational and quantitative techniques. For example, it is now possible to identify germline and somatic variants in thousands of individuals at reasonable cost by employing powerful algorithms to analyse data sets generated using reduced-representation methods, such as RAD-seq (Marx 2013; Pabinger et al. 2014). The scale and complexity of these new problems require computational infrastructure beyond that typically available in a traditional molecular ecology laboratory, and the computational background required to mine these data sets can be a major handicap. As such, a basic understanding of computer science is becoming an essential part of training the next generation of data-enabled biologists, not only as a tool during the inevitable integration of computer science into biology, but also to foster productive interactions in the new era of multidisciplinary and large-scale biology. While an increasing number of undergraduate and graduate programmes are including bioinformatics, programming or computer science in their curricula, precious few students seem to understand the principal computational concepts underlying the tools they use on a regular basis in their research (Pevzner & Shamir 2009; Valle & Berdanier 2012).

Computer science is a discipline that lays the theoretical foundations for the systematic study of processes that describe and transform information (Brookshear & Brookshear 2003). Its applications are widespread and encompass, amo...