This chapter identifies the challenges posed to biologists, geneticists and other scientists by advances in technology that have made the observation and study of biological systems increasingly possible. High-throughput platforms have made routine the collection of vast amounts of structural and functional data, have provided insights into the working cell, and have helped to explain the role of genetics in common diseases. Associated with the improvements in technology is the need for statistical procedures that extract the biological information from the available data in a coherent fashion and, perhaps more importantly, quantify the certainty with which conclusions can be made. This chapter outlines a biological hierarchy of structures, functions and interactions that can now be observed, and details the statistical procedures that are necessary for analyzing the resulting data. The chapter has four main sections. The first section details the historical connection between statistics and the analysis of biological and genetic data, and summarizes fundamental concepts in biology and genetics. The second section outlines specific mathematical and statistical methods that are useful in the modelling of data arising in bioinformatics. In sections three and four, two particular issues are discussed in detail: functional genomics via microarray analysis, and metabolomics. A fifth, concluding section identifies some future directions for biological research in which statisticians will play a vital role.
Glossary
Systems Biology: The holistic study of biological structure, function and organization
Probabilistic Graphical Model: A probabilistic model defining the relationships between variables in a model by means of a graph, used to represent the relationships in a biological network or pathway
MCMC: Markov chain Monte Carlo, a computational method for approximating high-dimensional integrals using Markov chains to sample from probability distributions, commonly used in Bayesian inference
Microarray: A high-throughput experimental platform for collecting functional gene expression and other genomic data
Cluster Analysis: A statistical method for discovering subgroups in data
Metabolomics: The study of the metabolic content of tissues