Model Organism Databases (MODs) represent the union of database technology and biology, and are essential to modern biological and medical research. Research communities are producing a flood of new data of increasingly diverse types and growing complexity. MODs assimilate this information from a wide variety of sources, organize it in a comprehensible manner, and make it freely available to the public via the Internet. MODs let researchers sort through massive amounts of data and reach key information they might otherwise have overlooked. The protocols in this unit offer a general introduction to the different types of data available in the growing number of MODs, and to approaches for accessing, browsing, and querying these data.

Keywords: genome project • genetics • DNA sequence • gene model • protein function
OVERVIEW AND PRINCIPLES

Advances in DNA sequencing technologies over the past two decades have increased the number of fully sequenced genomes and other publicly available DNA sequences. This, in turn, has greatly expanded the depth and breadth of experimental data available to today's researcher. To make the most of this information, it must be collected, vetted, collated, and made available to the relevant scientific community (i.e., it must be curated). This curation occurs within Model Organism Databases (MODs), which are becoming increasingly important in all areas of biology.

"Model organisms" are nonhuman organisms that are typically used for biological research. The resulting data can serve as a framework for interpreting and understanding similar data from humans or other medically or economically important species. Popular model organisms include budding yeast, fruit flies, and laboratory mice, all of which contain genes that encode proteins and other gene products similar to those found in humans. Genetic manipulation of model organisms is generally the most efficient path to understanding the effects of mutations in their human homologs. Model organisms have become especially effective reference species because vast amounts of data have been generated, collected, and made freely available to the public research community.
History of Model Organism Databases

To help researchers sort through these mountains of data, MODs have been developed as crucial resources. Each MOD provides easy access to the diverse types of knowledge available for a particular model organism. Two of the earliest MODs were FlyBase and the Saccharomyces Genome Database (SGD), both established in the early 1990s. FlyBase was started by Michael Ashburner and colleagues at Cambridge University, Harvard University, and Indiana University in 1992 as an effort to collate information regarding the genes and mutations of the fruit fly Drosophila melanogaster, one of the most intensely studied eukaryotic