The completion of draft sequences of the human and mouse genomes offers many opportunities for gene discovery in immunology through the methods of computational genomics. One arm of the innate immune system includes the antimicrobial peptides that protect multicellular organisms from a diverse spectrum of microorganisms. The beta-defensins comprise an important family of mammalian antimicrobial peptides. To better define the beta-defensin gene family, we developed an approach to search genomic databases for conserved motifs present in the beta-defensin family using HMMER, a computational search tool based on hidden Markov models (HMMs), in combination with the Basic Local Alignment Search Tool (BLAST). The approach was first used to identify candidate second-exon coding regions and was then applied to find the associated first exons. This strategy discovered 28 new human and 43 new mouse beta-defensin genes in five syntenic chromosomal regions. Within each syntenic cluster, the gene sequences and organization were similar, suggesting that each cluster pair arose from a common ancestor and was retained because of conserved functions. These findings provide an important proof of principle for a genome-wide search strategy to identify genes with conserved structural motifs. Such an approach may be readily adapted to address other questions of relevance to immunology.
Previously [1], we reported a coarse-grained parallel computational approach to identifying rare molecular evolutionary events often referred to as horizontal gene transfers. Very high degrees of parallelism (up to 65x speedup on 4,096 processors) were reported, yet the overall execution time for a realistic problem size was still on the order of 12 days. With the availability of large numbers of compute clusters, as well as genomic sequence from more than 2,000 species containing as many as 35,000 genes each, and trillions of sequence nucleotides in all, we demonstrated the computational feasibility of a method to examine "clusters" of genes using phylogenetic tree similarity as a distance metric. A full serial solution to this problem requires years of CPU time, yet makes only modest IPC and memory demands; thus, it is an ideal candidate for a grid computing approach involving low-cost compute nodes. This paper describes a multiple-granularity parallel solution that exploits multi-core shared-memory nodes to address fine-grained aspects of the tree-clustering phase of our previous deployment of XenoCluster 1.0. In addition to benchmarking results that show up to 80% speedup efficiency on 8 CPU cores, we report on the biological accuracy and relevance of our results compared to a reported set of known xenologs in yeast.
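The scaling figures quoted above can be related to each other with a short calculation. The following is a minimal sketch, assuming that "speedup efficiency" means the conventional ratio of speedup to the number of workers; the abstract does not define the term, and the function names are hypothetical helpers introduced only for illustration.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by worker count (assumed definition of efficiency)."""
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return speedup / workers


def speedup_from_efficiency(efficiency: float, workers: int) -> float:
    """Invert the same assumed definition: speedup = efficiency * workers."""
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return efficiency * workers


if __name__ == "__main__":
    # Fine-grained case quoted in the abstract: 80% efficiency on 8 cores.
    fine_speedup = speedup_from_efficiency(0.80, 8)
    # Coarse-grained case quoted in the abstract: 65x speedup on 4,096 processors.
    coarse_efficiency = parallel_efficiency(65, 4096)
    print(f"implied speedup on 8 cores: {fine_speedup:.1f}x")
    print(f"implied efficiency on 4,096 processors: {coarse_efficiency:.2%}")
```

Under this assumed definition, 80% efficiency on 8 cores corresponds to a speedup of about 6.4x. The two figures describe different levels of granularity and different phases of the computation, so they are not directly comparable.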