The present paper demonstrates algorithms for applying gene-counting estimation of haplotype frequencies in very large genetic systems. A factor-union representation of phenotypes is used, which conveniently yields the sets of potential haplotypes and diplotypes for each phenotype. Methods for storing and rapidly retrieving the relevant haplotypes are given. An example with several hundred frequencies is presented; it required a few seconds of computing time for each estimation iteration on a small computer. A computer program employing the methods described has been written in Fortran-77 and is available to any investigator on request.
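As an illustration of the gene-counting idea, the following is a minimal Python sketch, not the paper's Fortran-77 program. It assumes Hardy-Weinberg proportions, represents each phenotype as a tuple of unordered allele pairs (one pair per locus), and enumerates compatible diplotypes by brute-force phase choice rather than by the factor-union representation described above. The function names `compatible_diplotypes` and `gene_counting` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_diplotypes(phenotype):
    """Return the set of unordered haplotype pairs consistent with a phenotype.

    `phenotype` is a tuple of allele pairs, one per locus, e.g. ((0, 1), (0, 1)).
    """
    # A heterozygous locus can be phased two ways; a homozygous locus only one.
    options = [((a, b), (b, a)) if a != b else ((a, b),) for a, b in phenotype]
    diplotypes = set()
    for choice in product(*options):
        h1 = tuple(pair[0] for pair in choice)
        h2 = tuple(pair[1] for pair in choice)
        diplotypes.add(tuple(sorted((h1, h2))))
    return diplotypes


def gene_counting(phenotype_counts, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by gene counting (an EM iteration).

    `phenotype_counts` maps each phenotype to its observed (positive) count.
    """
    diplos = {p: sorted(compatible_diplotypes(p)) for p in phenotype_counts}
    haps = sorted({h for ds in diplos.values() for d in ds for h in d})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    total_genes = 2 * sum(phenotype_counts.values())

    for _ in range(n_iter):
        counts = defaultdict(float)
        for p, n in phenotype_counts.items():
            # Diplotype probabilities under Hardy-Weinberg proportions.
            weights = [
                (1 if h1 == h2 else 2) * freq[h1] * freq[h2]
                for h1, h2 in diplos[p]
            ]
            s = sum(weights)
            # Split the n individuals across diplotypes, then count haplotypes.
            for (h1, h2), w in zip(diplos[p], weights):
                share = n * w / s
                counts[h1] += share
                counts[h2] += share
        new_freq = {h: counts[h] / total_genes for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq
```

For example, `gene_counting({((0, 0), (0, 0)): 10, ((0, 1), (0, 1)): 1})` resolves the single double heterozygote using the frequencies implied by the unambiguous individuals. Brute-force enumeration grows exponentially with the number of heterozygous loci, which is why a large system needs the kind of compact storage and retrieval the abstract refers to.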
The budding yeast Saccharomyces cerevisiae is the best-studied eukaryote in molecular and cell biology, but its utility for understanding the genetic basis of natural phenotypic variation is limited by the inefficiency of association mapping owing to strong and complex population structure. To facilitate association mapping, we analyzed 190 high-quality genomes of diverse strains, including 85 newly sequenced ones, to uncover yeast's population structure, which varies substantially among genomic regions. We identified 181 yeast genes that are absent from the reference genome and demonstrated their expression and role in important functions such as drug resistance. We then simultaneously measured the growth rates of over 4500 lab strains, each deficient in a nonessential gene, and 81 natural strains across multiple environments, using the unique DNA barcode present in each strain. We combined the genome-wide reverse genetic information with genome-wide association analysis to determine potential genomic regions of importance to environmental adaptations, and for a subset experimentally validated their role by reciprocal hemizygosity tests. The resources provided permit efficient and reliable association mapping in yeast and significantly enhance its value as a model for understanding the genetic mechanisms of phenotypic polymorphism and evolution.
The problem of compact, fully efficient representation of multilocus data has not yet been solved. Lod scores can be used to map multilocus data, but because of certain statistical problems, this method loses some information. However, simulation studies show that for distances less than 10 or 20 cM, where there is little danger of huge overestimates of distance, the lod score method yields estimators just as good as maximum likelihood (ML). Since short distances are the most important, the lod method is quite efficient. Its main drawback is misrepresentation of the likelihood under wrong gene orders. This problem can be ameliorated with a single multipoint calculation under each order. Thus, representation of multipoint data with lod scores can be very practical.
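For readers unfamiliar with the quantity involved, the following Python sketch computes a standard two-point lod score for the simplest case: phase-known meioses with a counted number of recombinants. It is only an illustration of what a lod score is, not the multilocus method discussed in the abstract, and the function name `lod_score` is hypothetical.

```python
import math


def lod_score(theta, recombinants, meioses):
    """Two-point lod score Z(theta) = log10(L(theta) / L(0.5)).

    Assumes phase-known data: `recombinants` out of `meioses` informative
    meioses, with recombination fraction `theta` in [0, 0.5].
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses
    if not 0 <= k <= n:
        raise ValueError("recombinants must lie between 0 and meioses")
    if theta == 0.0 and k > 0:
        # Any observed recombinant rules out complete linkage.
        return float("-inf")

    def log_term(count, p):
        # Treat 0 * log10(p) as 0 so that theta = 0 with k = 0 is well defined.
        return 0.0 if count == 0 else count * math.log10(p)

    log_l_theta = log_term(k, theta) + log_term(n - k, 1.0 - theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null
```

By construction the score is zero at `theta = 0.5` (no linkage), and with no recombinants in `n` meioses it reaches `n * log10(2)` at `theta = 0`.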