Since the advent of rapid DNA sequencing methods in 1976, scientists have faced the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method in widespread practical use, and many conventional shotgun assembly algorithms are based on the notion of pairwise fragment overlap. Whereas shotgun sequencing infers a DNA sequence from the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence from the set of oligomers that represents all subwords of some fixed length k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines the techniques of both shotgun and SBH methods in a novel way. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
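The two kinds of input described above can be illustrated with a small sketch. It is not the authors' algorithm, which the abstract does not specify. The helper names (kmer_spectrum, suffix_prefix_overlap, merge_if_consistent) are hypothetical, and the rule of accepting an overlap-based merge only when every k-mer of the result appears in the SBH spectrum is an assumed way of combining the two sources of information, chosen only for illustration.

```python
def kmer_spectrum(seq, k):
    """All length-k subwords of seq: the idealized output of an SBH experiment."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def suffix_prefix_overlap(a, b, min_len=1):
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def merge_if_consistent(a, b, spectrum, k, min_len=3):
    """Hypothetical combination step: merge fragment a with fragment b at their
    longest shotgun-style overlap, but keep the merge only if every k-mer of the
    merged string is present in the SBH spectrum. Returns None otherwise."""
    olen = suffix_prefix_overlap(a, b, min_len)
    if olen == 0:
        return None
    merged = a + b[olen:]
    return merged if kmer_spectrum(merged, k) <= spectrum else None


if __name__ == "__main__":
    genome = "ATGCGTACGTTAG"             # toy target sequence (made up)
    spectrum = kmer_spectrum(genome, 4)  # what an ideal 4-mer SBH chip would report
    a, b = genome[:8], genome[5:]        # two overlapping "shotgun" fragments
    print(suffix_prefix_overlap(a, b), merge_if_consistent(a, b, spectrum, 4))
```

The spectrum check acts as a filter on overlap candidates: a fragment pair that overlaps by chance but would create a k-mer absent from the hybridization data is rejected.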
High-throughput fluorescent genotyping requires a considerable amount of automation for accurate and efficient processing of genetic markers. Commercially available automated DNA sequencers and accompanying software contribute substantially to increased throughput in large-scale genotyping projects. However, some conceptually simple tasks still require time-consuming manual intervention that creates bottlenecks in throughput capacity. One of these tasks is the conversion of the imprecise DNA fragment sizes reported by commercial software to the discrete alleles that the sizes represent. Here we describe a simple method for assigning allele sizes to their appropriate allele “bins” using least-squares minimization. The method requires no special treatment of family data on plates, internal/external size standards, or electropherogram data manipulation. Tests of the method using the ABI 373A automated DNA sequencer and accompanying Genescan/Genotyper software resulted in accurate automatic classification of all alleles in >80% of 208 markers analyzed, with the remaining markers (<20%) appropriately identified as requiring additional attention to laboratory conditions. Specific characteristics of different markers, including differences in PCR product size and inexact repeat lengths (e.g., 1.9 bp for a dinucleotide repeat), are accommodated by the method, and their properties are discussed.
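A least-squares binning step of the general kind described above can be sketched as follows. The abstract does not give the exact procedure, so this is an assumed scheme: model each observed size as offset + step × bin, start from the nominal repeat unit, then alternate between rounding sizes to the nearest bin and refitting (offset, step) by ordinary least squares until the assignments stop changing. Estimating step from the data is what lets such a scheme absorb an inexact repeat length such as 1.9 bp. The function name bin_alleles and its parameters are hypothetical.

```python
import numpy as np


def bin_alleles(sizes, repeat_unit=2.0, max_iter=50):
    """Assign imprecise fragment sizes (bp) to discrete allele bins.

    Assumed model: size ~= offset + step * bin. Returns (bins starting at 0,
    fitted size of bin 0, fitted step in bp).
    """
    sizes = np.asarray(sizes, dtype=float)
    offset, step = sizes.min(), float(repeat_unit)
    bins = np.rint((sizes - offset) / step).astype(int)
    for _ in range(max_iter):
        if np.unique(bins).size < 2:
            break  # one bin only: step cannot be estimated, keep the nominal value
        # least-squares line through (bin, size) pairs; polyfit returns [slope, intercept]
        step, offset = np.polyfit(bins, sizes, 1)
        new_bins = np.rint((sizes - offset) / step).astype(int)
        if np.array_equal(new_bins, bins):
            break  # assignments are stable
        bins = new_bins
    base = bins.min()
    return bins - base, offset + step * base, step


if __name__ == "__main__":
    # made-up sizes for a dinucleotide marker whose true spacing is about 1.9 bp
    observed = [100.4, 102.05, 102.25, 106.2, 107.8, 107.9, 111.82]
    print(bin_alleles(observed))
```

Markers whose residuals from the fitted line stay large, or whose bins keep changing, would be natural candidates for the "requires additional attention" category, although the threshold for that decision is not specified here.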
There are two basic algorithms for calculating multipoint linkage likelihoods: in one, the computational effort increases linearly with the number of pedigree members and exponentially with the number of markers; in the other, the effort increases exponentially with the number of persons but linearly with the number of markers. We describe a faster version of the latter algorithm in which making the recombination fraction meiosis-specific carries no computational penalty. This can lead to faster and potentially more powerful linkage analysis whenever the number of nonfounder meioses in a pedigree is not too large.
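The second algorithm is commonly formulated as a hidden Markov model whose hidden state at each marker is an inheritance vector with one bit per nonfounder meiosis. The sketch below shows why a meiosis-specific recombination fraction need not cost anything extra: between adjacent markers, each bit flips independently with its own probability θ_j, so the 2^m × 2^m transition matrix is a Kronecker product of 2 × 2 matrices and can be applied one bit at a time. This factorization is one known speed-up, offered here as an illustration rather than as the authors' exact method; the function names and the uniform prior over inheritance vectors are assumptions.

```python
import numpy as np


def apply_transition(prob, thetas):
    """Move a distribution over inheritance vectors from one marker to the next.

    prob   : length 2**m array over inheritance vectors (one bit per nonfounder meiosis).
    thetas : length-m recombination fractions, one per meiosis (meiosis-specific).
    The full transition matrix is the Kronecker product of the per-meiosis matrices
    [[1 - t, t], [t, 1 - t]], so it is applied bit by bit in O(m * 2**m)
    operations instead of O(4**m) for an explicit matrix-vector product.
    """
    m = len(thetas)
    x = np.asarray(prob, dtype=float).reshape((2,) * m)  # axis j <-> meiosis j
    for j, t in enumerate(thetas):
        # with probability t bit j flips (recombination), otherwise it is kept
        x = (1.0 - t) * x + t * np.flip(x, axis=j)
    return x.reshape(-1)


def pedigree_likelihood(emissions, thetas):
    """Forward algorithm over markers.

    emissions : (L, 2**m) array, P(marker data at locus l | inheritance vector).
    thetas    : (L - 1, m) array, recombination fraction per interval and meiosis.
    """
    emissions = np.asarray(emissions, dtype=float)
    alpha = emissions[0] / emissions.shape[1]  # assumed uniform prior over vectors
    for l in range(1, emissions.shape[0]):
        alpha = apply_transition(alpha, thetas[l - 1]) * emissions[l]
    return alpha.sum()
```

Because each bit is updated with its own θ_j, supplying a different recombination fraction for every meiosis (for example, sex-specific rates) leaves the operation count unchanged, and the cost grows with the number of nonfounder meioses m, matching the condition stated above.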
Progress towards construction of a dense map of di-allelic markers across the human genome has generated considerable enthusiasm for pharmacogenomic applications. To date, however, nearly all of the effort in single nucleotide polymorphism (SNP) projects has focused on marker identification and screening, not on how SNP genotype data can actually be used in clinical trials to advance medical practice. Here, we use a simple trial design to explore how different properties of SNPs affect the size, scope, and design of clinical trials. We evaluate clinical trial sampling requirements under different allele frequencies, modes of gene action, gene effect sizes, and numbers of markers in a genome screen. Power and sample size calculations suggest that allele frequency and type of gene action can have a dramatic impact on trial sample sizes: under some conditions, the required sample sizes are too large to be practical in a costly clinical trial setting. In other situations, however, pharmacogenomic clinical trials can yield significant sampling and cost savings over traditional trials. These properties are discussed with regard to the general use of genetic information in clinical trial settings.
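To make the dependence on allele frequency, gene action, effect size, and number of markers concrete, here is a rough normal-approximation calculation. The trial design is an assumption, not taken from the paper: a continuous drug response, genotype frequencies in Hardy-Weinberg equilibrium, an effect expressed in residual standard deviations, a Bonferroni-corrected two-sided test across all markers, a regression on allele count for additive action, and a responder-genotype versus rest comparison for dominant or recessive action. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def genotype_freqs(p):
    """Hardy-Weinberg frequencies of genotypes (aa, Aa, AA) when allele A has frequency p."""
    q = 1.0 - p
    return q * q, 2.0 * p * q, p * p


def sample_size(p, effect, model, n_markers=1, alpha=0.05, power=0.8):
    """Approximate total trial size to detect a genotype effect on a continuous response.

    effect is in residual-SD units; alpha is split over n_markers (Bonferroni,
    two-sided). Normal approximation throughout.
    """
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / (2.0 * n_markers)) + z(power)
    _, f_het, f_hom = genotype_freqs(p)
    if model == "additive":
        # effect per copy of A: regression on allele count 0/1/2,
        # whose variance under Hardy-Weinberg is 2p(1 - p)
        return ceil(z_sum ** 2 / (effect ** 2 * 2.0 * p * (1.0 - p)))
    if model == "dominant":
        f1 = f_het + f_hom  # carriers of at least one A
    elif model == "recessive":
        f1 = f_hom  # AA homozygotes only
    else:
        raise ValueError(f"unknown gene action: {model}")
    # two-group comparison: responder genotype class versus everyone else
    return ceil(z_sum ** 2 * (1.0 / f1 + 1.0 / (1.0 - f1)) / effect ** 2)


if __name__ == "__main__":
    for model in ("dominant", "additive", "recessive"):
        # one candidate SNP versus a genome screen of 100,000 SNPs (illustrative values)
        print(model, sample_size(0.1, 0.5, model), sample_size(0.1, 0.5, model, n_markers=100_000))
```

With a rare allele, the recessive model leaves only about p² of the trial in the responder group, so the required size grows roughly like 1/p². This is one way the "too large to be practical" regime can arise, while a common allele with dominant action can need comparatively few subjects.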
With the discovery of an increasing number of candidate genes that may affect inter-individual variability in drug response, designing drug trials that incorporate the study of these genes has become relevant. We discuss how to determine the sample size for such studies when the number of tests to perform is given or, alternatively, the number of tests to perform when the sample size is given. In many cases, a uniformly most powerful test does not exist, and normal approximations are not sufficiently accurate to determine sample size. We briefly discuss various tests of interest and give simple examples to illustrate some of the problems that arise.
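Both directions of the calculation (sample size from a given number of tests, and number of tests from a given sample size) can be sketched with an exact test instead of a normal approximation. The setting is an assumption chosen for simplicity: a one-sided exact binomial test of a response rate p0 against p1 > p0, with a Bonferroni-corrected level α/m across m tests. The function names are hypothetical and are not taken from the paper.

```python
from math import exp, lgamma, log


def binom_pmf(n, p):
    """Exact Binomial(n, p) probabilities for 0..n, computed in log space (0 < p < 1)."""
    lp, lq = log(p), log(1.0 - p)
    return [exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * lp + (n - i) * lq)
            for i in range(n + 1)]


def exact_power(n, p0, p1, level):
    """Power of the exact one-sided test of H0: p = p0 against p = p1 > p0.

    The rejection region {X >= c} uses the smallest c whose exact size is <= level.
    """
    pmf0, pmf1 = binom_pmf(n, p0), binom_pmf(n, p1)
    size = power = 0.0
    c = n + 1
    # grow the rejection region downward while its exact size stays within level
    while c > 0 and size + pmf0[c - 1] <= level:
        c -= 1
        size += pmf0[c]
        power += pmf1[c]
    return power


def min_sample_size(n_tests, p0, p1, alpha=0.05, power=0.8, n_max=2000):
    """Smallest n reaching the target power at the Bonferroni level alpha / n_tests."""
    level = alpha / n_tests
    for n in range(1, n_max + 1):
        if exact_power(n, p0, p1, level) >= power:
            return n
    return None


def max_tests(n, p0, p1, alpha=0.05, power=0.8, m_max=10**6):
    """Largest number of tests m for which n subjects still give the target power (0 if none)."""
    def ok(m):
        return exact_power(n, p0, p1, alpha / m) >= power

    if not ok(1):
        return 0
    if ok(m_max):
        return m_max
    lo, hi = 1, m_max  # invariant: ok(lo) is True and ok(hi) is False
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

Power can only fall as the per-test level shrinks, so the binary search over m is valid. Exact binomial power is not monotone in n, however (it has a sawtooth shape), so "the first n that reaches the target" is a convention rather than a guarantee that every larger n also does. This is one concrete instance of the difficulties with discrete tests mentioned above.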