The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters are either given by the user or estimated from a specified sequence file. The significance of each motif found is judged based on a motif score distribution estimated by a Monte Carlo method. In addition, BioProspector modifies the motif model used in the earlier Gibbs samplers to allow for the modeling of gapped motifs and motifs with palindromic patterns. All these modifications greatly improve the performance of the program. Although testing and development are still in progress, the program has shown preliminary success in finding the binding motifs for Saccharomyces cerevisiae RAP1, Bacillus subtilis RNA polymerase, and Escherichia coli CRP. We are currently working on combining BioProspector with a clustering program to explore gene expression networks and regulatory mechanisms. For a copy of the program and documentation for UNIX systems, please contact xliu@smi.stanford.edu.
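To illustrate the background model described in this abstract, the following minimal sketch estimates a k-th order Markov model from a set of sequences and scores a segment under it. It is not BioProspector's code (the program itself is written in C); the function names, the pseudocount, and the uniform fallback for unseen contexts are assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def estimate_markov_background(sequences, order=1, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from a list of DNA strings."""
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0.0))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions that involve ambiguous symbols such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context, base_counts in counts.items():
        total = sum(base_counts.values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: (base_counts[b] + pseudocount) / total for b in ALPHABET
        }
    return model


def background_log_prob(segment, model, order=1):
    """Log-probability of `segment` under the model.

    The first `order` bases only serve as context and are not scored.
    Contexts never seen during estimation fall back to a uniform
    distribution (an assumption of this sketch). The segment must
    contain only A, C, G, and T.
    """
    uniform = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    logp = 0.0
    for i in range(order, len(segment)):
        probs = model.get(segment[i - order:i], uniform)
        logp += math.log(probs[segment[i]])
    return logp
```

Setting `order=0` reduces the model to plain base frequencies, while `order=3` conditions each base on the preceding three, which covers the zero to third-order range mentioned in the abstract.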
Chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has become a popular procedure for studying genome-wide protein-DNA interactions and transcription regulation. However, it can only map the probable protein-DNA interaction loci within 1-2 kilobases resolution. To pinpoint interaction sites down to the base-pair level, we introduce a computational method, Motif Discovery scan (MDscan), that examines the ChIP-array-selected sequences and searches for DNA sequence motifs representing the protein-DNA interaction sites. MDscan combines the advantages of two widely adopted motif search strategies, word enumeration and position-specific weight matrix updating, and incorporates the ChIP-array ranking information to accelerate searches and enhance their success rates. MDscan correctly identified all the experimentally verified motifs from published ChIP-array experiments in yeast (STE12, GAL4, RAP1, SCB, MCB, MCM1, SFF, and SWI5), and predicted two motif patterns for the differential binding of Rap1 protein in telomere regions. In our studies, the method was faster and more accurate than several established motif-finding algorithms. MDscan can be used to find DNA motifs not only in ChIP-array experiments but also in other experiments in which a subgroup of the sequences can be inferred to contain relatively abundant motif sites. The MDscan web server can be accessed at http://BioProspector.stanford.edu/MDscan/.
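The word-enumeration step can be sketched as follows: every w-mer in the top-ranked sequences acts as a seed, similar w-mers are gathered around it, and a position-specific count matrix is built from the group. This is a simplified illustration, not MDscan's implementation. In particular, ranking seeds by the number of similar words is a stand-in for the program's actual scoring function, and the names `seed_motifs`, `w`, and `min_match` are assumptions.

```python
def wmers(seq, w):
    """All overlapping words of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def matches(a, b):
    """Number of positions at which two equal-length words agree."""
    return sum(x == y for x, y in zip(a, b))


def seed_motifs(top_sequences, w, min_match):
    """Enumerate seeds in the top-ranked sequences and build count matrices.

    Returns a list of (group_size, seed, matrix) tuples, largest group
    first. Each matrix is a list of per-position base counts.
    """
    all_words = [word for seq in top_sequences for word in wmers(seq, w)]
    candidates = []
    for seed in sorted(set(all_words)):
        members = [wd for wd in all_words if matches(seed, wd) >= min_match]
        matrix = [
            {b: sum(wd[pos] == b for wd in members) for b in "ACGT"}
            for pos in range(w)
        ]
        candidates.append((len(members), seed, matrix))
    # Simplified ranking: larger groups first, ties broken alphabetically.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates
```

In a fuller pipeline the best candidate matrices would then be refined against the remaining, lower-ranked sequences, which corresponds to the weight matrix updating step the abstract mentions.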
Understanding the genetic basis of HIV-1 drug resistance is essential to developing new antiretroviral drugs and optimizing the use of existing drugs. This understanding, however, is hampered by the large numbers of mutation patterns associated with cross-resistance within each antiretroviral drug class. We used five statistical learning methods (decision trees, neural networks, support vector regression, least-squares regression, and least angle regression) to relate HIV-1 protease and reverse transcriptase mutations to in vitro susceptibility to 16 antiretroviral drugs. Learning methods were trained and tested on a public data set of genotype-phenotype correlations by 5-fold cross-validation. For each learning method, four mutation sets were used as input features: a complete set of all mutations in >2 sequences in the data set, the 30 most common data set mutations, an expert panel mutation set, and a set of nonpolymorphic treatment-selected mutations from a public database linking protease and reverse transcriptase sequences to antiretroviral drug exposure. The nonpolymorphic treatment-selected mutations led to the best predictions: 80.1% accuracy at classifying sequences as susceptible, low/intermediate resistant, or highly resistant. Least angle regression predicted susceptibility significantly better than other methods when using the complete set of mutations. The three regression methods provided consistent estimates of the quantitative effect of mutations on drug susceptibility, identifying nearly all previously reported genotype-phenotype associations and providing strong statistical support for many new associations. Mutation regression coefficients showed that, within a drug class, cross-resistance patterns differ for different mutation subsets and that cross-resistance has been underestimated.
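A minimal sketch of one of the setups described above, least-squares regression on binary mutation indicators evaluated by 5-fold cross-validation, might look like the following. It uses only the standard library and is not the authors' code. The helper names, the tiny ridge term that keeps the normal equations solvable, and the fixed shuffle seed are all assumptions of this example.

```python
import random


def encode(sample_mutations, feature_mutations):
    """Binary indicator vector, with a leading 1.0 for the intercept."""
    return [1.0] + [
        1.0 if m in sample_mutations else 0.0 for m in feature_mutations
    ]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(xs, ys, ridge=1e-6):
    """Solve the normal equations (X^T X + ridge * I) beta = X^T y."""
    p = len(xs[0])
    xtx = [
        [sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, x):
    return sum(b * v for b, v in zip(beta, x))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_mse(xs, ys, k=5):
    """Mean squared error over held-out samples across k folds."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        beta = fit_least_squares([xs[i] for i in train],
                                 [ys[i] for i in train])
        errors += [(predict(beta, xs[i]) - ys[i]) ** 2 for i in fold]
    return sum(errors) / len(errors)
```

Each fitted coefficient estimates the change in the susceptibility measure associated with one mutation, which is the sense in which regression coefficients can be compared across drugs within a class.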
antiviral therapy | HIV | linear regression | machine learning
Twenty antiretroviral drugs are approved for treating HIV-1 infection: eight protease inhibitors (PIs), seven nucleoside and one nucleotide reverse transcriptase (RT) inhibitors (NRTIs), three nonnucleoside RT inhibitors (NNRTIs), and one fusion inhibitor. Resistance to these drugs is caused by mutations in their molecular targets. Understanding the genetic basis of cross-resistance is essential for designing new antiviral drugs and for using genotypic drug resistance testing to select optimal therapy. Despite the large number of PIs and RT inhibitors, therapy is challenging because drug resistance arises from complex patterns of mutations and because of the high degree of cross-resistance within each drug class. Approaches for using HIV-1 drug resistance mutations to predict changes in drug susceptibility have included decision trees (1), linear regression (2), linear discriminant analysis (3), neural networks (4), and support vector regression (SVR) (5). Here, we compare five statistical learning methods each using four different sets of input mutations to develop quantitative models associating HIV-1 protease and RT mutations with changes in susce...
Extracts of Drosophila embryos contain an enzymatic activity that converts circular DNAs into huge networks of catenated rings in an ATP-dependent fashion. The catenating activity is resolved into two protein components during purification. One component is a novel DNA topoisomerase that requires the presence of ATP in order to relax supercoiled DNA. We have shown that the ATP-dependent DNA topoisomerase relaxes DNA by a mechanism distinct from that of nicking-closing enzymes. The Drosophila ATP-dependent topoisomerase allows one segment of a circular DNA to pass through transient breaks in both strands at another site on the DNA circle without any relative rotation between the ends at the transient break. This mechanism can convert negative supertwists to positive twists and vice versa until a relaxed equilibrium state is reached. The formation of catenated rings is mediated by an analogous bimolecular reaction which can occur between two nonhomologous DNA circles. The catenation reaction is fully reversible: in the presence of the second protein component, circular DNA is converted quantitatively into catenated forms; in its absence, the ATP-dependent topoisomerase resolves catenated networks back into monomer circles. The Drosophila ATP-dependent topoisomerase appears to be closely related to E. coli DNA gyrase in that both use a similar mechanism to change the topology of DNA, both require ATP, and both are inhibited by the antibiotic novobiocin. An enzyme that allows one DNA helix to pass freely through another could be useful not only in relaxing topological constraints but also in the folding and unfolding of eucaryotic chromosomes.
We report the properties of 67 members of a family of dispersed repetitive palindromic extragenic bacterial DNA sequences. These sequences, called palindromic units, appear to be present at least several hundred times outside structural genes on the Escherichia coli chromosome. They are found either in clusters (as in a previously described intercistronic element) or in single occurrences. They are not only found within an operon but also between different operons, including between convergent ones. The palindromic units could yield a stem and loop structure at the level of DNA or RNA. The base of the stem is made of eight remarkably conserved base pairs while the rest varies somewhat in length and sequence. We analyse the data available on the palindromic units and we speculate on their possible roles with emphasis on transcription and mRNA stability or processing, as well as on their possible relation to transposition elements and the modular evolution of the genome.
Classic molecular motion simulation techniques, such as Monte Carlo (MC) simulation, generate motion pathways one at a time and spend most of their time in the local minima of the energy landscape defined over a molecular conformation space. Their high computational cost prevents them from being used to compute ensemble properties (properties requiring the analysis of many pathways). This paper introduces stochastic roadmap simulation (SRS) as a new computational approach for exploring the kinetics of molecular motion by simultaneously examining multiple pathways. These pathways are compactly encoded in a graph, which is constructed by sampling a molecular conformation space at random. This computation, which does not trace any particular pathway explicitly, circumvents the local-minima problem. Each edge in the graph represents a potential transition of the molecule and is associated with a probability indicating the likelihood of this transition. By viewing the graph as a Markov chain, ensemble properties can be efficiently computed over the entire molecular energy landscape. Furthermore, SRS converges to the same distribution as MC simulation. SRS is applied to two biological problems: computing the probability of folding, an important order parameter that measures the "kinetic distance" of a protein's conformation from its native state; and estimating the expected time to escape from a ligand-protein binding site. Comparison with MC simulations on protein folding shows that SRS produces arguably more accurate results, while reducing computation time by several orders of magnitude. Computational studies on ligand-protein binding also demonstrate SRS as a promising approach to study ligand-protein interactions.
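Once the roadmap is viewed as a Markov chain, the probability of folding for every node follows from first-step analysis: a node's value is the transition-weighted average of its neighbors' values, with folded nodes fixed at 1 and unfolded nodes fixed at 0. The sketch below solves this system by simple iteration. It is an illustration under stated assumptions, not the paper's implementation: the transition probabilities are taken as given input, every node is assumed to appear as a key in `transitions`, and the function name and tolerances are hypothetical.

```python
def folding_probabilities(transitions, folded, unfolded,
                          tol=1e-12, max_iter=100000):
    """Probability of reaching a folded node before an unfolded one.

    transitions: dict mapping node -> {neighbor: probability}, with each
                 row summing to 1 and every node present as a key.
    folded, unfolded: sets of absorbing nodes.
    """
    p = {node: 0.0 for node in transitions}
    for node in folded:
        p[node] = 1.0
    for _ in range(max_iter):
        delta = 0.0
        for node, row in transitions.items():
            if node in folded or node in unfolded:
                continue  # boundary values stay fixed
            new = sum(prob * p[nbr] for nbr, prob in row.items())
            delta = max(delta, abs(new - p[node]))
            p[node] = new
        if delta < tol:
            break
    return p


# Toy roadmap: a five-node chain with a symmetric random walk.
chain = {
    0: {0: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {4: 1.0},
}
p_fold = folding_probabilities(chain, folded={4}, unfolded={0})
```

Because a single solve yields the value for all nodes at once, no individual pathway has to be traced, which is the contrast with one-at-a-time MC simulation drawn in the abstract.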