Recent comparative studies of the human and mouse genomes have revealed sets of conserved nongenic sequences (CNGs) and ultraconserved elements (UCEs). Both sets of sequences exhibit extremely high levels of conservation, extend over hundreds of bases, and have no known function. Because there is no detectable sequence homology between paralogous CNGs or UCEs in either species, an alignment-free technique is needed for their analysis. We have previously compiled a database of the structural properties of all 32,896 unique DNA octamers, including information on stability, minimum energy conformation, and flexibility. We have used Fourier techniques to analyze the UCEs and CNGs in terms of their octamer structural properties, to reveal structural correlations that may indicate possible functions for some of these sequences.
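The figure of 32,896 unique octamers is consistent with counting an octamer and its reverse complement as the same double-stranded sequence: (4^8 + 4^4) / 2 = 32,896, where the 4^4 term counts self-complementary octamers. The abstract does not state this convention, so the sketch below is an assumption offered as a check on the arithmetic, not a description of how the database was built. The helper names `canonical` and `count_unique_kmers` are illustrative only.

```python
from itertools import product

# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Pick one representative for a sequence and its reverse complement.

    Assumption: the two strands of a duplex describe the same structure,
    so the lexicographically smaller string stands for both.
    """
    return min(seq, reverse_complement(seq))


def count_unique_kmers(k):
    """Count k-mers that remain distinct after merging reverse complements."""
    return len({canonical("".join(p)) for p in product("ACGT", repeat=k)})


if __name__ == "__main__":
    print(count_unique_kmers(8))
```

Under this convention, any octamer read from either strand maps to the same database key, which is what allows a single table of 32,896 entries to cover all 65,536 possible octamer strings.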
An R-group descriptor characterises the distribution of some atom-based property, such as elemental type or partial atomic charge, at increasing bond distances from the point of substitution on a parent ring system. Applying partial least squares (PLS) to datasets for which bioactivity data and R-group descriptor information are available is shown to provide an effective way of generating QSAR models with a high level of predictive ability. The resulting models are competitive with those produced by established QSAR approaches, are readily interpretable in structural terms, and are shown to be of value in the optimisation of a lead series.
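As a rough illustration of the descriptor idea, the sketch below sums an atom-based property over shells of atoms at 1, 2, 3, ... bonds from the substitution point, using a breadth-first search over the R-group's bond graph. Everything about the representation is an assumption made for this example: the adjacency-dictionary input, the choice to sum the property within each shell, and the function name `r_group_descriptor`. The abstract does not specify these details, and the PLS modelling step is not shown.

```python
from collections import deque


def r_group_descriptor(adjacency, atom_property, attachment, max_bonds):
    """Sum an atom property at each bond distance from the ring.

    adjacency     -- dict mapping atom index to a list of bonded atom indices
                     (R-group atoms only; hypothetical input format)
    atom_property -- dict mapping atom index to a numeric property value
    attachment    -- index of the R-group atom bonded to the ring, which is
                     taken to lie one bond from the substitution point
    max_bonds     -- largest bond distance to include

    Returns a list whose entry i holds the summed property at i + 1 bonds.
    """
    distance = {attachment: 1}
    queue = deque([attachment])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)

    descriptor = [0.0] * max_bonds
    for atom, bonds in distance.items():
        if bonds <= max_bonds:
            descriptor[bonds - 1] += atom_property[atom]
    return descriptor


if __name__ == "__main__":
    # Ethoxy substituent, heavy atoms only: ring-O(0)-C(1)-C(2).
    ethoxy = {0: [1], 1: [0, 2], 2: [1]}
    is_oxygen = {0: 1.0, 1: 0.0, 2: 0.0}
    print(r_group_descriptor(ethoxy, is_oxygen, attachment=0, max_bonds=3))
```

Because each vector position corresponds to a fixed bond distance, a model coefficient attached to that position can be read directly as "this property at this distance from the ring", which is one way the structural interpretability mentioned above could arise.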
Structural DNA profiles use the structural properties of a sequence's constituent octamers either to identify unusual characteristics of a single sequence (a single sequence query) or to visualize a pattern common to a set of sequences (a multiple sequence query). They aid in understanding the structural basis of functional DNA activity. Profiles that answer single sequence queries are introduced, and Profile Manager (a software application developed to automate profile generation) is presented. Two sequences that are similar in nucleotide composition but known to differ greatly in structure are analyzed, resulting in useful illustrations that agree with the experimental nuclear magnetic resonance structures.
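A single sequence profile of this kind can be sketched as a sliding window: each overlapping octamer is looked up in a property table, giving one value per window position. The sketch below is not Profile Manager and does not use the real octamer database. The `toy_property` function (GC fraction) is a placeholder standing in for a database value such as flexibility or stability, and the reverse-complement keying in `canonical` is an assumed convention.

```python
# Translation table for Watson-Crick complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(octamer):
    """Map an octamer and its reverse complement to one lookup key (assumed)."""
    reverse_complement = octamer.translate(COMPLEMENT)[::-1]
    return min(octamer, reverse_complement)


def toy_property(octamer):
    """Placeholder property: GC fraction, NOT a real structural value."""
    return sum(base in "GC" for base in octamer) / len(octamer)


def structural_profile(sequence, lookup=toy_property, k=8):
    """Return one property value per overlapping k-mer window.

    A sequence of length n yields n - k + 1 values; sequences shorter
    than k yield an empty profile.
    """
    sequence = sequence.upper()
    return [
        lookup(canonical(sequence[i:i + k]))
        for i in range(len(sequence) - k + 1)
    ]


if __name__ == "__main__":
    for position, value in enumerate(structural_profile("AAAAAAAAGGGGGGGG")):
        print(position, value)
```

Replacing `toy_property` with a lookup into a real octamer table would give one profile per structural property, and windows whose values stand out from the rest of the profile are the natural candidates for the "unusual characteristics" a single sequence query looks for.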