A top-down linguistic approach to the analysis of genomic sequences: The metabotropic glutamate receptors 1 and 5 in human and in mouse as a case study. Journal of Theoretical Biology, Elsevier, 2011, 270 (1).
Abstract

This paper presents a top-down strategy for detecting features in genomic sequences. The core of the strategy is to apply dictionary-based compression algorithms to the sequence and to analyze the content of the automatically generated dictionary. We classify the over-represented words and, in the case study, correlate them with experimentally identified or theoretically predicted biological features. A broad-spectrum analysis reveals that the only feature co-located with the a priori extracted words is the torsional flexibility of DNA, whereas non-B DNA configurations are anti-localized and the other features are mostly independent of the extracted words. This analysis reveals complex relationships between the linguistic structures investigated under our approach and some known biological features.
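To make the idea of a compression-generated dictionary concrete, the following minimal sketch builds an LZ78-style phrase dictionary from a DNA string and ranks its entries by how often they occur in the sequence. It is an illustration only: the abstract does not state which dictionary-based algorithm is used, so LZ78 is an assumption, and the names `lz78_dictionary`, `count_occurrences`, `rank_words`, and the `min_length` parameter are hypothetical. Raw occurrence counts stand in for over-representation here; a real analysis would compare counts against a background model.

```python
def lz78_dictionary(sequence):
    """Return the phrases of an LZ78-style parse, in insertion order.

    Assumption: LZ78 is used purely as an example of a dictionary-based
    compressor; it is not claimed to be the algorithm used in the paper.
    """
    phrases = {}   # phrase -> index in the dictionary
    current = ""   # longest already-known prefix being extended
    for symbol in sequence:
        candidate = current + symbol
        if candidate in phrases:
            # Keep extending the known phrase.
            current = candidate
        else:
            # New phrase: record it and restart from the empty prefix.
            phrases[candidate] = len(phrases) + 1
            current = ""
    return list(phrases)


def count_occurrences(sequence, word):
    """Count occurrences of `word` in `sequence`, allowing overlaps."""
    count = 0
    start = sequence.find(word)
    while start != -1:
        count += 1
        start = sequence.find(word, start + 1)
    return count


def rank_words(sequence, min_length=3):
    """Rank dictionary words of at least `min_length` symbols by raw count.

    Hypothetical helper: ties are broken by longer word first, then
    alphabetically, so the ordering is deterministic.
    """
    words = [w for w in lz78_dictionary(sequence) if len(w) >= min_length]
    counts = {w: count_occurrences(sequence, w) for w in words}
    return sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
```

For example, `rank_words(genomic_sequence, min_length=6)` would return the dictionary words of length six or more together with their occurrence counts, most frequent first; these candidate words could then be compared with annotated features.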