Motivation
We discover that maximality of information content among intervals of Tandem Repeats (TRs) in animal genome segregates over taxa such that taxa identification becomes swift and accurate. Successive TRs of a motif occur at intervals over the sequence, forming a trail of TRs of the motif across the genome. We present a method, Tandem Repeat Information Mining (TRIM), that mines 4k number of TR trails of all k length motifs from a whole genome sequence and extracts the information content within intervals of the trails. TRIM vector formed from the ordered set of interval entropies becomes instrumental for genome segregation.
Results
Reconstruction of correct phylogeny for animals from whole genome sequences proves precision of TRIM. Identification of animal taxa by TRIM vector upon feature selection is the most significant achievement. These suggest Tandem Repeat Interval Pattern (TRIP) is a taxa-specific constitutional characteristic in animal genome.
Availability
Source and executable code of TRIM along with usage manual are made available at https://github.com/BB-BiG/TRIM.
Supplementary information
Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.