2012
DOI: 10.1142/s0219720012500163

Suite of Tools for Statistical N-Gram Language Modeling for Pattern Mining in Whole Genome Sequences

Abstract: Genome sequences contain a number of patterns that have biomedical significance. Repetitive sequences of various kinds are a primary component of most of the genomic sequence patterns. We extended the suffix-array based Biological Language Modeling Toolkit to compute n-gram frequencies as well as n-gram language-model based perplexity in windows over the whole genome sequence to find biologically relevant patterns. We present the suite of tools and their application for analysis on whole human genome s…
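To make the abstract's two quantities concrete, the following is a minimal sketch of n-gram counting and windowed perplexity. It is not the toolkit's implementation: it uses plain Python dictionaries instead of a suffix array, and the function names, the add-one smoothing, and the window and step sizes are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def window_perplexity(seq, n=3, window=12, step=6, alphabet="ACGT"):
    """Perplexity of each window under an add-one-smoothed n-gram model.

    The model is estimated from the whole sequence (an assumption of this
    sketch). Returns a list of (window_start, perplexity) pairs.
    Requires window >= n.
    """
    grams = ngram_counts(seq, n)
    # Context counts are derived from the n-grams so that the smoothed
    # probabilities for each context sum to one.
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    vocab = len(alphabet)

    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        log_prob = 0.0
        predicted = 0
        for i in range(n - 1, len(chunk)):
            context = chunk[i - n + 1:i]
            gram = chunk[i - n + 1:i + 1]
            prob = (grams[gram] + 1) / (contexts[context] + vocab)
            log_prob += math.log(prob)
            predicted += 1
        results.append((start, math.exp(-log_prob / predicted)))
    return results


if __name__ == "__main__":
    # A tandem repeat followed by a less regular stretch (toy data).
    toy = "AT" * 20 + "ACGTTGCAAGCTTAGGCATC"
    for start, ppl in window_perplexity(toy):
        print(start, round(ppl, 2))
```

In this toy input the windows that fall inside the repeat are more predictable and therefore score a lower perplexity than the window over the irregular tail, which is the kind of contrast a windowed scan is meant to expose.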

Citations: cited by 4 publications (4 citation statements)
References: 47 publications (31 reference statements)
“…We used the Biological Language Modelling Toolkit (BLMT) (version 2) to identify palindromes in the human genomes [20]. BLMT pre-processes the whole genome sequence into suffix arrays and then computes the longest common prefix array, which make searching for patterns like palindromes very efficient.…”
Section: Methods
Type: mentioning
confidence: 99%
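The two data structures named in this statement can be sketched in a few lines. The code below is an illustrative assumption, not BLMT code: it builds the suffix array by naive sorting (adequate for short strings, not for a whole genome) and derives the longest common prefix (LCP) array with Kasai's algorithm. The helper `longest_repeat` is a hypothetical example of a pattern query that the LCP array makes cheap.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive O(n^2 log n) construction, for illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai's algorithm.

    lcp[k] is the length of the longest common prefix of the suffixes at
    sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    """
    n = len(text)
    # rank is the inverse of the suffix array: rank[sa[k]] == k.
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k

    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def longest_repeat(text):
    """Longest substring that occurs at least twice (hypothetical helper)."""
    if not text:
        return ""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    k = max(range(len(text)), key=lambda idx: lcp[idx])
    return text[sa[k]:sa[k] + lcp[k]]


if __name__ == "__main__":
    sa = suffix_array("banana")
    print(sa, lcp_array("banana", sa), longest_repeat("banana"))
```

Because suffixes that share a prefix sit next to each other in the suffix array, a repeated pattern shows up as a run of large LCP values, which is why the statement describes pattern searches as efficient once both arrays exist.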
“…With the advent of high-throughput technologies for sequencing personal genomes, the computational question can be revisited not only to find the locations of palindromes in the reference genome but also to study the variations exhibited across individuals. We have developed a suite of tools called the Biological Language Modeling Toolkit (BLMT, version 2) for pattern mining in a genome sequence [31]. The tools in BLMT preprocess the genome sequence into a suffix array that is augmented with other data arrays for mining genomic patterns, including palindromes.…”
Section: Palindromic Variations In Disease Susceptibility Loci
Type: mentioning
confidence: 99%
“…We employed the BLMT (version 2) [31] to identify palindromes and near-palindromes in the individual human genomes. BLMT preprocesses each whole-genome sequence into a suffix array and then computes the longest common prefix array and rank array, thus making pattern searches very efficient.…”
Section: Palindrome Computation With BLMT
Type: mentioning
confidence: 99%
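For readers unfamiliar with the term, a DNA palindrome is a stretch that equals its own reverse complement, such as GAATTC. The sketch below finds exact palindromes by expanding around each center. This is a swapped-in technique chosen for brevity, not the suffix-array method the statement attributes to BLMT, and it does not handle near-palindromes; the function name and the `min_len` threshold are assumptions.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def dna_palindromes(seq, min_len=6):
    """Return (start, length) for each maximal reverse-complement palindrome.

    Only even-length palindromes exist under this definition, because no
    base is its own complement. Runs in O(n * longest palindrome) time.
    """
    hits = []
    for center in range(1, len(seq)):
        left, right = center - 1, center
        # Grow outward while the two flanking bases pair with each other.
        while (left >= 0 and right < len(seq)
               and COMPLEMENT.get(seq[left]) == seq[right]):
            left -= 1
            right += 1
        length = right - left - 1
        if length >= min_len:
            hits.append((left + 1, length))
    return hits


if __name__ == "__main__":
    toy = "ACGAATTCTT"
    for start, length in dna_palindromes(toy):
        print(start, toy[start:start + length])
```

Characters outside A, C, G and T (for example N) never match, so they simply terminate the expansion.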