2021 Data Compression Conference (DCC), 2021
DOI: 10.1109/dcc50243.2021.00027

PHONI: Streamed Matching Statistics with Multi-Genome References

Abstract: Computing the matching statistics of patterns with respect to a text is a fundamental task in bioinformatics, but a formidable one when the text is a highly compressed genomic database. Bannai et al. gave an efficient solution for this case, which Rossi et al. recently implemented, but it uses two passes over the patterns and buffers a pointer for each character during the first pass. In this paper, we simplify their solution and make it streaming, at the cost of slowing it down slightly. This means that, firs…
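
For readers unfamiliar with the term, the sketch below illustrates what matching statistics are, using a deliberately naive quadratic scan over an uncompressed text. It is not the method of the paper, which works on a compressed index; the function name `matching_statistics` and the `(position, length)` output convention are assumptions made for illustration only.

```python
def matching_statistics(text: str, pattern: str) -> list[tuple[int, int]]:
    """Naive matching statistics (illustrative sketch, not the paper's algorithm).

    For each position i in `pattern`, return (pos, length) where `length` is
    the length of the longest prefix of pattern[i:] that occurs in `text`,
    and `pos` is the leftmost start of such an occurrence in `text`
    (or -1 when even the first character does not occur).
    """
    result = []
    for i in range(len(pattern)):
        best_pos, best_len = -1, 0
        for j in range(len(text)):
            # Extend the match between pattern[i:] and text[j:] character by character.
            k = 0
            while (i + k < len(pattern) and j + k < len(text)
                   and pattern[i + k] == text[j + k]):
                k += 1
            if k > best_len:
                best_pos, best_len = j, k
        result.append((best_pos, best_len))
    return result


# Hypothetical toy input: a short "reference" and a short "read".
ms = matching_statistics("GATTACA", "TACAG")
lengths = [length for _, length in ms]
```

This scan takes time proportional to the product of the text and pattern lengths in the best case and more in the worst, which is why a compressed index is needed for genomic databases.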

Citations: cited by 16 publications (31 citation statements)
References: 17 publications
“…Now we show how we can compute eMS extending the algorithm presented in Boucher et al [3] while preserving the same space-bound.…”
Section: Computing the Second Longest Match (mentioning)
confidence: 92%
“…From now on, we refer to the set of all maximal unique matches between T and P as MUMs. In [3] the authors showed how to compute maximal matches (not necessarily unique neither in T nor P) in O(r + g) space, where r is the number of runs of the BWT of T and g is the size of the SLP representing the text T. This is achieved by computing the matching statistics, for which we report the definition given in [3].…”
Section: Definition (mentioning)
confidence: 99%
“…They used a balanced straight-line program (SLP) for T to support random access in O(log n) time, so MONI finds all MEMs of P with respect to T in O(m log n) time. In a separate paper [4], they and their coauthors observed that if the SLP is used to support longest-common-extension (LCE) queries in O(log^2 n) time, then MONI needs only one pass over P and O(m log^2 n) time. They named one-pass implementation PHONI (for "PHony MONI", and because they considered running it on smartPHOnes) because of the increased running time, but later realized that if the SLP is locally consistent as well as balanced then the LCE queries take O(log n) time.…”
Section: Introduction (mentioning)
confidence: 99%
“…The increase in the amount of highly compressible data that requires efficient processing in the recent years, particularly in the area of computational genomics [3,4], has caused a spike of interest in dictionary compression. Its main idea is to reduce the size of the representation of data by finding repetitions in the input and encoding them as references to other occurrences.…”
Section: Introduction (mentioning)
confidence: 99%