Page segmentation using thinning of white areas

Kise, Koichi; Yanagida, Osamu

doi:10.1002/(sici)1520-684x(199803)29:3<59::aid-scj6>3.0.co;2-o

Cited by 2 publications

(1 citation statement)

References 7 publications

“…Alternatively, the algorithms may be defined as background approaches [10,11]. A description of the background can provide good segmentation for complex document layouts, though narrow (or zero) gaps between separate components will prevent proper segmentation.…”

Section: Introduction (mentioning)

confidence: 99%

Newspaper layout analysis incorporating connected component separation

Mitchell, Yan

2004

Image and Vision Computing

Indel-tolerant read mapping with trinucleotide frequencies using cache-oblivious kd-trees

Mahmud, Wiedenhoeft, Schliep

2012

Bioinformatics

Motivation: Mapping billions of reads from next generation sequencing experiments to reference genomes is a crucial task, which can require hundreds of hours of running time on a single CPU even for the fastest known implementations. Traditional approaches have difficulties dealing with matches of large edit distance, particularly in the presence of frequent or large insertions and deletions (indels). This is a serious obstacle both in determining the spectrum and abundance of genetic variations and in personal genomics.

Results: For the first time, we adopt the approximate string matching paradigm of geometric embedding to read mapping, thus rephrasing it to nearest neighbor queries in a q-gram frequency vector space. Using the L1 distance between frequency vectors has the benefit of providing lower bounds for an edit distance with affine gap costs. Using a cache-oblivious kd-tree, we achieve running times that match the state of the art. Additionally, running time and memory requirements are about constant for read lengths between 100 and 1000 bp. We provide a first proof-of-concept that geometric embedding is a promising paradigm for read mapping and that L1 distance might serve to detect structural variations. TreQ, our initial implementation of that concept, is more accurate than many popular read mappers over a wide range of structural variants.

Availability and implementation: TreQ will be released under the GNU Public License (GPL), and precomputed genome indices will be provided for download at http://treq.sf.net.

Contact: pavelm@cs.rutgers.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
