Ranjan Sinha scite author profile

SHREC: a short-read error correction method

We present SHREC, a new algorithm for correcting errors in short-read data that uses a generalized suffix trie on the read data as the underlying data structure. Our results show that the method can identify erroneous reads with sensitivity and specificity of over 99% and 96%, respectively, for simulated data with error rates of up to 3%, as well as for real data. Furthermore, it achieves an error correction accuracy of over 80% for simulated data and over 88% for real data. These results are clearly superior to previously published approaches. SHREC is available as an efficient open-source Java implementation that allows processing of 10 million short reads on a standard workstation.

Cache-conscious sorting of large sets of strings with dynamic tries

Sinha, Zobel. 2004. ACM J. Exp. Algorithmics

Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing string-sorting algorithms as they do not utilise cache particularly well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally compare burstsort to existing string-sorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, due primarily to a lower rate of cache misses.
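
The idea in this abstract, a trie built on the fly whose leaves are buckets of strings, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Node`, `burstsort`, and `BURST_THRESHOLD`, the dict-based child table, the tiny threshold, and the use of Python's built-in `sorted` inside each bucket are all assumptions made for readability, and the sketch does not reproduce the cache behaviour the paper measures.

```python
# Illustrative burstsort sketch (assumed structure, not the published code).
BURST_THRESHOLD = 4  # hypothetical bucket capacity, kept tiny for illustration


class Node:
    """Trie node: children map one character to a sub-node or a bucket (list)."""

    def __init__(self):
        self.children = {}   # char -> Node | list[str]
        self.exhausted = []  # strings that end exactly at this node


def _insert(node, s, depth, threshold):
    while True:
        if depth == len(s):
            # No characters left: the string ends at this node.
            node.exhausted.append(s)
            return
        c = s[depth]
        child = node.children.get(c)
        if child is None:
            node.children[c] = [s]  # start a new bucket
            return
        if isinstance(child, Node):
            node, depth = child, depth + 1  # descend one character
            continue
        child.append(s)
        if len(child) > threshold:
            # Burst: replace the full bucket with a node and redistribute
            # its strings one character deeper.
            fresh = Node()
            node.children[c] = fresh
            for t in child:
                _insert(fresh, t, depth + 1, threshold)
        return


def _collect(node, out):
    # Strings ending here precede every longer string sharing the prefix.
    out.extend(node.exhausted)
    for c in sorted(node.children):
        child = node.children[c]
        if isinstance(child, Node):
            _collect(child, out)
        else:
            out.extend(sorted(child))  # small bucket, sorted on its own


def burstsort(strings, threshold=BURST_THRESHOLD):
    root = Node()
    for s in strings:
        _insert(root, s, 0, threshold)
    out = []
    _collect(root, out)
    return out
```

Calling `burstsort(list_of_strings)` returns the strings in the same order as Python's `sorted`. The design point the sketch tries to convey is that each bucket stays small enough to be sorted independently once the trie has partitioned the input by prefix.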

Improving suffix array locality for fast pattern matching on disk

Sinha, Puglisi, Moffat, et al. 2008

The suffix tree (or equivalently, the enhanced suffix array) provides efficient solutions to many problems involving pattern matching and pattern discovery in large strings, such as those arising in computational biology. Here we address the problem of arranging a suffix array on disk so that querying is fast in practice. We show that the combination of a small trie and a suffix array-like blocked data structure allows queries to be answered as much as three times faster than the best alternative disk-based suffix array arrangement. Construction of our data structure requires only modest processing time on top of that required to build the suffix tree, and requires negligible extra memory.
