Gıyasettin Özcan scite author profile

Abstract: We introduce a fast bitwise exact pattern-matching algorithm that speeds up short-pattern searches on large DNA databases. Our contributions are twofold. First, we introduce a novel exact matching algorithm designed specifically for modern processor architectures. Second, we conduct a detailed comparative performance analysis of bitwise exact matching algorithms using hardware counters. Our algorithmic technique is based on condensed bitwise operators and multifunction variables, which minimize register spills and instruction counts during searches. In addition, the technique aims to use CPU branch predictors efficiently and to ensure smooth instruction flow through the processor pipeline. By analyzing letter occurrence probability estimates for DNA databases, we develop a skip mechanism that reduces memory accesses. For comparison, we use the complete Mus musculus sequence, a commonly used DNA sequence that is larger than 2 GB. Experimental results against five state-of-the-art pattern-matching algorithms show that our technique outperforms the best of them even on the DNA pattern that is the worst case for our technique.
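To make the idea of bitwise exact matching concrete, the following is a minimal sketch of the classic Shift-And technique applied to a DNA string. It is not the algorithm described in the abstract: it has no skip mechanism and no hardware-specific tuning, and the function name `shift_and_search` is a hypothetical choice for illustration.

```python
def shift_and_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every exact occurrence of pattern in text.

    Classic Shift-And bit-parallel matching, shown only as an illustration
    of bitwise exact matching; it is NOT the paper's algorithm.
    """
    m = len(pattern)
    if m == 0:
        return []

    # One bitmask per letter: bit i is set when pattern[i] equals that letter.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full-length match
    state = 0              # bit i set: pattern[:i + 1] matches a suffix of the text read so far
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one letter and start a new one at bit 0.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(pos - m + 1)
    return hits


if __name__ == "__main__":
    print(shift_and_search("ACGTACGTAC", "ACGT"))
```

Each text character costs one shift, one OR, and one AND, which is why patterns short enough to fit in a machine word suit this family of algorithms.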


Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information

Özcan

2018

Int. J. Info. Tech. Dec. Mak.


In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider [Formula: see text]-means clustering, which requires long execution times on multi-dimensional datasets. To speed up clustering without sacrificing accuracy, we introduce a new algorithm that we term Canopy[Formula: see text]. The algorithm utilizes canopies and statistical techniques. Its efficient initialization and normalization methodologies also contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result is accurate enough. We compared our algorithm with four popular clustering algorithms. The results show that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. The results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy[Formula: see text] algorithm benefits from early termination, gaining an extra 1.2X performance improvement.
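For readers unfamiliar with canopies, here is a minimal sketch of the generic canopy pre-clustering step with a loose threshold and a tight threshold. It is not the Canopy[Formula: see text] algorithm from the abstract, which also uses statistical information, normalization, and early termination. The function name `canopies` and the parameters `t1` and `t2` are assumptions made for illustration.

```python
import math


def canopies(points, t1, t2):
    """Group points into possibly overlapping canopies.

    Generic canopy pre-clustering, shown only as an illustration; it is
    NOT the paper's algorithm. t1 is the loose threshold and t2 the tight
    one, with 0 < t2 < t1 (hypothetical parameter names).
    """
    if not 0 < t2 < t1:
        raise ValueError("thresholds must satisfy 0 < t2 < t1")

    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]  # take the next unassigned point as a canopy center
        members = []
        still_remaining = []
        for idx in remaining:
            d = math.dist(points[center], points[idx])
            if d < t1:
                members.append(idx)          # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(idx)  # not tightly close: may seed or join others
        result.append((center, members))
        remaining = still_remaining
    return result


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0)]
    print(canopies(data, t1=3.0, t2=1.0))
```

A later clustering pass can then restrict distance computations to points that share a canopy, which is the usual source of the speedup.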



Gıyasettin Özcan

Estimation of compressive strength of BFS and WTRP blended cement mortars with machine learning models

Melody Extraction on MIDI Music Files

Comparative study of hyperspectral image classification by multidimensional Convolutional Neural Network approaches to improve accuracy

Fast bitwise pattern-matching algorithm for DNA sequences on modern hardware

Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information
