Anas Al-okaily scite author profile

Anas Al-okaily

4 Publications

34 Citation Statements Received

61 Citation Statements Given

Affiliations

King Hussein Cancer Center, University of Connecticut, Southern Connecticut State University

Publications

Toward a Better Compression for DNA Sequences Using Huffman Encoding

Al-okaily, Almarri, Yami, et al. 2017

Journal of Computational Biology

Next-generation sequencing machines generate large amounts of DNA data for genomes with lengths ranging from megabases to gigabases, so there is an increasing need to compress such data so that it takes less space and can be transmitted faster. Different implementations of Huffman encoding that incorporate the characteristics of DNA sequences are shown to compress DNA data better. These implementations center on selecting frequent repeats so as to force a skewed Huffman tree, and on constructing multiple Huffman trees when encoding. Compared with the standard Huffman tree algorithm, the implementations improve the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp. The research hence suggests an improvement on all DNA sequence compression algorithms that use conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016).
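As a rough illustration of the idea described in this abstract, the sketch below builds a conventional Huffman code over a token stream and optionally treats one chosen repeat as a single extra symbol. It is not the authors' software: the function names (`tokenize`, `huffman_code`, `encode`, `decode`) and the greedy left-to-right tokenization are assumptions made for this example only.

```python
import heapq
from collections import Counter


def tokenize(seq, repeat):
    """Greedy left-to-right split of seq into occurrences of `repeat`
    and single bases. (Hypothetical helper, not from the paper.)"""
    tokens, i = [], 0
    while i < len(seq):
        if seq.startswith(repeat, i):
            tokens.append(repeat)
            i += len(repeat)
        else:
            tokens.append(seq[i])
            i += 1
    return tokens


def huffman_code(tokens):
    """Return a dict mapping each distinct token to a prefix-free bit string."""
    freq = Counter(tokens)
    if len(freq) == 1:
        # A single distinct token still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {token: partial code}).
    heap = [(w, i, {t: ""}) for i, (t, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(tokens, code):
    return "".join(code[t] for t in tokens)


def decode(bits, code):
    """Invert `encode`; relies on the code being prefix-free."""
    inverse = {c: t for t, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Calling `huffman_code(list(seq))` gives the conventional per-base code, while `huffman_code(tokenize(seq, repeat))` gives a code in which the repeat competes for a short codeword. Whether this helps depends on how often the repeat occurs in the sequence.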

HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads

Al-okaily 2016

BMC Genomics

Background: Current high-throughput sequencing technologies generate large numbers of relatively short and error-prone reads, making the de novo assembly problem challenging. Although high quality assemblies can be obtained by assembling multiple paired-end libraries with both short and long insert sizes, the latter are costly to generate. Recently, the GAGE-B study showed that a remarkably good assembly quality can be obtained for bacterial genomes by state-of-the-art assemblers run on a single short-insert library with very high coverage.

Results: In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads.

Conclusions: We empirically evaluated this methodology for 8 leading assemblers using 7 GAGE-B bacterial datasets consisting of 100 bp Illumina HiSeq and 250 bp Illumina MiSeq reads, with coverage ranging from 100x to ~200x. The results show that for all evaluated datasets and using most evaluated assemblers (that were used to assemble the disjoint subsets), HGA leads to a significant improvement in the quality of the assembly based on N50 and corrected N50 metrics.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2515-7) contains supplementary material, which is available to authorized users.
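The abstract reports assembly quality in terms of N50. As a small sketch of how that metric is usually computed (the function name `n50` is an assumption for this example, and the corrected N50 used in the paper additionally requires breaking contigs at misassemblies, which is not shown here):

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half
    of the total assembled bases (the usual N50 definition)."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

A higher N50 means that fewer, longer contigs account for half of the assembly, which is why it is commonly used to compare assemblers on the same dataset.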

Error Tree: A Tree Structure for Hamming and Edit Distances and Wildcards Matching

Al-okaily 2015

Journal of Computational Biology

Approximate pattern matching is a fundamental problem in bioinformatics and information retrieval applications. The problem involves different matching relations such as Hamming distance, edit distance, and wildcards matching. The input is usually a text of length n over a fixed alphabet of size Σ, a pattern P of length m, and an integer k. The task is to find all positions in the text that match P within Hamming distance ≤ k, within edit distance ≤ k, or under wildcards matching. Many algorithms and indexes have been proposed to solve these problems more efficiently, but because of their space and time complexities, most tools adopt heuristic approaches based on, for instance, the suffix tree, the suffix array, or the Burrows-Wheeler Transform to reach practical implementations. Error Tree is a novel tree structure mainly oriented toward solving approximate pattern matching problems using less space and faster computation time. For Hamming distance and wildcards matching, the algorithm proposes a tree structure that needs [Formula: see text] words and takes [Formula: see text] in the average case) of query time for any online/offline pattern, where occ is the number of outputs. In addition, it proposes a tree structure of [Formula: see text] words and [Formula: see text] in the average case) query time for edit distance for any online/offline pattern.
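To make the problem statement concrete, here is a brute-force baseline for the Hamming distance case. It is only a reference definition of the expected output, not the Error Tree structure, and the function name `hamming_matches` is an assumption for this example. It runs in O(nm) time in the worst case.

```python
def hamming_matches(text, pattern, k):
    """Return all start positions i where text[i:i+m] differs from
    pattern in at most k positions (naive scan, no index)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this window already exceeds the budget
        if mismatches <= k:
            hits.append(i)
    return hits
```

For example, searching for `ACGT` in `ACGTACGA` with k = 1 reports the exact occurrence at position 0 and the one-mismatch occurrence at position 4.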

GeNeo: A Bioinformatics Toolbox for Genomics-Guided Neoepitope Prediction

Al-okaily, Shcheglova, Sherafat, et al. 2023

Journal of Computational Biology
