Tzu-Fang Sheu scite author profile

Background: DNA signatures are distinct short nucleotide sequences that provide valuable information for purposes such as the design of Polymerase Chain Reaction primers and microarray experiments. Biologists usually use a discovery algorithm to find unique signatures in DNA databases, and then apply the signatures to microarray experiments. Such discovery algorithms require the user to set input factors, such as signature length l and mismatch tolerance d, which affect the discovery results. However, guidance on how to select proper factor values is rare, especially when an unfamiliar DNA database is used. In most cases, biologists select factor values based on experience, or even by guessing. If the discovered result is unsatisfactory, they change the input factors of the algorithm to obtain a new result, and repeat this process until a proper result is obtained. Implicit signatures under the discovery condition (l, d) are defined as the signatures of length ≤ l with mismatch tolerance ≥ d. A discovery algorithm that could discover all implicit signatures, so that those meeting the requirements can be picked from the results, would be more helpful than one that depends on trial and error. However, existing discovery algorithms do not address the need to discover all implicit signatures.

Results: This work proposes two discovery algorithms: the consecutive multiple discovery (CMD) algorithm and the parallel and incremental signature discovery (PISD) algorithm. The PISD algorithm is designed for efficiently discovering signatures under a certain discovery condition. It finds new results by using previously discovered results as candidates, rather than by using the whole database, and further increases discovery efficiency by applying parallel computing. The CMD algorithm is designed to discover implicit signatures efficiently. It uses the PISD algorithm as a kernel routine to discover implicit signatures under every feasible discovery condition.

Conclusions: The proposed algorithms discover implicit signatures efficiently. In experiments using eight processing cores, the CMD algorithm takes up to 97% less execution time than typical sequential discovery algorithms when discovering implicit signatures.
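
The idea of reusing previously discovered results as candidates can be illustrated with a small brute-force sketch. This is not the PISD algorithm itself. It assumes a simple definition that the abstract does not spell out: a length-l window is a signature under (l, d) if it differs from every other length-l window in the database by more than d mismatches (Hamming distance). Under that assumption, every signature under tolerance d + 1 is also a signature under tolerance d, so the results for d can serve as the candidate set for d + 1. The names `hamming`, `windows`, and `discover` are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

# A window is (sequence index, offset, pattern). Hypothetical helper type.
Window = Tuple[int, int, str]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(db: List[str], l: int) -> Iterator[Window]:
    """Yield every length-l window of every sequence in the database."""
    for i, seq in enumerate(db):
        for j in range(len(seq) - l + 1):
            yield i, j, seq[j:j + l]


def discover(db: List[str], l: int, d: int,
             candidates: Optional[List[Window]] = None) -> List[Window]:
    """Brute-force discovery under the assumed definition above.

    If `candidates` is given, only those windows are tested, but each is
    still compared against every window of the whole database.
    """
    all_windows = list(windows(db, l))
    if candidates is None:
        candidates = all_windows
    result = []
    for ci, cj, cp in candidates:
        if all(hamming(cp, p) > d
               for i, j, p in all_windows if (i, j) != (ci, cj)):
            result.append((ci, cj, cp))
    return result
```

For example, calling `discover(db, 4, 1, candidates=discover(db, 4, 0))` tests only the windows that already passed at tolerance 0, and returns the same list as `discover(db, 4, 1)` run from scratch.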


An algorithm of discovering signatures from DNA databases on a computer cluster

Lee

Sheu

2014

BMC Bioinformatics


Background: Signatures are short sequences that are unique and not similar to any other sequence in a database, and can be used as the basis for identifying different species. Although several signature discovery algorithms have been proposed, they require the entire database to be loaded into memory, which restricts the amount of data they can process and makes them unable to handle large databases. These algorithms also use sequential models and have slow discovery speeds, so their efficiency can be improved.

Results: In this research, we introduce a divide-and-conquer strategy into signature discovery and propose a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to overcome the inability of existing algorithms to process large databases, and uses a parallel computing mechanism to improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can process large databases, such as the human whole-genome EST database, that existing algorithms were previously unable to process.

Conclusions: The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
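
A rough sketch of how a divide-and-conquer pass can bound memory use is given below. It is an illustration only, not the authors' implementation: the abstract does not describe how the database is partitioned. It assumes that a length-l window is a signature when it differs from every other length-l window by more than d mismatches, and it keeps the database in a Python list for brevity, whereas a real system would stream blocks from disk or distribute them across cluster nodes. All function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# A window is (sequence index, offset, pattern). Hypothetical helper type.
Window = Tuple[int, int, str]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def iter_windows(db: List[str], l: int) -> Iterator[Window]:
    """Lazily yield every length-l window of every sequence."""
    for i, seq in enumerate(db):
        for j in range(len(seq) - l + 1):
            yield i, j, seq[j:j + l]


def chunked(items: Iterator[Window], size: int) -> Iterator[List[Window]]:
    """Group a stream of windows into blocks of at most `size` items."""
    chunk: List[Window] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def survives(cand: Window, block: List[Window], d: int) -> bool:
    """True if the candidate is more than d mismatches from every other window in the block."""
    ci, cj, cp = cand
    return all(hamming(cp, p) > d for i, j, p in block if (i, j) != (ci, cj))


def discover_chunked(db: List[str], l: int, d: int, chunk_size: int) -> List[Window]:
    """Hold only one candidate block and one reference block at a time."""
    signatures: List[Window] = []
    for cand_block in chunked(iter_windows(db, l), chunk_size):
        alive = cand_block
        for ref_block in chunked(iter_windows(db, l), chunk_size):
            alive = [c for c in alive if survives(c, ref_block, d)]
            if not alive:
                break  # every candidate in this block was eliminated
        signatures.extend(alive)
    return signatures
```

Because each candidate block is filtered independently of the others, the outer loop is the natural place to hand blocks to separate worker processes or machines; the result does not depend on the block size chosen.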


Hierarchical multi-pattern matching algorithm for network content inspection

Sheu

Huang

Lee

2008

Information Sciences


A parallel router architecture for high speed LAN internetworking

Marimuthu

Viniotis

Sheu


We propose a parallel router architecture for processing network layer protocols at FDDI (Fiber Distributed Data Interface) speeds. At high speeds the computing power of the existing routers becomes the performance bottleneck (for processing small frame sizes). Hence a completely different approach is required in designing a router. The opportunities of parallel processing in a network protocol are investigated in this paper. Several levels of parallel processing are considered and an architecture for the most practical and feasible approach is proposed. The concept of a snoopy header cache is introduced. Algorithms for reducing the mean processing delay by balancing the load among the processors are discussed. The performance of the router is evaluated by analytic methods and is compared with simulation results. The results from both the analytic model and the simulator reinforce the choice of a header cache in a multiprocessor environment.


NIS04-6: A Time- and Memory- Efficient String Matching Algorithm for Intrusion Detection Systems

Sheu

Huang

Lee

2006


Intrusion Detection Systems (IDSs) are known as useful tools for identifying malicious attempts over the network. The most essential part of an IDS is the search engine that inspects every packet passing through the network. To defend the protected network rigorously, an IDS must be able to inspect packets at line rate and also provide guaranteed performance even under heavy attacks. Therefore, in this paper we propose an efficient string matching algorithm (named ACM) with compact memory as well as high worst-case performance. Using a magic number heuristic based on the Chinese Remainder Theorem, the proposed ACM significantly reduces the memory requirement without introducing complex processing. Furthermore, the latency of off-chip memory references is drastically reduced. The proposed ACM can be easily implemented in hardware and software. As a result, ACM enables cost-effective and efficient IDSs.


A parallel and incremental algorithm for efficient unique signature discovery on DNA databases
