An algorithm for finding the largest approximately common substructures of two trees

Wang, J.T.L.; Shapiro, Bruce A.; Shasha, Dennis; Zhang, K.; Currey, Kathleen M.

doi:10.1109/34.709622

Cited by 79 publications

(35 citation statements)

References 20 publications


“…For example, in Chang et al (1998), Wang et al (1996, 1998) we represented an RNA secondary structure using an ordered labelled tree and designed a tree matching algorithm to find motifs in multiple RNA secondary structures.…”

Section: A Motif Mining Methods (mentioning)

confidence: 99%

Design of an RNA structural motif database

Wen¹,

Wang²

2009

IJCIBSB


Abstract: In this paper we present the design and implementation of an RNA structural motif database, called RmotifDB. The structural motifs stored in RmotifDB come from three sources:
• collected manually from the biomedical literature
• submitted by scientists around the world
• discovered by a wide variety of motif mining methods.
We present here a motif mining method in detail. We also describe the interface and search mechanisms provided by RmotifDB and report its current status. The RmotifDB system is fully operational and accessible on the web at http://datalab.njit.edu/bioinfo/.



“…Edit distance models on unordered trees are considered in [32,29]. Problem variations on rooted and/or unrooted trees are considered in [15,31,26]. Algorithms that calculate local similarity of trees in the tree editing model are presented in [26,28].…”

Section: Previous Work (mentioning)

confidence: 99%

Local similarity in RNA secondary structures

Höchsmann

Töller

Giegerich

et al.

Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003

171

184


We present a systematic treatment of alignment distance and local similarity algorithms on trees and forests. We build upon the tree alignment algorithm for ordered trees given by Jiang et al. (1995) and extend it to calculate local forest alignments, which is essential for finding local similar regions in RNA secondary structures. The time complexity of our algorithm is

Motivation

Introduction

RNA is a chain molecule, mathematically a string over a four letter alphabet. It is built from nucleotides containing the bases A(denine), C(ytosine), G(uanine), and U(racil). By folding back onto itself, an RNA molecule forms structure, stabilized by the forces of hydrogen bonds between certain pairs of bases (A-U, C-G, G-U), and dense stacking of neighbouring base pairs.

The investigation of RNA secondary structures is a challenging task in molecular biology. RNA molecules have a large variety of functions in the cell which often depend on special structural properties.

String edit distance [25] clearly is the most successful model in sequence comparison. It is used in document processing, file comparison, molecular sequence analysis, and numerous other applications of approximate string matching. The basic model is that one string is "edited" into another string by a sequence of edit operations, such as single character replacement (R), deletion (D) or insertion (I). The weights associated with the edit operations sum up to an overall score, and the edit sequence giving the minimal score defines the edit distance of the two strings. Equivalently, the editing process, ignoring the order of edit operations, can be represented as an alignment.

This equivalence does not generalize to trees, as already mentioned in [1]. For each tree alignment one can construct a corresponding sequence of edit operations, but not vice versa. One can understand editing as finding a largest common sub-structure, while aligning means finding the smallest common superstructure (In fact, this depends on the scoring scheme.). Which model is favourable depends on the problem.

Previous work

The first generalization of the edit model from strings to rooted ordered trees is due to [23], algorithmically improved in [31] and implemented and applied to computa-
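The string edit model described in this abstract (replacement, deletion and insertion, each with a weight, minimized over all edit sequences) can be illustrated with a short dynamic-programming sketch. This is only an illustration of the string case, not the tree or forest algorithm of the cited paper; the function name `edit_distance` and the default unit weights are assumptions made for the example.

```python
def edit_distance(a: str, b: str, w_rep: int = 1, w_del: int = 1, w_ins: int = 1) -> int:
    """Minimal total weight of edit operations turning string a into string b.

    w_rep, w_del and w_ins are hypothetical weights for replacement (R),
    deletion (D) and insertion (I); unit weights give the Levenshtein distance.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * w_ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * w_del]  # turning a[:i] into the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (0 if ca == cb else w_rep),  # match or replace
                prev[j] + w_del,                           # delete ca
                curr[j - 1] + w_ins,                       # insert cb
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two short RNA-like strings over the alphabet {A, C, G, U}.
    print(edit_distance("GAUC", "GUC"))
```

Keeping only two rows of the table bounds memory by the length of the second string; recovering the alignment itself would require storing the full table or back-pointers.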


“…The pairwise tree alignment problem can be solved in O(|T1| × |T2| × h1 × h2) time, where |Ti| is the size of tree i and hi is the height of tree i [25]. Wang et al [28] improve upon this algorithm to solve the problem in O(|T1| × |T2| × min(h1, l1) × min(h2, l2)) time, where li is the number of leaves in tree i. Followup work [4] applies the center star approximation algorithm [11] for multiple string alignment in order to approximately align multiple HTML trees.…”

Section: Related Work (mentioning)

confidence: 99%

“…The algorithm starts from the roots of the MHTs, and traverses recursively through the MHTs in a top-down fashion. For each tree node, we compute the majority consensus for the full-hashes and tag-hashes (lines 10-13): if a majority of the proxies agree on the same full-hash, which indicates that a majority consensus has been reached for the complete subtree rooted by that tree node, then the whole subtree is copied into the final consensus tree (lines 14-16); otherwise, if the corresponding tree nodes in a majority of the summaries have the same tag-hash, we heuristically assume that these tree nodes correspond to the same fragment in the HTML but disagree on the contents, in which case, that tree node is copied into the final consensus tree (lines 18-20), and the BFS algorithm will construct the consensus version of the corresponding subtree when the children nodes are visited (lines 21-28). If neither a tree node's full-hash nor its tag-hash are present in a majority of the MHTs, no consensus can be drawn, and the node is marked as NON-CONSENSUS (lines 29-31).…”

Section: Consensus Construction (mentioning)

confidence: 99%

Validating web content with senser

Wilberding

Yates

Sherr

et al. 2013

Proceedings of the 29th Annual Computer Security Applications Conference


This paper introduces Senser, a system for validating retrieved web content. Senser does not rely on a PKI and operates even when SSL/TLS is not supported by the web server. Senser operates as a network of proxies located at different vantage points on the Internet. Clients query a random subset of Senser proxies for compact descriptions of a desired web page, and apply consensus and matching algorithms to the returned results to locally render a "majority" web page. To ensure diverse selections of proxies (and consequently decrease an adversary's ability to manipulate a majority of the proxies' requests), Senser leverages Internet mapping systems that accurately predict AS-level paths between available proxies and the desired web page. We demonstrate using a deployment of Senser on Amazon EC2 that Senser detects and mitigates attempts by adversaries to manipulate web content, even when controlling large collections of autonomous systems, while maintaining reasonable performance overheads.
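The consensus step quoted from this paper (majority on the full-hash copies the whole subtree; otherwise majority on the tag-hash copies the node and descends into its children; otherwise the node is marked NON-CONSENSUS) can be sketched roughly as follows. This is not Senser's code: the `Node` class, the SHA-256 hashing, the positional alignment of children, and the recursive rather than breadth-first traversal are all assumptions made for the illustration.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for one node of a proxy's hash-tree summary."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def tag_hash(self) -> str:
        # Hash of the tag only: identifies the fragment, not its contents.
        return hashlib.sha256(self.tag.encode()).hexdigest()

    def full_hash(self) -> str:
        # Hash of tag, text and all child hashes: identifies the whole subtree.
        h = hashlib.sha256(self.tag.encode() + b"\x00" + self.text.encode())
        for child in self.children:
            h.update(child.full_hash().encode())
        return h.hexdigest()


def majority(values: List[str], total: int) -> Optional[str]:
    """Return the value held by more than half of `total` proxies, if any."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > total else None


def consensus(nodes: List[Optional[Node]]) -> Node:
    """Merge corresponding nodes (one slot per proxy, None if missing)."""
    total = len(nodes)
    present = [n for n in nodes if n is not None]
    full = majority([n.full_hash() for n in present], total)
    if full is not None:
        # Majority agrees on the complete subtree: copy it as-is.
        return next(n for n in present if n.full_hash() == full)
    tag = majority([n.tag_hash() for n in present], total)
    if tag is None:
        return Node(tag="NON-CONSENSUS")
    # Same fragment, different contents: keep the node, recurse on children.
    agreeing = [n for n in present if n.tag_hash() == tag]
    merged = Node(tag=agreeing[0].tag)  # text left empty: no agreement on it
    width = max(len(n.children) for n in agreeing)
    for i in range(width):
        # Assumption: children are aligned by position; proxies that
        # disagree on the tag, or lack child i, contribute None.
        column = [
            n.children[i]
            if n is not None and n.tag_hash() == tag and i < len(n.children)
            else None
            for n in nodes
        ]
        merged.children.append(consensus(column))
    return merged
```

Keeping one slot per proxy in every recursive call means the majority threshold is always measured against the full set of proxies, not only against those that agreed higher up in the tree.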

