A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

Hamada, Michiaki; Asai, Kiyoshi

doi:10.1089/cmb.2011.0197

Cited by 12 publications

(18 citation statements)

References 88 publications


“…Note that CentroidAlign employs an estimator based on maximum expected accuracy (MEA), which has been successfully applied in much software in the field of bioinformatics; see the review by Hamada and Asai [24] for details. In CentroidAlign, the sum-of-pair scores (SPS) [25] is optimized for predicting multiple alignments of RNA sequences (cf.…”

Section: Methods (mentioning)

confidence: 99%

“…(a) The input is two RNA sequences, (x; x′), to be aligned; (b-1) The exact algorithm of CentroidAlign considers a probability distribution of structural alignments between x and x′, which gives simultaneously the alignments between nucleotides and those between base-pairs (e.g., Sankoff model [4]); (b-2) The exact case can be approximated by factorizing the distribution of structural alignments into (i) a distribution of secondary structures of x (e.g., the CONTRAfold [27] or McCaskill [22] models); (ii) a distribution of pairwise alignments between x and x′ (e.g., the CONTRAlign model [23]); and (iii) a distribution of secondary structures of x′; (c) By marginalization of the distribution(s) in (b), we obtain a distribution of alignments (*) in which the information about secondary structures is included; (d) The best multiple alignment is estimated based on maximizing expected accuracy (MEA) [24] in which the SPS scores of predicted alignments are optimized with respect to the distribution (*) of pairwise alignments given in (c). It should be emphasized that the computational cost of the exact algorithm is ≈ O(L^6), while it is reduced to ≈ O(L^3) in the approximate algorithm, where L is the (maximum) length of two input sequences.…”

Section: Figure A1 (mentioning)

confidence: 99%


CentroidAlign-Web: A Fast and Accurate Multiple Aligner for Long Non-Coding RNAs

Yonemoto

Asai

Hamada

2013

IJMS

Self Cite


Due to the recent discovery of non-coding RNAs (ncRNAs), multiple sequence alignment (MSA) of those long RNA sequences is becoming increasingly important for classifying and determining the functional motifs in RNAs. However, not only primary (nucleotide) sequences, but also secondary structures of ncRNAs are closely related to their function and are conserved evolutionarily. Hence, information about secondary structures should be considered in the sequence alignment of ncRNAs. Yet, in general, a huge computational time is required in order to compute MSAs, taking secondary structure information into account. In this paper, we describe a fast and accurate web server, called CentroidAlign-Web, which can handle long RNA sequences. The web server also appropriately incorporates information about known secondary structures into MSAs. Computational experiments indicate that our web server is fast and accurate enough to handle long RNA sequences. CentroidAlign-Web is freely available from http://centroidalign.ncrna.org/.


“…where G(θ, y) is called the gain function, which returns the similarity between two solutions in Y. When the gain function is designed according to an accuracy or evaluation measure for the target problem, in which y and θ are considered as a prediction and reference, respectively, the estimator is often called a maximum expected accuracy (MEA) estimator [57,58,59]. MEA estimators predict the solution by maximizing the expected accuracy when the solutions are distributed according to p(θ|D).…”

Section: Definition (mentioning)

confidence: 99%
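The definition quoted above can be illustrated with a minimal sketch. It assumes a small discrete solution space Y and reads the estimator as choosing the y that maximizes the sum over θ of G(θ, y) · p(θ|D). The toy posterior, the function names (`mea_estimate`, `position_accuracy`, `exact_match`) and the two gain functions are hypothetical choices made for illustration; they are not taken from the cited papers.

```python
from itertools import product

def mea_estimate(posterior, candidates, gain):
    """Return the candidate y that maximizes sum_theta gain(theta, y) * p(theta | D)."""
    def expected_gain(y):
        return sum(p * gain(theta, y) for theta, p in posterior.items())
    return max(candidates, key=expected_gain)

def position_accuracy(theta, y):
    """Gain = number of positions where prediction y agrees with reference theta."""
    return sum(a == b for a, b in zip(theta, y))

def exact_match(theta, y):
    """Gain = 1 only if the whole solution matches (the 0/1 gain)."""
    return 1 if theta == y else 0

# Hypothetical posterior p(theta | D) over binary strings of length 3.
posterior = {
    (1, 1, 0): 0.35,
    (0, 1, 1): 0.30,
    (0, 0, 1): 0.25,
    (0, 1, 0): 0.10,
}
candidates = list(product((0, 1), repeat=3))

most_probable = mea_estimate(posterior, candidates, exact_match)
mea = mea_estimate(posterior, candidates, position_accuracy)
print(most_probable, mea)
```

With the exact-match gain the estimator simply returns the single most probable solution, whereas the per-position gain selects a solution that agrees with the posterior at as many positions as possible on average. In this toy example the two estimates differ.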

“…In addition to the above examples, many algorithms in bioinformatics can be classified, from the viewpoint of MEA/MEG, with respect to gain function and predictive space. See [59] for a review of MEA estimators.…”

Section: Other Examples (mentioning)

confidence: 99%

Fighting against uncertainty: an essential issue in bioinformatics

Hamada

2013

Briefings in Bioinformatics

Self Cite


Many bioinformatics problems, such as sequence alignment, gene prediction, phylogenetic tree estimation and RNA secondary structure prediction, are often affected by the 'uncertainty' of a solution, that is, the probability of the solution is extremely small. This situation arises for estimation problems on high-dimensional discrete spaces in which the number of possible discrete solutions is immense. In the analysis of biological data or the development of prediction algorithms, this uncertainty should be handled carefully and appropriately. In this review, I will explain several methods to combat this uncertainty, presenting a number of examples in bioinformatics. The methods include (i) avoiding point estimation, (ii) maximum expected accuracy (MEA) estimations and (iii) several strategies to design a pipeline involving several prediction methods. I believe that the basic concepts and ideas described in this review will be generally useful for estimation problems in various areas of bioinformatics.


“…Many other decoding criteria were proposed (Hamada and Asai, 2012). For example, we can assign labels to states of the HMM, and then search for the most probable sequence of labels instead of the most probable state path.…”

Section: Introduction (mentioning)

confidence: 99%
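The difference between the most probable state path and the most probable sequence of labels, mentioned in the statement above, can be seen on a toy model. The sketch below is illustrative only: the three-state HMM, its parameters and the names (`joint`, `path_prob`, `label_prob`) are hypothetical, and brute-force enumeration of all state paths stands in for the Viterbi algorithm, which is only practical here because the example is tiny.

```python
from collections import defaultdict
from itertools import product

# Hypothetical HMM: two states share the label "a", one state has the label "b".
states = ("B", "A1", "A2")
label = {"B": "b", "A1": "a", "A2": "a"}
init = {"B": 0.4, "A1": 0.3, "A2": 0.3}
trans = {
    "B": {"B": 0.6, "A1": 0.2, "A2": 0.2},
    "A1": {"B": 0.2, "A1": 0.4, "A2": 0.4},
    "A2": {"B": 0.2, "A1": 0.4, "A2": 0.4},
}
emit = {s: {"x": 0.5, "y": 0.5} for s in states}

def joint(path, obs):
    """Joint probability of a state path and an observation sequence."""
    p = init[path[0]] * emit[path[0]][obs[0]]
    for prev, cur, symbol in zip(path, path[1:], obs[1:]):
        p *= trans[prev][cur] * emit[cur][symbol]
    return p

obs = "xx"

# Enumerate every state path (exponential in len(obs); fine for a toy example).
path_prob = {path: joint(path, obs) for path in product(states, repeat=len(obs))}

# A labeling's probability is the sum over all state paths that carry that labeling.
label_prob = defaultdict(float)
for path, p in path_prob.items():
    label_prob[tuple(label[s] for s in path)] += p

best_path = max(path_prob, key=path_prob.get)
best_labeling = max(label_prob, key=label_prob.get)
print(best_path, best_labeling)
```

Here the single most probable path stays in state B, which would be labeled "bb", while the labeling "aa" collects probability from four different paths and ends up more probable overall.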

Sequence annotation with HMMs: New problems and their complexity

Nánási

Vinař

Brejová

2015

Information Processing Letters


Hidden Markov models (HMMs) and their variants were successfully used for several sequence annotation tasks. Traditionally, inference with HMMs is done using the Viterbi and posterior decoding algorithms. However, recently a variety of different optimization criteria and associated computational problems were proposed. In this paper, we consider three HMM decoding criteria and prove their NP hardness. These criteria consider the set of states used to generate a certain sequence, but abstract from the exact locations of regions emitted by individual states. We also illustrate experimentally that these criteria are useful for HIV recombination detection.
