Tim Hunkapiller scite author profile

Hidden Markov models of biological primary sequence information.

Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN^2) operations, linear in the number of sequences.

Comparative analysis of primary sequence information is a major tool in the elucidation of the molecular mechanisms of replication and evolution of organisms and the structure and function of proteins. For the simple case of pairwise sequence comparison, good algorithms exist (see refs. 1 and 2 for recent reviews) that can align two sequences of length N in roughly O(N^2) steps. Most of these algorithms are based on dynamic programming (3), with location-independent substitution and gap penalties. Unfortunately, when dynamic programming is applied to a family of K sequences its behavior scales like O(N^K), exponentially in the number of sequences (4).

A number of algorithms have been devised to try to tackle the multiple alignment problem (see refs. 5-7 for some of the most recent ones). Most protein sequence relationships exhibiting >50% identical residues can be aligned by several of these algorithms. Many of the most interesting protein families, however, exhibit conservation far below 50% identity. To date, alignment methods have not been developed that can correctly identify all the motifs that define each protein family (2).

Here, we apply a different approach, based on hidden Markov models (HMMs), to the problem of modeling and aligning a family by using primary structure information only. Initial results were presented (8). Markov models and the related expectation-maximization (EM) (9) algorithm in statistics have already been applied to biocomputational problems (10-13). Krogh et al. (14) were the first to demonstrate the power of a similar method on the globin family. Rather than starting from pairwise alignments, the approach seeks to take advantage of the massive amount of information typically present in a family with a flexible use of position-dependent parameters. A new algorithm is introduced for the iterative adjustments of the parameters of the models. The algorithm is used here to model three protein families: globins, immunoglobulins, and kinases.

HMMs and Learning

A first-order discrete HMM (15) is completely defined by a set of states S, an alphabet of m symbols, a probability transition matrix T = (t_ij), and a probability emission matrix E = (e_ia). When the system is in state i, it has a probability t_ij of moving to state j and a probability e_ia of emitting symbol a. Only the output s...
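The definition of a first-order discrete HMM given in the excerpt (states, alphabet, transition matrix T, emission matrix E) can be illustrated with a minimal Python sketch. It scores a short sequence with the standard forward recursion. This is not the parameter-adaptation algorithm the paper introduces. The two state names, the two-letter alphabet, the start distribution `initial`, and every probability value are assumptions made up for illustration, and the sketch assumes every state emits a symbol.

```python
# Toy first-order discrete HMM. State names, symbols, and probabilities are
# invented for illustration; none of them come from the paper.
initial = {"s1": 0.6, "s2": 0.4}        # assumed start distribution
T = {"s1": {"s1": 0.7, "s2": 0.3},      # T[i][j] = t_ij, P(move from i to j)
     "s2": {"s1": 0.4, "s2": 0.6}}
E = {"s1": {"A": 0.9, "G": 0.1},        # E[i][a] = e_ia, P(emit a in state i)
     "s2": {"A": 0.2, "G": 0.8}}


def forward_likelihood(initial, T, E, observed):
    """Return P(observed) under the HMM, summed over all hidden state paths."""
    if not observed:
        return 1.0  # the empty sequence has probability 1 by convention
    states = list(initial)
    # alpha[i] = P(symbols seen so far, current hidden state is i)
    alpha = {i: initial[i] * E[i].get(observed[0], 0.0) for i in states}
    for symbol in observed[1:]:
        # Move one step: sum over previous states i, then emit in state j.
        alpha = {
            j: sum(alpha[i] * T[i].get(j, 0.0) for i in states)
               * E[j].get(symbol, 0.0)
            for j in states
        }
    return sum(alpha.values())


print(forward_likelihood(initial, T, E, "AGA"))
```

Each step of the recursion costs a number of operations proportional to the square of the number of states, so scoring one sequence is linear in its length for a fixed model.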

Mouse T cell antigen receptor: Structure and organization of constant and joining gene segments encoding the β polypeptide

Malissen, Minard, Mjolsness, et al. 1984. Cell

400

183

Differential Tolerance Is Induced in T Cells Recognizing Distinct Epitopes of Myelin Basic Protein

et al. 1998

Experimental allergic encephalomyelitis (EAE) is induced by T cell-mediated immunity to central nervous system antigens. In H-2u mice, EAE is mediated primarily by T cells specific for residues 1-11 of myelin basic protein (MBP). We demonstrate that differential tolerance to MBP1-11 versus epitopes in MBP121-150 is induced by expression of endogenous MBP, reflecting extreme differences in stability of peptide/MHC complexes. The diverse MBP121-150-specific TCR repertoire can be divided into three fine specificity groups. Two groups were identified in wild-type mice despite extensive tolerance, but the third group was not detected. Activated MBP121-150-specific T cells induce EAE in wild-type mice. Thus, encephalitogenic T cells that escape tolerance either recognize short-lived peptide/MHC complexes or express TCRs with unique specificities for stable complexes.

The murine T-cell receptor uses a limited repertoire of expressed Vβ gene segments

Barth, Kim, Lan, et al. 1985. Nature

279

142

Only 10 different V beta gene segments were found when the sequences of 15 variable (V beta) genes of the mouse T-cell receptor were examined. From this analysis we calculate that the total number of expressed V beta gene segments may be 21 or fewer, which makes the expressed germline V beta repertoire much smaller than that of immunoglobulin heavy-chain or light-chain genes. We suggest that beta-chain somatic diversification is concentrated at the V beta-D beta-J beta junctions.

T cell antigen receptors and the immunoglobulin supergene family

Hood, Kronenberg, Hunkapiller 1985. Cell

425

125

Nucleotide sequence of dengue 2 RNA and comparison of the encoded proteins with those of other flaviviruses

et al. 1988

Large-Scale and Automated DNA Sequence Determination

Hunkapiller, Kaiser, Koop, et al. 1991. Science

317

118

DNA sequence analysis is a multistage process that includes the preparation of DNA, its fragmentation and base analysis, and the interpretation of the resulting sequence information. New technological advances have led to the automation of certain steps in this process and have raised the possibility of large-scale DNA sequencing efforts in the near future [for example, 1 million base pairs (Mb) per year]. New sequencing methodologies, fully automated instrumentation, and improvements in sequencing-related computational resources may render genome-size sequencing projects (100 Mb or larger) feasible during the next 5 to 10 years.



