Amino acid substitution matrices from an information theoretic perspective

Altschul, Stephen F.

doi:10.1016/0022-2836(91)90193-a

Cited by 531 publications

(390 citation statements)

References 55 publications

Supporting

Mentioning

383

Contrasting

Unclassified


“…When +1 and +2 were added, z-values of -7 were obtained with PAM250. This dramatically lower performance for "biased" scoring matrices is consistent with the interpretation of a scoring matrix as resulting from target values for substitution frequencies (Altschul, 1991, 1993). From this perspective, an offset in the scoring matrix implies a dramatically different expectation for the number of substitutions.…”

Section: Comparison Of Scoring Matrices and Gap Penalties

supporting

confidence: 73%
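The point about offsets can be illustrated numerically. In the log-odds reading of a scoring matrix, scores s_ij and background frequencies p_i imply target frequencies q_ij = p_i p_j exp(λ s_ij), where λ is the positive solution of Σ p_i p_j exp(λ s_ij) = 1. The sketch below is only a toy: the two-letter alphabet, the scores (+1, -4), the uniform background frequencies, and all function names are assumptions made for illustration and are not taken from the cited papers.

```python
import math

def solve_lambda(scores, freqs, lo=1e-6, hi=10.0, iters=200):
    """Bisection for the positive root of sum_ij p_i p_j exp(lam * s_ij) = 1."""
    def f(lam):
        return sum(freqs[a] * freqs[b] * math.exp(lam * s)
                   for (a, b), s in scores.items()) - 1.0
    # A positive root requires a negative expected score and at least one positive score.
    if f(lo) >= 0 or f(hi) <= 0:
        raise ValueError("no positive lambda bracketed for this matrix")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def target_frequencies(scores, freqs):
    """Target pair frequencies implied by the scores: q_ij = p_i p_j exp(lam * s_ij)."""
    lam = solve_lambda(scores, freqs)
    return {(a, b): freqs[a] * freqs[b] * math.exp(lam * s)
            for (a, b), s in scores.items()}

def identity_fraction(q):
    """Total target frequency of identical pairs."""
    return sum(v for (a, b), v in q.items() if a == b)

# Hypothetical toy data: two letters, uniform background, match +1, mismatch -4.
freqs = {"A": 0.5, "B": 0.5}
base = {(a, b): (1 if a == b else -4) for a in "AB" for b in "AB"}
shifted = {pair: s + 1 for pair, s in base.items()}  # same matrix with a +1 offset

q_base = target_frequencies(base, freqs)
q_shifted = target_frequencies(shifted, freqs)
print(identity_fraction(q_base), identity_fraction(q_shifted))
```

In this toy case the +1 offset lowers the implied fraction of identical pairs substantially, which is the sense in which an offset changes the expected number of substitutions.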

“…We examined the PAM250 (Dayhoff et al, 1978) matrix and several modern matrices, including JO93 (Johnson & Overington, 1993), which was derived from comparing structural alignments, Gonnet92 (Gonnet et al, 1992), which was derived from an "all-versus-all" comparison of a protein sequence database, and two families of matrices, the BLOSUM family (Henikoff & Henikoff, 1992) and a modern version of the PAM matrices (Jones et al, 1992). Because current statistical theory does not provide any guidance for the selection of gap penalties (Altschul, 1991), a range of gap penalties from -6, -1 to -16, -4 was tested for each matrix (Fig. 4).…”

Section: Comparison Of Scoring Matrices and Gap Penalties

mentioning

confidence: 99%

“…Predicted growth of sequence databases and the advent of large-scale DNA sequencing projects have prompted increased interest in better methods for comparing protein and DNA sequences. As a result, several rapid biological sequence comparison algorithms (Pearson & Lipman, 1988; Altschul et al, 1990) have become used widely, and there has been considerable discussion of the best scoring parameters for sequence comparison algorithms (Collins et al, 1988; Karlin & Altschul, 1990; Altschul, 1991; Gonnet et al, 1992; Henikoff & Henikoff, 1992, 1993; Johnson & Overington, 1993). [Reprint requests to: William R. Pearson, Department of Biochemistry, Jordan Hall #440, University of Virginia, Charlottesville, Virginia 22908; e-mail: wrp@virginia.edu.]…”

mentioning

confidence: 99%

“…The goal of identifying distant relationships by database search is different from that of finding the most statistically significant sequence similarities (Collins et al, 1988; Altschul, 1991, 1993) or of finding the most accurate sequence alignments (Johnson & Overington, 1993; Vingron & Waterman, 1994). Statistical significance may not reflect homology; unrelated sequences may have statistically significant similarities due to sequence convergence, e.g., in transmembrane domains or DNA binding domains.…”

mentioning

confidence: 99%


Comparison of methods for searching protein sequence databases

1995


We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performs better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference with FASTA was not significant. Ln()-scaling performed better than normalization based on other simple functions of library sequence length. Ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, or no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.

Keywords: BLAST; FASTA; PAM250; sequence similarity; Smith-Waterman

The concurrent development of rapid methods for molecular cloning, DNA sequencing, high-performance computer workstations, and rapid protein and DNA sequence comparison algorithms has revolutionized the practice of molecular biology. Newly determined sequences are routinely compared against large sequence databases, and inferences about structure and function are frequently based on sequence similarity. Predicted growth of sequence databases and the advent of large-scale DNA sequencing projects have prompted increased interest in better methods for comparing protein and DNA sequences. As a result, several rapid biological sequence comparison algorithms (Pearson & Lipman, 1988; Altschul et al., 1990) have become used widely, and there has been considerable discussion of the best scoring parameters for sequence comparison algorithms (Collins et al
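The gap penalties quoted in the abstract (-12 for the first residue in a gap and -2 for each additional residue) describe an affine gap cost. A minimal sketch of that arithmetic follows; the function name and its default arguments are illustrative assumptions and are not taken from FASTA, BLASTP, or any Smith-Waterman implementation.

```python
def affine_gap_penalty(length, first=-12, additional=-2):
    """Score contribution of a gap spanning `length` residues.

    `first` is charged for the first residue in the gap and `additional`
    for every residue after it (hypothetical parameter names).
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no penalty
    return first + additional * (length - 1)

# Penalties for gaps of one to four residues under the (-12, -2) setting.
for k in range(1, 5):
    print(k, affine_gap_penalty(k))
```

Under this convention a three-residue gap costs -12 + 2 × (-2) = -16, so one long gap is penalized less than several short gaps covering the same number of residues.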


“…The non-identical amino acids were scored with PAM 250 and PAM 100 matrices [19] and the gap inclusions were allowed in Smith-Waterman searching.…”

Section: Alanine Substitutions Computer Predictions and Homology Search

mentioning

confidence: 99%

Fine specificity of autoantibodies to La/SSB: epitope mapping, and characterization

Tzioufas, Yiannaki, Sakarellos‐Daitsiotis et al. 1997

Clinical and Experimental Immunology


SUMMARY

The B cell epitope mapping of La/SSB was performed using 20-mer synthetic peptides overlapping by eight amino acids covering the whole sequence of the protein. IgG, purified from sera of five patients with systemic lupus erythematosus (SLE) and four sera from patients with primary Sjögren's syndrome (pSS) were tested against the overlapping synthetic peptides. Peptides highly reactive with purified IgG were those spanning the regions 145-164, 289-308, 301-320 and 349-368 364 . Predicted features and molecular similarities of the defined epitopes were investigated using protein databases. The La epitope 147 HKAFKGSI 154 presented 83·3% similarity with the 139 HKGFKGVD 146 region of human myelin basic protein (MBP) and 72% similarity with the fragment YKNFKGTI of human DNA topoisomerase II. Peptides corresponding to these sequences cross-reacted with anti-La/SSB antibodies. Sixty-three sera with anti-La/SSB antibodies from patients with pSS or SLE, 35 sera without anti-La/SSB antibodies from patients with SS or SLE and 41 sera from age/sex-matched healthy blood donors were tested against biotinylated synthetic epitope analogues in order to determine their sensitivity and specificity for the detection of anti-La/SSB antibodies. Anti-La/SSB were detected with various frequencies ranging from 20% to epitope 147 HKAFKGSI 154 to 100% to epitope 349 GSGKGKVQGKKTKF 364. The overall sensitivity and specificity using all assays with the synthetic peptides were found to be 93·6% and 85·6%, respectively. In conclusion, antibodies to La/SSB constitute a heterogeneous population, directed against different linear B cell epitopes of the molecule. The epitope 147 HKAFKGSI 154 presents molecular similarity with fragments of two other autoantigens, i.e. human MBP and DNA topoisomerase II. Finally, synthetic epitope analogues exhibit high sensitivity and specificity for the detection of anti-La/SSB antibodies.
