2021
DOI: 10.1002/prot.26050

New amino acid substitution matrix brings sequence alignments into agreement with structure matches

Abstract: Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only upon the similarities of single amino acids, ignoring the fact that in densely packed protein, there are additional conservative substitutions represen…

Cited by 16 publications (27 citation statements)
References 63 publications
“…CPU architectures, vector-based operations can be done in one CPU cycle with SIMD extensions. As a result, the distance calculation can be performed in constant O(1) time. This is much better compared to O(n²) time required for the DTW method.…”
Section: Results (mentioning)
confidence: 99%
“…Some of the failures are well known where there are nearly identical 3D structures, but usual sequence matches fail. In a recent work, we have recently solved this problem and now obtain sequence matching closely corresponding to the structure matches (1). Evolutionary information that is used there is key to improved sequence analyses.…”
Section: Introduction (mentioning)
confidence: 99%
“…As shown in Table 3, most of the depicted nucleotide substitutions are synonymous and thus do not incur changes in the amino acid sequences. However, some nucleotide substitutions incurred amino acid changes including radical or conservative substitutions in the VP2, VP3, VP5 and NS1 (Table 3) as defined by the amino acid exchangeability matrices [35]. There were no insertions or deletions in any of the sequences.…”
Section: The Full-length Sequence Of Btv-1rg C7 Genome (mentioning)
confidence: 99%
“…The pipelines were largely standard for the field but notably did not include features or optimizations for deep sequence divergence. Using the typical tools and settings as a reference pipeline, we explored diverse software and parameters using Metazoa3 RGS sequences (Supplemental Table 1; Supplemental Material 1; Supplemental Material 2), including newly available tools and features that had not been previously tested, like the MAFFT -dash option for generation of structural alignment seeds by DASH within MAFFT to improve alignment of divergent sequences (Rozewicki et al 2019), the ProtSub substitution matrix that is based on deeply divergence sequences (Jia & Jernigan 2021), and novel Clipkit software for alignment trimming and the often strong to superior performance of untrimmed alignments (Steenwyk et al 2020). We produced hundreds of trees whose family branching structure in the superfamily tree was evaluated (Figure 2) and arrived at the following subset for a second round of more rigorous optimization and pipeline selection: (MAFFT vs MAFFT-DASH alignment) x ( full-length vs pore region sequences) x (untrimmed vs TrimAl vs ClipKit -smartgappy alignment trimming) x (Blosum30 vs Blosum45 vs Blosum62 vs Prot2021 substitution matrix) x (IQTREE2 vs FastTree2 tree building).…”
Section: Giganticatp Metazoa3 Rgs Optimization For Tree Building (mentioning)
confidence: 99%
“…In particular, given low sequence identity between TRP families (see Results), sensitivity of alignment to low-sequence identity, and sensitivity of trees to alignment, and because alignment has not been explicitly examined in previous TRP studies, we explored sequence alignment parameters, using giganticATP Baseline and RGS TRP and RGS TRP-X reference gene sets. We tested 1) full-length vs. pore region sequences, 2) sequence-based aligner MAFFT vs. sequence-structure-based aligners Promals3D (http://prodata.swmed.edu/promals3d/promals3d.php) (Pei & Grishin 2014) and MAFFT-Dash (https://mafft.cbrc.jp/alignment/server/) (Rozewicki et al 2019) and also the structure-only aligner mTM-align (https://yanglab.nankai.edu.cn/mTM-align/) (Dong et al 2018), which was used to provide a constraint tree to Promals3D, 3) amino acid substitution matrices, including BLOSUM30, BLOSUM45, and BLOSUM62, and structure-based ProtSub (Jia & Jernigan 2021), and 4) alignment trimming as untrimmed, TrimAl (Capella-Gutiérrez et al 2009) trimmed, and ClipKit (Steenwyk et al 2020) trimmed. Structural data sets for mTM-align, Promals3D, and MAFFT-Dash were selected by searching RCSB PDB (https://www.rcsb.org/) (Berman et al 2000) for family name (except TRPA1 was used for TRPA) and selecting the dataset with the top structural score when more than one was available.…”
Section: Giganticatp Alignment Optimization (mentioning)
confidence: 99%