† These authors contributed equally to this work.
Abstract
Here, we developed EvoProDom (evolution of protein domains), a novel model of protein evolution based on the mix and merge of protein domains. We collected and integrated genomic and proteomic data for 109 organisms. These data include protein domain content and orthologous protein families. In EvoProDom, we defined evolutionary events, such as translocations, as reciprocal exchanges of protein domains between orthologous proteins of different organisms. We found that protein domains that frequently appear in translocation events were enriched in trans-splicing events, i.e., events that produce novel transcripts fused from two distinct genes. Within EvoProDom, we also presented a general method to obtain protein domain content and orthologous protein annotation by predicting these data from protein sequences using the Pfam search tool and KofamKOALA, respectively. This method can be applied in other research areas, such as proteomics, protein design and host-virus interactions.
We hypothesized that proteins evolve by means of "mix and merge", or "shuffling", of protein domains, which are distinct functional units (1, 20). The evolutionary model that describes protein evolution in terms of protein domain dynamics was termed EvoProDom. The EvoProDom model defined and formulated standard evolutionary mechanisms, such as translocations, duplications and indel (insertion and deletion) events, which act on protein domains realized as Pfam domains (6, 7). Under the EvoProDom model, proteins therefore gained or lost functions according to the presence or absence of the corresponding function-conferring domains. Accordingly, proteins were modeled as sets of protein domains, and evolutionary events such as translocations were defined as the gain and loss of particular domains between domain sets, or domain architectures (DAs). The KEGG database catalogs diverse taxa and groups proteins into orthologous groups (KO) based on shared function.
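To make the set-based formulation more concrete, the following is a minimal Python sketch, not the EvoProDomDB implementation. It assumes one domain architecture per (organism, KO group) pair and uses a simplified, hypothetical reading of a translocation: a domain that is present in KO X but absent from KO Y in one organism, and present in KO Y but absent from KO X in another. The organism names, KO labels, domain names and the functions `domain_changes` and `find_translocations` are all illustrative placeholders.

```python
from itertools import combinations, permutations

# A domain architecture (DA) modeled as a set of domain identifiers.
DomainSet = frozenset[str]

# Hypothetical toy data: (organism, KO group) -> DA.
# Assumption: one protein (one DA) per organism and KO group.
proteomes: dict[tuple[str, str], DomainSet] = {
    ("orgA", "KO_X"): frozenset({"domA", "domB"}),
    ("orgA", "KO_Y"): frozenset({"domC"}),
    ("orgB", "KO_X"): frozenset({"domA"}),
    ("orgB", "KO_Y"): frozenset({"domC", "domB"}),
}


def domain_changes(da_from: DomainSet, da_to: DomainSet) -> tuple[DomainSet, DomainSet]:
    """Return (gained, lost) domains when going from one DA to another."""
    return da_to - da_from, da_from - da_to


def find_translocations(data: dict[tuple[str, str], DomainSet]) -> set[tuple[str, str, str, str, str]]:
    """Return (domain, ko_x, ko_y, org_a, org_b) tuples for a simplified,
    hypothetical translocation pattern: the domain is in ko_x but not ko_y
    in org_a, and in ko_y but not ko_x in org_b."""
    organisms = sorted({org for org, _ in data})
    kos = sorted({ko for _, ko in data})
    events = set()
    for org_a, org_b in combinations(organisms, 2):
        for ko_x, ko_y in permutations(kos, 2):
            keys = [(org_a, ko_x), (org_a, ko_y), (org_b, ko_x), (org_b, ko_y)]
            # Skip KO pairs not annotated in both organisms.
            if not all(k in data for k in keys):
                continue
            ax, ay, bx, by = (data[k] for k in keys)
            # Domains that "moved" from ko_x (in org_a) to ko_y (in org_b).
            for dom in (ax - ay) & (by - bx):
                events.add((dom, ko_x, ko_y, org_a, org_b))
    return events


if __name__ == "__main__":
    print(find_translocations(proteomes))
    print(domain_changes(proteomes[("orgA", "KO_X")], proteomes[("orgB", "KO_X")]))
```

On the toy data, `domB` sits in KO_X in orgA and in KO_Y in orgB, so it is reported as a single translocation event, while comparing the KO_X orthologs of the two organisms shows `domB` as lost. Representing DAs as frozensets keeps gain and loss as plain set differences, which mirrors the "proteins as sets of domains" view; domain order and copy number, which matter for duplications, are deliberately ignored in this sketch.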
All members of a KEGG KO group are thus orthologous proteins (8, 12, 13). In the EvoProDom model, proteins were additionally assigned to KO groups (see Materials and Methods). Consequently, translocation events could be mapped to groups of organisms according to the underlying changes in DA. Evolutionary events that acted on domains and manifested as changes in DA were thus reflected at the organism level, establishing a link between changes at these two levels. The EvoProDom model was implemented with and tested on EvoProDomDB (see Materials and Methods). In total, 5,548 translocation events were found, involving 94 protein superfamilies, excluding an "unknown" superfamily (Table 2). This result indicates the existence of multiple evolutionary translocation events, as defined by the model.
Mapping of genes to proteins and alternative splicing
EvoProDom combined genomic information (genes) with proteins, and in turn, proteins with Pfam doma...