Viruses deploy an array of genetically encoded strategies to co-opt host machinery and support viral replicative cycles. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions, including nucleic acid metabolism and modulation of immune responses. Here, we use protein structure similarity to scan for virally encoded structure mimics across thousands of catalogued viruses and hosts spanning broad ecological niches and taxonomic ranges, including bacteria, plants and fungi, invertebrates, and vertebrates. Our survey identified over 6,000,000 instances of structural mimicry, the vast majority of which (>70%) cannot be discerned from protein sequence alone. The results point to molecular mimicry as a pervasive strategy employed by viruses and indicate that the protein structure space used by a given virus is dictated by the host proteome.
Interrogation of proteins mimicked by human-infecting viruses points to broad diversification of cellular pathways targeted via structural mimicry, identifies biological processes that may underlie autoimmune disorders, and reveals virally encoded mimics that may be leveraged to engineer synthetic metabolic circuits or may serve as targets for therapeutics. Moreover, the manner and degree to which viruses exploit molecular mimicry vary by genome size and nucleic acid type, with ssRNA viruses circumventing the limitations of their small genomes by mimicking human proteins to a greater extent than their large dsDNA counterparts. Finally, we identified over 140 cellular proteins that are mimicked by coronaviruses (CoV), providing clues about cellular processes driving the pathogenesis of the ongoing COVID-19 pandemic.

... structure-informed prediction algorithms have allowed discovery of such interactions across all fully sequenced human-infecting viruses 1. Molecular mimicry, manifested by structural similarity between viral and endogenous host proteins, allows viruses to harness or disrupt cellular functions, including nucleic acid metabolism and modulation of immune responses. Yet, while examples of this latter strategy pepper the literature 2-4, most have focused on human-infecting viruses 5,6 and a systematic analysis of pathogen-encoded molecular mimics has not been performed.
The vast genomic landscape occupied by viruses hampers the discovery of evolutionary relationships between viral proteins and their hosts. However, because three-dimensional (3D) protein structure is much better conserved than sequence, structural information can be used to interrogate evolutionary relationships 1,7 as well as to uncover virus-encoded structural mimics that cannot be detected by sequence relationships (see Methods). Here, we use protein structure similarity to identify virally encoded mimics of host proteomes. Briefly, we first employ sequence-based methods to identify proteins that have similar structures to queried viral ...