Forensic analysis of novel SARS2r-CoV identified in game animal datasets in China shows evolutionary relationship to Pangolin GX CoV clade and apparent genetic experimentation
Abstract: Pangolins are the only animals other than bats proposed to have been infected with SARS-CoV-2 related coronaviruses (SARS2r-CoVs) prior to the COVID-19 pandemic. Here we examine the novel SARS2r-CoV we previously identified in game animal metatranscriptomic datasets sequenced by He et al. (2022) and find that sections of the partial genome phylogenetically group with Guangxi (GX) pangolin CoVs (GX PCoVs), while the full RdRp sequence groups with bat-SL-CoVZC45. While the novel SARS2r-CoV is found in 6 pangolin dat…
“…It is of further concern that out of nine published GX PCoVs, only one unfiltered/non highly enriched pangolin tissue SRA dataset has been provided to support assembly of a GX PCoV, GX_P3B, a partial genome with 86% coverage of GX_P2V [24,25]. The dataset is of low quality with read lengths highly skewed to very short lengths, and is contaminated with SARS-CoV-2 reads.…”
Section: Discussion (mentioning)
confidence: 99%
“…Consequently, it has been proposed that SARS-CoV-2 acquired its RBD via recombination with, or from an ancestor in common with a GD PCoV [23]. However, the SARS2r-CoV in the Liu et al datasets may be contamination related rather than pangolin hosted given the low number of SARS2r-CoV reads, presence of human genomic sequences, presence of non-pangolin hosted virus sequences in similar abundance as SARS2r-CoV sequences, and correlation of SARS2r-CoV sequences with high bacterial content [24,25].…”
Section: Introduction (mentioning)
confidence: 95%
“…For phylogenetic analysis the following workflow was used: 100 SARSr-CoV genomes with highest Blastn percentage identity to each of the assembled NSP10 + RdRp region, NSP4 region and bat-SL-CoVZC45 were de-duplicated. The GX_WIV [28] genome sequence was added to the set. The sequences were then aligned using the MUSCLE algorithm in UGENE with default settings.…”
Section: Phylogenetic Analyses (mentioning)
confidence: 99%
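The de-duplication step in the quoted workflow can be illustrated with a minimal sketch. It is not the authors' procedure (the statement only says that Blastn hits were de-duplicated and aligned with MUSCLE in UGENE). It assumes each query's hits are already available as a list of accession strings sorted by descending percentage identity; the function name `merge_hit_lists` and the accession labels are hypothetical placeholders.

```python
from typing import Iterable, List, Sequence


def merge_hit_lists(
    hit_lists: Iterable[Sequence[str]],
    extra: Iterable[str] = (),
    top_n: int = 100,
) -> List[str]:
    """Merge per-query hit lists into one de-duplicated list.

    Assumption: each inner list is already sorted by descending
    Blastn percentage identity, so the first top_n entries are the
    best hits for that query.
    """
    seen = set()
    merged: List[str] = []
    for hits in hit_lists:
        # Keep only the top_n best hits for this query.
        for accession in hits[:top_n]:
            if accession not in seen:
                seen.add(accession)
                merged.append(accession)
    # Append any genomes added by hand (e.g. GX_WIV in the quoted text),
    # again skipping anything already present.
    for accession in extra:
        if accession not in seen:
            seen.add(accession)
            merged.append(accession)
    return merged


if __name__ == "__main__":
    # Hypothetical accession labels, one list per query region.
    nsp10_rdrp_hits = ["ACC_1", "ACC_2", "ACC_3"]
    nsp4_hits = ["ACC_2", "ACC_4"]
    zc45_hits = ["ACC_5", "ACC_1"]
    genomes = merge_hit_lists(
        [nsp10_rdrp_hits, nsp4_hits, zc45_hits], extra=["GX_WIV"], top_n=2
    )
    print(genomes)
```

The merged list preserves first-seen order, so the resulting set can be exported for alignment without repeated genomes.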
“…GX_ZC45r-CoV exhibits a basal sister relationship to GX CoVs with unanimous support. GX_WIV [28] exhibits a more divergent sequence in this region than related GX CoVs (Figure 6). A maximum likelihood tree implemented using PhyML using a GTR+G+I model shows the same basal sister relationship of GX_ZC45r-CoV to GX PCoVs (Supplementary Figure S24a).…”
Section: Phylogenetic Analysis (mentioning)
confidence: 99%
“…GX PCoVs were first reported by Lam et al [26] in February 2020 from analysis of frozen tissue samples collected in Guangxi province between 2017 and 2018. The GX PCoVs form a separate clade to GD PCoVs and are more distantly related to SARS-CoV-2 [24,26–28]. The spike proteins of GX PCoVs have a higher amino acid similarity in the S1 N-Terminal Domain (NTD) to SARS-CoV-2 than GD PCoVs, but conversely a lower similarity in the RBD to SARS-CoV-2 and GD PCoVs (Supplementary Figures S1 and S2).…”
Pangolins are the only animals other than bats proposed to have been infected with SARS-CoV-2-related coronaviruses (SARS2r-CoVs) prior to the COVID-19 pandemic. Here, we examine the novel SARS2r-CoV we previously identified in game animal metatranscriptomic datasets sequenced by Nanjing Agricultural University in 2022, and find that sections of the partial genome phylogenetically group with Guangxi pangolin CoVs (GX PCoVs), while the full RdRp sequence groups with bat-SL-CoVZC45. While the novel SARS2r-CoV is found in 6 pangolin datasets, it is also found in 10 additional NGS datasets from 5 separate mammalian species and is likely related to contamination by a laboratory-researched virus. The absence of bat mitochondrial sequences from the datasets, the fragmentary nature of the virus sequence, and the presence of a partial cloning vector sequence attached to a SARS2r-CoV read suggest that it has been cloned. We find that NGS datasets containing the novel SARS2r-CoV are contaminated with significant Homo sapiens genetic material and with numerous viruses not associated with the host animals sampled. We further identify the dominant human haplogroup of the contaminating H. sapiens genetic material to be F1c1a1, which is of East Asian provenance. The association of this novel SARS2r-CoV with both the bat CoV and the GX PCoV clades is an important step towards identifying the origin of the GX PCoVs.
HKU4-related coronaviruses (CoVs) are merbecoviruses related to Middle East respiratory syndrome coronavirus (MERS-CoV). In 2022 and 2023, two HKU4-related CoV strains were discovered in Manis javanica (Malayan pangolin) metagenomic datasets derived from organ samples: HKU4-P251T and MjHKU4r-CoV-1. Together with the Tylonycteris robustula bat CoV 162275, which was discovered in 2022, pangolin CoVs HKU4-P251T and MjHKU4r-CoV-1 form a novel phylogenetic clade distinct from all previously documented HKU4-related CoVs. In this study, we identified a novel HKU4-related CoV in a pangolin single-cell sequencing dataset generated by BGI-Shenzhen in Shenzhen, Guangdong, China in 2020. The CoV phylogenetically belongs to the same newly identified clade. The single-cell datasets were reported as generated from organ samples of a single pangolin that died of natural causes. 98% of the HKU4-related CoV reads were found in only one of the seven single-cell datasets: a large intestine cell dataset, whose cells exhibit low expression of DPP4. Bacterial contamination was found to be moderately correlated with HKU4-related CoV presence. We further identified with high confidence that the RNA-Seq dataset supporting one of four near-identical variants of MjHKU4r-CoV-1 is a Sus scrofa (wild pig) metagenomic dataset, with only a trace level of Manis javanica genomic content. The presence of HKU4-related CoV reads in the dataset is almost certainly laboratory research-related and not from a premortal pangolin or pig infection. Our findings raise concerns about the provenance of the novel HKU4-related CoV we identify here, and of MjHKU4r-CoV-1 and its four near-identical variants.