The function of the SARS-CoV-2 accessory protein p6, encoded by ORF6, is not fully known. Based on its similarity to p6 from SARS-CoV, it may play a similar role as an antagonist of type I interferon (IFN) signaling. Here we report the sequencing of a SARS-CoV-2 strain passaged six times after its original isolation from a clinical patient in Hong Kong. The genome sequence shows a 27 nt in-frame deletion (Δ27,264-27,290) within ORF6, predicted to result in a 9 aa deletion (ΔFKVSIWNLD) from the central portion of p6. This deletion is predicted to alter dramatically the three-dimensional structure of the resulting protein (p6 Δ22-30), possibly with significant functional implications. Analysis of the original clinical sample indicates that the deletion was not present, whereas sequencing of subsequent passages identifies the deletion as a majority variant. This suggests that the deletion arose de novo during passaging and then expanded to become the majority variant, possibly because the IFN-deficient Vero E6 cell line removed the selective pressure to maintain p6's putative IFN-antagonist function. The specific function of the SARS-CoV-2 p6 N-terminus, if any, has not yet been determined. However, this deletion is predicted to shift the transmembrane topology of the p6 Δ22-30 N-terminus from N-endo to N-ecto, possibly ablating its native function.
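The deletion arithmetic above can be checked with a short sketch. It assumes that Δ27,264-27,290 uses 1-based, inclusive genome coordinates, as is usual in variant notation. The helper names `describe_deletion` and `residue_span_length` are hypothetical. The sketch only confirms that the counts agree: 27 nt is a whole number of codons, giving 9 residues, which matches both Δ22-30 and the nine letters of FKVSIWNLD. It does not model ORF6 codon phase or translate the sequence.

```python
def describe_deletion(start: int, end: int) -> dict:
    """Summarize a deletion given 1-based, inclusive genome coordinates (assumed convention)."""
    length_nt = end - start + 1          # inclusive range: both endpoints are deleted
    in_frame = length_nt % 3 == 0        # whole codons removed => downstream reading frame preserved
    return {
        "length_nt": length_nt,
        "in_frame": in_frame,
        # A residue count is only straightforward for in-frame deletions
        "aa_removed": length_nt // 3 if in_frame else None,
    }


def residue_span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive protein range such as p6 Δ22-30."""
    return last - first + 1


summary = describe_deletion(27264, 27290)
print(summary)                       # {'length_nt': 27, 'in_frame': True, 'aa_removed': 9}
print(residue_span_length(22, 30))   # 9
print(len("FKVSIWNLD"))              # 9
```

All three counts come out as 9. The reported nucleotide, residue-range, and peptide-sequence descriptions of the deletion are therefore mutually consistent.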
SARS-CoV-2 pathogenesis, vaccine, and therapeutic studies rely on animals challenged with highly pathogenic virus stocks produced in cell culture. Ideally, these stocks should be genetically and functionally similar to the original clinical isolate, retaining wild-type properties so that they can be reliably used in animal model studies. It is well established that SARS-CoV-2 isolates serially passaged on Vero cell lines accumulate mutations and deletions in the furin cleavage site. As shown in this study, however, such variants can be eliminated by passaging on Calu-3 lung epithelial cells. Because numerous stocks of SARS-CoV-2 variants of concern are being grown in cell culture for use in animal models, it is essential that propagation methods generate virus stocks that are pathogenic in vivo. Here, we found that propagating a B.1.351 SARS-CoV-2 stock on Calu-3 cells eliminated viruses that had previously accumulated mutations in the furin cleavage site. Notably, alternative variants accumulated at the same nucleotide positions in virus populations grown on Calu-3 cells at multiple independent facilities. When a Calu-3-derived B.1.351 virus stock was used to infect hamsters, the virus remained pathogenic and the Calu-3-specific variants persisted in the population. These results suggest that Calu-3-derived virus stocks are pathogenic, but stocks should still be evaluated for newly arising mutations during propagation.
Lack of data provenance negatively impacts scientific reproducibility and the reliability of genomic data. The ATCC Genome Portal ( https://genomes.atcc.org ) addresses this by providing data provenance information for microbial whole-genome assemblies originating from authenticated biological materials. To date, we have sequenced 1,579 complete genomes, including 466 type strains and 1,156 novel genomes.
The traceability of microbial genomics data to authenticated physical biological materials is not a requirement for depositing these data into public genome databases. This creates significant risks for the reliability and data provenance of these important genomics research resources, the impact of which is not well understood.
The quality and traceability of microbial genomics data in public databases are deteriorating as these databases rapidly expand and struggle to cope with data curation challenges. Public genomic data have become essential for modern life sciences research, yet their curation is a growing area of concern with significant real-world impacts on public health epidemiology, drug discovery, and environmental biosurveillance research [1–6]. Public microbial genome databases such as NCBI's RefSeq leverage the scalability of crowdsourcing for growth, but they do not require data provenance to the original biological source materials or accurate descriptions of how the data were produced [7]. Here, we describe the de novo assembly of 1,113 bacterial genome references produced from authenticated materials sourced from the American Type Culture Collection (ATCC), each with full data provenance. Over 98% of these ATCC Standard Reference Genomes (ASRGs) are superior to assemblies for comparable strains found in NCBI's RefSeq database. Comparative genomics analysis revealed significant issues in RefSeq bacterial genome assemblies related to genome completeness, mutations, structural differences, metadata errors, and gaps in traceability to the original biological source materials. For example, nearly half of RefSeq assemblies lack details on sample source information, sequencing technology, or bioinformatics methods. We suggest that the quality of genomic metadata, the traceability of the data, and the methods used to produce them are intrinsically connected to the quality of the resulting genome assemblies. Our results highlight common problems with "reference genomes" and underscore the importance of data provenance for precision science and reproducibility.
These gaps in metadata accuracy and data provenance represent an "elephant in the room" for microbial genomics research. Addressing them would require raising both the accountability of data depositors and our own expectations of data quality.
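The metadata-completeness finding (assemblies lacking sample source, sequencing technology, or bioinformatics details) can be illustrated with a minimal audit sketch. The record structure, the field names `sample_source`, `sequencing_technology`, and `assembly_method`, and the accessions `ASM_A`/`ASM_B` are all hypothetical placeholders. They are not actual RefSeq or NCBI metadata attributes. A real audit would map these fields onto whatever assembly reports it parses.

```python
from dataclasses import dataclass, field

# Hypothetical field names standing in for the three metadata categories named in the text
REQUIRED_FIELDS = ("sample_source", "sequencing_technology", "assembly_method")


@dataclass
class AssemblyRecord:
    accession: str                       # hypothetical identifier
    metadata: dict = field(default_factory=dict)

    def missing_fields(self) -> list:
        """Return required fields that are absent, None, or blank."""
        missing = []
        for name in REQUIRED_FIELDS:
            value = self.metadata.get(name)
            if value is None or not str(value).strip():
                missing.append(name)
        return missing


def fraction_incomplete(records) -> float:
    """Fraction of records missing at least one required field (0.0 for no records)."""
    records = list(records)
    if not records:
        return 0.0
    incomplete = sum(1 for r in records if r.missing_fields())
    return incomplete / len(records)


records = [
    AssemblyRecord("ASM_A", {"sample_source": "culture collection",
                             "sequencing_technology": "long-read",
                             "assembly_method": "de novo"}),
    AssemblyRecord("ASM_B", {"sample_source": "", "sequencing_technology": "short-read"}),
]
print([r.missing_fields() for r in records])  # [[], ['sample_source', 'assembly_method']]
print(fraction_incomplete(records))           # 0.5
```

The sketch treats a blank string the same as an absent field. That choice matters because many deposited records contain placeholder values instead of omitting a field entirely.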