As NLM’s Conserved Domain Database (CDD) enters its 20th year of operations as a publicly available resource, CDD curation staff continues to develop hierarchical classifications of widely distributed protein domain families, and to record conserved sites associated with molecular function, so that they can be mapped onto user queries in support of hypothesis-driven biomolecular research. CDD offers an archive of pre-computed domain annotations as well as live search services, both for single protein or nucleotide queries and for larger sets of protein query sequences. CDD staff has continued to characterize protein families via conserved domain architectures and has built up a significant corpus of curated domain architectures in support of naming bacterial proteins in RefSeq. These architecture definitions are available via SPARCLE, the Subfamily Protein Architecture Labeling Engine. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, >122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. RefSeq is found at https://www.ncbi.nlm.nih.gov/refseq/.
The Conserved Domain Database (CDD) is a freely available resource for the annotation of sequences with the locations of conserved protein domain footprints, as well as functional sites and motifs inferred from these footprints. It includes protein domain and protein family models curated in house by CDD staff, as well as models imported from a variety of other sources. The latest CDD release (v3.17, April 2019) contains more than 57,000 domain models, of which almost 15,000 were curated by CDD staff. The CDD curation effort increases coverage and provides finer‐grained classifications of common and widely distributed protein domain families, for which a wealth of functional and structural data has become available. The CDD maintains both live search capabilities and an archive of pre‐computed domain annotations for a selected subset of sequences tracked by the NCBI's Entrez protein database. These annotations can be retrieved or computed for a single sequence using CD‐Search, retrieved or computed in bulk using Batch CD‐Search, or computed locally using standalone RPS‐BLAST together with the rpsbproc software package. The CDD can be accessed via https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. The three protocols listed here describe how to perform a CD‐Search (Basic Protocol 1), a Batch CD‐Search (Basic Protocol 2), and a standalone RPS‐BLAST search with rpsbproc (Basic Protocol 3). © 2019 The Authors.
NLM’s conserved domain database (CDD) is a collection of protein domain and protein family models constructed as multiple sequence alignments. Its main purpose is to provide annotation for protein and translated nucleotide sequences with the location of domain footprints and associated functional sites, and to define protein domain architecture as a basis for assigning gene product names and putative/predicted function. CDD has been available publicly for over 20 years and has grown substantially during that time. Maintaining an archive of pre-computed annotation continues to be a challenge and has slowed down the cadence of CDD releases. CDD curation staff builds hierarchical classifications of large protein domain families, adds models for novel domain families via surveillance of the protein ‘dark matter’ that currently lacks annotation, and now spends considerable effort on providing names and attribution for conserved domain architectures. CDD can be accessed at https://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml.
Background: Inhibitors of apoptosis (IAPs) are critical regulators of programmed cell death that are essential for development, oncogenesis, and immune and stress responses. However, available knowledge regarding IAPs is largely biased toward humans and model species, while the distribution, function, and evolutionary novelties of this gene family remain poorly understood in many taxa, including Mollusca, the second most speciose phylum of Metazoa. Results: Here, we present a chromosome-level genome assembly of an economically significant bivalve, the hard clam Mercenaria mercenaria, which reveals an unexpected and dramatic expansion of the IAP gene family to 159 members, the largest IAP gene repertoire observed in any metazoan. Comparative genome analysis reveals that this massive expansion is characteristic of bivalves more generally. Reconstruction of the evolutionary history of molluscan IAP genes indicates that most originated in early metazoans and greatly expanded in Bivalvia through both lineage-specific tandem duplication and retroposition, with 37.1% of hard clam IAPs located on a single chromosome. The expanded IAPs have been subjected to frequent domain shuffling, which has in turn shaped their architectural diversity. Further, we observed that extant IAPs exhibit dynamic and orchestrated expression patterns among tissues and in response to different environmental stressors. Conclusions: Our results suggest that sophisticated regulation of apoptosis enabled by the massive expansion and diversification of IAPs has been crucial for the evolutionary success of hard clam and other molluscan lineages, allowing them to cope with local environmental stresses. This study broadens our understanding of IAP proteins and expression diversity and provides novel resources for studying molluscan biology and IAP function and evolution.
The successful synthesis of superconducting infinite-layer nickelate thin films with the highest Tc ≈ 15 K has ignited great enthusiasm for this material class as potential analogs of the high-Tc cuprates. Pursuing a higher Tc is always an imperative task in studying a new superconducting material system. Here we report high-quality Pr0.82Sr0.18NiO2 thin films with Tc^onset ≈ 17 K, synthesized by carefully tuning the amount of CaH2 in the topotactic chemical reduction, and the effect of pressure on their superconducting properties, determined by measuring electrical resistivity under various pressures in a cubic anvil cell apparatus. We find that the onset temperature of superconductivity, Tc^onset, is enhanced monotonically from ~17 K at ambient pressure to ~31 K at 12.1 GPa, with no sign of saturation as pressure increases. This encouraging result indicates that the Tc of infinite-layer nickelate superconductors still has room to rise and could be boosted further by applying higher pressures or by strain engineering in heterostructure films.
Heat shock protein 70 (HSP70) members participate in a wide range of housekeeping and stress-related activities in eukaryotic cells. In marine ecosystems, bivalves encounter abiotic stresses, including high temperatures and low dissolved oxygen. Here, 133 MmHSP70 genes were identified in the whole Mercenaria mercenaria genome through a combination of methods, including BLASTP, HMM searches and manual filtration. The MmHSP70 genes were unevenly distributed, and 41 genes (33.08%) were located on Chr 7. Phylogenetic analyses indicated that the MmHSP70 gene family mainly consisted of two clusters and that the Hspa12 subfamily underwent lineage-specific expansion. A high-density collinear gene block was observed between M. mercenaria Chr 7 and Cyclina sinensis Chr 14. Tandemly duplicated MmHSP70 gene pairs experienced different levels of purifying selection, which could be an important source of sequence and functional constraints. MmHSP70 genes showed tissue-specific and stress-specific expression. Most tandemly duplicated HSP70 gene pairs were highly expressed under hypoxia stress. HSP70 B2 tandemly duplicated gene pairs showed significantly increased expression under heat plus severe hypoxia stress. This study provides a comprehensive understanding of the MmHSP70 gene family in M. mercenaria and lays a foundation for further studies on the functional characteristics of MmHSP70 genes during exposure to heat and hypoxia stress.