Benjamin A. Shoemaker scite author profile

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system that serves as a chemical information resource for the scientific research community. PubChem consists of three interlinked databases: Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. Biological activity data for chemical substances tested in assay experiments are contained in the BioAssay database. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also briefly describes PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, and PubChemRDF, which provides PubChem data in Resource Description Framework (RDF) format for data sharing, analysis and integration with information contained in other databases.
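The abstract mentions programmatic access to the Substance and Compound databases. As a minimal sketch, assuming PubChem's PUG-REST URL layout of `<base>/compound/<namespace>/<identifier>/property/<properties>/<format>`, a compound property request can be assembled with the Python standard library. The helper names `pug_rest_url` and `fetch_json` are hypothetical and not part of any PubChem package.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of PubChem's PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(namespace: str, identifier: str, properties: list[str], fmt: str = "JSON") -> str:
    """Build a PUG-REST URL asking for compound properties (hypothetical helper)."""
    # Names can contain spaces or slashes, so percent-encode the identifier fully.
    ident = quote(identifier, safe="")
    # Multiple property names are sent as one comma-separated path segment.
    props = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/{namespace}/{ident}/property/{props}/{fmt}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (requires network access)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(pug_rest_url("name", "aspirin", ["MolecularFormula", "MolecularWeight"]))
```

Passing the printed URL to `fetch_json` should return a JSON object whose `PropertyTable` entry lists the requested properties for each matching CID. Building the URL separately from the download keeps the request logic testable offline.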


PubChem 2019 update: improved access to chemical data

Kim et al. 2018


PubChem (https://pubchem.ncbi.nlm.nih.gov) is a key chemical information resource for the biomedical research community. Substantial improvements were made in the past few years. New data content was added, including spectral information, scientific articles mentioning chemicals, and information for food and agricultural chemicals. PubChem released new web interfaces, such as PubChem Target View page, Sources page, Bioactivity dyad pages and Patent View page. PubChem also released a major update to PubChem Widgets and introduced a new programmatic access interface, called PUG-View. This paper describes these new developments in PubChem.
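PUG-View is described here only as a new programmatic interface. As a hedged sketch, assuming it serves record annotations at `https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/<record type>/<id>/JSON` and accepts an optional `heading` query parameter to return a single section, a request URL could be built as follows. The helper `pug_view_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed base URL of the PUG-View data service.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def pug_view_url(record_type: str, record_id: int, heading: Optional[str] = None) -> str:
    """Build a PUG-View URL for one record, e.g. a compound CID (hypothetical helper)."""
    url = f"{PUG_VIEW_BASE}/{record_type}/{record_id}/JSON"
    # Optionally restrict the response to one section heading of the record page.
    if heading is not None:
        url += "?" + urlencode({"heading": heading})
    return url


print(pug_view_url("compound", 2244, "Melting Point"))
```

`urlencode` encodes spaces as `+`, which is the usual form for query strings.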


PubChem in 2021: new data content and improved web interfaces

et al. 2020


PubChem (https://pubchem.ncbi.nlm.nih.gov) is a popular chemical information resource that serves the scientific community as well as the general public, with millions of unique users per month. In the past two years, PubChem has made substantial improvements. Data from more than 100 new data sources were added to PubChem, including chemical-literature links from Thieme Chemistry, chemical and physical property links from SpringerMaterials, and patent links from the World Intellectual Property Organization (WIPO). PubChem's homepage and individual record pages were updated to help users find desired information faster. This update involved a data model change for the data objects used by these pages as well as by programmatic users. Several new services were introduced, including the PubChem Periodic Table and Element pages, Pathway pages, and Knowledge panels. Additionally, in response to the coronavirus disease 2019 (COVID-19) outbreak, PubChem created a special data collection that contains PubChem data related to COVID-19 and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).


Speeding molecular recognition by using the folding funnel: The fly-casting mechanism

Shoemaker, Portman and Wolynes 2000

Proc. Natl. Acad. Sci. U.S.A.

Protein folding and binding are kindred processes. Many proteins in the cell are unfolded, so folding and function are coupled. This paper investigates how binding kinetics is influenced by the folding of a protein. We find that a relatively unstructured protein molecule can have a greater capture radius for a specific binding site than the folded state with its restricted conformational freedom. In this scenario of binding, the unfolded state binds weakly at a relatively large distance and then folds as the protein approaches the binding site: the "fly-casting mechanism." We illustrate this scenario with the hypothetical kinetics of binding a single repressor molecule to a DNA site and find that the binding rate can be significantly enhanced over the rate of binding of a fully folded protein.

We often glibly state that a protein must be folded to function. The reasoning underlying this statement is that organizing complex networks of chemical reactions in the cell requires these reactions to be highly specific. This specificity is achieved ultimately by a high degree of geometrical precision in molecular binding, the famous "lock and key." Geometric precision accompanies the increased rigidity of a biomolecule once it has folded; thus, apparently, folding is required for specific function. It does come as a surprise, then, that many proteins in the cell appear to be unfolded most of the time (see ref. 1 and refs. therein). Several ideas about potential biological advantages of being unfolded have been proposed. For example, the rapid turnover of unfolded proteins through proteolytic degradation may be required for cell-cycle regulation (2). Thermodynamic arguments have also been made suggesting that coupling folding and binding may allow greater equilibrium distinctions for binding to different sites (3).

In this note, we investigate whether too much rigidity may conflict with the need for biomolecules to move during their function, and is therefore a kinetic disadvantage. There is much evidence for residual flexibility in the folded state, which must be thought of as an ensemble of conformations (4). Still, the range of motions allowed in the folded state, as measured by Debye-Waller factors (5), is more restricted than the range allowed in an unfolded molecule, which slows the exploration of configuration space. Here, we illustrate, by means of a specific example of operator binding to DNA, how the speed of molecular recognition can be enhanced by having folding (necessary for the required specificity) occur during binding rather than before.

Folding and binding are kindred phenomena. The similarity of binding and folding is clear at the thermodynamic level, where both processes involve accurately locating molecular fragments with respect to each other, reducing the configurational entropy, and simultaneously lowering the free energy by the exclusion of solvent and the formation of hydrogen bonds and salt bridges (6, 7). At the structural level, the similarity between packing patterns between ...
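The kinetic argument rests on the capture radius. As a back-of-the-envelope illustration, and not the model used in the paper, the Smoluchowski diffusion-limited rate constant k = 4πDR grows linearly with the capture radius R at a fixed relative diffusion coefficient D. All numbers in this sketch are hypothetical, and it ignores effects such as the slower diffusion of an extended chain and the free-energy cost of folding upon binding.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol


def smoluchowski_rate(D: float, R: float) -> float:
    """Diffusion-limited association rate constant k = 4*pi*D*R, in m^3/s per pair."""
    return 4.0 * math.pi * D * R


def to_per_molar_per_second(k_m3_per_s: float) -> float:
    """Convert m^3/s per molecule pair to M^-1 s^-1 (1 m^3 = 1000 L)."""
    return k_m3_per_s * AVOGADRO * 1000.0


D = 1e-10          # m^2/s, hypothetical relative diffusion coefficient
R_folded = 1e-9    # m, hypothetical capture radius of the folded protein
R_unfolded = 3e-9  # m, hypothetical larger capture radius of the unfolded chain

k_folded = smoluchowski_rate(D, R_folded)
k_unfolded = smoluchowski_rate(D, R_unfolded)
print(f"folded:   {to_per_molar_per_second(k_folded):.2e} M^-1 s^-1")
print(f"unfolded: {to_per_molar_per_second(k_unfolded):.2e} M^-1 s^-1")
print(f"enhancement: {k_unfolded / k_folded:.1f}x")  # 3.0x for these inputs
```

Because k is proportional to R in this simple picture, tripling the capture radius triples the rate; the paper's fly-casting analysis refines this by accounting for how the partially folded chain binds weakly at long range and folds on approach.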


CDD: a Conserved Domain Database for protein classification

Marchler‐Bauer, Anderson, Cherukuri et al. 2004

Nucleic Acids Research

The Conserved Domain Database (CDD) is the protein classification component of NCBI's Entrez query and retrieval system. CDD is linked to other Entrez databases such as Proteins, Taxonomy and PubMed®, and can be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. CD-Search, which is available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, is a fast, interactive tool to identify conserved domains in new protein sequences. CD-Search results for protein sequences in Entrez are pre-computed to provide links between proteins and domain models, as well as computational annotation that is visible upon request. Protein–protein queries submitted to NCBI's BLAST search service at http://www.ncbi.nlm.nih.gov/BLAST are scanned for conserved domains by default. While CDD started out as essentially a mirror of publicly available domain alignment collections, such as SMART, Pfam and COG, we have continued an effort to update, and in some cases replace, these models with domain hierarchies curated at the NCBI. Here, we report on the progress of the curation effort and associated improvements in the functionality of the CDD information retrieval system.


CDD: a database of conserved domain alignments with links to domain three-dimensional structure

et al. 2002


The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI. The current version of CDD (v.1.54) contains 3693 such models. CDD alignments are linked to protein sequence and structure data in Entrez. The molecular structure viewer Cn3D serves as a tool to interactively visualize alignments and three-dimensional structure, and to link three-dimensional residue coordinates to descriptions of evolutionary conservation. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Protein query sequences may be compared against databases of position-specific score matrices derived from alignments in CDD, using a service named CD-Search, which can be found at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search runs reverse-position-specific BLAST (RPS-BLAST), a variant of the widely used PSI-BLAST algorithm. CD-Search is run by default for protein-protein queries submitted to NCBI's BLAST service at http://www.ncbi.nlm.nih.gov/BLAST.


CDD: a curated Entrez database of conserved domain alignments

et al. 2003


The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE®. This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, to identify conserved domains in new sequences. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST® at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and Pfam, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.
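The abstract notes that CD-Search scores queries against PSSMs derived from CDD alignments. This toy Python sketch shows only the core idea of position-specific scoring: each alignment column carries its own residue scores, and an ungapped query segment is scored by summing them as the matrix slides along the sequence. It is not RPS-BLAST, which adds seeding heuristics, gapped extension and statistical significance estimates; the matrix values, `UNLISTED_SCORE` and the function names are hypothetical.

```python
from typing import Optional

# Toy position-specific score matrix (PSSM): one dict per alignment column,
# mapping a residue letter to its score. All values are invented for illustration;
# a real PSSM holds a score for each of the 20 amino acids at every column.
TOY_PSSM = [
    {"A": 2, "G": 1},
    {"C": 4},
    {"L": 3, "I": 2, "V": 1},
]
UNLISTED_SCORE = -1  # hypothetical score for residues a column does not list


def score_window(pssm: list[dict[str, int]], window: str) -> int:
    """Sum the column-specific scores of an ungapped window as long as the PSSM."""
    return sum(column.get(residue, UNLISTED_SCORE) for column, residue in zip(pssm, window))


def best_hit(pssm: list[dict[str, int]], sequence: str) -> Optional[tuple[int, int]]:
    """Slide the PSSM along the sequence; return (best score, start offset) or None."""
    width = len(pssm)
    best: Optional[tuple[int, int]] = None
    for start in range(len(sequence) - width + 1):
        score = score_window(pssm, sequence[start:start + width])
        # Strict '>' keeps the leftmost window when scores tie.
        if best is None or score > best[0]:
            best = (score, start)
    return best


print(best_hit(TOY_PSSM, "MKACLGG"))  # (9, 2): the window "ACL"
```

A sequence shorter than the matrix has no complete window, so `best_hit` returns `None` rather than a misleading score.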


PubChem's BioAssay Database

Wang, Xiao, Önal-Süzek et al. 2011

Nucleic Acids Research

PubChem (http://pubchem.ncbi.nlm.nih.gov) is a public repository for biological activity data of small molecules and RNAi reagents. The mission of PubChem is to deliver free and easy access to all deposited data, and to provide intuitive data analysis tools. The PubChem BioAssay database currently contains 500 000 descriptions of assay protocols, covering 5000 protein targets and 30 000 gene targets, and provides over 130 million bioactivity outcomes. PubChem's bioassay data are integrated into the NCBI Entrez information retrieval system, making PubChem data searchable and accessible by Entrez queries. Also, as a repository, PubChem continually optimizes and develops its deposition system to meet the many demands of both high- and low-volume depositors. The PubChem information platform allows users to search, review and download bioassay descriptions and data. The PubChem platform also enables researchers to collect, compare and analyze biological test results through web-based and programmatic tools. In this work, we provide an update for the PubChem BioAssay resource, including information content growth, data model extension and new developments of data submission, retrieval, analysis and download tools.
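The abstract refers to programmatic tools for retrieving bioassay descriptions. As a hedged sketch, assuming PUG-REST exposes assay records at `<base>/assay/aid/<AID>/<operation>/<format>` with operations such as `description` and `summary`, a request URL could be built as follows. The helper `assay_url` and the allowed-operation set are assumptions for illustration, and AID 1000 is used only as an example identifier.

```python
# Base URL of PubChem's PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed subset of assay operations; PUG-REST may support others.
ASSAY_OPERATIONS = {"description", "summary"}


def assay_url(aid: int, operation: str = "description", fmt: str = "JSON") -> str:
    """Build a PUG-REST URL for one BioAssay record, identified by its AID (hypothetical helper)."""
    if operation not in ASSAY_OPERATIONS:
        raise ValueError(f"unsupported assay operation: {operation!r}")
    if aid <= 0:
        raise ValueError("AIDs are positive integers")
    return f"{PUG_REST_BASE}/assay/aid/{aid}/{operation}/{fmt}"


print(assay_url(1000))
```

Validating the operation locally catches typos before a request is sent to the server.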

