Because PubChem is an open archive that accepts information about a given molecule from many sources, users need an aggregated view of everything that is known about a single chemical structure. PubChem Compound records are derived summaries that provide this view and give users access to a rich set of related content. Compound records contain the unique chemical structures extracted from contributed Substance records through a process called ‘standardization’. Each Compound record points to at least one Substance record. In contrast, a Substance record may have no derived Compound record if its structure cannot be standardized or is missing (e.g., a Chinese tea extract).
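The link from a Compound record to its contributing Substance records can also be followed programmatically. The sketch below is a minimal illustration, assuming the PUG REST URL pattern `compound/cid/<CID>/sids/JSON` and a JSON response shaped as `InformationList` → `Information` → `CID`/`SID`. The helper names `sids_url` and `parse_sids` are hypothetical, and the SID values in the sample payload are made-up placeholders, not real records. No network request is made; the payload is parsed from a string.

```python
import json

# Base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sids_url(cid: int) -> str:
    """Build the URL that asks for the Substance IDs (SIDs) linked to one CID."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/sids/JSON"


def parse_sids(payload: str) -> dict:
    """Map each CID in a response body to its list of SIDs.

    Assumes the response shape described in the lead-in; a CID entry
    without an "SID" field yields an empty list.
    """
    data = json.loads(payload)
    records = data.get("InformationList", {}).get("Information", [])
    return {rec["CID"]: rec.get("SID", []) for rec in records}


# Illustrative payload only: CID 2244 is aspirin, the SIDs are placeholders.
sample = '{"InformationList": {"Information": [{"CID": 2244, "SID": [101, 102, 103]}]}}'

print(sids_url(2244))
print(parse_sids(sample))
```

In practice the string passed to `parse_sids` would be the body returned by fetching `sids_url(cid)` with any HTTP client.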
PubChem Compound pages (accession CID) summarize information known about a particular chemical. Take a look at these example pages:
To learn more about the Compound Summary pages, please read this PubChem blog.
One can browse chemical information currently available for PubChem Compound records, using the following link:
This page can also be reached by selecting the PubChem Table of Contents (TOC) classification from the "Select classification" drop-down menu in the PubChem Classification Browser.
Standardization in PubChem is the validation of a submitted chemical structure and the determination of the unique structure that is used to create a PubChem Compound from one or more submitted Substance records. Standardization is part of the PubChem Upload pipeline and is applied to submitted records with valid chemical structures. For example, it allows PubChem to display a single Compound page for aspirin that includes information from many submitted aspirin Substance records.
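The relationship between Substance and Compound records can be pictured with a toy model. The sketch below is not PubChem's standardization algorithm: `toy_standardize` is a hypothetical stand-in that looks up a few hard-coded structure strings, and the integer SIDs are invented. It only illustrates the bookkeeping, where several Substance records collapse into one Compound and a Substance with no usable structure yields no Compound.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical lookup standing in for real structure standardization:
# different depositor-supplied SMILES strings for the same molecule
# map to one shared key.
TOY_CANONICAL = {
    "CC(=O)Oc1ccccc1C(=O)O": "aspirin",
    "O=C(C)Oc1ccccc1C(=O)O": "aspirin",
    "OC(=O)c1ccccc1OC(C)=O": "aspirin",
}


def toy_standardize(structure: Optional[str]) -> Optional[str]:
    """Return a shared key for a structure, or None if it cannot be resolved."""
    if not structure:
        return None  # missing structure, e.g. an extract with no defined molecule
    return TOY_CANONICAL.get(structure)


def derive_compounds(substances):
    """Group (sid, structure) pairs into compounds keyed by standardized structure.

    Returns the compound -> SIDs mapping and the list of SIDs that
    produced no compound.
    """
    compounds = defaultdict(list)
    without_compound = []
    for sid, structure in substances:
        key = toy_standardize(structure)
        if key is None:
            without_compound.append(sid)
        else:
            compounds[key].append(sid)
    return dict(compounds), without_compound


# Invented example data: two aspirin depositions and one record with no structure.
substances = [
    (1, "CC(=O)Oc1ccccc1C(=O)O"),
    (2, "O=C(C)Oc1ccccc1C(=O)O"),
    (3, None),
]
compounds, without_compound = derive_compounds(substances)
print(compounds)
print(without_compound)
```

Every key in `compounds` has at least one SID by construction, which mirrors the rule that each Compound record points to at least one Substance record.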
Read about the details of PubChem standardization or view a schematic:
- "Data Organization" in the PubChem Help documentation
- The "Data Organization" section in "PubChem Substance and Compound databases", S. Kim et al., Nucleic Acids Res. 2016; 44(Database issue):D1202–D1213. doi:10.1093/nar/gkv951
- "What is the difference between a substance and a compound in PubChem?" on the PubChem Blog