PubChem Substance is the primary archive for community-provided information about chemical entities. A Substance record can contain any combination of chemical structures, synonyms, registration IDs, descriptions, related URLs, patent identifiers, database cross-references to PubMed, protein 3D structures, and biological screening results, to name a few. The only required element is an identifier unique to the contributing data source. For example, a Substance record might contain just a unique identifier, a patent identifier, and a reference to the publication from which it was extracted.


The provenance of a record belongs to the contributing organization and any changes to the record must come from them.  If each of ten organizations submits a record for the drug aspirin, then ten separate Substance records will be created and tracked in the archive.  If a chemical structure is present in those records and passes our automated 'standardization' procedure, then a single, derived PubChem Compound record is created linking all of those Substance records and compiling information from them.


The standardization of Substance records is not always successful, and sometimes a Substance record does not have any structure to standardize at all. Therefore, some Substance records have no corresponding entry in the Compound database. The reverse is not true: every Compound record must point to at least one Substance record.
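The relationship between the two databases can be illustrated with a small Python sketch. This is a toy model, not PubChem's actual pipeline: the `Substance` class, the `standardize` stand-in (which only strips whitespace and rejects structures containing `?`), the `build_compounds` helper, and the sample IDs and organization names are all hypothetical and exist only to show the grouping logic.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Substance:
    sid: int                         # hypothetical archive-assigned ID
    source: str                      # contributing organization
    source_id: str                   # identifier unique within that source
    structure: Optional[str] = None  # depositor-supplied structure, if any


def standardize(structure: Optional[str]) -> Optional[str]:
    """Toy stand-in for standardization: return a key, or None on failure."""
    if not structure:
        return None                  # nothing to standardize
    key = structure.strip()
    if not key or "?" in key:        # assumption: '?' marks an unusable structure
        return None
    return key


def build_compounds(substances):
    """Group substance IDs by standardized structure.

    Returns (compounds, orphans): compounds maps each structure key to the
    substance IDs that share it; orphans lists substance IDs with no compound.
    """
    compounds = defaultdict(list)
    orphans = []
    for s in substances:
        key = standardize(s.structure)
        if key is None:
            orphans.append(s.sid)
        else:
            compounds[key].append(s.sid)
    return dict(compounds), orphans


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"
records = [
    Substance(1, "Org A", "A-001", aspirin),
    Substance(2, "Org B", "B-77", aspirin + " "),  # same structure, stray space
    Substance(3, "Org C", "C-9"),                  # no structure deposited
    Substance(4, "Org D", "D-5", "C?C"),           # fails toy standardization
]
compounds, orphans = build_compounds(records)
```

In this sketch, substances 1 and 2 are linked to a single compound entry, while substances 3 and 4 remain in the archive with no compound. A compound entry is only ever created when a substance is added to it, so every compound points to at least one substance.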



  1. "Data Organization" in the PubChem Help documentation
  2. The "Data Organization" section in "PubChem Substance and Compound databases", S. Kim et al., Nucleic Acids Res. 2016; 44(Database issue):D1202–D1213. doi:10.1093/nar/gkv951
  3. "What is the difference between a substance and a compound in PubChem?" on PubChem Blog.
