Data Organization

  • Data sources submit Substance and/or BioAssay records.
  • PubChem derives Compound records from unique structures.


  • Substances
    Using PubChem Upload, data sources submit records containing annotations and, optionally, chemical structures. Each record from each data source is assigned a unique Substance Identifier (SID). If, for example, ten organizations each submit a record for aspirin, ten distinct Substance (SID) records are created. Substance records are archival, so previously submitted versions remain available for inspection.


  • Compounds
    If, and only if, one or more Substance records contain structures that can be standardized to the same chemical structure, a single Compound record (CID) is automatically generated. For example, the many Substance records from chemical vendors that contain the same structure for aspirin are aggregated into a single Compound (CID) record. Compound records thus serve as summaries of all PubChem information available for a given chemical structure.


  • BioAssays
    Data sources submit bioactivity test results and relevant annotations describing biological assay experiments on substances (SIDs). Each experiment from each data source is assigned a unique BioAssay Identifier (AID). BioAssay (AID) records include researcher-defined active/inactive determinations of bioactivity, with an explanation. BioAssay records are archival, so previously submitted versions remain available for inspection.


  • Targets (genes and proteins)
    PubChem Target pages summarize the data relevant to a given biological target (a gene or a protein).


  • Pathways
    Each PubChem Pathway page provides information about the chemicals, proteins, genes, and diseases involved in or associated with a biological pathway. This context can be important for interpreting observed biological activity. In addition, all pathways associated with a given chemical, protein, or gene are summarized on the corresponding page.


  • Taxonomy
    Each PubChem Taxonomy page summarizes the data available in PubChem for a given organism.


  • Patents
    PubChem contains information on which chemicals are mentioned in a given patent document. The PubChem Patent page lists those compounds and substances, along with other information including the patent title, abstract, application and publication dates, applicant, inventor, and classification.
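
The relationship between Substance, Compound, and BioAssay records described above can be illustrated with a small sketch. This is not PubChem's actual pipeline: the `standardize` function, the `Substance` and `AssayResult` classes, and every SID, CID, and AID value below are hypothetical stand-ins chosen for illustration. Real structure standardization is far more involved than the whitespace trimming used here.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Substance:
    sid: int                   # toy identifier, one per submitted record
    source: str                # depositing organization
    structure: Optional[str]   # structure string, or None if none was submitted


@dataclass(frozen=True)
class AssayResult:
    aid: int       # toy identifier, one per submitted experiment
    sid: int       # assays are reported against substances (SIDs)
    outcome: str   # depositor-defined "active" or "inactive"


def standardize(structure: str) -> str:
    """Hypothetical stand-in for structure standardization (assumption)."""
    return structure.strip()


def derive_compounds(substances):
    """Group SIDs by standardized structure; one toy CID per unique structure."""
    cid_by_structure = {}
    sids_by_cid = {}
    next_cid = count(1)
    for s in substances:
        if s.structure is None:
            continue  # no structure, so no Compound record is derived
        key = standardize(s.structure)
        if key not in cid_by_structure:
            cid_by_structure[key] = next(next_cid)
        sids_by_cid.setdefault(cid_by_structure[key], []).append(s.sid)
    return sids_by_cid


def outcomes_by_cid(results, sids_by_cid):
    """Collect SID-level assay outcomes under the CID each SID maps to."""
    cid_of = {sid: cid for cid, sids in sids_by_cid.items() for sid in sids}
    summary = {}
    for r in results:
        cid = cid_of.get(r.sid)
        if cid is not None:
            summary.setdefault(cid, []).append((r.aid, r.outcome))
    return summary


aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"  # SMILES string for aspirin
substances = [
    Substance(sid=101, source="Vendor A", structure=aspirin),
    Substance(sid=102, source="Vendor B", structure=aspirin + " "),
    Substance(sid=103, source="Lab C", structure=None),
]
results = [
    AssayResult(aid=9001, sid=101, outcome="active"),
    AssayResult(aid=9002, sid=102, outcome="inactive"),
]

compounds = derive_compounds(substances)
summary = outcomes_by_cid(results, compounds)
print(compounds)  # {1: [101, 102]}
print(summary)    # {1: [(9001, 'active'), (9002, 'inactive')]}
```

In this toy model, the two vendor records keep their own SIDs but share one CID, while the record without a structure remains a Substance only. Assay outcomes deposited against either SID can then be viewed together under the shared CID.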




  1. The "Data Organization" section in "PubChem Substance and Compound databases", S. Kim et al., Nucleic Acids Res. 2016; 44(Database issue):D1202–D1213. doi:10.1093/nar/gkv951
  2. "What is the difference between a substance and a compound in PubChem?" on PubChem Blog.

