PubChem Glossary

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z



A #


AID

PubChem's BioAssay (protocol) identifier, a non-zero integer.



NCBI's Protein database identifier



B #

BioActivity Types



C #


CID

PubChem's compound identifier, a non-zero integer for a unique chemical structure.



Complexity

The complexity rating of a compound is a rough estimate of how complicated the structure is, seen from the point of view of both the elements contained and the displayed structural features, including symmetry.  However, neither stereochemistry nor isotope labelling is used as an auxiliary criterion.  The value is computed using the Bertz/Hendrickson/Ihlenfeldt formula.

A scaling factor for aromaticity is used so that the complexity of benzene is the same as that of cyclohexane. The value is a floating-point number, ranging from 0 (simple ions) to several thousand (complex natural products). Generally, larger compounds are more complex than smaller ones, but highly symmetrical compounds, or compounds with few distinct atom types or elements, are downgraded. Complexity is only loosely correlated with synthetic accessibility.



Lists all of the depositor's comments and additional information for this substance.



For a mixture substance or compound, a component is one of its individual molecules.



The chemical structures represented in substances. The chemical structure presented in a compound record is standardized through PubChem's data pipeline. A mixture substance may have several standardized compounds. Each compound record is structurally unique in the PubChem Compound database.


Computed Descriptors

Information that describes the compound in different formats, including SMILES, InChI, and IUPAC names.


Computed Properties

Properties that can be calculated for each compound, including molecular weight, molecular formula, XLogP, etc.


Covalently-Bonded Unit

A group of atoms connected by covalent bonds, ignoring other bond types (or a single atom without covalent bonds). The "covalently-bonded unit count" property is the number of such units in a compound.
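The unit count described above amounts to counting connected components in a graph whose edges are the covalent bonds only. The following is a minimal sketch of that idea, not PubChem's implementation; the function name `count_covalent_units` and the atom-index/bond-pair input format are assumptions made for illustration.

```python
def count_covalent_units(num_atoms, covalent_bonds):
    """Count groups of atoms connected by covalent bonds (union-find).

    num_atoms: number of atoms, indexed 0..num_atoms-1 (hypothetical format).
    covalent_bonds: iterable of (i, j) index pairs; other bond types
    (ionic, coordination, ...) are simply left out of this list.
    """
    parent = list(range(num_atoms))

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in covalent_bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Each distinct root is one covalently-bonded unit; an atom with no
    # covalent bonds is its own root and counts as a unit by itself.
    return len({find(i) for i in range(num_atoms)})


# Sodium acetate, heavy atoms only: C0-C1, C1-O2, C1-O3 are covalent;
# the sodium ion (atom 4) has no covalent bonds.
sodium_acetate_units = count_covalent_units(5, [(0, 1), (1, 2), (1, 3)])
```

Here `sodium_acetate_units` is 2: the acetate ion and the lone sodium ion.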


D #

Deprecated Compound

A compound (CID) that has no links to any substance. This may occur as PubChem modifies its processing. A deprecated compound will not be available within Entrez.



E #


Emergency Response Guidebook (ERG)

The Emergency Response Guidebook (ERG) is designed for use by first responders (firefighters, police, and other emergency services personnel) who may be the first to arrive at the scene of a transportation incident involving dangerous goods.  The ERG is a guide that helps first responders quickly identify the specific or generic hazards of the material(s) involved in the incident and protect themselves and the general public during the initial response phase.  It was developed jointly by Transport Canada (TC), the U.S. Department of Transportation (DOT), and the Secretariat of Communications and Transport of Mexico (SCT), with the collaboration of CIQUIME (Centro de Información Química para Emergencias) of Argentina.



Entrez

Entrez is NCBI’s primary text search and retrieval system that integrates the PubMed database of biomedical literature with 38 other literature and molecular databases, including PubChem's BioAssay, Compound and Substance databases as well as DNA and protein sequence, structure, gene, genome, genetic variation and gene expression.



F #

G #


NCBI's Gene database identifier

H #


Number of hydrogen-bond acceptors in the structure. Classification of hydrogens follows [J. Chem. Inf. Comput. Sci. 1997, 37, 615-621].



Number of hydrogen-bond donors in the structure. Classification of hydrogens follows [J. Chem. Inf. Comput. Sci. 1997, 37, 615-621].



I #


InChI

IUPAC International Chemical Identifier.  InChI strings can be searched through the Entrez PubChem databases.


J #

K #

L #

M #

Molecular Formula

A chemical formula that indicates the kinds of atoms and the number of each kind in a molecule.  It is a way of expressing information about the atoms that constitute a particular chemical molecule.


Molecular Weight

The molecular weight is the sum of the atomic weights of all constituent atoms in a compound, measured in g/mol. In the absence of explicit isotope labeling, averaged natural abundance is assumed (which, for example in the case of Li and U compounds, may not match purchasable material). If an atom bears an explicit isotope label, 100% isotopic purity is assumed at this location, even for short-lived radioactive isotopes where this is often physically unrealistic. At this moment, it is not possible to deposit more detailed isotope composition information into the PubChem database. Pseudo-atoms which are not an element have an atomic weight of 0 g/mol.
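As a rough illustration of the summation described above, the sketch below adds up average atomic weights for a simple formula string. It is not PubChem's code: the function name, the small weight table, and the pseudo-atom symbol `R` are assumptions for illustration, and explicit isotope labels are not handled.

```python
import re

# Average natural-abundance atomic weights in g/mol (small illustrative subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Hypothetical pseudo-atom symbols; they are not elements and weigh 0 g/mol.
PSEUDO_ATOMS = {"R"}


def molecular_weight(formula):
    """Sum atomic weights for a flat formula such as 'H2O' or 'CH4'.

    Parentheses, charges, and isotope labels are not supported in this sketch.
    An element missing from ATOMIC_WEIGHTS raises KeyError.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        n = int(count) if count else 1
        if symbol in PSEUDO_ATOMS:
            continue  # contributes 0 g/mol
        total += ATOMIC_WEIGHTS[symbol] * n
    return total


water_weight = molecular_weight("H2O")  # 2 * 1.008 + 15.999
```

With these table values, water comes out at about 18.015 g/mol.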


N #

O #

Old Version Substance

Substance versions are considered to be "old" when a more recent update is provided by the depositor.



P #

Parent Compound

A parent compound is conceptually the "important" part of the molecule when the molecule has more than one covalent component. Specifically, a parent component must have at least one carbon and contain at least 70% of the heavy (non-hydrogen) atoms of all the unique covalent units (ignoring stoichiometry). Note that this is a very empirical definition and is subject to change. For example, the "parent" compound in tetracycline hydrochloride (CID 54704426) and tetracycline metaphosphate (CID 54729668) is tetracycline (CID 54675776).
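The rule stated above (at least one carbon, and at least 70% of the heavy atoms across the unique covalent units) can be sketched as follows. This is only an illustration under stated assumptions: units are given as hypothetical element-count dictionaries, and two units are treated as "the same" when their element counts match, which is a simplification of real structural uniqueness.

```python
def find_parent(units, threshold=0.70):
    """Return the unit that qualifies as the parent, or None.

    units: list of dicts mapping element symbol -> atom count, one dict per
    covalently-bonded unit (hypothetical input format).
    """
    # Ignore stoichiometry: keep each distinct unit once.
    unique = []
    for unit in units:
        if unit not in unique:
            unique.append(unit)

    def heavy_atoms(unit):
        # Heavy atoms are all atoms except hydrogen.
        return sum(n for element, n in unit.items() if element != "H")

    total_heavy = sum(heavy_atoms(u) for u in unique)
    if total_heavy == 0:
        return None

    for unit in unique:
        has_carbon = unit.get("C", 0) >= 1
        if has_carbon and heavy_atoms(unit) / total_heavy >= threshold:
            return unit
    return None


# Tetracycline hydrochloride: tetracycline (C22H24N2O8) plus HCl.
tetracycline = {"C": 22, "H": 24, "N": 2, "O": 8}
hydrogen_chloride = {"H": 1, "Cl": 1}
parent = find_parent([tetracycline, hydrogen_chloride])
```

Tetracycline holds 32 of the 33 heavy atoms and contains carbon, so it is returned as the parent. A salt such as sodium chloride has no carbon-containing unit, so the function returns `None`.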



NCBI's PubMed database identifier


Q #

R #

Revoked BioAssay

When a depositor removes an assay that the depositor previously deposited into PubChem, the assay is considered revoked. A revoked assay will not be available within Entrez.


Revoked Substance

When a depositor removes a substance from their substance collection, the substance is considered revoked. A revoked substance will not be available within Entrez.



S #


SID

PubChem's substance identifier, a non-zero integer for a deposited substance.



SMILES

Simplified Molecular Input Line Entry System, a line notation (a typographical method using printable characters) for entering and representing molecules.  PubChem computes two kinds of SMILES strings:

  • Canonical SMILES : a unique SMILES string of a compound, generated by a “canonicalization” algorithm.
  • Isomeric SMILES : a SMILES string with stereochemical and isotopic specifications.

In nearly all situations, one should use the Isomeric SMILES, unless stereochemical and isotopic information is not desired.



SMARTS

A language that allows you to specify substructures using rules that are straightforward extensions of SMILES.


Source Category

A general-purpose grouping, such as chemical vendors or governmental organizations, that describes the contributing organization.



Relative spatial arrangement of atoms within molecules, such as chirality.



An individual record collected from depositors, representing a sample used in a BioAssay.


Suppressed Compound

A compound (CID) that links only to old-version substances. A suppressed compound will not be available within Entrez.
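The Deprecated Compound and Suppressed Compound entries differ only in what the compound still links to, which can be sketched as a small classifier. This is an illustrative reading of the two definitions, not PubChem's logic; the function name, the boolean input format, and the label "available" for the remaining case are assumptions.

```python
def compound_status(linked_substance_is_old):
    """Classify a compound from its linked substances.

    linked_substance_is_old: list of booleans, one per linked substance,
    True if that substance is an old version (hypothetical input format).
    """
    if not linked_substance_is_old:
        # No links to any substance at all.
        return "deprecated"
    if all(linked_substance_is_old):
        # Every linked substance is an old version.
        return "suppressed"
    # At least one current substance; "available" is an assumed label.
    return "available"


status_no_links = compound_status([])
status_old_only = compound_status([True, True])
status_current = compound_status([True, False])
```

Under these assumptions, the three calls give "deprecated", "suppressed", and "available" respectively.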



All names, trivial names, synonyms, frequently used IDs, and other names collected from depositors. On the compound summary page, the synonyms listed are the distinct synonyms from all corresponding substances.



T #


TPSA

Topological Polar Surface Area. This is an estimate of the area (in Å²) which is polar. The implementation follows the paper by Ertl et al. [J. Med. Chem. 2000, 43, 3714-3717].  It is a simple method: only N and O are considered, 3D coordinates are not used, and there are various precomputed factors for different hybridizations, charges and participation in aromatic systems.



U #

V #


The PubChem substance version number, which is incremented each time the depositor provides an update.



W #

X #


The external references/links to PubChem database records.



XLogP

A computationally predicted octanol-water partition coefficient (or distribution coefficient).  It is used as a measure of the hydrophilicity or hydrophobicity of a molecule.  Since 2009, PubChem has used version 3 of the algorithm to generate the XLogP value, which is described in the paper by Cheng et al.



Y #

Z #



National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894

