PubChem3D Release Notes

 

July 2019

 

The PubChem3D project computes a 3-D description of PubChem Compound records. For more details, see the open-access PubChem3D thematic series published in the Journal of Cheminformatics (https://www.biomedcentral.com/collections/pubchem3d).

 

    PubChem generates [1-3] a computed 3-D description of each compound in the PubChem Compound database that is not too large (<= 50 non-hydrogen atoms), is not too flexible (<= 15 rotatable bonds), consists only of organic elements (H, C, N, O, F, Si, P, S, Cl, Br, and I), has a single covalent unit (i.e., is not a salt or a mixture), and contains only atom types recognized by the MMFF94s force field [4-5].  Currently, this includes more than 85.2 million of the 95.7 million records (~89%) in the PubChem Compound database.  Because only the parent forms of salts are considered for 3-D (e.g., acetic acid, not sodium acetate), and many compounds have a parent with 3-D information, 92% of PubChem Compound may be considered to have 3-D information.
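
    The eligibility criteria above can be read as a simple filter.  The sketch below is illustrative only: the CompoundSummary record and its fields are hypothetical, the rotatable-bond count and MMFF94s typeability are assumed to be computed elsewhere (this code does not implement either), and the aspirin and sodium acetate values are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Organic elements eligible for 3-D generation, per the release notes.
ALLOWED_ELEMENTS = frozenset({"H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I"})


@dataclass(frozen=True)
class CompoundSummary:
    """Hypothetical pre-computed descriptors for one compound record."""
    heavy_atoms: int          # non-hydrogen atom count
    rotatable_bonds: int      # count from some cheminformatics toolkit (assumed)
    elements: FrozenSet[str]  # element symbols present in the record
    covalent_units: int       # 1 means not a salt or a mixture
    mmff94s_typeable: bool    # True if every atom has an MMFF94s atom type (assumed given)


def eligible_for_3d(c: CompoundSummary) -> bool:
    """Return True if the record passes every listed eligibility criterion."""
    return (
        c.heavy_atoms <= 50
        and c.rotatable_bonds <= 15
        and c.elements <= ALLOWED_ELEMENTS  # subset test
        and c.covalent_units == 1
        and c.mmff94s_typeable
    )


aspirin = CompoundSummary(13, 3, frozenset({"C", "H", "O"}), 1, True)
sodium_acetate = CompoundSummary(5, 0, frozenset({"C", "H", "O", "Na"}), 2, True)

print(eligible_for_3d(aspirin))         # True
print(eligible_for_3d(sodium_acetate))  # False: Na is not allowed and there are two covalent units
print(f"{85.2 / 95.7:.0%}")             # 89%, the coverage figure quoted above
```

    Sodium acetate fails the filter, which is why only its parent form (acetic acid) receives 3-D information.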

 

    Each computed 3-D conformer is not at an energy minimum and may not represent the lowest-energy form in vacuum, in solvent, or in a binding pocket.  Rather, the computed 3-D description consists of low-energy conformers selected from a conformer model that describes energetically accessible and (potentially) biologically relevant conformations of a chemical structure.  A conformer model describes the conformational flexibility of a chemical structure as multiple 3-D representations, or poses, sampled using an average atom pair-wise RMSD (root mean squared distance) threshold.
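
    To make the RMSD threshold concrete, here is a minimal sketch, assuming poses that list the same atoms in the same order and are already superimposed (real conformer generators first compute an optimal alignment).  The greedy prune_by_rmsd helper is hypothetical and is not presented as the procedure used by OEOmega or PubChem; it only shows how a threshold discards near-duplicate poses.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Pose = Sequence[Coord]


def rmsd(a: Pose, b: Pose) -> float:
    """Root-mean-square distance over matched atom pairs of two poses.

    Assumes both poses list the same atoms in the same order and are already
    superimposed; production tools first compute an optimal alignment.
    """
    if len(a) != len(b) or not a:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def prune_by_rmsd(poses: Sequence[Pose], threshold: float) -> List[Pose]:
    """Greedily keep a pose only if it is more than `threshold` from every kept pose."""
    kept: List[Pose] = []
    for pose in poses:
        if all(rmsd(pose, k) > threshold for k in kept):
            kept.append(pose)
    return kept


p0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
p1 = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]  # 0.1 from p0: a near-duplicate
p2 = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]  # 2.0 from p0: a distinct pose
print(len(prune_by_rmsd([p0, p1, p2], threshold=0.5)))  # 2
```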

 

    A conformer model of up to 500 conformers is created for each compound; the current average is ~110 conformers per compound.  This count is much too large to handle routinely in PubChem services, so a diverse conformer ordering is provided.  In this ordering, the first "N" conformers represent the overall diversity of the conformer model for a compound.  This allows one to select the degree of coverage that is computationally feasible while ensuring maximal coverage of the shape and feature diversity present for the compound.  For PubChem purposes, only the first ten diverse conformers are accessible per compound.
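
    The notes do not say how the diverse ordering is computed.  One common way to obtain an ordering whose every prefix is spread out is a farthest-first (max-min) selection, sketched below as an assumption, not as PubChem's algorithm.  The pairwise dissimilarity matrix (for example, RMSD) and the choice of starting conformer are both assumed inputs.

```python
from typing import List, Sequence


def diverse_order(dist: Sequence[Sequence[float]], start: int = 0) -> List[int]:
    """Farthest-first (max-min) ordering of conformer indices.

    dist[i][j] is a symmetric dissimilarity (for example, RMSD) between
    conformers i and j. Each step appends the unchosen conformer whose
    distance to its nearest already-chosen conformer is largest.
    """
    n = len(dist)
    if n == 0:
        return []
    order = [start]
    chosen = {start}
    nearest = list(dist[start])  # distance from each conformer to its closest chosen one
    while len(order) < n:
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        order.append(nxt)
        chosen.add(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], dist[nxt][i])
    return order


# Toy model: four conformers placed on a line at 0, 1, 4 and 6 (arbitrary units).
positions = [0.0, 1.0, 4.0, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]
order = diverse_order(dist)
print(order)      # [0, 3, 2, 1]
print(order[:2])  # the "first N" subset for N = 2: [0, 3]
```

    Taking order[:N] for any N yields a subset that already spans the extremes of the model, which is the property the diverse ordering is meant to provide.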

 

    Available 3-D aware tools, including the download facility, score matrix service, and the PubChem 3-D viewer, allow a range of diverse conformers to be used (to a maximum of ten).  The PubChem FTP site provides either one or ten diverse conformers per compound:

      https://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/
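
    Files with ten conformers per compound contain several SD records for the same compound.  The sketch below groups records by compound using only the standard library.  It assumes the data-item names PUBCHEM_COMPOUND_CID and PUBCHEM_CONFORMER_ID as they appear in PubChem 3-D SD files (verify against a downloaded file); the fake_record helper and its values are made up and do not form valid molfiles, since only the data-item layout matters to this parser.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches SD data-item header lines such as "> <PUBCHEM_COMPOUND_CID>".
TAG = re.compile(r"^>.*<([^>]+)>")


def parse_sdf_properties(text: str) -> List[Dict[str, str]]:
    """Return the data items (tag -> value) of every record in an SD file string."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # skip the empty tail after the final delimiter
        props: Dict[str, str] = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            match = TAG.match(lines[i])
            i += 1
            if match:
                values = []
                # A value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                props[match.group(1)] = "\n".join(values)
        records.append(props)
    return records


def group_conformers(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each compound ID to its conformer IDs, keeping file order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[rec["PUBCHEM_COMPOUND_CID"]].append(rec.get("PUBCHEM_CONFORMER_ID", ""))
    return dict(groups)


def fake_record(cid: str, conf_id: str) -> str:
    """Build a minimal, made-up SD record (placeholder molfile plus two data items)."""
    return (
        f"{cid}\n  made-up\n\nM  END\n"
        f"> <PUBCHEM_COMPOUND_CID>\n{cid}\n\n"
        f"> <PUBCHEM_CONFORMER_ID>\n{conf_id}\n\n$$$$\n"
    )


sample = fake_record("2244", "conf-a") + fake_record("2244", "conf-b") + fake_record("702", "conf-c")
print(group_conformers(parse_sdf_properties(sample)))
# {'2244': ['conf-a', 'conf-b'], '702': ['conf-c']}
```

    For real files, a cheminformatics toolkit with a full SD reader is the more robust choice; this parser only reads data items.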

 

    An available "Similar Conformers" neighboring relationship uses multiple conformers.  At this time, only the first nine most diverse conformers are being used.

 

    The integrated 3-D information may be viewed for each compound record by clicking the “3D” thumbnail image at the top of the Compound Summary page.  An interactive 3-D structure of a compound conformer is provided (e.g., for aspirin, https://pubchem.ncbi.nlm.nih.gov/compound/Aspirin#section=3D-Conformer).  The image and structure can be downloaded, and a full-page view is also available.  This interactive 3-D structure can be embedded in your own web page via the PubChem Widgets (see https://pubchemdocs.ncbi.nlm.nih.gov/widgets).

 

    A web-based viewer is available for visualizing 3-D structures; it can also overlay arbitrary sets of CIDs within PubChem.  The web-based viewer can be found here:

        https://pubchem.ncbi.nlm.nih.gov/vw3d/

 

    PubChem compares conformers by similarity [6-8], taking into account both 3-D shape and the 3-D orientation of protein-binding features. This is indicated in the “Related Compounds” section of a Compound Summary page via a “Similar Conformers” relationship (for aspirin, see https://pubchem.ncbi.nlm.nih.gov/compound/Aspirin#section=Related-Compounds). The neighboring relationships may be visualized as an overlay of a compound (the reference conformer) with its similar-conformer neighbor (the fit conformer). The shape-aligned overlay of neighbored compounds may be downloaded and visualized using the web-based viewer. Superposition information can also be downloaded in bulk (in CSV format or stored as a property of the reference conformer) from the PubChem FTP site (https://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/similar_conformers/). However, to reproduce a superposition, one must also download the PubChem3D conformers involved and apply the provided rotation matrix and then the translation vector (in that order) to the fit (second) conformer.
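
    Applying the rotation first and the translation second can be sketched as below.  How the matrix and vector are laid out in the CSV files or SD properties is not described here, so this assumes a row-major 3x3 matrix applied to column vectors (x' = R x + t); if the files use the transposed convention, pass the transpose.  The apply_superposition helper and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def apply_superposition(
    coords: Sequence[Vec3],
    rotation: Sequence[Sequence[float]],
    translation: Vec3,
) -> List[Vec3]:
    """Rotate each atom of the fit conformer, then translate it.

    Assumes `rotation` is a row-major 3x3 matrix applied to column vectors
    (x' = R x + t); if the files store the transpose, pass R transposed.
    """
    out: List[Vec3] = []
    for x, y, z in coords:
        rx = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z
        ry = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z
        rz = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z
        out.append((rx + translation[0], ry + translation[1], rz + translation[2]))
    return out


# Made-up example: 90-degree rotation about z, then a shift of +1 along x.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = (1.0, 0.0, 0.0)
print(apply_superposition([(1.0, 0.0, 0.0), (0.0, 0.0, 2.0)], R, t))
# [(1.0, 1.0, 0.0), (1.0, 0.0, 2.0)]
```

    The order matters: translating (1, 0, 0) first and then rotating would give (0, 2, 0) instead of (1, 1, 0).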

 

 

    Downloads of the 3-D information from the Compound Summary page, the 3-D web-based viewer (https://pubchem.ncbi.nlm.nih.gov/vw3d/), or the FTP site (https://ftp.ncbi.nlm.nih.gov/pubchem/Compound_3D/) include the following 3-D properties: MMFF partial charges; volume; steric monopole, quadrupole, and octupole moments; MMFF94 energy (with Coulombic terms removed); shape fingerprint; self-overlap volumes used in shape-Tanimoto (ST) and color-Tanimoto (CT) similarity computations; conformer model RMSD; the conformer model diverse ordering; and 3-D protein-binding pharmacophore features.
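
    The self-overlap volumes matter because a Tanimoto-style score normalizes the overlap of two aligned conformers by their individual self-overlaps: T = V_AB / (V_AA + V_BB - V_AB).  ST applies this to shape volumes and CT to pharmacophore-feature ("color") overlaps.  The sketch below only evaluates that formula; computing the Gaussian overlaps and the optimal alignment is left to a shape toolkit, and the input numbers are made up.  Summing ST and CT into a combined score is shown as a common convention, not as a definitive PubChem setting.

```python
def overlap_tanimoto(self_a: float, self_b: float, overlap_ab: float) -> float:
    """Tanimoto score from overlap volumes: V_AB / (V_AA + V_BB - V_AB).

    self_a and self_b are the self-overlap volumes of each conformer;
    overlap_ab is their overlap in the current alignment.
    """
    denom = self_a + self_b - overlap_ab
    if denom <= 0:
        raise ValueError("overlap volumes are inconsistent")
    return overlap_ab / denom


# Made-up values for one aligned pair of conformers.
st = overlap_tanimoto(300.0, 260.0, 240.0)  # shape: 240 / 320
ct = overlap_tanimoto(12.0, 10.0, 5.5)      # color (features): 5.5 / 16.5
print(round(st, 3), round(ct, 3), round(st + ct, 3))  # 0.75 0.333 1.083
```

    Because the self-overlaps are stored with each conformer, only the pairwise overlap needs to be computed at comparison time.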

 

    All 3-D conformer data downloads are kept separate from the traditional 2-D information provided by PubChem. In the PubChem Download Facility, a 3-D check box indicates that 3-D information is desired. Similarly, downloads from the Compound Summary page offer a choice between 2-D and 3-D information.

 

For further assistance, please contact info@ncbi.nlm.nih.gov.

 

Thank you,

 

- the PubChem team

 

 :-= Bibliography =-:

 [1] OEOmega, version 2.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2007.

 [2] OEOmega, version 2.3. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.

 [3] OEOmega, version 2.4. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2009.

 [4] Halgren TA. Merck Molecular Force Field: I. Basis, Form, Scope, Parameterization and Performance of MMFF94. J. Comput. Chem. 1996;17:490-519.

 [5] Halgren TA. Merck Molecular Force Field: VI. MMFF94s Option for Energy Minimization Studies. J. Comput. Chem. 1999;20:720-729.

 [6] OEShape, version 1.7.0. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2008.

 [7] OEShape, version 1.7.2. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2009.

 [8] OEShape, version 1.8.0. OpenEye Scientific Software, Inc.; Santa Fe, NM, USA: 2010.

 

 :-= History =-:

2019 Jul 01 - Major rewrite of release notes.

2011 Mar 06 - PubChem3D version 2.0 release. Major rewrite of release notes.

2010 Jan 05 - Added link to a third presentation. Added missing "Fair Use Disclaimer".

 

 :-=  Fair Use Disclaimer  =-:

    Databases of molecular data on the NCBI FTP site include such examples as nucleotide sequences (GenBank), protein sequences, macromolecular structures, molecular variation, gene expression, and mapping data. They are designed to provide and encourage access within the scientific community to sources of current and comprehensive information. Therefore, NCBI itself places no restrictions on the use or distribution of the data contained therein. However, some submitters of the original data may claim patent, copyright, or other intellectual property rights in all or a portion of the data they have submitted. NCBI is not in a position to assess the validity of such claims and, therefore, cannot provide comment or unrestricted permission concerning the use, copying, or distribution of the information contained in the molecular databases.


 

National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
