Downloading PubChem Data

PubChem is an open access database with most of the data available for download. Exceptions may exist in cases where licensing agreements prevent our data contributors from allowing bulk downloads of some data sets.

 

There are several ways to download PubChem data:

 

Individual Record Download

Programmatic Download

Bulk Download

From PubChem Search pages

From the PubChem FTP site

 

 

Individual Record Download

All or part of the data for an individual PubChem record may be downloaded in various file formats using the Download button at the top-right corner of a Compound Summary, Substance Record, or BioAssay Record page. A Download button is also available above various data views that present certain types of information (for example, bioassay data tables, classification views, and 3-D conformer views).

 

 

Programmatic Download

PubChem data may be downloaded programmatically through several access routes, including:

 

E-Utilities 

Power User Gateway (PUG)

PUG-SOAP

PUG-REST 

PUG-View

PubChemRDF REST interface

 

For more details, refer to the PubChem programmatic access overview.
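As an illustration of one of these routes, the following is a minimal sketch of a PUG-REST request that retrieves selected properties for a single compound. The helper names (compound_property_url, fetch), the example CID, and the chosen property names are assumptions made for this example; the programmatic access overview remains the reference for the supported inputs and output formats.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the PUG-REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties, output="JSON"):
    """Build a PUG-REST URL for selected properties of one compound.

    cid        -- PubChem Compound ID (integer or numeric string)
    properties -- iterable of property names, e.g. "MolecularFormula"
    output     -- response format requested from the service
    """
    prop_list = ",".join(quote(name) for name in properties)
    return f"{PUG_REST_BASE}/compound/cid/{int(cid)}/property/{prop_list}/{output}"


def fetch(url, timeout=30):
    """Return the raw response body for a URL (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, fetch(compound_property_url(2244, ["MolecularFormula", "MolecularWeight"])) would request those two properties for CID 2244. Building the URL is kept separate from the request so that the address can be inspected or logged before any data is transferred.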

 

 

Bulk Download

 

From PubChem Search pages

The PubChem Search pages (e.g., the PubChem home page and the underlying result lists) provide a direct interface for downloading search results. The Download button is at the upper right of the page.

 

 

Clicking it opens a pop-up panel where you can select the format and compression options for the downloaded file. Once you select the options (radio buttons) and a format, the download begins immediately. Note that the available options differ by record type.

 


If you have a specific list of records you would like to download, you can use the "Upload ID List" function on the home page.

 

 

This opens an input panel where you can specify the record type (compounds, assays, etc.) and provide a list of record identifiers, either by entering them directly or by uploading a file. After uploading your list, you can download the records as described above.
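If the identifiers come from another program, a short script can prepare the list before it is pasted or uploaded. The sketch below assumes that the upload accepts a plain-text list with one numeric identifier per line; that layout, and the helper names normalize_ids and format_id_list, are assumptions for illustration and are not taken from the PubChem documentation.

```python
def normalize_ids(raw_ids):
    """Return unique positive integer identifiers in first-seen order."""
    seen = set()
    result = []
    for raw in raw_ids:
        value = int(raw)  # accepts ints and numeric strings such as " 2244 "
        if value <= 0:
            raise ValueError(f"identifier must be positive: {raw!r}")
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


def format_id_list(raw_ids):
    """Render identifiers as text, one per line (assumed upload layout)."""
    return "".join(f"{value}\n" for value in normalize_ids(raw_ids))
```

The returned text can be written to a file, for example with open("cids.txt", "w").write(format_id_list(my_ids)), where cids.txt and my_ids are placeholder names. Removing duplicates first keeps the uploaded list no longer than it needs to be.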

 

From the PubChem FTP site

The PubChem File Transfer Protocol (FTP) site (https://ftp.ncbi.nlm.nih.gov/pubchem) lets you download various kinds of PubChem data in bulk. The following is a brief overview of the directory layout at the FTP site. For more detailed information on the content of each directory, see the README file in that directory.

 

./Bioassay

PubChem BioAssay data

./Compound

Full and incremental data dump for PubChem compounds (without annotations and 3-D conformer models).

./Compound_3D

Computationally generated 3-D structures for PubChem compounds, along with other 3-D properties such as molecular volume, shape quadrupoles, shape fingerprint, etc.

./Other

Other PubChem data, including chemical-patent data from Google Patents and IBM.

./RDF

PubChem data formatted in Resource Description Framework (RDF).

./Substance

Full and incremental data dump for PubChem substances, deposited by individual data submitters.

./Target

List of genes targeted in PubChem BioAssays.

./presentations

Slides for some PubChem presentations.

./publications

Some full-text articles about PubChem. A full publication list is available at: https://pubchemdocs.ncbi.nlm.nih.gov/publications.

./specifications

Data specification for PubChem records.
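Because the site is reachable over HTTPS at the address given above, files in these directories can be retrieved with a standard HTTP client. The following is a minimal sketch; the helper names (ftp_url, download) are hypothetical, and the exact file names inside each directory, including the name of its README file, are assumptions that should be checked against the directory listing.

```python
import shutil
from urllib.request import urlopen

# Root of the PubChem FTP site, as given in this document.
PUBCHEM_FTP_BASE = "https://ftp.ncbi.nlm.nih.gov/pubchem"


def ftp_url(*parts):
    """Join directory and file names onto the PubChem FTP root.

    Accepts names written as "Compound", "./Compound", or "/Compound/".
    """
    cleaned = []
    for part in parts:
        part = part.removeprefix("./").strip("/")  # requires Python 3.9+
        if part:
            cleaned.append(part)
    return "/".join([PUBCHEM_FTP_BASE, *cleaned])


def download(url, destination, timeout=60):
    """Stream a remote file to disk without loading it all into memory."""
    with urlopen(url, timeout=timeout) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

For example, download(ftp_url("./Compound", "README"), "Compound_README.txt") would save that directory's README locally, assuming the file is named README. Streaming with shutil.copyfileobj matters here because bulk data files can be very large.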

 
