Downloading PubChem Data
PubChem is an open access database with most of the data available for download. Exceptions may exist in cases where licensing agreements prevent our data contributors from allowing bulk downloads of some data sets.
There are several ways to download PubChem data:
Individual Record Download
All or part of the data for an individual PubChem record may be downloaded in various file formats using the Download button in the top-right corner of a Compound Summary, Substance Record, or BioAssay Record page. A Download button is also available above various data views that present certain types of information (for example, bioassay data tables, classification views, and 3-D conformer views).
Programmatic Download
PubChem data may be downloaded programmatically through several programmatic access routes.
For more details, refer to PubChem programmatic access overview.
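As a minimal sketch of what a programmatic download can look like, the Python below uses PUG-REST, one of PubChem's programmatic access routes. The text above does not name a specific route, so the choice of PUG-REST is an assumption made for illustration, and the helper names (`pug_rest_url`, `download_records`) are hypothetical. The programmatic access overview remains the authoritative reference for URL syntax and usage limits.

```python
import shutil
from urllib.request import urlopen

# Base URL of the PUG-REST service (assumed route for this sketch).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(ids, output="SDF", domain="compound", namespace="cid"):
    """Build a PUG-REST URL for one or more numeric record identifiers."""
    id_text = ",".join(str(int(i)) for i in ids)
    if not id_text:
        raise ValueError("at least one identifier is required")
    return f"{PUG_REST_BASE}/{domain}/{namespace}/{id_text}/{output}"


def download_records(ids, path, output="SDF", timeout=30):
    """Fetch the records and stream the response body into a local file."""
    url = pug_rest_url(ids, output=output)
    with urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

Calling `download_records([2244], "cid_2244.sdf")` would request a single compound record in SDF format. Only `pug_rest_url` is free of network access, which makes it convenient to check the request before sending it.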
Bulk Download
From PubChem Search pages
The PubChem Search pages (e.g. the PubChem home page and underlying result lists) provide a direct interface for downloading search results. The Download button is in the upper right of the page.
Clicking the Download button brings up a pop-up panel that lets you select the format and compression options for the downloaded file. Select the options (radio buttons) and a format, and the download begins immediately. Note that the available options differ by record type.
If you have a specific list of records you would like to download, you can use the "Upload ID List" function on the homepage.
This brings up an input panel where you can specify the record type (compounds, assays, etc.) and provide a list of record identifiers, either directly or via file upload. After uploading your list, you can download the records as described above.
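When the identifiers come from another tool, it can help to clean the list before uploading it. The sketch below assumes the upload panel accepts a plain-text file with one numeric identifier per line. That file layout is an assumption rather than something stated above, and the function names are hypothetical.

```python
def normalize_ids(ids):
    """Return positive integer identifiers, de-duplicated, in first-seen order."""
    seen = set()
    unique = []
    for raw in ids:
        value = int(raw)  # raises ValueError for non-numeric input
        if value <= 0:
            raise ValueError(f"identifier must be positive: {raw!r}")
        if value not in seen:
            seen.add(value)
            unique.append(value)
    return unique


def format_id_list(ids):
    """Render identifiers as text, one per line (assumed upload format)."""
    return "".join(f"{value}\n" for value in normalize_ids(ids))


def write_id_list(ids, path):
    """Write the formatted identifier list to a text file for upload."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(format_id_list(ids))
    return path
```

For example, `write_id_list(["2244", 3672, 2244], "cids.txt")` would write a two-line file containing 2244 and 3672.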
From the PubChem FTP Site
The PubChem File Transfer Protocol (FTP) site (https://ftp.ncbi.nlm.nih.gov/pubchem) allows the user to download various kinds of PubChem data in bulk. Here is a brief overview of the layout of the directories at the FTP site. For more detailed information on the content in each directory, please see the README file in that directory.
PubChem BioAssay data.
Full and incremental data dump for PubChem compounds (without annotations and 3-D conformer models).
Computationally generated 3-D structures for PubChem compounds, along with other 3-D properties such as molecular volume, shape quadrupoles, shape fingerprint, etc.
Other PubChem data, including chemical-patent data from Google Patents and IBM.
PubChem data formatted in Resource Description Framework (RDF).
Full and incremental data dump for PubChem substances, deposited by individual data submitters.
List of genes targeted in PubChem BioAssays.
Slides for some PubChem presentations.
Some full-text articles about PubChem. A full publication list is available at: https://pubchemdocs.ncbi.nlm.nih.gov/publications.
Data specification for PubChem records.
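Because the FTP site is reachable over HTTPS at the address given above, individual files can be fetched with an ordinary HTTP client. The following is a minimal sketch: the helper names are hypothetical, and the relative path passed in must be taken from the actual directory listing, since exact directory and file names are not given here.

```python
import shutil
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

# Root of the PubChem FTP site, as given in the text above.
FTP_BASE = "https://ftp.ncbi.nlm.nih.gov/pubchem"


def ftp_url(relative_path):
    """Join a path relative to the PubChem FTP root into a full URL."""
    cleaned = relative_path.strip("/")
    if not cleaned:
        raise ValueError("a non-empty relative path is required")
    # quote() leaves "/" intact, so nested directories are preserved.
    return f"{FTP_BASE}/{quote(cleaned)}"


def fetch_file(relative_path, dest_dir=".", timeout=60):
    """Stream one file from the FTP site into dest_dir and return its path."""
    url = ftp_url(relative_path)
    dest = Path(dest_dir) / Path(relative_path.strip("/")).name
    with urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return dest
```

Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for the larger bulk files. For full mirrors of a directory, a dedicated transfer tool is likely a better fit than this per-file helper.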