PUG View is a REST-style web service that provides information that is not directly contained within the primary PubChem Substance, Compound, or BioAssay records. Its primary purpose is to drive the PubChem database summary record web pages, but it can also be used independently as a programmatic web service.
PUG View is mainly designed to provide complete summary reports on individual PubChem records. Users may also be interested in PUG REST (https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest-tutorial), a different style of service that gives smaller bits of information about one or more PubChem records.
An overview of PUG View can be found in the following paper:
[PubMed PMID: 31399858] [PubMed Central PMCID: PMC6688265] [Free Full Text]
PUG View provides structured information in a variety of formats, specified at the end of the URL path. Most results can be formatted as JSON(P), XML, or ASN.1 as text (ASNT) or base64-encoded binary (ASNB). For example, these all contain exactly the same information, just in different formats:
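As a minimal sketch of how the format suffix works, the snippet below builds one URL per output format for the same record. The base URL, the `data/compound/<CID>` path layout, and the helper name `record_urls` are assumptions made for illustration and are not taken from the text above; CID 1234 is an arbitrary example.

```python
# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"

# Output formats named in the text (JSONP is left out for brevity).
FORMATS = ("JSON", "XML", "ASNT", "ASNB")


def record_urls(cid: int) -> dict[str, str]:
    """Return one URL per format; the format is the last path segment."""
    return {fmt: f"{PUG_VIEW_BASE}/data/compound/{cid}/{fmt}" for fmt in FORMATS}


if __name__ == "__main__":
    for fmt, url in record_urls(1234).items():
        print(fmt, url)
```

Only the final path segment differs between the URLs, which reflects the point that the content is the same and only the serialization changes.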
An XML schema is available here. Note that the JSON and ASN.1 formats follow the same content model.
PUG View provides record summaries for the three primary PubChem databases - Compounds, Substances, and BioAssays - as well as patents and targets. Each of these can be accessed as an index, providing a listing of what information is present, but without the entire data content; essentially a table of contents for that record:
Or the complete data can be retrieved:
This choice of index or full data is applicable to all the primary record types.
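The index-versus-data choice can be sketched as a single switch in the URL path. The base URL, the `index` and `data` path segments, and the helper name `summary_url` are assumptions for illustration only.

```python
# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def summary_url(record_type: str, identifier, full: bool = True, fmt: str = "JSON") -> str:
    """Build a URL for either the full record or its table-of-contents index."""
    # Assumed path segments: "data" for the full content, "index" for the outline.
    mode = "data" if full else "index"
    return f"{PUG_VIEW_BASE}/{mode}/{record_type}/{identifier}/{fmt}"


if __name__ == "__main__":
    print(summary_url("compound", 1234, full=False))  # outline only
    print(summary_url("compound", 1234, full=True))   # complete record
```

Fetching the index first is a cheap way to discover which headings exist before requesting the full data.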
If only a subcategory of information is desired, a heading can be used to restrict the data returned. Note that the index as above is a convenient way to see what headings are present for a given record, as not all records will have all possible headings present. For example, to get just the experimental property section:
Or even just a single value type, like melting point:
Section headings that can be used in PUG-View data retrieval can be found in the PubChem Compound TOC tree (using the PubChem Classification Browser).
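A heading restriction might be expressed as a query argument, as in the sketch below. The base URL, the `heading` parameter name, and the helper name `section_url` are assumptions for illustration; CID 2244 is an arbitrary example.

```python
from urllib.parse import urlencode

# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def section_url(cid: int, heading: str, fmt: str = "JSON") -> str:
    """Build a URL that requests only the section with the given heading."""
    # urlencode escapes spaces and other characters that are unsafe in a URL.
    query = urlencode({"heading": heading})
    return f"{PUG_VIEW_BASE}/data/compound/{cid}/{fmt}?{query}"


if __name__ == "__main__":
    print(section_url(2244, "Experimental Properties"))
    print(section_url(2244, "Melting Point"))
```

Because headings often contain spaces, encoding them with `urlencode` avoids hand-escaping mistakes.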
Compound records are accessed by CID number. Note that PUG View provides textual and third-party information associated with the compound, but not the chemical structure, which is handled by other PubChem services.
Substance records are accessed by SID number. Information on substances is fairly minimal; in particular, no third-party annotation is associated with substances. Again, chemical structure is not part of PUG View’s results.
BioAssays are accessed by AID number.
Patents can be accessed by an identifier string. For USPTO patent grants, this is 'US' followed by a 7-digit number. For applications, it is 'US' followed by a 4-digit year, followed by a 6-digit number.
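The two identifier shapes described above can be checked with a pair of regular expressions. This sketch follows only the wording of the paragraph; identifiers actually accepted by the service may carry separators or kind codes that are not covered here, and the function name `classify_us_patent_id` and the sample identifiers are hypothetical.

```python
import re

# 'US' followed by exactly 7 digits (grant), per the description above.
_GRANT = re.compile(r"US\d{7}")
# 'US' followed by a 4-digit year and a 6-digit number (application).
_APPLICATION = re.compile(r"US\d{4}\d{6}")


def classify_us_patent_id(identifier: str):
    """Return 'grant', 'application', or None if neither shape matches."""
    if _GRANT.fullmatch(identifier):
        return "grant"
    if _APPLICATION.fullmatch(identifier):
        return "application"
    return None


if __name__ == "__main__":
    for sample in ("US1234567", "US2010123456", "US123"):
        print(sample, classify_us_patent_id(sample))
```

The two shapes cannot collide because they differ in total length (9 versus 12 characters).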
Gene information can be retrieved by NCBI Gene ID:
Protein information can be retrieved by NCBI Protein Accession:
Pathway information can be retrieved by Source:ExternalID:
Taxonomy information can be retrieved by NCBI Taxonomy ID:
Cell line information can be retrieved by cell line name (case-insensitive):
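The record types listed above differ only in the path segment and the kind of identifier. A minimal sketch follows, assuming the base URL and the path segment names shown in `RECORD_TYPES` (neither appears in the text above); the helper name `record_url` and the sample identifiers are illustrative only.

```python
from urllib.parse import quote

# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"

# Assumed path segment for each record type described above.
RECORD_TYPES = (
    "compound", "substance", "bioassay", "patent",
    "gene", "protein", "pathway", "taxonomy", "cell",
)


def record_url(record_type: str, identifier, fmt: str = "JSON") -> str:
    """Build a full-record URL for any of the supported record types."""
    if record_type not in RECORD_TYPES:
        raise ValueError(f"unknown record type: {record_type!r}")
    # Keep ':' unescaped so Source:ExternalID pathway identifiers stay readable;
    # spaces (for example in cell line names) are percent-encoded.
    encoded = quote(str(identifier), safe=":")
    return f"{PUG_VIEW_BASE}/data/{record_type}/{encoded}/{fmt}"


if __name__ == "__main__":
    print(record_url("gene", 1956))
    print(record_url("taxonomy", 9606))
    print(record_url("pathway", "Reactome:R-HSA-70171"))
    print(record_url("cell", "HeLa"))
```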
The following are not primary PubChem records, but rather extra information of various sorts that is attached to PubChem records. These reports contain information not present in the main record data described above.
PUG View can provide information of a specific type across all of PubChem’s primary databases. For example, if you are interested in all of the experimental viscosity measurements contained within PubChem and its associated third-party annotations, you can request this by heading:
Or equivalently (useful if the heading contains special characters not compatible with URL syntax):
This will include PubChem identifiers – CIDs in this example – for each data value, along with attribution detailing exactly where each bit of information was obtained.
Note that in the new data model, a heading may refer to different types of PubChem records, making it necessary to specify which one is intended:
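An annotation request by heading, optionally narrowed to one record type, might be built as below. The base URL, the `annotations/heading` path, and the `heading` and `heading_type` parameter names are assumptions for illustration, as is the helper name `annotation_url`.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def annotation_url(heading: str, heading_type: Optional[str] = None, fmt: str = "JSON") -> str:
    """Build a URL for all annotations under a heading across PubChem."""
    params = {"heading": heading}
    if heading_type is not None:
        # Disambiguates headings that exist for several record types.
        params["heading_type"] = heading_type
    # Passing the heading as a query argument lets urlencode escape
    # characters that would not be valid inside a URL path.
    return f"{PUG_VIEW_BASE}/annotations/heading/{fmt}?{urlencode(params)}"


if __name__ == "__main__":
    print(annotation_url("Viscosity"))
    print(annotation_url("Viscosity", heading_type="Compound"))
```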
Also keep in mind that some headings have more data than others, and the amount returned per request is limited. The response data ends with "Page" and "TotalPages" values, which give the current page number and show whether more data is available than the request returned (that is, whether TotalPages is greater than one). By default, page 1 is returned, but subsequent pages (up to the TotalPages limit) can be accessed by adding a page argument:
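The paging logic described above can be sketched as a loop that keeps requesting pages until `TotalPages` is reached. To keep the example independent of the network, the caller supplies a hypothetical `fetch_page` callable; it is assumed to return the dictionary that directly holds the "Page" and "TotalPages" values (in a real response these may sit inside a wrapper object that has to be unwrapped first).

```python
from typing import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield each page's payload, starting at page 1, until TotalPages is reached."""
    page = 1
    while True:
        payload = fetch_page(page)
        yield payload
        # If TotalPages is absent, treat the result as a single page.
        total = int(payload.get("TotalPages", 1))
        if page >= total:
            break
        page += 1


if __name__ == "__main__":
    # Stand-in for a real HTTP call: pretends the heading spans three pages.
    def fake_fetch(page: int) -> dict:
        return {"Page": page, "TotalPages": 3}

    for payload in iter_pages(fake_fetch):
        print("got page", payload["Page"], "of", payload["TotalPages"])
```

Reading `TotalPages` from every response, rather than only the first, keeps the loop correct if the total changes between requests.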
Lastly, it is possible to get a complete list of all annotation headings (and their types) for which PubChem has any data, and that can be used in URLs such as the above:
PUG View can list all PubChem depositors and their SIDs for a given compound, including a categorization of each source – such as chemical vendor, research and development, journal publishers, etc.:
If a given compound has neighbors – other compounds with similar chemical structure – that have useful information like bioactivities or patents, etc., this will give a listing of such neighbors, grouped by information type:
This will give URLs into PubMed for literature associated with a compound, organized by subheading:
This is used to display biologic images associated with compounds. The integer here is an internal identifier, which will be present in the primary compound record.
This is a specialized image generator for QR codes that link to the LCSS page for a compound, intended for safety and hazard labelling.
This gives a listing of all the NCBI LinkOut records present for a substance, compound, or assay.
This gives a listing of 3D protein structures associated with a compound.
This is another specialized retrieval for attachments associated with some records, such as spectral images, etc. This key value will be present in the main record.
- PUG-View and PUG-REST are often confused. PUG-REST retrieves property values computed by PubChem, whereas PUG-View retrieves annotations collected from other data sources.
- Unlike PUG-REST, PUG-View takes only CIDs (not chemical names, InChIKeys, or other identifiers). To get annotations for non-CID identifiers, first convert them to CIDs, then use those CIDs in PUG-View requests.
Another important difference between PUG-REST and PUG-View is that PUG-View cannot take multiple CIDs in a single request, whereas PUG-REST can. That is, of the following two PUG-View requests, only the first one will work:
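Since PUG-View accepts a single CID per request, a batch of compounds has to be handled on the client side with one request per CID. A minimal sketch, assuming the base URL and `data/compound/<CID>` path layout (neither is stated in the text above); the helper name `per_cid_urls` and the sample CIDs are illustrative.

```python
from typing import Iterable, List

# Assumed service root; not stated in the text above.
PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view"


def per_cid_urls(cids: Iterable[int], fmt: str = "JSON") -> List[str]:
    """Return one PUG-View URL per CID instead of a single combined request."""
    return [f"{PUG_VIEW_BASE}/data/compound/{cid}/{fmt}" for cid in cids]


if __name__ == "__main__":
    for url in per_cid_urls([2244, 1234]):
        print(url)
```

Each URL in the returned list would then be fetched separately, in whatever order and at whatever pace the client chooses.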