PubChem provides several ways to access its data programmatically, including:
- PUG-REST
PUG-REST is a Representational State Transfer (REST)-style web service that supplies specific pieces of information on one or more PubChem records. It offers a simplified route into PubChem, without the XML or SOAP envelopes that PUG and PUG-SOAP require. PUG-REST also provides convenient access to some information on PubChem records that the other PUG services cannot supply. It is intended for short, synchronous requests: the result is returned in a single call lasting at most 30 seconds (the default timeout on PubChem servers), with no intermediate step to poll whether the request has completed.
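As a rough illustration of how PUG-REST requests are expressed as plain URLs, the Python sketch below assembles one from the input domain, namespace, identifiers, operation and output format, which is the path layout PubChem documents for PUG-REST. The helper names `pug_rest_url` and `fetch_json` are hypothetical, and the example CIDs (2244 for aspirin, 2519 for caffeine) are only placeholders.

```python
import json
import urllib.request
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(domain, namespace, identifiers, operation, output="JSON"):
    """Build <base>/<domain>/<namespace>/<identifiers>/<operation>/<output>.

    A list of identifiers is joined with commas so that several records
    can be requested in one call.
    """
    if isinstance(identifiers, (list, tuple)):
        identifiers = ",".join(str(i) for i in identifiers)
    ids = quote(str(identifiers), safe=",")  # escape e.g. spaces in compound names
    return "/".join([PUG_REST_BASE, domain, namespace, ids, operation, output])


def fetch_json(url, timeout=30):
    """One synchronous call; the 30 s timeout mirrors the server-side limit."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


url = pug_rest_url("compound", "cid", [2244, 2519],
                   "property/MolecularFormula,MolecularWeight")
print(url)
```

Calling `fetch_json(url)` performs the actual network request; keeping URL construction separate makes it easy to inspect a query before sending it.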
- PUG-View
PUG-View is a REST-style web service that provides full reports, including third-party textual annotations, on individual PubChem records. Its primary purpose is to drive the PubChem summary web pages, but it can also be used independently as a programmatic web service.
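A PUG-View report is requested by record type and identifier, and the JSON reply nests its content in `Section` lists labelled by `TOCHeading`. The sketch below builds such a URL and walks a small hand-written fragment of that assumed structure offline. `pug_view_url` and `toc_headings` are hypothetical helpers, and real reports are far larger and carry many other keys.

```python
import json

PUG_VIEW_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data"


def pug_view_url(record_type, record_id, output="JSON"):
    """URL of the full PUG-View report for one record, e.g. compound 2244."""
    return f"{PUG_VIEW_BASE}/{record_type}/{record_id}/{output}"


def toc_headings(record, depth=0):
    """Yield (depth, heading) pairs by recursively walking nested 'Section'
    lists (assumed layout: Record -> Section[] -> TOCHeading, Section[] ...)."""
    for section in record.get("Section", []):
        yield depth, section.get("TOCHeading", "")
        yield from toc_headings(section, depth + 1)


# A trimmed, hand-written fragment shaped like a PUG-View reply (not real data).
sample = json.loads("""
{"Record": {"RecordType": "CID", "RecordNumber": 2244,
  "Section": [
    {"TOCHeading": "Names and Identifiers",
     "Section": [{"TOCHeading": "Computed Descriptors"}]},
    {"TOCHeading": "Chemical and Physical Properties"}]}}
""")

for depth, heading in toc_headings(sample["Record"]):
    print("  " * depth + heading)
```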
- Power User Gateway (PUG)
PUG provides programmatic access to PubChem services via a single common gateway interface (CGI), called ‘pug.cgi’, available at http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi. This CGI is the central gateway to several PubChem services. Rather than taking Uniform Resource Locator (URL) arguments, PUG exchanges data as XML sent by Hypertext Transfer Protocol (HTTP) POST.
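Because PUG takes no URL arguments, a client only has to POST an XML document to pug.cgi. The sketch below prepares such a request with Python's standard library. It uses the https form of the address above, assumes a `text/xml` content type, and leaves the XML body to the caller, since a valid request must follow PubChem's PUG XML schema, which is not reproduced here. `build_pug_request` is a hypothetical name.

```python
import urllib.request

# https form of the pug.cgi address (assumed to be accepted by the server)
PUG_CGI = "https://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"


def build_pug_request(xml_payload: str) -> urllib.request.Request:
    """Wrap an XML document in an HTTP POST request to pug.cgi.

    The payload itself must be composed according to the PUG XML schema;
    this helper only handles the transport.
    """
    return urllib.request.Request(
        PUG_CGI,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml"},  # assumed content type
        method="POST",
    )


req = build_pug_request("<!-- PUG XML request goes here -->")  # placeholder body
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=30).read()
```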
- PUG-SOAP
PUG-SOAP provides web service access to PubChem data using the Simple Object Access Protocol (SOAP). It offers easier programmatic access to much of the same functionality as PUG, but breaks operations down into simpler functions defined in the Web Services Description Language (WSDL) and exchanges information in SOAP-formatted message envelopes. This WSDL/SOAP layer is best suited to SOAP-aware GUI workflow applications (e.g. Taverna and Pipeline Pilot) and to programming and scripting languages with SOAP support (e.g. C, C++, C#, .NET, Perl, Python and Java).
- Entrez Utilities (also called E-Utilities or E-Utils)
E-Utils are a set of programs for accessing information in the Entrez system. While well suited to text and numeric fielded data, they cannot handle the more complex data types specific to PubChem, such as chemical structures and tabular bioactivity data.
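E-Utils fit best when the query is a text or fielded search. The sketch below builds an ESearch URL against `pccompound`, the Entrez database holding PubChem Compound records (`pcsubstance` and `pcassay` cover Substance and BioAssay). The helper name `esearch_url` and the default `retmax` value are illustrative choices, not part of the E-Utils API.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """ESearch query URL; parameters are form-encoded (spaces become '+')."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


print(esearch_url("pccompound", "aspirin"))
```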
PubChem has a standard time limit of 30 seconds per web service request. If a request is not completed within this limit for any reason, a timeout error is returned. For certain slower operations, this limit can be worked around with an ‘asynchronous’ approach: the initial request returns a so-called ‘key’, which is then used to check periodically whether the operation has finished and, once it has, to retrieve the results.
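The asynchronous pattern reduces to a polling loop. In the sketch below, `check` is a hypothetical caller-supplied function that asks the server about a key and reports whether the job has finished; PUG-REST documentation calls this key a ListKey, but the reply format is not parsed here. The 2-second interval is an arbitrary assumption, and the sleep function is injectable so the loop can be exercised offline.

```python
import time


def poll_until_done(check, key, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll an asynchronous job identified by `key`.

    `check(key)` must return (True, result) once the job has finished and
    (False, None) while it is still running. Raises TimeoutError if the job
    has not finished after `max_attempts` checks.
    """
    for _ in range(max_attempts):
        done, result = check(key)
        if done:
            return result
        # Each check is itself a request, so pause to respect the rate limits.
        sleep(interval)
    raise TimeoutError(f"job {key!r} not finished after {max_attempts} checks")


# Offline demo: a simulated job that reports completion on the third check.
checks = []


def fake_check(key):
    checks.append(key)
    if len(checks) >= 3:
        return True, ["finished result for " + key]
    return False, None


print(poll_until_done(fake_check, "demo-key", sleep=lambda s: None))
```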
PubChem, like NCBI in general, asks users to throttle their requests to its web pages, and this policy also covers web-based programmatic services. Violating the usage policies may result in the user being temporarily blocked from accessing PubChem (or NCBI) resources. The current request volume limits are:
- No more than 5 requests per second.
- No more than 400 requests per minute.
- No more than 300 seconds of running time per minute.
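One way to stay within the first two limits is a client-side sliding-window throttle that records when each request was sent and waits whenever another request would exceed either window. The class below is a sketch of that design. It does not track the running-time limit, which depends on how long the server spends on each request, and the name `Throttle` is hypothetical. The clock and sleep functions are injectable so the behaviour can be checked with a simulated clock.

```python
import time
from collections import deque


class Throttle:
    """Sliding-window limiter for 5 requests/second and 400 requests/minute."""

    def __init__(self, per_second=5, per_minute=400,
                 clock=time.monotonic, sleep=time.sleep):
        self.limits = [(1.0, per_second), (60.0, per_minute)]
        self.clock, self.sleep = clock, sleep
        self.sent = deque()  # send times of recent requests, oldest first

    def wait(self):
        """Block until one more request fits in every window, then record it."""
        while True:
            now = self.clock()
            while self.sent and now - self.sent[0] >= 60.0:
                self.sent.popleft()  # outside the longest window
            delay = 0.0
            for window, limit in self.limits:
                recent = [t for t in self.sent if now - t < window]
                if len(recent) >= limit:
                    # wait until the oldest request in this window expires
                    delay = max(delay, recent[0] + window - now)
            if delay <= 0:
                self.sent.append(now)
                return
            self.sleep(delay)


# Offline check with a simulated clock: the 6th request must wait 1 s.
class FakeClock:
    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


fake = FakeClock()
throttle = Throttle(clock=fake.now, sleep=fake.sleep)
for _ in range(6):
    throttle.wait()
print(fake.t)
```

In real use, calling `Throttle().wait()` before each request is enough; the defaults use the wall clock.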
The request limits listed above can be lowered by dynamic traffic control at times of excessive load. Throttling information, indicating the system-load state and the per-user limits, is provided in the HTTP response headers, and users should moderate the rate at which they send requests to PubChem accordingly.
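PubChem's documentation on dynamic request throttling describes an `X-Throttling-Control` response header that reports a colour-coded status (Green, Yellow, Red or Black) with a percentage for request count, request time and overall service load. The sketch below parses a value in that assumed format and turns it into a pause. The exact header wording should be checked against a live response, and the back-off factors in `backoff_seconds` are purely hypothetical.

```python
import re

# Assumed header layout, e.g. (read via resp.headers.get("X-Throttling-Control")):
# "Request Count status: Green (0%), Request Time status: Green (0%), Service status: Green (20%)"
_STATUS = re.compile(r"([A-Za-z ]+?) status: (\w+) \((\d+)%\)")


def parse_throttling(header_value):
    """Map each metric name to (colour, percent)."""
    return {name.strip(): (colour, int(pct))
            for name, colour, pct in _STATUS.findall(header_value)}


def backoff_seconds(statuses, base=1.0):
    """Hypothetical policy: pause longer as the worst metric moves toward Black."""
    factor = {"Green": 0, "Yellow": 1, "Red": 4, "Black": 30}
    worst = max((factor.get(colour, 0) for colour, _ in statuses.values()), default=0)
    return base * worst


hdr = ("Request Count status: Green (0%), "
       "Request Time status: Yellow (60%), Service status: Green (20%)")
status = parse_throttling(hdr)
print(status)
print(backoff_seconds(status))
```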