PubChem Identifier Exchange Service

This document describes how to use the PubChem Identifier Exchange web service, which can be found at the following URL: 


https://pubchem.ncbi.nlm.nih.gov/idexchange/

 

The input IDs are first converted into CIDs, which are called input CIDs. The service then retrieves the CIDs that satisfy the condition specified by the selected operation type; these are called output CIDs. Finally, the output CIDs are converted into the ID type specified by the user and written to a file or to Entrez history.
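The three-stage flow described above (input IDs to input CIDs, input CIDs to output CIDs, output CIDs to output IDs) can be modeled as a small function. This is a minimal sketch, not PubChem's implementation: the lookup tables `to_cids` and `from_cid` and the `operation` callable are hypothetical stand-ins for the service's internal conversions.

```python
from typing import Callable, Dict, Iterable, List, Set


def exchange(
    input_ids: Iterable[str],
    to_cids: Dict[str, Set[int]],          # hypothetical: input ID -> input CIDs
    operation: Callable[[int], Set[int]],  # hypothetical: input CID -> output CIDs
    from_cid: Dict[int, List[str]],        # hypothetical: output CID -> output IDs
) -> Dict[str, List[str]]:
    """Map each input ID to the output IDs reached through the CID pipeline."""
    result: Dict[str, List[str]] = {}
    for input_id in input_ids:
        outputs: List[str] = []
        # Stage 1: convert the input ID into input CIDs (unknown IDs yield none).
        for input_cid in sorted(to_cids.get(input_id, set())):
            # Stage 2: apply the selected operation to obtain output CIDs.
            for output_cid in sorted(operation(input_cid)):
                # Stage 3: convert each output CID into the requested ID type.
                outputs.extend(from_cid.get(output_cid, []))
        result[input_id] = outputs
    return result


def same_cid(cid: int) -> Set[int]:
    """The 'Same CID' operation: each input CID maps only to itself."""
    return {cid}
```

Swapping `same_cid` for another callable models a different operation type without changing the rest of the pipeline.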

Requests are processed in the order received. All requests are kept private; without your unique 64-bit key, nobody else can see what structures you have submitted.

 

  • The ID exchange service has a processing time limit, which is currently 30 minutes.  Any job that exceeds this time limit will be killed.
  • The service accepts at most 500,000 input IDs; any input exceeding this limit is rejected immediately. However, the number of input IDs that can actually be processed within the 30-minute time limit varies greatly with the input and output ID types and the operation type. A job that is too large to finish within the time limit should be broken into smaller jobs.
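Because inputs above 500,000 IDs are rejected outright, and smaller jobs are more likely to finish within 30 minutes, a client might split a large ID list before submitting it. A minimal sketch; the function name `split_into_jobs` and the default batch size of 50,000 are arbitrary assumptions, not values recommended by PubChem.

```python
from typing import List, Sequence

MAX_INPUT_IDS = 500_000  # documented hard limit on input IDs per request


def split_into_jobs(ids: Sequence[str], batch_size: int = 50_000) -> List[List[str]]:
    """Split an ID list into consecutive batches no larger than batch_size.

    The default batch size is an assumption; the size that finishes within
    the time limit depends on the ID types and the operation selected.
    """
    if not 0 < batch_size <= MAX_INPUT_IDS:
        raise ValueError(f"batch_size must be between 1 and {MAX_INPUT_IDS}")
    return [list(ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]
```

If a batch still times out, it can be re-split with a smaller `batch_size`.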

 

Steps for Identifier Exchange

 

1) Select an input format and provide the ID list:

  • Registry IDs: External registry ID list.
  • CIDs: Compound ID list.
  • SIDs: Substance ID list.
  • InChIs: InChI string list.
  • InChIKeys: InChIKey string list.
  • SMILES: SMILES string list.  Both canonical and isomeric SMILES strings are allowed.
  • Synonyms: Synonyms string list.
  • The input list can be provided as text, as a file, or from Entrez history. Registry IDs, SIDs, CIDs, InChIKeys, and SMILES can be separated by white space, commas, tabs, or carriage returns; InChIs and Synonyms, however, should be separated by tabs or carriage returns only. If Registry IDs are provided, the Registry Source Name must also be provided.
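The separator rules above matter because InChI strings contain commas (for example in the hydrogen layer) and synonyms often contain spaces. A minimal sketch of a client-side parser that follows those rules; the function name and the use of the form labels "InChIs" and "Synonyms" as string keys are hypothetical.

```python
import re
from typing import List

# Formats whose entries may themselves contain spaces or commas,
# so only tabs and line breaks act as separators.
LINE_OR_TAB_ONLY = {"InChIs", "Synonyms"}


def parse_id_list(text: str, input_format: str) -> List[str]:
    """Split raw input text into individual identifiers."""
    if input_format in LINE_OR_TAB_ONLY:
        tokens = re.split(r"[\t\r\n]+", text)
    else:
        # Registry IDs, SIDs, CIDs, InChIKeys, SMILES: white space or commas.
        tokens = re.split(r"[\s,]+", text)
    # Drop empty tokens left by leading/trailing separators.
    return [t.strip() for t in tokens if t.strip()]
```

For example, `parse_id_list("2244, 1983\t5793\n", "CIDs")` yields three CIDs, while an InChI containing commas stays intact when the format is "InChIs".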

 

2) Select an operation type:

  • Same CID: Same CIDs as input CIDs.
  • Same, Stereochemistry: CIDs that have the same stereo centers as input CIDs (isotopes can vary).
  • Same, Isotopes: CIDs that have the same isotopes as input CIDs (stereochemistry can vary).
  • Same, Connectivity: CIDs that have the same connectivity as input CIDs (isotopes and stereochemistry can vary).
  • Same parent: CIDs that have the same parents as input CIDs (salt-form, hydrate-form, and charge-state can vary). See here for the definition of a parent compound.
  • Same parent, Stereochemistry: CIDs whose parent compounds have the same stereochemistry as the parent compounds of the input CIDs (salt-form, hydrate-form, charge-state, and isotopes can vary).
  • Same parent, Isotopes: CIDs whose parent compounds have the same isotopes as the parent compounds of the input CIDs (salt-form, hydrate-form, charge-state, and stereochemistry can vary).  
  • Same parent, Connectivity: CIDs whose parent compounds have the same connectivity as the parent compounds of the input CIDs (salt-form, hydrate-form, charge-state, isotopes, and stereochemistry can vary).
  • Similar 2D Compound: CIDs that are structurally similar to input CIDs in terms of 2-D similarity.  See this paper for more details about 2-D similarity evaluation in PubChem.
  • Similar 3D Conformer: CIDs that are structurally similar to input CIDs in terms of 3-D similarity.  See this paper for more details about 3-D similarity evaluation in PubChem.
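To give a feel for the "Same, Connectivity" operation, it can be roughly approximated with standard InChIKeys: the first 14-character block hashes the InChI main layer (formula, connectivity, hydrogens), while stereochemistry and isotopes go into the second block. This is only an illustrative approximation, not how PubChem computes the operation, and the CIDs and InChIKeys in the usage note are hypothetical placeholders.

```python
from typing import Dict, List


def connectivity_block(inchikey: str) -> str:
    """Return the first (main-layer) block of a standard InChIKey."""
    return inchikey.split("-", 1)[0]


def same_connectivity(cid: int, keys_by_cid: Dict[int, str]) -> List[int]:
    """Approximate 'Same, Connectivity': CIDs whose InChIKeys share the first block.

    Stereo and isotope differences live in the second block, so they are ignored.
    """
    target = connectivity_block(keys_by_cid[cid])
    return sorted(c for c, key in keys_by_cid.items() if connectivity_block(key) == target)
```

With hypothetical keys `{1: "AAAAAAAAAAAAAA-BBBBBBBBSA-N", 2: "AAAAAAAAAAAAAA-CCCCCCCCSA-N", 3: "DDDDDDDDDDDDDD-UHFFFAOYSA-N"}`, CIDs 1 and 2 are grouped together while CID 3 stands alone.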

 

3) Select an output type:

  • Registry IDs: External registry ID list. If Registry ID is selected, the DSN (Data Source Name) should also be provided.
  • CIDs: Compound ID list.
  • SIDs: Substance ID list.
  • InChIs: InChI string list.
  • InChIKeys: InChIKey string list.
  • SMILES: Isomeric SMILES string list.
  • Synonyms: Synonyms string list.
  • Titles: List of the titles (usually chemical names) of the compound summary pages.

 

4) Select an output method:

  • Entrez History (SID or CID only): The result is saved into Entrez history. Registry IDs will be converted into SIDs.
  • Single column showing unique results: The result file has the output IDs only. Duplicate IDs are removed.
  • Two column file showing input-output correspondence: The result file includes both input and output IDs.  
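The difference between the two file-based output methods can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, the tab separator is assumed rather than taken from the service's actual file layout, and inputs with no output are simply omitted here (the service's own handling may differ).

```python
from typing import Dict, List, Tuple


def single_column(mapping: Dict[str, List[str]]) -> List[str]:
    """Unique output IDs only, in first-seen order (duplicates removed)."""
    seen = set()
    unique: List[str] = []
    for outputs in mapping.values():
        for out in outputs:
            if out not in seen:
                seen.add(out)
                unique.append(out)
    return unique


def two_column(mapping: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """One (input, output) row per correspondence; duplicates are kept."""
    return [(inp, out) for inp, outputs in mapping.items() for out in outputs]


def to_tsv(rows: List[Tuple[str, str]]) -> str:
    """Render rows as tab-separated lines (separator is an assumption)."""
    return "".join(f"{a}\t{b}\n" for a, b in rows)
```

The single-column form suits feeding results into another tool, while the two-column form preserves which input produced each output.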

 

5) Select a compression method for the result file:

  • None: No compression.
  • Gzip: Gzip (.gz) compression, the default.
  • Bzip2: Bzip2 (.bz2) compression.
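A downloaded result file can be opened according to its extension using Python's standard `gzip` and `bz2` modules. A minimal sketch; the function name is hypothetical, and UTF-8 text encoding is an assumption about the result file.

```python
import bz2
import gzip
from pathlib import Path
from typing import IO


def open_result(path: str) -> IO[str]:
    """Open a result file as text, decompressing .gz or .bz2 transparently."""
    suffix = Path(path).suffix
    if suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    if suffix == ".bz2":
        return bz2.open(path, "rt", encoding="utf-8")
    # "None" compression: a plain text file.
    return open(path, "r", encoding="utf-8")
```

Usage: `with open_result("result.txt.gz") as f: ids = f.read().split()` reads the IDs regardless of which compression method was chosen.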

 

6) Select an action:

  • Submit Job: Begin the computation. You will be taken to a self-refreshing waiting page while your request is queued on NCBI's servers. When the job is done, the URL of the result file is displayed; use this URL to download the results to your computer. The result can also be saved into Entrez history.
  • Save Job: Save the XML specification of this request, mainly for use with PubChem's PUG service. If Entrez history is used for the input IDs, the full input ID list is retrieved from Entrez history and saved, instead of a WebEnv value that may soon expire. If Entrez history is selected as the output method, the single-column file method is used instead.
  • Load Job: Load and submit a saved XML job file.
  • Clear Form: Clear the form.
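Once a submitted job finishes, the result URL displayed on the waiting page can also be fetched from a script. A minimal sketch using only the Python standard library; the function name `download_result` and the fallback file name `idexchange_result` are hypothetical, and the actual result URL must be copied from the page.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def download_result(url: str, dest_dir: str = ".") -> Path:
    """Download the result file at url into dest_dir and return its local path."""
    # Keep the server's file name (including .gz/.bz2) so the compression is evident.
    name = Path(unquote(urlparse(url).path)).name or "idexchange_result"
    dest = Path(dest_dir) / name
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk without loading it all
    return dest
```

Keeping the original file name preserves the extension, which tells you whether the file needs to be decompressed.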