PubChem Score Matrix Service
This document describes how to use the PubChem score matrix web service, which can be found at the following URL: https://pubchem.ncbi.nlm.nih.gov/score_matrix/
The score matrix service computes matrices of similarity scores for PubChem compound database identifiers. This functionality is also available through PubChem's PUG service.
The size of the matrix that may be computed through this service is limited, though any number of requests may be submitted. Requests are processed in the order received. All requests are kept private; without your unique 64-bit key, nobody else can see what identifiers you have submitted.
1) Choose a scoring method from those below; more choices may be added in the future.
- 2D Similarity: Substructure key-based 2D Tanimoto similarity. Scores are in the range [0 .. 100].
- 3D Similarity, shape: 3D similarity, optimized by shape overlap. Scores are in the range [0 .. 100].
- 3D Similarity, feature: 3D similarity, optimized by feature overlap. Scores are in the range [0 .. 100].
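For readers unfamiliar with the Tanimoto measure behind the 2D method, the following is a minimal sketch and not the service's own implementation. It assumes each compound's substructure keys are given as a set of set-bit positions; the function name, the rounding to an integer, and the score of 0 for two empty key sets are assumptions made for illustration.

```python
from __future__ import annotations


def tanimoto_percent(keys_a: set[int], keys_b: set[int]) -> int:
    """Tanimoto coefficient of two key sets, scaled to the range 0..100.

    Each argument is the set of bit positions that are "on" in a
    substructure-key fingerprint (an assumed representation).
    """
    union = len(keys_a | keys_b)
    if union == 0:
        # Assumption: two empty fingerprints are given a score of 0.
        return 0
    shared = len(keys_a & keys_b)
    # Assumption: scores are rounded to the nearest integer.
    return round(100 * shared / union)
```

Two fingerprints that share two of four distinct keys, such as {1, 2, 3} and {2, 3, 4}, score 50 under this sketch, and identical non-empty fingerprints score 100.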
2) When 3D scoring is requested, the results are conformer-specific. One may select a number of conformers per compound to include in the matrix, and scores for each conformer will be computed separately. Note that some compounds have no 3D conformers at all, and others have fewer conformers than requested. By default, if a given CID has no 3D conformer computed for it, but its parent structure has, the parent CID will automatically be substituted in the matrix; check "do not substitute 3D parents" to disable this and return results only for exactly the requested CIDs.
3) Input a list of CIDs in one of the following three formats. If this is the only list supplied (step 4 below is skipped), then these CIDs are scored against themselves; the result may be a diagonal half-matrix, depending on the format.
- String: A simple string containing whitespace- or comma-separated integers.
- File: An uncompressed text file from your local computer, containing lines of whitespace- or comma-separated integers.
- Entrez History: If searches have been done in the PubChem Compound database, then a menu will appear listing the available search results, each with a brief description and count.
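When preparing a CID list for the String or File option, it can help to check locally that the text really contains only whitespace- or comma-separated integers. The sketch below shows one way to do that; the function name parse_cids is hypothetical and is not part of the service.

```python
import re


def parse_cids(text: str) -> list[int]:
    """Split text on commas and/or whitespace and return the CIDs as integers.

    Raises ValueError if any token is not an integer, which flags input
    that does not match the accepted format.
    """
    tokens = re.split(r"[,\s]+", text.strip())
    # Empty tokens can appear for empty input or a leading comma; skip them.
    return [int(token) for token in tokens if token]
```

The same function handles a multi-line file if the whole file content is passed in as a single string.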
4) Optionally input a second list of CIDs, in which case a full matrix will be returned, with list #1 scored against list #2.
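The difference between the one-list and two-list cases can be sketched by enumerating the ID pairs that need a score. This only illustrates the matrix shapes; it assumes the half-matrix includes the diagonal (each ID scored against itself), which the text above does not state, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement, product


def pairs_to_score(list1, list2=None):
    """Return the (id, id) pairs covered by the matrix.

    One list: each unordered pair once, diagonal included (assumption),
    which corresponds to a half-matrix.
    Two lists: every ID in list1 against every ID in list2, a full matrix.
    """
    if list2 is None:
        return list(combinations_with_replacement(list1, 2))
    return list(product(list1, list2))
```

Under these assumptions a single list of n IDs yields n(n+1)/2 pairs, while lists of n and m IDs yield n*m pairs, which is useful when estimating whether a request fits within the service's size limit.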
5) Select a format for the matrix file. The choices are:
- CSV: Comma-separated values, the default.
- Id-Id-Score: A text file where each line contains id - [tab] - id - [tab] - score.
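Once a matrix file in Id-Id-Score format has been downloaded and decompressed, it can be read back with a few lines of code. The sketch below assumes each non-empty line holds exactly three fields separated by single tab characters; identifiers are kept as strings because 3D results are conformer-specific and the identifier syntax is not described here. The function name is hypothetical.

```python
def parse_id_id_score(text: str) -> dict:
    """Parse Id-Id-Score text into a {(id1, id2): score} dictionary.

    Assumes tab-separated lines of the form: id <TAB> id <TAB> score.
    Identifiers stay as strings; scores are converted to float.
    """
    scores = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        id1, id2, score = line.split("\t")
        scores[(id1, id2)] = float(score)
    return scores
```

A line that does not split into exactly three fields raises ValueError, which makes truncated or malformed downloads easy to spot.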
6) Select a compression method for the matrix file. The choices are:
7) Initiate the request:
- Submit Job: Begin the computation. You will be taken to a self-refreshing waiting page while your request is queued on NCBI's servers. When done, the URL of the final matrix file is displayed. Use this URL to download the results to your computer.
- Save Job: Save the XML specification of this request, mainly for use with PubChem's PUG service.