PubChem Entrez Indices and Filters

PubChem index search restricts an Entrez query to a specific field. Type the search term(s) followed by the index field name in square brackets, then click the "Go" button.

 

Usage Examples

 

Search for DTP/NCI's substance record with NSC#78:

On the PubChem homepage or Entrez PubChem Substance search page, enter "DTP/NCI[Sourcename], 78[objectid]" in the search box, and then click the Go button.

Search for all compounds containing gold:

On the PubChem homepage or Entrez PubChem Compound search page, enter "Au[el]", and then click the Go button.

Search for all compounds with a heavy atom count between 10 and 12:

On the PubChem homepage or Entrez PubChem Compound search page enter "10:12[hac]", and then click the Go button.
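The query strings in the examples above can also be assembled programmatically. The following Python sketch builds a field-restricted term and a range term. The helper names (field_term, range_term, esearch_url) are hypothetical, and the NCBI E-utilities ESearch URL is an assumption: this page only describes the web search box, so treat the URL as one possible way to submit a term rather than the documented route.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption; not described on this page).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, field):
    """Return a term restricted to one index field, e.g. 'Au[el]'."""
    return f"{value}[{field}]"


def range_term(low, high, field):
    """Return a range term for an integer/real field, e.g. '10:12[hac]'."""
    return f"{low}:{high}[{field}]"


def esearch_url(db, term):
    """Build a URL-encoded ESearch request for an Entrez database name."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


gold = field_term("Au", "el")          # compounds containing gold
heavy = range_term(10, 12, "hac")      # heavy atom count between 10 and 12
print(gold)
print(heavy)
print(esearch_url("pccompound", heavy))
```

The sketch only builds strings and performs no network request, so it can be run offline to check a query before pasting it into the search box.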

 

The following fields can be searched within the Entrez PubChem databases. Field aliases are given in square brackets; when a field has several aliases, use whichever one is easiest to remember. Integer and real-number fields support range searches using the lower:upper syntax shown above. Some indices and filters were removed in October 2016.

 

 

PubChem Compound Indices and Filters

 

All [ALL]: All of the following fields are searched. If a string query is presented without a field alias, by default, [ALL] is searched.
Uid [UID]: The integer CID of a record in the Pccompound database. By default, an integer without a field alias is treated as a UID. Same as [CID].
Filter [Filter]: Limits the records. A number of filters are available to restrict the search to compounds with particular information. The specialized Filters in this database are:

  • has_3d_conformer: records that have 3D conformers
  • has_dailymed: records with associated DailyMed information
  • has_mesh: records with associated MeSH terms
  • has_pharm: records with associated pharmacological actions
  • has_parent: records that have a parent structure
  • has_patent: records with associated patent information
  • has_no_parent: records that do not have a parent structure
  • has_src_nih_mlp: records generated from the NIH Molecular Libraries Program
  • has_src_vendor: records with vendor information
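Filters are written like any other field term, with [Filter] as the field name. The Python sketch below builds filter terms and joins them with the general Entrez AND operator. The helper names are hypothetical, and combining terms with AND is an assumption drawn from standard Entrez query syntax rather than from this page.

```python
def filter_term(name):
    """Return a filter restriction, e.g. 'has_mesh[Filter]'."""
    return f"{name}[Filter]"


def all_of(*terms):
    """Join terms with the Entrez AND operator (assumed, see lead-in)."""
    return " AND ".join(terms)


# Gold-containing compounds that have both MeSH terms and patent information.
query = all_of(filter_term("has_mesh"), filter_term("has_patent"), "Au[el]")
print(query)
```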

ActiveAidCount [AC, ACNT]: Number of BioAssays in which a compound is active, integer. Use it to find compounds that are active in a given number of assays.
ActiveAidRatio [AAR]: A value between 0 and 1, equal to the number of BioAssays in which the compound tested active divided by the number of BioAssays in which the compound was tested with any result.
AtomChiralCount [ACC, ACCNT]: Total count of chiral atoms in a given compound, integer.
AtomChiralDefCount [ACDC, ACDCNT]: Total count of defined chiral atoms in a given compound, integer.
AtomChiralUndefCount [ACUC, ACUCNT]: Total count of undefined chiral atoms in a given compound, integer.
BondChiralCount [BCC, BCCNT]: Total count of chiral bonds in a given compound, integer.
BondChiralDefCount [BCDC, BCDCNT]: Total count of defined chiral bonds in a given compound, integer.
BondChiralUndefCount [BCUC, BCUCNT]: Total count of undefined chiral bonds in a given compound, integer.
CompleteSynonym [CSYN, CSYNO]: Synonyms of the compound, drawn from all substances related to it.
Complexity [CPLX]: Compound complexity.
CompoundID [CID]: Compound ID. Same as [UID].
CovalentUnitCount [CUC, CUCNT]: Number of covalently bonded units in a compound, integer.
CreateDate: Date this compound was created in PubChem.
Element [ELMT, EL]: Chemical element in a compound.
ExactMass [EMAS, EXMASS]: The calculated mass of an ion or molecule with the most likely isotopic composition for a single random molecule, corresponding to the most intense ion/molecule peak in a mass spectrum. A real number.
HeavyAtomCount [HAC, HACNT]: Number of non-hydrogen atoms in a compound, integer.
HydrogenBondAcceptorCount [HBAC, HBACNT]: Number of hydrogen bond acceptors in a compound, integer.
HydrogenBondDonorCount [HBDC, HBDCNT]: Number of hydrogen bond donors in a compound, integer.
InChI [INCH, INCHI]: Standard IUPAC International Chemical Identifier.
InChIKey [INCHIKEY]: Standard IUPAC International Chemical Identifier Key.  

 

InChI string and InChIKey can be searched through the Entrez PubChem databases.

For instance, to search with the InChIKey of aspirin, type or paste "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"[InChIKey] into the PubChem Compound, PubChem Substance, or Entrez Global search box, then click the Go button.

Note:
     The quote marks and the square brackets are required.
     The 'InChI=' prefix is required when searching with an InChI string.
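The quoting rules in the note are easy to get wrong when queries are built by hand. The following Python sketch wraps an identifier in the required quote marks and brackets. The helper names are hypothetical, and the format check is an added safeguard based on the standard InChIKey layout (14, 10, and 1 uppercase letters separated by hyphens), not something this page specifies.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def inchikey_term(key):
    """Return a quoted InChIKey term such as '"..."[InChIKey]'."""
    if not INCHIKEY_RE.fullmatch(key):
        raise ValueError(f"not a standard InChIKey: {key!r}")
    return f'"{key}"[InChIKey]'


def inchi_term(inchi):
    """Return a quoted InChI term; the 'InChI=' prefix must be present."""
    if not inchi.startswith("InChI="):
        raise ValueError("InChI string must start with 'InChI='")
    return f'"{inchi}"[InChI]'


# The aspirin InChIKey from the example above.
print(inchikey_term("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```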

IsotopeAtomCount [IAC, IACNT]: Number of atoms in a compound with a specified isotope, integer.
IUPACName [UPAC, IUPAC]: Standard IUPAC name for compound.
MeSHTerm [MSHT, MESHT]: Medical Subject Heading term. Note that MeSH entry terms (synonyms for the Medical Subject Heading term) are also indexed.
MolecularWeight [MW, MWT, MOLWT]: Mass of a molecule calculated using the average mass of each element, weighted by its natural isotopic abundance. For example, carbon has two natural isotopes, 12C and 13C, with relative abundances of 98.9% and 1.1%, which yields an average mass of 12.011 g/mol. A real number.
MonoisotopicMass [MMAS, MIMASS]: Mass of a molecule calculated using the mass of the most abundant isotope of each element. For example, carbon has a monoisotopic mass of 12.000 g/mol. A real number.
PharmAction [PHMA, PHARMA]: MeSH pharmacological actions.
RotatableBondCount [RBC, RBCNT]: Number of rotatable bonds in a compound, integer.
SourceName [SRC, SRCNAM, SRCNAME]: Depositor name officially recorded in PubChem databases. See current list of PubChem data sources.
SourceCategory [SRCC, SRCCAT, SRCCATG]: Depositor categories.
SubstanceID [SID]: Substance identifier, integer.
Synonym [SYNO]: Synonyms for substance.
TotalAidCount [TAC]: Number of assays in which a compound was tested, regardless of outcome (active, inactive, inconclusive, or unspecified).
TotalFormalCharge [TFC, CHG, CHRG]: Total formal charge.
TPSA [TPSA]: Topological Polar Surface Area.
XLogP [XLGP, LOGP]: Computed octanol-water partition coefficient.
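The difference between MolecularWeight and MonoisotopicMass in the list above can be checked with the carbon example. The Python sketch below uses the rounded abundances quoted in the MolecularWeight entry. The 13C isotope mass (about 13.00335 u) is an approximate literature value supplied here for illustration and does not appear on this page.

```python
# (isotope mass in u, fractional natural abundance) for carbon.
# Abundances are the rounded values quoted in the MolecularWeight entry;
# the 13C mass is an approximate value assumed for illustration.
CARBON_ISOTOPES = [(12.000, 0.989), (13.00335, 0.011)]


def average_mass(isotopes):
    """Abundance-weighted mean of the isotope masses (MolecularWeight basis)."""
    return sum(mass * abundance for mass, abundance in isotopes)


def monoisotopic_mass(isotopes):
    """Mass of the most abundant isotope (MonoisotopicMass basis)."""
    return max(isotopes, key=lambda pair: pair[1])[0]


print(round(average_mass(CARBON_ISOTOPES), 3))   # 12.011
print(monoisotopic_mass(CARBON_ISOTOPES))        # 12.0
```

For a whole molecule, the same two calculations are summed over every atom in the formula.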

 

 

PubChem Substance Indices and Filters

 

All [ALL]: All of the following fields are searched. If a string query is presented without a field alias, by default, [ALL] is searched.
Uid [UID]: The integer SID of a record in the Pcsubstance database. By default, an integer without a field alias is treated as a UID. Same as [SID].
Filter [Filter]: Limits the records. A number of filters are available to restrict the search to substances with particular information. The specialized Filters in this database are:

  • has_autogen_on: records where structure is to be generated from synonyms
  • has_autogen_success: records where structure successfully generated from synonyms
  • has_deposited_3d: records with associated computational 3D info
  • has_deposited_3d_experimental: records with associated experimental 3D info
  • has_patent: records with associated patent info
  • has_src_nih_mlp: records generated from the NIH Molecular Libraries Program
  • has_src_vendor: records with vendor information
  • hasnohold: records that are not on hold
  • hasonhold: records that are on hold

AssaySourceName [ASRC, ASRCNAM, ASRCNAME]: Filters by assay source name. See current list of PubChem data sources.
Comment [CMT]: Substance or BioAssay comment.
CompleteSynonym [CSYN, CSYNO]: Synonyms of the compound, drawn from all substances related to it.
ComponentCID [CCID]: Component compound identifier.
CompoundID [CID]: Compound identifier, integer.
DepositDate [DDAT, DEPDAT]: Deposition timestamp for a substance.
ModifyDate: Date this substance record was last modified.
SourceCategory [SRCC, SRCCAT, SRCCATG]: Depositor categories.
SourceID [SRID, SRCID]: Depositor's external ID.
SourceName [SRC, SRCNAM, SRCNAME]: Depositor name officially recorded in PubChem databases. See current list of PubChem data sources.
SourceReleaseDate [SRD, SRDAT, RLSDAT]
StandardizedCID [SCID]: Standardized compound identifier, integer.
SubstanceID [SID]: Substance ID. Same as [UID].
Synonym [SYNO]: Synonyms for substance.
TotalAidCount [TAC]: Number of assays in which a substance was tested, regardless of outcome.

 

 

PubChem BioAssay Indices and Filters

 

All [ALL]: All of the following fields are searched. If a string query is presented without a field alias, by default, [ALL] is searched.
Uid [UID]: The integer AID of a record in the Pcassay database. By default, an integer without a field alias is treated as a UID.
Filter [Filter]: Limits the records. A number of filters are available, to retrieve records in the same or other databases that the current BioAssay records are cross-referenced to.

  • mlp: assay records contributed by the Molecular Libraries Program (MLP). Note that MLP includes both the earlier-phase Molecular Libraries Screening Centers Network (MLSCN) and the current-phase Molecular Libraries Probe Production Centers Network (MLPCN).
  • all or pcassay_all: all assays.
  • active_concentration: assay records with 'active concentration' attribute provided.
  • screening: assays in the 'Screening' activity outcome method category.
  • confirmatory: assays in the 'Confirmatory' activity outcome method category.
  • summary: assays in the 'Summary' activity outcome method category.
  • pcassay_biosystems_active: assay records with BioSystems link via active compounds.
  • pcassay_biosystems_target: assay records with BioSystems link via protein target.
  • pcassay_gene: assay records with gene information provided.
  • pcassay_nuccore: assay records with nucleotide link provided.
  • pcassay_nuccore_rna_target: assay records with RNA target provided.
  • pcassay_nucleotide: assay records with nucleotide link provided.
  • pcassay_nucleotide_rna_target: assay records with RNA target provided.
  • pcassay_omim: assay records with OMIM link provided.
  • pcassay_pathway: assay records with pathway link provided.
  • pcassay_pcassay: another filter for all assays.
  • pcassay_pcassay_active: assays that contain active results.
  • pcassay_pcassay_activityneighbor: assay records with related BioAssays based on activity overlap.
  • pcassay_pcassay_neighbor: assay records with related BioAssays provided by PubChem depositors.
  • pcassay_pcassay_neighbor_summary: assay records with a summary of related BioAssays provided by PubChem depositors.
  • pcassay_pcassay_targetneighbor: assay records with related BioAssays based on target similarity.
  • pcassay_protein_target: assay records with protein target provided.
  • pcassay_protein_target_pig: assay records with protein targets that are similar to PIG proteins.
  • pcassay_pubmed: assay records with PubMed link provided.
  • pcassay_taxonomy: assay records with taxonomy link provided.
  • pcassay_structure: assay records with protein structure link provided.
  • pcassay_pmc: assay records with PMC link provided.
  • pcassay_pccompound: assay records with PubChem compound link provided.
  • pcassay_pccompound_active: assay records with active PubChem compound link provided.
  • pcassay_pccompound_inactive: assay records with inactive PubChem compound link provided.
  • pcassay_pccompound_inconclusive: assay records with inconclusive PubChem compound link provided.
  • pcassay_pccompound_probe: assay records with chemical probe PubChem compound link provided.
  • pcassay_pcsubstance: assay records with PubChem substance link provided.
  • pcassay_pcsubstance_active: assay records with active PubChem substance link provided.
  • pcassay_pcsubstance_inactive: assay records with inactive PubChem substance link provided.
  • pcassay_pcsubstance_inconclusive: assay records with inconclusive PubChem substance link provided.
  • pcassay_pcsubstance_probe: assay records with chemical probe PubChem substance link provided.
  • Rnai: assay records containing screening data for RNAi.
  • Small_molecule: assay records containing screening data for chemicals.

ActiveSidCount [AC, ACNT]: Number of substances (identified by SID, the substance identifier from Pcsubstance) that are considered active in a BioAssay.
Activity Outcome Method [ACMD]: Description of how the activity outcome is determined. Valid query values are:

  • screening: 'Screening' assays (Single Concentration Activity Observed). The activity outcome was defined based on the percentage of inhibition from a test at a single dose.
  • confirmatory: 'Confirmatory' assays (Concentration-Response Relationship Observed). The activity outcome was defined based on values such as EC50/IC50, derived from dose-response curves following tests at multiple concentrations.
  • summary: 'Summary' assays (Candidate Probes/Leads with Supporting Evidence). An assay that summarizes information from multiple assays.
  • other: assays in the 'Other' category. An assay that does not fall into the above categories.

AssayComment [ACMT, ACMMNT]: Comment for a BioAssay provided by depositor.
AssayDescription [ADES, ADESC, ADSC]: Description for the BioAssay provided by depositor.
AssayName [ANAM, ANAME]: Name of a BioAssay provided by depositor.
AssayProtocol [APRL, APRTL]: Protocol for a BioAssay provided by depositor.
AssaySourceID [ASRD, ASRID]: External assay source identifier.
DepositDate [DDAT, DDATE]: Date when a BioAssay record was deposited into PubChem. Date format is yyyy/mm/dd. mm and dd are optional.
GrantNumber [GRN, GRNUM]: NIH Grant Numbers.
ModifyDate [MDAT, MDATE]: Date when the content of a BioAssay record was last modified. Date format is yyyy/mm/dd. mm and dd are optional.
NucleicAcidReagentID [NARD, NARID]: NCBI Probe Database identifiers (ProbeDB IDs) referenced by a BioAssay.
PigGI [PIGI, PIGGI]: Identical sequence NCBI Protein GI number similar to a BioAssay target.
ProbeCidCount [ACC, ACCNT]: Number of unique chemicals (identified by CID, the compound identifier from Pccompound) that are considered probes in a BioAssay.
ProteinTargetGI [PTGI]: NCBI Protein GI number of a BioAssay protein target.
ProteinTargetName [PTN]: NCBI Protein name of a BioAssay protein target.
RNATargetGI [NARD]: NCBI Nucleotide GI number of a BioAssay nucleotide target.
ReleaseDate [RDAT, RDATE]: Date when BioAssay data was released to the public by PubChem. Date format is yyyy/mm/dd. mm and dd are optional.
SourceCategory [SRCC, SRCCAT, SRCCATG]: Category of BioAssay data source.
SourceName [SNME, SNAME]: Source name of a BioAssay data specified by depositor.
SynonymTested [SYNT]: MeSH names and synonyms that are associated with any chemical structure tested in a BioAssay.
TaxonomyName [TXNM, TXNAM, TXNAME]: NCBI Entrez Taxonomy name.
TotalSidCount [TSC]: Total number of substances tested in a BioAssay.
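The date fields above (DepositDate, ModifyDate, ReleaseDate) accept yyyy/mm/dd with the month and day optional. The Python sketch below formats such a term at whatever precision is supplied. The helper name date_term is hypothetical, and zero-padding the month and day to two digits is an assumption based on the yyyy/mm/dd pattern.

```python
def date_term(field, year, month=None, day=None):
    """Return a date term such as '2010/03[MDAT]'; month and day are optional."""
    if day is not None and month is None:
        raise ValueError("a day can only be given together with a month")
    parts = [f"{year:04d}"]
    if month is not None:
        parts.append(f"{month:02d}")
    if day is not None:
        parts.append(f"{day:02d}")
    return f"{'/'.join(parts)}[{field}]"


print(date_term("DDAT", 2010))          # year only
print(date_term("MDAT", 2010, 3))       # year and month
print(date_term("RDAT", 2010, 3, 7))    # full date
```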
