PubChemRDF 1.7β has been released


Posted on October 19, 2020


A significant update has been made to PubChemRDF, machine-readable PubChem data formatted using the Resource Description Framework (RDF).  (If you have never heard about PubChemRDF before, please read this PubChem blog first.)


What is PubChemRDF?

RDF is a World Wide Web Consortium (W3C) standard model for data interchange on the web.  In RDF, knowledge is expressed as statements, each of which consists of three discrete parts: a subject, an object, and a predicate that specifies the relationship between them.  Together, these three parts are called a triple.  For example, the sentence “asbestos can cause mesothelioma” consists of “asbestos” (subject), “mesothelioma” (object) and “can cause” (predicate).  Similarly, the sentence “ethanol is metabolized to acetaldehyde” can be broken down into a triple of “ethanol” (subject), “acetaldehyde” (object) and “is metabolized to” (predicate).  In essence, RDF expresses knowledge as a directed, labeled graph, in which subjects and objects are nodes and predicates are the labeled edges that connect them.
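The two example sentences above can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: the `Triple` class and the plain-text labels are hypothetical stand-ins, whereas real RDF data (including PubChemRDF) identifies subjects and predicates with IRIs rather than free-text strings.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (hypothetical helper for illustration)."""
    subject: str
    predicate: str
    obj: str


# The two example sentences from the text, written as triples.
triples = [
    Triple("asbestos", "can cause", "mesothelioma"),
    Triple("ethanol", "is metabolized to", "acetaldehyde"),
]

# A directed, labeled graph: each subject node maps to its outgoing
# edges, stored as (predicate label, object node) pairs.
graph = defaultdict(list)
for t in triples:
    graph[t.subject].append((t.predicate, t.obj))

# Walk every edge from subject to object.
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --[{predicate}]--> {obj}")
```

Each triple becomes one edge that points from the subject to the object and carries the predicate as its label, which is why a collection of triples can be read as a graph.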


PubChemRDF refers to the RDF-formatted PubChem data.  It contains information on various entities in PubChem (chemicals, bioassays, genes, proteins, pathways, literature, etc.) and their relationships.  With PubChemRDF, researchers can work with PubChem data using Semantic Web technologies.  In addition, PubChemRDF facilitates PubChem data sharing, analysis, and integration with data from other resources.
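To give a rough sense of what working with triple-formatted data looks like, the sketch below filters a small in-memory list of triples by a pattern in which any part may be left unspecified.  Everything here is an assumption made for illustration: the `match` function, the toy data, and the plain-text labels are hypothetical, and real PubChemRDF data uses IRIs and is normally queried with dedicated RDF tools rather than hand-written loops.

```python
from typing import Iterable, Iterator, Optional, Tuple

TripleT = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy data; not actual PubChemRDF content.
TRIPLES = [
    ("asbestos", "can cause", "mesothelioma"),
    ("ethanol", "is metabolized to", "acetaldehyde"),
    ("ethanol", "is a", "chemical"),
]


def match(
    triples: Iterable[TripleT],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[TripleT]:
    """Yield triples that agree with every part of the pattern that is not None."""
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        yield (s, p, o)


# Everything stated about ethanol.
for triple in match(TRIPLES, subject="ethanol"):
    print(triple)

# Everything that "can cause" something.
for triple in match(TRIPLES, predicate="can cause"):
    print(triple)
```

Leaving a part of the pattern as `None` acts as a wildcard, so the same function answers both "what is known about this entity?" and "which entities are linked by this relationship?".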



What’s new in PubChemRDF 1.7β?

  • Updated vocabularies
    To define the semantic relationships (that is, predicates) between entities (subjects and objects), PubChemRDF uses pre-existing, domain-specific ontological frameworks (rather than creating new ones), such as Chemical Entities of Biological Interest (ChEBI), the CHEMical INFormation ontology (CHEMINF), the Protein Ontology (PRO), the Gene Ontology (GO), and the BioAssay Ontology (BAO).  Since PubChemRDF was first introduced, some terms in these ontologies have been deprecated or replaced with new ones.  These changes are now reflected in PubChemRDF 1.7β.
  • New subdomain
    PubChemRDF 1.7β adds a new subdomain, called Pathway, to encode information on biological pathways and their relationships with genes, proteins, and chemicals.  This Pathway subdomain supersedes the BioSystem subdomain used in previous versions of PubChemRDF.
  • GI to accession
    In previous versions, numeric identifiers called GI numbers were used to denote proteins or genes.  However, NCBI has phased out the use of GI numbers in its databases, as explained in a series of blog posts.  Accordingly, PubChemRDF data can now be accessed using ‘accession’ identifiers instead.

Where can I learn more about PubChemRDF 1.7β?

To learn more about this topic, please read the following:

