Significant Update to PubChemRDF!
Posted on June 23, 2015
PubChemRDF 1.5β is now available. The new version is faster, supports linked data in new formats, features improved search and query functions, and contains new links.
PubChemRDF expresses data in Resource Description Framework (RDF) format using ontological frameworks and semantic web technologies. It facilitates data sharing and analysis, and integrates with other National Center for Biotechnology Information (NCBI) resources along with external resources across scientific domains. To learn more about this project, please see our earlier blog post and the PubChemRDF release notes.
The 1.5β release contains a number of new features and technological improvements including:
- Improved speed
PubChemRDF data is now served from a triple store, which gives a noticeable speed improvement, especially for records with large amounts of data. Previously, RDF was generated on the fly from data stored in disparate data systems.
- Addition of MeSH
Major improvements were made to the reference subdomain. Most notable is the addition of Medical Subject Headings (MeSH) annotation of PubMed records. This includes MeSH topical descriptors (with optional qualifiers) that indicate the subject of an article and MeSH supplementary concepts that indicate things like chemicals and diseases discussed in an article.
- Direct links to authoritative RDF resources
PubChemRDF now enhances cross-integration by providing direct links to available authoritative RDF resources within applicable subdomains, including: reference, synonym, and inchikey to MeSH RDF; protein to UniProt RDF; protein and substance to PDB RDF; biosystem to Reactome RDF; substance to ChEMBL RDF; and compound to Wikidata RDF. For example, the links to PDB RDF help to distinguish proteins and associated chemical substances found in a Protein Data Bank (PDB) crystal structure.
- Addition of ‘concept’ subdomain
A new ‘concept’ subdomain provides the means to annotate PubChemRDF subdomains. For example, annotation between nodes within the concept subdomain allows a hierarchy of concepts to be created, such as the one in the WHO ATC classification. These concepts can then be applied to other subdomains: for instance, a link from a chemical substance synonym to a WHO ATC class indicates the therapeutic and pharmacological properties of that substance.
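To make the idea of a concept hierarchy more concrete, here is a minimal Python sketch that derives the parent chain of a WHO ATC code. It relies only on the fact that ATC codes encode their five levels by length (1, 3, 4, 5, and 7 characters). The predicate name `skos:broader` and the bare-code node names are assumptions made for illustration; they are not taken from the PubChemRDF schema.

```python
# ATC codes carry their own hierarchy: the five levels end after
# 1, 3, 4, 5 and 7 characters respectively.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return the chain of ATC codes from the top level down to `code`."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


def broader_triples(code, predicate="skos:broader"):
    """Build (child, predicate, parent) triples linking each level to its parent.

    The predicate name is an assumption for illustration only.
    """
    chain = atc_ancestors(code)
    return [(child, predicate, parent) for parent, child in zip(chain, chain[1:])]


if __name__ == "__main__":
    # N02BA01 is the ATC code for acetylsalicylic acid as an analgesic.
    for triple in broader_triples("N02BA01"):
        print(triple)
```

Walking these triples upward from a code linked to a synonym would give every broader therapeutic class that applies to it.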
- New links added between the compound and biosystem subdomains
Previously, the biosystem subdomain linked only to the protein subdomain. The added links between the compound and biosystem subdomains help to indicate the chemical structure involved in a given pathway.
- Support for protein complexes
Protein complex targets are now distinguished within the bioassay subdomain and are linked to the component protein units.
- Linked Data using JSON
- Substring searches
The PubChemRDF REST interface now provides a substring search. For example, a query can return the chemical substance synonyms that contain the string “aspirin”.
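The request URL for this search is not reproduced here, so the following Python sketch only illustrates what a substring search over synonym values means, using a small in-memory dictionary. The identifiers (`synonym:example_1` and so on) are hypothetical placeholders, and the case-insensitive matching is a choice made for this sketch rather than documented behavior of the REST interface.

```python
def substring_search(synonyms, needle):
    """Return (identifier, value) pairs whose value contains `needle`.

    Matching ignores case; that is an assumption of this sketch.
    """
    needle = needle.casefold()
    return [
        (identifier, value)
        for identifier, value in synonyms.items()
        if needle in value.casefold()
    ]


# Hypothetical identifiers paired with synonym strings, for illustration only.
SYNONYMS = {
    "synonym:example_1": "Aspirin",
    "synonym:example_2": "acetylsalicylic acid",
    "synonym:example_3": "Aspirin sodium",
}

if __name__ == "__main__":
    for identifier, value in substring_search(SYNONYMS, "aspirin"):
        print(identifier, value)
```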
- Simple SPARQL-like query functions
The PubChemRDF REST interface provides simple SPARQL-like query capabilities for grouping and filtering relevant resources. For instance, a query can retrieve the ChEBI class assignments for PubChem substances.
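As a rough illustration of what filtering and grouping mean for RDF triples, the Python sketch below applies both steps to a handful of in-memory triples. Everything in the sample data is a hypothetical placeholder, and the use of `rdf:type` to attach a ChEBI class to a substance is an assumption of this sketch. It does not call the REST interface or reproduce its query syntax.

```python
from collections import defaultdict


def filter_triples(triples, predicate=None, object_prefix=None):
    """Keep triples matching the predicate and whose object starts with a prefix."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (predicate is None or p == predicate)
        and (object_prefix is None or o.startswith(object_prefix))
    ]


def group_by_subject(triples):
    """Group object values under their subject, preserving input order."""
    grouped = defaultdict(list)
    for s, _p, o in triples:
        grouped[s].append(o)
    return dict(grouped)


# Hypothetical placeholder triples, for illustration only.
TRIPLES = [
    ("substance:example_A", "rdf:type", "obo:CHEBI_example_1"),
    ("substance:example_A", "rdf:type", "obo:CHEBI_example_2"),
    ("substance:example_A", "dcterms:source", "source:example"),
    ("substance:example_B", "rdf:type", "obo:CHEBI_example_3"),
    ("substance:example_B", "rdf:type", "other:Class"),
]

if __name__ == "__main__":
    chebi_types = filter_triples(TRIPLES, predicate="rdf:type", object_prefix="obo:CHEBI_")
    for substance, classes in group_by_subject(chebi_types).items():
        print(substance, classes)
```

Filtering first and grouping second mirrors the shape of a SPARQL query that selects on a predicate and object pattern and then collects results per subject.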
To read more on this topic, please consider exploring these links:
- PubChemRDF Release Notes
- PubChemRDF Initial release announcement