References to External Databases
Compounds and reactions in enviPath can be automatically linked to external databases. At the compound level, we've integrated links to PubChem, ChEBI, and KEGG. Reactions, on the other hand, can now be linked directly to Rhea, a curated database of biochemical reactions.

This functionality is available on the package, pathway, compound, compound structure and reaction pages: select “Update References” from the “Actions” dropdown. This triggers an automatic workflow that links the compounds and reactions to the external databases. On the Package page, “Update References” updates all compounds and reactions in the package. On the Pathway page, only the compounds and reactions within that pathway are updated. The compound, compound structure and reaction pages offer the finest granularity.

To optimize resource usage in enviPath, references can be updated only once per compound or reaction. If you would like new references to be added, please contact us.
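The scope rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not enviPath's implementation: the `Entity`, `Pathway` and `Package` classes and the `update_references` function are hypothetical, and the actual database lookups are omitted. Because a package and its pathways share the same entity objects, the once-only rule carries over between scopes.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A compound or reaction; `references_updated` enforces the once-only rule."""
    name: str
    references_updated: bool = False


@dataclass
class Pathway:
    name: str
    compounds: list[Entity] = field(default_factory=list)
    reactions: list[Entity] = field(default_factory=list)


@dataclass
class Package:
    name: str
    compounds: list[Entity] = field(default_factory=list)
    reactions: list[Entity] = field(default_factory=list)
    pathways: list[Pathway] = field(default_factory=list)


def targets(obj) -> list[Entity]:
    """Compounds and reactions touched by "Update References" on `obj`."""
    if isinstance(obj, (Package, Pathway)):
        return obj.compounds + obj.reactions
    return [obj]  # a single compound, compound structure or reaction


def update_references(obj) -> list[str]:
    """Update each target at most once; return the names actually updated."""
    updated = []
    for entity in targets(obj):
        if entity.references_updated:
            continue  # references can be updated only once
        # ... the PubChem/ChEBI/KEGG or Rhea lookups would run here ...
        entity.references_updated = True
        updated.append(entity.name)
    return updated
```

In this sketch, updating a pathway first and then its package only touches the package entries that were not already updated.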
Algorithm logic
Updating references of a compound
- Check whether the compound is labelled, e.g. a 14C-labelled compound. If so, no external references are searched.
- Try to find a match¹ on PubChem using the following InChIKeys:
  a. Use the isomeric SMILES to calculate the InChIKey (mild canonicalization).
  b. Neutralise the molecule and calculate its InChIKey (charge removal).
  c. Canonicalize the SMILES, removing stereochemical information, and calculate its InChIKey (strong canonicalization).
- Search PubChem using the SMILES strings obtained in steps a, b and c.
- If any PubChem identifiers were obtained from the previous steps, the synonyms of each identifier are retrieved, and the corresponding ChEBI and KEGG identifiers are extracted from them by pattern matching.
- We expand the set of ChEBI identifiers using ChEBI's REST API library, libChEBIj, fetching compounds that are conjugate acids/bases, enantiomers or tautomers of the identifiers already found.
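The compound workflow can be sketched in Python as below. This is a hedged illustration, not enviPath's code, and it rests on several assumptions: labels are detected as bracket atoms carrying a mass number in the SMILES (e.g. `[14C]`), ChEBI IDs are matched as `CHEBI:<digits>`, KEGG identifiers are assumed to be KEGG COMPOUND IDs (`C` plus five digits), and `lookup` and `relations` are hypothetical stand-ins for the PubChem requests and the libChEBIj queries. Computing the three InChIKey variants requires a cheminformatics toolkit and is not shown.

```python
import re
from typing import Callable, Iterable

# Bracket atom starting with a mass number, e.g. [14C], [13CH3], [2H]
ISOTOPE_ATOM = re.compile(r"\[\d+[A-Za-z]")
CHEBI_ID = re.compile(r"CHEBI:\d+")
KEGG_COMPOUND_ID = re.compile(r"C\d{5}")  # assumes KEGG COMPOUND IDs only
# ChEBI ontology relations used to widen the set of ChEBI IDs
EXPANSION_RELATIONS = {
    "is conjugate base of",
    "is conjugate acid of",
    "is enantiomer of",
    "is tautomer of",
}


def is_labelled(smiles: str) -> bool:
    """True if the SMILES carries an explicit isotope label."""
    return ISOTOPE_ATOM.search(smiles) is not None


def collect_cids(
    inchikeys: Iterable[str],
    smiles_variants: Iterable[str],
    lookup: Callable[[str, str], list[int]],
) -> set[int]:
    """Union of PubChem CIDs found for every InChIKey and SMILES variant.

    `lookup(namespace, identifier)` is a hypothetical stand-in for the
    PubChem request; it returns the CIDs found (possibly none).
    """
    cids: set[int] = set()
    for key in inchikeys:
        cids.update(lookup("inchikey", key))
    for smiles in smiles_variants:
        cids.update(lookup("smiles", smiles))
    return cids


def extract_xrefs(synonyms: Iterable[str]) -> tuple[set[str], set[str]]:
    """Pick ChEBI and KEGG compound IDs out of PubChem synonyms."""
    synonyms = list(synonyms)
    chebi = {s for s in synonyms if CHEBI_ID.fullmatch(s)}
    kegg = {s for s in synonyms if KEGG_COMPOUND_ID.fullmatch(s)}
    return chebi, kegg


def expand_chebi(
    chebi_ids: set[str],
    relations: dict[str, list[tuple[str, str]]],
) -> set[str]:
    """Add direct conjugate acid/base, enantiomer and tautomer neighbours.

    `relations` is a hypothetical stand-in for the libChEBIj lookup: it maps
    a ChEBI ID to (relation name, related ChEBI ID) pairs.
    """
    expanded = set(chebi_ids)
    for chebi_id in chebi_ids:
        for relation, other in relations.get(chebi_id, []):
            if relation in EXPANSION_RELATIONS:
                expanded.add(other)
    return expanded
```

Collecting CIDs from all variants, rather than stopping at the first hit, mirrors the description that synonyms are extracted from any PubChem identifier obtained.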
Updating references of a reaction
- For each substrate and product, check whether its references have already been updated. If not, update them.
- Check whether all substrates and products have a ChEBI identifier. If so, use these identifiers to query the Rhea REST API for matching Rhea identifiers.
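The two reaction steps can be sketched in Python as follows. `Participant`, `prepare_rhea_lookup` and the `update_compound` hook are hypothetical names, and the Rhea query (distinct ChEBI IDs joined with `AND`, plus `columns` and `format` parameters) is an assumption about how such a search could be phrased, not necessarily the request enviPath sends.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlencode

RHEA_SEARCH = "https://www.rhea-db.org/rhea"


@dataclass
class Participant:
    """A substrate or product of a reaction."""
    name: str
    chebi: Optional[str] = None
    references_updated: bool = False


def prepare_rhea_lookup(
    participants: list[Participant],
    update_compound: Callable[[Participant], None],
) -> Optional[str]:
    """Return a Rhea search URL, or None if a participant lacks a ChEBI ID."""
    # Step 1: run the compound workflow on participants not yet updated
    # (update_compound is a hypothetical hook for that workflow)
    for p in participants:
        if not p.references_updated:
            update_compound(p)
            p.references_updated = True
    # Step 2: only query Rhea when every participant has a ChEBI ID
    ids = [p.chebi for p in participants]
    if not ids or any(i is None for i in ids):
        return None
    # Assumed query form: the distinct ChEBI IDs combined with AND
    query = " AND ".join(sorted(set(ids)))
    params = {"query": query, "columns": "rhea-id,equation", "format": "tsv"}
    return f"{RHEA_SEARCH}?{urlencode(params)}"
```

Returning `None` instead of issuing a partial query keeps the sketch faithful to the rule that Rhea is consulted only when every substrate and product is mapped to ChEBI.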
1. We find matches by sending a request to PubChem’s REST API (PUG REST) in the compound domain, using “inchikey” as the namespace for InChIKeys and “smiles” as the namespace for SMILES.
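The request URLs described in the footnote can be built as in this Python sketch. The layout `compound/<namespace>/<identifier>/cids/JSON` and the synonyms endpoint follow PubChem's PUG REST conventions; the helper names `cid_lookup_url` and `synonyms_url` are hypothetical. Since SMILES strings may contain `/`, `#` or `=`, the identifier is percent-encoded; PUG REST also accepts the identifier in an HTTP POST body, which is the safer choice for unusual SMILES.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(namespace: str, identifier: str) -> str:
    """PUG REST URL returning the CIDs that match an InChIKey or SMILES."""
    if namespace not in {"inchikey", "smiles"}:
        raise ValueError(f"unsupported namespace: {namespace}")
    # Percent-encode everything, since SMILES may contain '/', '#' or '='
    encoded = quote(identifier, safe="")
    return f"{PUG_REST}/compound/{namespace}/{encoded}/cids/JSON"


def synonyms_url(cid: int) -> str:
    """PUG REST URL returning all synonyms recorded for a CID."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"
```

For example, the SMILES `C=CC#N` is encoded as `C%3DCC%23N` in the resulting URL.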