Pathway prediction in enviPath can now be run in batch mode. Batch mode accepts a .tsv file containing SMILES representations and compound names, and returns a .tsv file containing the predicted transformation products (TPs) along with relevant metadata. It is designed to help experimentalists generate suspect screening lists for non-target analysis.
The batch prediction tool requires the following three inputs:
Batch mode uses a greedy search algorithm in which:
The probability of a reaction (p_edge) is obtained from the machine learning-based relative reasoning algorithm.
The probability of a child node generated during pathway search, also known as the combined probability, is determined from the probability of its parent node and the reaction probability, as illustrated in the figure below.
All transformation products (TPs) resulting from reactions with a probability greater than zero are stored in a priority queue, sorted in descending order of combined probability. The greedy algorithm makes a locally optimal decision at each step: it always expands the node with the highest combined probability first.
Pathway exploration continues until either the user-defined threshold for the number of TPs is reached, or there are no more TPs remaining in the priority queue to expand.
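The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not enviPath's implementation: `greedy_pathway_search`, `predict_reactions`, and `max_tps` are hypothetical names, the combined probability is assumed to be the product of the parent probability and p_edge, a TP is assumed to count toward the threshold when it is taken off the queue, and a product reached by more than one route is assumed to keep the probability of the route that found it first.

```python
import heapq
from itertools import count


def greedy_pathway_search(root, predict_reactions, max_tps):
    """Return up to max_tps (TP, combined probability) pairs, most probable first.

    predict_reactions(compound) is a hypothetical stand-in for the relative
    reasoning model; it yields (product, p_edge) pairs.
    """
    tie = count()      # tie-breaker so equal probabilities keep insertion order
    queue = []         # heapq is a min-heap, so probabilities are stored negated
    seen = {root}      # compounds that have already been queued
    accepted = []

    def push_children(parent, p_parent):
        for product, p_edge in predict_reactions(parent):
            # Only reactions with a probability greater than zero are kept.
            if p_edge <= 0 or product in seen:
                continue
            seen.add(product)
            # ASSUMPTION: combined probability = parent probability * p_edge.
            p_child = p_parent * p_edge
            heapq.heappush(queue, (-p_child, next(tie), product))

    push_children(root, 1.0)
    # Stop when the TP threshold is reached or the queue is exhausted.
    while queue and len(accepted) < max_tps:
        neg_p, _, tp = heapq.heappop(queue)
        accepted.append((tp, -neg_p))
        push_children(tp, -neg_p)
    return accepted


# Toy reaction table with made-up compounds and probabilities.
TOY_RULES = {
    "A": [("B", 0.5), ("C", 0.25)],
    "B": [("D", 0.75), ("E", 0.0)],
    "C": [("F", 0.5)],
}


def toy_predict(compound):
    return TOY_RULES.get(compound, [])


print(greedy_pathway_search("A", toy_predict, max_tps=3))
# [('B', 0.5), ('D', 0.375), ('C', 0.25)]
```

In the toy run, D is returned before C because its combined probability (0.5 × 0.75 = 0.375) exceeds that of C (0.25), even though D lies one step deeper in the pathway. Calling the same function with `max_tps=10` returns only four TPs, because the queue empties before the threshold is reached.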
In some cases, compounds may have fewer predicted TPs than the user-defined threshold, or no TPs predicted at all. This can happen for either of the following reasons: