pathway_prediction_in_batch_mode
Revisions compared: 2025/04/28 11:31 (shankar) and 2025/05/22 09:53 (current, shankar)
Pathway prediction in enviPath can now be performed in batch mode. This feature accepts a .tsv file containing SMILES representations and compound names as input. It processes the file and returns a .tsv output containing the predicted transformation products (TPs) along with relevant metadata. Batch mode is designed to support experimentalists in generating suspect screening lists for non-target analysis.

==== Input requirements ====

The batch prediction tool requires the following three inputs:

  * **Input file**: A .tsv file containing SMILES strings and compound names, **without a header**. A **template input file** can be downloaded [[https://
  * **Relative Reasoning model**: The relative reasoning model that will be used for the predictions. A default model is pre-selected,
  * **Number of transformation products (TPs)**: The maximum number of TPs to predict per input compound. The default is 30 TPs per compound, but users can choose any number between 1 and 50.
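
The input rules listed above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of enviPath: the function name ''read_batch_input'' is hypothetical, and the column order (SMILES first, compound name second) is an assumption based on the order in which the columns are mentioned. The template input file remains the authoritative reference for the layout.

```python
import csv
import io

# Limits and default taken from the input requirements listed above.
MIN_TPS, MAX_TPS, DEFAULT_TPS = 1, 50, 30


def read_batch_input(tsv_text, max_tps=DEFAULT_TPS):
    """Parse header-less TSV text into (smiles, name) pairs.

    Hypothetical helper: assumes column 1 is the SMILES string and
    column 2 is the compound name.
    """
    if not MIN_TPS <= max_tps <= MAX_TPS:
        raise ValueError(f"max_tps must be between {MIN_TPS} and {MAX_TPS}")

    compounds = []
    # QUOTE_NONE keeps every character literal, so quotes are never
    # interpreted as CSV quoting.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # ignore blank lines
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(row)}")
        smiles, name = (cell.strip() for cell in row)
        if not smiles or not name:
            raise ValueError(f"line {line_no}: empty SMILES or compound name")
        compounds.append((smiles, name))
    return compounds
```

Calling ''read_batch_input("CCO\tethanol\n")'' returns a list with one ''(smiles, name)'' pair. The sketch only checks the file layout and the TP limit; it does not check that the SMILES strings are chemically valid.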

==== Algorithm Description ====

The batch mode uses a greedy search algorithm in which:
  * Compounds are represented as nodes.
  * Biotransformation reactions are represented as edges, each assigned a weight corresponding to the predicted probability of the reaction occurring, based on available data and competing reaction pathways.

The probability of a reaction (p_edge) is obtained from the machine-learning-based relative reasoning algorithm.

The probability of a child node generated during pathway search, also known as the **combined probability**,

{{ :

All transformation products (TPs) resulting from reactions with a probability greater than zero are stored in a priority queue, sorted by combined probability in descending order.
The greedy algorithm makes a locally optimal decision at each step by always expanding the node with the highest combined probability first.

Pathway exploration continues until either the user-defined threshold for the number of TPs is reached or no TPs remain in the priority queue to expand.
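
The search loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the enviPath implementation. The ''expand'' callback is hypothetical and stands in for rule application plus the relative reasoning model. The combined probability is assumed here to be the parent's combined probability multiplied by p_edge, because the exact formula is not reproduced in this text. Skipping compounds that were already queued is a further simplification.

```python
import heapq
import itertools


def greedy_pathway_search(root, expand, max_tps=30):
    """Return up to max_tps (compound, combined_probability) pairs.

    expand(compound) is a hypothetical callback that yields
    (product, p_edge) pairs for one compound.
    """
    tie_breaker = itertools.count()  # keeps heap entries comparable on ties
    queue = []                       # min-heap on the negated probability
    seen = {root}                    # compounds already queued or expanded
    results = []

    def push_children(parent, p_parent):
        for product, p_edge in expand(parent):
            # Only reactions with probability > 0 enter the queue.
            if p_edge <= 0 or product in seen:
                continue
            seen.add(product)
            # ASSUMPTION: combined probability = parent probability * p_edge.
            combined = p_parent * p_edge
            heapq.heappush(queue, (-combined, next(tie_breaker), product))

    push_children(root, 1.0)
    # Stop when the TP threshold is reached or the queue is empty.
    while queue and len(results) < max_tps:
        neg_p, _, compound = heapq.heappop(queue)
        results.append((compound, -neg_p))
        push_children(compound, -neg_p)  # expand the most probable node first
    return results


# Toy reaction network with made-up probabilities, for illustration only.
toy_rules = {
    "A": [("B", 0.9), ("C", 0.5)],
    "B": [("D", 0.8), ("E", 0.0)],
    "C": [("F", 0.9)],
}
for compound, probability in greedy_pathway_search(
        "A", lambda c: toy_rules.get(c, []), max_tps=3):
    print(compound, round(probability, 2))
```

In the toy network, D is returned before C because its combined probability along A to B to D is higher than that of the direct product C, and E never enters the queue because its reaction probability is zero. Raising ''max_tps'' above the number of reachable products makes the search stop early with an empty queue.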

In some cases, a compound may have fewer predicted TPs than the user-defined threshold, or no predicted TPs at all.
This can happen for either of the following reasons:
  * No applicable transformation rules were available beyond a certain point, causing the pathway search to stop.
  * The probabilities of all predicted transformation products were zero, preventing further expansion.
pathway_prediction_in_batch_mode.1745839903.txt.gz · Last modified: 2025/04/28 11:31 by shankar