scine_chemoton.gears.pathfinder

Classes

Pathfinder(db_manager)

A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes.

Exceptions

DifferentSubgraphsError

exception scine_chemoton.gears.pathfinder.DifferentSubgraphsError

args

with_traceback(tb): Set self.__traceback__ to tb and return self.

class scine_chemoton.gears.pathfinder.Pathfinder(db_manager)

A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes. In a simple path, every node on the path is visited only once.

Attributes:

_calculations (db.Collection): Collection of the calculations of the connected database.
_compounds (db.Collection): Collection of the compounds of the connected database.
_flasks (db.Collection): Collection of the flasks of the connected database.
_reactions (db.Collection): Collection of the reactions of the connected database.
_elementary_steps (db.Collection): Collection of the elementary steps of the connected database.
_structures (db.Collection): Collection of the structures of the connected database.
_properties (db.Collection): Collection of the properties of the connected database.
graph_handler: A class handling the construction of the graph. Can be adapted to one’s needs.
_use_old_iterator (bool): Bool to indicate if the old iterator shall be used when querying for paths between a source-target pair.
_unique_iterator_memory (Tuple[Tuple[List[str], float], Iterator]): Memory of the iterator with the corresponding path and its length as well as the iterator.
start_compounds (List[str]): A list containing the compounds which are present at the start.
start_compounds_set (bool): Bool to indicate if start_compounds are set.
_pseudo_inf (float): Float for edges with infinite weight.
compound_costs (Dict[str, float]): A dictionary containing the cost of the compounds with the compounds as keys.
compound_costs_solved (bool): Bool to indicate if all compounds have a compound cost.

class BarrierBasedHandler(db_manager, model, structure_model=<scine_database.Model object>)

A class derived from the basic graph handler class to encode the reaction barrier information in the edges. For each reaction, the barriers of the elementary step with the minimal TS energy are employed. The barriers are converted to rate constants, normalized over all rate constants in the graph, and the cost function \(|\log(\text{normalized rate constant})|\) is then applied to obtain the weight.

Attributes:

temperature (float): The temperature for calculating the rate constants from the barriers. Default is 298.15 K.
_rate_constant_normalization (float): The factor to normalize the rate constant.
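
The barrier-to-weight conversion described for this handler can be illustrated with a short sketch. It is not the Chemoton implementation: the use of the Eyring equation, barriers given in kJ/mol, normalization by the sum of all rate constants, and the function names eyring_rate_constant and edge_weights are all assumptions made for this example.

```python
import math
from typing import Dict

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J s
R = 8.314462618      # gas constant in J/(mol K)


def eyring_rate_constant(barrier_kj_mol: float, temperature: float = 298.15) -> float:
    """Rate constant from a barrier via the Eyring equation (assumed here)."""
    return (K_B * temperature / H) * math.exp(-barrier_kj_mol * 1000.0 / (R * temperature))


def edge_weights(barriers: Dict[str, float], temperature: float = 298.15) -> Dict[str, float]:
    """Apply |log(normalized rate constant)| to every barrier.

    Normalizing by the sum of all rate constants is an assumption; the
    handler may define its normalization factor differently.
    """
    rates = {key: eyring_rate_constant(b, temperature) for key, b in barriers.items()}
    normalization = sum(rates.values())
    return {key: abs(math.log(rate / normalization)) for key, rate in rates.items()}


# Hypothetical barriers in kJ/mol for two reactions.
weights = edge_weights({"r1": 50.0, "r2": 100.0})
```

With this cost function a lower barrier gives a larger normalized rate constant and therefore a smaller edge weight, so shortest-path queries favor kinetically accessible reactions.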

add_reaction(reaction)

Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.

For instance:

A + B = C + D, reaction R

A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D

C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B

Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.

The edges from a compound node to a reaction node contain several pieces of information:

- weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight
- required_compounds: a list of the IDs of the other reagents of this side of the reaction
- required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default

The edges from a reaction node to a compound node contain the following information:

- weight: the weight of the edge, set to 0
- required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis

Parameters:

reaction (db.Reaction): The reaction to be added to the graph.

get_allowed_reaction_sides(reaction_id)

Return type: Side

get_barrier_limit()

Return type: float

get_temperature()

Gets the set temperature.

Returns:

self.temperature (float): The set temperature.

get_valid_reaction_ids()

Basic filter function for reactions. By default, it returns all reactions.

Returns:

List[db.ID]: List of IDs of the filtered reactions.

initialize_collections(manager)

Return type: None

static possible_attributes()

Return type: List[str]

set_barrier_limit(barrier_limit)

Return type: None

set_temperature(temperature)

Set the temperature for determining the rate constants.

Parameters:

temperature (float): The temperature in Kelvin.

unset_collections()

Return type: None

class BasicHandler(manager, model, structure_model)

A basic class to handle the construction of the nx.DiGraph. A list of reactions can be added differently, depending on the implementation of _get_weight and get_valid_reaction_ids.

Attributes:

graph (nx.DiGraph): The directed graph.
barrierless_weight (float): The weight to be set for barrierless reactions.
model (db.Model): A model for filtering the valid reactions. By default (“any”, “any”, “any”), reactions are included regardless of the model.
filter_negative_barriers (bool): If True, reactions with negative barriers are filtered out.
use_structure_model (bool): If True, the structure model is used to filter out reactions.
structure_model (db.Model): A model for filtering the valid reactions. By default (“any”, “any”, “any”), both structure and energy evaluations are based on the same model.
use_only_enabled_aggregates (bool): If True, only enabled aggregates are used.

add_reaction(reaction)

Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.

For instance:

A + B = C + D, reaction R

A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D

C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B

Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.

The edges from a compound node to a reaction node contain several pieces of information:

- weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight
- required_compounds: a list of the IDs of the other reagents of this side of the reaction
- required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default

The edges from a reaction node to a compound node contain the following information:

- weight: the weight of the edge, set to 0
- required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis

Parameters:

reaction (db.Reaction): The reaction to be added to the graph.
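
The edge bookkeeping described for add_reaction can be sketched with plain dictionaries instead of an nx.DiGraph. The helper reaction_edges, the node labels R1 and R2, and the placeholder value for barrierless_weight are assumptions made for this illustration and are not part of Chemoton.

```python
from typing import Dict, List, Tuple

EdgeAttributes = Dict[str, object]


def reaction_edges(
    reaction_id: str,
    lhs: List[str],
    rhs: List[str],
    barrierless: bool = False,
    barrierless_weight: float = 0.01,  # placeholder value, not a documented default
) -> Dict[Tuple[str, str], EdgeAttributes]:
    """Return {(tail, head): attributes} for one reaction in both directions."""
    weight = barrierless_weight if barrierless else 1.0
    edges: Dict[Tuple[str, str], EdgeAttributes] = {}
    # Node 1 covers LHS -> RHS, node 2 covers RHS -> LHS.
    for index, (reagents, products) in enumerate(((lhs, rhs), (rhs, lhs)), start=1):
        node = f"{reaction_id}{index}"
        for reagent in reagents:
            edges[(reagent, node)] = {
                "weight": weight,
                "required_compounds": [c for c in reagents if c != reagent],
                "required_compound_costs": None,
            }
        for product in products:
            edges[(node, product)] = {
                "weight": 0.0,
                "required_compounds": [c for c in products if c != product],
            }
    return edges


edges = reaction_edges("R", ["A", "B"], ["C", "D"])
compounds = {"A", "B", "C", "D"}
# Enumerate every compound -> reaction node -> compound route, as in the listing.
routes = [(a, r, c) for (a, r) in edges if a in compounds
          for (r2, c) in edges if r2 == r]
```

Here routes reproduces the eight compound-to-compound connections of the A + B = C + D example, and the edge from A to R1 lists B as its required compound.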

get_allowed_reaction_sides(reaction_id)

Return type: Side

get_valid_reaction_ids()

Basic filter function for reactions. By default, it returns all reactions.

Return type:

List[ID]

Returns:

List[db.ID]: List of IDs of the filtered reactions.

initialize_collections(manager)

Return type: None

static possible_attributes()

Return type: List[str]

unset_collections()

Return type: None

class Options

A class to vary the setup of Pathfinder.

barrier_limit: float

float: The maximum barrier for elementary steps to be included in the graph. Only valid with the ‘barrier’ graph handler.

barrierless_weight: float

float: The weight for barrierless reactions (‘basic’ graph handler) or the rate constant for barrierless reactions (‘barrier’ graph handler).

filter_negative_barriers: bool

bool: Whether elementary steps with negative barriers are forbidden.

graph_handler: str

str: A string indicating which graph handler shall be used (available are ‘basic’ and ‘barrier’).

model: Model

db.Model: The model for the energies of compounds to be included in the graph.

structure_model: Model

db.Model: The model for the structures of compounds to be included in the graph.

temperature: float

float: The temperature in Kelvin for the rate constant calculation.

unset_collections()

Duplicates the name of the HoldCollections method so that it is triggered in the pickling process, which avoids infinite _parent loops.

Return type: None

use_only_enabled_aggregates: bool

bool: If True, only enabled aggregates are used.

use_structure_model: bool

bool: If True, the structure model is used to filter out reactions.

build_graph(): Build the nx.DiGraph() from a list of filtered reactions.

calculate_compound_costs(recursive=True)

Determine the cost for all compounds via determining their shortest paths from the start_compounds. If this succeeds, set compound_costs_solved to True. Otherwise it stays False.

The algorithm works as follows: Given the starting conditions, one loops over the individual starting compounds as long as:

- the number of self._pseudo_inf entries in self.compound_costs is reduced
- a lower cost is found for any entry in self.compound_costs

With each starting compound, one loops over the compounds which have no cost assigned yet. For each start-target compound pair, the shortest path is determined employing Dijkstra’s algorithm. The weight function checks the weight of the edges as well as the costs of the required compounds listed in the required_compounds of the traversed edges. If the path exceeds the length of self._pseudo_inf, this path is not considered for further evaluation. The weight of the starting compound is added to the tmp_cost. If the target compound has no weight assigned yet in compound_costs, or if its assigned weight (in compound_costs as well as in tmp_compound_costs) is larger than the current tmp_cost, the tmp_cost is written to the temporary storage tmp_compound_costs.

After the loop over all starting compounds completes, the collected costs for the found targets are written to compound_costs. The convergence variables are updated and the while loop continues.

Parameters:

recursive (bool): All compounds are checked for shorter paths, True by default. If set to False, compounds for which a cost has been determined are not checked in the next loop.

Notes

Checks if the start compounds are set.
Checks if the graph contains any nodes.
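
The loop described for calculate_compound_costs can be sketched in a simplified, self-contained form. This is an illustration under stated assumptions, not the Chemoton code: reactions are given as hypothetical (reactants, products, weight) tuples, only the forward direction is added, PSEUDO_INF stands in for self._pseudo_inf, and every compound is rechecked in each pass (the recursive=True behavior).

```python
import heapq
from typing import Dict, List, Tuple

PSEUDO_INF = 1e12  # stand-in for self._pseudo_inf
Reaction = Tuple[List[str], List[str], float]  # (reactants, products, weight)


def build_edges(reactions: List[Reaction]):
    """Adjacency list: node -> [(neighbor, weight, required_compounds)]."""
    adj: Dict[str, list] = {}
    for i, (lhs, rhs, weight) in enumerate(reactions):
        rxn = f"R{i}"
        for c in lhs:
            adj.setdefault(c, []).append((rxn, weight, [o for o in lhs if o != c]))
        for p in rhs:
            adj.setdefault(rxn, []).append((p, 0.0, []))
            adj.setdefault(p, [])
    return adj


def shortest_lengths(adj, source: str, costs: Dict[str, float]) -> Dict[str, float]:
    """Dijkstra where an edge also costs the sum of its required compounds."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, PSEUDO_INF):
            continue  # outdated heap entry
        for nxt, weight, required in adj.get(node, []):
            nd = d + weight + sum(costs.get(r, PSEUDO_INF) for r in required)
            if nd >= PSEUDO_INF:
                continue  # a required compound has no cost yet
            if nd < dist.get(nxt, PSEUDO_INF):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def compound_costs(reactions: List[Reaction], start_costs: Dict[str, float]) -> Dict[str, float]:
    adj = build_edges(reactions)
    compounds = {c for lhs, rhs, _ in reactions for c in lhs + rhs}
    targets = compounds - set(start_costs)
    costs = dict(start_costs)
    changed = True
    while changed:
        tmp: Dict[str, float] = {}
        for start in start_costs:
            dist = shortest_lengths(adj, start, costs)
            for target in targets:
                if target not in dist:
                    continue
                candidate = dist[target] + costs[start]
                best = min(costs.get(target, PSEUDO_INF), tmp.get(target, PSEUDO_INF))
                if candidate < best:
                    tmp[target] = candidate
        costs.update(tmp)       # write collected costs after all start compounds
        changed = bool(tmp)     # continue while any cost was lowered or newly found
    return costs


# A -> C, A -> E, C + E -> F, each with weight 1; A is available at cost 1.
reactions = [(["A"], ["C"], 1.0), (["A"], ["E"], 1.0), (["C", "E"], ["F"], 1.0)]
costs = compound_costs(reactions, {"A": 1.0})
```

In this example F only receives a cost in the second pass, because the edge from C to the third reaction requires a cost for E, which is not known until the first pass has finished.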

export_compound_costs(filename='compound_costs.json')

Export the compound cost dictionary to a .json file.

Parameters:

filename (str, optional): Name of the file to write compound costs into, by default “compound_costs.json”.

export_graph(filename='graph.json')

Export the graph without compound costs as a dictionary to a .json file.

Parameters:

filename (str, optional): Name of the file to write the graph into, by default “graph.json”.

extract_connected_graph(included_nodes)

Extract a connected subgraph from a given graph and a given list of nodes.

Return type:

DiGraph

Parameters:

included_nodes (List[str]): A list of nodes which should be included in the graph.

Returns:

selected_subgraph (nx.DiGraph): The connected subgraph including the requested nodes.

find_paths(source, target, n_requested_paths=3, n_skipped_paths=0)

Query the built graph for simple paths between a source and target node.

Return type:

List[Tuple[List[str], float]]

Parameters:

source (str): The ID of the starting compound as string.
target (str): The ID of the targeted compound as string.
n_requested_paths (int): Number of requested paths, by default 3.
n_skipped_paths (int): Number of paths to skip, by default 0. For example, when four paths are requested (n_requested_paths=4) and n_skipped_paths=2, the third, fourth, fifth and sixth paths are returned. This allows setting the starting point of the query.

Returns:

found_paths (List[Tuple[List[str], float]]): List of paths where each item (path) consists of the list of nodes of the path and its length.

Notes

Requires a built graph
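
The interplay of n_requested_paths and n_skipped_paths amounts to taking a window from a stream of paths ordered by length. A minimal sketch, assuming such an ordered stream is already available; the helper select_paths and the example paths are hypothetical:

```python
from itertools import islice
from typing import Iterable, List, Tuple

Path = Tuple[List[str], float]  # (nodes of the path, length of the path)


def select_paths(
    paths_by_length: Iterable[Path],
    n_requested_paths: int = 3,
    n_skipped_paths: int = 0,
) -> List[Path]:
    """Skip the first n_skipped_paths entries, then take n_requested_paths."""
    stop = n_skipped_paths + n_requested_paths
    return list(islice(paths_by_length, n_skipped_paths, stop))


# Six hypothetical paths from A to B, already sorted by increasing length.
candidates = [(["A", f"R{i}", "B"], float(i)) for i in range(1, 7)]
window = select_paths(iter(candidates), n_requested_paths=4, n_skipped_paths=2)
```

Here window holds the third through sixth candidate, matching the example given for n_skipped_paths.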

find_unique_paths(source, target, number=3, custom_weight='weight')

Find a given number of unique paths from a source node to a target node. Several paths can have the same total length (the sum over their edge weights). If one is interested in only one path out of a set of paths with identical length, the path with the largest number of nodes among them is returned. This is called the unique path (shortest longest path).

Return type:

List[Tuple[List[str], float]]

Parameters:

source (str): The ID of the starting compound as string.
target (str): The ID of the targeted compound as string.
number (int): The number of unique paths to be returned. By default, 3 paths are returned.

Returns:

path_tuple_list (List[Tuple[List[str], float]]): List of paths where each item (path) consists of the list of nodes of the path and its length.

Notes

- Checks if a stored iterator for the given source-target pair should be used.

- At most ten paths with identical length are compared.
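
The selection of one representative per path length can be sketched as follows. This is an illustration only: the helper unique_paths is hypothetical, the input is assumed to be sorted by length already, and lengths are compared for exact equality, which the actual implementation may handle differently.

```python
from itertools import groupby, islice
from typing import Iterable, List, Tuple

Path = Tuple[List[str], float]  # (nodes of the path, length of the path)


def unique_paths(paths_by_length: Iterable[Path], number: int = 3, max_ties: int = 10) -> List[Path]:
    """Keep, for each distinct length, the path with the most nodes."""
    result: List[Path] = []
    for _, group in groupby(paths_by_length, key=lambda path: path[1]):
        # Compare at most max_ties paths that share the same length.
        ties = list(islice(group, max_ties))
        result.append(max(ties, key=lambda path: len(path[0])))
        if len(result) == number:
            break
    return result


# Two hypothetical paths of length 2.0 and one of length 3.0.
paths = [
    (["A", "R1", "B"], 2.0),
    (["A", "R2", "C", "R3", "B"], 2.0),
    (["A", "R4", "B"], 3.0),
]
selected = unique_paths(paths)
```

Of the two paths with length 2.0, the one passing through C is kept because it contains more nodes.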

get_elementary_step_sequence(path)

Build a string with the sequence of elementary steps of a path, with the compounds written as molecular formulas with multiplicity and charge, as well as the final cost of the path. The reactant node is colored red and the product node blue to enhance readability.

Return type:

str

Parameters:

path (Tuple[List[str], float]): Path containing a list of the traversed nodes and the cost of this path.

Returns:

str: A string of the elementary step sequence of a given path.

get_overall_reactants(path)

Summarize the overall reactants of a given path. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products.

Return type:

List[List[Tuple[str, float]]]

Parameters:

path (List[str]): Path containing a list of the traversed nodes.

Returns:

List[List[Tuple[str, float]]]: The reactants and products of a given path, each as a list of tuples consisting of the aggregate ID and its factor.
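
The counting scheme (-1 for reactants, +1 for products) can be sketched with a Counter. The helper overall_reactants, the lookup table from reaction node to its reactants and products, and the choice to report factors as positive numbers are assumptions for this illustration; Chemoton obtains this information from the graph and the database.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sides = Tuple[List[str], List[str]]  # (reactants, products) of one reaction node


def overall_reactants(path: List[str], reactions: Dict[str, Sides]) -> List[List[Tuple[str, float]]]:
    """Net balance over all reaction nodes on the path."""
    balance: Counter = Counter()
    for node in path:
        if node not in reactions:
            continue  # compound node
        reactants, products = reactions[node]
        for compound in reactants:
            balance[compound] -= 1
        for compound in products:
            balance[compound] += 1
    # Intermediates cancel to zero and appear on neither side.
    consumed = [(c, float(-n)) for c, n in balance.items() if n < 0]
    formed = [(c, float(n)) for c, n in balance.items() if n > 0]
    return [consumed, formed]


# Hypothetical network: A + B -> C + D (R1), then C -> E (R2).
reactions = {"R1": (["A", "B"], ["C", "D"]), "R2": (["C"], ["E"])}
summary = overall_reactants(["A", "R1", "C", "R2", "E"], reactions)
```

For this path the intermediate C cancels out, leaving A and B as overall reactants and D and E as overall products.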

get_overall_reaction_equation(path)

Summarize a given path to a reaction equation and return its string. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products. Returns the factor and the compound as a molecular formula.

Return type:

str

Parameters:

path (List[str]): Path containing a list of the traversed nodes.

Returns:

str: A string of the overall reaction equation of a given path.

static get_valid_graph_handler_options()

Return type: List[str]

initialize_collections(manager)

Return type: None

load_graph(graph_filename, compound_cost_filename='')

Initialize a basic graph handler with default settings. The graph is imported from the given file and set as the graph of the graph handler. Optionally, the compound costs are imported and set as compound_costs. In that case, the compound costs are considered to be solved and the graph is automatically updated with the compound costs.

Parameters:

graph_filename (str): Name of the .json file containing the graph.
compound_cost_filename (str): Name of the .json file containing the compound costs.

static possible_attributes()

Return type: List[str]

reset_graph_compound_costs()

Reset the ‘weight’ of edges from compound to reaction nodes by subtracting the required compound costs. This allows recalculating the compound costs under different starting conditions.

Notes

Checks if the compound costs have successfully been determined.

set_start_conditions(conditions)

Add the IDs of the start compounds to self.start_compounds and add entries for their costs in self.compound_costs.

Parameters:

conditions (Dict[str, float]): The IDs of the compounds as keys and their given costs as values.
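
The checks noted for the individual methods suggest an order in which they can be called. A minimal sketch, assuming an already constructed Pathfinder-like object; the helper shortest_routes is hypothetical and uses only methods and attributes named in this reference:

```python
from typing import Dict, List, Tuple


def shortest_routes(
    finder,
    start_conditions: Dict[str, float],
    source: str,
    target: str,
    number: int = 3,
) -> List[Tuple[List[str], float]]:
    """Run a path query on a Pathfinder-like object in a consistent order."""
    # The path queries require a built graph.
    finder.build_graph()
    # The cost calculation checks that the start compounds are set.
    finder.set_start_conditions(start_conditions)
    finder.calculate_compound_costs()
    if not finder.compound_costs_solved:
        raise RuntimeError("Not every compound could be assigned a cost.")
    # The edge weights only include the compound costs after this update.
    finder.update_graph_compound_costs()
    return finder.find_unique_paths(source, target, number)
```

Raising an error when compound_costs_solved stays False is a design choice of this sketch; depending on the network, continuing with partially assigned costs may also be acceptable.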

unset_collections()

Return type: None

update_graph_compound_costs()

Update the ‘weight’ of edges from compound to reaction nodes by adding the compound costs. The compound costs are the sum over the costs stored in self.compound_costs of the required compounds. The edges of the resulting graph contain a weight including the required_compound_costs based on the starting conditions. All analyses of the graph therefore depend on the starting conditions.

Notes

Checks if the compound costs have successfully been determined.
Checks if the graph has been updated with the compound costs.
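
The effect of update_graph_compound_costs and its counterpart reset_graph_compound_costs on a single edge can be sketched with plain dictionaries. The helpers update_costs and reset_costs and the dictionary layout are assumptions for this illustration, not the Chemoton implementation.

```python
from typing import Dict, Tuple

Edges = Dict[Tuple[str, str], dict]


def update_costs(edges: Edges, compound_costs: Dict[str, float]) -> None:
    """Add the summed costs of the required compounds to each edge weight."""
    for attributes in edges.values():
        if "required_compound_costs" not in attributes:
            continue  # reaction -> compound edge, left untouched
        required = sum(compound_costs[c] for c in attributes["required_compounds"])
        attributes["weight"] += required
        attributes["required_compound_costs"] = required


def reset_costs(edges: Edges) -> None:
    """Subtract the stored required compound costs again."""
    for attributes in edges.values():
        if attributes.get("required_compound_costs") is None:
            continue  # nothing was added to this edge
        attributes["weight"] -= attributes["required_compound_costs"]
        attributes["required_compound_costs"] = None


# One compound -> reaction edge and one reaction -> compound edge.
edges = {
    ("A", "R1"): {"weight": 1.0, "required_compounds": ["B"], "required_compound_costs": None},
    ("R1", "C"): {"weight": 0.0, "required_compounds": ["D"]},
}
update_costs(edges, {"A": 1.0, "B": 2.0})
updated_weight = edges[("A", "R1")]["weight"]
reset_costs(edges)
```

Storing the added amount in required_compound_costs is what makes the update reversible, so the costs can be recalculated under different starting conditions.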