scine_chemoton.gears.pathfinder
Classes
- Pathfinder: A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes.
Exceptions
- exception scine_chemoton.gears.pathfinder.DifferentSubgraphsError
- args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class scine_chemoton.gears.pathfinder.Pathfinder(db_manager)
A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes. In a simple path, every node on the path is visited only once.
- Attributes:
- _calculations : db.Collection
Collection of the calculations of the connected database.
- _compounds : db.Collection
Collection of the compounds of the connected database.
- _flasks : db.Collection
Collection of the flasks of the connected database.
- _reactions : db.Collection
Collection of the reactions of the connected database.
- _elementary_steps : db.Collection
Collection of the elementary steps of the connected database.
- _structures : db.Collection
Collection of the structures of the connected database.
- _properties : db.Collection
Collection of the properties of the connected database.
- graph_handler
A class handling the construction of the graph. Can be adapted to one’s needs.
- _use_old_iterator : bool
Bool to indicate if the old iterator shall be used when querying for paths between a source-target pair.
- _unique_iterator_memory : Tuple[Tuple[List[str], float], Iterator]
Memory of the iterator, holding the corresponding path and its length as well as the iterator itself.
- start_compounds : List[str]
A list containing the compounds which are present at the start.
- start_compounds_set : bool
Bool to indicate if start_compounds are set.
- _pseudo_inf : float
Float for edges with infinite weight.
- compound_costs : Dict[str, float]
A dictionary containing the cost of the compounds with the compounds as keys.
- compound_costs_solved : bool
Bool to indicate if all compounds have a compound cost.
- class BarrierBasedHandler(db_manager, model, structure_model=<scine_database.Model object>)[source]¶
A class derived from the basic graph handler class to encode the reaction barrier information in the edges. For each reaction, the barriers of the elementary step with the minimal TS energy are employed. The barriers are converted to rate constants, which are normalized over all rate constants in the graph; the cost function \(|\log(\text{normalized rate constant})|\) is then applied to obtain the weight.
- Attributes:
- temperature : float
The temperature for calculating the rate constants from the barriers. Default is 298.15 K.
- _rate_constant_normalization : float
The factor to normalize the rate constant.
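The barrier-to-weight conversion described for this handler can be sketched as follows. This is a minimal illustration, not the handler's implementation: it assumes the Eyring equation for the rate constants, barriers given in kJ/mol, the natural logarithm, and normalization by the sum of all rate constants. The function names `eyring_rate_constant` and `edge_costs` are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K
H = 6.62607015e-34   # Planck constant in J*s
R = 8.314462618      # molar gas constant in J/(mol*K)


def eyring_rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Rate constant in 1/s from a barrier via the Eyring equation (assumed here)."""
    exponent = -barrier_kj_per_mol * 1.0e3 / (R * temperature)
    return K_B * temperature / H * math.exp(exponent)


def edge_costs(barriers, temperature=298.15):
    """Map each barrier to |log(normalized rate constant)|."""
    rate_constants = {
        key: eyring_rate_constant(barrier, temperature)
        for key, barrier in barriers.items()
    }
    # Assumption: normalize by the sum of all rate constants in the graph.
    normalization = sum(rate_constants.values())
    return {
        key: abs(math.log(rate / normalization))
        for key, rate in rate_constants.items()
    }


# Hypothetical barriers in kJ/mol for three reactions.
barriers = {"r1": 40.0, "r2": 80.0, "r3": 120.0}
costs = edge_costs(barriers)
for key in barriers:
    print(key, costs[key])
```

Under these assumptions a higher barrier gives a smaller normalized rate constant and therefore a larger edge weight, so the shortest path favors kinetically accessible reactions.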
- add_reaction(reaction)
Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.
For instance:
A + B = C + D, reaction R
A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D
C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B
Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.
- The edges from a compound node to a reaction node contain several pieces of information:
weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight
required_compounds: the IDs of the other reagents of this side of the reaction in a list
required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default
- The edges from a reaction node to a compound node contain several pieces of information:
weight: the weight of the edge, set to 0
required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis
- Parameters:
- reaction : db.Reaction
The reaction to be added to the graph.
- get_allowed_reaction_sides(reaction_id)
- Return type:
Side
- get_temperature()
Gets the set temperature.
- Returns:
- self.temperature : float
The set temperature.
- get_valid_reaction_ids()
Basic filter function for reactions. Per default, it returns all reactions.
- Returns:
- List[db.ID]
List of IDs of the filtered reactions.
- class BasicHandler(manager, model, structure_model)
A basic class to handle the construction of the nx.DiGraph. A list of reactions can be added differently, depending on the implementation of _get_weight and get_valid_reaction_ids.
- Attributes:
- graph : nx.DiGraph
The directed graph.
- barrierless_weight : float
The weight to be set for barrierless reactions.
- model : db.Model
A model for filtering the valid reactions. Per default (“any”, “any”, “any”), reactions are included regardless of the model.
- filter_negative_barriers : bool
If True, reactions with negative barriers are filtered out.
- use_structure_model : bool
If True, the structure model is used to filter out reactions.
- structure_model : db.Model
A model for filtering the valid reactions. Per default (“any”, “any”, “any”), both structure and energy evaluations are based on the same model.
- use_only_enabled_aggregates : bool
If True, only enabled aggregates are used.
- add_reaction(reaction)
Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.
For instance:
A + B = C + D, reaction R
A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D
C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B
Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.
- The edges from a compound node to a reaction node contain several pieces of information:
weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight
required_compounds: the IDs of the other reagents of this side of the reaction in a list
required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default
- The edges from a reaction node to a compound node contain several pieces of information:
weight: the weight of the edge, set to 0
required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis
- Parameters:
- reaction : db.Reaction
The reaction to be added to the graph.
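The node and edge layout described for add_reaction can be reproduced with plain networkx. The sketch below is an illustration under assumptions, not the handler's code: the helper `add_reaction_sketch` and the reaction node names `R1` and `R2` are hypothetical, and compounds are plain strings instead of database IDs.

```python
import networkx as nx


def add_reaction_sketch(graph, reaction_id, lhs, rhs, weight=1.0):
    """Add one reaction as a forward ('...1') and a backward ('...2') reaction node."""
    sides = (
        (f"{reaction_id}1", lhs, rhs),  # forward direction: lhs -> rhs
        (f"{reaction_id}2", rhs, lhs),  # backward direction: rhs -> lhs
    )
    for rxn_node, reagents, products in sides:
        for reagent in reagents:
            # Compound -> reaction edge: carries the weight and the co-reagents.
            others = [c for c in reagents if c != reagent]
            graph.add_edge(
                reagent, rxn_node,
                weight=weight,
                required_compounds=others,
                required_compound_costs=None,
            )
        for product in products:
            # Reaction -> compound edge: weight 0 and the co-products.
            others = [c for c in products if c != product]
            graph.add_edge(rxn_node, product, weight=0.0, required_compounds=others)


graph = nx.DiGraph()
add_reaction_sketch(graph, "R", ["A", "B"], ["C", "D"])

print(sorted(graph.nodes))
print(graph["A"]["R1"])
print(list(nx.all_simple_paths(graph, "A", "C")))
```

Storing the co-reagents on the compound-to-reaction edge is what later allows a path search to charge the cost of everything else a reaction consumes.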
- class Options
A class to vary the setup of Pathfinder.
- barrier_limit : float
The maximum barrier for elementary steps to be included in the graph. Only valid with the ‘barrier’ graph handler.
- barrierless_weight : float
The weight for barrierless reactions (basic) and rate constant (barrier), respectively.
- graph_handler : str
A string indicating which graph handler shall be used (available are: ‘basic’ and ‘barrier’).
- model : db.Model
The model for the energies of compounds to be included in the graph.
- structure_model : db.Model
The model for the structures of compounds to be included in the graph.
- unset_collections()
Duplicate name to HoldCollections method to be triggered in pickling process, so infinite _parent loops are avoided.
- calculate_compound_costs(recursive=True)
Determine the cost for all compounds via determining their shortest paths from the start_compounds. If this succeeds, set compound_costs_solved to True. Otherwise it stays False.
The algorithm works as follows: Given the starting conditions, one loops over the individual starting compounds as long as:
- the self._pseudo_inf entries in self.compound_costs are reduced
- for any entry in self.compound_costs a lower cost is found
With each starting compound, one loops over compounds which have no cost assigned yet. For each start-target compound pair, the shortest path is determined employing Dijkstra’s algorithm. The weight function checks the weight of the edges as well as the costs of the required compounds listed in the required_compounds of the traversed edges. If the path exceeds the length of self._pseudo_inf, this path is not considered for further evaluation. The weight of the starting compound is added to the tmp_cost. If the target compound has no weight assigned yet in compound_costs OR if the target compound has a weight assigned which is larger (in compound_costs as well as in tmp_compound_costs) than the current tmp_cost, tmp_cost is written to the temporary storage of tmp_compound_costs.
After the loop over all starting compounds completes, the collected costs for the found targets are written to compound_costs. The convergence variables are updated and the while loop continues.
- Parameters:
- recursive : bool
All compounds are checked for shorter paths, True by default. If set to False, compounds for which a cost has been determined are not checked in the next loop.
Notes
Checks if the start compounds are set.
Checks if the graph contains any nodes.
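The iteration described above can be sketched with networkx, whose Dijkstra routines accept a callable weight. This is a simplified illustration under assumptions, not the class's implementation: the function `compound_costs_sketch`, the constant `PSEUDO_INF`, the hand-built graph and the start cost of 1.0 are all hypothetical. Only compound-to-reaction edges carry required_compounds here, and the non-recursive mode is not modeled.

```python
import networkx as nx

PSEUDO_INF = 1.0e12  # stand-in for an infinite cost


def compound_costs_sketch(graph, compounds, start_costs):
    """Iteratively assign each compound the cost of its cheapest path from a start."""
    costs = {c: start_costs.get(c, PSEUDO_INF) for c in compounds}

    def weight(_u, _v, data):
        # Edge weight plus the current costs of all other required reagents.
        required = data.get("required_compounds", [])
        return data["weight"] + sum(costs[c] for c in required)

    improved = True
    while improved:
        improved = False
        tmp_costs = {}
        for start in start_costs:
            lengths = nx.single_source_dijkstra_path_length(graph, start, weight=weight)
            for target in compounds:
                if target in start_costs or target not in lengths:
                    continue
                if lengths[target] >= PSEUDO_INF:
                    continue  # path needs a compound without a cost so far
                tmp_cost = lengths[target] + costs[start]
                if tmp_cost < min(costs[target], tmp_costs.get(target, PSEUDO_INF)):
                    tmp_costs[target] = tmp_cost
        # Write the collected costs back after all starts were processed.
        for target, cost in tmp_costs.items():
            costs[target] = cost
            improved = True
    return costs


# Hypothetical network: A + B -> C (R1), C -> D (R2), C + D -> E (R3).
graph = nx.DiGraph()
graph.add_edge("A", "R1", weight=1.0, required_compounds=["B"])
graph.add_edge("B", "R1", weight=1.0, required_compounds=["A"])
graph.add_edge("R1", "C", weight=0.0)
graph.add_edge("C", "R2", weight=1.0, required_compounds=[])
graph.add_edge("R2", "D", weight=0.0)
graph.add_edge("C", "R3", weight=1.0, required_compounds=["D"])
graph.add_edge("D", "R3", weight=1.0, required_compounds=["C"])
graph.add_edge("R3", "E", weight=0.0)

costs = compound_costs_sketch(graph, ["A", "B", "C", "D", "E"], {"A": 1.0, "B": 1.0})
print(costs)
```

In this toy network E only receives a cost in the second pass, once C and D have been assigned costs, which is why the procedure has to loop until nothing improves.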
- export_compound_costs(filename='compound_costs.json')
Export the compound cost dictionary to a .json file.
- Parameters:
- filename : str, optional
Name of the file to write compound costs into, by default “compound_costs.json”.
- export_graph(filename='graph.json')
Export the graph without compound costs as a dictionary to a .json file.
- Parameters:
- filename : str, optional
Name of the file to write the graph into, by default “graph.json”.
- extract_connected_graph(included_nodes)
Extract a connected subgraph from a given graph and a given list of nodes.
- Return type:
DiGraph
- Parameters:
- included_nodes : List[str]
A list of nodes which should be included in the graph.
- Returns:
- selected_subgraph : nx.DiGraph
The connected subgraph including the requested nodes.
- find_paths(source, target, n_requested_paths=3, n_skipped_paths=0)
Query the built graph for simple paths between a source and target node.
- Return type:
List[Tuple[List[str], float]]
- Parameters:
- source : str
The ID of the starting compound as string.
- target : str
The ID of the targeted compound as string.
- n_requested_paths : int
Number of requested paths, by default 3.
- n_skipped_paths : int
Number of paths skipped at the start of the query, by default 0. For example, when four paths are requested (n_requested_paths=4) and n_skipped_paths=2, the third, fourth, fifth and sixth path are returned. Therefore, this allows setting the starting point of the query.
- Returns:
- found_paths : List[Tuple[List[str], float]]
List of paths where each item (path) consists of the list of nodes of the path and its length.
Notes
Requires a built graph
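The interplay of n_requested_paths and n_skipped_paths amounts to taking a window from a generator of paths ordered by length. The sketch below illustrates this with `networkx.shortest_simple_paths` and `itertools.islice`. Whether the class uses exactly these calls is an assumption, and `find_paths_sketch` as well as the toy graph are hypothetical.

```python
from itertools import islice

import networkx as nx


def find_paths_sketch(graph, source, target, n_requested_paths=3, n_skipped_paths=0):
    """Return (path, length) tuples, skipping the first n_skipped_paths paths."""
    # Simple paths are generated in order of increasing total weight.
    generator = nx.shortest_simple_paths(graph, source, target, weight="weight")
    window = islice(generator, n_skipped_paths, n_skipped_paths + n_requested_paths)
    return [(path, nx.path_weight(graph, path, weight="weight")) for path in window]


# Toy graph with four simple paths from "s" to "t" of lengths 2, 3, 4 and 5.
graph = nx.DiGraph()
graph.add_weighted_edges_from([
    ("s", "a", 1), ("a", "t", 1),
    ("s", "b", 1), ("b", "t", 2),
    ("s", "c", 2), ("c", "t", 2),
    ("s", "t", 5),
])

# Skip the shortest path, then request the next two.
found_paths = find_paths_sketch(graph, "s", "t", n_requested_paths=2, n_skipped_paths=1)
print(found_paths)
```

Because the generator is lazy, skipped paths still have to be computed; only their return is suppressed.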
- find_unique_paths(source, target, number=3, custom_weight='weight')
Find a given number of unique paths from a given source node to a given target node. Paths can have the same total length (in terms of sum over edge weights). Among paths with identical length, only the one with the largest number of nodes is returned; this shortest (in terms of length) longest (in terms of number of nodes) path is called the unique path.
- Return type:
List[Tuple[List[str], float]]
- Parameters:
- source : str
The ID of the starting compound as string.
- target : str
The ID of the targeted compound as string.
- number : int
The number of unique paths to be returned. Per default, 3 paths are returned.
- Returns:
- path_tuple_list : List[Tuple[List[str], float]]
List of paths where each item (path) consists of the list of nodes of the path and its length.
Notes
- Checks if a stored iterator for the given source-target pair should be used.
- At most ten paths with identical length are compared.
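One way to read the unique path selection is to group consecutive paths of equal length and keep the one with the most nodes from each group. The sketch below follows this reading with `itertools.groupby`. It is an assumption about the behavior rather than the class's code, the name `unique_paths_sketch` is hypothetical, and lengths are compared by exact equality, which is only safe for the integer weights used here.

```python
from itertools import groupby, islice

import networkx as nx


def unique_paths_sketch(graph, source, target, number=3, max_equal=10):
    """Pick, for each distinct path length, the path with the most nodes."""

    def paths_with_length():
        for path in nx.shortest_simple_paths(graph, source, target, weight="weight"):
            yield path, nx.path_weight(graph, path, weight="weight")

    unique = []
    # Paths arrive sorted by length, so equal lengths are consecutive.
    for _length, group in groupby(paths_with_length(), key=lambda item: item[1]):
        candidates = list(islice(group, max_equal))  # compare at most max_equal paths
        unique.append(max(candidates, key=lambda item: len(item[0])))
        if len(unique) == number:
            break
    return unique


graph = nx.DiGraph()
graph.add_weighted_edges_from([
    ("s", "a", 1), ("a", "t", 1),                 # length 2, 3 nodes
    ("s", "b", 1), ("b", "c", 0), ("c", "t", 1),  # length 2, 4 nodes
    ("s", "d", 1), ("d", "t", 2),                 # length 3, 3 nodes
])

unique_paths = unique_paths_sketch(graph, "s", "t")
print(unique_paths)
```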
- get_elementary_step_sequence(path)
Returns the sequence of elementary steps of a path as a string, with the compounds written as molecular formulas with multiplicity and charge, as well as the final cost of the path. The reactant node is returned in red and the product node in blue to enhance readability.
- Return type:
str
- Parameters:
- path : Tuple[List[str], float]
Path containing a list of the traversed nodes and the cost of this path.
- Returns:
- str
A string of the elementary step sequence of a given path.
- get_overall_reactants(path)
Summarize the overall reactants of a given path. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products.
- get_overall_reaction_equation(path)
Summarize a given path to a reaction equation and return its string. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products. Returns the factor and the compound as a molecular formula.
- Return type:
str
- Parameters:
- path : List[str]
Path containing a list of the traversed nodes.
- Returns:
- str
A string of the overall reaction equation of a given path.
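The -1/+1 bookkeeping described above can be illustrated with a `collections.Counter`. The sketch is hypothetical in its details: `overall_reaction_sketch` takes a list of (reactants, products) pairs instead of a graph path, compounds are plain labels instead of molecular formulas, and the output format is an assumption.

```python
from collections import Counter


def overall_reaction_sketch(steps):
    """Sum up a sequence of reaction steps into one net reaction equation."""
    balance = Counter()
    for reactants, products in steps:
        for compound in reactants:
            balance[compound] -= 1  # consumed
        for compound in products:
            balance[compound] += 1  # produced

    def format_side(side):
        return " + ".join(f"{factor} {compound}" for compound, factor in sorted(side.items()))

    # Negative totals are net reactants, positive totals net products;
    # intermediates cancel to zero and drop out.
    lhs = {c: -n for c, n in balance.items() if n < 0}
    rhs = {c: n for c, n in balance.items() if n > 0}
    return f"{format_side(lhs)} = {format_side(rhs)}"


# A + B -> C, followed by C + A -> D; the intermediate C cancels.
equation = overall_reaction_sketch([(["A", "B"], ["C"]), (["C", "A"], ["D"])])
print(equation)  # 2 A + 1 B = 1 D
```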
- load_graph(graph_filename, compound_cost_filename='')
Initialize a basic graph handler with default settings. The graph is imported from the given file and set as the graph of the graph handler. Optionally, the compound costs are imported and set as compound_costs. In that case, the compound costs are considered to be solved and the graph is automatically updated with them.
- Parameters:
- graph_filename : str
Name of the .json file containing the graph.
- compound_cost_filename : str
Name of the .json file containing the compound costs.
- reset_graph_compound_costs()
Reset the ‘weight’ of edges from compound to reaction nodes by subtracting the required compound costs. This allows re-calculating the compound costs under different starting conditions.
Notes
Checks if the compound costs have successfully been determined.
- set_start_conditions(conditions)
Add the IDs of the start compounds to self.start_compounds and add entries for their cost in self.compound_costs.
- Parameters:
- conditions : Dict[str, float]
The IDs of the compounds as keys and their given costs as values.
- update_graph_compound_costs()
Update the ‘weight’ of edges from compound to reaction nodes by adding the compound costs. The compound costs are the sum over the costs stored in self.compound_costs of the required compounds. The edges of the resulting graph contain a weight including the required_compound_costs based on the starting conditions. All analyses of the graph therefore depend on the starting conditions.
Notes
Checks if the compound costs have successfully been determined.
Checks if the graph has been updated with the compound costs.
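The update and reset of the edge weights can be pictured as adding and subtracting one stored number per compound-to-reaction edge. The sketch below illustrates this on a plain networkx graph. The helpers `update_weights_sketch` and `reset_weights_sketch` are hypothetical and only mirror the behavior described above, including a guard against applying the update twice.

```python
import networkx as nx


def update_weights_sketch(graph, compound_costs):
    """Add the summed costs of the required compounds to each compound -> reaction edge."""
    for _u, _v, data in graph.edges(data=True):
        if "required_compound_costs" not in data:
            continue  # reaction -> compound edge, weight stays 0
        if data["required_compound_costs"] is not None:
            continue  # already updated, do not add the costs twice
        required_cost = sum(compound_costs[c] for c in data["required_compounds"])
        data["required_compound_costs"] = required_cost
        data["weight"] += required_cost


def reset_weights_sketch(graph):
    """Undo the update by subtracting the stored required compound costs."""
    for _u, _v, data in graph.edges(data=True):
        if data.get("required_compound_costs") is not None:
            data["weight"] -= data["required_compound_costs"]
            data["required_compound_costs"] = None


graph = nx.DiGraph()
graph.add_edge("A", "R1", weight=1.0, required_compounds=["B"], required_compound_costs=None)
graph.add_edge("R1", "C", weight=0.0, required_compounds=[])

update_weights_sketch(graph, {"A": 1.0, "B": 2.5, "C": 4.5})
updated_weight = graph["A"]["R1"]["weight"]

reset_weights_sketch(graph)
restored_weight = graph["A"]["R1"]["weight"]
print(updated_weight, restored_weight)
```

Keeping the added amount on the edge itself is what makes the reset exact, so new starting conditions can be applied to the same graph.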