scine_chemoton.gears.pathfinder

Classes

Pathfinder(db_manager)

A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes.

Exceptions

DifferentSubgraphsError

exception scine_chemoton.gears.pathfinder.DifferentSubgraphsError[source]
args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class scine_chemoton.gears.pathfinder.Pathfinder(db_manager)[source]

A class to represent a list of reactions as a graph and query this graph for simple paths between two nodes. In a simple path, every node part of the path is visited only once.

Attributes:
_calculations : db.Collection

Collection of the calculations of the connected database.

_compounds : db.Collection

Collection of the compounds of the connected database.

_flasks : db.Collection

Collection of the flasks of the connected database.

_reactions : db.Collection

Collection of the reactions of the connected database.

_elementary_steps : db.Collection

Collection of the elementary steps of the connected database.

_structures : db.Collection

Collection of the structures of the connected database.

_properties : db.Collection

Collection of the properties of the connected database.

graph_handler

A class handling the construction of the graph. Can be adapted to one’s needs.

_use_old_iterator : bool

Bool to indicate whether the old iterator shall be used when querying for paths between a source-target pair.

_unique_iterator_memory : Tuple[Tuple[List[str], float], Iterator]

Memory of iterator with the corresponding path and its length as well as the iterator.

start_compounds : List[str]

A list containing the compounds which are present at the start.

start_compounds_set : bool

Bool to indicate if start_compounds are set.

_pseudo_inf : float

Float for edges with infinite weight.

compound_costs : Dict[str, float]

A dictionary containing the cost of the compounds with the compounds as keys.

compound_costs_solved : bool

Bool to indicate if all compounds have a compound cost.

class BarrierBasedHandler(db_manager, model, structure_model=<scine_database.Model object>)[source]

A class derived from the basic graph handler class to encode the reaction barrier information in the edges. The barriers of the elementary step with the minimal TS energy of a reaction are employed. The barriers are converted to rate constants, normalized over all rate constants in the graph and then the cost function \(|\log(\text{normalized rate constant})|\) is applied to obtain the weight.
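The conversion from barrier to edge weight can be illustrated with a minimal sketch. Several details are assumptions and not taken from this documentation: the Eyring equation is used for the rate constant, barriers are given in kJ/mol, the normalization factor is the sum of all rate constants, and the natural logarithm is used. The function names are hypothetical.

```python
import math

# Physical constants (SI units).
K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R = 8.31446261815324    # gas constant, J/(mol*K)


def rate_constant(barrier_kj_per_mol, temperature=298.15):
    """Eyring rate constant in 1/s (assumed conversion, not confirmed by the docs)."""
    exponent = -barrier_kj_per_mol * 1000.0 / (R * temperature)
    return (K_B * temperature / H) * math.exp(exponent)


def edge_costs(barriers, temperature=298.15):
    """Map each reaction label to |log(k / normalization)|."""
    rates = {name: rate_constant(b, temperature) for name, b in barriers.items()}
    normalization = sum(rates.values())  # assumed normalization factor
    return {name: abs(math.log(k / normalization)) for name, k in rates.items()}


# Hypothetical barriers in kJ/mol for three reactions.
costs = edge_costs({"fast": 40.0, "medium": 80.0, "slow": 120.0})
```

With this choice of normalization every normalized rate constant is at most 1, so the cost is non-negative and grows with the barrier height.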

Attributes:
temperature : float

The temperature for calculating the rate constants from the barriers. Default is 298.15 K.

_rate_constant_normalization : float

The factor to normalize the rate constant.

add_reaction(reaction)

Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.

For instance:

A + B = C + D, reaction R
A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D

C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B

Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.

The edges from a compound node to a reaction node contain several pieces of information:

  • weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight

  • required_compounds: the IDs of the other reagents of this side of the reaction in a list

  • required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default

The edges from a reaction node to a compound node contain several pieces of information:

  • weight: the weight of the edge, set to 0

  • required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis

Parameters:
reaction : db.Reaction

The reaction to be added to the graph.

get_allowed_reaction_sides(reaction_id)
Return type:

Side

get_barrier_limit()[source]
Return type:

float

get_temperature()[source]

Gets the set temperature.

Returns:
self.temperature : float

The set temperature.

get_valid_reaction_ids()[source]

Basic filter function for reactions. Per default, it returns all reactions.

Returns:
List[db.ID]

List of IDs of the filtered reactions.

initialize_collections(manager)
Return type:

None

static possible_attributes()
Return type:

List[str]

set_barrier_limit(barrier_limit)[source]
Return type:

None

set_temperature(temperature)[source]

Sets the temperature for determining the rate constants.

Parameters:
temperature : float

The temperature in Kelvin.

unset_collections()
Return type:

None

class BasicHandler(manager, model, structure_model)[source]

A basic class to handle the construction of the nx.DiGraph. A list of reactions can be added differently, depending on the implementation of _get_weight and get_valid_reaction_ids.

Attributes:
graph : nx.DiGraph

The directed graph.

barrierless_weight : float

The weight to be set for barrierless reactions.

model : db.Model

A model for filtering the valid reactions. Per default (“any”, “any”, “any”), reactions are included regardless of the model.

filter_negative_barriers : bool

If True, reactions with negative barriers are filtered out.

use_structure_model : bool

If True, the structure model is used to filter out reactions.

structure_model : db.Model

A model for filtering the valid reactions. Per default (“any”, “any”, “any”), both structure and energy evaluations are based on the same model.

use_only_enabled_aggregates : bool

If True, only enabled aggregates are used.

add_reaction(reaction)[source]

Add a reaction to the graph. Each reaction node represents the LHS and RHS. Hence every reagent of a reaction is connected to every product of a reaction via one reaction node.

For instance:

A + B = C + D, reaction R
A -> R1 -> C
A -> R1 -> D
B -> R1 -> C
B -> R1 -> D

C -> R2 -> A
C -> R2 -> B
D -> R2 -> A
D -> R2 -> B

Representing this reaction in the graph yields 4 compound nodes, 2 reaction nodes (same reaction) and 16 edges (2*2*2*2). The weights assigned to the edges depend on the _get_weight implementation.

The edges from a compound node to a reaction node contain several pieces of information:

  • weight: the weight of the edge; 1 if the reaction is not barrierless, otherwise it is set to self.barrierless_weight

  • required_compounds: the IDs of the other reagents of this side of the reaction in a list

  • required_compound_costs: the sum over all compound costs of the compounds in the required_compounds list; None by default

The edges from a reaction node to a compound node contain several pieces of information:

  • weight: the weight of the edge, set to 0

  • required_compounds: the IDs of the other products emerging; added for easier information extraction during the path analysis

Parameters:
reaction : db.Reaction

The reaction to be added to the graph.
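The layout described above can be sketched without a database. In this illustration a plain dict of dicts stands in for the nx.DiGraph, the reaction node labels ("R;0;" and "R;1;") are hypothetical, and the function is not the library's add_reaction; only the edge attributes follow the description above.

```python
def add_reaction(graph, reaction_id, lhs, rhs, weight=1.0):
    """Insert the forward and backward direction of a reaction into `graph`.

    `graph` maps a node to {neighbor: edge attribute dict}.
    """
    for side, (reagents, products) in enumerate(((lhs, rhs), (rhs, lhs))):
        rxn_node = f"{reaction_id};{side};"  # hypothetical node label
        graph.setdefault(rxn_node, {})
        for reagent in reagents:
            # Edge compound -> reaction: carries the weight and the co-reagents.
            graph.setdefault(reagent, {})[rxn_node] = {
                "weight": weight,
                "required_compounds": [c for c in reagents if c != reagent],
                "required_compound_costs": None,
            }
        for product in products:
            # Edge reaction -> compound: weight 0, lists the other products.
            graph.setdefault(product, {})
            graph[rxn_node][product] = {
                "weight": 0.0,
                "required_compounds": [c for c in products if c != product],
            }
    return graph


# A + B = C + D, reaction R
graph = add_reaction({}, "R", ["A", "B"], ["C", "D"])
```

Looking up graph["A"]["R;0;"] then gives the attributes of the edge from compound A to the forward reaction node, with B listed as the required co-reagent.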

get_allowed_reaction_sides(reaction_id)[source]
Return type:

Side

get_valid_reaction_ids()[source]

Basic filter function for reactions. Per default, it returns all reactions.

Return type:

List[ID]

Returns:
List[db.ID]

List of IDs of the filtered reactions.

initialize_collections(manager)
Return type:

None

static possible_attributes()
Return type:

List[str]

unset_collections()
Return type:

None

class Options[source]

A class to vary the setup of Pathfinder.

barrier_limit: float
float

The maximum barrier for elementary steps to be included in the graph. Only valid with the ‘barrier’ graph handler.

barrierless_weight: float
float

The weight for barrierless reactions (basic) and rate constant (barrier), respectively.

filter_negative_barriers: bool
bool

Forbid elementary steps with negative barriers or not.

graph_handler: str
str

A string indicating which graph handler shall be used (available: ‘basic’ and ‘barrier’).

model: Model
db.Model

The model for the energies of compounds to be included in the graph.

structure_model: Model
db.Model

The model for the structures of compounds to be included in the graph.

temperature: float
float

The temperature in Kelvin for the rate constant calculation.

unset_collections()

Duplicate name to HoldCollections method to be triggered in pickling process, so infinite _parent loops are avoided.

Return type:

None

use_only_enabled_aggregates: bool
bool

If True, only enabled aggregates are used.

use_structure_model: bool
bool

Allow only elementary steps with a given model.

build_graph()[source]

Build the nx.DiGraph() from a list of filtered reactions.

calculate_compound_costs(recursive=True)[source]

Determine the cost for all compounds via determining their shortest paths from the start_compounds. If this succeeds, set compound_costs_solved to True. Otherwise it stays False.

The algorithm works as follows: Given the starting conditions, one loops over the individual starting compounds as long as:

  • the self._pseudo_inf entries in self.compound_costs are reduced

  • for any entry in self.compound_costs a lower cost is found

With each starting compound, one loops over compounds which have no cost assigned yet. For each start-target compound pair, the shortest path is determined employing Dijkstra’s algorithm. The weight function checks the weight of the edges as well as the costs of the required compounds listed in the required_compounds of the traversed edges. If the path exceeds the length of self._pseudo_inf, this path is not considered for further evaluation. The weight of the starting compound is added to the tmp_cost. If the target compound has no weight assigned yet in compound_costs, or if it has a weight assigned (in compound_costs as well as in tmp_compound_costs) which is larger than the current tmp_cost, the tmp_cost is written to the temporary storage tmp_compound_costs.

After the loop over all starting compounds completes, the collected costs for the found targets are written to compound_costs. The convergence variables are updated and the while loop continues.
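The loop described above can be sketched on a small dict-based graph. This is a simplified illustration under stated assumptions, not the library's implementation: the graph is a dict of dicts instead of an nx.DiGraph, only compound-to-reaction edges list required compounds, every compound is re-checked in each pass, and all names (PSEUDO_INF, shortest_costs, the node labels) are hypothetical.

```python
import heapq

PSEUDO_INF = 1e12  # stand-in for self._pseudo_inf


def shortest_costs(graph, source, compound_costs):
    """Dijkstra from `source`; an edge also pays for its required compounds."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, PSEUDO_INF):
            continue  # outdated queue entry
        for neighbor, edge in graph.get(node, {}).items():
            required = sum(compound_costs.get(c, PSEUDO_INF)
                           for c in edge.get("required_compounds", ()))
            new_d = d + edge["weight"] + required
            # Paths reaching the pseudo-infinite length are discarded.
            if new_d < PSEUDO_INF and new_d < dist.get(neighbor, PSEUDO_INF):
                dist[neighbor] = new_d
                heapq.heappush(queue, (new_d, neighbor))
    return dist


def calculate_compound_costs(graph, compounds, start_conditions):
    costs = {c: PSEUDO_INF for c in compounds}
    costs.update(start_conditions)
    changed = True
    while changed:
        tmp_costs = {}
        for start, start_cost in start_conditions.items():
            dist = shortest_costs(graph, start, costs)
            for target in compounds:
                if target in start_conditions or target not in dist:
                    continue
                tmp_cost = start_cost + dist[target]
                if tmp_cost < min(costs[target], tmp_costs.get(target, PSEUDO_INF)):
                    tmp_costs[target] = tmp_cost
        costs.update(tmp_costs)  # write collected costs after all starts
        changed = bool(tmp_costs)
    return costs


# A -> C (R1), A -> E (R3), C + E -> D (R2); only forward directions for brevity.
graph = {
    "A": {"R1": {"weight": 1.0, "required_compounds": []},
          "R3": {"weight": 5.0, "required_compounds": []}},
    "C": {"R2": {"weight": 1.0, "required_compounds": ["E"]}},
    "E": {"R2": {"weight": 1.0, "required_compounds": ["C"]}},
    "R1": {"C": {"weight": 0.0}},
    "R2": {"D": {"weight": 0.0}},
    "R3": {"E": {"weight": 0.0}},
}
costs = calculate_compound_costs(graph, ["A", "C", "D", "E"], {"A": 0.0})
```

In this example D cannot be reached in the first pass because both edges into R2 require a compound that has no cost yet. Once C and E have costs, the second pass assigns D its cost, which is why the outer loop repeats until nothing changes.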

Parameters:
recursive : bool

All compounds are checked for shorter paths, True by default. If set to False, compounds for which a cost has been determined are not checked in the next loop.

Notes

  • Checks if the start compounds are set.

  • Checks if the graph contains any nodes.

export_compound_costs(filename='compound_costs.json')[source]

Export the compound cost dictionary to a .json file.

Parameters:
filename : str, optional

Name of the file to write compound costs into, by default “compound_costs.json”.
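As an illustration of such an export, the following sketch writes a cost dictionary to a .json file and reads it back. The flat layout (compound ID mapped to cost) and both function names are assumptions for this example; the actual file layout written by Pathfinder is not specified here.

```python
import json
import os
import tempfile


def export_compound_costs(compound_costs, filename="compound_costs.json"):
    """Write the cost dictionary to `filename` (assumed flat ID -> cost layout)."""
    with open(filename, "w", encoding="utf-8") as handle:
        json.dump(compound_costs, handle, indent=4)


def import_compound_costs(filename):
    """Read a cost dictionary written by export_compound_costs."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


# Round trip in a temporary directory with hypothetical compound IDs.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "compound_costs.json")
    export_compound_costs({"compound_a": 0.0, "compound_b": 2.5}, path)
    restored = import_compound_costs(path)
```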

export_graph(filename='graph.json')[source]

Export the graph without compound costs as a dictionary to a .json file.

Parameters:
filename : str, optional

Name of the file to write graph into, by default “graph.json”.

extract_connected_graph(included_nodes)[source]

Extract a connected subgraph from a given graph and a given list of nodes.

Return type:

DiGraph

Parameters:
included_nodes : List[str]

A list of nodes which should be included in the graph.

Returns:
selected_subgraph : nx.DiGraph

The connected subgraph including the requested nodes.

find_paths(source, target, n_requested_paths=3, n_skipped_paths=0)[source]

Query the built graph for simple paths between a source and target node.

Return type:

List[Tuple[List[str], float]]

Parameters:
source : str

The ID of the starting compound as string.

target : str

The ID of the targeted compound as string.

n_requested_paths : int

Number of requested paths, by default 3.

n_skipped_paths : int

Number of paths skipped from the start of the ranking, by default 0. For example, when four paths are requested (n_requested_paths=4) and n_skipped_paths=2, the third, fourth, fifth and sixth paths are returned. This allows setting the starting point of the query.

Returns:
found_paths : List[Tuple[List[str], float]]

List of paths where each item (path) consists of the list of nodes of the path and its length.
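The interplay of n_requested_paths and n_skipped_paths amounts to taking a window from a ranked sequence of paths. A minimal sketch, assuming the paths arrive from an iterator sorted by increasing length (select_paths and the example data are hypothetical):

```python
from itertools import islice


def select_paths(path_iterator, n_requested_paths=3, n_skipped_paths=0):
    """Skip the first `n_skipped_paths` items, then return the requested number."""
    stop = n_skipped_paths + n_requested_paths
    return list(islice(path_iterator, n_skipped_paths, stop))


# Six hypothetical paths, already ranked by length.
ranked = [(["source", f"reaction_{i}", "target"], float(i)) for i in range(1, 7)]
window = select_paths(iter(ranked), n_requested_paths=4, n_skipped_paths=2)
```

Here window holds the third to sixth path of the ranking.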

Notes

Requires a built graph.
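The requirements listed in the notes of the individual methods suggest a call order, sketched below for an already constructed Pathfinder object. The helper query_paths is hypothetical; it only calls methods and attributes documented on this page. The relative order of build_graph and set_start_conditions, and updating the graph with the compound costs before querying, are assumptions.

```python
def query_paths(pathfinder, start_conditions, source, target, n_paths=3):
    """Run the documented steps in sequence and return the found paths."""
    pathfinder.build_graph()                           # find_paths requires a built graph
    pathfinder.set_start_conditions(start_conditions)  # compound ID -> cost
    pathfinder.calculate_compound_costs()
    if not pathfinder.compound_costs_solved:
        raise RuntimeError("Not every compound received a cost.")
    pathfinder.update_graph_compound_costs()
    return pathfinder.find_paths(source, target, n_requested_paths=n_paths)
```

Each returned item is a tuple of the node list and the path length, as described under Returns above.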

find_unique_paths(source, target, number=3, custom_weight='weight')[source]

Find a number of unique paths from a given source node to a given target node. Several paths can have the same total length (in terms of the sum over edge weights). If one is interested in only one path per set of paths with identical length, the longest of them in terms of number of nodes is returned. This is called the unique path (shortest longest path).

Return type:

List[Tuple[List[str], float]]

Parameters:
source : str

The ID of the starting compound as string.

target : str

The ID of the targeted compound as string.

number : int

The number of unique paths to be returned. Per default, 3 paths are returned.

Returns:
path_tuple_list : List[Tuple[List[str], float]]

List of paths where each item (path) consists of the list of nodes of the path and its length.
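The selection of one path per length can be sketched as follows. The sketch assumes the candidate paths arrive sorted by length and that identical lengths compare equal as floats; the function name and the max_ties parameter are hypothetical, the latter mirroring the limit of ten compared paths mentioned in the notes.

```python
from itertools import groupby, islice


def unique_paths(ranked_paths, number=3, max_ties=10):
    """Keep, per distinct length, the path with the most nodes."""
    unique = []
    groups = groupby(ranked_paths, key=lambda item: item[1])
    for _, tied in islice(groups, number):
        candidates = islice(tied, max_ties)  # compare at most `max_ties` paths
        unique.append(max(candidates, key=lambda item: len(item[0])))
    return unique


# Hypothetical ranked paths: two share the length 2.0.
ranked = [
    (["s", "r1", "t"], 2.0),
    (["s", "r2", "x", "r3", "t"], 2.0),
    (["s", "r4", "t"], 3.5),
]
selected = unique_paths(ranked)
```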

Notes

  • Checks if a stored iterator for the given source-target pair should be used.

  • At most ten paths with identical length are compared.

get_elementary_step_sequence(path)[source]

Prints the sequence of elementary steps of a path with the compounds written as molecular formulas with multiplicity and charge as well as the final cost of the path. Reactant node is returned in red, product node in blue to enhance readability.

Return type:

str

Parameters:
path : Tuple[List[str], float]

Path containing a list of the traversed nodes and the cost of this path.

Returns:
str

A string of the elementary step sequence of a given path.

get_overall_reactants(path)[source]

Summarize the overall reactants of a given path. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products.

Return type:

List[List[Tuple[str, float]]]

Parameters:
path : List[str]

Path containing a list of the traversed nodes.

Returns:
List[List[Tuple[str, float]]]

A tuple containing the reactants and products of a given path as list of tuples consisting of the aggregate ID and its factor.
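The counting scheme can be sketched with a Counter. In this illustration the input is a list of (reagents, products) pairs per traversed reaction rather than a node path, and the factors are reported as positive numbers; both choices, and the function name, are assumptions.

```python
from collections import Counter


def overall_reactants(steps):
    """Return [reactants, products], each a list of (compound ID, factor)."""
    balance = Counter()
    for reagents, products in steps:
        for compound in reagents:
            balance[compound] -= 1  # consumed
        for compound in products:
            balance[compound] += 1  # formed
    reactants = [(c, float(-n)) for c, n in balance.items() if n < 0]
    products = [(c, float(n)) for c, n in balance.items() if n > 0]
    return [reactants, products]


# A + B -> C followed by C + B -> D; the intermediate C cancels out.
summary = overall_reactants([(["A", "B"], ["C"]), (["C", "B"], ["D"])])
```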

get_overall_reaction_equation(path)[source]

Summarize a given path to a reaction equation and return its string. Count the appearance of compounds in a reaction, -1 for reactants and +1 for products. Returns the factor and the compound as a molecular formula.

Return type:

str

Parameters:
path : List[str]

Path containing a list of the traversed nodes.

Returns:
str

A string of the overall reaction equation of a given path.

static get_valid_graph_handler_options()[source]
Return type:

List[str]

initialize_collections(manager)
Return type:

None

load_graph(graph_filename, compound_cost_filename='')[source]

Initialize a basic graph handler with default settings. The graph is imported from the given file and set as the graph of the graph handler. Optionally, the compound costs are imported and set as compound_costs. The compound costs are considered to be solved. The graph is automatically updated with the compound costs.

Parameters:
graph_filename : str

Name of the .json file containing the graph.

compound_cost_filename : str

Name of the .json file containing the compound costs.

static possible_attributes()
Return type:

List[str]

reset_graph_compound_costs()[source]

Reset the ‘weight’ of edges from compound to reaction nodes by subtracting the required compound costs. This allows re-calculating the compound costs under different starting conditions.

Notes

  • Checks if the compound costs have successfully been determined.

set_start_conditions(conditions)[source]

Add the IDs of the start compounds to self.start_compounds and add entries for their costs in self.compound_costs.

Parameters:
conditions : Dict[str, float]

The IDs of the compounds as keys and their given costs as values.

unset_collections()
Return type:

None

update_graph_compound_costs()[source]

Update the ‘weight’ of edges from compound to reaction nodes by adding the compound costs. The compound costs are the sum over the costs stored in self.compound_costs of the required compounds. The edges of the resulting graph contain a weight including the required_compound_costs based on the starting conditions. All analyses of the graph therefore depend on the starting conditions.

Notes

  • Checks if the compound costs have successfully been determined.

  • Checks if the graph has been updated with the compound costs.
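Updating and resetting the edge weights are inverse operations, which the following sketch illustrates on a dict-based graph. It is not the library's code: the dict of dicts replaces the nx.DiGraph, compound-to-reaction edges are recognized by their required_compound_costs key, and a value of None is taken to mean that the edge has not been updated yet.

```python
def _compound_to_reaction_edges(graph):
    """Yield the attribute dicts of all compound -> reaction edges."""
    for edges in graph.values():
        for edge in edges.values():
            if "required_compound_costs" in edge:
                yield edge


def update_graph_compound_costs(graph, compound_costs):
    for edge in _compound_to_reaction_edges(graph):
        if edge["required_compound_costs"] is not None:
            continue  # already updated, do not add the costs twice
        required = sum(compound_costs[c] for c in edge["required_compounds"])
        edge["required_compound_costs"] = required
        edge["weight"] += required


def reset_graph_compound_costs(graph):
    for edge in _compound_to_reaction_edges(graph):
        if edge["required_compound_costs"] is not None:
            edge["weight"] -= edge["required_compound_costs"]
            edge["required_compound_costs"] = None


# A + B -> C with hypothetical node labels and costs.
graph = {
    "A": {"R;0;": {"weight": 1.0, "required_compounds": ["B"],
                   "required_compound_costs": None}},
    "B": {"R;0;": {"weight": 1.0, "required_compounds": ["A"],
                   "required_compound_costs": None}},
    "R;0;": {"C": {"weight": 0.0, "required_compounds": []}},
}
update_graph_compound_costs(graph, {"A": 0.0, "B": 2.0})
updated_weight = graph["A"]["R;0;"]["weight"]
reset_graph_compound_costs(graph)
reset_weight = graph["A"]["R;0;"]["weight"]
```

After the update, the edge from A carries its own weight plus the cost of B; after the reset, the original weight is restored and new starting conditions can be applied.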