Central class for unified IUPAC-like ranking of organic and inorganic structures. More...
#include <RankingTree.h>
Data Structures | |
struct | EdgeData |
Data class that sets which supplementary data is stored for a tree edge. More... | |
struct | JunctionInfo |
Data class to store junction vertex and paths from the source vertices. More... | |
struct | VertexData |
Data class that sets which supplementary data is stored for a tree vertex. More... | |
Public Types | |
Member types | |
enum | ExpansionOption { ExpansionOption::OnlyRequiredBranches, ExpansionOption::Full } |
using | BglType = boost::adjacency_list< boost::setS, boost::vecS, boost::bidirectionalS, VertexData, EdgeData > |
The BGL Graph type used to store the tree. | |
Public Member Functions | |
Special member functions | |
RankingTree (const Graph &graph, const StereopermutatorList &stereopermutators, std::string molGraphviz, AtomIndex atomToRank, const std::vector< AtomIndex > &excludeIndices={}, ExpansionOption expansionMethod=ExpansionOption::OnlyRequiredBranches, const boost::optional< AngstromPositions > &positionsOption=boost::none) | |
Performs ranking of a central atom's substituents. More... | |
Information | |
std::vector< std::vector < AtomIndex > > | getRanked () const |
std::string | dumpGraphviz (const std::string &title="", const std::unordered_set< TreeVertexIndex > &squareVertices={}, const std::unordered_set< TreeVertexIndex > &colorVertices={}, const std::set< TreeEdgeIndex > &colorEdges={}) const |
Private Types | |
using | TreeVertexIndex = BglType::vertex_descriptor |
Type of tree vertex accessor. | |
using | TreeEdgeIndex = BglType::edge_descriptor |
Type of tree edge accessor. | |
using | VariantType = boost::variant< TreeVertexIndex, TreeEdgeIndex > |
Variant type of both. | |
Private Member Functions | |
TreeVertexIndex | parent_ (const TreeVertexIndex &index) const |
Returns the parent of a node. Fails if called on the root! | |
std::vector< TreeVertexIndex > | adjacents_ (TreeVertexIndex index) const |
Returns an unordered list of all adjacent vertices (in- and out-adjacents) | |
unsigned | adjacentTerminalHydrogens_ (const TreeVertexIndex &index) const |
Counts the number of terminal hydrogens on out-edges of the specified index. | |
bool | isBondSplitDuplicateVertex_ (const TreeVertexIndex &index) const |
Checks whether a tree index is the result of a bond split. | |
bool | isCycleClosureDuplicateVertex_ (const TreeVertexIndex &index) const |
Checks whether a tree index is the result of a cycle closure. | |
std::vector< TreeVertexIndex > | auxiliaryAdjacentsToRank_ (TreeVertexIndex sourceIndex, const std::vector< TreeVertexIndex > &excludeIndices) const |
template<typename ComparisonSets > | |
bool | notEmpty_ (const ComparisonSets &comparisonSets) const |
Returns whether any of the comparison multisets has an entry. | |
unsigned | nonDuplicateDegree_ (const TreeVertexIndex &index) const |
std::vector< TreeVertexIndex > | addBondOrderDuplicates_ (const TreeVertexIndex &treeSource, const TreeVertexIndex &treeTarget) |
std::unordered_set < TreeVertexIndex > | treeIndicesInBranch_ (TreeVertexIndex index) const |
Returns all tree indices in the branch from the specified index up to root. | |
std::unordered_set< AtomIndex > | molIndicesInBranch_ (TreeVertexIndex index) const |
Returns all mol indices in the branch from the specified index up to root. | |
unsigned | duplicateDepth_ (TreeVertexIndex index) const |
Returns a depth measure of a duplicate vertex for sequence rule 1. | |
unsigned | depthOfNode_ (TreeVertexIndex index) const |
Returns the depth of a node in the tree. | |
unsigned | mixedDepth_ (const TreeVertexIndex &vertexIndex) const |
Returns a mixed depth measure for ranking both vertices and edges. | |
unsigned | mixedDepth_ (const TreeEdgeIndex &edgeIndex) const |
Returns a mixed depth measure for ranking both vertices and edges. | |
JunctionInfo | junction_ (const TreeVertexIndex &a, const TreeVertexIndex &b) const |
bool | molIndexExistsInBranch_ (AtomIndex molIndex, TreeVertexIndex treeIndex) const |
Returns whether a molecular graph index exists in a specific branch. | |
std::vector< TreeVertexIndex > | expand_ (const TreeVertexIndex &index, const std::unordered_set< AtomIndex > &molIndicesInBranch) |
template<typename SetValueType , typename ComparatorType > | |
bool | multisetCompare_ (const std::multiset< SetValueType, ComparatorType > &a, const std::multiset< SetValueType, ComparatorType > &b) const |
template<typename SetValueType , typename ComparatorType > | |
void | compareBFSSets_ (const std::map< TreeVertexIndex, std::multiset< SetValueType, ComparatorType > > &comparisonSets, const std::vector< std::vector< TreeVertexIndex > > &undecidedSets, OrderDiscoveryHelper< TreeVertexIndex > &orderingHelper) const |
std::string | toString (const TreeEdgeIndex &edge) const |
std::string | toString (const VariantType &vertexOrEdge) const |
template<unsigned ruleNumber, bool bfsDownOnly, bool insertEdges, bool insertVertices, typename MultisetValueType , typename MultisetComparatorType > | |
void | runBFS_ (TreeVertexIndex sourceIndex, OrderDiscoveryHelper< TreeVertexIndex > &orderingHelper, const boost::optional< unsigned > &depthLimitOptional=boost::none) const |
template<typename ValueType , typename ComparatorType > | |
std::string | makeBFSStateGraph_ (const std::string &title, const TreeVertexIndex &base, const std::map< TreeVertexIndex, std::multiset< ValueType, ComparatorType > > &comparisonSet, const std::vector< std::vector< TreeVertexIndex > > &undecidedSets) const |
Creates a graphviz representation of the BFS state. | |
std::string | _make4BGraph (const TreeVertexIndex &sourceIndex, const std::map< TreeVertexIndex, std::set< VariantType > > &representativeStereodescriptors, const TreeVertexIndex &branchA, const TreeVertexIndex &branchB, const std::vector< std::vector< VariantType > > &branchAOrders, const std::vector< std::vector< VariantType > > &branchBOrders, const std::vector< std::vector< VariantType > >::reverse_iterator &branchAIter, const std::vector< std::vector< VariantType > >::reverse_iterator &branchBIter) const |
Creates graphviz representation of like/unlike pairing in sequence rule 4B. | |
std::vector< std::vector < AtomIndex > > | mapToAtomIndices_ (const std::vector< std::vector< TreeVertexIndex > > &treeRankingSets) const |
Maps sets returned by an OrderDiscoveryHelper from tree indices to atom indices. | |
void | applySequenceRules_ (const boost::optional< AngstromPositions > &positionsOption) |
std::vector< std::vector < TreeVertexIndex > > | auxiliaryApplySequenceRules_ (TreeVertexIndex sourceIndex, const std::vector< TreeVertexIndex > &adjacentsToRank, const boost::optional< unsigned > &depthLimitOptional=boost::none) const |
Static Private Member Functions | |
static std::string | adaptMolGraph_ (std::string molGraph) |
static void | writeGraphvizFiles_ (const std::vector< std::string > &graphvizStrings) |
Writes graphviz log files from graph strings. | |
static std::string | toString (TreeVertexIndex vertex) |
static std::unordered_set < TreeVertexIndex > | collectSeeds_ (const std::map< TreeVertexIndex, std::vector< TreeVertexIndex > > &seeds, const std::vector< std::vector< TreeVertexIndex > > &undecidedSets) |
static bool | relevantSeeds_ (const std::map< TreeVertexIndex, std::vector< TreeVertexIndex > > &seeds, const std::vector< std::vector< TreeVertexIndex > > &undecidedSets) |
Private Attributes | |
BglType | tree_ |
The BGL Graph representing the acyclic tree. | |
OrderDiscoveryHelper < TreeVertexIndex > | branchOrderingHelper_ |
The helper instance for discovering the ordering of the to-rank branches. | |
const Graph & | graph_ |
const StereopermutatorList & | stereopermutatorsRef_ |
const std::string | adaptedMolGraphviz_ |
Static Private Attributes | |
static unsigned | debugMessageCounter_ |
static constexpr TreeVertexIndex | rootIndex = 0 |
A readbility-improving constexpr replacement for the root index 0 in code. | |
Central class for unified IUPAC-like ranking of organic and inorganic structures.
The general procedure is that any cycles are broken down in a BFS iteration through the molecular graph at the atom vertex of instantiation. Afterwards, all branches are expanded in a DFS-like manner to ensure that cycle closures are properly reported as duplicate atoms. In the rank() member function, the actual ranking as demanded by the IUPAC sequence rules (adapted for the mixed inorganic/organic molecular graphs) is performed.
Option deciding in what manner the ranking tree is expanded
The optimized expansion only expands positions that are needed for ranking while applying sequence rule 1 immediately. Full expansion expands the entire tree indiscriminately of whether the information will ever be needed prior to applying any sequence rules.
Enumerator | |
---|---|
OnlyRequiredBranches |
Expand the tree only as needed. |
Full |
Exhaustively expand all vertices until completion before ranking. |
Scine::Molassembler::RankingTree::RankingTree | ( | const Graph & | graph, |
const StereopermutatorList & | stereopermutators, | ||
std::string | molGraphviz, | ||
AtomIndex | atomToRank, | ||
const std::vector< AtomIndex > & | excludeIndices = {} , |
||
ExpansionOption | expansionMethod = ExpansionOption::OnlyRequiredBranches , |
||
const boost::optional< AngstromPositions > & | positionsOption = boost::none |
||
) |
Performs ranking of a central atom's substituents.
Complexity Theoretical complexity is unclear to me. No idea if the IUPAC sequence rules imply an upper bound on theoretical complexity. Given a sufficiently complicated molecule, a single ranking can be immensely complicated and time intensive. For practical cases however, ranking is constant-time.
|
staticprivate |
Modifies the graphviz molgraph into a digraph so that it can be combined with all the other digraphs using gvpack
|
private |
Adds the required duplicate atoms for bonds of high bond orders
This adds the appropriate amount of duplicate atoms to the source and target vertices according to the bond order (if integral). Returns the new duplicate tree vertex indices of duplicate nodes added to the source vertex only, not those added to the target vertex.
|
private |
Ranks the direct substituents of the root atom, applying sequence rules 2-5
This function ranks the direct substituents of the atom this RankingTree was instantiated upon by the sequential application of the 2013 IUPAC Blue Book sequence rules 2-5 (the ctor applies #1). They are somewhat adapted since the priority of asymmetric centers of higher (and also lower) shape must also be considered because transition metal chemistry is also included in this library.
|
private |
Determines which tree indices are to be ranked
Determines which tree indices are to be ranked based on the central index and a list of index excludes
|
private |
This function ranks the direct substituents of a selected central tree vertex by the sequential application of the 2013 IUPAC Blue Book sequence rules. They are somewhat adapted since the priority of asymmetric centers of higher (and also lower) shape must also be considered because transition metal chemistry is also included in this library.
It returns a sorted vector of vectors, in which every sub-vector represents a set of equal-priority tree vertices. The sorting is ascending, meaning from lowest priority to highest priority.
The main differences to rank() are that all sequence rule BFS progressions can begin at any vertex in the tree and therefore must consider both in- and out-edges at every vertex. Additionally, this function assumes that any auxiliary stereodescriptors have already been instantiated, while rank() explicitly instantiates them prior to the application of sequence rule three.
sourceIndex | The tree vertex from which to rank substituents |
adjacentsToRank | A list of adjacent tree vertices to rank |
depthLimitOptional | An optional limitation on depth of sequence rule application |
|
inlineprivate |
Compares all possible pairs of multisets of yet undecided branches and adds less-than relationships to the orderingHelper in case new relationships are discovered.
NOTE: This is a common pattern in the application of sequence rules, although the multiset value type and the applied comparator may vary, so it is extracted here.
SetValueType | The multisets may contain different value types depending on the sequence rule being applied, but the semantics of how they are compared are always the same, so we abstract over this type. |
ComparatorType | The multisets may have differing comparators depending on the sequence rule being applied, but the semantics of how they are compared are always the same, so we abstract over this type. |
comparisonSets | A mapping from branch indices that are ranked in the OrderDiscoveryHelper to multisets containing values that establish relative priority in the sequence rule currently being applied |
undecidedSets | This is a const reference to an up-to-date result value of the OrderDiscoveryHelper's getUndecidedSets() function. This is useful since undecidedSets is a commonly required variable in the application of sequence rules and can therefore be used across all of them |
orderingHelper | This is a reference to the OrderDiscoveryHelper currrently in use to rank branches of the acyclic tree. This function adds new relationships (if discovered) to this reference. |
std::string Scine::Molassembler::RankingTree::dumpGraphviz | ( | const std::string & | title = "" , |
const std::unordered_set< TreeVertexIndex > & | squareVertices = {} , |
||
const std::unordered_set< TreeVertexIndex > & | colorVertices = {} , |
||
const std::set< TreeEdgeIndex > & | colorEdges = {} |
||
) | const |
Returns an annotated graphviz graph of the tree
Creates a graphviz representation of the tree, with optional title string, and sets of vertices represented as squares or colored in.
|
private |
Adds missing adjacents to vertex, returning new and existing tree children
Expands a tree node by checking the existing node children plus its parent and comparing this against the adjacent indices from the molecule. Returns all new and existing child tree vertex indices of the newly expanded node. Pre-expansion existing children may come about due to the addition of multiple bond order duplicate atoms.
std::vector< std::vector<AtomIndex> > Scine::Molassembler::RankingTree::getRanked | ( | ) | const |
Fetches the ranked result
Returns the ranked result as a sorted vector of vectors, in which every sub-vector represents a set of equal-priority substituents. The sorting is ascending, meaning from lowest priority to highest priority.
|
private |
Returns the deepest vertex in the tree in whose child branches both a and b are located
|
inlineprivate |
Since multiset's operator < does not actually USE the custom comparator when comparing the contained values, we have to call lexicographical_compare with a supplied comparator!
|
private |
Returns the node degree, excluding edges to or from nodes marked as duplicate
|
staticprivate |
In all BFS-like iterations, we need to check that there are suitable seeds to continue the BFS expansion for all branches that are yet undifferentiated. This is required in pretty much every sequence rule application, so it is generalized here.
seeds | A mapping from branch indices that are being ranked to a set of tree vertices that are suitable continuations of the current BFS iteration. |
undecidedSets | This is a const reference to an up-to-date result value of the OrderDiscoveryHelper's getUndecidedSets() function. This is useful since undecidedSets is a commonly required variable in the application of sequence rules and can therefore be used across all of them |
|
inlineprivate |
Performs a BFS traversal through the tree, performing ranking of a single rule
Performs a BFS traversal through the tree, adding encountered edges and/or vertices into a multiset with a custom comparator, and comparing those multisets at every iteration to discover ordering relations between the corresponding original branches.
ruleNumber | For logging purposes, indicate which sequence rule is being tested with this BFS traversal |
BfsDownOnly | If set true, only out-edges of any BFS seeds are considered for the multisets and seed continuations. This has the effect that the traversal is unidirectional, going "down" from the root of the tree, which is drawn at the top of the graphical representation (see dumpGraphviz). If set false, in- and out-edges are considered for BFS continuations, so that BFS proceeds in all directions simultaneously from every seed starting at the source index. |
insertEdges | If set true, edge indices are inserted into the multiset. |
insertVertices | If set true, vertex indices are inserted into the multiset. |
MultisetValueType | The value type of the multiset used for comparison. |
MultisetComparatorType | The Comparator supplied as constructing argument for the multiset and any lexicographical comparisons involving multiple multisets. Here, the edge or vertex properties being compared in the current sequence rule must be encoded. |
|
staticprivate |
For properly naming all the log files emitted in case the appropriate log particular is set.