Molassembler  1.0.0
Molecule graph and conformer library
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Pages
Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph Class Reference

Simulates a graph from which triangle inequality bounds can be calculated by shortest-paths. More...

#include <ImplicitBoundsGraph.h>

Data Structures

class  edge_iterator
 An iterator to enumerate all edges in a graph. More...
 
struct  EdgeWeightMap
 A helper struct permitting read-access to an edge weight via an edge descriptor. More...
 
class  in_group_edge_iterator
 An iterator to enumerate only edges to the same part of the graph from specific vertices. More...
 
class  vertex_iterator
 A random access iterator through all vertex descriptors of the graph. More...
 
struct  VertexIndexMap
 A helper struct to turn vertex descriptors into numeric indices. More...
 

Public Types

using VertexDescriptor = unsigned long
 
using EdgeDescriptor = std::pair< VertexDescriptor, VertexDescriptor >
 
using BoundsMatrix = Eigen::MatrixXd
 

Public Member Functions

 ImplicitBoundsGraph (const PrivateGraph &inner, BoundsMatrix bounds)
 
VertexDescriptor num_vertices () const
 Returns the number of vertices simulated by the graph, which is 2N. More...
 
VertexDescriptor num_edges () const
 Returns the number of edges currently simulated by the graph. More...
 
std::pair< EdgeDescriptor, bool > edge (VertexDescriptor i, VertexDescriptor j) const
 Fetches an edge descriptor for a speculative edge from i to j. More...
 
bool hasExplicit (const EdgeDescriptor &edge) const
 Checks if there is explicit information present for a left-to-right edge. More...
 
outcome::result< Eigen::MatrixXd > makeDistanceBounds () const noexcept
 Generates a bare distance bounds matrix by repeated shortest-paths calculations. More...
 
outcome::result< Eigen::MatrixXd > makeDistanceMatrix (Random::Engine &engine) noexcept
 Generates a distance matrix by randomly fixing distances within triangle inequality bounds. More...
 
outcome::result< Eigen::MatrixXd > makeDistanceMatrix (Random::Engine &engine, Partiality partiality) noexcept
 
VertexDescriptor source (const EdgeDescriptor &e) const
 Returns the source vertex from an edge descriptor. More...
 
VertexDescriptor target (const EdgeDescriptor &e) const
 Returns the target vertex from an edge descriptor. More...
 
const Eigen::MatrixXd & getMatrix () const
 Allows const-ref access to the underlying matrix for debug purposes. More...
 
VertexDescriptor out_degree (VertexDescriptor i) const
 Returns the number of out-edges for a particular vertex. More...
 
double & lowerBound (VertexDescriptor a, VertexDescriptor b)
 
double & upperBound (VertexDescriptor a, VertexDescriptor b)
 
double lowerBound (VertexDescriptor a, VertexDescriptor b) const
 
double upperBound (VertexDescriptor a, VertexDescriptor b) const
 
double maximalImplicitLowerBound (VertexDescriptor i) const
 Returns the length of the maximal implicit lower bound outgoing from a left vertex. More...
 
EdgeWeightMap getEdgeWeightPropertyMap () const
 Returns an instance of EdgeWeightMap.
 
vertex_iterator vbegin () const
 Returns a begin-iterator for vertices.
 
vertex_iterator vend () const
 Returns an end-iterator for vertices.
 
edge_iterator ebegin () const
 Returns a begin edge iterator.
 
edge_iterator eend () const
 Returns an end edge iterator.
 
edge_iterator obegin (VertexDescriptor i) const
 Returns a begin edge iterator for the out-edges of a specific vertex.
 
edge_iterator oend (VertexDescriptor i) const
 Returns an end edge iterator for the out-edges of a specific vertex.
 
in_group_edge_iterator in_group_edges_begin (const VertexDescriptor i) const
 
in_group_edge_iterator in_group_edges_end (const VertexDescriptor i) const
 

Static Public Member Functions

static VertexDescriptor left (const VertexDescriptor a) noexcept
 
static VertexDescriptor right (const VertexDescriptor a) noexcept
 
static VertexDescriptor internal (const VertexDescriptor i) noexcept
 
static bool isLeft (const VertexDescriptor i) noexcept
 Check if a vertex descriptor is part of the left subgraph. More...
 
static bool sameSide (const VertexDescriptor i, const VertexDescriptor j) noexcept
 

Static Private Member Functions

static void explainContradictionPaths_ (VertexDescriptor a, VertexDescriptor b, const std::vector< VertexDescriptor > &predecessors, const std::vector< double > &distances)
 

Private Attributes

const PrivateGraphinnerGraphPtr_
 Pointer to molecule from which the bounds come from.
 
std::array< Utils::ElementType, 2 > heaviestAtoms_
 Stores the two heaviest element types.
 
Eigen::MatrixXd distances_
 Dense adjacency matrix for O(1) access to fixed distances.
 

Detailed Description

Simulates a graph from which triangle inequality bounds can be calculated by shortest-paths.

Based off of pairwise bounds collected from SpatialModel, this class simulates a graph in which one-to-all triangle inequality bounds can be calculated by a single-source shortest paths calculation.

The general construction of the graph is loosely described in: Havel, Timothy F. "Distance geometry: Theory, algorithms, and chemical applications." Encyclopedia of Computational Chemistry 120 (1998): 723-742.

In essence, for every atomic index a ∈ [0, N), the graph contains two vertices, left(a) and right(a). For every pair of atomic indices a, b that have a distance bound, i.e. a lower and upper bound on their spatial distance within the current metric space, there are six directed edges in the graph:

  • Two edges (one bidirectional edge) between left(a) and left(b) with the edge weight of the upper bound
  • Two edges between right(a) and right(b) with the edge weight of the upper bound
  • A unidirectional edge from left(a) to right(b) with the edge weight of the lower bound, negated
  • A unidirectional edge from left(b) to right(a) with the edge weight of the lower bound, negated

If there is no explicit bounding information present for the pair a, b, then there are still two edges in the graph:

  • A unidirectional edge from left(a) to right(b) with the edge weight of the respective atom types' van-der-Walls radii summed
  • A unidirectional edge from left(b) to right(a) with the edge weight of the respective atom types' van-der-Walls radii summed

There are never edges from the right part of the graph to the left.

If a distance bound is narrowed to a fixed distance, this is represented as raising the lower bound to the fixed distance and lowering the upper bound to the fixed distance.

A lot of the implemented functions are not strictly necessary for the specialized shortest-paths algorithm that is used in makeDistanceMatrix(), which is the main purpose of the class, but exist to allow this graph to integrate with boost::graph in order to easily benchmark it with other algorithms like Bellman-Ford and compare it with ExplicitBoundsGraph, which uses a fully-explicit boost::graph class as its underlying datastructure.

Why is the implicit nature of the bonds on the basis of an underlying matrix so much faster than ExplicitBoundsGraph when it comes to generating distance matrices?

  • The GOR and Bellman-Ford shortest-paths algorithms have a complexity of O(V * E), where V is the number of vertices and E the number of edges. Since for every left vertex V, there are V-1 edges to the right side (whether explicit or implicit does not matter), E is on the order of V², so the entire shortest-paths algorithm is O(V³). Under certain conditions when on a left vertex during the shortest-paths calculation, one can choose to ignore all edges to the right side. While in an explicit BGL graph, this would entail getting the target vertex, comparing and incrementing the iterator, in this class, which merely simulates the existence of edges, the entire logic of an alternate iterator class that simulates only in-group (i.e. left-left and right-right) edges can be significantly reduced. This condition leads to a big reduction in the number of possible paths and thus heavily accelerates the shortest paths calculation.
  • Less memory use is perhaps also a factor. In the generation of distance bounds, ExplicitBoundsGraph requires memory for its underlying BGL graph and edge weight map, which are heavily redundant, plus an N² double matrix for storing the resulting distance matrix. ImplicitBoundsGraph requires only the N² double matrix.

Notes for reading the implementation:

  • a, b denote internal indices (molecule [0-N) interval)
  • i, j denote external indices (since graph simulates 2N vertices)
  • The functions left, right, isLeft and internal help to interconvert indices
  • If you encounter an expression like i % 2 == j % 2, this tests whether both indices are in the same part (left or right) of the graph.

Some advice on optimization opportunities:

Do not try to apply an algorithm that uses the existing shortest path information from a previous run to calculate the new shortest paths. There are already few shortest paths algorithms well suited to graphs with negative real edge weights. Dynamic algorithms are differentiated into whether they can deal with edge weight changes, vertex additions, vertex removals, etc. A fully dynamic algorithm can deal with any of those (albeit only a singular one at a time). A fully batch dynamic algorithm can deal with an arbitrary number of any type of graph changes. You need one of those, since fixing a distance in ImplicitBoundsGraph either entails changing six edge weights or changing one and adding five new edges.

These algorithms already have a really hard time getting to better bounds than recalculating the solution from scratch, which is where most effort has been poured into.

In the same vein, do not try to decide which parts of the shortest paths tree are volatile after change in order to avoid recalculating the entire solution. The idea here is to avoid recalculating the tree if the next a-b pair is in the stable part of the shortest-paths tree after the prior distance fixture. Creating an oracle to decide this is probably about as hard as creating a batch dynamic SSSP algorithm that can deal with negative edge weights, so: really hard.

Member Function Documentation

std::pair<EdgeDescriptor, bool> Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::edge ( VertexDescriptor  i,
VertexDescriptor  j 
) const

Fetches an edge descriptor for a speculative edge from i to j.

Fetches an edge descriptor for the edge from i to j, and whether there is such an edge. If there is no such edge (i.e. the pair's second boolean is false), then using the EdgeDescriptor in other functions results in undefined behavior.

Complexity \(\Theta(1)\)

const Eigen::MatrixXd& Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::getMatrix ( ) const
inline

Allows const-ref access to the underlying matrix for debug purposes.

Complexity \(\Theta(1)\)

bool Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::hasExplicit ( const EdgeDescriptor &  edge) const

Checks if there is explicit information present for a left-to-right edge.

Complexity \(\Theta(1)\)

static bool Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::isLeft ( const VertexDescriptor  i)
inlinestaticnoexcept

Check if a vertex descriptor is part of the left subgraph.

Complexity \(\Theta(1)\)

outcome::result<Eigen::MatrixXd> Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::makeDistanceBounds ( ) const
noexcept

Generates a bare distance bounds matrix by repeated shortest-paths calculations.

This function generates a matrix representing distance bounds, NOT a DistanceBoundsMatrix.

Complexity: O(N * O(shortest paths algorithm)) Complexity \(\Theta(V^2 \cdot E)\)

outcome::result<Eigen::MatrixXd> Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::makeDistanceMatrix ( Random::Engine engine)
noexcept

Generates a distance matrix by randomly fixing distances within triangle inequality bounds.

This function generates a distance matrix. For every pair of atomic indices a, b (traversed at random), a shortest-paths calculation is carried out to determine the triangle inequality limits between a and b. A random distance is chosen between these limits and added to the underlying graph.

The generation procedure is destructive, i.e. the same ImplicitBoundsGraph cannot be reused to generate multiple distance matrices.

Complexity \(\Theta(V^3 \cdot E)\). Benchmarked by analysis/BenchmarkGraphAlgorithms

NOTE: This double definition may seem strange, but is necessary to use the forward-declared enum class Partiality correctly.

outcome::result<Eigen::MatrixXd> Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::makeDistanceMatrix ( Random::Engine engine,
Partiality  partiality 
)
noexcept

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

double Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::maximalImplicitLowerBound ( VertexDescriptor  i) const

Returns the length of the maximal implicit lower bound outgoing from a left vertex.

Complexity \(\Theta(1)\)

VertexDescriptor Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::num_edges ( ) const

Returns the number of edges currently simulated by the graph.

Returns the number of edges currently simulated by the graph.

Complexity \(\Theta(N^2)\)

VertexDescriptor Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::num_vertices ( ) const

Returns the number of vertices simulated by the graph, which is 2N.

Returns the number of vertices simulated by the graph, which is 2 * N

Complexity \(\Theta(1)\)

VertexDescriptor Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::out_degree ( VertexDescriptor  i) const

Returns the number of out-edges for a particular vertex.

Complexity \(\Theta(N)\)

VertexDescriptor Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::source ( const EdgeDescriptor &  e) const
inline

Returns the source vertex from an edge descriptor.

Complexity \(\Theta(1)\)

VertexDescriptor Scine::Molassembler::DistanceGeometry::ImplicitBoundsGraph::target ( const EdgeDescriptor &  e) const
inline

Returns the target vertex from an edge descriptor.

Complexity \(\Theta(1)\)


The documentation for this class was generated from the following file: