Molassembler  1.0.0
Molecule graph and conformer library
 All Data Structures Namespaces Files Functions Variables Typedefs Enumerations Enumerator Macros Pages
Scine::Molassembler Namespace Reference

Central library namespace. More...

Namespaces

 AtomInfo
 Information on particular element types.
 
 DistanceGeometry
 Distance geometry-related classes and functions.
 
 GraphAlgorithms
 Core graph-level algorithms (not requiring stereopermutator information)
 
 Hashes
 Classes and methods to compute hashes of atom environments.
 
 Interpret
 Given Cartesian coordinates, construct graphs or molecules.
 
 IO
 Input and output.
 
 Random
 Randomness source for the library.
 
 ShapeInference
 Methods to determine local shapes of atoms based on graph information.
 
 Shapes
 Symmetry definitions and properties.
 
 Stereopermutations
 Data classes for permutational spatial arrangement modeling.
 
 Stereopermutators
 Stereopermutator implementation details.
 
 Temple
 Template shorthands, optimizers and constexpr data types.
 

Data Structures

class  AngstromPositions
 A wrapper class around Utils' PositionCollection to emphasize that the positions stored therein are in Angstrom. More...
 
class  AtomStereopermutator
 Handles the steric permutation of substituents of a non-terminal central atom. More...
 
class  BondStereopermutator
 Handles specific relative arrangements of two atom stereopermutators joined by a bond. More...
 
class  Cycles
 Wrapper class to make working with RDL in C++ more pleasant. More...
 
class  DirectedConformerGenerator
 Helper type for directed conformer generation. More...
 
struct  Editing
 Class with static functions providing higher-level molecule operations. More...
 
class  PrivateGraph
 Library internal graph class wrapping BGL types. More...
 
class  Graph
 Represents the connectivity of atoms of a molecule. More...
 
struct  IteratorRange
 Homogeneous pair of iterators with begin and end member fns. More...
 
struct  MolGraphWriter
 Helper class to write the Graph as Graphviz output. More...
 
class  OrderDiscoveryHelper
 Container aiding in gradual discovery of order. More...
 
class  RankingTree
 Central class for unified IUPAC-like ranking of organic and inorganic structures. More...
 
class  Molecule
 Models a molecule as a graph (connectivity of atoms) and a list of stereopermutators. More...
 
struct  Options
 Contains all global settings for the library. More...
 
struct  RankingInformation
 Ranking data of substituents around a central vertex. More...
 
class  JsonSerialization
 Class representing a compact JSON serialization of a molecule. More...
 
class  StereopermutatorList
 Manages all stereopermutators that are part of a Molecule. More...
 
struct  SiteMapping
 
struct  BondIndex
 Type used to refer to particular bonds. Orders first < second. More...
 

Typedefs

using SiteIndex = Temple::StrongIndex< site_index_tag, unsigned >
 
using SiteToShapeVertexMap = Temple::StrongIndexFlatMap< SiteIndex, Shapes::Vertex >
 
using AtomIndex = std::size_t
 Unsigned integer atom index type. Used to refer to particular atoms.
 

Enumerations

enum  ChiralStatePreservation { ChiralStatePreservation::None, ChiralStatePreservation::EffortlessAndUnique, ChiralStatePreservation::Unique, ChiralStatePreservation::RandomFromMultipleBest }
 Specifies the effects of graph modifications on chiral centers. More...
 
enum  ShapeTransition { ShapeTransition::PrioritizeInferenceFromGraph, ShapeTransition::MaximizeChiralStatePreservation }
 Influences the choice of shape in substituent additions and removals that lead to increases or decreases of ligand size. More...
 
enum  BondType : unsigned {
  Single, Double, Triple, Quadruple,
  Quintuple, Sextuple, BondType::Eta
}
 Discrete bond type numeration. More...
 
enum  LengthUnit { Bohr, Angstrom }
 Length units.
 
enum  AtomEnvironmentComponents : unsigned {
  Connectivity = 0, ElementTypes = (1 << 0), BondOrders = (1 << 1), Shapes = (1 << 2),
  Stereopermutations = (1 << 3), All = ElementTypes | BondOrders | Shapes | Stereopermutations
}
 For bitmasks grouping components of immediate atom environments. More...
 

Functions

Utils::BondOrderCollection uffBondOrders (const Utils::ElementTypeCollection &elements, const AngstromPositions &angstromPositions)
 Calculates a floating-point bond order collection via UFF-like bond distance modelling. More...
 
Utils::BondOrderCollection covalentRadiiBondOrders (const Utils::ElementTypeCollection &elements, const AngstromPositions &angstromPositions)
 Calculates a binary (single or none) bond order collection via covalent radii. More...
 
std::vector< outcome::result
< Utils::PositionCollection >> 
generateRandomEnsemble (const Molecule &molecule, unsigned numStructures, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{})
 Generate multiple sets of positional data for a Molecule. More...
 
std::vector< outcome::result
< Utils::PositionCollection >> 
generateEnsemble (const Molecule &molecule, unsigned numStructures, unsigned seed, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{})
 Generate multiple sets of positional data for a Molecule. More...
 
outcome::result
< Utils::PositionCollection
generateRandomConformation (const Molecule &molecule, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{})
 Generate a 3D structure of a Molecule. More...
 
outcome::result
< Utils::PositionCollection
generateConformation (const Molecule &molecule, unsigned seed, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{})
 Generate a 3D structure of a Molecule. More...
 
boost::optional< unsigned > smallestCycleContaining (AtomIndex atom, const Cycles &cycles)
 Yields the size of the smallest cycle containing an atom. More...
 
std::unordered_map< AtomIndex,
unsigned > 
makeSmallestCycleMap (const Cycles &cycleData)
 Creates a mapping from atom index to the size of the smallest cycle containing that index. More...
 
std::vector< AtomIndexmakeRingIndexSequence (std::vector< BondIndex > edgeDescriptors)
 Create cycle vertex sequence from unordered edges. More...
 
std::vector< AtomIndexcentralizeRingIndexSequence (std::vector< AtomIndex > ringIndexSequence, AtomIndex center)
 Centralize a cycle vertex sequence at a particular vertex. More...
 
unsigned countPlanarityEnforcingBonds (const std::vector< BondIndex > &edgeSet, const Graph &graph)
 Count the number of planarity enforcing bonds. More...
 
unsigned numRotatableBonds (const Molecule &mol)
 Calculates a number of freely rotatable bonds in a molecule. More...
 
PrivateGraph::Edge toInner (const BondIndex &bondIndex, const PrivateGraph &graph)
 Transform BondIndex to PrivateGraph::Edge.
 
BondIndex toOuter (const PrivateGraph::Edge &edge, const PrivateGraph &graph)
 Transform PrivateGraph::Edge to BondIndex.
 
std::vector< int > canonicalAutomorphism (const PrivateGraph &inner, const std::vector< Hashes::WideHashType > &hashes)
 Calculate the canonical labeling of a molecule from a coloring specified by a set of hashes. More...
 
std::vector< unsigned > distance (AtomIndex i, const Graph &graph)
 Calculates the graph distance from a single atom index to all others. More...
 
bool enantiomeric (const Molecule &a, const Molecule &b)
 Returns whether three dimensional representations of two molecules are mirror images of one another. More...
 
bool diastereomeric (const Molecule &a, const Molecule &b)
 Determine whether molecules are diastereomers of one another. More...
 
bool epimeric (const Molecule &a, const Molecule &b)
 Determine whether molecules are epimers of one another. More...
 
Molecule enantiomer (const Molecule &a)
 Generates a molecule's enantiomer. More...
 
Random::EnginerandomnessEngine ()
 Randomness source for the entire library. More...
 
SiteToShapeVertexMap siteToShapeVertexMap (const Stereopermutations::Stereopermutation &stereopermutation, const RankingInformation::RankedSitesType &canonicalSites, const std::vector< RankingInformation::Link > &siteLinks)
 Generates a flat mapping from site indices to shape vertices. More...
 
Temple::StrongIndexFlatMap
< Shapes::Vertex, SiteIndex
shapeVertexToSiteIndexMap (const Stereopermutations::Stereopermutation &stereopermutation, const RankingInformation::RankedSitesType &canonicalSites, const std::vector< RankingInformation::Link > &siteLinks)
 Generates a flat mapping from shape vertices to site indices. More...
 
Stereopermutations::Stereopermutation stereopermutationFromSiteToShapeVertexMap (const SiteToShapeVertexMap &siteToShapeVertexMap, const std::vector< RankingInformation::Link > &links, const RankingInformation::RankedSitesType &canonicalSites)
 
std::size_t hash_value (const BondIndex &bond)
 Hash for BondIndex so it can be used as a key type in unordered containers. More...
 

Variables

constexpr bool buildTypeIsDebug = true
 
constexpr unsigned nBondTypes = 7
 Number of distinct bond types present in the library.
 

Detailed Description

Central library namespace.

Enumeration Type Documentation

For bitmasks grouping components of immediate atom environments.

Differing strictnesses of comparisons may be desirable for various purposes, hence a modular comparison function is provided.

Warning
Setting Stereopermutations without setting Shapes does nothing.
enum Scine::Molassembler::BondType : unsigned
strong

Discrete bond type numeration.

Bond type enumeration. Besides the classic organic single, double and triple bonds, bond orders up to sextuple are explicitly included.

Enumerator
Eta 

Internal bond order to mark haptic binding sites.

Bond order used internally only that relabels bonds in haptic binding sites. Name from the standard eta notation for haptic ligands.

Specifies the effects of graph modifications on chiral centers.

In case a substituent is added or removed at a stereopermutator, an attempt is made to transfer chiral information into the new geometry. How this attempt is made can be altered.

The following graphs are to illustrate which chiral information transfers are possible under which setting.

The first is for the situation of ligand gain. There are very few edges and shapes in this graph because in nearly all cases, there are multiple best mappings, and propagation in these cases are enabled only in RandomFromMultipleBest, in which case the graph gets quite dense, so it's not shown. The green edges here are the ligand gain situations propagated under EffortlessAndUnique.

The second is for the situation of ligand loss. The edge label indicates the group of shape ligands from which a ligand is being removed. Green edges represent situtations propagated under EffortlessAndUnique, black edges those under Unique. RandomFromMultipleBest is not shown due to graph density.

Enumerator
None 

Don't try to preserve chiral state.

EffortlessAndUnique 

Use only completely unambiguous zero-effort mappings. Those are the green edges in the graphs below. Note that the ligand gain situation from square planar to square pyramidal is not unique, and therefore not shown as green.

Unique 

Propagates if the best shape mapping is unique, i.e. there are no other mappings with the same quality measures. This enables all green and black edges.

RandomFromMultipleBest 

Chooses randomly from the set of best mappings, permitting chiral state propagation in all cases. So propagating chiral state from square planar to square pyramidal is now possible – there are two ways of placing the new apical ligand – but you only get one of them.

Influences the choice of shape in substituent additions and removals that lead to increases or decreases of ligand size.

Enumerator
PrioritizeInferenceFromGraph 

Try to infer a shape from graph information first. Supplant with best shape for chiral state preservation.

MaximizeChiralStatePreservation 

Always choose the shape so that the maximum amount of chiral information can be preserved (as per ChiralStatePreservation setting).

Function Documentation

std::vector<int> Scine::Molassembler::canonicalAutomorphism ( const PrivateGraph &  inner,
const std::vector< Hashes::WideHashType > &  hashes 
)

Calculate the canonical labeling of a molecule from a coloring specified by a set of hashes.

Complexity Underlying algorithm is exponential-time, but we use graph invariants in addition, which should give us sub-exponential complexity for common cases.

Parameters
innerThe inner graph representation of a Molecule
hashesA flat map of hashes for each vertex
Exceptions
std::domain_errorIf the size of mol's graph exceeds the maximum value of int. This is the limit for the underlying labeling algorithm.
std::invalid_argumentIf vector of hashes length does not match the molecule's number of vertices.
Warning
The canonical form of the molecule is influenced by the provided hashes. Any comparisons on the canonical form must use the same comparison bitmask as when generating the hashes for this function call here.
Returns
The canonical labeling sequence of atom indices into mol.
std::vector<AtomIndex> Scine::Molassembler::centralizeRingIndexSequence ( std::vector< AtomIndex >  ringIndexSequence,
AtomIndex  center 
)

Centralize a cycle vertex sequence at a particular vertex.

Complexity \(O(N)\)

unsigned Scine::Molassembler::countPlanarityEnforcingBonds ( const std::vector< BondIndex > &  edgeSet,
const Graph &  graph 
)

Count the number of planarity enforcing bonds.

Counts the number of planarity enforcing bonds in a set of edge descriptors. Double bonds are considered planarity enforcing.

Complexity \(O(N)\)

Utils::BondOrderCollection Scine::Molassembler::covalentRadiiBondOrders ( const Utils::ElementTypeCollection &  elements,
const AngstromPositions &  angstromPositions 
)

Calculates a binary (single or none) bond order collection via covalent radii.

Complexity \(\Theta(N^2)\)

Note
This is just a convenience forwarder for Utils::BondDetector::detectBonds for different types
bool Scine::Molassembler::diastereomeric ( const Molecule &  a,
const Molecule &  b 
)

Determine whether molecules are diastereomers of one another.

Two molecules are diastereomers if they are not mirror images of one another and have different configurations at one or more stereocenters.

std::vector<unsigned> Scine::Molassembler::distance ( AtomIndex  i,
const Graph &  graph 
)

Calculates the graph distance from a single atom index to all others.

Performs a BFS through the entire graph starting at the supplied index and records the distance of vertices it encounters.

Complexity \(O(N)\)

Exceptions
std::out_of_rangeIf i >= N()
Returns
A vector containing the distances of all vertices to the supplied index
Molecule Scine::Molassembler::enantiomer ( const Molecule &  a)

Generates a molecule's enantiomer.

Complexity \(O(A)\) where \(A\) is the number of chiral atom stereopermutators of one molecule

Parameters
aThe molecule whose enantiomer to generate
Note
If there are no atom stereopermutators in this molecule with more than one stereopermutations, yields a Molecule identical to a
Warning
This has been tested very little
Returns
The enantiomer to a molecule
bool Scine::Molassembler::enantiomeric ( const Molecule &  a,
const Molecule &  b 
)

Returns whether three dimensional representations of two molecules are mirror images of one another.

This algorithm looks through both Molecules' StereopermutatorLists and checks whether each pair of assigned AtomStereopermutators are mirror images of one another and there is at least one enantiomeric pair.

Note
When determining enantiomerism of a pair of AtomStereopermutators, if either permutator is unassigned, then three-dimensional representations may be enantiomeric or not (determined by random choice of assignments at conformer generation time). In these cases, this function returns false.
bool Scine::Molassembler::epimeric ( const Molecule &  a,
const Molecule &  b 
)

Determine whether molecules are epimers of one another.

Two molecules are epimers if they differ in exactly one stereocenter.

outcome::result<Utils::PositionCollection> Scine::Molassembler::generateConformation ( const Molecule &  molecule,
unsigned  seed,
const DistanceGeometry::Configuration &  configuration = DistanceGeometry::Configuration{} 
)

Generate a 3D structure of a Molecule.

Parameters
moleculeThe molecule for which to generate three-dimensional positions. This molecule may not contain stereopermutators with zero assignments.
seedA number to seed the pseudo-random number generator used in conformer generation with
configurationThe configuration object to control Distance Geometry in detail. The defaults are usually fine.

Complexity Roughly \(O(N^3)\)

See Also
generateEnsemble
Returns
A result type which may or may not contain a PositionCollection (in Bohr length units). The result type is much like an optional, except that in the error case it carries data about the error in order to help diagnose possible mistakes made in the molecular graph specification.
std::vector< outcome::result<Utils::PositionCollection>> Scine::Molassembler::generateEnsemble ( const Molecule &  molecule,
unsigned  numStructures,
unsigned  seed,
const DistanceGeometry::Configuration &  configuration = DistanceGeometry::Configuration{} 
)

Generate multiple sets of positional data for a Molecule.

In the case of a molecule that does not have unassigned stereopermutators, this is akin to generating a conformational ensemble. If there are unassigned stereopermutators, these are assigned at random (consistent with relative statistical occurrences of stereopermutations) for each structure.

Parameters
moleculeThe molecule for which to generate sets of three-dimensional positions. This molecule may not contain stereopermutators with zero assignments.
numStructuresThe number of desired structures to generate
seedA number to seed the pseudo-random number generator used in conformer generation with
configurationThe configuration object to control Distance Geometry in detail. The defaults are usually fine.
Precondition
molecule may not contain stereopermutators with zero assignments as this means that the molecule is not representable in three dimensions.
configuration's preconditions must be met

Complexity Roughly \(O(C \cdot N^3)\) where \(C\) is the number of conformers and \(N\) is the number of atoms in molecule

Note
The Distance Geometry procedure can fail stochastically without fault in the input. See the documentation of DgError for detailed description of error return codes and how to deal with them.
This function is parallelized. Use the OMP_NUM_THREADS environment variable to control the number of threads used. Results are sequenced and reproducible.
Returns
A result type which may or may not contain a vector of PositionCollections (in Bohr length units). The result type is much like an optional, except that in the error case it carries data about the error in order to help diagnose possible mistakes made in the molecular graph specification.
outcome::result<Utils::PositionCollection> Scine::Molassembler::generateRandomConformation ( const Molecule &  molecule,
const DistanceGeometry::Configuration &  configuration = DistanceGeometry::Configuration{} 
)

Generate a 3D structure of a Molecule.

Parameters
moleculeThe molecule for which to generate three-dimensional positions. This molecule may not contain stereopermutators with zero assignments.
configurationThe configuration object to control Distance Geometry in detail. The defaults are usually fine.

Complexity Roughly \(O(N^3)\)

Note
This function advances the state of the global PRNG.
See Also
generateEnsemble
Returns
A result type which may or may not contain a PositionCollection (in Bohr length units). The result type is much like an optional, except that in the error case it carries data about the error in order to help diagnose possible mistakes made in the molecular graph specification.
std::vector< outcome::result<Utils::PositionCollection>> Scine::Molassembler::generateRandomEnsemble ( const Molecule &  molecule,
unsigned  numStructures,
const DistanceGeometry::Configuration &  configuration = DistanceGeometry::Configuration{} 
)

Generate multiple sets of positional data for a Molecule.

In the case of a molecule that does not have unassigned stereopermutators, this is akin to generating a conformational ensemble. If there are unassigned stereopermutators, these are assigned at random (consistent with relative statistical occurrences of stereopermutations) for each structure.

Parameters
moleculeThe molecule for which to generate sets of three-dimensional positions. This molecule may not contain stereopermutators with zero assignments.
numStructuresThe number of desired structures to generate
configurationThe configuration object to control Distance Geometry in detail. The defaults are usually fine.
Precondition
molecule may not contain stereopermutators with zero assignments as this means that the molecule is not representable in three dimensions.
configuration's preconditions must be met

Complexity Roughly \(O(C \cdot N^3)\) where \(C\) is the number of conformers and \(N\) is the number of atoms in molecule

Note
The Distance Geometry procedure can fail stochastically without fault in the input. See the documentation of DgError for detailed description of error return codes and how to deal with them.
This function is parallelized. Use the OMP_NUM_THREADS environment variable to control the number of threads used. Results are sequenced and reproducible.
This function advances the state of the global PRNG.
Returns
A result type which may or may not contain a vector of PositionCollections (in Bohr length units). The result type is much like an optional, except that in the error case it carries data about the error in order to help diagnose possible mistakes made in the molecular graph specification.
auto ensemble = generateRandomEnsemble(mol, 10);
unsigned conformerIndex = 0;
for(auto& conformerResult : ensemble) {
if(conformerResult) {
std::to_string(conformerIndex) + ".mol",
mol,
conformerResult.value()
);
} else {
std::cout << "Conformer " << conformerIndex << " failed: "
<< conformerResult.error().message() << "\n";
}
++conformerIndex;
}
std::size_t Scine::Molassembler::hash_value ( const BondIndex &  bond)

Hash for BondIndex so it can be used as a key type in unordered containers.

Complexity \(\Theta(1)\)

std::vector<AtomIndex> Scine::Molassembler::makeRingIndexSequence ( std::vector< BondIndex >  edgeDescriptors)

Create cycle vertex sequence from unordered edges.

From a set of unordered graph edge descriptors, this function creates one of the two possible vertex index sequences describing the cycle.

Complexity \(O(E^2)\) worst case

std::unordered_map<AtomIndex, unsigned> Scine::Molassembler::makeSmallestCycleMap ( const Cycles &  cycleData)

Creates a mapping from atom index to the size of the smallest cycle containing that index.

Complexity \(\Theta(R)\) where \(R\) is the number of relevant cycles in the molecule

Note
The map does not contain entries for indices not enclosed by a cycle.
unsigned Scine::Molassembler::numRotatableBonds ( const Molecule &  mol)

Calculates a number of freely rotatable bonds in a molecule.

The number of rotatable bonds is calculated as a bond-wise sum of contributions that is rounded at the end:

A bond can contribute to the number of rotatable bonds if

  • It is of bond type Single
  • Neither atom connected by the bond is terminal
  • There is no assigned BondStereopermutator on the bond

A bond meeting the prior criteria:

  • If not part of a cycle, contributes a full rotatable bond to the sum
  • If part of a cycle, contributes (S - 3) / S to the sum (where S is the cycle size).

Complexity \(\Theta(B)\) where \(B\) is the number of bonds

Warning
The number of rotatable bonds is an unphysical descriptor and definitions differ across libraries. Take the time to read the algorithm description implemented here and do some testing. If need be, all information used by this algorithm is accessible from the Molecule interface, and a custom algorithm can be implemented.
Random::Engine& Scine::Molassembler::randomnessEngine ( )

Randomness source for the entire library.

This instance supplies the library with randomness. The default seeding behavior is build-type dependent. In debug builds, the seed is fixed. In release builds, the generator is seeded from std::random_device.

Note
If you wish to get deterministic behavior in release builds, re-seed the generator.

Complexity \(\Theta(1)\)

Warning
Do not use this instance in any static object's destructor!
Temple::StrongIndexFlatMap<Shapes::Vertex, SiteIndex> Scine::Molassembler::shapeVertexToSiteIndexMap ( const Stereopermutations::Stereopermutation &  stereopermutation,
const RankingInformation::RankedSitesType &  canonicalSites,
const std::vector< RankingInformation::Link > &  siteLinks 
)

Generates a flat mapping from shape vertices to site indices.

Generates exactly the inverse map to generateSiteToSymmetryPositionMap

auto mapping = shapeVertexToSiteIndexMap(...);
unsigned siteIndexAtSymmetryPositionFive = mapping.at(5u);

Complexity \(\Theta(N)\)

SiteToShapeVertexMap Scine::Molassembler::siteToShapeVertexMap ( const Stereopermutations::Stereopermutation &  stereopermutation,
const RankingInformation::RankedSitesType &  canonicalSites,
const std::vector< RankingInformation::Link > &  siteLinks 
)

Generates a flat mapping from site indices to shape vertices.

Generates a mapping from site indices to shape vertices according to the ranking character distribution to shape vertex of a stereopermutation (its characters member) and any defined links between shape positions.

auto mapping = siteToShapeVertexMap(...);
unsigned symmetryPositionOfSiteFour = mapping.at(4u);

Complexity \(\Theta(N)\)

boost::optional<unsigned> Scine::Molassembler::smallestCycleContaining ( AtomIndex  atom,
const Cycles &  cycles 
)

Yields the size of the smallest cycle containing an atom.

Complexity \(O(U + C)\) where \(U\) is the number of unique ring families of the molecule and \(C\) is the number of cycles containing atom

Warning
Do not use this a lot. Consider makeSmallestCycleMap() instead.
Utils::BondOrderCollection Scine::Molassembler::uffBondOrders ( const Utils::ElementTypeCollection &  elements,
const AngstromPositions &  angstromPositions 
)

Calculates a floating-point bond order collection via UFF-like bond distance modelling.

Complexity \(\Theta(N^2)\)

Exceptions
std::logic_errorIf interpreted fractional bond orders are greater than 6.5. In these cases, the structure is most likely unreasonable.
Warning
UFF parameter bond order calculation is a very primitive approximation and carries a high risk of misinterpretation