Molassembler  3.0.0
Molecule graph and conformer library
Scine::Molassembler::DirectedConformerGenerator Class Reference

Helper type for directed conformer generation.

#include <DirectedConformerGenerator.h>

Data Structures

struct  EnumerationSettings
 Settings for enumeration.
 
class  Impl
 
struct  Relabeler
 Relabeler for decision lists with minimized structures.
 

Public Member Functions

Constructors
 DirectedConformerGenerator (Molecule molecule, BondStereopermutator::Alignment alignment=BondStereopermutator::Alignment::Staggered, const BondList &bondsToConsider={})
 Constructor.
 
Special member functions
 DirectedConformerGenerator (DirectedConformerGenerator &&other) noexcept
 
DirectedConformerGenerator & operator= (DirectedConformerGenerator &&other) noexcept
 
 DirectedConformerGenerator (const DirectedConformerGenerator &other)=delete
 
DirectedConformerGenerator & operator= (const DirectedConformerGenerator &other)=delete
 
 ~DirectedConformerGenerator ()
 
Modification
DecisionList generateNewDecisionList (Random::Engine &engine=randomnessEngine())
 Generate a new list of discrete dihedral arrangement choices.
 
bool insert (const DecisionList &decisionList)
 Adds a decision list to the underlying set-like data structure.
 
bool contains (const DecisionList &decisionList) const
 Checks whether a DecisionList is part of the underlying set.
 
Information
BondStereopermutator::Alignment alignment () const
 Get the alignment with which this generator was instantiated.
 
const BondList & bondList () const
 Accessor for list of relevant bonds.
 
unsigned decisionListSetSize () const
 Number of conformer decision lists stored in the underlying set-like data structure.
 
unsigned idealEnsembleSize () const
 Number of conformers needed for full ensemble.
 
Result< Utils::PositionCollection > generateRandomConformation (const DecisionList &decisionList, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{}, BondStereopermutator::FittingMode fitting=BondStereopermutator::FittingMode::Nearest) const
 Try to generate a conformer for a particular decision list.
 
Result< Utils::PositionCollection > generateConformation (const DecisionList &decisionList, unsigned seed, const DistanceGeometry::Configuration &configuration=DistanceGeometry::Configuration{}, BondStereopermutator::FittingMode fitting=BondStereopermutator::FittingMode::Nearest) const
 Try to generate a conformer for a particular decision list.
 
Molecule conformationMolecule (const DecisionList &decisionList) const
 Yields a molecule reference for a particular decision list.
 
DecisionList getDecisionList (const Utils::AtomCollection &atomCollection, BondStereopermutator::FittingMode mode=BondStereopermutator::FittingMode::Nearest) const
 Infer a decision list for relevant bonds from an atom collection.
 
DecisionList getDecisionList (const Utils::PositionCollection &positions, BondStereopermutator::FittingMode mode=BondStereopermutator::FittingMode::Thresholded) const
 Infer a decision list for relevant bonds from positional information only.
 
void enumerate (std::function< void(const DecisionList &, Utils::PositionCollection)> callback, unsigned seed, const EnumerationSettings &settings={})
 Enumerate all conformers of the captured molecule.
 
void enumerateRandom (std::function< void(const DecisionList &, Utils::PositionCollection)> callback, const EnumerationSettings &settings={})
 Enumerate all conformers of the captured molecule.
 
Relabeler relabeler () const
 Generates a relabeler for the molecule and considered bonds.
 
std::vector< int > binMidpointIntegers (const DecisionList &decision) const
 Relabels a DecisionList into bin midpoint integers.
 
std::vector< std::pair< int, int > > binBounds (const DecisionList &decision) const
 Relabels a DecisionList into the bounds of its bin.
 

Static Public Member Functions

Static functions
static boost::variant< IgnoreReason, BondStereopermutator > considerBond (const BondIndex &bondIndex, const Molecule &molecule, BondStereopermutator::Alignment alignment=BondStereopermutator::Alignment::Staggered)
 Decide whether to consider a bond's dihedral values for directed conformer generation or not.
 
static unsigned distance (const DecisionList &a, const DecisionList &b, const DecisionList &bounds)
 Calculates a distance metric between two decision lists for dihedral permutations.
 

Private Attributes

std::unique_ptr< Impl > pImpl_
 

Public types

enum  IgnoreReason {
  IgnoreReason::AtomStereopermutatorPreconditionsUnmet, IgnoreReason::HasAssignedBondStereopermutator, IgnoreReason::HasTerminalConstitutingAtom, IgnoreReason::InCycle,
  IgnoreReason::IsEtaBond, IgnoreReason::RotationIsIsotropic
}
 Reasons why a bond may be ignored for directed conformer generation.
 
using BondList = std::vector< BondIndex >
 Type used to represent the list of bonds relevant to directed conformer generation.
 
using DecisionList = std::vector< std::uint8_t >
 Type used to represent assignments at bonds.
 
static constexpr std::uint8_t unknownDecision = std::numeric_limits<std::uint8_t>::max()
 Value set in decision lists if no decision could be recovered.
 

Detailed Description

Helper type for directed conformer generation.

Generates new combinations of BondStereopermutator assignments and provides helper functions both for generating conformers from these combinations and for the reverse, recovering the combinations from conformers.

It is important, however, that you lower your expectations for the modeling of dihedral energy minima. Molassembler does not require you to supply a correct graph, never detects or kekulizes aromatic systems, and does not ask you to supply an overall charge for a molecule. It should therefore be understandable that the manner in which Molassembler decides where dihedral energy minima lie is somewhat underpowered. The manner in which shape vertices are aligned in stereopermutation enumeration is not even strictly based on a physical principle. We suggest the following to make the most of what the library can do for you:

  • Read the documentation for the various alignments. Consider using not just the default Staggered alignment, but either EclipsedAndStaggered or BetweenEclipsedAndStaggered to improve your chances of capturing all rotational minima. This will likely generate more conformers than strictly required, but should capture all minima.
  • Energy minimize all generated conformers with a suitable method and then deduplicate.
  • Consider using the Relabeler to do a final deduplication step.

Client code for generating conformers could look something like this. Make sure to distinguish the case in which the list of considered bonds returned by bondList() is empty, since many member functions behave differently in that case.

// Assumes namespace Scine::Molassembler and the headers for the generator,
// IO::read and the free generateRandomConformation function are in scope
auto mol = IO::read(...);
DirectedConformerGenerator generator {mol};
std::vector<Utils::PositionCollection> conformers;
if(generator.bondList().empty()) {
  // The generator has decided that there are no bonds that need to be
  // systematically rotated for directed conformer generation. You might
  // as well just use a conformation directly from the free function
  // generateRandomConformation for the sole conformation needed for a full ensemble
  auto conformerResult = generateRandomConformation(mol);
  if(conformerResult) {
    conformers.push_back(std::move(conformerResult.value()));
  } else {
    std::cout << "Could not generate conformer: " << conformerResult.error().message() << "\n";
  }
} else {
  // Draw decision lists until every combination has been generated once
  while(generator.decisionListSetSize() != generator.idealEnsembleSize()) {
    auto newDecisionList = generator.generateNewDecisionList();
    auto conformerResult = generator.generateRandomConformation(newDecisionList);
    if(conformerResult) {
      conformers.push_back(std::move(conformerResult.value()));
    } else {
      std::cout << "Could not generate conformer: " << conformerResult.error().message() << "\n";
    }
  }
}
Note
This type is not copyable.

Member Typedef Documentation

Type used to represent assignments at bonds.

Note
You can serialize and deserialize this with Scine::base64::encode and Scine::base64::decode. It is not the most efficient representation, but it is still better than giving each position its own character.
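The note above names Scine::base64::encode and Scine::base64::decode without documenting their signatures. To illustrate what such a text representation of a DecisionList looks like, here is a standalone sketch of standard RFC 4648 base64 encoding of a std::vector<std::uint8_t>. The helper encodeDecisionList is hypothetical and is not the Scine implementation; its output may differ from what Scine::base64::encode produces.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for DirectedConformerGenerator::DecisionList
using DecisionList = std::vector<std::uint8_t>;

// Hypothetical helper: standard (RFC 4648) base64 encoding of a decision list.
// Every three bytes become four characters; incomplete groups are padded with '='.
std::string encodeDecisionList(const DecisionList& list) {
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((list.size() + 2) / 3) * 4);
  for(std::size_t i = 0; i < list.size(); i += 3) {
    // Pack up to three bytes into a 24-bit group
    const std::size_t remaining = list.size() - i;
    std::uint32_t group = static_cast<std::uint32_t>(list[i]) << 16;
    if(remaining > 1) {
      group |= static_cast<std::uint32_t>(list[i + 1]) << 8;
    }
    if(remaining > 2) {
      group |= static_cast<std::uint32_t>(list[i + 2]);
    }
    // Emit four 6-bit characters, padding where input bytes were missing
    out.push_back(alphabet[(group >> 18) & 0x3F]);
    out.push_back(alphabet[(group >> 12) & 0x3F]);
    out.push_back(remaining > 1 ? alphabet[(group >> 6) & 0x3F] : '=');
    out.push_back(remaining > 2 ? alphabet[group & 0x3F] : '=');
  }
  return out;
}
```

For example, encodeDecisionList({1, 5}) yields "AQU=". Each byte carries one decision, so when the number of choices per bond is small most bits of each byte go unused, which is why such an encoding is compact but not optimal.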

Member Enumeration Documentation

Reasons why a bond may be ignored for directed conformer generation.

Enumerator
AtomStereopermutatorPreconditionsUnmet 

The bond does not have an assigned atom stereopermutator on both of its ends.

HasAssignedBondStereopermutator 

There is already an assigned bond stereopermutator on the bond.

HasTerminalConstitutingAtom 

At least one constituting atom is terminal.

InCycle 

This bond is in a cycle.

Although cycle bonds may very well contribute to the conformational ensemble, it is difficult to reason about the conformational flexibility of cycles:

  • Is a cycle aromatic or anti-aromatic?
  • Is there a partially conjugated system?
  • Are trans-arrangements of cycle atom sequences feasible in the cycle?

We see four possible strategies of dealing with cycle bonds:

  1. Consider all of them.
  2. Add chemically intuitive reasoning to exclude some bonds in cycles from consideration.
  3. Use Distance Geometry to reason about possible dihedrals.
  4. Consider none of them.

The first strategy has several important drawbacks. Dihedrals are heavily restricted and/or correlated in the chemically common small cycles, and most combinations, if not nearly all, will not be representable in three dimensions. Bonds from cycles also incur a heavy cost in the decision list set representation and in the computational time needed to generate conformations: mere triangle inequality smoothing cannot determine whether these conformers are representable, so a full refinement is carried out for each.

In contrast, the second strategy could yield properly limited assignment possibilities for common chemical patterns. However, the algorithms needed to answer the questions listed above are complex and could easily fail outside common organic chemical patterns.

The third strategy is the cleanest, but also likely considerably more computationally expensive.

For now, we take strategy number four - ignoring bonds in cycles for directed conformer generation - until we can dedicate some resources to approach three.
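The drawback of the first strategy hinges on what triangle inequality smoothing can and cannot establish. The sketch below is a generic, textbook Floyd-Warshall-style smoothing of symmetric lower and upper distance bounds; the name smoothBounds and the dense matrix representation are assumptions, not Molassembler's implementation. It tightens bounds that violate the triangle inequality and reports a contradiction if some lower bound ends up above its upper bound. Passing this check is necessary but not sufficient for the distances to be realizable in three dimensions, which is why a full refinement would still be needed to rule out unrepresentable cycle dihedral combinations.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Generic triangle inequality bound smoothing (Floyd-Warshall style).
// lower and upper are symmetric n x n distance bound matrices with zero diagonals.
// Returns false if some lower bound ends up above its upper bound.
bool smoothBounds(Matrix& lower, Matrix& upper) {
  const std::size_t n = upper.size();
  for(std::size_t k = 0; k < n; ++k) {
    for(std::size_t i = 0; i < n; ++i) {
      for(std::size_t j = 0; j < n; ++j) {
        if(i == j || i == k || j == k) {
          continue;
        }
        // Upper bound: d(i, j) <= d(i, k) + d(k, j)
        if(upper[i][j] > upper[i][k] + upper[k][j]) {
          upper[i][j] = upper[i][k] + upper[k][j];
        }
        // Lower bounds: d(i, j) >= d(i, k) - d(k, j) and d(i, j) >= d(j, k) - d(k, i)
        if(lower[i][j] < lower[i][k] - upper[k][j]) {
          lower[i][j] = lower[i][k] - upper[k][j];
        }
        if(lower[i][j] < lower[j][k] - upper[k][i]) {
          lower[i][j] = lower[j][k] - upper[k][i];
        }
        if(lower[i][j] > upper[i][j]) {
          return false;  // Bounds are contradictory
        }
      }
    }
  }
  return true;
}
```

For three atoms with upper bounds 1 and 1.5 on two pairs and a loose upper bound of 5 on the third, smoothing tightens the third upper bound to 2.5; raising its lower bound to 3 makes the bounds contradictory.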

IsEtaBond 

This bond is an eta bond (it indicates bonding to a haptic ligand and is therefore excluded).

RotationIsIsotropic 

Rotation about this bond is isotropic (on at least one side, all ligands have the same ranking).

Constructor & Destructor Documentation

Scine::Molassembler::DirectedConformerGenerator::DirectedConformerGenerator ( Molecule  molecule,
BondStereopermutator::Alignment  alignment = BondStereopermutator::Alignment::Staggered,
const BondList & bondsToConsider = {} 
)
explicit

Constructor.

Parameters
molecule  Molecule for which to generate conformers
alignment  Alignment with which to generate BondStereopermutator on considered bonds
bondsToConsider  A list of suggestions of which bonds to consider. Bonds for which considerBond() yields an IgnoreReason will still be ignored. If the list is empty, all bonds of a molecule will be tested against considerBond().

Complexity \(\Theta(B)\), where \(B\) is the number of bonds in the molecule, or the size of bondsToConsider if it is not empty. If the molecule contains a particularly large shape of size \(S\), a \(\Theta(S!)\) term can dominate instead.

Member Function Documentation

std::vector<int> Scine::Molassembler::DirectedConformerGenerator::binMidpointIntegers ( const DecisionList & decision) const

Relabels a DecisionList into bin midpoint integers, i.e. returns the dihedral angles as integers. Each angle is defined as the first angle of the stereopermutation at the given bond (note that this is inconsistent with its definition in Relabeler::add).

const BondList& Scine::Molassembler::DirectedConformerGenerator::bondList ( ) const

Accessor for list of relevant bonds.

Complexity \(\Theta(1)\)

Note
This list may be empty. Many member functions may throw under these conditions.

Molecule Scine::Molassembler::DirectedConformerGenerator::conformationMolecule ( const DecisionList & decisionList) const

Yields a molecule reference for a particular decision list.

Complexity \(\Theta(N)\) bond stereopermutator assignments

static boost::variant<IgnoreReason, BondStereopermutator> Scine::Molassembler::DirectedConformerGenerator::considerBond ( const BondIndex & bondIndex,
const Molecule & molecule,
BondStereopermutator::Alignment  alignment = BondStereopermutator::Alignment::Staggered 
)
static

Decide whether to consider a bond's dihedral values for directed conformer generation or not.

Parameters
bondIndex  The bond to consider
molecule  The molecule in which the bond exists
alignment  Alignment to generate BondStereopermutator instances with.

Complexity \(O(S!)\), where \(S\) is the size of the larger of the two shapes at the atoms constituting bondIndex

See Also
makeSmallestCycleMap
Returns
Either a reason why the bond was ignored, or a BondStereopermutator placed on the suggested bond indicating that the bond should be considered.
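Since considerBond() returns either an IgnoreReason or a BondStereopermutator, callers typically branch on which alternative is held. The sketch below shows that pattern with std::variant and std::visit as a stand-in for boost::variant, and with a hypothetical PlacedStereopermutator struct in place of the real BondStereopermutator. Only the enumerator names are copied from IgnoreReason; describe() and everything else here are illustrative assumptions, not the library's API.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// Enumerators copied from DirectedConformerGenerator::IgnoreReason
enum class IgnoreReason {
  AtomStereopermutatorPreconditionsUnmet,
  HasAssignedBondStereopermutator,
  HasTerminalConstitutingAtom,
  InCycle,
  IsEtaBond,
  RotationIsIsotropic
};

// Hypothetical stand-in for a BondStereopermutator placed on the bond
struct PlacedStereopermutator {
  unsigned numAssignments;
};

// std::variant stands in for the boost::variant returned by considerBond()
using ConsiderResult = std::variant<IgnoreReason, PlacedStereopermutator>;

// Summarize the outcome of a considerBond-like check
std::string describe(const ConsiderResult& result) {
  return std::visit([](const auto& value) -> std::string {
    using T = std::decay_t<decltype(value)>;
    if constexpr(std::is_same_v<T, IgnoreReason>) {
      return value == IgnoreReason::InCycle ? "ignored: in cycle" : "ignored";
    } else {
      return "considered with " + std::to_string(value.numAssignments) + " assignments";
    }
  }, result);
}
```

With the real boost::variant return type, the same dispatch is usually written with boost::get or boost::apply_visitor.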

bool Scine::Molassembler::DirectedConformerGenerator::contains ( const DecisionList & decisionList) const

Checks whether a DecisionList is part of the underlying set.

Exceptions
std::logic_error  If the result of bondList() is empty, i.e. there are no bonds to consider for directed conformer generation.

Complexity \(\Theta(N)\)

unsigned Scine::Molassembler::DirectedConformerGenerator::decisionListSetSize ( ) const

Number of conformer decision lists stored in the underlying set-like data structure.

Returns
The number of DecisionLists stored in the underlying set.

Complexity \(\Theta(1)\)

Warning
If bondList() returns an empty list, i.e. there are no bonds to consider for directed conformer generation, this always returns zero.

static unsigned Scine::Molassembler::DirectedConformerGenerator::distance ( const DecisionList & a,
const DecisionList & b,
const DecisionList & bounds 
)
static

Calculates a distance metric between two decision lists for dihedral permutations.

The distance metric is:

\(d = \sum_i \min\left((a_i - b_i) \bmod U_i,\ (b_i - a_i) \bmod U_i\right)\)

where \(a_i\) is the choice in a at position \(i\) and likewise for b, and \(U_i\) is the upper exclusive bound on the choice values at position \(i\).

This is akin to the shortest distance between the choices when arranged in a modular number circle.

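using DecisionList = DirectedConformerGenerator::DecisionList;
const DecisionList a {1, 5};
const DecisionList b {3, 0};
const DecisionList bounds {5, 6};
// Position 0: min((1 - 3) mod 5, (3 - 1) mod 5) = min(3, 2) = 2
// Position 1: min((5 - 0) mod 6, (0 - 5) mod 6) = min(5, 1) = 1
unsigned d = DirectedConformerGenerator::distance(a, b, bounds); // Yields 2 + 1 = 3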
Parameters
a  The first decision list
b  The second decision list
bounds  Upper exclusive bound on values at each position

Complexity \(\Theta(N)\), where \(N\) is the length of the decision lists

Returns
A distance metric between a and b.
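For reference, the metric above can be written out directly. The following standalone sketch computes the sum of shortest modular distances and reproduces the worked example; the free function decisionListDistance is hypothetical (the library exposes the metric as the static member distance()), and its length check is a choice made for this sketch only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using DecisionList = std::vector<std::uint8_t>;

// Sum over positions of the shorter way around a modular circle of size bounds[i]
unsigned decisionListDistance(const DecisionList& a, const DecisionList& b,
                              const DecisionList& bounds) {
  if(a.size() != b.size() || a.size() != bounds.size()) {
    throw std::invalid_argument("Decision lists and bounds must have equal length");
  }
  unsigned d = 0;
  for(std::size_t i = 0; i < a.size(); ++i) {
    const int u = bounds[i];
    // Non-negative remainders, since a[i] - b[i] may be negative
    const int forward = ((a[i] - b[i]) % u + u) % u;
    const int backward = ((b[i] - a[i]) % u + u) % u;
    d += static_cast<unsigned>(std::min(forward, backward));
  }
  return d;
}
```

With a = {1, 5}, b = {3, 0} and bounds = {5, 6}, this returns 3. All bounds must be at least 1, since a zero bound would divide by zero; how the library's distance() treats such input is not documented here.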

void Scine::Molassembler::DirectedConformerGenerator::enumerate ( std::function< void(const DecisionList &, Utils::PositionCollection)>  callback,
unsigned  seed,
const EnumerationSettings & settings = {} 
)

Enumerate all conformers of the captured molecule.

Clears the stored set of decision lists, then enumerates all conformers of the molecule in parallel.

Parameters
callback  Function called with the decision list and conformer positions for each successfully generated pair. The callback is guaranteed never to be called simultaneously, even in parallel execution.
seed  Randomness initiator for decision list and conformer generation
settings  Further parameters for enumeration algorithms
Note
This function is parallelized. Use the OMP_NUM_THREADS environment variable to control the number of threads used. Callback invocations are unsequenced but the arguments are reproducible.
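Enumerating all conformers amounts to visiting every decision list whose entries lie below the per-bond choice counts. A simple sequential way to do that is an odometer-style counter, sketched below. The bounds vector and the forEachDecisionList name are assumptions for illustration; the library's enumerate() additionally generates a conformer for each list in parallel, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using DecisionList = std::vector<std::uint8_t>;

// Visit every decision list with 0 <= list[i] < bounds[i], like an odometer:
// increment the last position and carry into earlier positions on overflow.
void forEachDecisionList(const DecisionList& bounds,
                         const std::function<void(const DecisionList&)>& callback) {
  if(bounds.empty()) {
    return;  // No relevant bonds, nothing to enumerate
  }
  for(std::uint8_t bound : bounds) {
    if(bound == 0) {
      return;  // A position without any choices admits no lists
    }
  }
  DecisionList current(bounds.size(), 0);
  while(true) {
    callback(current);
    // Advance from the last position, carrying over as needed
    std::size_t i = bounds.size();
    while(i > 0) {
      --i;
      if(++current[i] < bounds[i]) {
        break;
      }
      current[i] = 0;
      if(i == 0) {
        return;  // Carried past the first position: all lists visited
      }
    }
  }
}
```

With bounds {2, 3}, the callback sees the six lists {0, 0}, {0, 1}, {0, 2}, {1, 0}, {1, 1} and {1, 2}, in that order.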

void Scine::Molassembler::DirectedConformerGenerator::enumerateRandom ( std::function< void(const DecisionList &, Utils::PositionCollection)>  callback,
const EnumerationSettings & settings = {} 
)

Enumerate all conformers of the captured molecule.

Clears the stored set of decision lists, then enumerates all conformers of the molecule in parallel.

Parameters
callback  Function called with the decision list and conformer positions for each successfully generated pair. The callback is guaranteed never to be called simultaneously, even in parallel execution.
settings  Further parameters for enumeration algorithms
Note
This function is parallelized. Use the OMP_NUM_THREADS environment variable to control the number of threads used. Callback invocations are unsequenced but the arguments are reproducible.
This function advances the state of the global PRNG.
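Both enumerate() and enumerateRandom() guarantee that the callback is never called simultaneously. A common way to provide such a guarantee is to compute results in parallel and serialize only the callback invocations behind a lock. The sketch below shows that pattern with std::thread and std::mutex; the library is documented as being parallelized with OpenMP (see OMP_NUM_THREADS), so the threading scheme and the runSerializedCallbacks name are assumptions for illustration only.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Run work(i) for i in [0, count) on numThreads threads and pass each result to
// callback. The mutex ensures that callback invocations never overlap, although
// their order across threads is unspecified.
void runSerializedCallbacks(std::size_t count, unsigned numThreads,
                            const std::function<int(std::size_t)>& work,
                            const std::function<void(std::size_t, int)>& callback) {
  std::mutex callbackMutex;
  std::vector<std::thread> threads;
  for(unsigned t = 0; t < numThreads; ++t) {
    threads.emplace_back([&, t]() {
      // Static round-robin distribution of work items over threads
      for(std::size_t i = t; i < count; i += numThreads) {
        const int result = work(i);  // Computed concurrently
        std::lock_guard<std::mutex> lock(callbackMutex);
        callback(i, result);  // Serialized
      }
    });
  }
  for(auto& thread : threads) {
    thread.join();
  }
}
```

Only the callback runs under the lock, so the expensive work still proceeds concurrently.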

Result<Utils::PositionCollection> Scine::Molassembler::DirectedConformerGenerator::generateConformation ( const DecisionList & decisionList,
unsigned  seed,
const DistanceGeometry::Configuration & configuration = DistanceGeometry::Configuration{},
BondStereopermutator::FittingMode  fitting = BondStereopermutator::FittingMode::Nearest 
) const

Try to generate a conformer for a particular decision list.

This function is very similar to the free generateConformation function in terms of which configurations it accepts.

See Also
Scine::Molassembler::generateConformation()
Exceptions
std::invalid_argument  If the passed decisionList does not match the length of the result of bondList().

DecisionList Scine::Molassembler::DirectedConformerGenerator::generateNewDecisionList ( Random::Engine & engine = randomnessEngine())

Generate a new list of discrete dihedral arrangement choices.

Guarantees that the generated list is not yet part of the underlying set.

Complexity \(\Theta(N)\)

Exceptions
std::logic_error  If the underlying set is full, i.e. all decision lists for conformers have been generated, or if there are no bonds to consider.
Postcondition
The new DecisionList is part of the stored list of generated decision lists and will not be generated again. The result of decisionListSetSize() is incremented.
Note
This function advances the state of the global PRNG if the default argument for engine is chosen.
Returns
A DecisionList whose length matches the number of relevant bonds.
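Generating a list that is guaranteed not to be in the set can be done by rejection sampling over the per-bond choice counts. The sketch below assumes, for illustration, that each relevant bond i offers bounds[i] choices and that the ideal ensemble size is the product of these counts; these assumptions and the DecisionListSampler class are hypothetical and do not describe the library's internal data structure. It mirrors the documented contract: it throws std::logic_error when there are no bonds or the set is full, and otherwise returns a new list that is recorded in the set.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

using DecisionList = std::vector<std::uint8_t>;

class DecisionListSampler {
public:
  explicit DecisionListSampler(DecisionList bounds) : bounds_(std::move(bounds)) {}

  // Product of per-bond choice counts (assumed definition), zero without bonds
  std::size_t idealSize() const {
    if(bounds_.empty()) {
      return 0;
    }
    std::size_t product = 1;
    for(std::uint8_t bound : bounds_) {
      product *= bound;
    }
    return product;
  }

  std::size_t setSize() const { return seen_.size(); }

  // Draw random lists until one is found that is not yet part of the set
  DecisionList generateNew(std::mt19937& engine) {
    if(bounds_.empty() || seen_.size() >= idealSize()) {
      throw std::logic_error("No bonds to consider or all decision lists generated");
    }
    DecisionList candidate(bounds_.size());
    do {
      for(std::size_t i = 0; i < bounds_.size(); ++i) {
        std::uniform_int_distribution<int> choice(0, bounds_[i] - 1);
        candidate[i] = static_cast<std::uint8_t>(choice(engine));
      }
    } while(!seen_.insert(candidate).second);
    return candidate;
  }

private:
  DecisionList bounds_;
  std::set<DecisionList> seen_;
};
```

A loop like the client code in the detailed description, running until setSize() equals idealSize(), then visits every combination exactly once. Rejection sampling slows down as the set approaches completion, since most draws are then already present.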

Result<Utils::PositionCollection> Scine::Molassembler::DirectedConformerGenerator::generateRandomConformation ( const DecisionList & decisionList,
const DistanceGeometry::Configuration & configuration = DistanceGeometry::Configuration{},
BondStereopermutator::FittingMode  fitting = BondStereopermutator::FittingMode::Nearest 
) const

Try to generate a conformer for a particular decision list.

This function is very similar to the free generateRandomConformation function in terms of which configurations it accepts.

See Also
Scine::Molassembler::generateRandomConformation()
Note
Advances the state of the global PRNG. Not reentrant.
Exceptions
std::invalid_argument  If the passed decisionList does not match the length of the result of bondList().

DecisionList Scine::Molassembler::DirectedConformerGenerator::getDecisionList ( const Utils::AtomCollection & atomCollection,
BondStereopermutator::FittingMode  mode = BondStereopermutator::FittingMode::Nearest 
) const

Infer a decision list for relevant bonds from an atom collection.

For all bonds considered relevant (i.e. all bonds in bondList()), fits the supplied positions to the possible stereopermutations and returns the result. Entries have the value DirectedConformerGenerator::unknownDecision if no permutation could be recovered. The usual BondStereopermutator fitting tolerances apply.

Warning
This function assumes several things about your supplied positions:
  • There have only been dihedral changes and no AtomStereopermutator assignment changes.
  • The molecule represented in positions has not constitutionally rearranged. A basic check for matching element types is performed, but it is not a full safeguard against index permutations.

Complexity \(\Theta(N)\) bond stereopermutator fits

Exceptions
std::logic_error  If the element type sequence of the atom collection does not match the underlying molecule

DecisionList Scine::Molassembler::DirectedConformerGenerator::getDecisionList ( const Utils::PositionCollection & positions,
BondStereopermutator::FittingMode  mode = BondStereopermutator::FittingMode::Thresholded 
) const

Infer a decision list for relevant bonds from positional information only.

For all bonds considered relevant (i.e. all bonds in bondList()), fits the supplied positions to the possible stereopermutations and returns the result. Entries have the value DirectedConformerGenerator::unknownDecision if no permutation could be recovered. The usual BondStereopermutator fitting tolerances apply.

Warning
This function assumes several things about your supplied positions:
  • There have only been dihedral changes and no AtomStereopermutator assignment changes.
  • The molecule represented in positions has not constitutionally rearranged.

Complexity \(\Theta(N)\) bond stereopermutator fits
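The two getDecisionList() overloads differ in their default FittingMode (Nearest versus Thresholded). As a loose, hypothetical illustration of that difference, the sketch below assigns a single measured dihedral angle (in radians) to the closest of a bond's candidate dihedrals, and in thresholded mode returns unknownDecision when even the closest candidate is further away than a tolerance. The candidate angles, the tolerance, and the fitDihedral and FittingMode names are assumptions; the library's actual BondStereopermutator fitting is more involved and is not reproduced here.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Same value as the documented unknownDecision constant
constexpr std::uint8_t unknownDecision = std::numeric_limits<std::uint8_t>::max();

enum class FittingMode { Nearest, Thresholded };

// Smallest absolute difference between two angles in radians, with wraparound
double angularDistance(double a, double b) {
  const double twoPi = 2.0 * std::acos(-1.0);
  const double diff = std::fmod(std::fabs(a - b), twoPi);
  return diff > twoPi / 2 ? twoPi - diff : diff;
}

// Hypothetical fit of one measured dihedral against a bond's candidate dihedrals
std::uint8_t fitDihedral(double measured, const std::vector<double>& candidates,
                         FittingMode mode, double tolerance) {
  std::uint8_t best = unknownDecision;
  double bestDistance = std::numeric_limits<double>::infinity();
  for(std::size_t i = 0; i < candidates.size(); ++i) {
    const double d = angularDistance(measured, candidates[i]);
    if(d < bestDistance) {
      bestDistance = d;
      best = static_cast<std::uint8_t>(i);
    }
  }
  if(mode == FittingMode::Thresholded && bestDistance > tolerance) {
    return unknownDecision;  // No candidate is close enough
  }
  return best;
}
```

In nearest mode some candidate is always chosen when any exist, whereas thresholded mode prefers reporting unknownDecision over an assignment it cannot support.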

unsigned Scine::Molassembler::DirectedConformerGenerator::idealEnsembleSize ( ) const

Number of conformers needed for full ensemble.

Complexity \(\Theta(1)\)

Warning
If bondList() returns an empty list, i.e. there are no bonds to consider for directed conformer generation, this always returns zero.

bool Scine::Molassembler::DirectedConformerGenerator::insert ( const DecisionList & decisionList)

Adds a decision list to the underlying set-like data structure.

Complexity \(\Theta(N)\)

Exceptions
std::logic_error  If the result of bondList() is empty, i.e. there are no bonds to consider for directed conformer generation.
Returns
true if decisionList wasn't already part of the set
