kinetica-jl / kinetica.jl

Automated chemical reaction networking with long-timescale kinetic simulations in Julia

Home Page: https://kinetica-jl.github.io/Kinetica.jl/

License: Other

Julia 100.00%

chemical-kinetics chemical-reactions julia-package kinetic-modeling reaction-network

kinetica.jl's Issues

Inert species incorrectly handled during restarts

When restarting an IterativeExplore and loading in the current state of a network through import_network(), imported networks lack knowledge of inert species, despite networks being initially constructed with these species in mind. This is because 'raw' networks (the underlying directory tree being explored) are never made aware of any inert species, so they don't get added unless a calculator modifies the network down the line.

This leads to inconsistencies and crashes when handling solution objects based on networks in which inert species were not set up initially: inert species added later by calculators are appended to the end of the active SpeciesData, whereas earlier solutions place them just after the initial reactants.

Could handle this in a couple of ways:

- Create an inert.in file in rdir_head on network initialisation that is then read in and always placed first during import_network().
- Pass inert species into import_network() as an optional argument, inserting them before importing the rest of the network.
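A minimal sketch of the second option, assuming species can be treated as a plain list of SMILES strings. `build_species_index` and its `inert` keyword are hypothetical stand-ins for illustration, not part of Kinetica.jl's `import_network()` or `SpeciesData`:

```julia
# Hypothetical stand-in for the species indexing done during a network import.
# None of these names are Kinetica.jl API.
function build_species_index(imported::Vector{String}; inert::Vector{String}=String[])
    # Inert species go first; `unique` keeps first occurrences, so an inert
    # species that also shows up in the imported list keeps its early index.
    ordered = unique(vcat(inert, imported))
    return Dict(s => i for (i, s) in enumerate(ordered))
end

# A fresh import and a restart where a calculator has already appended N2
# to the network end up with the same species indices.
fresh = build_species_index(["C", "[CH3]", "[H]"]; inert=["N#N"])
restarted = build_species_index(["C", "[CH3]", "[H]", "N#N"]; inert=["N#N"])
```

Where the inert species sit relative to the initial reactants would have to match whatever ordering earlier solutions already use.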

Incorrect docstring in `system_from_smiles`

Kinetica.jl/src/exploration/molecule_system.jl

Lines 249 to 263 in 2ebced5

 """
  system_from_smiles(smiles::String[, dmin::Float64=5.0, maxiters::Int=200])
  system_from_smiles(smiles::String[, saveto::String, dmin::Float64=5.0, maxiters::Int=200])

 Forms a single XYZ system out of the molecules in `smiles`.

 Useful for making unified molecular systems with no overlap
 for feeding into CDE. `smiles` should be a single `String`
 with individual species separated by '.'. `dmin` represents
 the minimum molecule-molecule distance that should be allowed.

 If the argument `saveto` is provided, outputs the optimised
 system to a file at this path. If not, returns the optimised
 system as a single ExtXYZ dict.
 """

The docstring for system_from_smiles incorrectly states that it takes a String input, leading users to believe that this can be a single SMILES with multiple species. This method actually requires a vector of individual SMILES strings.

The docstring needs to be updated, but we could also consider making the method work with single SMILES strings.
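Supporting both input forms could be as simple as normalising the argument before the existing logic runs. The sketch below is an assumption about how that might look; `split_smiles` is a hypothetical helper, not an existing Kinetica.jl function:

```julia
# Hypothetical normalisation helpers; not part of Kinetica.jl.

# A single SMILES string is split into its '.'-separated components.
split_smiles(smiles::AbstractString) = String.(split(smiles, '.'; keepempty=false))

# A vector of SMILES strings is passed through unchanged (as Vector{String}).
split_smiles(smiles::AbstractVector{<:AbstractString}) = String.(smiles)

from_string = split_smiles("C.O=O")
from_vector = split_smiles(["C", "O=O"])
```

One caveat: '.' also separates the fragments of a single ionic species (e.g. `[Na+].[Cl-]`), so splitting on it would treat those fragments as separate molecules.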

Per-reaction atom mapping

When constructing reactant/product systems for input into TS-finding algorithms like NEB, atom indices are required to be consistent between all provided geometries. This requires that the endpoint XYZs are consistently atom mapped.

Currently, when creating an XYZ molecule system, atom indices will always be determined by the order that the component molecules are given in - atoms in molecule 2 are concatenated in after those of molecule 1, etc. Since reactant and product systems are not guaranteed to have the same atoms in each molecule, this can lead to atom mapping consistency being broken.

To resolve this, we need to introduce a procedure for atom mapping that can be used to reorder the atoms in reactant/product systems to ensure consistency. Atom maps can be constructed purely from SMILES using an approach like RXNMapper, but this relies on a predictive approach that is not always guaranteed to be accurate. Instead, we could construct atom maps directly from sampled reactions when they are read in, since CDE ensures correct atom mapping and this is only broken when separating molecules in CRN ingest.

The RxData.reacs and RxData.prods fields have been redundant for a while now, as RxData always requires a SpeciesData to function when species properties are required anyway. These could be good places to put the new atom-mapped reactants and products respectively. This should be a minimal overhaul, but requires careful handling in CRN I/O.
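Once an atom map is available, the reordering itself is just a permutation of the product system's atoms. A rough sketch follows, assuming the map is stored as a vector where `atom_map[i]` is the product-system index of the atom at reactant position `i`. The function name, argument layout (3×N coordinate matrix) and map convention are all assumptions for illustration:

```julia
# Hypothetical sketch; the names and the map convention are assumptions.
# `atom_map[i]` is the index in the product system of the atom that sits at
# position `i` in the reactant system. `coords` is 3×N, one column per atom.
function reorder_products(symbols::Vector{String}, coords::Matrix{Float64},
                          atom_map::Vector{Int})
    isperm(atom_map) || throw(ArgumentError("atom_map must be a permutation of 1:N"))
    length(atom_map) == length(symbols) == size(coords, 2) ||
        throw(DimensionMismatch("atom_map, symbols and coords must describe the same atoms"))
    # Indexing with the map puts product atoms in reactant order.
    return symbols[atom_map], coords[:, atom_map]
end

reac_symbols = ["C", "H", "O"]
prod_symbols = ["O", "C", "H"]
prod_coords = [0.0 1.0 2.0;
               0.0 0.0 0.0;
               0.0 0.0 0.0]
atom_map = [2, 3, 1]

mapped_symbols, mapped_coords = reorder_products(prod_symbols, prod_coords, atom_map)
```

After reordering, element symbols should match position by position between the two endpoints, which makes for a cheap sanity check before handing geometries to NEB.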

Loading partial networks from checkpoints

Currently, networks explored via IterativeExplore that are terminated early (due to an error, exceeding walltime, etc.) can be restarted directly from the contents of their rdir_head. However, this is only useful as long as rdir_head is always available.

If running on distributed resources like HPC, network exploration should be performed within a scratch space to allow for the currently heavy IO requirements of CDE runs. However, these scratch spaces are usually semi-volatile and in many cases cease to exist once a job is finished. This wipes the entire rdir_head directory tree, preventing restarts.

While rdir_head could be periodically backed up to non-volatile storage, this would be incredibly expensive and would nullify many of the benefits of performing exploration on a scratch space. Instead, we could use the already implemented incomplete network saves (which can be saved into a non-scratch directory) as checkpoints and allow for partial (or full) network restoration from them when rdir_head is not present (e.g. when it has been wiped by end of job). This would work as follows:

1. Check if rdir_head exists. If it does, the network within may either be full (present in the directory tree from the initial level) or partial (present in the directory tree only from a certain point, as it has been loaded from a checkpoint before).
2. If rdir_head does not exist, check if checkpoints exist. If they do, read in the latest checkpoint, establish next level seeds and create a new partial directory tree starting from this level.
3. If no checkpoints exist either, start a new exploration from scratch.

In step 1, when there is a full network it can be directly loaded. However, when there is only a partial network, a checkpoint file corresponding to the exploration progress made from the level(s) before those that exist in rdir_head MUST be available for exploration to continue without error.
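The decision logic above can be sketched roughly as follows. This is only an outline under stated assumptions: `restart_plan` is a hypothetical helper, every regular file in the checkpoint directory is treated as a checkpoint, and the most recently modified one is taken to be the latest:

```julia
# Hypothetical helper, not Kinetica.jl API: decide how to resume an exploration.
# Assumes every regular file in `checkpoint_dir` is a checkpoint, and that the
# most recently modified one is the latest.
function restart_plan(rdir_head::AbstractString, checkpoint_dir::AbstractString)
    # 1. An existing directory tree (full or partial) takes priority.
    isdir(rdir_head) && return (mode = :tree, path = String(rdir_head))

    # 2. Otherwise fall back to the latest checkpoint, if any exist.
    if isdir(checkpoint_dir)
        files = filter(isfile, readdir(checkpoint_dir; join = true))
        if !isempty(files)
            return (mode = :checkpoint, path = last(sort(files; by = mtime)))
        end
    end

    # 3. Nothing to restore from: start a new exploration.
    return (mode = :fresh, path = nothing)
end
```

The `:tree` branch does not distinguish full from partial trees; a partial tree would still need the matching checkpoint to be located and loaded before exploration continues.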

Reaction hashing not always working

Calling the KPM calculator can result in the following:

ERROR: DimensionMismatch: arrays could not be broadcast to a common size; got a dimension with lengths 9724 and 9722

where the first length is rd.nr and the second is the length of calc.Ea. This is due to KPM identifying 2 duplicate reactions and removing them (N.B. KineticaKPM needs an assertion for this). These reactions should have been identified earlier by rhash in push!(::RxData...; unique_rxns=true) but were not, so some duplicate reactions must be generating distinct hashes somehow. As this also occurs on fresh RxDatas reconstructed from the same reaction tree, the duplicated input reactions must differ in some way that consistently produces different hashes. Need to check the logic for rhash construction.
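One possible (unconfirmed) cause is that the hash depends on how a reaction is written, e.g. species order or a species listed twice instead of with a stoichiometry of 2. The sketch below shows a key that is insensitive to both, plus the kind of length check the calculator could make. It is not Kinetica.jl's rhash implementation; `reaction_key`, `canonical_side` and `check_lengths` are hypothetical names, and reactions are assumed to be lists of `(stoichiometry, SMILES)` tuples:

```julia
# Hypothetical canonical reaction key; not Kinetica.jl's rhash implementation.
# Each side is a collection of (stoichiometry, SMILES) tuples.
function canonical_side(side)
    totals = Dict{String,Int}()
    for (n, s) in side
        totals[s] = get(totals, s, 0) + n   # merge repeated species
    end
    return sort!([(s, totals[s]) for s in keys(totals)])  # fixed species order
end

reaction_key(reacs, prods) = hash((canonical_side(reacs), canonical_side(prods)))

# Guard of the kind the calculator could apply before broadcasting.
function check_lengths(nr::Int, Ea::AbstractVector)
    length(Ea) == nr ||
        throw(DimensionMismatch("expected $nr activation energies, got $(length(Ea))"))
    return true
end

a = reaction_key([(1, "C"), (2, "O=O")], [(1, "O=C=O"), (2, "O")])
b = reaction_key([(2, "O=O"), (1, "C")], [(2, "O"), (1, "O=C=O")])
c = reaction_key([(1, "O=O"), (1, "C"), (1, "O=O")], [(1, "O=C=O"), (1, "O"), (1, "O")])
```

Here `a`, `b` and `c` describe the same reaction written three ways and give the same key. Differences in the SMILES strings themselves (e.g. non-canonical SMILES for the same species) would still produce different keys and would need handling separately.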
