ezpzbz / aiida-orca

AiiDA Plugin for ORCA

License: MIT License

Python 100.00%

computational-chemistry abinitio-simulations dft orca-quantum-chemistry

aiida-orca's Introduction

aiida-orca

AiiDA plugin for orca package

DISCLAIMER: Under heavy development!

Installation

The latest release can be installed from PyPI

pip install aiida-orca

The current development version can be installed via

git clone https://github.com/pzarabadip/aiida-orca.git
cd aiida-orca
pip install .

aiida-common-workflows

The aiida-orca package is available in the aiida-common-workflows package. You can use it for a quick setup and to explore aiida-orca and many other packages. For further details, please check our paper on aiida-common-workflows.

Contribution guide

We welcome contributions to the code, whether a new feature implementation or a bug fix. Please check the Developer Guide in the documentation for instructions.

Issue reporting

Please feel free to open an issue to report bugs or request new features.

Acknowledgment

I would like to thank the funding received from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions and cofinancing by the South Moravian Region under agreement 665860. This software reflects only the authors’ view and the EU is not responsible for any use that may be made of the information it contains.

aiida-orca's People

Forkers

ltalirz danielhollas sphuber cote3804

aiida-orca's Issues

Figure out if `extra_input_keywords` serve any purpose

Convert GBW to WFN

In order to use the results to perform QTAIM analysis, we need to convert gbw to wfn or wfx file.
There are two possibilities here:

Doing it at plugin level: it means that we can define a new calculation class which takes the gbw and applies the conversion.
Doing it at workchain level: we can define the simple calculation as a calcfunction.

ASA calculation

It is an extra orca calculation which enables simulating emission spectra.
In order to implement this calculation, I first need to fix #3.
From there, I need to either write a new calculation class that subclasses OrcaCalculation or implement it in OrcaCalculation itself.

[Feature] Restart and retrieve Hessian file

I need to add the possibility of having parent_calc_folder and using gbw or hess files for restarting calculations.
The Hessian file also needs to be added as an input/output of type SinglefileData.

Support ORCA 5.0

Hi @pzarabadip,

I am currently attending the aiida tutorial, and was thinking that as a final project I could work on updating this plugin to support ORCA 5.0, which was just released last week. Let me know what you think. Thanks!

Implementing pytest and activation of GitHub Actions

I need to think about how to implement it.
The main issue is using the ORCA executables. ORCA is free but not public and, worse, it is 3.5 GB in size. One option would be to keep it in a private Docker image.
The other option would be using aiida-testing and mock codes.
The other things I need to address in this issue are adding Codecov and automatic deployment to PyPI.

Increase test coverage

The current 55% is very nice, but perhaps we should aim higher before the 1.0 release.

Thanks @pzarabadip for enabling Codecov! 👍

OrcaBaseParser should handle truncated ORCA output gracefully

When a user provides a bad input parameter for ORCA, OrcaBaseParser throws this unhelpful exception (and excepts the CalcJob)

 | [952|OrcaCalculation|on_except]: Traceback (most recent call last):
 |   File "/opt/conda/lib/python3.8/site-packages/plumpy/process_states.py", line 231, in execute
 |     result = self.run_fn(*self.args, **self.kwargs)
 |   File "/opt/conda/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/calcjob.py", line 388, in parse
 |     exit_code_retrieved = self.parse_retrieved_output(retrieved_temporary_folder)
 |   File "/opt/conda/lib/python3.8/site-packages/aiida/engine/processes/calcjobs/calcjob.py", line 468, in parse_retrieved_output
 |     exit_code = parser.parse(**parse_kwargs)
 |   File "/home/aiida/plugins/aiida-orca/aiida_orca/parsers/__init__.py", line 72, in parse
 |     keywords = output_dict['metadata']['keywords']
 | KeyError: 'keywords'

Instead, we should catch this case and return a non-zero exit code, which can then be acted upon in workflows.

I'll submit a PR once I learn more about parsers and their exit codes. Basically just need to add some error handling here:

https://github.com/pzarabadip/aiida-orca/blob/5b2cba2b518837c35179b52ac1141eda27609f4b/aiida_orca/parsers/__init__.py#L72

It would also be nice to pass the ORCA error to the process report, if that is possible.

Here's the example ORCA output (without headers) where this happens (note that cclib does not throw any errors, which is the problem).

 Your ORCA version has been built with support for libXC version: 5.1.0
 For citations please refer to: https://tddft.org/programs/libxc/

 This ORCA versions uses:
   CBLAS   interface :  Fast vector & matrix operations
   LAPACKE interface :  Fast linear algebra routines
   SCALAPACK package :  Parallel linear algebra routines
   Shared memory     :  Shared parallel matrices
   BLAS/LAPACK       :  OpenBLAS 0.3.15  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell SINGLE_THREADED
        Core in use  :  Haswell
   Copyright (c) 2011-2014, The OpenBLAS Project

            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
                                 INPUT ERROR
            UNRECOGNIZED OR DUPLICATED KEYWORD(S) IN SIMPLE INPUT LINE
              SVWN         
            !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[file orca_main/maininp4.cpp, line 11063]: 

[file orca_main/maininp4.cpp, line 11063]:
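A minimal sketch of the kind of error handling suggested above, assuming the error banner always has the shape shown in this output (an INPUT ERROR line, the message, then a closing line of exclamation marks). The function names are hypothetical and are not part of the plugin; a real parser would map the result onto one of its own exit codes.

```python
from typing import Optional


def find_orca_input_error(output_text: str) -> Optional[str]:
    """Return the message of an ORCA 'INPUT ERROR' banner, or None if absent."""
    lines = output_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() != "INPUT ERROR":
            continue
        message = []
        for following in lines[index + 1:]:
            stripped = following.strip()
            # The banner is closed by a line consisting only of '!' characters.
            if stripped and set(stripped) == {"!"}:
                break
            if stripped:
                message.append(stripped)
        return " ".join(message)
    return None


def get_keywords(output_dict: dict) -> Optional[list]:
    """Look up the parsed keywords without raising KeyError on truncated output."""
    return output_dict.get("metadata", {}).get("keywords")
```

If get_keywords returns None, the parser could call find_orca_input_error on the raw output, put the message in the process report, and return a non-zero exit code instead of excepting.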

PS: I am now playing with aiida and aiida-orca and it generally works great! Thanks for writing this plugin @pzarabadip! ❤️
PPS: I am now using a dev version of cclib (from their master) so that I can parse ORCA-5.0 outputs.

`verdi calcjob inputcat` does not work

verdi calcjob inputcat <pk> should print the orca input file, but I currently get:

Critical: "CalcJobNode" and its process class "OrcaCalculation" do not define a default input file (option "input_filename" not found).
Please specify a path explicitly.

Same is true for verdi calcjob outputcat, which should print the output file I think.

I'll try to fix that, but I need to dig into the documentation a bit. Hopefully it's a straightforward change which is backwards compatible.

Automatically serialize inputs with `to_aiida_type`

Described in docs here: https://aiida.readthedocs.io/projects/aiida-core/en/latest/topics/processes/usage.html?highlight=to_aiida_type#automatic-input-serialization

This would, for example, allow passing input parameters as a plain Python dict instead of having to convert them manually to an AiiDA Dict node.

[GitHub Actions] Parallel running of examples

Currently, examples fail if one sets the running mode to parallel, even when OpenMPI is installed and working.
My guess is that mpirun should be declared in the prepend_text.

Calculation type identification during parsing

Currently, I only look for Opt, as an example, to decide whether to parse the optimization job data. However, these keywords are not case sensitive in ORCA, and a user may provide, for example, opt.
There are two solutions:

Using regular expressions to identify correct ones.
Converting input keyword strings to lower case while generating the input.
So far, I think we need both implementations, as there are other keywords for geometry optimization, such as COPT, and overlooking them now may cause issues in the future.
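A small sketch of the case-insensitive check, assuming the simple-input keywords are available as a list of strings. It normalises case rather than using regular expressions, and the keyword set is an illustrative, non-exhaustive assumption that should be checked against the ORCA manual.

```python
# Assumed, non-exhaustive set of simple-input keywords that request a
# geometry optimization; stored in lower case for comparison.
OPT_KEYWORDS = {"opt", "copt", "zopt", "looseopt", "tightopt", "verytightopt"}


def is_optimization(keywords):
    """Return True if any keyword requests a geometry optimization."""
    return any(keyword.strip().lower() in OPT_KEYWORDS for keyword in keywords)
```

With this, Opt, opt, and COPT are all treated the same way, while unrelated keywords such as a functional or basis set name are ignored.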

Input keywords

More experimenting with different types of applications shows that we do not need to provide the input_keywords as a dictionary; they can be provided as a list.
I will change the input_generator after a bit more experimenting.

Improve GitHub Actions workflow

After resolving #13:

The workflow needs to be modified to reflect the final implementation.
I need to add automatic deployment to PyPI.
The separate installation of aiida-core can be removed with the next release of aiida-core. I needed to do it this way for now because aiida_local_code_factory is not yet in an official release.

Adding examples exploiting different types of basis sets

A few examples need to be added to cover the implementation and usage of different basis sets, including those that come with ECPs.

Change exit codes to conform to a convention

Current exit codes defined in OrcaCalculation start at 100, but that range overlaps with exit codes from scheduler plugins. See these guidelines

https://aiida.readthedocs.io/projects/aiida-core/en/latest/topics/processes/usage.html?highlight=exit%20codes#exit-code-conventions

Unfortunately, this is a breaking change so perhaps it should be done when 1.0 is released.

Support AiiDA 2.0

AiiDA 2.0 was released a couple of months ago.

https://www.aiida.net/news/aiida-v2-0-0-released/

I haven't tested it yet with this plugin, but I plan to do it once they update the AiiDAlab image.

Here is the plugin migration guide.

https://github.com/aiidateam/aiida-core/wiki/AiiDA-2.0-plugin-migration-guide

Input parameter validation

We need to add validation of input parameters. It can be implemented based on discussion in AiiDA Hackathon by defining a new Data class.
This will be done once the calculation and parser are well tested.
[UPDATE] Currently, I am working on having this validation on functional and basis set by:

Having separate inputs for them instead of passing them in the parameters dictionary
I need a list of the functionals supported by ORCA as well as by its Libxc interface.
A user-provided functional would be checked against the valid values.
In the case of basis sets, same story applies for the ORCA internal basis sets.
In the case of basis sets, we can have a degree of automation to set proper flags if user requests RI, RICOSX, and RIJK approximations.
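As a rough illustration of the functional check described in this list, the sketch below validates a user-provided name against an allowlist. The allowlist here is a deliberately tiny, hypothetical placeholder; a real one would be compiled from the ORCA manual and its Libxc interface, and the function name is not part of the plugin.

```python
# Hypothetical placeholder allowlist, stored in lower case.
KNOWN_FUNCTIONALS = {"pbe", "b3lyp", "pbe0", "bp86"}


def validate_functional(name):
    """Return the stripped functional name if recognised, else raise ValueError."""
    cleaned = name.strip()
    if cleaned.lower() not in KNOWN_FUNCTIONALS:
        raise ValueError(f"Unknown functional: {cleaned!r}")
    return cleaned
```

The same pattern would apply to ORCA's internal basis sets, with a separate allowlist.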

Adding TDDFT example

To cover #8, it would be necessary to have a few TDDFT examples.

Handle known errors in OrcaBaseWorkchain

Just discovered that OrcaBaseWorkchain tries to restart the OrcaCalculation even if it failed with one of the custom exit codes (which are returned from the parser).

It looks like we need to add an explicit process handler for these, as is done in the aiida-quantumespresso plugin:

https://github.com/aiidateam/aiida-quantumespresso/blob/17d4bd62937c74cb88b9dfdbcc71d71e24783bbf/src/aiida_quantumespresso/workflows/pw/base.py#L508

I'll work on this soon.

🧪 Update readme

There have been some quite recent changes/improvements in the plugin. We need to update the README to reflect these additions, including:

AiiDA v2.0 support
ORCA v5.0 support
Change coverage to the default branch (develop)
Remove heavy development disclaimer statement.
Update Acknowledgment
Add link to aiida-common-workflows paper as a use-case of the plugin
Update author lists

Anything else missing @danielhollas ?

Determine optimization run from CClib

The current approach to determining whether a structure optimization has been run is most probably not robust:

https://github.com/pzarabadip/aiida-orca/blob/5b2cba2b518837c35179b52ac1141eda27609f4b/aiida_orca/parsers/__init__.py#L79

We're trying to detect the Opt keyword, but optimization can be triggered by other means as well. Instead we could try to use a CClib field output_dict['optdone']. I need to test that first though.

Units of excited state energies

Just tested a TDDFT calculation with ORCA 4.2.0. The excited state energies and oscillator strengths are present in the output dict, but the excited state energies are in rather arbitrary units of cm^-1. I wonder whether we should change the units to something saner, either eV or a.u., to make them consistent with the SCF energy. Also of note: these are energies relative to the ground state (which is fine). I guess this might be a bigger discussion about units...

btw: This package might be interesting for manipulating units.
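For reference, the conversion itself is a single division. The sketch below uses the CODATA 2018 values for the wavenumber equivalents of 1 eV and 1 hartree; the function names are illustrative only.

```python
# CODATA 2018 conversion factors (wavenumbers per energy unit).
CM1_PER_EV = 8065.543937
CM1_PER_HARTREE = 219474.6313632


def wavenumber_to_ev(energy_cm1):
    """Convert an energy from cm^-1 to eV."""
    return energy_cm1 / CM1_PER_EV


def wavenumber_to_hartree(energy_cm1):
    """Convert an energy from cm^-1 to hartree (a.u.)."""
    return energy_cm1 / CM1_PER_HARTREE
```

A list of excited state energies could then be converted with, for example, [wavenumber_to_ev(e) for e in energies].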

Optimized structure retrieve

After completion of a geometry optimization, ORCA itself outputs two xyz files:

The relaxed geometry as base.xyz, which here would be aiida.xyz
The trajectory, which would be aiida_traj.xyz.

Therefore, we can improve the current reporting of the optimized geometry by directly retrieving the one generated by ORCA. We can also retrieve the trajectory file for possible visualization.
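Reading the relaxed geometry from the retrieved file could be as simple as the sketch below, assuming the standard xyz layout (atom count, comment line, then one atom per line). The function name is hypothetical; the atoms are returned in file order, so the original atom ordering is kept.

```python
def parse_xyz(text):
    """Parse a single-frame xyz file into a list of (symbol, (x, y, z)) tuples."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    atoms = []
    # Line 0 is the atom count and line 1 is a free-form comment.
    for line in lines[2:2 + natoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms
```

The trajectory file would need an extra loop over frames, since it concatenates several such blocks.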

Data class for spectra

When we are dealing with frequency, uv-vis, and emission spectra calculations, we generate files with arrays of numeric data which can later be used for visualization.
I need to check ArrayData, BandsData, and TrajectoryData from aiida.orm to learn more about the structure of these Data types and possibly write a new one with the export possibility for these types of data that we are dealing with in orca and also gaussian. It can be called SpectraData, for instance.
Useful starting points:

https://aiida.readthedocs.io/projects/aiida-core/en/latest/working/data.html#working-data-creating-new-types

Support global %maxcore option in the input file generation

It looks like the ORCA input file format is not entirely self-consistent. Usually, input blocks start with %block_name and end with end, but there are exceptions to this. One of them is the global memory setting, which needs to be on one line like this.

%MAXCORE   4000

It seems that the current plugin does not support this format. We can either add a special input node (of type Int) that would set the global memory and then special case it in the input generator, or we can make a general way to support this special syntax.
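The first option (special-casing the global memory setting) might look roughly like the sketch below. This is not the plugin's input generator: render_input and its arguments are hypothetical names used only to show how a one-line option can sit next to regular blocks.

```python
def render_input(keywords, blocks=None, maxcore=None):
    """Render a simple ORCA-style input with an optional one-line %maxcore."""
    lines = ["! " + " ".join(keywords)]
    if maxcore is not None:
        # Global memory setting: a single line without a closing 'end'.
        lines.append(f"%maxcore {int(maxcore)}")
    for name, options in (blocks or {}).items():
        # Regular blocks open with %name and close with 'end'.
        lines.append(f"%{name}")
        lines.extend(f"\t{key} {value}" for key, value in options.items())
        lines.append("end")
    return "\n".join(lines) + "\n"
```

Calling render_input(["PBE", "def2-SVP"], {"scf": {"convergence": "tight"}}, maxcore=4000) puts the %maxcore line between the keyword line and the %scf block.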

Implementing workchains

Currently, only simple calculations are added to the plugin. As ORCA calculations are complex and in real-world usage one would combine several steps of optimization, frequency calculation and so on, I should implement a few workchains to cover basic calculation protocols.
[UPDATE]
I am working on having following workchains for the next release:

OrcaRelaxationWorkChain: It should take a structure and optimize the geometry with basic error handling. On demand, it should be able to run a subsequent frequency calculation to verify whether the relaxed structure is a local minimum or a transition state.
OrcaEmissionSpectraWorkChain: It should take a structure, relax it, and perform an ASA calculation to give us the emission spectra.

cclib parser fails for unrestricted optimization/frequencies ORCA output

Running an optimization with an unrestricted method in ORCA 5.0.3 for a charged methane molecule (doublet) trips the cclib parser. The input file and the resulting exception follow.

Here's the input file

! STO-3G PBE OPT MINIPRINT

* xyz 1 2
C        5.64548550       5.80995257       5.64347063
H        6.68786928       5.48595277       5.60659569
H        5.00000000       5.00000000       5.29673621
H        5.38164039       6.07147067       6.67055068
H        5.51243226       6.68238700       5.00000000
*

 |   File "/home/jovyan/aiida-orca/aiida_orca/parsers/__init__.py", line 47, in parse
 |     parsed_obj = ccread(handle)
 |   File "/home/jovyan/aiida-orca/aiida_orca/parsers/cclib/ccio.py", line 27, in ccread
 |     return log.parse()
 |   File "/home/jovyan/aiida-orca/aiida_orca/parsers/cclib/logfileparser.py", line 261, in parse
 |     self.extract(inputfile, line)
 |   File "/home/jovyan/aiida-orca/aiida_orca/parsers/cclib/orcaparser.py", line 425, in extract
 |     self._append_scfvalues_scftargets(inputfile, line)
 |   File "/home/jovyan/aiida-orca/aiida_orca/parsers/cclib/orcaparser.py", line 2219, in _append_scfvalues_scftargets
 |     rmsDP_target = self.scftargets[-1][2]
 | IndexError: list index out of range

I'll submit a PR, probably tomorrow.

Full TDDFT without TDA fails due to parser error

TDDFT calculations without TDA currently fail in the output parser because of NaNs in the etsecs field that are coming from cclib parser.

https://github.com/pzarabadip/aiida-orca/blob/4c8c962f789753e1879e2b0406252291147b07f8/aiida_orca/parsers/cclib/orcaparser.py#L1161

The aiida parser already tries to handle NaNs coming from CClib, but only in the numpy arrays, whereas etsecs is a plain list (or rather nested lists).

https://github.com/pzarabadip/aiida-orca/blob/4c8c962f789753e1879e2b0406252291147b07f8/aiida_orca/parsers/__init__.py#L65

The np.nan_to_num() function actually works for nested lists, but converts them to numpy arrays, and converts integers to floats so we can't use it in this case.
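One way around this limitation is a small recursive helper that leaves integers and the list structure untouched. This is only a sketch: the helper name is hypothetical, and replacing NaN with None (rather than 0.0, as np.nan_to_num does) is an assumption about what the stored output should contain.

```python
import math


def replace_nans(value, replacement=None):
    """Recursively replace float NaNs in nested lists, keeping other values as-is."""
    if isinstance(value, (list, tuple)):
        # Tuples are returned as lists; nesting depth is preserved.
        return [replace_nans(item, replacement) for item in value]
    if isinstance(value, float) and math.isnan(value):
        return replacement
    return value
```

Passing replacement=0.0 reproduces the np.nan_to_num behaviour for NaNs without turning integers into floats.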

Do not always fetch gbw files to local folder

.gbw files store the molecular wavefunction and are typically quite big.

Currently we always fetch them from the remote_folder to the retrieved folder, which is not ideal because the user cannot get rid of them without deleting the whole workflow, due to AiiDA's strict provenance policy. Thus, users need to be able to specify whether this file should be stored before the calculation/workflow is submitted.

There are two design questions here:

What should be the default behaviour?
How can the user change the default?

I think that changing the default, so that this file is fetched only when requested, would be an okay thing to do. Typically, users know beforehand whether they need the MO files, and even if they don't, they can always fetch them a posteriori from the remote folder, where the file is stored until it is cleaned. In that case, users could opt in by specifying the aiida.gbw file in the existing inputs.settings.additional_retrieve_list. This approach would require minimal changes on the side of the plugin, but the big downside is that it is a breaking change and would need to be thoroughly documented. The upside is that it solves the problem with the current default, where users can never get rid of these files once they are fetched, as explained above.

If we want to keep the default behaviour, we need a new input. This could be a new input node, of type Bool. Alternatively, we could add a new key to the existing Dict inputs, either inputs.parameters or inputs.settings (the latter seems more appropriate to me).
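The opt-in variant could be sketched as follows, assuming the settings are available as a plain dict with an additional_retrieve_list key. The default list and the aiida.out filename are assumptions made for illustration, not the plugin's actual defaults.

```python
# Assumed default: only the main output file is always retrieved.
DEFAULT_RETRIEVE = ["aiida.out"]


def build_retrieve_list(settings=None):
    """Return the files to retrieve; large files such as aiida.gbw are opt-in."""
    settings = settings or {}
    retrieve = list(DEFAULT_RETRIEVE)
    for name in settings.get("additional_retrieve_list", []):
        if name not in retrieve:
            retrieve.append(name)
    return retrieve
```

A user who needs the wavefunction would pass {"additional_retrieve_list": ["aiida.gbw"]}; everyone else would get only the default files.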

@pzarabadip do you have any thoughts on this? Whatever we decide, I am happy to implement this because I for sure need this for my app. Thanks!

aiida-orca version not updated in aiida plugin registry

I noticed that the aiida plugin registry still points to version 0.5.1 as the latest version.

https://aiidateam.github.io/aiida-registry/plugins/aiida-orca.html

I suspect that it is because the registry is pointing to the master branch that hasn't been updated. I guess we can either:

merge develop to master branch
change the branch in the aiida plugin registry.

Parser fails for output from unrestricted EOM-CCSD

Input file

### Generated by AiiDA-ORCA Plugin ###
! EOM-CCSD def2-SVP
%scf
	ConvForced true
	convergence tight
end

%mdci
	doTDM true
	doLeft true
	nroots 1
	maxcore 3000
end

* xyzfile 0 2 aiida.coords.xyz

This is likely because unrestricted EOM-CCSD does not yet implement transition dipole moments. This also makes it less useful so I do not consider this bug particularly pressing right now.

Support for Orca 5

Currently the README states that Orca 5 is not supported. Is there any change from Orca 4 to 5 which is blocking or has the combination not been tested so far?

As far as I'm aware, there were changes in the defaults for most calculations as well as adding support for a full scripting language in the input files.

relaxed_structure StructureData should preserve atom ordering

It seems that the output node relaxed_structure of optimization runs has a different atom order than the original input structure.
I need to do a bit more testing, but this line from the parser probably needs tweaking.

https://github.com/pzarabadip/aiida-orca/blob/5b2cba2b518837c35179b52ac1141eda27609f4b/aiida_orca/parsers/__init__.py#L86

Improving tests

Just sharing my quick observations when running the tests locally. Happy to submit a PR for these later.

Pass pytest.opt_calc_pk from example_0.py to the other tests via the pytest cache. This is nicer than attaching it to a global object and should also allow running the other tests independently, as long as the first test has run at least once.
Make the number of processors configurable, via a cmdline argument if possible. I don't have mpirun in my dev environment, so the tests were failing for me unless I manually modified the orca input dicts in all example_?.py files. How to pass arguments via pytest cmdline: https://stackoverflow.com/questions/40880259/how-to-pass-arguments-in-pytest-by-command-line

Switch setup.json to setup.cfg

We could look at the current output of aiida plugin cookiecutter for guidance.

@pzarabadip happy to take this on in case it would help.

Documentation

I should start adding documentation gradually and make it publication-ready for v1.0.0.
[UPDATE]
I started preparing the documentation. The following needs to be checked before merging to master and release:

Complete section for installation of plugin and ORCA itself
Complete section for setting up the code and notes related to it
Example/Tutorial sections from setting up simple calculations to using workchains
Developer guide
A section on capabilities and limitations of the plugin