usnavalresearchlaboratory / nepc

NRL Evaluated Plasma Chemistry
License: Creative Commons Zero v1.0 Universal
Replace the EDA Jupyter notebook with a plain Python script with interactive code blocks.
Right now the nepc database needs human interpretation of the reaction associated with each cross section. A label should be attached to the metadata that identifies the reaction associated with the cross section. For example, a cross section for the excitation of the N2 ground state into the triplet A3 could have the reaction "e+N2(X1Sigma) -> e + N2(A3Sigma)". This reaction tag would uniquely and clearly identify the reaction for this cross section.
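As a sketch, the proposed tag could live alongside the existing metadata and be sanity-checked when data is added; the field name and helper below are illustrative, not part of the current nepc schema:

```python
cs_metadata = {
    "specie": "N2",
    "process": "excitation",
    # Proposed: a reaction string that uniquely identifies the cross section.
    "reaction": "e + N2(X1Sigma) -> e + N2(A3Sigma)",
}

def is_valid_reaction_tag(tag):
    """Minimal well-formedness check: 'lhs -> rhs' with both sides nonempty."""
    lhs, sep, rhs = tag.partition("->")
    return bool(sep and lhs.strip() and rhs.strip())

print(is_valid_reaction_tag(cs_metadata["reaction"]))  # True
```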
Currently, nepc and nepc_cs (repos with production cross section data) can have different process.tsv, states.tsv, and species.tsv files. This may create confusion down the road. Perhaps the standards for these metadata should be in version control in nepc with the option to override them in nepc_cs.
There is no need for two separate classes (@arichar6 tried to tell me this months ago). A CS object can be created by one of three options:
We should switch 'master' to 'main' to update to the preferred Github standard.
It seems we could do our CI pipeline using GitHub Actions instead of TravisCI. At first glance, it looks like it would require using several 3rd party apps in the Marketplace, but maybe this is ok.
Implement len(); a task of #7.
Package management would be simpler and more streamlined with poetry.
parser.py uses `is` when it should use `==` when checking metadata.
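A quick demonstration of why this matters: `is` compares object identity while `==` compares values, and strings read from files generally aren't interned, so identity checks fail even when the values match (CPython behavior):

```python
label = "excitation"
process = "".join(["excita", "tion"])  # simulates a value read from a .met file

print(process == label)   # True: the values are equal
print(process is label)   # False: two distinct string objects
```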
If such a link is there, I don't see it.
Draft requirements: we have the ref string in the cs metadata table, but it doesn't link to any reference data yet; it's only a placeholder. Two possible options to implement references for datasets: the ref metadata could be a list of DOIs to references. This would require that a DOI exist when adding data to NEPC. We could throw a warning if it doesn't, to nudge users to add one.

All links on the README should point to external sites, not markdown files in the repo. This makes it easier to maintain the README and documentation, including the README on PyPI.
There should be a standard way to get a list of the models that exist in the database. Something like nepc.get_models(cursor), which would do something like this:

def get_models(cursor):
    """Return the names of all models in the NEPC database."""
    cursor.execute("SELECT name FROM models")
    return [name for (name,) in cursor.fetchall()]
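Since the helper doesn't depend on anything MySQL-specific, it can be exercised against an in-memory sqlite3 stand-in; a `models` table with a `name` column is assumed from the query above:

```python
import sqlite3

def get_models(cursor):
    """Return the names of all models in the database (repeated here so
    the snippet is self-contained)."""
    cursor.execute("SELECT name FROM models")
    return [row[0] for row in cursor.fetchall()]

# In-memory stand-in for the MySQL backend.
cnx = sqlite3.connect(":memory:")
cursor = cnx.cursor()
cursor.execute("CREATE TABLE models (name TEXT)")
cursor.executemany("INSERT INTO models VALUES (?)", [("phelps",), ("lxcat",)])
print(get_models(cursor))  # ['phelps', 'lxcat']
```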
bolos can be used to obtain reaction rates and transport parameters for a given set of cross sections at a specified reduced electric field and gas temperature.
The proposal is for a new feature that provides reaction rates and transport parameters for a nepc Model at a given reduced electric field and gas temperature:
cnx, cursor = nepc.connect()
n_phelps = nepc.Model(cursor, "phelps")
n_phelps.rates(En=120, Te=300)
I think there are two possible approaches: add a parser to write a Model as an LXCat-formatted file. Model.rates writes the LXCat-formatted file using this parser, then uses the approach in the bolos sample file single.py to compute the rates.

Need to generalize the QDB curation template to handle multiple cross sections in a qml file.
nepc.reaction_latex_lhs() and nepc.reaction_latex_rhs() are essentially identical. Refactor into just one method that takes a side parameter to determine which side of the equation to return.
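A minimal sketch of the merged method, assuming the two existing functions differ only in which side of the `\rightarrow` arrow they return (the signature and arrow token are assumptions, not nepc's actual internals):

```python
def reaction_latex(reaction, side="both"):
    """Return one side (or all) of a reaction's LaTeX string.

    Assumes the full reaction string uses \\rightarrow as the arrow.
    """
    lhs, arrow, rhs = reaction.partition(r"\rightarrow")
    if side == "lhs":
        return lhs.strip()
    if side == "rhs":
        return rhs.strip()
    return reaction

latex = r"e + N_2(X^1\Sigma) \rightarrow e + N_2(A^3\Sigma)"
print(reaction_latex(latex, side="lhs"))  # e + N_2(X^1\Sigma)
```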
Some excitation processes in the metadata are labeled as "excitation_total" while others are just labeled as "excitation". For example, the process for the reaction e+N2(X1Sigma)->e+N2(A3Sigma_v0-v4) is labeled "excitation_total", while the process for the reaction e+N2(X1Sigma)->e+N2(B3Pi) is just labeled "excitation". By "total", I think you mean that the cross sections are summed over vibrational modes. If so, then e+N2(X1Sigma)->e+N2(B3Pi) should be labeled "excitation_total" too, since this cross section is not vibrationally resolved.
In order for many of the tests to pass, the build script on TravisCI must be able to reach a NEPC database.
Investigate the following: add to the build script the startup of a MySQL server and build the test nepc database.
Set the $NEPC_HOME variable for the tests.
Addresses __repr__ tasks for #7.
For CS, return "CS(cursor, cs_id)".
For Model, return "Model(cursor, model_name)".
It appears RTD changed its default build to require pip 20.2 and its "robust beta" resolver (--use-feature=2020-resolver).
Need to update the version of pip in the base nepc environment.yml file to 20.2.
Need to specify python==3.7 in nepc-dev conda environment. Should also probably add some additional hints on how to get MySQL working in WSL and nepc and nepc-test databases installed.
There are FIXMEs in test_nepc.py; this issue will remove the FIXME for testing metadata types.
Line 922 in 9f789d2: when called from a curation script, there's no reason that the get_filelist function should ask about each individual file. It should ask whether to add all of the files in a directory to the queue.
On line 12 of nepc_user_script.sql, "npc_user_script.sql" appears to be a typo for "nepc_user_script.sql".
Some simulations require high fidelity cross section data (e.g. with detailed resonances), and some applications need cross section data with less detail but reduced in some systematic and reproducible way. One way to make the data more compact is to convolve with a Gaussian.
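A sketch of the Gaussian reduction on a uniform energy grid (numpy-based, unit-area kernel; this illustrates the idea, not nepc's actual method):

```python
import numpy as np

def gaussian_smooth(energy, sigma, width):
    """Convolve a cross section with a unit-area Gaussian of the given
    width (same units as `energy`). Assumes a uniform energy grid."""
    de = energy[1] - energy[0]
    x = np.arange(-4.0 * width, 4.0 * width + de, de)  # kernel support: +/- 4 widths
    kernel = np.exp(-0.5 * (x / width) ** 2)
    kernel /= kernel.sum()  # normalize so total area is preserved
    return np.convolve(sigma, kernel, mode="same")
```

Away from the grid edges the unit-area kernel preserves the integrated cross section; near the edges the implicit zero padding pulls the result down, so a real implementation would need an explicit boundary policy.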
There are some old references to NEPC_DATA_HOME, and some of the documentation for building a production database could be improved.
Perhaps add a tool to diff current and past data within LXCat.
I am not sure what information the specie metadata is trying to convey. It is likely not useful and could be removed. If the specie metadata is kept, it should at least be spelled correctly: "species" is both singular and plural.
A list of reactants and products for each reaction would be more useful than the LHS_A, LHS_B, RHS_A, RHS_B notation that is currently used in nepc. The LHS/RHS notation is limiting since it assumes reactions of the form A+B -> A'+B', with exactly two reactants and two products.
For example, this notation doesn't cover simple ionization where there are two reactants but three products (2 electrons and an ion). An ionization event e+N2(X1Sigma) -> 2e+N2(X2Sigma) could have ["e", "N2(X1Sigma)"] as reactants and ["e", "e", "N2+(X2Sigma)"] as products and these could be stored as metadata.
The reactant and product lists are most useful when updating rate equations. Each reactant species will have a negative contribution as they are used up in the reaction and each product will have a positive contribution as they are created.
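The sign bookkeeping above can be computed directly from the reactant and product lists (species names are from the ionization example; the helper is illustrative, not part of nepc):

```python
from collections import Counter

# Ionization example from above: two reactants, three products.
reactants = ["e", "N2(X1Sigma)"]
products = ["e", "e", "N2+(X2Sigma)"]

def stoichiometry(reactants, products):
    """Net change per species: each product counts +1, each reactant -1.

    The result gives the sign and magnitude of this reaction's
    contribution to each species' rate equation."""
    net = Counter(products)
    net.subtract(Counter(reactants))
    return {species: n for species, n in net.items() if n != 0}

print(stoichiometry(reactants, products))
```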
nepc requires a few environment variables to access data. Consider using decouple instead. It seems to be more elegant than reading environment variables.
Partially addresses #7
Proposal for CS.str():
Provide a list of certain metadata: specie, process, threshold, ref (if set), text-formatted reaction, and background.
Example for cs_id=1 of test data:
specie: N2
process: excitation
reaction: e- + N2(X1Sigmag+) -> N2(X1Sigmag+)_jSCHULZ + e-
threshold: 0.02 eV
ref: N/A
background: This contribution to rotational excitation (resonance - USING SUM OF SCHULZ VIBRATION IN A SINGLE-LEVEL APPROXIMATION) is not part of the complete set. Use of this cross section in addition to the CAR approximation could be used in place of the single level approximation for rotation which is presently part of the complete set.
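A sketch of how such a str() could be rendered from a metadata dict; the field names mirror the example above, but the real CS class may store them differently:

```python
def format_cs(metadata):
    """Render the proposed CS.str() output from a metadata dict."""
    return "\n".join([
        f"specie: {metadata['specie']}",
        f"process: {metadata['process']}",
        f"reaction: {metadata['reaction']}",
        f"threshold: {metadata['threshold']} eV",
        f"ref: {metadata.get('ref') or 'N/A'}",  # fall back to N/A when unset
        f"background: {metadata['background']}",
    ])

sample = {
    "specie": "N2",
    "process": "excitation",
    "reaction": "e- + N2(X1Sigmag+) -> N2(X1Sigmag+)_jSCHULZ + e-",
    "threshold": 0.02,
    "ref": None,
    "background": "Not part of the complete set.",
}
print(format_cs(sample))
```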
Cross section data is only valid over a particular energy range, and extrapolations beyond the ranges of validity must be done with care. Also, there are valid ways to interpolate.
Some things we might add to the schema:
Then the appropriate methods would need to be added for extrapolating and interpolating the data.
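As a sketch, an interpolation method could refuse to extrapolate beyond the tabulated range (pure Python; the real method would hang off the CS class and might honor separate validity bounds stored in the schema):

```python
import bisect

def interpolate(energy_grid, sigma_grid, e):
    """Linear interpolation over tabulated cross section data that
    raises instead of extrapolating outside the valid energy range."""
    if not (energy_grid[0] <= e <= energy_grid[-1]):
        raise ValueError(f"{e} eV is outside the valid range "
                         f"[{energy_grid[0]}, {energy_grid[-1]}] eV")
    i = bisect.bisect_left(energy_grid, e)
    if energy_grid[i] == e:          # exact grid point, no interpolation
        return sigma_grid[i]
    frac = (e - energy_grid[i - 1]) / (energy_grid[i] - energy_grid[i - 1])
    return sigma_grid[i - 1] + frac * (sigma_grid[i] - sigma_grid[i - 1])
```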
There are several special methods that would be useful to have implemented for the CS and Model classes. We need to evaluate all that are available and implement the appropriate ones. For sure len(), __repr__, slice(), and __iter__.
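A minimal sketch of those dunders on a Model-like container of cross sections (the class shape and attribute names are assumptions; the real classes wrap a database cursor):

```python
class Model:
    def __init__(self, name, cs_list):
        self.name = name
        self.cs = list(cs_list)

    def __len__(self):
        return len(self.cs)                  # len(model)

    def __repr__(self):
        return f"Model(cursor, {self.name!r})"

    def __iter__(self):
        return iter(self.cs)                 # for cs in model: ...

    def __getitem__(self, key):
        return self.cs[key]                  # indexing and slicing

m = Model("phelps", [1, 2, 3])
print(len(m), repr(m), m[1:])
```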
Line 941 in 9f789d2: seems like there might be cases where it's useful to have just the LHS (or RHS) of the reaction latex. It should be simple enough to refactor this function into two separate functions. Thoughts?
Right now, DIR_NAMES is set to a hard-wired list of directories in mysql/build.py. We should probably build this list by searching for directories in data/cs that contain valid nepc .dat, .met, and .mod files.
Right now, .dat, .met, and .mod files are created in unstructured Jupyter notebooks in methods or process folders. We need a set of structured templates for importing and curating cross sections of various types (e.g. LXCat). The template method design pattern looks appropriate.
Create Documentation
We need a guide to curating data aligned with our new curation templates. This guide will be referenced in an updated template for adding curated data to nepc_cs.
Implementing templates for issues and PRs would help with workflow.
Right now, we have to comment/uncomment the lines that specify whether the package is to be uploaded to the real PyPI or the test PyPI server. It would be nice to have a flag instead. Also, it would be nice if a random string were added to version so that we wouldn't have to edit the nepc/__version__.py file every time we wanted to try a new version of the package on the test server.
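One sketch of the random-suffix idea, using a PEP 440 ".devN" version so repeated uploads to the test server never collide (the helper name is illustrative; a real release flow might use a --test flag plus a timestamp instead):

```python
import random

def dev_version(base):
    """Append a random PEP 440 '.devN' suffix to a base version string."""
    return f"{base}.dev{random.randrange(10**8)}"

print(dev_version("0.1.2"))  # e.g. 0.1.2.dev41952207
```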