dgbowl / yadg Goto Github PK

View Code? Open in Web Editor NEW

33.0 33.0 12.0 69.87 MB

yadg: yet another datagram

Home Page: https://dgbowl.github.io/yadg

License: GNU General Public License v3.0

Python 56.75% Rich Text Format 43.25%

batteries catalysis chromatography data-science electrochemistry parser

yadg's People

Contributors

Stargazers

Watchers

Forkers

peterkraus vetschn cambiegroup galessiorob jac28 ileu nukp ajay-bhargava chris-price19 generomuga empaeconversion arndkoeppe

yadg's Issues

Agilent LC timestamps are for sample analysis, not sample creation.

The timestamps contained in Agilent's .dx and .ch files correspond to sample analysis time, not to sample creation time. While for GC this is fine, for LC these times may differ significantly with a variable offset.
One possible solution was supplying an Agilent log file via the externaldate interface. However, there are several issues with the Agilent LC logfile:

the logfile doesn't seem to update automatically, so the whole functionality is questionable
the user would supply a folder full of .dx or .ch files, with one timestep in each; the logfile contains data for multiple timesteps which have to be matched to the .dx and .ch files somehow.
the logfile contains any pre-run and post-run injections; those are usually not supplied using .dx or .ch files, as they use a different method, which means the number of timesteps in the logfile may be different than the number of supplied data files

I'm in contact with Agilent trying to resolve this issue.

General query

I applaud the authors for developing a software package with a focus on the FAIR guiding principles. Keep in mind I have no access to any of the laboratory equipment/ data generators mentioned in the article, which mean my bring me to some of the following query

Is the package developed with a focus on data generated from Chromatography, and Electrochemistry at EMPA.
How easy would it be for an experimentalist/ alternate set up located in another facility to use yadg ?
If this is indeed the case would you be able to link up some user focused examples ?
How does yadg handle the non standardized intermediate step (Figure 1)? Would the use of proprietary file formats pose a challenge for FAIR ?
Kindly provide a brief statement in the article regarding how this software compares to other commonly-used packages? (JOSS requirement)

`electrochem`: Analyzing executables

While doing some quick googling in light of #97 just now, I came across an interesting development over at echemdata/galvani#80 where user @ilka-schulz seems to be disassembling/decompiling the executable itself and finding some interesting data (like an array of all the data column names).

Maybe moving into territory of questionable legality in this case, but an interesting and promising looking approach as far as I'm concerned. May also work for other software and could speed up the process significantly.

Felt like this was worth noting somewhere.

`qftrace`: Rename columns

The current naming schema of Re(Γ) and Im(Γ) for the real and imaginary part of the is inconvenient, as the Γ character is not compatible with ANSI and other encodings, and the columns cannot be accessed via dgpost.

Rework uncertainties for `yadg-5.0`

The addition of explicit uncertainty values bloats the json data significantly. Consider reworking.

Output names with preset -p.

Supplying an [outfile] with yadg preset -p does not work.

Missing Module while parsing mpr file

The attached mpr file contains an addition model called VMP ExtDev.
test.zip

EC-Lab 11.50 files not compatible with yadg

@PeterKraus. I tried to use yadg5 to prase the echem data (GCPL and PEIS data in mpr format) into netcdf file. From what I looked at the documentation of yadg, these two techniques should be supported by yadg. Nevertheless, when I tried to use yadg on the attached files with the attached yaml data.zip I encountered the following error.

Traceback (most recent call last):
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\yadg.exe\__main__.py", line 7, in <module>
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\main.py", line 201, in run_with_arguments
    args.func(args)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\subcommands.py", line 152, in preset
    datagram = core.process_schema(ds, strict_merge=not args.ignore_merge_errors)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\core\__init__.py", line 107, in process_schema
    fvals = handler(
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\__init__.py", line 83, in process
    return eclabmpr.process(**kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 582, in process
    settings, params, ds, log, loop = process_modules(mpr)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 502, in process_modules
    settings, params = process_settings(module_data)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 256, in process_settings
    raise NotImplementedError("Unknown parameter offset or technique dtype.")
NotImplementedError: Unknown parameter offset or technique dtype.

This is quite surprise to me as both technique should be supported by yadg. I also tried to prase only PEIS file as I know I can prase PEIS file for Franesco just fine. But I encountered similar error as shown above despite trying only 1 PEIS (mpr) file. When I swap this PEIS file from the battery cycling experiment with the one from Francesco (catalysis experiment) yadg works just fine and can create the nc file with no problem. I tried to prase GCPL and PEIS from many other battery cycling experiment and encountered the same error. Could you please have a look into this issue? Thank you.

References

Kindly fix the following references in the article

(Drawl, 2020) Draxl, C. (2020, July). FAIRmat. 2. NFDI Conference. (Incomplete)

Versioned documentation

The docs should be built for all versions of the project, perhaps using sphinxcontrib-versioning or similar.

Error: Unsipported locale settings

Hello,

I try doing:

extract(filetype="eclab.mpt", path="myfile.mpt")

and get:
Error: unsupported locale setting

My data file contains data decimal separator as comma. This should be changed to dots.

Thank you in advance :)

`chromtrace`: Valve position is stored per-step as opposed per-timestep

Valve position in the fusion.json parser of chromtrace is stored once in each step as opposed to once in each timestep.

I ranges not understood

In some of Francesco's data, I range values higher than those listed below are found:

yadg/src/yadg/parsers/electrochem/eclabtechniques.py

Line 980 in 3226e87

param_map = {

Wrong Filename extension in extract subcommand?

When invoking the yadg extractor without an explicit outfile name, the default filename extension is .json but the resulting file is an HDF5 file.

I'm assuming this isn't intentional, right?

Move specification of datagram and schema to YAML.

The specifications for datagram and schema should be written in YAML, and validated using strictyaml, to keep consistent with dgpost and reduce the documentation effort. This will be an interface-breaking change, hence it's tagged for version 5.0.

The schema files themselves also should be YAML files.

`flowdata`: Handle multi-day `drycal` measurements

As no date is stored in the drycal return format, in an overnight experiments, the hour counter resets but the date counter does not increase.

Raised Validation Error under 4.2.1

I am trying to implement yadg into our data-workflow. Great project!

Following the provided example from Binder for Biologic files I fail due to various problems under yadg version 4.2.1 dependent on the provided files and schema content.
As schema the structure from Binder (see below) was used. For parsing .mpt-file exported from my EC-Lab the second step was added.
Since the Binder example states "schema_version": "4.0.0" both 4.0.0 and 4.2.1 was tested. Within all tests, the example files gcpl.mpt, ocv.mpt and peis.mpr were parsed.

 {
    "metadata": {
        "provenance": "manual",
        "schema_version": "4.0.0"
    },
    "steps": [
        {
            "parser": "electrochem",
            "import": {
                "files": ["data/ocv.mpt", "data/gcpl.mpt"],
                "encoding": "windows-1252"
            },
            "parameters": {"filetype": "eclab.mpt"}
        },
		{
            "parser": "electrochem",
            "import": {
                "folders": ["data"],
				"suffix" : ".mpt",
                "encoding": "windows-1252"
            },
            "parameters": {"filetype": "eclab.mpt"}
        },
        {
            "parser": "electrochem",
            "import": {
                "folders": ["data"],
                "suffix": ".mpr"
            },
            "parameters": {"filetype": "eclab.mpr"}
        }
    ]
}

To sum up the gathered results:

schema_version	individual `.mpt` Export	Result
4.0.0	x	TypeError: unsupported operand type(s) for -: 'str' and 'str', solved in #88 ?
4.0.0	o	Success
4.2.1	x	126 validation errors for DataSchema, `unexpected value; permitted: 'dummy'`
4.2.1	o	85 validation errors for DataSchema, `unexpected value; permitted: 'dummy'`

The unexpected value; permitted: -message is raised for every parsers.

Since the error is independent of my .mpt file, it seems that the version of the provided schema_version cause some kind of bug.

Installation

The pip install yadg worked flawlessly on my brand new apple silicon Mac book pro.

It would be nice to list the install instruction in the GitHub readme docs.

A side note . The project page url listed in the readme seems to be broken : https://dgbowl.github.io/master/index.html

404 File not found

Potential localization problem with parsing *.mpt files

I recognized that importing my own Biologic exports (default settings) results in errors while parsing the example files from binder works like a charm.
As initally mentioned in #93 the E_ranges will read as str due to a "," instead of a "." as decimal seperator and therefore l.318 in eclabmpt.py will throw an error. I was able to fix this issue locally by the following "quick-and-dirty" solution:

            E_range_max = float(el.get("E_range_max", "inf").replace(',','.')) if isinstance(el.get("E_range_max", "inf"), str) else el.get("E_range_max", "inf")
            E_range_min = float(el.get("E_range_min","-inf").replace(',','.')) if isinstance(el.get("E_range_min", "inf"), str) else el.get("E_range_min", "inf")

I have not fully understood the yadg- workflow, yet, thus this might break something.
Please find attached a sample .mpt file, as well as the corresponding .mps-file. To my best knowledge the "," was not set by purpose anywhere in EC-lab software.
TestScript_YADG_II_01_MB_C01.mpt.txt
TestScript_YADG_II.mps.txt

Integration of chromtrace is unstable

Integration of some peaks using chromtrace is unstable - for unknown reasons, the peak edges are often shifted by several baseline points. Originally, I thought this was caused by the default smoothing parameters of the Savigny-Golay filter. Needs further investigation.

Issue with yadg yaml recipe file

Hi @PeterKraus
As promised, I am now retrying to upgrade the default yadg version for the catalysis lab from yadg4.2 to yadg 5.0.

I first created new conda env and install the latest version of yadg using pip. The current version of the installed library is as follows:
yadg 5.0.2
pydantic 2.6.4
dgbowl-schemas 116
dgpost 2.1.1

I first try runing yadg /dgpost using the example files and command provided by you last summer.
yadg-5.0a5-pipeline.zip. The script works fine. The netcdf was created and dgpost work correctly.

Nevertheless, we nolonger use drycal's software to measure the flow to ease the issue with piston stucking during the measurement. I wrote a script to control the flow meter. We are now using the script exclusively during the measurement. The introduction of multiplex system (running 8 cells at the same time) also requires some pre-processing of the flow data/ pressure data and temperature data before processing using yadg/dgpost.

I have made the script that will pre-process the flow data, pressure data and temperature data before processing using yadg/dgpost. The pre-processed files will be called 'flow_for_yadg.csv', 'pressure_for_yadg.csv' and 'temperature_for_yadg.csv' respectively. You can find the data after pre-processing step
here I have tried to modify yadg yaml recipe file for these files yadg.preset.francesco_v5-EDLC_mod1.yaml.zip but after I tried to run yadg on this pre-process data using this modified yaml, I got the error below: It seems that extractor (which is a new feature introduced in yadg5) is required, but I am not quite sure how this work. I think the issue might stem from how I made the yaml file. Could you please have a look into this. Thank you.

  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\yadg.exe\__main__.py", line 7, in <module>
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\main.py", line 201, in run_with_arguments
    args.func(args)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\subcommands.py", line 144, in preset
    schema = to_dataschema(**preset)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgbowl_schemas\yadg\__init__.py", line 40, in to_dataschema
    schema = Model(**kwargs)
  File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\pydantic\main.py", line 171, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 56 validation errors for DataSchema
steps.10.Dummy.parser
  Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parser
  Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MeasCSV.parser
  Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MeasCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.FlowData.parser
  Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.FlowData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.FlowData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ElectroChem.parser
  Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ElectroChem.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ElectroChem.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromTrace.parser
  Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ChromTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MassTrace.parser
  Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MassTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.MassTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.QFTrace.parser
  Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.QFTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.QFTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XPSTrace.parser
  Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XPSTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XPSTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XRDTrace.parser
  Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XRDTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XRDTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.Dummy.parser
  Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parser
  Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MeasCSV.parser
  Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MeasCSV.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.FlowData.parser
  Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.FlowData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.FlowData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ElectroChem.parser
  Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ElectroChem.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ElectroChem.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromTrace.parser
  Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ChromTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromData.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromData.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MassTrace.parser
  Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MassTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.MassTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.QFTrace.parser
  Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.QFTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.QFTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XPSTrace.parser
  Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XPSTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XPSTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XRDTrace.parser
  Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XRDTrace.extractor
  Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XRDTrace.parameters.filetype
  Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
    For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden```

Implement `locale` support for text file types.

Related to #95 and the solution in #99 :

The text files read by yadg may be localised (MM/DD/YYYY instead of DD.MM.YYYY in dates, or worse, X,XXX.yy vs X'XXX.yy or X.XXX,yy in floats). Trying to guess a locale of the given file is really impossible and should be a user-controlled parameter, along with things like timezone, encoding, or other file metadata.

Archiving functionality.

yadg should be able to zip-archive the data specified in the dataschema for archiving along with the datagram.

Question: seeing the full parsed output

Hi,

Is it possible to run something like yadg process myFile.mpr myFile.json in order to have a look at the result? (eclab file)

I read the docs in here and it looks like I should run yadg process mySchema outfile.json.

PD: I finally did it using the version 4.0.0 and the following schema:

{
    "metadata": {
        "provenance": "manual",
        "schema_version": "4.0.0"
    },
    "steps": [
        {
            "parser": "electrochem",
            "import": {
                "files": ["myFile.mpr"],
                "encoding": "windows-1252"
            },
            "parameters": {"filetype": "eclab.mpr"}
        }
    ]
}

Not sure what the update is for 4.2.

BTW I am not sure whether the file always stores a single technique? I think that is implied by the code right now?
Thanks.

Implement parsing of `.picolog` files

An example file in the attachment.

20220723-porosity-study-15p-Cu-200mA-longrun-07.zip

Implement parsing of `ezchrom.dat` files

As requested by @schumannj. I have a set of .dat.asc and corresponding .dat files available for reverse-engineering.

Unified handling of binary files.

Currently, each parser handles reading from binary files separately. The offending parsers should be rewritten to use the functions in yadg.dgutils.btools.

Impedance Accuracy Correction

The measuremnt accuracy of a impedance measurement should be changed to:

1% for the magnitude
1 degree for the phase

Implement support for `CVA` technique

vetschn/eclabfiles#15

As mentioned by @ScottSoren, including the example mpt and mpr files provided in the above link.

05_O2dose_COox_02_CVA_C01.zip

Uncertainties in XPS traces

Determining the uncertainty of the counts per second signal in xps traces from the phispe parser should be done in a better way.

I set it to be a constant, hard-coded value of 12.5 as the counts per second only seem to be taking values in steps of 12.5cps in the .spe files I've inspected (should investigate this further).

This counts per second signal should somehow be Poisson distributed, i.e. the uncertainty should probably be something like sqrt(n) but I know too little about XPS to really tell.

Unused imports and variable definitions

The following appear to be unused imports

yadg/src/yadg/core/validators.py

Line 1 in 19ec4dc

import numpy as np

yadg/src/yadg/parsers/chromtrace/agilentcsv.py

Line 29 in 19ec4dc

import yadg.dgutils

yadg/src/yadg/parsers/chromtrace/fusionjson.py

Line 30 in 19ec4dc

import yadg.dgutils

yadg/src/yadg/parsers/chromtrace/main.py

Line 2 in 19ec4dc

import numpy as np

yadg/src/yadg/parsers/chromtrace/main.py

Line 4 in 19ec4dc

import uncertainties as uc

yadg/src/yadg/parsers/chromtrace/main.py

Line 5 in 19ec4dc

import uncertainties.unumpy as unp

yadg/src/yadg/parsers/electrochem/eclabmpr.py

Line 160 in 19ec4dc

import math

yadg/src/yadg/parsers/electrochem/eclabmpt.py

Line 44 in 19ec4dc

import math

yadg/src/yadg/parsers/electrochem/eclabmpt.py

Line 45 in 19ec4dc

import os

yadg/src/yadg/parsers/flowdata/main.py

Line 5 in 19ec4dc

import yadg.dgutils

yadg/src/yadg/parsers/qftrace/main.py

Line 1 in 19ec4dc

import os

yadg/src/yadg/parsers/qftrace/main.py

Line 6 in 19ec4dc

import uncertainties as uc

yadg/src/yadg/parsers/qftrace/main.py

Line 7 in 19ec4dc

from uncertainties import unumpy as unp

yadg/src/yadg/parsers/qftrace/main.py

Line 10 in 19ec4dc

import yadg.dgutils

yadg/src/yadg/parsers/qftrace/labviewcsv.py

Line 15 in 19ec4dc

import os

yadg/src/yadg/parsers/qftrace/labviewcsv.py

Line 19 in 19ec4dc

import yadg.dgutils

yadg/src/yadg/parsers/qftrace/prune.py

Line 15 in 19ec4dc

from uncertainties import unumpy

In the following line, math is an unused import

yadg/src/yadg/dgutils/dgutils.py

Line 4 in 19ec4dc

from uncertainties import ufloat, umath

These variables as defined don't appear to be used anywhere; consider deleting them

yadg/src/yadg/parsers/chromtrace/agilentdx.py

Line 70 in 19ec4dc

common = None

yadg/src/yadg/parsers/qftrace/fit.py

Line 173 in 19ec4dc

dia = agam1.min()

yadg/src/yadg/parsers/qftrace/fit.py

Line 202 in 19ec4dc

Fsq = F.conj().T @ (p * F)

yadg/src/yadg/parsers/qftrace/fit.py

Line 206 in 19ec4dc

diam1 = 2 * abs(g[1] * g[2] - g[0]) / abs(g[2].conj() - g[2])

yadg/src/yadg/parsers/qftrace/fit.py

Line 216 in 19ec4dc

gam1com = (g[0] * ft.T + g[1]) / (g[2] * ft.T + 1)

yadg/src/yadg/parsers/qftrace/fit.py

Line 229 in 19ec4dc

coupls = (1 / rr2 - 1) / (1 / rr1 - 1 / rr2)

yadg/src/yadg/parsers/qftrace/fit.py

Line 230 in 19ec4dc

rs = 2 / dr2 - 1

yadg/src/yadg/parsers/qftrace/fit.py

Line 242 in 19ec4dc

avequi = equi.mean()

yadg/src/yadg/parsers/qftrace/fit.py

Line 244 in 19ec4dc

sddia1st = sdequi * 2

Reading cycle number from GCPL mpr file

I have been using yadg 5.03 to prase GCPL.mpr data (files from before Biologic 11.50 update) and notice that the cycle number in the resulting netcdf file tend to be 1 and nan otherwise. When using EC-lab program to open the file, one can easily select the cycle number. I found that in other technique, for example cyclic voltammetry (CV), yadg can correctly extract the cycle number. I am suspecting there might be a bug in this parsing? The cycle number is very important parameter in processing GCPL data and it would make data analysis much more easy.

Example data.zip

loss of column in the converted json from the mpr file in GCPL technique using yadg 4

Background
When using yadg 4 to convert mpr file to json file, serveral data columns went missing although these columns are present in the mpt file generated from Biologic. Unfortunately, these missing data column, /mA, in particular is crucial for plotting the data from GCPL experiment. The user has requested that this column is included in the json file.

Investigation done so far

I have suspected that the issue may have come from the eclabmpr parser. In particular, this parser uses _parse_columns function to read the column header in the mpr files before reading them and compile them into a datapoint which is subsequently written into a json file. I have tried to print the names, dtypes, units, flags which is returned from the _parse_columns and found that the headers matches what is found in the json file. Strangely, the name stored in the names parameters only cover parts of the data column in the resulting mpt file. In particular, only from mode to Capacitance discharge/µF column.
I have tried to modify the script (I folked the project and run in a separate conda environment), and manually add names, dtypes, units for  so that the script will extract this column from the mpr file. But, I got an error about not enough numpy buffer. I tried to add the attibute that I am sure not in the mpr file (by mispelling the name) and got a similar error. This led to my conclusion that the  column may not exits in the mpr file in the first place and the yadg was performing correctly.

My suspision so far

From the investigation I have done earlier, I now think that Biologic may have derive and generate the data from /mA to cycle number (in the mpt file) during mpr to mpt conversion. These data columns may not have present in the original mpr file.

/mA, for example, can be derived from dq/mA.h. Nevertheless, I have not yet verified this speculation with the user whoes have experince with battery electrochemistry.

This speculation is in line with the issue reported earlier (#41) that there will be more column stored in the mpr file after the user perform the conversion to mpt.

Corresponding files
https://drive.switch.ch/index.php/s/AGzUx3wPHscdWnq

nvm

nvm!

Rework GitHub Actions

The GitHub Actions should be reworked for a better reuse of individual actions. Also, several of the actions should be updated.

Resolution in electrochemistry files.

The resolution of the data in electrochemistry files has currently two issues:

derived quantities (#11): currently use fallback of math.ulp, which should be removed and the correct uncertainty propagated from raw data.
raw quantities: currently, hard-coded values from VMP3 specification are used. However, these may be device-dependent.

Bug?: parser errors out

Hi, I wrote a simple script to convert the mpr files to json, they are all taken from the repo:

//logs
['./test/ca.mpr', './test/cv.mpr', './test/cp.mpr', './test/wait.mpr', './test/test.mpr', './test/zir.mpr', './test/lsv.mpr']
./test/ca-params.json
./test/cv-params.json
./test/cp-params.json
./test/wait-params.json
Traceback (most recent call last):
  File "/data/convertMPRtoJSON.py", line 21, in <module>
    data, meta, date  = p.process("./"+filename)
  File "/home/sm/yadg/src/yadg/parsers/electrochem/eclabmpr.py", line 834, in process
    settings, params, data, log, loop = _process_modules(mpr)
  File "/home/sm/yadg/src/yadg/parsers/electrochem/eclabmpr.py", line 771, in _process_modules
    Eranges.append(el["E_range_max"] - el["E_range_min"])
KeyError: 'E_range_max'

This is how I execute it:


files = glob.glob("./*/*.mpr")
print(files)
for filename in files:
    newfilename = filename.split(".mpr")[0]+"-params.json"
    print(newfilename)
    data, meta, date  = p.process("./"+filename)
    totest = meta["params"]
    fo = open(newfilename, "w")
    fo.write(json.dumps(totest))
    fo.close()

Is it a bug in the code ? I think there should be something else or it wouldn't pass the tests ?

The rest of the files are converted to json and look fine.

I tried with both 4.2.0 and 4.0.0

Check differences in mpr files before & after opening in EC-Lab.

I dont know if this is the right place for it.
But I just noticed that if you just open an mpr and chose export as text, EClab will generate automatically additional data columns and save them inside the mpr file. The two attached files are the same file, before and after clicking and export as text.
OpeningFiles.zip

Originally posted by @ileu in #37 (comment)

More verbose DataSchema parsing

In yadg-4.0, we introduced parsing of the external DataSchema via the dgbowl-schemas library, based on Pydantic. Unfortunately, when the parsing of the input DataSchema fails for whatever reason, the verbose output message is rather unclear, and raises a million warnings/errors, as all versions of all schemas are validated and fail.

Separate install requirements based on use

currently the setuptools installs all dependencies for both testing and production; it would be better however if we utilized extras_require so that unnecessary dependencies aren't installed for end users on production.

i.e.

setuptools.setup(
    install_requires=[
        "numpy",
        "scipy",
        "pint",
        "uncertainties",
        "striprtf",
        "pytest",
        "tzlocal",
        "python-dateutil",
    ],
    ...
)

should be

setuptools.setup(
    install_requires=[
        "numpy",
        "scipy",
        "pint",
        "uncertainties",
        "striprtf",
        "tzlocal",
        "python-dateutil",
    ],
    extras_require={
        "testing": [
            "pytest"
        ]
    },
    ...
)

test installations now are done via python -m pip install -e ./[testing] so that pytest is included, and now pytest isn't installed when a user is just running this on production.

Parse through mpr and mpt settings header.

The param_map in eclabtechniques.py is incomplete for many parameters. It should be possible to complete it using a systematic set of mpr/mpt files.

Use module-level loggers.

The logging in yadg should be done using module-level loggers (https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial). This will allow us to remove the function prefixes from the log messages, and keep the logging more consistent between modules & during code refactors.

Traces in PEIS/GEIS files are not split into timesteps.

Currently, when a mpr or mpt file contains multiple PEIS traces (separated by cycle number), they all appear within raw->traces->*. Instead, each of these traces should correspond to a separate timestep.

`xr.concat`: Dealing with conflicting metadata in `attrs`

The concatenation of partial Datasets introduced in yadg-5.0a2 is currently only working when the attrs of the datasets are identical. This concatenation is done whenever multiple files are merged into one step, but also when the parsed file is itself a "zip" file and contains multiple separate files that are processed and merged.

It is reasonable to assume that we want to preserve as much of the metadata as possible, hence drop is not an option.
When parsing a "zip" file, the assumption is that the metadata is consistent between files, so identical should be the only possible option - any merge that fails means the classification of data/metadata is wrong in the parser.
When parsing a bunch of files, the expectation may be relaxed, as the metadata might be very different (e.g. different file paths etc.); therefore drop_conflicts is reasonable. A further, stricter mode (no_conflicts or identical) could be enabled via command line.

Error in parsing `.mpr` files with `CA` technique:

vetschn/eclabfiles#14

From @ScottSoren:

I am the main author of an open source project "ixdat" which also includes many python parsers for experimental data formats, including Biologic's .mpt. I'm trying to add a .mpr parser which makes use of eclabfiles for parsing the binary file.

eclabfiles works for some files but not others. For details, see this PR: ixdat/ixdat#134

Briefly, it seems to work for LSV measurements, but not CA or CVA. It's a different error message, so I'll put CA here and CVA in a separate issue.

The error I get is "Error: field '(Q-Qo)' occurs more than once". I get this same error whether using data, meta = eclabfiles.process("05_O2dose_COox_04_CA_C01.mpr") or df = eclabfiles.to_df("05_O2dose_COox_04_CA_C01.mpr")

The file is here, along with plots of the data (made with ixdat.Measurement.read("05_O2dose_COox_04_CA_C01.mpt").plot() as demo'd in "plot_data.py"): https://www.dropbox.com/scl/fo/cl0cnovmik7pjgcls2l9h/h?rlkey=v93snkrt2rq3uf95au26qdi0o&dl=0

Happy for any help or

I have downloaded a copy of the .mpt and .mpr files from the above link:

05_O2dose_COox_04_CA_C01.zip
suggestions!

Harcoded String Handling

Hi,

This appears to be a hardcoded line of code. In strings where a split(',')results in more elements in a list than 2, this causes an error.

Derived values in electrochemistry files.

The electrochem interface should be fixed in version 5.0. Ideally, only minimum values would be parsed into the "raw" data:

control currents and voltages
measured voltages and currents
frequency, phase & magnitude

All other values, such as impedances, capacities, etc. should be computed by yadg and stored in the derived fields.

`datagram` format `DataGram`-schema for `yadg-5.0`

So the current way of doing things in yadg has the following drawbacks:

the datagram-v4.x is just a json file, without any accompanying schema. Readable, but not really FAIR. (#4)
the datagram-v4.x is insanely bulky due to it being a text file format as well as recording uncertainties in a stupid fashion (#69)

The situation is further complicated by the introduction of Extractors into yadg-5.x, meaning that we ought to have a way of dumping extracted data (currently for the foreseeable future: a single Extractor processes a single file into one object) as well as parsed data (using DataSchema, i.e. multiple Extractors processing multiple files into one object).

Additional complication is the kinds of data we might have to store. There are:

float: standard data (quantities comprising floats, accompanied with units and uncertainties)
list: trace data (essentially lists of quantities that form single objects, like chromatographic or spectroscopic traces)
dict: nested/categorised data (for example, two detectors in a GC make two separate traces, or concentrations of species in a mixture have to be assigned the species label somehow)
the necessity to store uncertainties further complicates these things, as we might have to somehow store two "tables" per file

I invite opinions from friends of this project, such as @EmpaEconversion, @vetschn, @ileu, @ml-evs, @ramirezfranciscof

Option 0: `JSONSchema + JSON`

comfortable and easy to implement, as we really just need the JSONSchema part
a mechanism for recording sigmas in a more sensible way can be developed
at least the JSON part can be opened in Firefox
can avoid Python -> C -> Python round-tripping penalty if done without pandas

However:

does not ultimately help with file size
essentially means rolling our own standard, even if the individual tables could be read using pd.DataFrame.from_dict()
the above means we'd need to provide loading functions

Therefore: good as an optional export option; not great as default

Option 1: `pd.DataFrame` with `pd.MultiIndex`

we can have a top-level column-index splitting our data into values and sigmas
it is possible to use a row-index as a category for e.g. species
2-dimensional data is no problem

However:

while pd.DataFrame have support for df.attrs, round-tripping those into anything but HDF5 is, as far as I know, impossible.
the pd.MultiIndex is really unwieldy if you want to mix quantities which have categories with those that do not. Including the top-level values index, your columns have to be addressed using ("values", "concentration", "O2"), which is fine, but also ("values", "temperature", None) if it's in the same table and ("values", "temperature") if it's on its own, which is not fine.
not useful for datagrams

Therefore: unworkable, sorry.

Option 2: `xarray.Dataset`

we can use and re-use one coord for the timestamp, and many arbitrary ones for things like frequency in VNA traces (for 2D data)
the great thing here is that such coords become an actual index so you don't have to guess relationships!
per-variable annotation possible. including units and descriptions
multi dimensional data is a piece of cake
categorised data can be also represented in its normal form using a coord - the nice thing here is that e.g. in chromatography, we can access all variables (e.g. concentration, peak area, etc.) for one species using a single slice
the xr.Dataset itself can be annotated with metadata
round tripping of everyting via NetCDF should work fine
can be used in pandas with methods such as xr.Dataset.to_pandas() or .to_dataframe()

However:

while suitable for most cases, the issue with sigmas remains. They can be encoded per-column using abs and rel within the same object, but in some cases we may require a second table. A second table can be saved into a NetCDF file in a separate named group, but then it this information needs to be encoded somehow somewhere.
it's a fairly heavy dependency
while it solves the Extractor issue, it's borderline useless on its own for datagrams, unless we go the named-group way above.

Therefore: currently my preferred option for Extractors

Option 3: `xarray-contrib.Datatree`

all the above benefits apply
trivially resolves the issue with datagrams and could also resolve issues with sigmas

However:

still under development, meaning potentially unstable API
imposes a / separated namespace for Steps in a DataSchema

Therefore: in general, this looks like a good solution

Option 4: `HDF5` and `NeXus`

I cannot really comment on this one, as I didn't have time to look up how it differs from NetCDF / xarray. It seems inevitable that yadg and dgpost will have to support NeXus at some point somehow, but maybe the time is not yet there.

`basic.csv`: `/` substitution screws up units

The column for the flow data in the flow_for_yadg.csv is called 'Flow (nml/min)'. I found that the resulting netcdf file register this value as 'Flow (nml_min)' and did not extract the unit despite this is being instructed in the yaml file.

Originally posted by @NukP in #142 (comment)

@NukP: could you please attach an example flow_for_yadg.csv file and the relevant basiccsv section of the dataschema? I'll try to backport this to 5.0.x and make sure it's fixed in 5.1 too.

Add link to docs in README.md?

Congrats on the nice project!

As a minor note, it was a bit hard for me to find the docs on GitHub pages at https://dgbowl.github.io/yadg/master/index.html since I couldn't immediately see any link from the README file. However, I noticed the docs in the repo, the GH pages branch and a deploy GH action so I looked deeper. In general, adding a link to the docs to the README and on pypi can be beneficial for potential new users

dgbowl / yadg Goto Github PK

yadg's People

Contributors

Stargazers

Watchers

Forkers

yadg's Issues

Option 0: JSONSchema + JSON

Option 1: pd.DataFrame with pd.MultiIndex

Option 2: xarray.Dataset

Option 3: xarray-contrib.Datatree

Option 4: HDF5 and NeXus

Recommend Projects

Recommend Topics

Recommend Org

Option 0: `JSONSchema + JSON`

Option 1: `pd.DataFrame` with `pd.MultiIndex`

Option 2: `xarray.Dataset`

Option 3: `xarray-contrib.Datatree`

Option 4: `HDF5` and `NeXus`