yadg: yet another datagram
Home Page: https://dgbowl.github.io/yadg
License: GNU General Public License v3.0
The timestamps contained in Agilent's .dx and .ch files correspond to sample analysis time, not to sample creation time. While for GC this is fine, for LC these times may differ significantly with a variable offset.
One possible solution was supplying an Agilent log file via the externaldate interface. However, there are several issues with the Agilent LC logfile:
- the logfile doesn't seem to update automatically, so the whole functionality is questionable;
- the user would supply a folder full of .dx or .ch files, with one timestep in each, while the logfile contains data for multiple timesteps which have to be matched to the .dx and .ch files somehow;
- the logfile also contains any pre-run and post-run injections; those are usually not supplied as .dx or .ch files, since they use a different method, which means the number of timesteps in the logfile may differ from the number of supplied data files.
I'm in contact with Agilent trying to resolve this issue.
I applaud the authors for developing a software package with a focus on the FAIR guiding principles. Keep in mind that I have no access to any of the laboratory equipment / data generators mentioned in the article, which brings me to the following queries:
Is the package developed with a focus on data generated from chromatography and electrochemistry at EMPA?
How easy would it be for an experimentalist or an alternate setup located in another facility to use yadg?
If this is indeed the case, would you be able to link to some user-focused examples?
How does yadg handle the non-standardized intermediate step (Figure 1)? Would the use of proprietary file formats pose a challenge for FAIR?
Kindly provide a brief statement in the article regarding how this software compares to other commonly used packages (a JOSS requirement).
While doing some quick googling in light of #97 just now, I came across an interesting development over at echemdata/galvani#80, where user @ilka-schulz seems to be disassembling/decompiling the executable itself and finding some interesting data (like an array of all the data column names).
This may be moving into territory of questionable legality, but it is an interesting and promising-looking approach as far as I'm concerned. It may also work for other software and could speed up the process significantly.
Felt like this was worth noting somewhere.
The current naming schema of Re(Γ) and Im(Γ) for the real and imaginary parts is inconvenient, as the Γ character is not compatible with ANSI and other encodings, and the columns cannot be accessed via dgpost.
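One possible workaround, sketched here with a hypothetical `asciify_columns` helper and alias mapping (neither is part of yadg), would be to rename the columns to ASCII-safe aliases before export:

```python
# Hypothetical sketch: map the Unicode column names onto ASCII-safe
# aliases before export, so downstream tools such as dgpost can
# address the columns. The alias choices here are illustrative only.
ASCII_ALIASES = {
    "Re(Γ)": "Re(G)",  # real part
    "Im(Γ)": "Im(G)",  # imaginary part
}

def asciify_columns(data: dict) -> dict:
    """Return a copy of *data* with Unicode column names replaced."""
    return {ASCII_ALIASES.get(key, key): value for key, value in data.items()}
```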
The addition of explicit uncertainty values bloats the json
data significantly. Consider reworking.
Supplying an [outfile] with yadg preset -p does not work.
The attached mpr file contains an additional module called VMP ExtDev.
test.zip
@PeterKraus. I tried to use yadg 5 to parse the echem data (GCPL and PEIS data in mpr format) into a netcdf file. From what I saw in the yadg documentation, both techniques should be supported by yadg. Nevertheless, when I ran yadg on the attached files with the attached yaml (data.zip), I encountered the following error.
Traceback (most recent call last):
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\yadg.exe\__main__.py", line 7, in <module>
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\main.py", line 201, in run_with_arguments
args.func(args)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\subcommands.py", line 152, in preset
datagram = core.process_schema(ds, strict_merge=not args.ignore_merge_errors)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\core\__init__.py", line 107, in process_schema
fvals = handler(
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\__init__.py", line 83, in process
return eclabmpr.process(**kwargs)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 582, in process
settings, params, ds, log, loop = process_modules(mpr)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 502, in process_modules
settings, params = process_settings(module_data)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\parsers\electrochem\eclabmpr.py", line 256, in process_settings
raise NotImplementedError("Unknown parameter offset or technique dtype.")
NotImplementedError: Unknown parameter offset or technique dtype.
This is quite surprising to me, as both techniques should be supported by yadg. I also tried to parse only the PEIS file, as I know I can parse PEIS files from Francesco just fine, but I encountered a similar error despite trying only one PEIS (mpr) file. When I swap this PEIS file from the battery cycling experiment with the one from Francesco (a catalysis experiment), yadg works just fine and creates the nc file with no problem. I tried to parse GCPL and PEIS files from many other battery cycling experiments and encountered the same error. Could you please have a look into this issue? Thank you.
Kindly fix the following references in the article
(Drawl, 2020) Draxl, C. (2020, July). FAIRmat. 2. NFDI Conference. (Incomplete)
The docs should be built for all versions of the project, perhaps using sphinxcontrib-versioning or similar.
Hello,
I tried doing:
extract(filetype="eclab.mpt", path="myfile.mpt")
and got:
Error: unsupported locale setting
My data file uses a comma as the decimal separator; this should be converted to dots.
Thank you in advance :)
The valve position in the fusion.json parser of chromtrace is stored once in each step, as opposed to once in each timestep.
In some of Francesco's data, I range values higher than those listed below are found:
When invoking the yadg extractor without an explicit outfile name, the default filename extension is .json, but the resulting file is an HDF5 file.
I'm assuming this isn't intentional, right?
The specifications for datagram and schema should be written in YAML and validated using strictyaml, to keep consistent with dgpost and reduce the documentation effort. This will be an interface-breaking change, hence it is tagged for version 5.0.
The schema files themselves should also be YAML files.
As no date is stored in the drycal return format, in overnight experiments the hour counter resets but the date does not increase.
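One possible fix, sketched below as a hypothetical post-processing helper (not the parser's current behaviour), is to assume a day boundary was crossed whenever a timestamp goes backwards:

```python
from datetime import datetime, timedelta

def fix_midnight_rollover(times):
    """Add a day offset whenever the hour counter goes backwards.

    Sketch of one possible fix for date-less drycal timestamps: any
    timestamp earlier than its predecessor is assumed to belong to the
    next day. The accumulated offset also handles multi-day runs.
    """
    fixed = []
    offset = timedelta(0)
    prev = None
    for curr in times:
        if prev is not None and curr + offset < prev:
            offset += timedelta(days=1)
        prev = curr + offset
        fixed.append(prev)
    return fixed
```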
I am trying to implement yadg into our data workflow. Great project!
Following the provided Binder example for Biologic files, I fail with various problems under yadg version 4.2.1, depending on the provided files and schema content.
As schema, the structure from Binder (see below) was used. For parsing the .mpt file exported from my EC-Lab, the second step was added.
Since the Binder example states "schema_version": "4.0.0", both 4.0.0 and 4.2.1 were tested. In all tests, the example files gcpl.mpt, ocv.mpt and peis.mpr were parsed.
{
"metadata": {
"provenance": "manual",
"schema_version": "4.0.0"
},
"steps": [
{
"parser": "electrochem",
"import": {
"files": ["data/ocv.mpt", "data/gcpl.mpt"],
"encoding": "windows-1252"
},
"parameters": {"filetype": "eclab.mpt"}
},
{
"parser": "electrochem",
"import": {
"folders": ["data"],
"suffix" : ".mpt",
"encoding": "windows-1252"
},
"parameters": {"filetype": "eclab.mpt"}
},
{
"parser": "electrochem",
"import": {
"folders": ["data"],
"suffix": ".mpr"
},
"parameters": {"filetype": "eclab.mpr"}
}
]
}
To sum up the gathered results:

| schema_version | individual .mpt export | Result |
|---|---|---|
| 4.0.0 | x | TypeError: unsupported operand type(s) for -: 'str' and 'str' (solved in #88?) |
| 4.0.0 | o | Success |
| 4.2.1 | x | 126 validation errors for DataSchema, unexpected value; permitted: 'dummy' |
| 4.2.1 | o | 85 validation errors for DataSchema, unexpected value; permitted: 'dummy' |

The "unexpected value; permitted:" message is raised for every parser. Since the error is independent of my .mpt file, it seems that the provided schema_version causes some kind of bug.
The pip install yadg worked flawlessly on my brand-new Apple Silicon MacBook Pro.
It would be nice to list the installation instructions in the GitHub README docs.
A side note: the project page URL listed in the README seems to be broken: https://dgbowl.github.io/master/index.html returns "404 File not found".
I noticed that importing my own Biologic exports (default settings) results in errors, while parsing the example files from Binder works like a charm.
As initially mentioned in #93, the E_ranges are read as str due to a "," instead of a "." as the decimal separator, and therefore l.318 in eclabmpt.py throws an error. I was able to fix this issue locally with the following "quick-and-dirty" solution:
E_range_max = float(el.get("E_range_max", "inf").replace(",", ".")) if isinstance(el.get("E_range_max", "inf"), str) else el.get("E_range_max", "inf")
E_range_min = float(el.get("E_range_min", "-inf").replace(",", ".")) if isinstance(el.get("E_range_min", "-inf"), str) else el.get("E_range_min", "-inf")
I have not fully understood the yadg workflow yet, so this might break something.
Please find attached a sample .mpt file, as well as the corresponding .mps file. To the best of my knowledge, the "," was not set on purpose anywhere in the EC-Lab software.
TestScript_YADG_II_01_MB_C01.mpt.txt
TestScript_YADG_II.mps.txt
Integration of some peaks using chromtrace is unstable: for unknown reasons, the peak edges are often shifted by several baseline points. Originally, I thought this was caused by the default smoothing parameters of the Savitzky-Golay filter. Needs further investigation.
Hi @PeterKraus
As promised, I am now retrying to upgrade the default yadg version for the catalysis lab from yadg 4.2 to yadg 5.0.
I first created a new conda env and installed the latest version of yadg using pip. The versions of the installed libraries are as follows:
yadg 5.0.2
pydantic 2.6.4
dgbowl-schemas 116
dgpost 2.1.1
I first tried running yadg/dgpost using the example files and commands you provided last summer: yadg-5.0a5-pipeline.zip. The script works fine; the netcdf was created and dgpost works correctly.
Nevertheless, we no longer use drycal's software to measure the flow, to avoid the issue of the piston sticking during the measurement. I wrote a script to control the flow meter, and we now use it exclusively during measurements. The introduction of a multiplexed system (running 8 cells at the same time) also requires some pre-processing of the flow, pressure and temperature data before processing with yadg/dgpost.
I have made a script that pre-processes the flow, pressure and temperature data; the pre-processed files are called 'flow_for_yadg.csv', 'pressure_for_yadg.csv' and 'temperature_for_yadg.csv' respectively. You can find the data after the pre-processing step here. I have tried to modify the yadg yaml recipe file for these files (yadg.preset.francesco_v5-EDLC_mod1.yaml.zip), but when I run yadg on the pre-processed data using the modified yaml, I get the error below. It seems that an extractor (a new feature introduced in yadg 5) is required, but I am not quite sure how this works. I think the issue might stem from how I made the yaml file. Could you please have a look into this? Thank you.
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\Scripts\yadg.exe\__main__.py", line 7, in <module>
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\main.py", line 201, in run_with_arguments
args.func(args)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\yadg\subcommands.py", line 144, in preset
schema = to_dataschema(**preset)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\dgbowl_schemas\yadg\__init__.py", line 40, in to_dataschema
schema = Model(**kwargs)
File "C:\Users\plnu\Anaconda3\envs\test_yadg5\lib\site-packages\pydantic\main.py", line 171, in __init__
self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 56 validation errors for DataSchema
steps.10.Dummy.parser
Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parser
Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.BasicCSV.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MeasCSV.parser
Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MeasCSV.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.FlowData.parser
Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.FlowData.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.FlowData.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ElectroChem.parser
Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ElectroChem.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ElectroChem.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromTrace.parser
Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.ChromTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.ChromData.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.ChromData.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.MassTrace.parser
Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.MassTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.MassTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.QFTrace.parser
Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.QFTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.QFTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XPSTrace.parser
Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XPSTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XPSTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.10.XRDTrace.parser
Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.10.XRDTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...sion.zip'}, 'tag': 'GC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.10.XRDTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='fusion.zip', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.Dummy.parser
Input should be 'dummy' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parser
Input should be 'basiccsv' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.BasicCSV.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MeasCSV.parser
Input should be 'meascsv' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MeasCSV.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.FlowData.parser
Input should be 'flowdata' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.FlowData.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.FlowData.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ElectroChem.parser
Input should be 'electrochem' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ElectroChem.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ElectroChem.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromTrace.parser
Input should be 'chromtrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.ChromTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.ChromData.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.ChromData.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.MassTrace.parser
Input should be 'masstrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.MassTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.MassTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.QFTrace.parser
Input should be 'qftrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.QFTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.QFTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XPSTrace.parser
Input should be 'xpstrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XPSTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XPSTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden
steps.11.XRDTrace.parser
Input should be 'xrdtrace' [type=literal_error, input_value='chromdata', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/literal_error
steps.11.XRDTrace.extractor
Field required [type=missing, input_value={'parser': 'chromdata', '...alc.xlsx'}, 'tag': 'LC'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/missing
steps.11.XRDTrace.parameters.filetype
Extra inputs are not permitted [type=extra_forbidden, input_value='empalc.xlsx', input_type=str]
For further information visit https://errors.pydantic.dev/2.6/v/extra_forbidden```
Related to #95 and the solution in #99:
The text files read by yadg may be localised (MM/DD/YYYY instead of DD.MM.YYYY in dates, or worse, X,XXX.yy vs X'XXX.yy or X.XXX,yy in floats). Reliably guessing the locale of a given file is impossible; the locale should instead be a user-controlled parameter, along with things like timezone, encoding, or other file metadata.
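As a sketch of the user-controlled approach, a hypothetical `normalise_number` helper could take the separators as explicit parameters instead of guessing them from the file:

```python
def normalise_number(text, decimal=".", thousands=""):
    """Parse a localised numeric string given user-supplied separators.

    Sketch only: the caller states which characters act as the
    thousands and decimal separators, covering cases like "1'234,56"
    (thousands="'", decimal=",") or "1,234.56" (thousands=",").
    """
    if thousands:
        text = text.replace(thousands, "")
    if decimal != ".":
        text = text.replace(decimal, ".")
    return float(text)
```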
yadg should be able to zip-archive the data specified in the dataschema, for archiving along with the datagram.
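A minimal sketch of this feature using only the standard library; `archive_inputs` is a hypothetical helper, and a real implementation would collect the paths from the "files"/"folders" entries of the dataschema steps:

```python
import zipfile
from pathlib import Path

def archive_inputs(paths, archive):
    """Bundle the raw files referenced by a dataschema into a zip.

    *paths* is a list of file paths; each is stored under its basename
    in the compressed archive written to *archive*.
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=Path(path).name)
```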
Hi,
Is it possible to run something like yadg process myFile.mpr myFile.json in order to have a look at the result (eclab file)?
I read the docs here and it looks like I should run yadg process mySchema outfile.json.
PS: I finally did it using version 4.0.0 and the following schema:
{
"metadata": {
"provenance": "manual",
"schema_version": "4.0.0"
},
"steps": [
{
"parser": "electrochem",
"import": {
"files": ["myFile.mpr"],
"encoding": "windows-1252"
},
"parameters": {"filetype": "eclab.mpr"}
}
]
}
Not sure what the update is for 4.2.
BTW, I am not sure whether the file always stores a single technique; I think that is implied by the code right now?
Thanks.
An example file in the attachment.
As requested by @schumannj, I have a set of .dat.asc and corresponding .dat files available for reverse-engineering.
Currently, each parser handles reading from binary files separately. The offending parsers should be rewritten to use the functions in yadg.dgutils.btools.
The measurement accuracy of an impedance measurement should be changed to:
As mentioned by @ScottSoren, including the example mpt and mpr files provided in the above link.
Determining the uncertainty of the counts-per-second signal in XPS traces from the phispe parser should be done in a better way.
I set it to a constant, hard-coded value of 12.5, as the counts per second only seem to take values in steps of 12.5 cps in the .spe files I've inspected (this should be investigated further).
The counts-per-second signal should be Poisson distributed, i.e. the uncertainty should probably be something like sqrt(n), but I know too little about XPS to really tell.
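The sqrt(n) idea can be sketched as follows; `cps_uncertainty` and the dwell-time parameter are assumptions made for illustration, not the parser's current behaviour:

```python
import math

def cps_uncertainty(cps, dwell_s):
    """Poisson (shot-noise) estimate of the counts-per-second uncertainty.

    Assuming the raw counts n = cps * dwell time are Poisson distributed,
    sigma(n) = sqrt(n), so sigma(cps) = sqrt(n) / dwell time. This is a
    sketch of the sqrt(n) suggestion above, replacing the hard-coded
    12.5 cps value.
    """
    counts = cps * dwell_s
    return math.sqrt(counts) / dwell_s
```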
The following appear to be unused imports:
yadg/src/yadg/core/validators.py
Line 1 in 19ec4dc
yadg/src/yadg/parsers/electrochem/eclabmpr.py
Line 160 in 19ec4dc
yadg/src/yadg/parsers/qftrace/main.py
Line 1 in 19ec4dc
yadg/src/yadg/parsers/qftrace/main.py
Line 6 in 19ec4dc
yadg/src/yadg/parsers/qftrace/main.py
Line 7 in 19ec4dc
yadg/src/yadg/parsers/qftrace/main.py
Line 10 in 19ec4dc
yadg/src/yadg/parsers/qftrace/prune.py
Line 15 in 19ec4dc
In the following line, math is an unused import:
yadg/src/yadg/dgutils/dgutils.py
Line 4 in 19ec4dc
These variables, as defined, don't appear to be used anywhere; consider deleting them:
yadg/src/yadg/parsers/qftrace/fit.py
Line 173 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 202 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 206 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 216 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 229 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 230 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 242 in 19ec4dc
yadg/src/yadg/parsers/qftrace/fit.py
Line 244 in 19ec4dc
I have been using yadg 5.0.3 to parse GCPL .mpr data (files from before the Biologic 11.50 update) and noticed that the cycle number in the resulting netcdf file tends to be 1, and nan otherwise. When using the EC-lab program to open the file, one can easily select the cycle number. I found that in other techniques, for example cyclic voltammetry (CV), yadg can correctly extract the cycle number. I suspect there might be a bug in this parsing? The cycle number is a very important parameter in processing GCPL data, and it would make data analysis much easier.
Background
When using yadg 4 to convert an mpr file to a json file, several data columns went missing, although these columns are present in the mpt file generated by Biologic. Unfortunately, one of these missing data columns, <I>/mA, is crucial for plotting the data from a GCPL experiment. The user has requested that this column be included in the json file.
Investigation done so far
yadg uses the _parse_columns function to read the column headers in the mpr files before reading them and compiling them into a datapoint, which is subsequently written into a json file. I have tried to print the names, dtypes, units, flags returned from _parse_columns and found that the headers match what is found in the json file. Strangely, the names parameter only covers part of the data columns in the resulting mpt file; in particular, only from mode to Capacitance discharge/µF.
I tried to add names, dtypes, units entries for <I> so that the script would extract this column from the mpr file, but I got an error about not enough numpy buffer. I then tried to add an attribute that I am sure is not in the mpr file (by misspelling the name) and got a similar error. This led to my conclusion that the <I> column may not exist in the mpr file in the first place, and that yadg was performing correctly.
My suspicion so far
EC-lab may generate additional data columns, from <I>/mA to cycle number (in the mpt file), during the mpr-to-mpt conversion; these data columns may not be present in the original mpr file. <I>/mA, for example, can be derived from dq/mA.h. Nevertheless, I have not yet verified this speculation with users who have experience with battery electrochemistry.
Corresponding files
https://drive.switch.ch/index.php/s/AGzUx3wPHscdWnq
nvm!
The GitHub Actions should be reworked for better reuse of individual actions. Also, several of the actions should be updated.
The resolution of the data in electrochemistry files currently has two issues: the uncertainty is based on math.ulp, which should be removed, and the correct uncertainty should be propagated from the raw data.
Hi, I wrote a simple script to convert the mpr files to json; they are all taken from the repo:
//logs
['./test/ca.mpr', './test/cv.mpr', './test/cp.mpr', './test/wait.mpr', './test/test.mpr', './test/zir.mpr', './test/lsv.mpr']
./test/ca-params.json
./test/cv-params.json
./test/cp-params.json
./test/wait-params.json
Traceback (most recent call last):
File "/data/convertMPRtoJSON.py", line 21, in <module>
data, meta, date = p.process("./"+filename)
File "/home/sm/yadg/src/yadg/parsers/electrochem/eclabmpr.py", line 834, in process
settings, params, data, log, loop = _process_modules(mpr)
File "/home/sm/yadg/src/yadg/parsers/electrochem/eclabmpr.py", line 771, in _process_modules
Eranges.append(el["E_range_max"] - el["E_range_min"])
KeyError: 'E_range_max'
This is how I execute it:
import glob
import json

# "p" is the eclab mpr parser module, per the traceback above:
from yadg.parsers.electrochem import eclabmpr as p

files = glob.glob("./*/*.mpr")
print(files)
for filename in files:
    newfilename = filename.split(".mpr")[0] + "-params.json"
    print(newfilename)
    data, meta, date = p.process("./" + filename)
    totest = meta["params"]
    with open(newfilename, "w") as fo:
        fo.write(json.dumps(totest))
Is it a bug in the code? I think there should be something else, or it wouldn't pass the tests?
The rest of the files are converted to json and look fine.
I tried with both 4.2.0 and 4.0.0.
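A possible local workaround for the missing keys, assuming an absent E range can be treated as unbounded (an assumption, not confirmed parser behaviour), would be to use dict lookups with defaults instead of direct indexing:

```python
# Hypothetical workaround for the KeyError above: treat a missing
# E_range_max/E_range_min in a params element as an unbounded range,
# mirroring the failing line in _process_modules.
def e_range(el):
    """Return the E range width, defaulting missing bounds to +/-inf."""
    return el.get("E_range_max", float("inf")) - el.get("E_range_min", float("-inf"))
```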
I don't know if this is the right place for it.
But I just noticed that if you open an mpr file and choose "export as text", EC-Lab will automatically generate additional data columns and save them inside the mpr file. The two attached files are the same file, before and after clicking export as text.
OpeningFiles.zip
Originally posted by @ileu in #37 (comment)
In yadg-4.0, we introduced parsing of the external DataSchema via the dgbowl-schemas library, based on Pydantic. Unfortunately, when the parsing of the input DataSchema fails for whatever reason, the verbose output message is rather unclear, and raises a million warnings/errors, as all versions of all schemas are validated and fail.
See also a user report in #93.
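For illustration only (the model and field names below are hypothetical, not the real dgbowl-schemas models): a sketch of surfacing a single schema version's validation errors, instead of one failure per known schema version:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical stand-in for one DataSchema version.
class StepModel(BaseModel):
    parser: str
    version: str

try:
    StepModel(parser="basiccsv")  # "version" is missing, so this fails
except ValidationError as e:
    # Reporting only the errors of the best-matching version keeps
    # the output short, instead of concatenating every version's failures.
    for err in e.errors():
        print(err["loc"], err["msg"])
```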
Currently, setuptools installs all dependencies for both testing and production; it would be better if we utilized extras_require, so that unnecessary dependencies aren't installed for end users in production.
I.e.
setuptools.setup(
    install_requires=[
        "numpy",
        "scipy",
        "pint",
        "uncertainties",
        "striprtf",
        "pytest",
        "tzlocal",
        "python-dateutil",
    ],
    ...
)
should be
setuptools.setup(
    install_requires=[
        "numpy",
        "scipy",
        "pint",
        "uncertainties",
        "striprtf",
        "tzlocal",
        "python-dateutil",
    ],
    extras_require={
        "testing": [
            "pytest",
        ],
    },
    ...
)
Test installations would then be done via python -m pip install -e ./[testing], so that pytest is included, while pytest isn't installed when a user is just running yadg in production.
The param_map in eclabtechniques.py is incomplete for many parameters. It should be possible to complete it using a systematic set of mpr/mpt files.
The logging in yadg should be done using module-level loggers (https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial). This will allow us to remove the function prefixes from the log messages, and keep the logging more consistent between modules & during code refactors.
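A minimal sketch of the module-level pattern from the linked HOWTO (the formatter string and file name are illustrative):

```python
import logging

# Module-level logger: its name tracks the module hierarchy, so the
# formatter can supply the location instead of hand-written prefixes.
logger = logging.getLogger(__name__)

def process(fn: str) -> None:
    # No "process:" prefix needed in the message itself;
    # %(name)s and %(funcName)s come from the formatter.
    logger.warning("could not parse file: %s", fn)

logging.basicConfig(format="%(name)s:%(funcName)s: %(message)s")
process("example.mpr")
```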
Currently, when an mpr or mpt file contains multiple PEIS traces (separated by cycle number), they all appear within raw->traces->*. Instead, each of these traces should correspond to a separate timestep.
The concatenation of partial Datasets introduced in yadg-5.0a2 is currently only working when the attrs of the datasets are identical. This concatenation is done whenever multiple files are merged into one step, but also when the parsed file is itself a "zip" file and contains multiple separate files that are processed and merged.
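The merge behaviour maps onto xarray's combine_attrs modes; a minimal sketch of the "identical"-only failure versus drop_conflicts (data values invented):

```python
import xarray as xr

# Two partial datasets with conflicting attrs (invented values):
a = xr.Dataset({"I": ("uts", [0.1, 0.2])}, coords={"uts": [0.0, 1.0]},
               attrs={"method": "CA"})
b = xr.Dataset({"I": ("uts", [0.3])}, coords={"uts": [2.0]},
               attrs={"method": "CV"})

try:
    # the current behaviour: concatenation fails unless attrs are identical
    xr.concat([a, b], dim="uts", combine_attrs="identical")
except ValueError as e:
    print("identical:", e)

# drop_conflicts keeps matching attrs and silently drops the rest
merged = xr.concat([a, b], dim="uts", combine_attrs="drop_conflicts")
print(merged.attrs)
```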
- drop is not an option.
- identical should be the only possible option - any merge that fails means the classification of data/metadata is wrong in the parser.
- drop_conflicts is reasonable. A further, stricter mode (no_conflicts or identical) could be enabled via the command line.

From @ScottSoren:
I am the main author of an open source project "ixdat" which also includes many python parsers for experimental data formats, including Biologic's .mpt. I'm trying to add a .mpr parser which makes use of eclabfiles for parsing the binary file. eclabfiles works for some files but not others. For details, see this PR: ixdat/ixdat#134
Briefly, it seems to work for LSV measurements, but not CA or CVA. It's a different error message, so I'll put CA here and CVA in a separate issue.
The error I get is "Error: field '(Q-Qo)' occurs more than once". I get this same error whether using
data, meta = eclabfiles.process("05_O2dose_COox_04_CA_C01.mpr")
or
df = eclabfiles.to_df("05_O2dose_COox_04_CA_C01.mpr")
The file is here, along with plots of the data (made with ixdat.Measurement.read("05_O2dose_COox_04_CA_C01.mpt").plot() as demo'd in "plot_data.py"): https://www.dropbox.com/scl/fo/cl0cnovmik7pjgcls2l9h/h?rlkey=v93snkrt2rq3uf95au26qdi0o&dl=0
Happy for any help or suggestions!
I have downloaded a copy of the .mpt and .mpr files from the above link:
05_O2dose_COox_04_CA_C01.zip
Hi,
This appears to be caused by a hardcoded line of code: for strings where split(',') yields more than two elements, this causes an error.
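A hypothetical illustration of the failure mode (function and field names invented): two-value unpacking of a comma-split breaks as soon as the string contains an extra comma.

```python
# Hypothetical parser helper: unpacks a "key,value" string.
def parse_pair(s: str):
    key, value = s.split(",")  # assumes exactly one comma
    return key.strip(), value.strip()

print(parse_pair("Ewe,V"))       # works
try:
    parse_pair("Ewe,V,extra")    # three elements -> unpacking fails
except ValueError as e:
    print(e)

# A safer variant limits the number of splits:
def parse_pair_safe(s: str):
    key, value = s.split(",", maxsplit=1)
    return key.strip(), value.strip()
```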
The electrochem interface should be fixed in version 5.0. Ideally, only the minimal set of values would be parsed into the "raw" data; all other values, such as impedances, capacities, etc., should be computed by yadg and stored in the derived fields.
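For instance, a sketch with invented numbers (assuming raw Re(Z) and -Im(Z) columns, as reported by EC-Lab): the modulus and phase of the impedance can be derived rather than stored in the raw data.

```python
import numpy as np

# Invented raw PEIS values; EC-Lab reports Re(Z) and -Im(Z) in Ohm.
re_z = np.array([10.0, 5.0])
neg_im_z = np.array([10.0, 2.0])

z = re_z - 1j * neg_im_z        # complex impedance
mod_z = np.abs(z)               # |Z| / Ohm, a derived quantity
phase = np.angle(z, deg=True)   # Phase(Z) / deg, a derived quantity

print(mod_z[0])   # sqrt(200)
print(phase[0])   # -45 degrees
```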
So the current way of doing things in yadg has the following drawbacks:

- datagram-v4.x is just a json file, without any accompanying schema. Readable, but not really FAIR. (#4)
- datagram-v4.x is insanely bulky, due to it being a text file format, as well as recording uncertainties in a stupid fashion. (#69)

The situation is further complicated by the introduction of Extractors into yadg-5.x, meaning that we ought to have a way of dumping extracted data (currently, and for the foreseeable future: a single Extractor processes a single file into one object) as well as parsed data (using DataSchema, i.e. multiple Extractors processing multiple files into one object).
An additional complication is the kinds of data we might have to store. There are:

- float: standard data (quantities comprising floats, accompanied with units and uncertainties)
- list: trace data (essentially lists of quantities that form single objects, like chromatographic or spectroscopic traces)
- dict: nested/categorised data (for example, two detectors in a GC make two separate traces, or concentrations of species in a mixture have to be assigned the species label somehow)

I invite opinions from friends of this project, such as @EmpaEconversion, @vetschn, @ileu, @ml-evs, @ramirezfranciscof
JSONSchema + JSON

- a JSONSchema part describing sigmas in a more sensible way can be developed
- the JSON part can be opened in Firefox
- Python -> C -> Python round-tripping penalty if done without pandas

However:

- pd.DataFrame.from_dict()

Therefore: good as an optional export option; not great as default.
pd.DataFrame with a pd.MultiIndex over values and sigmas

However:

- while pd.DataFrame has support for df.attrs, round-tripping those into anything but HDF5 is, as far as I know, impossible.
- pd.MultiIndex is really unwieldy if you want to mix quantities which have categories with those that do not. Including the top-level values index, your columns have to be addressed using ("values", "concentration", "O2"), which is fine, but also ("values", "temperature", None) if it's in the same table and ("values", "temperature") if it's on its own, which is not fine.
- datagrams

Therefore: unworkable, sorry.
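The addressing pain described above can be reproduced in a few lines (column names and values invented; an empty string stands in for the dummy third level):

```python
import pandas as pd

# Mixing a categorised quantity (per-species concentration) with an
# uncategorised one (temperature) forces a dummy third index level.
cols = pd.MultiIndex.from_tuples([
    ("values", "concentration", "O2"),
    ("values", "concentration", "N2"),
    ("values", "temperature", ""),
])
df = pd.DataFrame([[0.2, 0.8, 300.0]], columns=cols)

print(df[("values", "concentration", "O2")].iloc[0])  # fine
print(df[("values", "temperature", "")].iloc[0])      # needs the dummy level
```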
xarray.Dataset

- a coord for the timestamp, and many arbitrary ones for things like frequency in VNA traces (for 2D data)
- coords become an actual index, so you don't have to guess relationships!
- species as a coord - the nice thing here is that e.g. in chromatography, we can access all variables (e.g. concentration, peak area, etc.) for one species using a single slice
- the xr.Dataset itself can be annotated with metadata
- export to NetCDF should work fine
- export to pandas with methods such as xr.Dataset.to_pandas() or .to_dataframe()

However:

- the problem of sigmas remains. They can be encoded per-column using abs and rel within the same object, but in some cases we may require a second table. A second table can be saved into a NetCDF file in a separate named group, but then this information needs to be encoded somehow somewhere.
- this solves the Extractor issue, but it's borderline useless on its own for datagrams, unless we go the named-group way above.

Therefore: currently my preferred option for Extractors.
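A minimal sketch of the species-as-coord idea above (variable names and data invented): one sel() call returns all variables for one species.

```python
import numpy as np
import xarray as xr

# Hypothetical chromatography results: two species, three timesteps.
ds = xr.Dataset(
    data_vars={
        "concentration": (("uts", "species"),
                          np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])),
        "area": (("uts", "species"),
                 np.array([[10.0, 90.0], [20.0, 80.0], [30.0, 70.0]])),
    },
    coords={"uts": [0.0, 60.0, 120.0], "species": ["CO", "CO2"]},
    attrs={"yadg_version": "5.0"},  # Dataset-level metadata
)

# One slice yields concentration, area, etc. for a single species:
co = ds.sel(species="CO")
print(co["concentration"].values)
```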
xarray-contrib.Datatree

- suitable for datagrams and could also resolve issues with sigmas

However:

- a /-separated namespace for Steps in a DataSchema

Therefore: in general, this looks like a good solution.
HDF5 and NeXus

I cannot really comment on this one, as I didn't have time to look up how it differs from NetCDF / xarray. It seems inevitable that yadg and dgpost will have to support NeXus at some point somehow, but maybe the time is not yet there.
The column for the flow data in flow_for_yadg.csv is called 'Flow (nml/min)'. I found that the resulting netcdf file registers this value as 'Flow (nml_min)' and does not extract the unit, despite this being instructed in the yaml file.
Originally posted by @NukP in #142 (comment)
@NukP: could you please attach an example flow_for_yadg.csv file and the relevant basiccsv section of the dataschema? I'll try to backport this to 5.0.x and make sure it's fixed in 5.1 too.
Congrats on the nice project!
As a minor note, it was a bit hard for me to find the docs on GitHub Pages at https://dgbowl.github.io/yadg/master/index.html, since I couldn't immediately see any link from the README file. However, I noticed the docs in the repo, the GH Pages branch, and a deploy GH action, so I looked deeper. In general, adding a link to the docs to the README and on PyPI would be beneficial for potential new users.