xenonnt / straxen
Streaming analysis for XENON
License: BSD 3-Clause "New" or "Revised" License
See #60
For commissioning it might be nice to save all raw_records_nv.
When processing runs with the latest strax + straxen, I get these errors in 4/~450 runs (7769, 7770, 7775, 7795):
ValueError: Attempt to create chunk [007769.lone_hits: 1589849417sec 999999000 ns - 1589849423sec 499999000 ns, 912904 items, 6.6 MB/s] whose data ends late at 1589849423499999280
ValueError: Attempt to create chunk [007775.lone_hits: 1589870842sec 999999000 ns - 1589870848sec 499999000 ns, 922094 items, 6.7 MB/s] whose data ends late at 1589870848499999130
They appear at some seemingly random time in the run, and given how rare they are, this could be an edge case in one of the algorithms. It would be surprising though; in strax, hits shouldn't be able to extend beyond the record they are part of. (Peak(let)s could, but for peaks PulseProcessing explicitly clips them to chunk boundaries.)
Since three of the failing runs are close together, maybe there is simply something wrong with the records for these runs, and the error will disappear in future reprocessings?
If bootstrax fails a run, it retries it several times. If it continues to fail for a given run, we change the target to raw_records, to prevent repeatedly trying to build on a broken event_basics. Additionally, we should give it more time and fewer cores/max mailbox messages (as that makes it more reliable).
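The fallback logic could be sketched roughly as follows (function names, thresholds and scalings are made up for illustration, not bootstrax's actual code):

```python
# Hypothetical sketch of the retry/fallback behavior described above.
def choose_target(n_failures, normal_target='event_info',
                  fallback='raw_records', max_retries=3):
    """Pick a processing target based on how often the run failed already."""
    if n_failures < max_retries:
        return normal_target
    # After repeated failures, only store the raw data so we do not keep
    # producing (and trusting) a possibly broken event_basics.
    return fallback

def choose_resources(n_failures, base_cores=8, base_timeout=3600):
    """Give retries more time and fewer cores/mailbox messages."""
    cores = max(1, base_cores // (n_failures + 1))
    timeout = base_timeout * (n_failures + 1)
    return cores, timeout
```

A retry loop would then call `choose_target(n)` and `choose_resources(n)` with the run's current failure count before each attempt.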
Suggested by @tunnell: since mini-analyses have a common interface, doing a no-crash test for them should be especially easy.
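A no-crash test could be sketched like this, assuming only the common (context, run_id, **kwargs) call signature; the `registry` dict below is a stand-in for straxen's actual mini-analysis registration:

```python
# Sketch of a generic no-crash test for mini-analyses. Only the shared call
# signature is assumed; the registry dict is invented for illustration.
def no_crash_test(registry, context, run_id):
    """Call every mini-analysis once and collect any exceptions."""
    failures = {}
    for name, f in registry.items():
        try:
            f(context, run_id)
        except Exception as e:
            failures[name] = e
    return failures

# Toy usage with fake mini-analyses:
registry = {'event_scatter': lambda ctx, run_id: 42,
            'broken_display': lambda ctx, run_id: 1 / 0}
failures = no_crash_test(registry, context=None, run_id='009149')
```

A real test would assert `not failures` after running against a small test dataset.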
Due to some of the updates on peak processing / clustering, targeting records (e.g. in bootstrax) raises an error (added below). I suspect that there is a cross reference somewhere to a peaks-like structure, since processing up to raw_records or peaks does work; processing up to records does not.
Concluding, this works:
python bootstrax.py --target peaks --cores -1 --process 6331
python bootstrax.py --target raw_records --cores -1 --process 6331
This does not work:
python bootstrax.py --target records --cores -1 --process 6331
The error:
(py36) xedaq@eb3:~/joran$ cat last_bootstrax_exception.txt
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/xedaq/miniconda/envs/py36/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/xedaq/miniconda/envs/py36/lib/python3.6/site-packages/npshmex.py", line 141, in shm_wrap_f
result = f(*args, **kwargs)
File "/home/xedaq/software/strax/strax/plugin.py", line 639, in do_compute
return self._fix_output(results)
File "/home/xedaq/software/strax/strax/plugin.py", line 312, in _fix_output
self._check_dtype(result)
File "/home/xedaq/software/strax/strax/plugin.py", line 276, in _check_dtype
raise ValueError(f"Plugin {pname} expects {expect} as dtype??")
ValueError: Plugin ParallelSourcePlugin expects {'records': dtype([(('Channel/PMT number', 'channel'), '<i2'), (('Time resolution in ns', 'dt'), '<i2'), (('Start time of the interval (ns since unix epoch)', 'time'), '<i8'), (('Length of the interval in samples', 'length'), '<i4'), (('Integral in ADC x samples', 'area'), '<i4'), (('Length of pulse to which the record belongs (without zero-padding)', 'pulse_length'), '<i4'), (('Fragment number in the pulse', 'record_i'), '<i2'), (('Baseline in ADC counts. data = int(baseline) - data_orig', 'baseline'), '<f4'), (('Level of data reduction applied (strax.ReductionLevel enum)', 'reduction_level'), 'u1'), (('Waveform data in ADC counts above baseline', 'data'), '<i2', (110,))]), 'diagnostic_records': dtype([(('Channel/PMT number', 'channel'), '<i2'), (('Time resolution in ns', 'dt'), '<i2'), (('Start time of the interval (ns since unix epoch)', 'time'), '<i8'), (('Length of the interval in samples', 'length'), '<i4'), (('Integral in ADC x samples', 'area'), '<i4'), (('Length of pulse to which the record belongs (without zero-padding)', 'pulse_length'), '<i4'), (('Fragment number in the pulse', 'record_i'), '<i2'), (('Baseline in ADC counts. data = int(baseline) - data_orig', 'baseline'), '<f4'), (('Level of data reduction applied (strax.ReductionLevel enum)', 'reduction_level'), 'u1'), (('Waveform data in ADC counts above baseline', 'data'), '<i2', (110,))]), 'aqmon_records': dtype([(('Channel/PMT number', 'channel'), '<i2'), (('Time resolution in ns', 'dt'), '<i2'), (('Start time of the interval (ns since unix epoch)', 'time'), '<i8'), (('Length of the interval in samples', 'length'), '<i4'), (('Integral in ADC x samples', 'area'), '<i4'), (('Length of pulse to which the record belongs (without zero-padding)', 'pulse_length'), '<i4'), (('Fragment number in the pulse', 'record_i'), '<i2'), (('Baseline in ADC counts. 
data = int(baseline) - data_orig', 'baseline'), '<f4'), (('Level of data reduction applied (strax.ReductionLevel enum)', 'reduction_level'), 'u1'), (('Waveform data in ADC counts above baseline', 'data'), '<i2', (110,))]), 'veto_regions': dtype([(('Channel/PMT number', 'channel'), '<i2'), (('Time resolution in ns', 'dt'), '<i2'), (('Start time of the interval (ns since unix epoch)', 'time'), '<i8'), (('Length of the interval in samples', 'length'), '<i4'), (('Integral in ADC x samples', 'area'), '<i4'), (('Index of sample in record in which hit starts', 'left'), '<i2'), (('Index of first sample in record just beyond hit (exclusive bound)', 'right'), '<i2'), (('Internal (temporary) index of fragment in which hit was found', 'record_i'), '<i4')]), 'pulse_counts': dtype([(('Lowest start time observed in the chunk', 'time'), '<i8'), (('Highest endt ime observed in the chunk', 'endtime'), '<i8'), (('Number of pulses', 'pulse_count'), '<i8', (248,)), (('Number of lone pulses', 'lone_pulse_count'), '<i8', (248,)), (('Integral of all pulses in ADC_count x samples', 'pulse_area'), '<i8', (248,)), (('Integral of lone pulses in ADC_count x samples', 'lone_pulse_area'), '<i8', (248,))])} as dtype??
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "bootstrax.py", line 628, in run_strax
max_workers=args.cores)
File "/home/xedaq/software/strax/strax/context.py", line 885, in make
save=save, max_workers=max_workers, **kwargs):
File "/home/xedaq/software/strax/strax/context.py", line 811, in get_iter
allow_rechunk=self.context_config['allow_rechunk']).iter():
File "/home/xedaq/software/strax/strax/processor.py", line 254, in iter
raise exc.with_traceback(traceback)
File "/home/xedaq/software/strax/strax/processor.py", line 196, in iter
yield from final_generator
File "/home/xedaq/software/strax/strax/mailbox.py", line 316, in _read
res = msg.result(timeout=self.timeout)
File "/home/xedaq/miniconda/envs/py36/lib/python3.6/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/home/xedaq/miniconda/envs/py36/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
ValueError: Plugin ParallelSourcePlugin expects {'records': dtype([...]), 'diagnostic_records': dtype([...]), 'aqmon_records': dtype([...]), 'veto_regions': dtype([...]), 'pulse_counts': dtype([...])} as dtype?? (same dtype dump as in the remote traceback above)
Other mini-analyses, such as event_scatter, seem to be working fine. No doubt there is some underlying strax issue, perhaps related to AxFoundation/strax#181.
Does it make sense to make this installable as a package?
Like #174, there are downsides to having a lot of plugins for processing (all of) our sub-detectors. Due to e.g. the high energy #161, neutron veto #86 and muon veto #173 plugins, there are trade-offs between having many small intermediate steps between plugins (like e.g. peaklets -> peaks), saving all of those, and the computational effort of keeping all of that data stored somewhere.
We may need to - at some point - carefully think about which datatypes we want to store/compute. As a start, some plugins might be better off not being saved, e.g. by setting https://github.com/AxFoundation/strax/blob/master/strax/plugin.py#L24 or by merging them with another plugin.
Issue:
More specifically, the function st.is_stored(run_id, data_type) will always return True if there is any data of that data_type and run in the rucio database. The lineage is not checked.
Example:
import straxen
st = straxen.contexts.xenonnt_online()
st.set_config(dict(s1_max_rise_time=50000,
                   peak_split_filter_wing_width=1000000))
st.is_stored('009149', 'peak_basics')
st.get_array('009149', 'peak_basics')
The data belonging to this configuration does not exist. However, since is_stored returns True anyway, get_array will retrieve the data from rucio despite it having been produced with different settings.
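A lineage-aware check could look roughly like the following sketch. strax computes its own deterministic lineage hash, so the hashlib/json variant below is only an illustration of the idea, and the toy `stored_entries` database is invented:

```python
import hashlib
import json

def lineage_hash(lineage):
    """Deterministic hash of a lineage dict (illustration only; strax has
    its own deterministic hashing)."""
    blob = json.dumps(lineage, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()[:10]

def is_stored_with_lineage(stored_entries, run_id, data_type, lineage):
    """Only report data as stored if the lineage hash matches too."""
    want = (run_id, data_type, lineage_hash(lineage))
    return any((e['run_id'], e['data_type'], e['hash']) == want
               for e in stored_entries)

# Toy database: peak_basics stored with the *default* lineage only.
default_lineage = {'s1_max_rise_time': 110}
db = [{'run_id': '009149', 'data_type': 'peak_basics',
       'hash': lineage_hash(default_lineage)}]
custom_lineage = {'s1_max_rise_time': 50000}
```

With such a check, the example above (changed s1_max_rise_time) would correctly report the data as not stored.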
Add reading strax_fragment_length from the DAQ db to change this value for processing raw_records. Should be added in the DAQ-reader and bootstrax.
Since AxFoundation/strax#277 it's possible to optimize the chunksize per plugin. This raises the default to 200 MB (uncompressed) but we could also use another value depending on the type of plugin.
For example, we can make a diagnostic plugin like pulse_counts very small.
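To get a feeling for the numbers: with a fixed uncompressed target size, the rows per chunk follow directly from the dtype itemsize. A sketch with illustrative (simplified) dtypes:

```python
import numpy as np

# How many rows fit in one chunk of a given uncompressed target size.
def rows_per_chunk(dtype, target_mb):
    return int(target_mb * 1e6) // np.dtype(dtype).itemsize

# Simplified stand-ins for the actual strax dtypes:
record_dtype = np.dtype([('time', np.int64), ('length', np.int32),
                         ('data', np.int16, 110)])
pulse_count_dtype = np.dtype([('time', np.int64), ('endtime', np.int64),
                              ('pulse_count', np.int64, 248)])
```

So a diagnostic type like pulse_counts can use a much smaller chunk target than records without producing absurdly few rows per chunk.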
As mentioned in #48 (comment), you currently get a ZeroDivisionError when trying to use the holoviews display. We should post the full error trace here and then try to figure out what (upstream?) change caused this to appear.
When processing data we need to specify what our targets are (https://github.com/XENONnT/straxen/blob/master/bin/bootstrax#L1064).
Up to now our life has been easy: we just consider processing the TPC data and the targets involved therein. However, soon we will also include high energy #161, neutron veto #86 and muon veto #173. What if - for some reason - one of these plugins does not process correctly or does not finish? Do we, as a last resort, not process any of the plugins? That does not seem preferable.
To this end I propose to write a schema here:
Where at first we assume that we start processing all of the (sub)detectors to their latest plugins (please note that these cannot be set in a single st.make call as they are of different datakinds). If this fails, we change the targets (e.g. by replacing this line https://github.com/XENONnT/straxen/blob/master/bin/bootstrax#L1174). The downside is that if any of these plugins has an error somewhere, it crashes bootstrax. A second try would lower the requirements to the TPC's latest plugin, e.g. providing event_info. Alternatively, we might argue that live processing should only deal with the TPC and not care about the other sub-detectors (which is not preferred; think e.g. about the raw_records_prenv).
We could also make eb2 process the data for the NV, MV and HE.
Vanilla solutions are obviously also options (and probably preferred).
Yesterday, runs 9236, 9237 and 9238 were irrecoverably lost from the event builders. The big question is how we could at some point have lost the data stored on /data/xenonnt_processed, or have somehow passed line 586 in Ajax.
The reason turned out to be faulty logic in this line, which effectively said that data less than two hours old should be deleted.
RECONSTRUCTION
These are the events that happened (focusing on runs 9236).
"2020-08-31T18:05:47.776Z" - Processing finished
Bootstrax successfully processed the run and then deleted the live_data at this time, which is only done after we have successfully stored all the data (there is a check in set_status_finished that makes sure the data has been written to disk). Furthermore, one can see from the deleted entries that we have saved .
"2020-08-31T19:.." - Ajax deletes the 'unregistered' data
The bug was here
7208899 MainThread root clean_unregistered:: found 398 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
7209768 MainThread root remove_if_unregistered:: run 9236 is NOT registered in the runDB
7209768 MainThread root No data for 009236 found! Double checking /data/xenonnt_processed/!
7209769 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_nv-rfzvpzj4mf
7209770 MainThread root Cleaning /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf
"2020-08-31T20:18:53.934Z" - Ajax removes entries from runs-database
In the clean_database routine, ajax notices that this run has been stored for >2 h and that processing has finished. For this we check whether the data is actually stored on this host on line 586. The corresponding output from ajax is added below:
10812030 MainThread root Loop finished, take a 3600 s nap
14412139 MainThread root clean_unregistered:: found 396 runs stored on/data/xenonnt_processed/. Checking that each is in the runs-database
14412978 MainThread root clean_abandoned:: No more matches in rundoc
14413442 MainThread root clean_database:: delete entry of data from 9236 at /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf as it does not exist
14413442 MainThread root deleting /data/xenonnt_processed/009236-raw_records_aqmon-rfzvpzj4mf finished
14413442 MainThread root changing data field in rundoc
14413442 MainThread root update with {'host': 'eb5.xenon.local', 'type': 'raw_records_aqmon', 'file_count': 36, 'at': datetime.datetime(2020, 8, 31, 20, 18, 53, 934849, tzinfo=<UTC>), 'by': 'eb5.xenon.local.ajax'}
...
"2020-08-31T20:19:05" - Bootstrax notices that all processed data is now removed and fails the run.
Please note that this further substantiates that the processing did occur as needed.
At the moment we stop at raw_records_mv. If we want to reproduce the 1T-like muon-veto the bare minimum is that we build some muon_veto regions where timestamps in between are vetoed. This should be possible with the existing peak-finding.
If an nfs mount is not correctly configured, microstrax may not be able to read from the data directory where the latest data is. We just noticed that remounting the disk solved the issue.
When starting microstrax, we should check that all folders being registered are actually accessible to microstrax, as otherwise it raises the complaint that the metadata is not available for said run.
The proposed one-DAQ-one-DAQreader solution for the linked mode requires a change to the DAQreader. Currently the dt field is specified as an option, which won't work any longer.
In strax, the left and right boundaries of a hit are set by the region that actually crossed the threshold. To ensure we include the full area in integrations, we extend the boundary of peaks outwards by some amount.
For lone hits, we do not do this, but instead report the hit area directly. Thus, unless the pulse compression filter is activated, the lone hit areas will be biased downwards very significantly compared to actual 1 PE areas. With the pulse compression filter active, the lone hit area is instead biased slightly upwards, since the filter can cause 1PE pulses to become slightly negative around their maxima.
We could accept this, change the hit definition, or compute lone hit integrals with the left/right extension. The latter should probably be a separate function rather than changing the hitfinder, unless we want to apply the extensions without regard to neighboring peaks/hits or record breaks.
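The left/right-extension integration could be sketched as a separate function along these lines (field names, extension lengths and the example waveform are illustrative, and this version ignores neighboring peaks/hits):

```python
import numpy as np

# Integrate a lone hit with left/right extensions, clipping the extended
# window to the record boundaries. Purely illustrative; not the strax
# hitfinder.
def lone_hit_area(record_data, left, right, ext_left=3, ext_right=20):
    lo = max(0, left - ext_left)
    hi = min(len(record_data), right + ext_right)
    return record_data[lo:hi].sum()

wf = np.zeros(110, dtype=np.int16)
wf[48:55] = [1, 3, 9, 15, 8, 3, 1]   # a small pulse with tails
# Threshold-crossing hit boundaries miss the tails:
narrow = wf[50:53].sum()
# With extensions, the full area is recovered:
full = lone_hit_area(wf, left=50, right=53)
```

Clipping to record boundaries is the cheap part; respecting neighboring hits/peaks is what would make this more involved than the sketch.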
Currently pulse_counts and veto_regions are not rechunked on saving. That means we get a lot of small files, which is problematic for data storage.
If we were to rechunk them, we couldn't use them for online monitoring of the pulse rate anymore (though the website doesn't support this yet), since only one chunk would be written to disk at the end of the run.
The easiest solution seems to be to have two savers: one without rechunking (for use in monitoring) and one with (for storage). The alternative would be to re-pack the data after writing it but before transferring it.
Currently bootstrax doesn't compute the veto_regions since #207.
There are two ways to do this:
save_when.NEVER. This will simplify strax, and a single st.make would be needed.

Hello,
In attempting to walk through beginning tutorial steps, I came across a tensorflow failure to find graph elements at straxen/plugins/peak_processing.py", line 222
. E.g.,
ValueError: Tensor Tensor("dense_6/BiasAdd:0", shape=(?, 2), dtype=float32) is not an element of this graph.
Upon investigation, I see that special cases were put into place for the peak_processing
plugin to use tensorflow v2. Unfortunately, it looks like the condition for checking for v2 is insufficient.
straxen/straxen/plugins/peak_processing.py
Line 176 in 62d964d
Perhaps one could use parse_version(tf.__version__) >= parse_version('2.0.'), since 2.0 is still pre-release?
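A dependency-free sketch of the same idea, treating any numeric major version >= 2 (including pre-releases such as '2.0.0-beta1') as TF2:

```python
import re

# Minimal stand-in for parse_version(tf.__version__) >= parse_version('2.0'):
# compare only the numeric major version, so pre-release suffixes like
# '-beta1' are still recognized as v2. Purely illustrative.
def is_tf2(version_string):
    m = re.match(r'(\d+)', version_string)
    return bool(m) and int(m.group(1)) >= 2
```

In practice parse_version (or packaging.version) is the more robust choice; this only shows why a plain string comparison against '2.0' fails for pre-releases.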
pulse_counts is very useful for monitoring the PMT count rate in our TPC. To be able to cross-check whether a higher PMT rate is caused by an increased signal rate or by a change in noise, I would like to propose adding two additional fields to pulse_counts: the first stores the average baseline value per PMT, and the second stores the average baseline rms per PMT. Just as a comparison:
Loading pulse_counts of a single 1 h nitrogen run takes 1.10 s, while loading the baseline and baseline_rms values stored in records of the very same run takes about 10 min.
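The two proposed fields could be filled per chunk roughly like this (dtypes, field names and values are illustrative, not the actual pulse_counts layout):

```python
import numpy as np

n_pmt = 4
# Toy records for one chunk, with per-record baseline information:
records = np.zeros(6, dtype=[('channel', np.int16),
                             ('baseline', np.float32),
                             ('baseline_rms', np.float32)])
records['channel'] = [0, 0, 1, 1, 2, 3]
records['baseline'] = [16000., 16002., 15990., 15994., 16010., 16005.]
records['baseline_rms'] = [3., 5., 4., 4., 6., 2.]

# Proposed extra pulse_counts fields: per-PMT averages for this chunk.
out = np.zeros(1, dtype=[('baseline_mean', np.float32, n_pmt),
                         ('baseline_rms_mean', np.float32, n_pmt)])
for ch in range(n_pmt):
    mask = records['channel'] == ch
    if mask.any():
        out['baseline_mean'][0, ch] = records['baseline'][mask].mean()
        out['baseline_rms_mean'][0, ch] = records['baseline_rms'][mask].mean()
```

Since this is computed while the records are in memory anyway, the monitoring load stays at pulse_counts speed instead of the 10 min records load.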
Bootstrax writes warnings and messages to the daq database such that they can be displayed on the website. Change the following:
infermode: .... lowering mode to ...
should be a message, not a warning.

The data frames come with "comments": st.data_info('event_info'). If we had a separate column "units", that would allow scripts to automatically pull the correct units for axis labels.
The DAQ provides timestamps in ns since run start with resolution of the ADC sample size. This needs to be converted to 'ns since unix epoch'. This is hard to do in the DAQ but maybe it can be done in the DAQReader plugin at the stage where all the sub-files from each readout thread are combined.
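The conversion itself is just an offset-and-scale over the combined arrays; a sketch with made-up numbers (the real run start would come from the DAQ metadata, and `samples_since_start`/`dt` are illustrative names):

```python
import numpy as np

run_start = 1_589_849_417_000_000_000   # ns since unix epoch (example value)
dt = 10                                 # ns per ADC sample (example value)

# Timestamps as delivered by the DAQ: sample counts since run start.
samples_since_start = np.array([0, 5, 123456], dtype=np.int64)

# Conversion to ns since the unix epoch, as could be done in DAQReader
# when the sub-files from each readout thread are combined:
time = run_start + samples_since_start * dt
```

Doing this once in DAQReader keeps the DAQ itself simple and gives all downstream plugins epoch-based int64 times.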
In the current straxen master, if I run:
import strax
import straxen
print('strax', strax.__version__)
print('straxen', straxen.__version__)
st = straxen.contexts.xenon1t_dali()
run_id = '170204_1410'
st.waveform_display(run_id, seconds_range=(0, 0.15))
I get the following error:
---------------------------------------------------------------------------
DataNotAvailable Traceback (most recent call last)
<ipython-input-3-95868e942342> in <module>
----> 1 st.waveform_display(run_id, seconds_range=(0, 0.15))
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/straxen-0.9.0-py3.6.egg/straxen/mini_analysis.py in wrapped_f(context, run_id, **kwargs)
113 config=kwargs.get('config'),
114 register=kwargs.get('register'),
--> 115 storage=kwargs.get('storage', tuple()))
116
117 # If user did not give time kwargs, but the function expects
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in get_array(self, run_id, targets, save, max_workers, **kwargs)
905 max_workers=max_workers,
906 **kwargs)
--> 907 results = [x.data for x in source]
908 return np.concatenate(results)
909
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in <listcomp>(.0)
905 max_workers=max_workers,
906 **kwargs)
--> 907 results = [x.data for x in source]
908 return np.concatenate(results)
909
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in get_iter(self, run_id, targets, save, max_workers, time_range, seconds_range, time_within, time_selection, selection_str, keep_columns, _chunk_number, **kwargs)
755 save=save,
756 time_range=time_range,
--> 757 chunk_number=_chunk_number)
758
759 # Cleanup the temp plugins
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in get_components(self, run_id, targets, save, time_range, chunk_number)
631
632 for d in targets:
--> 633 check_cache(d)
634 plugins = to_compute
635
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in check_cache(d)
551 to_compute[d] = p
552 for dep_d in p.depends_on:
--> 553 check_cache(dep_d)
554
555 # Should we save this data? If not, return.
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in check_cache(d)
551 to_compute[d] = p
552 for dep_d in p.depends_on:
--> 553 check_cache(dep_d)
554
555 # Should we save this data? If not, return.
/opt/conda/envs/strax-dev/lib/python3.6/site-packages/strax/context.py in check_cache(d)
539 # other requested data types is not.
540 raise strax.DataNotAvailable(
--> 541 f"Time range selection assumes data is already "
542 f"available, but {d} for {run_id} is not.")
543 if '*' in self.context_config['forbid_creation_of']:
DataNotAvailable: Time range selection assumes data is already available, but peaklets for 170204_1410 is not.
We should update Line 280 in 88d3e4e.
See https://straxen.readthedocs.io/en/latest/reference/datastructure.html. However, pulse_counts and lone_hits are listed.
An idea by @darrylmasson is to have eb0-2 only start processing in case eb3-5 are processing extremely high data rates. To this end, the older ebs could query the bootstrax collection in the runs-database to see whether the new ebs (eb3-5) have already been working on runs for some time. If that is the case, the older ebs would contribute to processing only in case of extreme data rates; otherwise they wouldn't do much (as they are slower in processing the runs, see https://xe1t-wiki.lngs.infn.it/doku.php?id=xenon:xenonnt:dsg:daq:eb_speed_tests_update#conclusion).
Could be added to #74.
I've saved run 9282 as an example under /live_data/009282.
It simply says it's stuck.
Maybe it would be a nice feature to add a function clear_all_pycache or the like to delete all __pycache__ folders. Sometimes (especially after doing a git pull) Numba throws errors that can simply be solved by removing everything stored in several __pycache__ folders.
On the other hand, I'm not sure if this is a feature that strax/straxen should have. Even better, of course, would be if numba weren't failing (then this monkey-patch would be obsolete).
If others see merit in this too, I'll come up with a simple function to add to straxen.
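A minimal version of such a helper could be (proposed sketch, not an existing straxen function):

```python
import shutil
from pathlib import Path

def clear_all_pycache(root):
    """Delete every __pycache__ folder below root.

    Returns the number of folders removed. Sketch of the proposed helper.
    """
    removed = 0
    for d in Path(root).rglob('__pycache__'):
        if d.is_dir():
            shutil.rmtree(d)
            removed += 1
    return removed
```

Usage would be e.g. `clear_all_pycache('/path/to/strax')` after a git pull that left numba confused.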
After hit finding it might be possible to end up with empty records (data fields completely ZLE). Removing these records would gain us a bit in terms of speed and performance.
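Dropping such records could be a one-line mask; a sketch with an illustrative dtype:

```python
import numpy as np

# Drop records whose waveform is entirely zero after ZLE.
def drop_empty_records(records):
    keep = np.any(records['data'] != 0, axis=1)
    return records[keep]

records = np.zeros(3, dtype=[('time', np.int64), ('data', np.int16, 110)])
records['data'][1, 7] = 12   # only record 1 carries any signal
cleaned = drop_empty_records(records)
```

The mask is cheap compared to the downstream work saved on the removed records.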
In the current implementation of rundb.py, straxen raises an error when fried rice is down:
~/mymodules/straxen/straxen/rundb.py in _find(self, key, write, allow_incomplete, fuzzy_for, fuzzy_for_options)
153 'protocol': 'rucio'}}}
154 doc = self.collection.find_one({**run_query, **dq},
--> 155 projection=dq)
156 if doc is not None:
157 datum = doc['data'][0]
# (some long mongo trace back)
ServerSelectionTimeoutError: fried.rice.edu:27017: timed out
This issue was introduced with https://github.com/XENONnT/straxen/pull/164/files.
In most cases analyzers work with locally stored data and hence do not need to access anything via the rundb storage system. I therefore propose to throw a warning and drop the rundb storage frontend from the registered storage, rather than raising an error and stopping any ongoing analysis.
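The warn-and-drop behavior could be sketched as follows; `connect` and the frontend names here are stand-ins, not the actual straxen/pymongo API:

```python
import warnings

# Proposed behavior: if a storage frontend cannot be reached, warn and drop
# it instead of raising. `connect` is a stand-in for the real connection
# attempt (e.g. a MongoDB ping).
def init_storage(frontends, connect):
    available = []
    for name in frontends:
        try:
            connect(name)
        except Exception as e:
            warnings.warn(f'Could not set up {name} ({e}); dropping it '
                          'from the registered storage.')
            continue
        available.append(name)
    return available

def connect(name):
    """Fake connection that simulates the runs database being down."""
    if name == 'RunDB':
        raise TimeoutError('fried.rice.edu:27017: timed out')

frontends = init_storage(['RunDB', 'DataDirectory'], connect)
```

Analyses using only locally stored data would then proceed with the remaining frontends instead of dying on a ServerSelectionTimeoutError.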
In #36, peak_classification was removed. This seems to be used in st.waveform_display
but I am not sure of the appropriate fix.
Additionally, the demos should be updated to reflect peak_classification missing. For example: https://github.com/XENONnT/straxen/blob/master/notebooks/tutorials/strax_demo.ipynb
See:
st.waveform_display('170204_2111', seconds_range=(0, 0.15))
KeyError Traceback (most recent call last)
<ipython-input-11-3f3d31936807> in <module>
----> 1 st.waveform_display('170204_2111', seconds_range=(0, 0.15))
/dali/lgrandi/strax/straxen/straxen/mini_analysis.py in wrapped_f(context, run_id, **kwargs)
91 if len(requires):
92 deps_by_kind = strax.group_by_kind(
---> 93 requires, context=context, require_time=False)
94 for dkind, dtypes in deps_by_kind.items():
95 if dkind in kwargs:
/dali/lgrandi/strax/strax/strax/utils.py in group_by_kind(dtypes, plugins, context, require_time)
472 if context is None:
473 raise RuntimeError("group_by_kind requires plugins or context")
--> 474 plugins = context._get_plugins(targets=dtypes, run_id='0')
475
476 if require_time is None:
/dali/lgrandi/strax/strax/strax/context.py in _get_plugins(self, targets, run_id)
413 plugins = collections.defaultdict(get_plugin)
414 for t in targets:
--> 415 p = get_plugin(t)
416 # This assignment is actually unnecessary due to defaultdict,
417 # but just for clarity:
/dali/lgrandi/strax/strax/strax/context.py in get_plugin(d)
359
360 if d not in self._plugin_class_registry:
--> 361 raise KeyError(f"No plugin class registered that provides {d}")
362
363 p = self._plugin_class_registry[d]()
KeyError: 'No plugin class registered that provides peak_classification'
Written with @tunnell
The majority of the plugins omit a description. For the documentation we should add a description to all plugins.
For an example of a plugin description, see e.g. the DAQReader:
https://github.com/XENONnT/straxen/blob/master/straxen/plugins/daqreader.py#L62
Perhaps @WenzDaniel can help with the NVeto plugins
Observation from @jpienaar13: it would be useful if straxen had an enum (see https://docs.python.org/3/library/enum.html#intenum) that encodes that 1 corresponds to S1, 2 to S2, and higher numbers to potential future peak types we might want to add.
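Such an enum could be as small as this sketch (the name and the extra members are suggestions, not existing straxen code):

```python
from enum import IntEnum

# Proposed encoding of peak types; values match the existing convention
# (1 = S1, 2 = S2), UNKNOWN is a suggested placeholder for type 0.
class PeakType(IntEnum):
    UNKNOWN = 0
    S1 = 1
    S2 = 2

# Instead of magic numbers in analysis code:
peak_type = 2
is_s2 = peak_type == PeakType.S2
```

Because IntEnum members compare equal to plain ints, existing code using `type == 2` keeps working while new code can say `PeakType.S2`.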
#128 loads data and shows the graph. But the waveform_display graph has a little problem: notice that the PMT number goes from 0 to 1, when it should go from 0 to the number of channels.
The code to reproduce the plot is:
import straxen
st = straxen.contexts.xenon1t_dali(build_lowlevel=True)
run_id = '170204_1710'
df = st.get_array(run_id, "event_info")
event = df[4]
st.waveform_display(run_id, time_within=event)
Luca pointed out a problem with run 8675, where the number of files was uploaded to the rundb by eb5. However, eb4 claimed to be the one that correctly processed the run, but didn't include the file count.
Reconstruction of events
It's really a conglomerate of bad screw-ups and quite unlikely events. I'm going to make corresponding issues on bootstrax/strax. Let me summarize what happened:
All of these things seem very unlikely, but somehow they all happened to this one run.
Bottom line: it was bootstrax's fault; it didn't update the file count on eb4.
Issues to fix
Currently, Travis builds spend a long time compiling some C++ modules, according to a long stream of warnings like
cc1plus: warning: command line option ‘-std=gnu99’ is valid for C/ObjC but not for C++
It's not super clear what the main culprit is; maybe grpcio? Probably we can install some things via conda to avoid this.
We need to make sure to update pulse_length in
https://github.com/XENONnT/straxen/blob/master/straxen/plugins/nveto_pulse_processing.py#L69 as it might lead to funny behavior down the chain.
When strax registers new rundb entries, the host field
Line 158 in 4c59975
We should also include the lineage hash of the data (e.g. in the meta field under lineage_hash) rather than just the lineage. This will make searching a lot easier.
For multi-output plugins, currently only the configuration options of their parent plugins are displayed in the straxen docs. This should be an issue with build_datastructure_doc.py.
For online monitoring we do want to have the high level plugins. These will not be pulled by admix and preferably also not deleted by admix. As such, we need to do our bookkeeping. We would have to delete data that:
Problem
Something goes wrong when plotting small time ranges in st.plot_peaks.
(start time is 1594353486000000000)
Traceback
https://gist.github.com/jorana/c0b8e36486250ca38e25b96d59237624
See runsdatabase
MergedS2s is causing problems when processing an empty set of peaklets. It doesn't seem to get to the compute part (https://github.com/XENONnT/straxen/blob/master/straxen/plugins/peaklet_processing.py#L170), and hence it also does not get to the if statement at the end of compute (https://github.com/XENONnT/straxen/blob/master/straxen/plugins/peaklet_processing.py#L219).
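A guard for the empty case could look like this sketch (the dtype and the merging logic are illustrative, not the actual MergedS2s code):

```python
import numpy as np

merged_dtype = np.dtype([('time', np.int64), ('endtime', np.int64),
                         ('area', np.float32)])

def merge_s2s(peaklets):
    """Toy merger that returns a well-typed empty result for empty input
    instead of ever reaching the merging logic with zero peaklets."""
    if len(peaklets) == 0:
        return np.zeros(0, dtype=merged_dtype)
    out = np.zeros(1, dtype=merged_dtype)
    out['time'] = peaklets['time'].min()
    out['endtime'] = peaklets['endtime'].max()
    out['area'] = peaklets['area'].sum()
    return out

empty = np.zeros(0, dtype=merged_dtype)
```

The key point is that the empty branch still returns data of the correct dtype, so downstream plugins and savers see a valid (empty) chunk.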