
kodiaq's Issues

Avro output

XENON1T will apparently be using avro for file storage. Avro is a serialization package which stores data according to a schema. The schema is then embedded into the file, meaning avro can read the file back later without prior knowledge of its contents.

Pax is writing avro already. We should write with the same schema so that the pax input plugin can read our data files directly. There's also a slightly deeper issue here. Right now we don't do any 'event grouping' logic in the output script; that is, there is no attempt to combine data from different boards that come from the same trigger/time window (well, sort of there is for protobuf output, but it's barely tested). This might be useful in standalone deployments, so maybe it makes sense to add such logic.

Required changes:
* Add a new option 'avro output' and change the current 'file output' to 'protocol buffer output'; though that format seems to be dead, it is already working, so we will leave support in
* Add a new avro-derived class to DAQRecorder. This handles the actual writing.
* Add new configure logic to test for the avro libraries and compile this in as needed.
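
For the recorder side, a rough sketch of what the avro-derived class could look like, assuming the avro C++ bindings behave as in their documented generic-datum examples and a hypothetical flat schema with "time", "module", "channel", and "data" fields. Class and method names are illustrative, not the actual kodiaq interface:

    // Hypothetical avro-backed recorder; not the real DAQRecorder API.
    #include <avro/Compiler.hh>
    #include <avro/DataFile.hh>
    #include <avro/Generic.hh>
    #include <avro/ValidSchema.hh>
    #include <memory>
    #include <stdint.h>
    #include <string>
    #include <vector>

    class DAQRecorder_avro /* : public DAQRecorder */ {
    public:
      int Initialize(const std::string &path, const std::string &jsonSchema) {
        try {
          fSchema = avro::compileJsonSchemaFromString(jsonSchema);
          fWriter.reset(new avro::DataFileWriter<avro::GenericDatum>(path.c_str(), fSchema));
        } catch (const std::exception &) {
          return -1;                          // bad schema or unwritable path
        }
        return 0;
      }
      int Insert(int64_t time, int32_t module, int32_t channel,
                 const std::vector<uint8_t> &payload) {
        avro::GenericDatum datum(fSchema);
        avro::GenericRecord &rec = datum.value<avro::GenericRecord>();
        rec.field("time")    = avro::GenericDatum(time);
        rec.field("module")  = avro::GenericDatum(module);
        rec.field("channel") = avro::GenericDatum(channel);
        rec.field("data")    = avro::GenericDatum(payload);  // avro 'bytes'
        fWriter->write(datum);
        return 0;
      }
      void Shutdown() { if (fWriter.get() != NULL) fWriter->close(); }
    private:
      avro::ValidSchema fSchema;
      std::auto_ptr<avro::DataFileWriter<avro::GenericDatum> > fWriter;
    };

The key point is that the schema used in Initialize must be byte-for-byte the one pax writes, so the pax input plugin can read our files unchanged.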

Smart BUSY handling

In saturation conditions the BUSY could become an issue. Imagine this: we run with 32 digitizers at a high rate. A digitizer fills up and asserts BUSY, stopping input. We clear the digitizer and very quickly another digitizer fills, asserting another BUSY. Between BUSYs there was just a tiny amount of time, and some events have to be discarded since the BUSY cuts into them.

What we would rather do is give the maximum amount of time possible between BUSY signals, which means always clearing full boards last. This means that at the time the BUSY is cleared every board is guaranteed to have an empty buffer and acquisition can run for the maximum time possible before the next BUSY is emitted.

I don't know if this will work in practice. But there is a read-only register for the BUSY where the board's BUSY status can be queried. We could read this register before reading out each board. If it happens to be positive we set a boolean flag in the board object (bSeenAsBusy or something) to true and continue cycling through the boards. If we come across a full board where bSeenAsBusy is already true we set it to false and read the board out.
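
A sketch of that readout order, with a hypothetical board object and stubbed-out hardware calls (the real kodiaq classes differ):

    #include <cstddef>
    #include <vector>

    struct Board {
      bool bSeenAsBusy;
      Board() : bSeenAsBusy(false) {}
      bool IsBusy()  { /* query the read-only BUSY status register here */ return false; }
      void ReadOut() { /* normal block-transfer readout; empties the buffer */ }
    };

    void ReadCycle(std::vector<Board> &boards) {
      for (size_t i = 0; i < boards.size(); ++i) {
        Board &b = boards[i];
        if (b.IsBusy() && !b.bSeenAsBusy) {
          b.bSeenAsBusy = true;   // full board seen for the first time:
          continue;               // skip it so the other boards drain first
        }
        // Not busy, or already skipped once: read it out now. A board that was
        // skipped once is read out after all the others, so by the time its
        // BUSY clears every other buffer is already empty.
        b.bSeenAsBusy = false;
        b.ReadOut();
      }
    }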

This is medium priority but it should be done before production (flagged for 1.2 release).

Feature Request: Standalone version for small deployments

It should be possible to install a standalone version of the DAQ which runs on one PC and writes files directly. An example use case would be for a PMT test station somewhere where a board or two need to be read out via one PC.

The design requirements are:
* Write to file instead of to mongodb
* Provide a tool to read the files (or at least a class)
* Remove mongodb dependency from the slave module for this mode
* Easily editable options file which is available in the top directory
* Easily executed run script in the top directory
* Simple UI (probably console text based)
* Should compile with a ./configure --litemode or similar option
* Needs docs to give a step-by-step guide to deployment

Problem adjusting the DC offset

We (Maria and I) have tried to adjust the DC offset of the V1724 (and the V1730) in order to preserve the full dynamic range, but we weren't able to change it at all. We used register 8098 and tried several values (including the examples in the config file, such as 1000hex, 2000hex, 0hex and F000hex), but the offset in the recorded waveforms does not change at all. For the V1730, the baseline corresponds to roughly 8200 counts (so unipolar), and for the V1724, it is stuck at the maximum value of 16383 (so overflow). NIM pulses are recorded fine.
We contacted CAEN about this issue, and they say the following:

"It should be necessary to implement:

  1. Read access to the Channel n Status register (n = the index of the channel you want to apply the setting to)
  2. Check the bit[2].
  3. If bit[2] = 1, then the DAC register must not be written, so perform point 1 again.
  4. If bit[2] = 0, then you can write.
  5. Write the desired value in the Channel n DAC register (n = the index of the channel you want to apply the setting to) .
  6. Read in the same register to check the vaule has been updated (optional).
  7. Insert a delay (e.g. a sleep) of few seconds in order to wait for the ADC outputs to get stabilized.
  8. Start run the acquisition and check the signal offset is changed accordingly to you setting."

i.e. they refer to the register manual, where it says that the value of this bit must be checked because otherwise the write might not take effect. Is there such a check in kodiaq? Is it necessary, or might it be some other problem?
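
For reference, a sketch of that polling sequence with CAENVMElib, assuming the V1724 register map (Channel n Status at 0x1n88, Channel n DAC at 0x1n98); the base address handling, retry limit, and return conventions are assumptions:

    #include <CAENVMElib.h>
    #include <stdint.h>
    #include <unistd.h>

    // Returns 0 on success, -1 on a VME error or timeout.
    int SetDCOffset(int32_t handle, uint32_t vmeBase, int channel, uint32_t dacValue) {
      uint32_t statusAddr = vmeBase + 0x1088 + 0x100 * channel;  // Channel n Status
      uint32_t dacAddr    = vmeBase + 0x1098 + 0x100 * channel;  // Channel n DAC

      // Steps 1-4: poll the status register until bit[2] = 0 (DAC writable).
      for (int tries = 0; ; ++tries) {
        uint32_t status = 0;
        if (CAENVME_ReadCycle(handle, statusAddr, &status,
                              cvA32_U_DATA, cvD32) != cvSuccess) return -1;
        if ((status & 0x4) == 0) break;   // bit[2] = 0: safe to write
        if (tries > 1000) return -1;      // give up eventually
        usleep(1000);
      }

      // Step 5: write the desired value to the DAC register.
      if (CAENVME_WriteCycle(handle, dacAddr, &dacValue,
                             cvA32_U_DATA, cvD32) != cvSuccess) return -1;

      // Step 7: wait a few seconds for the ADC outputs to stabilize.
      sleep(2);
      return 0;
    }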

Aesthetic change in Mongo address format

In the run database documents, there is a field at 'reader.storage_buffer'. Currently, MongoDB addresses are given with the keys "dbaddr", "dbname", and "dbcollection". For consistency with pax and clarity, can we rename these to "address", "database", and "collection" respectively? It's not critical, but it would mean we don't have to map between the two naming conventions as much. If you think the kodiaq convention is clearer, we can also just change it in pax.

-Chris and @ErikHogenbirk

Registers 8098, 8028 and 8024 in config file

Hi,

When reading the kodiaq config file we encountered three registers that are not listed in the V1724/V1730 register manual from CAEN:

  • 8098 DAC configuration for each channel

  • 8028 Zero Suppression look back and look forward

  • 8024 Zero suppression over threshold

Is there XENON firmware documentation where we can look up their meaning?

Mongo collection based on run number

Right now, in the ini file, the Mongo collection is set manually. However, this leads to an issue, at least in standalone mode, where we start a new run and the data collides with the event-building processing from the previous run. How are we doing this in XENON1T?

I would say that the ideal situation is that kodiaq writes to a database called 'raw' or something. The collection name, which is specified in the run doc, is then just the run number. Is this already implemented? Would it be hard to implement? Or do you have a better idea?

Deployment of the full system with single board

Hi,

I've been trying to get the full system running (so the master and slave module, with the web interface) on a single PC, where both the master and slave live. I have followed the installation procedure here:
http://xenon1t.github.io/kodiaq/deployment.html
and everything seems to be properly installed. According to the procedure, I should now start the slave daemon, and then the master. This is where I run into problems. First of all,

"The connection to the master must also be defined. Right now this is hard-coded in the koSlave.cc file. The line fNetworkInterface.Initialize(...) must be edited to give the proper address of the master. It is forseen to put this in a config file."

The line that I have in koSlave.cc is
"fNetworkInterface.Initialize("xedaq02",2002,2003,4,"xedaq04")".
I don't really see what this means, and what I should put here instead.

If I just ignore this and try to run koSlave, nothing seems to happen. It executes something, but gives no output at all. This does not change if I include an ini file in the folder or if there is no digitizer connected.

Then, when I run koMaster, it gives
"getaddrinfo("xedaq01") failed: Name or service not known
Error reading initialization file MasterConfig.ini! Closing."
This name xedaq01 also seems to be hardcoded, in MasterMongodbConnection.cc, line 30. I suppose this has to be the same address as in the koSlave.cc?

Race condition at start of event

If kodiaq writes to the run db that there is data waiting_to_be_processed, then crashes before it actually writes the data, the trigger will freeze. I'll add a timeout on the trigger, but maybe also have it update the run_doc once the data starts coming, with the location of the data and the fact that the trigger is requested.

Exception handling slave module

The slaves need to fail better. A few things:

  • mongo connection
  • CAEN functions (initializing digis, reading BLTs, read/write registers)
  • file output

Failures of these components should ideally not bring the whole thing down.
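
As a sketch of the direction for the CAEN functions (the retry count and the idea of disabling a single board on persistent failure are assumptions, not current kodiaq behavior):

    #include <CAENVMElib.h>
    #include <stdint.h>

    // Retry a CAEN block transfer a few times; on persistent failure report
    // back so the caller can drop just this board instead of exiting.
    int SafeBLTRead(int32_t handle, uint32_t addr, void *buf, int size, int &bytesRead) {
      for (int attempt = 0; attempt < 3; ++attempt) {
        CVErrorCodes ret = CAENVME_FIFOBLTReadCycle(handle, addr, buf, size,
                                                    cvA32_U_BLT, cvD32, &bytesRead);
        if (ret == cvSuccess || ret == cvBusError)  // a bus error ends a normal BLT
          return 0;
      }
      return -1;  // caller logs the error and disables this board for the run
    }

The same pattern (bounded retries, then a logged, non-fatal degradation) would apply to the mongo connection and the file output.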

Caen Read Error (ENUM = -2)

Seeing this error message:

    2014.11.26 [17:10:56] -  [!ERROR!] Board 876 reports read error -2
    2014.11.26 [17:11:11] -  [!ERROR!] Board 876 reports read error -2
    2014.11.26 [17:11:26] -  [!ERROR!] Board 876 reports read error -2

This continued roughly every 15 seconds (see the timestamps above), all night. We did not notice until stopping the DAQ, when it lagged out for a while before shutting down the board properly. The run was started 26.11.2014 at 17:10, so these errors started immediately.

Two things:

  1. We should find out why this is happening and stop it if possible. CAEN error code -2 is a communication error; it is reported back when the board is read, instead of '0', which means no error. This has happened before, but not in the last two months (the log file goes back that far).
  2. If this does happen, the first instance (or maybe one per hour) should raise a warning in the UI. These errors were actually not present in the online error log at all, just in the offline one.

Times of data documents begin at 56 minutes

Possible problem in XeProcessor.cc. The 'time' field doesn't start from zero. Here is an example:

  {
    "_id": { "$oid": "5319e81cb8314feb044b9842" },
    "channel": 6,
    "insertsize": 26,
    "override": false,
    "module": 770,
    "time": 389643835126,
    "zipped": true,
    "datalength": 184,
    "data": { "$binary": "jAI4gT6CPoI+fz6BPoE+gz6ABQIIgj6DBQQAgwUIASAJJhEGAR4BFgEEETgIgT6EDUgAgg0oCII+gBVU8EDWNbcopCU1JUYlZCV9JX0laSVUJUclRiVJJVAlWSViJWwlcyVxJWMlTyU6JTUlPiVTJWwleyVxJV0lsihUOa89aQVMAXAAhAV+CZoAgQ0SAIUFFAm2CQYBIACDBSABCgCCBaYBLgEUEQgBDDSEPoI+gj6DPoM+gT6CPg==", "$type": "0" }
  }

This is in contrast to the field 'starttime' in the control doc. Either 'starttime' in the control doc should be set to the true start time, these times should start from zero, or the 'starttime' should be moved and left to cito.

Crate controller must be controlled by DAQ master

This is unfortunate but there seems to be no other solution.

EDIT: Actually there is another solution. The muon veto group can just buy a V2718. Then the problem is solved.

We need the following to work in order to read out the muon veto:

  • The muon veto and TPC must work in "joint" mode (started/stopped at the exact same time)
  • The muon veto and TPC must work completely independently
  • The muon veto and TPC should both be started with an s-in in order to synchronize clocks

The current setup does not allow bullets 2 and 3 simultaneously. The issue is the CAEN interface. CAENVMElib allows only 1 interface to the PCI card, even if the PCI card has multiple optical links. This means that if the crate controller is to be sent commands independently of the rest of the DAQ it has to sit by itself on the PCI card of the dispatcher.

The most straightforward way to do this (though it requires at least a new class and quite a bit of code) is to have an interface to the V2718 through the master daemon. The .ini file for the dispatcher must then contain the information of which LEMO output on the V2718 corresponds to which detector (thankfully the idea of 'detectors' was already defined for the master). When starting a run the master will send the start command to the proper detectors, wait for them to report back running, then put the s-in up. In case they don't report back running it should time out and return an error and blah blah blah.

The master also handles starting and stopping the LED.

Run mode files should probably define which LEMO output gets set (always negative-on at start of run and off at end). So there would be a new line in the options files called something like NIM_ON {int}, which says that port {int} gets toggled at start/end. We would allow multiple NIM_ON entries per ini file since we need two for LED modes; see the example below.
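
For example (port numbers hypothetical), an LED mode options file might contain:

    NIM_ON 2
    NIM_ON 3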

Open question: Will the DAQ also control the muon veto LEDs? In this case all 4 slots on the front of the CC are used with no spares. If we need additional ones we need something like a V513 I/O register (but not that one since the operating voltage is wrong for our VME crates).

Additional open question: Should the CAEN stuff go into common? At least the VMEBoard and V2718 classes should I guess. This adds a dependency to common but should mean no duplicating code. Since there is no client library any more it makes sense that the CAENVMElib dependency is required for everything.

Run data document inserted by master

In order to save general run information, it would be helpful to insert a document containing such information into the mongodb corresponding to the run. This would be done by the master.

Information to be included:
runtype - the mode
latesttime - the most recent time stamp that all boards have seen (this is complicated and will be included in a separate issue)
compressed - are you using snappy to compress the raw data?
user - who started the run
ini file - include the entire initialization file for storage with the run data
event builder options - yet to be determined, possibly a .ini file for event builder
slow control parameters - any interesting monitoring parameters

eventually:
messages - allow user to insert a special message that is saved with the run
errors - either the reader or event builder could set an error flag that is checked by the other. This could be a way to communicate issues. Since there is a reasonably sized buffer (mongo) it shouldn't matter if there is some latency between when an error is set and when it is read.

This involves:
-- Adding a dependency on mongodb to the master. Also update the configure.ac file to check for mongo in case the master is included in compilation.
-- Adding a class to the master to handle communication with mongo (before this enhancement the master is very simple, just a single .cc file)
-- Parsing the options file with the master. This is already done in a limited way to read the DDC-10 data. Update this function to also read the mongodb connection information.
-- Adding a timer to the main event loop to update latest times, etc.
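
A sketch of the insert using the legacy MongoDB C++ driver of that era; the namespace "online.runs" and the exact field names are assumptions following the list above:

    #include "mongo/client/dbclient.h"
    #include <fstream>
    #include <sstream>
    #include <string>

    void InsertRunDoc(mongo::DBClientConnection &conn, const std::string &runType,
                      bool compressed, const std::string &user,
                      const std::string &iniPath) {
      // Store the entire .ini file alongside the run metadata.
      std::ifstream ini(iniPath.c_str());
      std::stringstream iniText;
      iniText << ini.rdbuf();

      mongo::BSONObjBuilder b;
      b.genOID();
      b.append("runtype", runType);
      b.append("compressed", compressed);   // snappy on/off
      b.append("user", user);
      b.append("ini", iniText.str());
      b.append("latesttime", 0LL);          // updated later by the event-loop timer
      conn.insert("online.runs", b.obj());
    }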

Baseline determination with new firmware

Baseline determination is extremely unstable with the new firmware. This is probably due to changing registers on the boards.

Want a routine that works for all firmware versions.

Ubuntu 14.04 and CAEN PCI Cards

I've been having issues with the Ubuntu kernel's compatibility with CAEN's PCI cards. I had already asked CAEN to update the drivers for the A2818 (one of the PCI card models), which they then patched to be compatible with Linux 3.10. Everything worked for a while, but I ran a normal system update on one of the readers yesterday and it caused the CAEN PCI card (this time an A3818) to crash the kernel every time it was activated. I would like to install the DAQ at LNGS with the most up-to-date software available at installation time and then freeze updates once we have a working system, unless there is some incredibly compelling reason to upgrade something.

The weird thing is that Ubuntu 14.04 LTS uses Linux 3.13, which is a non-LTS kernel. So once official patches stop coming for this kernel, I assume Canonical plans to keep it up to date themselves for the rest of the 14.04 support period. Every other server distro uses an older LTS kernel, even the ones that launched more recently.

So what I will do is contact CAEN about updating their PCI card drivers for 3.13. Since there is a lead time of at least a few weeks on this, and I wanted an un-bricked reader in the short term, I had the choice of either rolling back the kernel on the Ubuntu system or (what I chose) installing the DAQ on an all-new OS running the most recent kernel supported by the PCI card, which is 3.10. This allowed me to test installation and operation on a new OS, which went perfectly except for the issue with the DDC10.

Event length depends on channel mask

This is not an issue with the DAQ code but with the V1724 firmware (original).

Setting a channel mask (Reg 8120) to anything other than 0xff (all channels) changes the size of the acquisition window. More precisely, disabling all but one channel seems to shrink the acquisition window by a factor of 8.

Say we just put data on channel 2 with register 800C set to 9 (1k samples) and no custom event size.

    If 8120 == 0x04: channel 2 has 128 samples; no other channels appear in the event.
    If 8120 == 0xFF: all channels have 1000 samples (including channel 2).

This is apparently a CAEN bug. We can report it to their tech support.

Create option for mongodb output format

We now have a new output format required by the event builder. This is less useful for, for example, our standalone PMT test setup. There should be an option to toggle between formats. Right now I just commented out the one-line insert at DAQRecorder.cc line 173 and use the more complex insert directly following it. These should be wrapped in a check of a new parameter in the options object.

Update to koMaster interface/usability

The master interface needs an update. At the bare minimum it must:

  • Show how many (and which) slaves are connected
  • Show the rate and status of each slave
  • Show the data file and output format
  • Show some logging messages including any errors reported by the slaves
  • (possibly) allow independent or joint control of detectors (this is harder from a UI standpoint but should probably be in)

Basically it should look something like the new interface for the standalone slave (we could maybe move that class to common and generalize it) but with some added features.

This is important since we will want to control the DAQ in networked mode via the master without a web interface during some tests and during a lot of commissioning. We should have an idea in this case what is going on with the system. The information is already there in the dispatcher, it just has to be displayed.

Flagged for v1.2 release.

Writing files super slow

While the libpbf class has been tested at over 100 MB/s write rate, the implementation in kodiaq must be off. Write rates are about 1 MB/s.

Writing to mongo is much faster.

Reading out V1724 via front optical link

Hi,
I created this issue so we can more easily keep track of our discussion about reading out the V1724 via front optical link.
The state now is that kodiaq is able to communicate with the digitizer ("Found a board with link 0 and crate 0" is printed in the terminal), and after looking into the code I now understand why it is crucial to define a VME bridge in the config file.
Right now we still only see "Rate: 0MB/s Freq: 0Hz". Is it right that for an external trigger we only need a NIM/TTL pulse? Also, the default config file should already allow data acquisition, is that right?
Thanks,
Maria

Speed limit ~54 MB/s

The program is getting hung up on one of the mutexes. The suspect is the one in CBV1724 (used when trying to lock the data buffer).

Check CBV1724::RequestDataLock to make sure it isn't doing anything silly.
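
If it turns out to be blocking there, a sketch of a non-blocking alternative (the real RequestDataLock signature may differ):

    #include <pthread.h>

    // With pthread_mutex_trylock the read thread can skip a buffer that a
    // processing thread currently holds instead of stalling on it.
    int RequestDataLock(pthread_mutex_t *dataLock) {
      if (pthread_mutex_trylock(dataLock) != 0)
        return -1;   // buffer busy; caller should move on to the next board
      return 0;      // lock held; caller must unlock after copying the data
    }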

Noise spectra

In our package of monitoring tools we somehow forgot noise spectra. Oops. This is a combined job for the reader and online monitor program.

It should basically go like this: when you start a run, the DAQ does the baseline determination (already implemented). It should then just take a noise spectrum automatically, like so:

  • User starts run
  • Baselines determined
  • Noise spectrum recorded to special DB on buffer machine
  • Acquisition starts normally
  • In the meantime a script copies the noise data to the online database (has to be written to UG buffer first because we want full operation to work even if connection to surface is cut, otherwise could write directly to surface)
  • User can browse noise spectra on the web interface to see just how long a channel was noisy before the shifters noticed. Though really it should be so easy to view the spectra that the shifters habitually check them after starting a run.

This will require a new view in the online monitor but it should be able to re-use some methods from the 'waveform viewer' view. A separate issue will be filed there once it is decided exactly how to implement this from the reader side.

Reasonable digitization of NIM signals

Not sure if this should be done in kodiaq or not. We have some channels in the acquisition monitor board that will only receive NIM signals as data. We are really only interested in when these signals start and end and everything in between is not needed.

Since we already look into the data online we could possibly flag these channels as NIM channels. Then we look at the data and determine when the NIM goes positive. At this point we create a data doc with the time, module, channel, and some flag NIM_ON. We continue scanning the waveform and at the end of the NIM (when it goes back to baseline) we make another doc with NIM_OFF. Then we only record NIM_ON and NIM_OFF and not all the useless samples of data.
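
A sketch of that scan (negative-going NIM assumed; the threshold, document shape, and names are illustrative):

    #include <cstddef>
    #include <stdint.h>
    #include <vector>

    struct NimEdgeDoc {
      int64_t time;          // sample time of the edge
      int module, channel;
      bool nimOn;            // true = NIM_ON, false = NIM_OFF
    };

    void ScanNimChannel(const std::vector<uint16_t> &samples, int64_t t0,
                        int module, int channel, uint16_t baseline,
                        std::vector<NimEdgeDoc> &docs) {
      const uint16_t threshold = baseline - 1000;  // NIM pulses go negative
      bool high = false;
      for (size_t i = 0; i < samples.size(); ++i) {
        bool nowHigh = samples[i] < threshold;
        if (nowHigh != high) {
          // Leading edge makes a NIM_ON doc, trailing edge a NIM_OFF doc;
          // nothing in between is recorded.
          NimEdgeDoc d = { t0 + (int64_t)i, module, channel, nowHigh };
          docs.push_back(d);
          high = nowHigh;
        }
      }
    }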

There is a technical problem here for any NIM signal that is longer than 1 ms or so. We absolutely must use the self-triggering firmware for the A.M. data, but this firmware can ONLY read out events shorter than around 1250 µs. We simply don't get events longer than this due to a limitation of the firmware (they are not read out). Therefore our readout cannot support NIM signals that are longer, and we will have to use something like a dual-edge discriminator to convert them to a shorter signal in hardware. It should be clearly documented that signals longer than 1250 µs are NOT supported for A.M. data.

This is flagged for 1.2 release since it will be needed for production. The event builder should also be aware of this special data. We will also need to decide how to put it in the data stream but that's not a kodiaq issue (standalone file output does not and will not support acquisition monitor data).

Crate controller pulsing output

We need to provide the LED driver with a clock of something like ~100kHz. The V2718 crate controller should be able to do this. The following features are needed:

koOptions:

  • Option to turn pulser on/off on a specific V2718 output
  • Option to set the frequency of the pulser

CBV2718:

  • Functionality to interpret those options in the Initialize function.

A very general interface where we can set each output to NIM/pulse/off could be explored but it probably isn't necessary. Using a simple, static setup the V2718 would then have a configuration like:
Output_0: s-in to fan
Output_1: trigger gate (NIM true if trigger mode, false if veto mode) (needed?)
Output_2: LED pulser
Output_3: Muon veto gate (needed?)
Output_4: undefined
(edit: see next post for proper config)

Reset counter gets confused at high rates

The counter that keeps track of how many times the digitizer clocks reset seems to get confused at high rates. A race condition is ruled out because there is a mutex on the counter itself (there is one global counter which several threads have to access).

Possibly at high rates the buffer gets very large and the processing is delayed (that's why we run 8 processing threads in the first place), so that chunks of data that are seconds apart are being processed simultaneously.

Need a better way to keep track of counter resets.
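
One possible direction, sketched below: give each board its own rollover counter driven by the monotonic trigger time tags in its data stream (the 31-bit tag width matches the V1724; names are hypothetical). This only works if each board's tags are consumed in order per board, rather than in global processing order:

    #include <stdint.h>

    class BoardClock {
    public:
      BoardClock() : fRollovers(0), fLastTag(0) {}
      // Call with the 31-bit trigger time tag of each event, in board order.
      int64_t FullTime(uint32_t tag) {
        if (tag < fLastTag)       // the tag went backwards: the counter wrapped
          ++fRollovers;
        fLastTag = tag;
        return ((int64_t)fRollovers << 31) | tag;
      }
    private:
      int64_t fRollovers;
      uint32_t fLastTag;
    };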

Tell dispatcher which nodes belong to which detectors

The dispatcher needs to know which readers are 'tpc' and which are 'muon_veto'. There are a few ways to do this. Simplest is probably to just add a table in the .ini file such as:

    DETECTOR 0 tpc
    DETECTOR 1 tpc
    .
    .
    DETECTOR 9 muon_veto

Then the dispatcher has to store this in the DAQ manager. This allows the following features to be implemented:

  • Start and stop runs for detectors independently (see XENON1T/OnlineMonitor#14)
  • Provide status data for each detector independently (note: add 'other' field in case some node was not defined with a detector?)

Unable to set DC offset

We are unable to set the DC offset in kodiaq standalone mode. There is an option in the config file:

    8098 DAC configuration for each channel
    (1000 = negative unipolar + offset, 8000 = bipolar, F000 = positive)
    register 8098 8000

The value set here does not influence the baseline at all (we tried different values). We also try different options for:

    baseline_mode {int}: 0 - no baselines (baseline from file), 1 - at start of each run
    baseline_mode 1

including option 2 in the newer version of kodiaq, which does nothing with the baselines. Options 0 and 2 give a bipolar range (0 V = 8000 ADC counts); option 1 sets the DAC offset to unipolar negative (0 V = 16200 ADC counts).

Maybe I don't understand the config, but does 'determine baseline' here just mean 'measure the baseline' or also 'set the baseline'? Is it supposed to behave like this?

Parsing modes should take channel mask into account

The XENON1T imitation parsing modes (which separate channels) only work if the channel mask is FF. Email from Cyril:

"So indeed if I use register 8120 FF in .ini file, then using processing_mode 3 works nicely, I see my waveforms in paxer.

But the problem comes if I set the register to 1.
The thing is that I found that in DataProcessor::SplitChannels, you do:

  if(!ZLE) channelSize = (((*buffvec)[x][idx]&0xFFFFFF)-4)/8;

But this only works if the mask is FF, right?
If my mask is 1 (one channel active), then the channelSize is wrong.
Shouldn't this be parametrized to take the value of the mask?"

This should be reasonably simple logic, but it has to be included; see the sketch below.
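
A sketch of the fix, deriving the divisor from the mask (variable names follow the quoted snippet; how the mask reaches SplitChannels is an assumption):

    #include <stdint.h>

    static inline int EnabledChannels(uint32_t mask) {
      return __builtin_popcount(mask & 0xFF);   // V1724: 8 channels, one bit each
    }

    // Inside DataProcessor::SplitChannels, with channelMask read from the
    // options (register 8120):
    //
    //   int nChan = EnabledChannels(channelMask);
    //   if (!ZLE && nChan > 0)
    //     channelSize = (((*buffvec)[x][idx] & 0xFFFFFF) - 4) / nChan;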

Number of samples enabled depends on number of enabled channels

When using kodiaq in standalone mode, we set the number of samples in one waveform by setting the buffer organization:

register 800C A

which corresponds to 5 kSamples/waveform on the V1730 (and 500 samples/waveform on the V1724). When we turn off half of the channels (i.e. register 8120 0F instead of FF), the number of samples in one waveform halves as well (so 2.5 kS for the V1730, 250 samples for the V1724). This is a strange effect, especially since it looks like we're directly writing the registers here. I checked whether this problem also occurs in wavedump, and it doesn't, so I don't think it's a firmware issue.

System crashes at high data rates

When using kodiaq in standalone mode, the DAQ PC freezes when we have a high data rate. The optical link supports data rates up to 80 MB/s, but at such a high rate the PC is unable to write to MongoDB fast enough, the RAM quickly fills, and the system crashes.

Also, when there is a high data rate for a short period of time, the memory of the PC fills, but when the data rate is reduced afterwards the memory does not free up until we stop kodiaq. Shouldn't the buffer be emptied somehow?

Is there any way to fix this, or should we just limit the trigger rate somehow?

libtool makes koUser into wrapper script

Only when using ./configure --enable-all does libtool turn the koUser executable into a wrapper script, which turns out to be useless.

Possible solution: drop autotools since it's a complicated piece of garbage

Another solution: spend a day debugging

Random occasional seg faults

The worst bugs are the ones that rarely happen.
    #0  __memmove_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:1517
    #1  0x000000000041d414 in __copy_m (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8/bits/stl_algobase.h:372
    #2  __copy_move_a<false, unsigned int*, unsigned int*> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8/bits/stl_algobase.h:390
    #3  __copy_move_a2<false, unsigned int*, unsigned int*> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8/bits/stl_algobase.h:428
    #4  copy<unsigned int*, unsigned int*> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>) at /usr/include/c++/4.8/bits/stl_algobase.h:460
    #5  DataProcessor::SplitDataChannelsNewFW (this=this@entry=0x7043a0, buffvec=@0x7ffff5a0edb0: 0x6fa8e0, sizevec=@0x7ffff5a0edb8: 0x6fa900, timeStamps=timeStamps@entry=0x7ffff00020f0, channels=channels@entry=0x7ffff0000fe0) at DataProcessor.cc:278
    #6  0x000000000041ee2e in DataProcessor_mongodb::Process (this=0x7043a0) at DataProcessor.cc:389
    #7  0x000000000041cae1 in DataProcessor::WProcess (data=0x7043a0) at DataProcessor.cc:49
    #8  0x00007ffff6a1f182 in start_thread (arg=0x7ffff5a0f700) at pthread_create.c:312
    #9  0x00007ffff5f2c30d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Update to run numbers instead of time/date strings

We want to number our XENON1T runs starting at zero up to whatever. This is more intuitive in many ways than the date/time string used in XENON100 and currently used in kodiaq.

However, the date/time functionality should be kept for standalone purposes! The run numbers are only for XENON1T and will be determined by querying the runs database for the most recent run and incrementing it. So this should be done in a reasonable way.

Set milestone to v1.2 but basically this just has to be done before XENON1T takes "run 0".

ddc10 code crashes upon initialization

The ddc10 initialization function crashes with a seg fault.

The function responsible is exp_popen (line 62 in ddc_10.cc). A quick google indicates that this can happen if some variables are not initialized properly. For some reason this does not happen on Ubuntu, but it does happen on CentOS 7 (which we are testing as a possible distro for other reasons).

Connection to mongo fails regularly

Error:

2014-07-25T15:03:18.321+0200 warning: Failed to connect to 130.92.139.92:27017, reason: errno:4 Interrupted system call

This just started happening recently (since updating xedaq machines to new OS and mongo). Google indicates this could be something like a version mismatch between the server and clients (but really why should that matter, both are 2.6.x).

Generalization of .ini system

Explanation:

Case 1: Say you have boards attached to one reader PC running the XENON1T custom firmware and boards on another PC running the default firmware. You want to run them both, but they require different processing modes for data parsing. So you have to be able to send one data parsing mode to one PC and another mode to the other.

Case 2: You want to start just slave 3, not 1, 2, or 4. Right now you send a start command and all slaves are started. Is it possible? To make it more difficult, let's say that while slave 3 is running you want to start slave 4. Then you want to stop slave 4 but leave 3 running. Basically the start command needs some specificity (maybe a list field giving the nodes it applies to, where blank means all?).

Digitizer unresponsive after long run

After running for like 4 days a slave node froze upon run stop.

Last thing it said: Writing cvOutRegClear with: 1992

Nothing in the log. No other errors. Detached screen session was completely frozen. Will try to reproduce.

MongoDB does not fill new collections on 32-bit machine

We used the standalone kodiaq reader to create MongoDB collections and found that if there are multiple collections in one database, the second collection is not filled. We traced the problem to the 10 GB cap size that is hardcoded in DAQRecorder.cc, line 137. We are running Mongo on a 32-bit machine, where the data size of one collection is limited to 2 GB. When we reduced the cap size to 1 GB and reinstalled kodiaq, the problems disappeared.
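
A sketch of making the cap configurable with the legacy MongoDB C++ driver (the option name is hypothetical):

    #include "mongo/client/dbclient.h"
    #include <string>

    // On 32-bit MongoDB a collection cannot exceed ~2 GB, so the cap size must
    // come from the options (e.g. a 'mongo_cap_size' entry) instead of being
    // hardcoded to 10 GB in DAQRecorder.cc.
    void CreateOutputCollection(mongo::DBClientConnection &conn,
                                const std::string &ns, long long capSizeBytes) {
      conn.createCollection(ns, capSizeBytes, true /* capped */);
    }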

Dispatcher: In case of multiple detectors run end times not correctly updated

This is a leftover from single-detector mode. The class MasterMongodbConnection always stores the OID of the last document inserted into the runs DB so that it can update the run as 'finished acquisition' when needed. Obviously this doesn't work with two detectors.

The update-end-time function should take a run name or similar as an argument.
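
A sketch of the per-run bookkeeping (legacy MongoDB C++ driver; the member, field, and collection names are assumptions):

    #include "mongo/client/dbclient.h"
    #include <ctime>
    #include <map>
    #include <string>

    class MasterMongodbConnection_sketch {
    public:
      // Remember the run doc OID per run name at insertion time.
      void RegisterRunDoc(const std::string &runName, const mongo::OID &oid) {
        fRunDocOIDs[runName] = oid;
      }
      // Mark one specific run (e.g. tpc vs muon_veto) as finished.
      void UpdateEndTime(const std::string &runName) {
        std::map<std::string, mongo::OID>::iterator it = fRunDocOIDs.find(runName);
        if (it == fRunDocOIDs.end()) return;   // unknown run: nothing to update
        fMongo.update("online.runs", QUERY("_id" << it->second),
                      BSON("$set" << BSON("endtimestamp"
                                          << mongo::Date_t(time(0) * 1000ULL))));
        fRunDocOIDs.erase(it);
      }
    private:
      mongo::DBClientConnection fMongo;
      std::map<std::string, mongo::OID> fRunDocOIDs;
    };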

Most recent time collected by master

General

In order for the event builder to function correctly, it has to know that it has received all the data that will come from the reader for a certain time range. Otherwise the event builder might try to build an event while one slave PC is still processing and inserting data.

One way around this is if the event builder knows for sure some time stamp before which all data has for sure been received. Due to the multiple levels of parallelization within kodiaq this is not trivial, but it should be possible.

Implementation in the slave nodes

The first step is that the slave nodes have to know the most recent time stamp that has been parsed by all boards. This means a mutex-protected map has to be created, which each processing thread updates, right before inserting, with the last time stamp included in the insert. The most recent time seen by all boards is simply the lowest value in this map. The map would be {int_16}:{int_64}, or boardID:timestamp.
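
A sketch of that map (pthreads, as used elsewhere in kodiaq; class and method names are illustrative):

    #include <pthread.h>
    #include <stdint.h>
    #include <map>

    class LatestTimeTracker {
    public:
      LatestTimeTracker()  { pthread_mutex_init(&fLock, NULL); }
      ~LatestTimeTracker() { pthread_mutex_destroy(&fLock); }

      // Called by each processing thread just before its mongodb insert.
      void Update(int16_t boardID, int64_t lastTimestamp) {
        pthread_mutex_lock(&fLock);
        fLatest[boardID] = lastTimestamp;
        pthread_mutex_unlock(&fLock);
      }
      // The time before which all data has surely been inserted:
      // the minimum over all boards.
      int64_t SafeTime() {
        pthread_mutex_lock(&fLock);
        int64_t t = -1;
        for (std::map<int16_t, int64_t>::iterator it = fLatest.begin();
             it != fLatest.end(); ++it)
          if (t < 0 || it->second < t) t = it->second;
        pthread_mutex_unlock(&fLock);
        return t;
      }
    private:
      pthread_mutex_t fLock;
      std::map<int16_t, int64_t> fLatest;
    };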

Update to the communications protocol

Of course this information must be shared with the master. It is sufficient to do this once per update cycle (usually 1 second), when the rate and other information is transmitted. So the send and receive functions for this update have to be changed to include the time information. There is also currently no method to send a 64-bit integer, so one will have to be added, probably by sending two 32-bit ints and concatenating them.
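
The concatenation itself is simple; a sketch:

    #include <stdint.h>

    void Split64(int64_t value, uint32_t &hi, uint32_t &lo) {
      hi = (uint32_t)((uint64_t)value >> 32);
      lo = (uint32_t)((uint64_t)value & 0xFFFFFFFFu);
    }

    int64_t Join64(uint32_t hi, uint32_t lo) {
      return (int64_t)(((uint64_t)hi << 32) | lo);
    }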

Update to the master module

The master will then have to keep track of these updates, probably by storing them in the existing structs that hold the node information. Periodically, probably on each screen update, the master should find the smallest stored time and update the run information doc for the current run with this time stamp.

Behavior on end of run

At the end of the run this "end time stamp" field for the run should be set to 0xFFFFFFFFFFFFFFFF, the maximum value of a 64-bit unsigned integer. This tells the event builder to process all times up to infinity.

Command line options

The following command line options should be implemented:

koSlave (networked):
--id={int} id number
--port={int} communication port
--dataport={int} data port
--server={string} server location

koMaster:
--online={string} server holding online db
--ini={string} path to .ini file (if applicable)

koSlave (standalone):
--ini={string} path to .ini file

Crashes when V1724 reboots

A segmentation fault occurs when the boards are rebooted (while the DAQ is in idle mode) and then the DAQ is armed. The exact line seems to be when trying to clear the existing buffer in the CBV1724 class before starting a run.

Steps to reproduce:

  1. Start the DAQ, stop it, put it into idle (maybe optional)
  2. Power cycle the VME crate
  3. Try to arm the DAQ

This bug makes no sense actually, so it needs some looking into.

"libsnappymissing" error during kodiaq installation

Hi Dan,

there is one issue I'd like to report that occurred continuously while we installed kodiaq (independent of the Ubuntu version). When executing the command './configure --enable-lite', the following error message occurred, even though libsnappy was installed:

    checking for _ZN6snappy11RawCompressEPKcmPcPm in -lsnappy... no
    configure: error: libsnappymissing.

We were only able to fix this by replacing the mangled symbol _ZN6snappy11RawCompressEPKcmPcPm in the configure file with _ZN6snappy11RawCompressEPKcjPcPj, which corresponds to our version of libsnappy.

We also noticed that while the 'make' command is executed, the configure file is changed back and this error message shows up:

    configure: error: libsnappymissing.
    make: *** [config.status] Error 1

Overhaul of master module

The master module is a mess. It needs to be overhauled. A few new features:

  • Class to handle slave network connections
  • Better interface with slaves (mongo?)
  • Improve interface to web
  • Slow control interface
