chimeratk / applicationcore
Core library for creating applications based on the ControlSystemAdapter and DeviceAccess.
License: GNU Lesser General Public License v3.0
DeviceToControlSystem variables can conceptually not be persisted by the control system. The application has to take care of this in case the values cannot be re-calculated when re-starting the server.
Task: The Output should get an option to persist this variable (off by default). When enabled, the variable is always written to file when a write is called. When the application is started/initialised, the variable content is read from the file.
Details:
This ticket is a child of #10 and depends on #46.
Variables which are Constants or outputs of the ConfigReader and are connected to a DeviceModule should be written in an initialisation handler (see #46). Currently they are written in ConfigReader::prepare() etc., which might block the application initialisation if an exception occurs in the process of writing these variables.
Hints for implementation:
If only one single instance of the ConfigReader exists, this instance should be available through a member function of ApplicationModule inside the constructor of each module.
Something like this is not necessary to write:
std::unique_ptr<ctk::ScalarOutput<std::string> > message;
Instead of initialising the pointer in the constructor, use a default constructed ScalarOutput and use message.setMetadata() to set its name etc.
This ticket is a child of #10 and depends on ChimeraTK/ControlSystemAdapter#18.
The data fault flag introduced with ChimeraTK/ControlSystemAdapter#18 needs to be propagated through ApplicationModule. This can be done similar to the propagation of the VersionNumber:
In a read operation, when a set data fault flag is received, a data flag for the module will be set (this flag for the module needs to be introduced first). This flag will be cleared once all input variables of the module have their data fault flag no longer set (a logic how this is achieved needs to be planned first - please keep in mind a module might have many variables, so scanning all data fault flags at each operation is not an option!).
In a write operation, the module's data fault flag status will be attached to the variable to write.
All computations shall be executed normally, whether the data fault flag is set or not, unless the application author specifically tests for the flag. There might be use cases for this, so please allow testing the module's flag by the application user code.
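One possible way to avoid scanning all data fault flags on each operation is to count transitions instead. The following is only a minimal sketch under that assumption; `DataFaultCounter` and its member names are hypothetical and not part of ApplicationCore.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not ApplicationCore API): counts how many inputs of a
// module currently carry a data fault flag, so no scan over all inputs is needed.
class DataFaultCounter {
 public:
  // Call once per read operation of an input.
  // wasFaulty: flag of that input before the read
  // isFaulty:  flag received together with the new value
  void update(bool wasFaulty, bool isFaulty) {
    if(isFaulty && !wasFaulty) {
      ++faultyInputs_;
    }
    else if(!isFaulty && wasFaulty) {
      assert(faultyInputs_ > 0);
      --faultyInputs_;
    }
  }

  // The module's flag: set as long as at least one input is still faulty.
  // This value would be attached to every variable the module writes.
  bool moduleIsFaulty() const { return faultyInputs_ > 0; }

 private:
  std::size_t faultyInputs_{0};
};
```

Each read costs constant time regardless of how many variables the module has, and user code could query `moduleIsFaulty()` to test the flag explicitly.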
Write a StatusMonitor module
The reported state is an integer with the following values (this is actually an enum, but enums are not supported by ApplicationCore. Please define them as a weakly typed enum and make the PV a plain int):
Not all of these states need to be used for each flavour. E.g. 0 is only used by the on/off state checker described above.
Hints for design/interface/usage:
Definition of Complete:
This is a child task of #10 and depends on task #11. The idea is to provide a first rather simple exception handling to investigate how to develop the idea further.
Find all places where a ChimeraTK::runtime_error could be thrown in read and write operations (ignore open for now). This is (list may not be complete);
Catch the exceptions there and feed the error state into the DeviceModule through the new function DeviceModule::reportException() (see #11). Retry the failed operation after reportException() returns.
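The catch, report and retry pattern described above could look roughly like the sketch below. It uses `std::runtime_error` as a stand-in for `ChimeraTK::runtime_error` and a callable as a stand-in for `DeviceModule::reportException()`; the helper name `retryUntilSuccess` is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Runs `operation` until it completes without throwing std::runtime_error
// (stand-in for ChimeraTK::runtime_error). `reportException` is a stand-in
// for DeviceModule::reportException() and is assumed to block until the
// device has been recovered. Returns the number of attempts that were needed.
inline int retryUntilSuccess(const std::function<void()>& operation,
    const std::function<void(const std::string&)>& reportException) {
  int attempts = 0;
  while(true) {
    ++attempts;
    try {
      operation();
      return attempts;
    }
    catch(const std::runtime_error& e) {
      // Feed the error state into the device module, then retry the operation.
      reportException(e.what());
    }
  }
}
```

Because the failed operation is repeated after the report returns, the value to be read or written is not lost.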
Please put the ticket for review by @mhier when done.
Add test for generating triggers in an ApplicationModule and distributing it through a FanOut. This is currently not covered, thus the following bug was not detected:
Document the exact behaviour of handling and reporting of runtime_error thrown by Device and its accessors.
** A central thread is waiting for exceptions to be reported by all accessors of the device.
** If an exception is received, it is reported and the device status goes to 1.
** The thread tries in a loop to call Device::open(), which is supposed to recover the device, until the device reports isFunctional() again.
** After this the initialisation sequence is run. If it fails, this itself reports an exception and the recovery restarts.
** The accessor tries to execute the failed action (read/write) again. If it succeeds, the device status reported to the CS is set to OK and the error message from the exception is cleared.
Describe the observed behaviour in case of Devices which
** do not (or cannot) report error states (isFunctional() is always true), so one has to retry the read/write to see if the device is working again
** implement a recovery procedure and can report isFunctional() as false while the device is known to be not operational.
Describe when the initialisation will be executed
Describe the difference of the blocking read() in contrast to readNonBlocking() and readLatest()
This ticket depends on #13 - read that ticket first for background information.
Implement a StatusAggregator module
Hints for implementation:
prepare() is too late, because the variable household is then fixed and inputs from the aggregated monitors cannot be added anymore.
Hints for design/interface/usage:
Definition of Complete:
Many tests are missing. Here is an incomplete list of missing tests (so less obvious scenarios do not get forgotten):
Currently, exceptions (in particular ChimeraTK::runtime_error) cannot be handled. If a device throws a ChimeraTK::runtime_error, it is not possible to catch this exception by the user code, since often it is thrown inside a FanOut. Ideally, exceptions should be handled by ApplicationCore in a way that the application developer does not have to care much about it. This means:
Currently the ExceptionHandlingDecorator treats all cases equally: it sets the data invalid flag and blocks until the error condition is resolved.
Correct behaviour of write()
Related with #56 and (discussion about blocking read).
To prevent a race condition with deadlocks, reportException releases the TestableMode mutex after increasing the testable_mode_counter, but before pushing to the error queue, which is protected by another mutex. This causes a race condition in which the test sees an inconsistent state (counter increased but empty queue) and stalls the execution.
Both mutexes must be held when modifying the queue and the counter to keep them consistent. Use "try_lock" for the second mutex, and release the first mutex with a spinning re-lock to avoid the deadlock.
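A minimal sketch of the proposed locking scheme, using only the standard library, is shown here. The struct and member names (`ErrorReporting`, `testableModeMutex`, `queueMutex`, `report`) are hypothetical stand-ins and do not mirror the actual ApplicationCore code.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-ins for the testable mode mutex with its counter and
// for the error queue protected by a second mutex.
struct ErrorReporting {
  std::mutex testableModeMutex;
  std::mutex queueMutex;
  int testableModeCounter{0};
  std::deque<std::string> errorQueue;

  // Precondition: `testableLock` belongs to testableModeMutex and is locked.
  void report(std::unique_lock<std::mutex>& testableLock, const std::string& message) {
    assert(testableLock.owns_lock());
    std::unique_lock<std::mutex> queueLock(queueMutex, std::try_to_lock);
    while(!queueLock.owns_lock()) {
      // The queue mutex is taken, possibly by a thread which waits for the
      // testable mode mutex. Release and re-acquire the first mutex instead
      // of blocking on the second one, then try again.
      testableLock.unlock();
      std::this_thread::yield();
      testableLock.lock();
      queueLock.try_lock();
    }
    // Both mutexes are held here, so counter and queue change consistently.
    ++testableModeCounter;
    errorQueue.push_back(message);
  }
};
```

Since the second mutex is never waited for while the first one is held, the lock order inversion cannot deadlock, and an observer holding either mutex never sees the counter increased with an empty queue.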
Currently it is already possible to modify the apparent hierarchy structure by setting the flag "eliminateHierarchy" for modules. This concept should be made more flexible and readable by changing the boolean flag into a strongly-typed enum called HierarchyModifier. The default will be HierarchyModifier::none, which does not change the hierarchy (same as eliminateHierarchy=false). HierarchyModifier::hide will be the same as eliminateHierarchy=true (I think this is a much better name than "eliminate").
A new feature will be enabled by HierarchyModifier::moveToRoot, which will let the module appear at the root level of the hierarchy tree. All children will be moved along with the group (while keeping the tree structure below the moved group of course).
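To illustrate the intended effect of the three modifiers on the apparent hierarchy, here is a small stand-alone sketch. The enum values are taken from the description above; the function `apparentPath` and the use of plain path strings are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <string>

// Enum values as proposed in the text above.
enum class HierarchyModifier { none, hide, moveToRoot };

// Hypothetical helper: returns the path under which a module named `name`
// appears, given the apparent path of its parent ("" denotes the root).
inline std::string apparentPath(
    const std::string& parentPath, const std::string& name, HierarchyModifier modifier) {
  switch(modifier) {
    case HierarchyModifier::hide:
      // The module level disappears, its content shows up in the parent.
      return parentPath;
    case HierarchyModifier::moveToRoot:
      // The module appears directly below the root, whatever its parent is.
      return "/" + name;
    case HierarchyModifier::none:
      break;
  }
  return parentPath + "/" + name;
}
```

Children compute their own path relative to the result, so the tree structure below a moved group stays intact.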
Currently it is not possible to connect the PeriodicTrigger output tick to the MicroDAQ module. This is because MicroDAQ is expecting an int and PeriodicTrigger delivers a uint64_t.
Assuming the following code:
DeviceModule dev;
ApplicationModule mod;
…
mod.connectTo(dev);
And also assuming that mod has a push-type input and dev a corresponding output, the current code will generate a poll-type output on the dev.
VirtualModule::connectTo() calls operator(std::string) on the device module (see https://github.com/ChimeraTK/ApplicationCore/blob/master/src/VirtualModule.cc#L70), which generates poll-type by default (see https://github.com/ChimeraTK/ApplicationCore/blob/master/src/DeviceModule.cc#L37). The following operator>> call will then connect the poll-type dev variable to the push-type input of mod, even though the device would support that variable as push-type (e.g. being a DOOCS ZMQ variable).
Seen in testExceptionHandling: Currently the trigger has a device variable as consumer. If a separate variable is created, just to trigger, the exception from the device which is being triggered is not handled.
Missing: provide example code to reproduce the issue
Module instances (in particular VariableGroup instances) should get a description, which will become part of the process variable description published to the control system.
Typical example: Module depending on configuration parameters is initialised only in the constructor body of the application. By mistake a connection to that module is made before the module is replaced with the final (properly configured) instance. In the replacement (assignment operation) the destructor of the previous instance is called, which should throw if connections to variables inside the module have already been made. Only during the shutdown phase of the application modules may be destroyed which have connections.
Also default constructed modules should not allow making any connection in the first place.
I guess that should not happen when using ApplicationCore in the correct way, but currently my application crashes at this point since owner is a nullptr in my case:
ApplicationCore/include/Module.h
Line 146 in 43b4e47
Better throw an exception and if possible give a hint how to solve the problem.
The examples should be stand-alone so one can compile them out of the box against a system-installation of ApplicationCore. For this, they each need a CMakeLists.txt.
When defining a ConfigReader variable with a different type than reading it in the application, the ConfigReader reports that the variable cannot be found. Instead it should report that the type is incorrect and which type is expected by the application.
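The distinction between "not found" and "wrong type" could be made along the lines of the following sketch. It is not the ConfigReader implementation: the storage as a `std::variant` map, the function `getConfig` and the exact message wording are assumptions for illustration, and only two types are covered.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Simplified stand-in for the stored configuration values.
using ConfigValue = std::variant<std::int32_t, std::string>;

// Maps C++ types to the type names used in the configuration file.
template<typename T>
const char* typeName();
template<>
inline const char* typeName<std::int32_t>() { return "int32"; }
template<>
inline const char* typeName<std::string>() { return "string"; }

// Hypothetical lookup which reports a missing variable and a type mismatch
// with different error messages.
template<typename T>
const T& getConfig(const std::map<std::string, ConfigValue>& config, const std::string& name) {
  auto it = config.find(name);
  if(it == config.end()) {
    throw std::logic_error("ConfigReader: variable '" + name + "' not found.");
  }
  if(const T* value = std::get_if<T>(&it->second)) {
    return *value;
  }
  // The variable exists, but holds a different type than requested.
  const char* actual = std::visit(
      [](const auto& v) { return typeName<std::decay_t<decltype(v)>>(); }, it->second);
  throw std::logic_error("ConfigReader: variable '" + name + "' is defined as " + actual +
      " but the application expects " + typeName<T>() + ".");
}
```

With this split, a request for a string on a variable defined as int32 names both types in the error message instead of claiming the variable does not exist.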
This ticket is a child of #10.
Background: If a device is recovered after an exception, it might need to be reinitialised (e.g. because it was power cycled). Currently, device initialisation is completely up to the application. This needs to be changed so devices can be automatically reinitialised after recovery.
Task:
Note: multiple handlers are required for the implementation of #47 (as it will add initialisation handlers to devices which might already have a user-defined handler).
DoD:
When building ApplicationCore at home I get this error:
libChimeraTK-ApplicationCore.so.01.04.00: undefined reference to `boost::gregorian::greg_month::as_short_string() const'
Adding this to CMakeLists.txt fixed it for me:
FIND_PACKAGE(Boost COMPONENTS date_time REQUIRED)
What's your situation?
The device error reporting scheme is broken. As requested in #11 / #12, reportError() blocks until the device error is resolved. This is conceptually wrong for the following reasons:
Task:
Consider the following for changes in the ExceptionHandlingDecorator:
The class Application combines too much functionality in a single class. Some ideas to improve the structure:
I guess closing the file in the exception handling does not work if the disk is full.
ApplicationCore/Modules/src/MicroDAQ.cc
Line 360 in a8bafe0
See backtrace:
#0 0x00007f0134f0c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f0134f0e02a in __GI_abort () at abort.c:89
#2 0x00007f013554684d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f01355446b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f0135544701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f0135544969 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007f013750a6d2 in ChimeraTK::detail::H5storage::writeData() () from /usr/lib/libChimeraTK-ApplicationCore.so.01.02xenial2
#7 0x00007f013750ad34 in ChimeraTK::detail::H5storage::processTrigger() () from /usr/lib/libChimeraTK-ApplicationCore.so.01.02xenial2
#8 0x00007f013750d00b in ChimeraTK::MicroDAQ::mainLoop() () from /usr/lib/libChimeraTK-ApplicationCore.so.01.02xenial2
#9 0x00007f01374e2b20 in ChimeraTK::ApplicationModule::mainLoopWrapper() () from /usr/lib/libChimeraTK-ApplicationCore.so.01.02xenial2
#10 0x00007f0136c2f5d5 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#11 0x00007f01360086ba in start_thread (arg=0x7f0021ffb700) at pthread_create.c:333
#12 0x00007f0134fde41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Connecting a device module to a control system module e.g.,
dev(RegisterName, ctk::UpdateMode::push) >> csModule[configLocation](RegisterName);
fails with an exception
terminate called after throwing an instance of 'ChimeraTK::logic_error'
what(): The feeding node has zero (or undefined) length!
Reason is that the VariableNetworkNode operator() in DeviceModule sets the nElements = 0 by default.
Currently, the ConfigReader module supports only a flat list of variables. This can be inconvenient for more complex applications. A well-structured hierarchy of configuration variables could also be mapped directly onto the structure of an application.
Add the option to the ConfigReader to create submodules with configuration variables (and more submodules) inside. The config file could then look like this:
<configuration>
<variable name="variableName" type="int32" value="42"/>
<variable name="anotherVariable" type="string" value="Hello world!"/>
<module name="foo">
<variable name="variableInModule" type="int32" value="120"/>
<variable name="anotherVariable" type="string" value="Hello world!"/>
<module name="bar">
<variable name="someName" type="int32" value="666"/>
</module>
</module>
</configuration>
The values can be obtained through get() like this: config.get<int>("foo/bar/someName"), or like this: config["foo"]["bar"].get<int>("someName"). When connecting the ConfigReader module with application modules, the square bracket operator behaves in the same way as for any other module with submodules, and connectTo() can also be used to connect the hierarchy onto a hierarchy in another module. For unit testing, it should be sufficient to use config.connectTo(cs), where "cs" is the ControlSystemModule. The hierarchy of variables should then appear in the control system (with its constant values).
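The two access styles for hierarchical values can be sketched with a prefix-based view onto a flat map keyed by full path. This is only an illustration: `ConfigView` is a hypothetical class, it handles int values only (so `get()` is not a template here), and the XML parsing is left out.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical view onto configuration values stored under their full path,
// e.g. "foo/bar/someName". Restricted to int values to keep the sketch short.
class ConfigView {
 public:
  explicit ConfigView(const std::map<std::string, int>& values, std::string prefix = "")
  : values_(&values), prefix_(std::move(prefix)) {}

  // Descend into a submodule: config["foo"]["bar"]
  ConfigView operator[](const std::string& moduleName) const {
    return ConfigView(*values_, prefix_ + moduleName + "/");
  }

  // Accepts a plain name or a path relative to this view: get("bar/someName")
  int get(const std::string& path) const {
    auto it = values_->find(prefix_ + path);
    if(it == values_->end()) {
      throw std::logic_error("Config variable '" + prefix_ + path + "' not found.");
    }
    return it->second;
  }

 private:
  const std::map<std::string, int>* values_; // not owned, must outlive the view
  std::string prefix_;
};
```

With the values from the example file loaded into the map, `config.get("foo/bar/someName")` and `config["foo"]["bar"].get("someName")` resolve to the same entry.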
Definition of done:
This is a child task of #10 and follows up on task #12. It can be done in parallel with #25, but it depends on ChimeraTK/DeviceAccess#50
Current situation:
runtime_error exceptions coming from devices are now caught and made visible to the control system. While the device is in the exception state, the DeviceModule will try opening the device until it succeeds. On the other hand, the device is already opened during launch of the application. If that fails (because the device is not reachable during server start), the application will currently not start.
Task:
Initial opening of the device should take place in the DeviceModule itself, inside the exception handling loop. That way the device can go to the error state right at the beginning and the server can start even if not all of its devices are available.
Since the device is currently opened in some cases already while the connections are made (in Application::makeConnections()), to obtain the device catalogue if connectTo() is used to connect an entire device directly with the control system, we need to make use of the new Device constructor introduced in ChimeraTK/DeviceAccess#50 there instead.
DoD:
In Application::optimiseConnections(), networks which have the same Device variable as feeder are merged. This merging fails, if both networks contain the same consumer (typically a ControlSystem variable). Those duplicates need to be detected first and only one copy is kept.
This happens e.g. if a (sub-)module with HierarchyModifier::moveToRoot is connected in several instances to both the device and the control system. This is often done when using tags.
Write first a test with a VariableGroup HierarchyModifier::moveToRoot with tags "CS" and "DEV", instantiated two times. The tag "CS" is connected with the control system, "DEV" with the device. This should initially fail. Then try to fix this problem.
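The duplicate detection during merging could, in a strongly simplified form, look like the sketch below. `Network` and `mergeNetworks` are hypothetical and use plain strings in place of the real network and node classes.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Strongly simplified stand-in for a variable network: one feeder and a
// list of consumers, both identified by name.
struct Network {
  std::string feeder;
  std::vector<std::string> consumers;
};

// Merges `other` into `target`. Both must have the same feeder. Consumers
// which are already present in `target` are kept only once.
inline void mergeNetworks(Network& target, const Network& other) {
  assert(target.feeder == other.feeder);
  for(const auto& consumer : other.consumers) {
    bool alreadyPresent =
        std::find(target.consumers.begin(), target.consumers.end(), consumer) != target.consumers.end();
    if(!alreadyPresent) {
      target.consumers.push_back(consumer);
    }
  }
}
```

In the scenario described above, both networks would contain the same control system variable as a consumer, and only one copy survives the merge.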
DoD:
This requires to specify the server version somehow e.g. in the constructor of Application.
a4c2e03 has fixed a bug in the move assignment (which is used by the constructor). A test is missing, which would have shown this bug earlier. The test should verify that an application which uses multiple DeviceModules (e.g. moved to a std::vector) still operates normally. Also the DeviceModules need some testing (e.g. subscript operator etc.) after they all have been created.
Introduce additional HierarchyModifiers:
This ticket is a child of #10 and depends on #38.
If a read or write operation results in a ChimeraTK::runtime_error, the module's data fault flag should be set and propagated to all outputs of the module. When the operation completes after clearing the exception state, the flag should be cleared as well.
Improve tests of the implementation of #12 . Currently only exceptions that occur on write() are properly tested. This should be extended so any possible runtime_error which can be thrown by a backend is tested. This will require to change the ExceptionBackend used for the tests into a backend not based on the NumericAddressedBackend, so the test can control where exactly the exception is thrown. Then it is possible to also test exceptions in preRead()/postRead()/preWrite()/postWrite().
It is probably also useful to have fine control in the ExceptionBackend over which function throws an exception (rather than everywhere or nowhere, as it is right now). The test can then check one place after the other.
The test also does not yet verify that the read or write operation is retried, i.e. that the data is not lost.
In case of runtime_errors thrown by Device accessors, the error reporting blocks all actions until they finally succeed. This is wrong for readNonBlocking and readLatest.
The shutdown test is supposed to cover all scenarios where an accessor could block, and check that the application destructor can join all threads and shutdown without blocking.
However, DeviceAccessors can be placed directly into all kinds of objects by the connection code: The control system module, user modules, ConsumingFanOuts, FeedingFanOuts, ThreadedFanOuts, TriggerFanOuts. Please review the test if there are other classes which are not covered, and if all of the above are handled by the test. (The TriggerFanOut is not, but the mechanism is not in place in Device for tests, and probably this will not throw but just not send in case of error).
When using the "trick" to make ApplicationModule to ApplicationModule connections by connecting them both to the same variables in the ControlSystem, mistakes can be easily made e.g. when variables are renamed in refactoring at only one end of the internal connection. Add an optional sanity check which helps detecting those problems:
Definition of done:
The device status is always reported as OK after 500 ms.
The recovery mechanism for failed devices is to call open() again and wait until the Device reports isFunctional() again. Some devices (network based without permanent connection) don't take any recovery action during open() and always report isFunctional(). Currently all devices do this. This leads to the unwanted behaviour that the device state is always reported as OK after 500 ms (if the device does not have an initialisation procedure which itself throws again).
Correct behaviour:
This behaviour works well with connection-less backends with unstable network, where sometimes network interruptions occur and just go away, as well as with backends which actively have to resolve the error. The latter happens in the DeviceModule thread, which wakes up the failing accessors to retry.
Implementation:
Details are not fully worked out. The DeviceModule thread needs to get notified that the error condition has been resolved. So after being woken up there can be two actions: send error or resolve error. Do we have two different sleeping points? But what to do with additional errors that come in? Or just one sleeping point? Do we need an extra queue or extend the meaning of the predicate for the condition variable?
When #56 is already implemented, make sure that readNonBlocking and readLatest have the correct behaviour:
In the code below, somehow the ch7 (possibly others, but the data is usually constant) was not updated properly after the readAll().
The values are connected to the module it lives in using the trigger from the PeriodicTrigger
struct : ctk::VariableGroup {
using ctk::VariableGroup::VariableGroup;
ctk::ScalarPushInput<int16_t> ch1{this, "CAR_ADC_CH1", "", ""};
ctk::ScalarPushInput<int16_t> ch2{this, "CAR_ADC_CH2", "", ""};
ctk::ScalarPushInput<int16_t> ch3{this, "CAR_ADC_CH3", "", ""};
ctk::ScalarPushInput<int16_t> ch4{this, "CAR_ADC_CH4", "", ""};
ctk::ScalarPushInput<int16_t> ch5{this, "CAR_ADC_CH5", "", ""};
ctk::ScalarPushInput<int16_t> ch6{this, "CAR_ADC_CH6", "", ""};
ctk::ScalarPushInput<int16_t> ch7{this, "CAR_ADC_CH7", "", ""};
ctk::ScalarPushInput<int16_t> ch8{this, "CAR_ADC_CH8", "", ""};
ctk::ScalarPushInput<int16_t> ch9{this, "CAR_ADC_CH9", "", ""};
// 10, 11, 12 handled elsewhere
ctk::ScalarPushInput<int16_t> ch12{this, "CAR_ADC_CH12", "", ""};
ctk::ScalarPushInput<int16_t> ch13{this, "CAR_ADC_CH13", "", ""};
ctk::ScalarPushInput<int16_t> ch14{this, "CAR_ADC_CH14", "", ""};
ctk::ScalarPushInput<int16_t> ch15{this, "CAR_ADC_CH15", "", ""};
} input{this, "DeviceIn", "ADC input channels", ctk::HierarchyModifier::hideThis, {"ULOG_IN"}};
...
void mainLoop() {
input.readAll();
...
}
This is a child task of #10. The idea is to provide a first rather simple exception handling to investigate how to develop the idea further. The implementation should look as follows:
Add two error state variables:
to DeviceModule. Place them together in a VariableGroup called "DeviceError", but do not list this group in the subModules map, so it doesn't get automatically connected with connectTo(). Instead one needs to connect this explicitly to the control system.
Add a thread safe function to the DeviceModule which sets the error state called "reportException". This requires an internal thread of the DeviceModule which fills these variables. To communicate with this thread, we can directly use cppext::future_queue, as it is multi producer (in contrast to the ProcessArray).
reportException() shall block until the error state has been resolved. This means the internal thread periodically tries to reopen the device. Once this is successfully done, all threads currently blocking reportException() (keep in mind this can be many!) should be unblocked. This can be realised through a std::condition_variable.
After reportException() returns, the caller must retry the last read or write operation (since it has failed). This should be noted in the documentation.
Add test cases for this implementation:
Also add documentation in form of Doxygen descriptions for the new function and the VariableGroup, as well as a Doxygen page about exception handling explaining this functionality for users (i.e. Application developers).
Please put the ticket for review by @mhier when done.
When directly connecting a device with just a plain list of registers (test.map for instance with /REG1, /REG2 etc.) they show up as /REG1/REG1, /REG2/REG2 etc. I was expecting just /REG1 etc.
When there is a hierarchy in the map file (test2.map with /MyModule/actuator) no additional hierarchy is created.
Could this be converted to something like a warning:
Currently the https://github.com/ChimeraTK/ServerMockup is not working anymore because of this assertion. I just tested what happens when removing this assertion. The ServerMockup runs fine again and all the ApplicationCore test succeed.
It would be very inconvenient (if possible at all) to reproduce the variable tree using ModuleGroups and VariableGroups.
There is no test for the propagation of the VersionNumber through ApplicationModules and FanOuts etc.
This should be an independent project (i.e. create new repository on github). This very simple application just has one DeviceModule opening a single device. This DeviceModule is then connected to a ControlSystemModule with connectTo(). Since many devices have only poll-type variables, there should also be a configurable trigger. Use a PeriodicTrigger module for this purpose and specify its tick variable in the connectTo() call as second argument. Both the tick and the period of the trigger should be visible in the control system inside a directory "/trigger".
The idea of this application is of course to use it in combination with the LogicalNameMapping backend, so the registers which should be visible in the control system can be selected and properly named. Also register plugins can be used e.g. to convert raw values to physical units.
The application can also double as a starting point for writing new, more complex applications. It is therefore useful to document the code well and use a structure of the project which is suitable for extension, i.e. put a header file into the include directory and the source file into the src directory and follow best practises in CMakeLists.txt. The project should link against a user-selected control system adapter, refer to the LLRF server for an example how to do that (maybe: put that code into a cmake module in project-template and use that module).
DoD:
With improved warning flags there are some "shadow" warnings. These should be removed to get rid of the compiler warnings.
This is a child task of #10 and follows up on task #12. Can be done in parallel with #26.
Current situation:
runtime_error exceptions coming from devices are now caught and made visible to the control system. ApplicationModules which read those variables will be blocked until the exception state is resolved. TriggerFanOuts read multiple poll-type device variables when a trigger arrives. Those variables might come from different devices. If one of the devices fails, currently the entire TriggerFanOuts will block until the device is recovered.
Task:
Change the connection making code (in class Application) so it creates a separate TriggerFanOut for each device. This makes not only sure that faulty devices will not block readout of other devices, but also will parallelise the readout of multiple devices, which is a performance improvement (completely disconnected from exception handling).
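The core of this change is grouping the triggered poll-type variables by their device before the fan-outs are created. A simplified sketch of that grouping step follows; `PolledVariable` and `groupByDevice` are hypothetical names, and the creation of the actual TriggerFanOut objects is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a poll-type device variable read on a trigger.
struct PolledVariable {
  std::string deviceAlias;
  std::string registerName;
};

// Groups the register names by device. Each resulting group would get its
// own TriggerFanOut, so a failing device blocks only its own group.
inline std::map<std::string, std::vector<std::string>> groupByDevice(
    const std::vector<PolledVariable>& variables) {
  std::map<std::string, std::vector<std::string>> groups;
  for(const auto& variable : variables) {
    groups[variable.deviceAlias].push_back(variable.registerName);
  }
  return groups;
}
```

One fan-out per group also means the devices are read out in separate threads, which gives the parallelisation mentioned above.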
DoD:
Due to spurious wakeups, condition_variable::wait must always be used with a boolean predicate. In the DeviceModule the mechanism just uses wait() without checking a condition.
Introduce a flag which indicates the error status and use it together with the condition variable.
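A minimal sketch of the flag plus condition variable combination, using the predicate overload of `std::condition_variable::wait`, could look like this. The class `ErrorGate` and its member names are hypothetical and only illustrate the pattern, not the DeviceModule code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical helper: threads wait here until the error state is cleared.
class ErrorGate {
 public:
  void setError() {
    std::lock_guard<std::mutex> lock(mutex_);
    hasError_ = true;
  }

  void clearError() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      hasError_ = false;
    }
    // Wake up all waiting threads (there may be many).
    condition_.notify_all();
  }

  // Blocks while the error flag is set. The predicate is re-checked after
  // every wakeup, so spurious wakeups do not let a thread continue early.
  void waitUntilResolved() {
    std::unique_lock<std::mutex> lock(mutex_);
    condition_.wait(lock, [this] { return !hasError_; });
  }

  bool hasError() {
    std::lock_guard<std::mutex> lock(mutex_);
    return hasError_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable condition_;
  bool hasError_{false};
};
```

The flag is only read and written while holding the mutex, which also prevents a lost wakeup if the error is cleared just before a thread starts waiting.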