fairrootgroup / dds Goto Github PK

The Dynamic Deployment System

Home Page: http://dds.gsi.de

License: GNU Lesser General Public License v3.0

CMake 4.89% C++ 90.31% Shell 3.14% Python 1.64% C 0.02%

dds's People

andreylebedev anarmanafov mohammadalturany flaviofalcao ktf cbm-fles dberzano dennisklein heistera rbx christiantackegsi davidrohr arakotoz

dds's Issues

dds-topology: correct hash calculation in case of a hash collision

Calculate a different hash for task in case of collision. For example add 1.
Store a list of collided hashes.
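
One way to read this proposal is sketched below. It is a minimal illustration, not the dds-topology implementation: the class `TaskHashRegistry`, its method names and the use of `std::hash` are all assumptions made for the example. On a collision the hash is incremented by 1 until a free value is found, and the original colliding hash is stored in a list.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry that maps a 64-bit hash to the task path owning it.
class TaskHashRegistry
{
  public:
    // Hash the path and register it (std::hash is an assumption of this sketch).
    uint64_t assign(const std::string& path)
    {
        return assign(path, std::hash<std::string>{}(path));
    }

    // Register the path starting from an explicit hash value.
    // On collision with a different path, try hash + 1, hash + 2, ...
    uint64_t assign(const std::string& path, uint64_t hash)
    {
        const uint64_t original = hash;
        for (;; ++hash)
        {
            auto it = m_owners.find(hash);
            if (it == m_owners.end())
            {
                m_owners.emplace(hash, path);
                if (hash != original)
                    m_collided.push_back(original); // remember the collided hash
                return hash;
            }
            if (it->second == path)
                return hash; // this path is already registered
        }
    }

    // Hashes for which a collision was detected.
    const std::vector<uint64_t>& collided() const { return m_collided; }

  private:
    std::unordered_map<uint64_t, std::string> m_owners;
    std::vector<uint64_t> m_collided;
};
```

Keeping the list of collided hashes makes it possible to detect later that a looked-up hash may belong to a shifted entry.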

Users' tasks stdout and stderr

Implement the possibility to store in files stdout and stderr of users' tasks.

Handshake check in ConnectionImpl

At the moment each channel checks handshake confirmations on its own. Implement a special flag at the level of ConnectionImpl, so that no pushed message is processed until the handshake is confirmed.

SQLite backend

DDS should store the active topology, deployment info and properties info in SQLite tables. This will make it possible to implement different kinds of queries to request different kinds of info. For example, a list of active agents sorted by task or task list, a list of agents and user tasks which depend on a given property, and so on.

This will also make it possible to have multiple commanders working simultaneously, or to let one commander take over control after the first one has died.

Different dds commands should also benefit from having an SQL backend.

dds-info: Add taskId to console output

Add taskId to console output.

-------------->>> 52043878-09d8-4b55-b345-32e819a7b417
Host Info: [email protected]:/Users/andrey/tmp/dds_wn_test/wn1_37/
Agent pid: 18308
Agent UI port: 51975
Agent startup time: 9 sec.
TaskId: 39478837440043409

dds-info: get properties and values

Extend dds-info command with a possibility to list available properties and values of the current deployment.

to list properties, call something like
dds-info --prop-list

to list values of a given property
dds-info --prop-values

Broadcast a property deletion when corresponding task exits

This task is blocked by (linked with) GH-27

Optimize bitwise operations in channels.

Implement _normolizeRead(Write) and other protocol bitwise operations as templates to get rid of type dependent calls.
It will significantly simplify the code of channels.
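
As an illustration of the template approach, the sketch below assumes that the read/write normalization converts integers between host representation and a fixed big-endian wire order. That assumption, and the names `normalizeWrite` and `normalizeRead`, are hypothetical and not taken from the DDS protocol library. One template then covers every unsigned integer width.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Write an unsigned integer into buf in big-endian (network) byte order.
// buf must provide at least sizeof(T) bytes.
template <typename T>
void normalizeWrite(T value, uint8_t* buf)
{
    static_assert(std::is_unsigned<T>::value, "unsigned integral type expected");
    for (std::size_t i = 0; i < sizeof(T); ++i)
        buf[i] = static_cast<uint8_t>(value >> (8 * (sizeof(T) - 1 - i)));
}

// Read an unsigned integer stored in big-endian byte order from buf.
template <typename T>
T normalizeRead(const uint8_t* buf)
{
    static_assert(std::is_unsigned<T>::value, "unsigned integral type expected");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | buf[i]);
    return value;
}
```

Because the shifts work on values rather than on memory layout, the same code behaves identically on little-endian and big-endian hosts.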

replace custom buffer in the protocol lib with std::array and boost::asio::buffer

comments in topology files

Support comments in topology files.

At the moment, if a comment is added, for example:

<main id="main">
    <task>task1</task>
    <!--task>task2</task-->
</main>

the topology parser returns the error:

dds-submit: error: Server reports: Initialization of Main failed with the following errorUnable to initialize task group main error: Topology element with name <xmlcomment> does not exist.

Implement shared memory storage for key-value.

Put and get key-values using shared memory.
The key-value library should not access and lock a file each time it reads or writes a property.
This implementation will be significantly faster compared to file storage.
Implementation can be based on boost::interprocess library.

Archive log files before sending them to dds-commander.

log file rotation over time

In addition to file size implement log file rotation which is based on time.

Doc on the SSH plug-in

Add a chapter to DDS user manual on how to use DDS SSH plug-in.

dds-topology: add stop tasks command

Add command dds-topology --stop that shuts down the tasks.

detailed error message when topology is invalid

Currently if a given topology file is invalid, the dds-submit returns:

dds-submit: error: Server reports: XML validation failed with the following error: XML file is not valid.

This error message gives basically no information on what exactly is wrong with the topology. Implement a more detailed error report, which will give hints to users on what is wrong.

Commander and agent die after idle time

Commander and agent die after idle time even if user processes are still running.

implement env. files for user tasks

When users specify env files in the topology for their tasks, the DDS agent should execute the tasks via "/bin/sh -c ; <user.exe file>".
When no env file is specified, user.exe should be executed directly.

Env files are defined on a per-task basis.

DDS Log: Additional debug levels

Introduce multiple debug levels in the DDS log. Debug output can be split into several levels, such as debug_protocol_low, debug_protocol_messages, debug_protocol_events, and so on.

Improvement of SSH scheduler

First assign the tasks which have a requirement, then assign the rest of the tasks.
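
A minimal sketch of this ordering is shown below. The `Task` struct and the function name are hypothetical; the only idea taken from the issue is that tasks with a requirement are placed before tasks without one.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical task description: an empty requirement means "no requirement".
struct Task
{
    std::string id;
    std::string requirement;
};

// Reorder tasks so that those with a requirement come first.
// std::stable_partition keeps the relative order inside both groups.
void orderForScheduling(std::vector<Task>& tasks)
{
    std::stable_partition(tasks.begin(),
                          tasks.end(),
                          [](const Task& t) { return !t.requirement.empty(); });
}
```

Scheduling constrained tasks first reduces the chance that an unconstrained task occupies the only host that satisfies another task's requirement.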

split agents activation from the dds-submit command

Split agent activation from the dds-submit command.
Logically, submitting agents and topology-related actions such as activation have little to do with each other. We therefore need to extract the agent activation actions from the dds-submit command, which will then be used only to submit agents.
Please create a new command, dds-topology, which can be used, for example, to activate a specific part of a topology: task(s), group(s), etc.

key-value propagation API lib

implement a key-value propagation API library

Refactor pushMsg interface for commands with attachment.

In order to send an SSimpleMsgCmd which has an attachment, one has to write 5 lines of code:
SSimpleMsgCmd cmd;
cmd.m_sMsg = "message";
CProtocolMessage pm;
pm.encodeWithAttachment<cmdGET_LOG_ERROR>(cmd);
pushMsg(pm);

It would be better to write something like this:
SSimpleMsgCmd cmd("message");
pushMsg<cmdGET_LOG_ERROR, SSimpleMsgCmd>(cmd);

Or even shorter style:
pushMsg<cmdGET_LOG_ERROR, SSimpleMsgCmd>(SSimpleMsgCmd("message"));
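
The proposed interface could be sketched as a thin template overload on top of the existing call. Everything below is a stand-in written for this example: the type names follow the issue text, but their bodies, the `Channel` class and the `m_queue` member are assumptions and not the real DDS code.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the DDS protocol types.
enum ECmdType : uint16_t
{
    cmdGET_LOG_ERROR = 1
};

struct SSimpleMsgCmd
{
    explicit SSimpleMsgCmd(std::string msg) : m_sMsg(std::move(msg)) {}
    std::string m_sMsg;
};

struct CProtocolMessage
{
    // Stand-in encoder: stores the command id and copies the text as the body.
    template <ECmdType Cmd, typename Attachment>
    void encodeWithAttachment(const Attachment& attachment)
    {
        m_cmd = Cmd;
        m_body.assign(attachment.m_sMsg.begin(), attachment.m_sMsg.end());
    }

    ECmdType m_cmd{};
    std::vector<char> m_body;
};

// Hypothetical channel with both the existing and the proposed overload.
class Channel
{
  public:
    // Existing style: the caller encodes the message first.
    void pushMsg(const CProtocolMessage& msg) { m_queue.push_back(msg); }

    // Proposed style: encode and push in a single call.
    template <ECmdType Cmd, typename Attachment>
    void pushMsg(const Attachment& attachment)
    {
        CProtocolMessage msg;
        msg.encodeWithAttachment<Cmd>(attachment);
        pushMsg(msg);
    }

    std::vector<CProtocolMessage> m_queue; // stand-in for the write queue
};
```

With such an overload, the short form from the issue, `pushMsg<cmdGET_LOG_ERROR, SSimpleMsgCmd>(SSimpleMsgCmd("message"))`, compiles, and the five-line form keeps working unchanged.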

Create dds-agent-cmd command

Create a new command, dds-agent-cmd, which will be used to send different commands to an agent or a set of agents, such as getlog, restart, delete, rebase.
The dds-getlog command can therefore simply become a part of dds-agent-cmd.

Decode message only once. Check that command and message are consistent in message decoding.

Decode each message only once. At the moment each callback function receives the message and decodes it. Therefore, if a message has more than one handler, it is decoded several times. This can be a performance issue for large messages.

Command and message consistency.
Example:
SBinaryAttachmentCmd received_cmd;
received_cmd.convertFromData(_msg.bodyToContainer());

The received message and the data structure have to be consistent.

Add option to check WN binaries only for the current system.

"dds-server start" checks whether WN binaries are available and, if not, tries to download them from the web. If the web site is not available, this prevents dds-server from starting. We need to add an option which requires only the WN binary for the current system to be present. In this case one can compile the WN binary manually.

Change status of agents according to their state

Statuses can be something like the following:

running (user task is being executed),
idle (no tasks are assigned),
done (last task is done) and so on

Build WN packages without ICU support

In order to drop ICU support in worker packages we need to build boost without ICU.
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/install.html

dds-user-defaults does not create default configuration files.

To reproduce this error delete ~/.DDS/DDS.cfg and run "dds-user-defaults -d -c ~/.DDS/DDS.cfg"

communication channels must be accessed only via weak_ptr

The array of agents in the communication manager is stored as an array of shared_ptr. The array should be open for direct access only when a new agent is added or when an agent is removed. Absolutely all other operations on agents must go through weak_ptr, in order to prevent edge-case errors where an agent has been removed but some routines still try to write to it or read from it.
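
The access pattern described here can be sketched as follows. The classes `AgentChannel` and `ChannelRegistry` and the helper `trySend` are hypothetical names for this example and do not reflect the actual communication manager.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical agent channel.
struct AgentChannel
{
    explicit AgentChannel(std::string id) : m_id(std::move(id)) {}
    void send() { ++m_sent; }

    std::string m_id;
    int m_sent = 0;
};

// Owns the channels; only add() and remove() touch the shared_ptr array.
class ChannelRegistry
{
  public:
    using WeakPtr = std::weak_ptr<AgentChannel>;

    WeakPtr add(const std::string& id)
    {
        m_channels.push_back(std::make_shared<AgentChannel>(id));
        return m_channels.back();
    }

    void remove(const std::string& id)
    {
        m_channels.erase(std::remove_if(m_channels.begin(),
                                        m_channels.end(),
                                        [&id](const std::shared_ptr<AgentChannel>& c) { return c->m_id == id; }),
                         m_channels.end());
    }

    // All other code gets non-owning handles.
    std::vector<WeakPtr> snapshot() const
    {
        return std::vector<WeakPtr>(m_channels.begin(), m_channels.end());
    }

  private:
    std::vector<std::shared_ptr<AgentChannel>> m_channels;
};

// Returns false instead of touching a channel that was already removed.
inline bool trySend(const ChannelRegistry::WeakPtr& weak)
{
    if (auto channel = weak.lock())
    {
        channel->send();
        return true;
    }
    return false;
}
```

A routine holding a `weak_ptr` must call `lock()` before each use, so a removed agent is detected at the call site instead of being written to.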

Add possibility to send a bunch of commands

Add possibility to send a bunch of commands.
This can be implemented at the level of the ConnectionImpl class. We can keep the same pushMsg interface, but instead of immediately sending one command we should collect a bunch of commands within a certain time limit and then send them all as one. The pushMsg variant which sends the command immediately also has to be kept.
Each channel (ConnectionImpl instance) has to decide itself when to send a command.

dds-topology: Add availableOnWorker flag for tasks.

Implementation of callbacks (signals) in BaseChannelImpl

Implement callbacks (signals) in BaseChannelImpl for different events like connect, disconnect, handshake, handshake errors.

key-value: Implement serialisation to file and to socket

The key-value exchange between an agent and a user task is based on shared memory. However, other serialisation possibilities, such as file and socket serialisation, will probably be needed. The key-value library can decide which serialisation to use based on the user settings or on the environment.

extend ssh plug-in to support multiple agents per host

A number of agents per host is defined as the last parameter of a host definition in the ssh plug-in configuration file.

Add SimpleMsg attachment for cmdREPLY_HANDSHAKE_ERR

New functional tests

Implement new functional tests which emulate the behaviour of the FLP-EPN example from AliceO2.

Scheduler unit tests

Implement unit tests for scheduler to test performance.

key-value user API: implement a user function callback on properties update

dds-topology --activate crashes

If "dds-topology --activate" is called directly after "dds-server start -s", then the DDS commander crashes.

doc on the DDS topology

Add a chapter to DDS user manual about DDS topology with examples.

Optimization of the SSH scheduler algorithm

Show progress for "dds-agent-cmd getlog" and "dds-topology --activate" in percent.

Move test command from dds-commander to dedicated dds-test executable.

Startup time of agents

Add a possibility to use one agent for multiple user tasks.

At the moment DDS manages strictly one agent per user process. In order to manage hundreds of thousands of user tasks we might need a possibility to use, for example, one agent per physical host, managing multiple user tasks.

Add property propagation types

In order to optimize property propagation on big topologies, we should add a property propagation type attribute to properties assigned to tasks.
The attribute can have 3 values:

read (task only reads an assigned property)
write (task only writes an assigned property)
not defined or read/write

When a task only writes a property, it won't receive any update if another task updates the same property.
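
The filtering rule can be sketched as below. The enum, the `TaskProperty` struct and the function `updateReceivers` are hypothetical names for this example. The sketch also assumes that the task which wrote the value does not need its own update, which the issue does not state.

```cpp
#include <string>
#include <vector>

// Propagation type of a property assigned to a task.
enum class EPropertyAccess
{
    Read,     // task only reads the property
    Write,    // task only writes the property
    ReadWrite // not defined, or read/write
};

// Hypothetical binding of a property to a task.
struct TaskProperty
{
    std::string taskId;
    std::string propertyName;
    EPropertyAccess access;
};

// Collect the tasks which should be notified when writerTaskId updates a property.
std::vector<std::string> updateReceivers(const std::vector<TaskProperty>& bindings,
                                         const std::string& property,
                                         const std::string& writerTaskId)
{
    std::vector<std::string> receivers;
    for (const auto& b : bindings)
    {
        if (b.propertyName != property)
            continue; // a different property
        if (b.taskId == writerTaskId)
            continue; // assumption: the writer needs no echo of its own update
        if (b.access == EPropertyAccess::Write)
            continue; // write-only tasks receive no updates
        receivers.push_back(b.taskId);
    }
    return receivers;
}
```

On a large topology where most tasks only write a given property, this check removes the corresponding update messages entirely.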