fairrootgroup / dds
The Dynamic Deployment System
Home Page: http://dds.gsi.de
License: GNU Lesser General Public License v3.0
Implement the possibility to store in files stdout and stderr of users' tasks.
At the moment each channel checks handshake confirmations on its own. Implement a special flag at the level of ConnectionImpl, so that no push message is processed until the handshake has been confirmed.
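A minimal sketch of such a gate, assuming a much simplified channel. ConnectionSketch, Kind and confirmHandshake are hypothetical names used for illustration only; they are not the DDS classes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a channel; not the real ConnectionImpl.
class ConnectionSketch {
public:
    enum class Kind { Handshake, Regular };

    // Returns true if the message was queued, false if the gate rejected it.
    bool pushMsg(Kind kind, const std::string& body) {
        // Only handshake traffic may pass before the handshake is confirmed.
        if (kind != Kind::Handshake && !m_handshakeOK)
            return false;
        m_queue.push_back(body);
        return true;
    }

    void confirmHandshake() { m_handshakeOK = true; }
    std::size_t queued() const { return m_queue.size(); }

private:
    bool m_handshakeOK = false;      // the single flag shared by all senders
    std::deque<std::string> m_queue; // messages waiting to be written
};
```

With the check in one place, individual channels would no longer need their own handshake tests.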
DDS should store the active topology, deployment info and property info in SQLite tables. This will make it possible to implement different kinds of queries to request different kinds of info. For example, a list of active agents sorted by task or a list of tasks, a list of agents and user tasks which depend on a given property, and so on.
This will also make it possible to have multiple commanders working simultaneously, or to let one commander take over control after the first one has died.
Different dds commands should also benefit from having an SQL backend.
Add taskId to console output.
-------------->>> 52043878-09d8-4b55-b345-32e819a7b417
Host Info: [email protected]:/Users/andrey/tmp/dds_wn_test/wn1_37/
Agent pid: 18308
Agent UI port: 51975
Agent startup time: 9 sec.
TaskId: 39478837440043409
Extend dds-info command with a possibility to list available properties and values of the current deployment.
To list properties, call something like:
dds-info --prop-list
To list the values of a given property:
dds-info --prop-values
This task is blocked by (linked with) GH-27.
Implement _normolizeRead(Write) and other protocol bitwise operations as templates to get rid of type-dependent calls.
This will significantly simplify the channel code.
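A sketch of what the templated helpers could look like. It assumes the normalize functions convert between host and network (big-endian) byte order, which is an assumption about the existing code; byteSwap, hostIsLittleEndian, normalizeWrite and normalizeRead are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reverse the bytes of any integral value.
template <typename T>
T byteSwap(T value) {
    static_assert(std::is_integral<T>::value, "integral types only");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    std::reverse(bytes, bytes + sizeof(T));
    std::memcpy(&value, bytes, sizeof(T));
    return value;
}

// True when the host stores the least significant byte first.
inline bool hostIsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Host order -> network order. One template replaces the per-type calls.
template <typename T>
T normalizeWrite(T value) {
    return hostIsLittleEndian() ? byteSwap(value) : value;
}

// Network order -> host order. The conversion is its own inverse.
template <typename T>
T normalizeRead(T value) {
    return normalizeWrite(value);
}
```

A channel could then call normalizeWrite(field) for every integral field without selecting a 16, 32 or 64 bit variant by hand.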
Support comments in topology files.
At the moment, if a comment is added, for example:
<main id="main">
<task>task1</task>
<!--task>task2</task-->
</main>
the topology parser returns the error:
dds-submit: error: Server reports: Initialization of Main failed with the following errorUnable to initialize task group main error: Topology element with name <xmlcomment> does not exist.
Put and get key-values using shared memory.
The key-value library should not access and lock a file each time it reads or writes a property.
This implementation will be significantly faster compared to file storage.
Implementation can be based on boost::interprocess library.
In addition to rotation based on file size, implement log file rotation based on time.
Add a chapter to DDS user manual on how to use DDS SSH plug-in.
Add command dds-topology --stop that shuts down the tasks.
Currently, if a given topology file is invalid, dds-submit returns:
dds-submit: error: Server reports: XML validation failed with the following error: XML file is not valid.
This error message gives basically no information on what exactly is wrong with the topology. Implement a more detailed error report, which will give hints to users on what is wrong.
Commander and agent die after idle time even if user processes are still running.
For cases when users specify env files in topos for their tasks, DDS agent should execute tasks via "/bin/sh -c ; <user.exe file>".
When no env file is specified, then user.exe should be executed directly.
Env files are defined on a per-task basis.
Introduce multiple debug levels in the DDS log. The debug level can be split into several levels, like debug_protocol_low, debug_protocol_messages, debug_protocol_events, and so on.
First assign tasks which have a requirement, then assign the rest of the tasks.
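One way to get this ordering is a stable partition of the task list, sketched below. TaskSketch and orderForScheduling are hypothetical names, and a requirement is reduced to a plain string for the sake of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical task record; an empty requirement means "runs anywhere".
struct TaskSketch {
    std::string name;
    std::string requirement;
};

// Moves tasks with a requirement to the front, keeping the relative order
// inside each group. Returns the index of the first unconstrained task.
std::size_t orderForScheduling(std::vector<TaskSketch>& tasks) {
    auto firstFree = std::stable_partition(
        tasks.begin(), tasks.end(),
        [](const TaskSketch& t) { return !t.requirement.empty(); });
    return static_cast<std::size_t>(firstFree - tasks.begin());
}
```

The scheduler would then walk the vector front to back, so constrained tasks pick their agents before unconstrained tasks can take them.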
Split agent activation from the dds-submit command.
Logically, submitting agents and topology-related actions such as activation have little to do with each other. We therefore need to extract the agent activation actions from the dds-submit command, which will then be used only to submit agents.
Please create a new command, dds-topology, which can be used, for example, to activate a specific part of the topology: task(s), group(s), etc.
Implement a key-value propagation API library.
In order to send an SSimpleMsgCmd which has an attachment, one has to write 5 lines of code:
SSimpleMsgCmd cmd;
cmd.m_sMsg = "message";
CProtocolMessage pm;
pm.encodeWithAttachment<cmdGET_LOG_ERROR>(cmd);
pushMsg(pm);
It would be better to write something like this:
SSimpleMsgCmd cmd("message");
pushMsg<cmdGET_LOG_ERROR, SSimpleMsgCmd>(cmd);
Or even shorter style:
pushMsg<cmdGET_LOG_ERROR, SSimpleMsgCmd>(SSimpleMsgCmd("message"));
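A self-contained sketch of how such a pushMsg overload could be written. All types below (ECmdSketch, SimpleMsgSketch, ProtocolMessageSketch, ChannelSketch) are hypothetical stand-ins so that the example compiles on its own; the real DDS classes have different internals.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical command id, standing in for values such as cmdGET_LOG_ERROR.
enum ECmdSketch { cmdSIMPLE_MSG_SKETCH = 1 };

// Hypothetical attachment with a converting constructor.
struct SimpleMsgSketch {
    explicit SimpleMsgSketch(std::string msg) : m_sMsg(std::move(msg)) {}
    std::string m_sMsg;
};

// Hypothetical protocol message; "encoding" just copies the text here.
struct ProtocolMessageSketch {
    int m_cmd = 0;
    std::string m_body;

    template <ECmdSketch Cmd, class Attachment>
    void encodeWithAttachment(const Attachment& a) {
        m_cmd = Cmd;
        m_body = a.m_sMsg;
    }
};

class ChannelSketch {
public:
    // Existing style: the caller encodes the message first.
    void pushMsg(const ProtocolMessageSketch& pm) { m_sent.push_back(pm); }

    // Proposed style: encode and push in a single call.
    template <ECmdSketch Cmd, class Attachment>
    void pushMsg(const Attachment& attachment) {
        ProtocolMessageSketch pm;
        pm.encodeWithAttachment<Cmd, Attachment>(attachment);
        pushMsg(pm);
    }

    std::vector<ProtocolMessageSketch> m_sent; // stands in for the write queue
};
```

Because the command id is a non-type template parameter, the templated overload is chosen only when the id is spelled out, so the existing pushMsg(pm) calls keep working.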
Create a new command, dds-agent-cmd, which will be used to send different commands to an agent or a set of agents, such as getlog, restart, delete, rebase.
The dds-getlog command can therefore simply become a part of dds-agent-cmd.
Decode each message only once. At the moment each callback function receives the message and decodes it. Therefore, if a message has more than one handler, it is decoded several times. This can be a performance issue for large messages.
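A sketch of the idea: the dispatcher decodes the raw bytes once and hands the decoded object to every handler. DispatcherSketch and DecodedSketch are hypothetical names, and the "decoding" is a plain byte-to-string copy.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoded form of a message.
struct DecodedSketch {
    std::string text;
};

class DispatcherSketch {
public:
    using Handler = std::function<void(const DecodedSketch&)>;

    void addHandler(Handler h) { m_handlers.push_back(std::move(h)); }

    // Decode once, then pass the same decoded object to all handlers.
    void dispatch(const std::vector<unsigned char>& raw) {
        const DecodedSketch decoded = decode(raw);
        for (const auto& h : m_handlers)
            h(decoded);
    }

    int decodeCount() const { return m_decodeCount; }

private:
    DecodedSketch decode(const std::vector<unsigned char>& raw) {
        ++m_decodeCount; // counted only to make the single decode observable
        return DecodedSketch{std::string(raw.begin(), raw.end())};
    }

    std::vector<Handler> m_handlers;
    int m_decodeCount = 0;
};
```

Handlers receive a const reference, so the cost of decoding no longer grows with the number of handlers.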
Command and message consistency.
Example:
SBinaryAttachmentCmd recieved_cmd;
recieved_cmd.convertFromData(_msg.bodyToContainer());
The received message and the data structure have to be consistent.
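One possible way to enforce this is a traits table which ties every command id to exactly one data structure, so the structure is no longer chosen by hand. The sketch below uses hypothetical names throughout (CmdIdSketch, CmdTraitsSketch, RawMsgSketch, decodeAs) and a trivial string body.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class CmdIdSketch { SimpleMsg, BinaryAttachment };

struct SimpleMsgCmdSketch {
    std::string msg;
    void convertFromData(const std::string& data) { msg = data; }
};

struct BinaryAttachmentCmdSketch {
    std::string fileName;
    void convertFromData(const std::string& data) { fileName = data; }
};

// Primary template is left undefined: an unmapped id fails to compile.
template <CmdIdSketch Id> struct CmdTraitsSketch;
template <> struct CmdTraitsSketch<CmdIdSketch::SimpleMsg> {
    using type = SimpleMsgCmdSketch;
};
template <> struct CmdTraitsSketch<CmdIdSketch::BinaryAttachment> {
    using type = BinaryAttachmentCmdSketch;
};

// Hypothetical raw message: command id plus undecoded body.
struct RawMsgSketch {
    CmdIdSketch id;
    std::string body;
};

// The return type follows from the command id; a mismatch between the
// handler's id and the message's id is reported at run time.
template <CmdIdSketch Id>
typename CmdTraitsSketch<Id>::type decodeAs(const RawMsgSketch& msg) {
    if (msg.id != Id)
        throw std::invalid_argument("command id does not match the handler");
    typename CmdTraitsSketch<Id>::type cmd;
    cmd.convertFromData(msg.body);
    return cmd;
}
```

A handler would then write decodeAs<CmdIdSketch::BinaryAttachment>(msg) and could not pair the message with the wrong structure by accident.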
"dds-server start" checks whether WN binaries are available and, if not, tries to download them from the web. If the web site is not available, this prevents dds-server from starting. We need to add an option which requires only the WN binary for the current system to be present. In this case one can compile the WN binary manually.
Statuses can be something like the following:
In order to drop ICU support in worker packages we need to build boost without ICU.
http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/install.html
To reproduce this error delete ~/.DDS/DDS.cfg and run "dds-user-defaults -d -c ~/.DDS/DDS.cfg"
The array of agents in the communication manager is stored as an array of shared_ptr. The array should be open for direct access only when a new agent is added or when an agent is removed. Absolutely all other operations on and accesses to agents must go through weak_ptr, in order to prevent edge-case errors where an agent has been removed but some routines still try to write to it or read from it.
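A sketch of this ownership rule. AgentSketch, AgentRegistrySketch and sendTo are hypothetical names; locking is left out to keep the example short, although a real communication manager would have to guard the array against concurrent access.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical agent channel.
struct AgentSketch {
    std::string lastMsg;
};

class AgentRegistrySketch {
public:
    // The registry is the only owner; callers get a weak handle.
    std::weak_ptr<AgentSketch> add() {
        auto agent = std::make_shared<AgentSketch>();
        m_agents.push_back(agent);
        return agent;
    }

    // Removing the agent drops the only strong reference held long term.
    void remove(const std::weak_ptr<AgentSketch>& handle) {
        auto agent = handle.lock();
        if (!agent)
            return;
        m_agents.erase(std::remove(m_agents.begin(), m_agents.end(), agent),
                       m_agents.end());
    }

private:
    std::vector<std::shared_ptr<AgentSketch>> m_agents;
};

// Every other routine checks the handle before touching the agent.
bool sendTo(const std::weak_ptr<AgentSketch>& handle, const std::string& msg) {
    if (auto agent = handle.lock()) {
        agent->lastMsg = msg;
        return true;
    }
    return false; // the agent was removed in the meantime
}
```

A routine that holds a stale handle gets false back instead of writing into a destroyed agent.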
Add possibility to send a bunch of commands.
This can be implemented at the level of the ConnectionImpl class. We can keep the same pushMsg interface, but instead of sending one command immediately we should collect a bunch of commands within a certain time limit and then send them all as one. A pushMsg which sends the command immediately also has to be kept.
Each channel (ConnectionImpl instance) has to decide itself when to send a command.
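A sketch of a channel which offers both variants. BatchingChannelSketch and pushMsgBatched are hypothetical names; the current time is passed in explicitly to keep the example deterministic, whereas a real channel would more likely flush from a timer.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

class BatchingChannelSketch {
public:
    using Clock = std::chrono::steady_clock;

    explicit BatchingChannelSketch(std::chrono::milliseconds window)
        : m_window(window) {}

    // Immediate variant: the command goes out on its own.
    void pushMsg(const std::string& cmd) {
        m_wire.push_back(std::vector<std::string>(1, cmd));
    }

    // Deferred variant: collect commands and send them as one batch once
    // the window, counted from the first pending command, has elapsed.
    void pushMsgBatched(const std::string& cmd, Clock::time_point now) {
        if (m_pending.empty())
            m_firstQueued = now;
        m_pending.push_back(cmd);
        if (now - m_firstQueued >= m_window)
            flush();
    }

    void flush() {
        if (m_pending.empty())
            return;
        m_wire.push_back(m_pending);
        m_pending.clear();
    }

    // Each inner vector is one write to the socket.
    const std::vector<std::vector<std::string>>& wire() const { return m_wire; }

private:
    std::chrono::milliseconds m_window;
    Clock::time_point m_firstQueued{};
    std::vector<std::string> m_pending;
    std::vector<std::vector<std::string>> m_wire;
};
```

Keeping the window per instance matches the requirement that each channel decides for itself when to send.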
Implement callbacks (signals) in BaseChannelImpl for different events like connect, disconnect, handshake, handshake errors.
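A minimal sketch of such an event registry built on std::function. ChannelEventSketch, ChannelEventsSketch, subscribe and notify are hypothetical names; a signals library could be used instead.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

enum class ChannelEventSketch { Connected, Disconnected, HandshakeOK, HandshakeFailed };

class ChannelEventsSketch {
public:
    using Callback = std::function<void()>;

    // Register a callback for one event; several callbacks per event are allowed.
    void subscribe(ChannelEventSketch e, Callback cb) {
        m_callbacks[e].push_back(std::move(cb));
    }

    // Called by the channel when the event happens.
    void notify(ChannelEventSketch e) const {
        auto it = m_callbacks.find(e);
        if (it == m_callbacks.end())
            return; // nobody subscribed to this event
        for (const auto& cb : it->second)
            cb();
    }

private:
    std::map<ChannelEventSketch, std::vector<Callback>> m_callbacks;
};
```

Derived channels could then react to connect or handshake errors by subscribing, without overriding base class methods.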
Key-value exchange between an agent and a user task is based on shared memory. However, other serialisation options, such as files and sockets, will probably be needed. The key-value library can decide which serialisation to use based on the user settings or on the environment.
A number of agents per host is defined as the last parameter of a host definition in the ssh plug-in configuration file.
Implement a new functional test which emulates the behaviour of the FLP-EPN example from AliceO2.
Implement unit tests for the scheduler to test its performance.
If "dds-topology --activate" is called directly after "dds-server start -s", then the dds commander crashes.
Add a chapter to DDS user manual about DDS topology with examples.
At the moment DDS manages strictly one agent per user process. In order to manage hundreds of thousands of user tasks we might need a possibility to use, for example, one agent per physical host, managing multiple user tasks.
In order to optimize property propagation on big topologies, we should add a property propagation type attribute to properties assigned to tasks.
The attribute can have 3 statuses:
If a task only writes a property, it won't receive any update when another task updates the same property.
Implement key-value propagation support in the DDS protocol. This means the DDS commander should be able to receive key-value updates and propagate them to the agents which depend on those key-values.
During key-value tests the commander server exits after 1800 sec. It looks like the idle time is not updated in the DDS transport.
Implement requirements for tasks in the topology XML file. For example, the required memory or a pattern for the required host name.
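For the host name case, a requirement check could be as small as a regular expression match, sketched below. HostRequirementSketch and hostSatisfies are hypothetical names, and the use of std::regex with its default ECMAScript syntax is an assumption rather than a decision taken in this issue.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical requirement read from the topology file.
struct HostRequirementSketch {
    std::string hostPattern; // empty means "any host"
};

// True if the host name matches the whole pattern.
// std::regex throws std::regex_error for an invalid pattern.
bool hostSatisfies(const HostRequirementSketch& req, const std::string& hostName) {
    if (req.hostPattern.empty())
        return true;
    return std::regex_match(hostName, std::regex(req.hostPattern));
}
```

The scheduler would call this for each candidate agent before assigning a task which carries the requirement.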