
clustershell's People

Contributors

alexbozhenko, arkamar, bamb0u, bjornfor, degremont, dupgit, e4t, fihuer, getreu, griznog, haiwu, hawartens, hdoreau, johnnykeats, kcgthb, martinetd, mattaezell, oleholmnielsen, phantez, rezib, ryantig, thiell, thomasboni, volans-, xdelaruelle


clustershell's Issues

Fix ClusterShell fanout definition

Currently, '''set_info("fanout", f)''' sets the size of the Engine's execution window, counted in file descriptors. The user should not have to be aware of this internal number. What the user wants (I know the user well) is to set the number of concurrently running EngineClients.

In ClusterShell v1.0, an EngineClient consumes 1 FD (stdout).
In ClusterShell v1.1, an EngineClient consumes 2 FD (stdin and stdout).
In ClusterShell v1.2, an EngineClient consumes 2 or 3 FD (stdin, stdout and optionally stderr).

Fix the fanout definition: fanout is the maximum number of concurrently running EngineClients (not FDs).

Then also fix '''clush.py''' and remove the "fanout * 2".
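
A minimal sketch of the proposed definition, assuming a hypothetical `FanoutWindow` scheduler (the class and its method names are illustrative, not part of ClusterShell): the window counts running clients, whatever number of file descriptors each one consumes.

```python
from collections import deque


class FanoutWindow:
    """Hypothetical scheduler: fanout limits running clients, not FDs."""

    def __init__(self, fanout):
        self.fanout = fanout
        self.pending = deque()   # (client, fd_count) waiting for a free slot
        self.running = []        # (client, fd_count) currently running

    def add(self, client, fd_count):
        # fd_count is recorded only to show that it plays no role in the limit
        self.pending.append((client, fd_count))
        self._fill()

    def finish(self, client):
        self.running = [(c, n) for c, n in self.running if c != client]
        self._fill()

    def _fill(self):
        # Start pending clients until `fanout` clients are running
        while self.pending and len(self.running) < self.fanout:
            self.running.append(self.pending.popleft())

    def open_fds(self):
        return sum(n for _, n in self.running)
```

With this definition, a caller such as clush would pass the user's fanout value unchanged, since the number of FDs per client no longer matters.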

Allow disabling Task MsgTree buffering

For now, each [wiki:Task] centralizes its workers' buffers in a message tree that gathers lines. This is useful for message post-processing, etc.

However, we don't always want this internal buffering, especially for fully event-based applications. For example, a simple pdsh clone (without dshbak) won't need this feature, as it displays remote messages as it receives them.

Add a way to disable internal buffering/gathering at any time. In that case, MsgTree access methods should probably raise an exception when called.
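
As an illustration only, the behavior could look like the sketch below. `GatheringTask` and `BufferingDisabledError` are hypothetical names, not the ClusterShell API; the `buffering` attribute can be toggled at any time, and the accessor raises when buffering is off.

```python
class BufferingDisabledError(Exception):
    """Raised when buffers are requested while gathering is disabled."""


class GatheringTask:
    """Hypothetical stand-in for a Task; not the ClusterShell API."""

    def __init__(self, buffering=True):
        self.buffering = buffering
        self._messages = {}          # node -> list of received lines

    def on_message(self, node, line, handler=None):
        if handler is not None:
            handler(node, line)      # event-based path, always available
        if self.buffering:
            self._messages.setdefault(node, []).append(line)

    def node_buffer(self, node):
        if not self.buffering:
            raise BufferingDisabledError("message buffering is disabled")
        return "\n".join(self._messages.get(node, []))
```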

NodeGroups arithmetics

Based on the structure of the NodeSet class, add a class to deal with NodeGroups operations. This class will be able to analyse a string containing valid nodegroup(s) and nodeset(s).

clubak and MsgTree

Create a {{{clubak}}} script that works the same way as {{{dshbak}}}, for pdsh backward compatibility.

This will need to use the {{{MsgTree}}} class, so it could be interesting to slightly update this class for the ClusterShell v2 API.

We should be able to do:

{{{
tree = MsgTree()
for keys, buffers in tree:
    ...
}}}
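
To make the intent concrete, here is a small standalone sketch of dshbak-style gathering. It does not use the real {{{MsgTree}}} class; the `gather` function is a hypothetical helper that yields (nodes, buffer) pairs in the spirit of the loop above, assuming input lines of the form "node: message".

```python
from collections import defaultdict


def gather(lines):
    """Group 'node: message' lines by identical per-node output."""
    per_node = defaultdict(list)
    for line in lines:
        node, _, message = line.rstrip("\n").partition(": ")
        per_node[node].append(message)

    by_output = defaultdict(list)
    for node, messages in per_node.items():
        by_output["\n".join(messages)].append(node)

    # Yield (nodes, buffer) pairs, mirroring "for keys, buffers in tree"
    for buffer, nodes in by_output.items():
        yield sorted(nodes), buffer
```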

Gateways communication framework

Add a layout to ClusterShell in order to communicate with remote cluster gateways.

Connect and state exchange

This layout will allow the ClusterShell library to establish a connection to these remote nodes and then exchange state/informative data with them.

Scalable remote commands execution consideration

This layout will also be used as the communication layout for remote commands execution and output transfer through gateways as proposed in ticket #31.

Implementation

One proposal is to implement this framework using lightweight [http://docs.python.org/library/xml.sax.html Python SAX2 parsers].

Improve non-reg test coverage

Thanks to the Python Coverage tool, non-regression test coverage can be computed automatically:

{{{
$ coverage -x ./run_testsuite.py -v
$ coverage -r -o /usr/lib64/python2.4 | grep -v '100%'
}}}

Uncovered code can be checked with {{{coverage -a FILE}}}.

Tests should be improved to reach 100% (when possible).

Add epoll based Engine

Add a new engine implementation for use with Fedora 11: '''!ClusterShell.Engine.EPoll''', based on [http://docs.python.org/library/select.html?highlight=epoll#select.epoll select.epoll()] (new in Python 2.6).

Linux 2.6 offers the epoll facility (I/O event notification facility):

From http://linux.die.net/man/4/epoll:
{{{
epoll is a variant of poll(2) that can be used either as Edge
or Level Triggered interface and scales well to large numbers of
watched fds. Three system calls are provided to set up and control
an epoll set: epoll_create(2), epoll_ctl(2), epoll_wait(2).
}}}
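
For reference, a minimal sketch of the {{{select.epoll()}}} interface on Linux. The `wait_readable` helper is a hypothetical name used for illustration and is not the proposed Engine implementation.

```python
import os
import select


def wait_readable(fds, timeout=1.0):
    """Return the fds that are readable, using epoll (Linux only)."""
    ep = select.epoll()
    try:
        for fd in fds:
            ep.register(fd, select.EPOLLIN)
        return [fd for fd, events in ep.poll(timeout) if events & select.EPOLLIN]
    finally:
        ep.close()


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"x")
    print(wait_readable([read_fd]) == [read_fd])
    os.close(read_fd)
    os.close(write_fd)
```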

clush: wrong return code on node timeout with -S -u TIMEOUT

Fix clush return code on node timeout when using -S -u timeout, to behave like pdsh. For instance:

PDSH case:
{{{
$ pdsh -w fortoy3 -S -u 2 sleep 10
pdsh@fortoy0: fortoy3: command timeout
sending SIGTERM to ssh fortoy3 pid 1855
pdsh@fortoy0: fortoy3: ssh exited with exit code 255
$ echo $?
255
}}}

CLUSH case:
{{{
$ clush -w fortoy3 -S -u 2 sleep 10
clush: fortoy3: command timeout
$ echo $?
0
}}}

Update wiki pages for CS 1.1 release

When CS 1.1 is released, do not forget to update the wiki pages with the really nice new ClusterShell features:

  • clush page (a lot of new stuff)
  • nodeset page (using stdin examples...)
    Surely something interesting could be done with the reStructuredText pages from manpages, which are understood by Trac Wiki.
  • class diagram for 1.1 is outdated.
  • epydoc 1.0 could be kept, but must not be the first link available.

clush 1.1 tracking

== clush.py tracking ==

  • add write support to clush, e.g.:
    {{{
    echo "foobar" | clush -b -w nodes[1-20] cat
    }}}

  • add node group support to clush:
    {{{
    clush -a touch some_stuff
    clush -g clu_oss rm /some/stuff
    }}}
    This could be done by adding a configuration file for external command upcalls, for example /etc/clustershell/clush.ini (.conf?), a file in Python [http://docs.python.org/library/configparser.html ConfigParser] syntax (for info, !ConfigParser becomes configparser in Python v3 but is still supported):
    {{{
    #!python
    [NodeGroup]
    all:
    group: <some command containing "%(group)s" >
    }}}
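
A hedged sketch of how such a file could be read, written for the Python 3 {{{configparser}}} module. The `SAMPLE` contents and the `group_command` helper are assumptions made for illustration; the echo commands are placeholders, not real upcalls.

```python
import configparser

# Hypothetical configuration: the echo commands are placeholders only
SAMPLE = """
[NodeGroup]
all: echo node[1-4]
group: echo members-of-%(group)s
"""


def group_command(config_text, group=None):
    """Return the upcall command line for a group (or for all nodes)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if group is None:
        return parser.get("NodeGroup", "all", raw=True)
    # raw=True disables ConfigParser interpolation so that "%(group)s"
    # can be substituted explicitly with the requested group name
    template = parser.get("NodeGroup", "group", raw=True)
    return template % {"group": group}
```

Reading the value raw and substituting by hand avoids a clash between the option named "group" and the "%(group)s" placeholder during ConfigParser's own interpolation.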

Add select based Engine

Add new engine implementation based on [http://docs.python.org/library/select.html?highlight=select#select.select select.select()] for systems without poll().

Persistent gateway connections with clush-agent

Utility to avoid gateway reconnections when using clush in scalable mode. This agent is optional.
The idea of '''clush-agent''' is that it is started in a login session on the management node, so that subsequent clush instances start as clients of the clush-agent program. Through the use of environment variables, the agent can be located and automatically used to route clush requests.

This agent will be able to maintain one ssh connection to each selected gateway node, running there a clush in listen mode (clush -X).

'''Pros''': fastest execution (avoid ssh reconnection), still no daemon running on gateways!
'''Cons''': has to be fully secured within the login session

Single quote escape conflict with Pdsh worker

See ticket #25 for the complete description of the problem. The issue still remains with Pdsh worker due to appendable environment variables like PDSH_SSH_ARGS_APPEND=...

Should be fixed as soon as we make use of the '''subprocess''' module (and not popen2).

Unhandled NodeSet parse error in clush

{{{
[root@fortoy0 ~]# clush -w fortoy[45-33] ls
Traceback (most recent call last):
  File "/usr/bin/clush", line 651, in ?
    clush_main(sys.argv)
  File "/usr/bin/clush", line 538, in clush_main
    nodeset_base = NodeSet(options.nodes)
  File "/usr/lib/python2.4/site-packages/ClusterShell/NodeSet.py", line 805, in __init__
    self.update(pattern)
  File "/usr/lib/python2.4/site-packages/ClusterShell/NodeSet.py", line 976, in update
    for pat, rangeset in _NodeSetParse(other, self._autostep):
  File "/usr/lib/python2.4/site-packages/ClusterShell/NodeSet.py", line 733, in _NodeSetParse
    raise NodeSetParseRangeError(e)
ClusterShell.NodeSet.NodeSetParseRangeError: invalid values in range : "45-33"
}}}

nodeset: change (for a better?) command syntax

The nodeset command syntax is inconsistent and too hard to use.

Change it to a more conventional syntax based on the following pattern:

nodeset [command: -c,e,f] [ns1 [operation: -x,i,X] ns2 | ...]

Scalable remote commands execution through gateways

Add scalable command execution capability to ClusterShell V2, using special nodes called gateways or proxies to dispatch large command launches across the cluster. The exact execution algorithm to use is still to be defined.

This will be added as a new worker probably called WorkerTree. This ticket depends on ticket #30 in order to use the new remote nodes communication layout of ClusterShell.

This will be the main addition of ClusterShell v2.0.

NodeSet is not handling whitespace consistently

NodeSet.NodeSet() produces inconsistent output when there is whitespace before and/or after the names. For example:
{{{
NodeSet.NodeSet(" tigrou2 , tigrou7 , tigrou[9-11] ").__str__()
}}}
returns
{{{
' tigrou[9-11] , tigrou[2,7]'
}}}
instead of (with one space before and one after)
{{{
' tigrou[2,7,9-11] '
}}}
BTW, whitespace probably doesn't have much meaning in a node name and could be ignored during parsing, in which case the output would be:
{{{
'tigrou[2,7,9-11]'
}}}
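
One possible way to ignore such whitespace, shown as a standalone sketch: `split_patterns` is a hypothetical helper (not the actual NodeSet parser) that splits on commas outside brackets and strips each piece before any further parsing.

```python
def split_patterns(pattern):
    """Split on commas outside brackets and strip surrounding whitespace."""
    parts, current, depth = [], [], 0
    for char in pattern:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    # Drop empty pieces left by leading, trailing or doubled commas
    return [part for part in parts if part]
```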

Single quote escape conflict

There is a problem with single quote escaping in clush, for example when using sed:

{{{
[root@fortoy0 ~]# clush -w fortoy55 "echo $HOSTNAME | sed 's/\ fortoy55/fortoy/' "
fortoy55: sed: -e expression #1, char 2: unterminated `s' command
clush: fortoy55: exited with exit code 1
}}}

Add Engine selector

Add automatic (with predefined list) and manual Engine implementation selection in Task.py.

Predefined Engine priority try-list will be:

  • [/ticket/7 EPoll]
  • [/ticket/9 KQueue]
  • Poll
  • [/ticket/8 Select]

Set separation character when using nodeset -e

Allow the user to set the separator character when using nodeset --expand, defaulting to a space character. For example, this feature would avoid the use of tr in scripts, as in: nodeset -e ... | tr ' ' ','

(PhG request)

Display output messages on KeyboardInterrupt (CTRL-C)

When executing '''clush -b''', if some node doesn't answer, the user can end the execution with CTRL-C. However, doing so doesn't print the messages received and gathered so far.

Display already received messages on CTRL-C.

This ticket depends on #21 for a clean implementation.
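
The expected behavior can be sketched as follows. `run_and_gather` is a hypothetical function that stands in for the clush main loop, and `events` is any iterable of (node, line) pairs; the point is only that the display step runs even after a KeyboardInterrupt.

```python
def run_and_gather(events, output):
    """Consume (node, line) events; emit what was gathered even on CTRL-C."""
    gathered = {}
    interrupted = False
    try:
        for node, line in events:
            gathered.setdefault(node, []).append(line)
    except KeyboardInterrupt:
        interrupted = True
    # Display already received messages, whether or not we were interrupted
    for node in sorted(gathered):
        for line in gathered[node]:
            output.append("%s: %s" % (node, line))
    return interrupted
```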

Optimize NodeSet.__getitem__

  • Implement {{{NodeSet.__getitem__()}}} in an optimized way, like {{{RangeSet.__getitem__()}}}
    • Same thing for {{{NodeSet.__getslice__()}}}
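
To illustrate what "optimized" could mean here, the sketch below indexes into folded ranges without expanding every node name. `RangeList` is a hypothetical class written for this example and does not reflect the actual NodeSet or RangeSet internals.

```python
class RangeList:
    """Hypothetical folded ranges, e.g. [(2, 2), (7, 7), (9, 11)]."""

    def __init__(self, prefix, ranges):
        self.prefix = prefix
        self.ranges = ranges          # sorted, inclusive (start, stop) pairs

    def __len__(self):
        return sum(stop - start + 1 for start, stop in self.ranges)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if index < 0:
            raise IndexError("index out of range")
        # Skip whole ranges instead of expanding every node name
        for start, stop in self.ranges:
            size = stop - start + 1
            if index < size:
                return "%s%d" % (self.prefix, start + index)
            index -= size
        raise IndexError("index out of range")
```

The cost is proportional to the number of ranges rather than to the number of nodes.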

Improve non-reg test coverage (1.2)

Thanks to the Python Coverage tool, non-regression test coverage can be computed automatically:

{{{
$ coverage -x ./run_testsuite.py -v
$ coverage -r -o /usr/lib64/python2.4 | grep -v '100%'
}}}

Uncovered code can be checked with {{{coverage -a FILE}}}.

Tests should be improved to reach 100% (when possible).

Same ticket for 1.1 is #32.

Possible to modify the ssh path and ssh arguments

It would be very useful to be able to change:

  • the binary path used by ClusterShell: ssh, pdsh, others?
  • ssh arguments used by ClusterShell.
  • add other ssh arguments.

Maybe the last two items could be combined.

Migrate to subprocess module

The Python 2.4+ [http://www.python.org/doc/2.6.2/library/subprocess.html subprocess] module allows safer, more feature-rich command execution.

The use of subprocess will allow easier fixing of ticket #4 (environ propagation) and #19 (separate stdout/stderr).
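
A short sketch of what subprocess makes possible, written for Python 3. `run_command` is a hypothetical helper, not ClusterShell code; it passes arguments as a list (so no shell quoting is involved), propagates extra environment variables, and keeps stdout and stderr separate.

```python
import os
import subprocess
import sys


def run_command(args, extra_env=None):
    """Run args (a list, so no shell quoting) and return (rc, out, err)."""
    env = dict(os.environ)
    env.update(extra_env or {})
    proc = subprocess.Popen(
        args,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,   # kept separate from stdout
        universal_newlines=True,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    child = "import os, sys; print(os.environ['DEMO_VAR']); sys.stderr.write('warning\\n')"
    print(run_command([sys.executable, "-c", child], {"DEMO_VAR": "hello"}))
```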

clush: add option to prevent reading from stdin (like ssh -n)

Add an option to explicitly disallow binding of stdin, in order to allow the use of clush in scripts like:

{{{
cat /etc/oddjobd.conf | while read line
do
    echo "LINE: $line"
    clush -w fortoy3 do_something
done
}}}

ssh offers the '''-n''' option to do that (it redirects stdin from /dev/null).
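
In Python terms, the same effect can be obtained by giving the child process a null stdin. This is only a sketch using the Python 3 subprocess module; `run_without_stdin` is a hypothetical helper and says nothing about how the clush option would be named.

```python
import subprocess
import sys


def run_without_stdin(args):
    """Run args with stdin redirected from the null device, like ssh -n."""
    return subprocess.run(
        args,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    ).stdout


if __name__ == "__main__":
    # The child sees an immediate end-of-file instead of the caller's stdin
    child = "import sys; print(repr(sys.stdin.read()))"
    print(run_without_stdin([sys.executable, "-c", child]))
```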

Add Worker/EngineClient autoclose option

This ticket tracks the addition of "autoclose" clients and timers (both derive from EngineBaseTimer), which allow task.resume() to exit even when these clients/timers are not finished.

In detail: currently, the Engine clients registered in ClusterShell are counted at each iteration of the runloop (called by task.resume()). This means the runloop will not exit until all engine clients are finished or explicitly terminated. We will modify this behavior by adding a runloop reference counter; "autoclose" clients will not take a refcount.
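
The reference-counting idea can be sketched as below. `Client` and `runloop` are hypothetical stand-ins written for this example (each client simply has a number of work steps left), not the Engine implementation.

```python
class Client:
    """Hypothetical engine client; `steps` is the work left to do."""

    def __init__(self, name, steps, autoclose=False):
        self.name = name
        self.steps = steps
        self.autoclose = autoclose


def runloop(clients):
    """Run until no refcounted (non-autoclose) client remains."""
    # Only unfinished, non-autoclose clients take a reference
    refcount = sum(1 for c in clients if not c.autoclose and c.steps > 0)
    iterations = 0
    while refcount > 0:
        iterations += 1
        for client in clients:
            if client.steps > 0:
                client.steps -= 1
                if client.steps == 0 and not client.autoclose:
                    refcount -= 1
    # Autoclose clients still running at this point are simply abandoned
    return iterations, [c.name for c in clients if c.steps > 0]
```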

separate stdout/stderr handling

Currently, both stdout and stderr are read by ClusterShell and merged into one channel (due to the [http://www.python.org/doc/2.4.3/lib/popen3-objects.html popen4()] call).

Add an option to separate stdout and stderr.

External group support to NodeSet

The external group calls used in clush.conf have proven very usable, and this kind of group should be added to an extension of NodeSet.

Add a configuration file to manage group request primitives (upcalls) and support several namespaces (or contexts).

For each namespace, define 2 or 3 external calls (upcalls):

  1. '''direct mapping''': !GroupName -> list of !GroupName | !NodeSet
  2. '''group listing''': returns a list of !GroupNames
  3. '''reverse mapping''': !GroupName | !NodeSet -> best !GroupNames (optional)

For example, with flat files (can be a default configuration):

  1. {{{ awk '/%(group):/ { print $2 }' }}}
  2. {{{ awk -F: '{ print $1 }' }}}
  3. None

For example, we could easily add a slurm binding:

  1. {{{ sinfo -h -o "%N" -p %(group) }}}
  2. {{{ sinfo -h -o "%P" }}}
  3. {{{ sinfo -h -N -o "%P" -n %(node) }}}

Another example with a getent binding:

  1. {{{ getent netgroup %(group) }}}
  2. {{{ getent netgroup }}} (see if possible)

or use {{{ ldapsearch }}}.

This suggests a class (!NodeGroups ?) that takes a configuration file as a parameter and provides 3 methods such as:

  1. {{{ expand() }}}
  2. {{{ list() }}}
  3. {{{ fold() }}}

Request caching needs to be added to this class for faster queries.
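
A rough sketch of such a class, under stated assumptions: upcalls are given as a dictionary rather than a configuration file, placeholders are written `%(group)s` and `%(node)s`, and the keys "map", "list" and "reverse" are hypothetical names for the three upcalls. The runner is injectable so the class can be exercised without running real commands.

```python
import subprocess


def shell_runner(command):
    """Default upcall runner: execute command in a shell, return stdout."""
    return subprocess.check_output(command, shell=True, universal_newlines=True)


class NodeGroups:
    """Hypothetical upcall-based group resolver with request caching."""

    def __init__(self, upcalls, runner=shell_runner):
        self.upcalls = upcalls      # keys: "map", "list", optionally "reverse"
        self.runner = runner
        self._cache = {}

    def _call(self, name, **params):
        command = self.upcalls[name]
        # Plain replacement, so other "%" sequences in the command survive
        for key, value in params.items():
            command = command.replace("%%(%s)s" % key, value)
        if command not in self._cache:
            self._cache[command] = self.runner(command).split()
        return self._cache[command]

    def expand(self, group):
        return self._call("map", group=group)

    def list(self):
        return self._call("list")

    def fold(self, node):
        if "reverse" not in self.upcalls:
            raise NotImplementedError("no reverse mapping upcall configured")
        return self._call("reverse", node=node)
```

Caching on the fully substituted command line means repeated queries for the same group run the external command only once.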

Add Inter Task Communication API

Currently, each ClusterShell task is independent, running in a separate thread, and inter-task communication is not supported. Users could add their own synchronization or communication system, but it may be tricky to implement without access to Engine internals.

This ticket tracks the addition of an Inter Task/Engine Communication API, probably something like thread mailboxes handled by the Engine at its next loop iteration.
This will allow task1 to perform task2.abort(). Here is an idea of how it could work:

{{{
# task1
task2.abort()
    if task2 != task_self():
        task2.engine.mbox_post("abort")
        task_mbox_wait("abort_ok")    (current task1 waits for "abort_ok" msg)

# task2
poll()
    -> event read on special mailbox FD
    -> "abort" msg received from task1
        self.abort()
        task1.engine.mbox_post("abort_ok")
}}}
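
The exchange above can be tried out with plain threads and queues. This is only a sketch under assumptions: `Task`, `mbox` and `abort_task` are hypothetical names, and a `queue.Queue` stands in for the special mailbox FD polled by the Engine.

```python
import queue
import threading


class Task(threading.Thread):
    """Hypothetical task whose loop watches a mailbox for messages."""

    def __init__(self, name):
        super().__init__(name=name, daemon=True)
        self.mbox = queue.Queue()
        self.aborted = False

    def run(self):
        while not self.aborted:
            message, reply_to = self.mbox.get()   # stands in for poll()
            if message == "abort":
                self.aborted = True
                reply_to.put("abort_ok")


def abort_task(target, timeout=5.0):
    """Post "abort" to another task and wait for its acknowledgement."""
    reply = queue.Queue()
    target.mbox.put(("abort", reply))
    return reply.get(timeout=timeout)
```

Here the caller blocks until the target acknowledges, which matches the "abort_ok" wait in the outline.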

Advanced interactive clush features

Add advanced interactive features in clush.py, like:

  • add automatic unreachable nodes exclusion
  • result tagging for fast nodeset switching
  • ...

clush.py object rewrite and non-regression tests

The clush.py script should be rewritten in an object-oriented way.
This will make it possible to add non-regression tests for:

  • configuration file
    • read from {{{/etc/clustershell/clush.conf}}}
    • read from {{{~/.clush.conf}}}
    • test the supported flags in it
  • command-line arguments
    • test each supported parameter.

Propagate environment variable on task

It would be very useful to be able to set environment variables per task or per worker.

{{{
# Set environment variables
vars = { 'PATH': "...", 'LD_LIBRARY_PATH': "..." }
task.putenv(vars)
# or
worker.putenv(vars)
}}}

Clush could not copy directories

Although it is documented in the clush man page, clush does not support copying directories.

This could be done easily by testing whether the source file is a directory and, if so, adding the '-r' flag (to both worker ssh and worker pdsh).
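
A minimal sketch of that test, assuming an scp-based copy. `build_copy_args` is a hypothetical helper; the actual workers build their command lines differently.

```python
import os


def build_copy_args(source, node, dest):
    """Build an scp argument list, adding -r when source is a directory."""
    args = ["scp"]
    if os.path.isdir(source):
        args.append("-r")
    args += [source, "%s:%s" % (node, dest)]
    return args
```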

Add writing support to Engine/Workers

This ticket tracks the feature of '''writing''' to Engine clients and their associated workers, allowing write() on workers like WorkerSsh.

This is a necessary feature for ClusterShell v2 to send live commands to remote ssh processes. Also, the [wiki:clush] tool will be improved by allowing things like:
{{{

echo "foo" | clush -w mycluster[1-32] cat

}}}
(this is not supported by pdsh ;-)

NodeSet 1.2 improvements (NodeGroups)

This ticket tracks the extension of NodeSet to support groups of nodes:
#43 External group support to NodeSet
#44 NodeGroups arithmetics
#45 nodeset: change (for a better?) command syntax
#42 nodeset: allow multiple -g/w/x arguments (bonus)

Add a way to add ssh options at runtime

Like ssh_options in clush.conf, but dynamically at runtime, add a way to add (modify ?) ssh options like "-p 2222", either by environment variables or through a clush command line option, or both.

clush: stderr output not correctly displayed

Check and fix this; it is probably due to the fact that stderr is not buffered.

Sometimes the output is:
{{{

clush -w fortoy[12] uname -a

fortoy12: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
fortoy12: Linux fortoy12 2.6.18-164.6.1.....
}}}

and sometimes:
{{{

clush -w fortoy[12] uname -a

fortoy12: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
fortoy12: @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
fortoy12: @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
fortoy12: IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!

}}}

Better error handling for some worker accessors

For example, if a user forgets to task.schedule() a worker and then tries to access its buffers, avoid a trace like:

{{{
  File "../lib/ClusterShell/Worker/Worker.py", line 190, in node_buffer
    return self.task._msg_by_source((self, node))
AttributeError: 'NoneType' object has no attribute '_msg_by_source'
}}}

Unhandled broken pipe

Fix the following issue:

{{{
[root@fortoy0 ~]# clush -w fortoy[33-34] 'dmesg; sleep 5; dmesg' | less
Unhandled exception in thread started by
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/ClusterShell/Task.py", line 176, in _start_thread
    self._engine.run(self.timeout)
  File "/usr/lib/python2.4/site-packages/ClusterShell/Engine/Engine.py", line 634, in run
    self.runloop(timeout)
  File "/usr/lib/python2.4/site-packages/ClusterShell/Engine/Poll.py", line 175, in runloop
    client._handle_read()
  File "/usr/lib/python2.4/site-packages/ClusterShell/Worker/Ssh.py", line 174, in _handle_read
    self.worker._on_node_msgline(self.key, msg)
  File "/usr/lib/python2.4/site-packages/ClusterShell/Worker/Worker.py", line 145, in _on_node_msgline
    self._invoke("ev_read")
  File "/usr/lib/python2.4/site-packages/ClusterShell/Worker/Worker.py", line 93, in _invoke
    self.eh._invoke(ev_type, self)
  File "/usr/lib/python2.4/site-packages/ClusterShell/Event.py", line 51, in _invoke
    ev_handler(source)
  File "/usr/bin/clush", line 101, in ev_read
    print "%s: %s" % (ns, buf)
IOError: [Errno 32] Broken pipe
}}}
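
One possible direction, sketched for Python 3 (where IOError is an alias of OSError): catch the EPIPE error at the point where output is written and let the caller stop cleanly. `write_line` is a hypothetical helper, not the fix that was applied to clush.

```python
import errno


def write_line(stream, text):
    """Write one line; return False if the reader has gone away (EPIPE)."""
    try:
        stream.write(text + "\n")
        stream.flush()
        return True
    except OSError as exc:
        if exc.errno == errno.EPIPE:
            return False      # e.g. the user quit "less": stop writing
        raise
```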
