crabserver's Issues

CRABServer support for good lumi lists

Several ways to accomplish this depending on how it's implemented:

  1. upload JSON to ACDC database to create a generic (no specific files/datasets) collection
  2. Same for URL
  3. Pass URL on to splitting algo

Make new ticket(s) as need arises
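
Whichever option is chosen, the splitting step eventually has to decide whether a given run and lumi section is in the good list. A minimal sketch of that check, assuming the JSON maps run numbers to lists of inclusive [first, last] lumi ranges (the run numbers below are purely illustrative, and the function names are hypothetical):

```python
import json


def load_lumi_mask(text):
    """Parse a lumi mask JSON string into {run: [(first, last), ...]}.

    Assumed layout: {"<run>": [[first, last], ...], ...} with inclusive ranges.
    """
    raw = json.loads(text)
    return {
        int(run): [(int(first), int(last)) for first, last in ranges]
        for run, ranges in raw.items()
    }


def is_good_lumi(mask, run, lumi):
    """Return True if (run, lumi) falls inside one of the listed ranges."""
    return any(first <= lumi <= last for first, last in mask.get(run, []))


# Illustrative input only, not a real certification file.
example = '{"160431": [[19, 218]], "160577": [[254, 306], [310, 320]]}'
mask = load_lumi_mask(example)
```

The same loader would work for options 1 and 2 once the JSON has been read from the upload or fetched from the URL.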

Evaluate virtualenv for sub shell processes

Servers will have processes running in them which may have conflicting environments (this will be most notable in [wiki:CRABServer CRABServer], I think, where each "user" needs to run processes in a separate environment). Using [http://pypi.python.org/pypi/virtualenv virtualenv] or something similar should make dealing with this a lot simpler. It would also further decouple us from CMSSW and grid middleware versions.
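
One way this could look for a sub shell process, as a rough sketch only: build the environment that a virtualenv's activate script would set up and hand it to the child process. The helper name and the example paths are hypothetical.

```python
import os


def virtualenv_environ(venv_dir, base_env=None):
    """Return an environment dict that runs a child inside the given virtualenv.

    Mirrors what the activate script does: set VIRTUAL_ENV, put the
    virtualenv's bin directory first on PATH, and drop PYTHONHOME.
    """
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(venv_dir, "Scripts" if os.name == "nt" else "bin")
    env["VIRTUAL_ENV"] = venv_dir
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)
    return env
```

A server could then start each user's process with something like subprocess.call(["python", "job.py"], env=virtualenv_environ("/path/to/user_env")), so that one user's environment does not leak into another's.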

Need new component, InputSandboxCache?

We spent a while discussing this today. All of us favor an approach where the user sandbox flow is as follows:

Client uploads the sandbox to ReqMgr/CRABInterface via http/s in the same way that the CMSSW _cfg.py is uploaded. This will be secured by X509 proxy, same as posting to the CRABInterface.

The CRABInterface uploads, via REST interface, the user sandbox to the sandbox cache which responds with an identifier for the sandbox in "the cache". This identifier is returned to the client. When the job is submitted by the client, this identifier is passed along to the various work queues and is included in the job spec.

Here the handling of the config in Couch and the sandbox in a different cache would differ. The user sandbox would not be placed in the job sandbox, but would rather be downloaded directly by the worker node once the job has started. Eventually this wget would go through a squid cache at the remote site and result in smaller network loads.

Presumably the identifier in the cache would be, or would include, a hash of the contents of the sandbox, so that repeated submission of the same sandbox would waste neither space in the cache nor bandwidth between the squid and the cache.
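
To make the hash idea concrete, here is a minimal in-memory sketch. SHA-256 is an assumption (the ticket does not name a hash), and the class and method names are hypothetical stand-ins for whatever REST service ends up being "the cache".

```python
import hashlib


def sandbox_identifier(data):
    """Derive the cache identifier from the sandbox contents (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


class InMemorySandboxCache:
    """Toy stand-in for the cache: stores each distinct sandbox once."""

    def __init__(self):
        self._store = {}

    def upload(self, data):
        """Store the sandbox if it is new and return its identifier."""
        key = sandbox_identifier(data)
        self._store.setdefault(key, data)
        return key

    def download(self, key):
        """Return the sandbox bytes for a previously returned identifier."""
        return self._store[key]
```

Because the identifier depends only on the contents, uploading the same sandbox twice yields the same identifier and the same URL, which is also what lets a squid serve repeated downloads.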

The other option, not favored, was to have the local work queue fetch the sandbox from the cache and include it in the job sandbox. We felt this would waste too much bandwidth between the submitting machine and the remote CE.

In any case the major issue is that we need to find or build "the cache" with a REST interface. Does any such thing exist in our software stack already or do we have the option to use a third party supplied option? This would probably not be the most difficult thing to write ourselves, but we worry about doing it right. On the other hand, something we do ourselves can easily include cleanups, diagnostics for Ops, and perhaps pinning of additional sandboxes for MC generation, etc.

This whole approach has the advantage of allowing staged testing. Initially we would use a static URL as the sandbox, without any upload capability, and test the WN- or workqueue-level changes that will have to be added to allow HTTP-accessible sandboxes.

We'd like to have a discussion, both of the sandbox data flow and possible implementations of the cache before opening a couple more tickets to address all the details.

Cannot manage stop-crabserver; manage start-crab-server

Trying just to stop and start the crab-server appears to shut down the MySQL database as well. Is this intentional? As it is, stop-crabserver must be combined with stop-services and the start sequence must be combined as well.

Implement API to get log files

As with the upload to the configCache, the CRABREST interface should act as a proxy to provide the client with logs (see also #1302). An API is needed to allow that.

hostcert.pem and hostkey.pem paths should be configurable in Credential API

The paths of hostkey and hostcert are currently hardcoded in the credential API as $HOME/.globus/hostkey.pem and $HOME/.globus/hostcert.pem. They need to be configurable: they should be passed in the input dictionary of the credential API, with $HOME/.globus/hostkey.pem and $HOME/.globus/hostcert.pem as the default values.
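
A minimal sketch of the requested behaviour, not the credential API itself: the function name and the dictionary keys "hostcert" and "hostkey" are hypothetical.

```python
import os

# Default location described in the ticket: $HOME/.globus
DEFAULT_GLOBUS_DIR = os.path.join(os.path.expanduser("~"), ".globus")


def resolve_host_credentials(args=None):
    """Return the host certificate and key paths, honouring overrides in args.

    'hostcert' and 'hostkey' are assumed key names for the input dictionary.
    """
    args = args or {}
    return {
        "hostcert": args.get(
            "hostcert", os.path.join(DEFAULT_GLOBUS_DIR, "hostcert.pem")
        ),
        "hostkey": args.get(
            "hostkey", os.path.join(DEFAULT_GLOBUS_DIR, "hostkey.pem")
        ),
    }
```

Callers that pass nothing keep today's behaviour, while a deployment can override either path independently.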

Improve CRAB REST interface

  • The /config API needs to be fixed.
  • An API that returns server information for use by the client needs to be implemented.
  • An API for getting the task status, reporting the percentage of jobs in each state, must be exposed.
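
For the last point, the per-state percentage could be computed along these lines. This is only a sketch: the state names are illustrative and the function is hypothetical, not part of the existing interface.

```python
from collections import Counter


def state_percentages(job_states):
    """Return {state: percentage of jobs in that state} for one task."""
    counts = Counter(job_states)
    total = sum(counts.values())
    if total == 0:
        # A task with no jobs yet has nothing to report.
        return {}
    return {state: 100.0 * n / total for state, n in counts.items()}
```

The REST layer would then only need to serialise this dictionary for the client.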

integrate proxy API

The proxy APIs are now ready; their integration at component level is missing.

Add stageout parameters to WMSpec

Split off from #638.

Add parameters in WMSpec for user output like remote destination and dataset name used for publishing. Please add the parameters here and ewv will implement.

user proxy location

Hi,

The user proxy is needed to transfer the user files using the ftscp command. Where will its location come from: CouchDB, the WMBS DB, or somewhere else?

Cheers
Hassen

Review documentation

Things have evolved and changed, so the documentation needs to be reviewed, updated and completed.

Support additional user input files

Make sure we support CRAB client's additional input files and that they can either be put in the right place in the user sandbox by the client OR make sure they are moved into the right place on the worker node after unpacking.

User should not have to know Agent JobIDs

In trying to figure out getLog and getOutput for the CRAB client, I realize that I have to supply a range of JobIDs on the WMAgent side. Two problems with this:

  1. As a user I have no idea what these are; I am just interested in my own range, not how it maps on the other end.
  2. Because of workqueues, etc., I don't think we can guarantee that a user's CRAB task is assigned to sequential numbers of jobs on the Agent side.
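
One possible shape for a fix, sketched under the assumption that the server can list the Agent-side job IDs belonging to a task: number the jobs 1..N for the user and translate ranges on the server. Both function names are hypothetical, and ordering by Agent ID is just one stable choice.

```python
def build_job_index(agent_job_ids):
    """Map user-facing job numbers 1..N to Agent-side job IDs."""
    return {i: job_id for i, job_id in enumerate(sorted(agent_job_ids), start=1)}


def resolve_range(index, first, last):
    """Translate a user range (inclusive) into the matching Agent job IDs."""
    return [index[i] for i in range(first, last + 1) if i in index]
```

With this, a user asking for jobs 2-3 never sees that they are, for example, 530 and 1042 on the Agent side, and non-sequential Agent IDs stop being a problem.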

WMStep for FWLite jobs

At some point we need to support FWLite jobs in addition to cmsRun. CRAB2 does not really support this except through writing a custom script, which is too difficult, especially for the target FWLite user.

Not assigning a milestone to this.

Implement validation of CMS names

using lexicon
{{{
from WMCore.Lexicon import cmsname
cmsname('T2_IT_PISA')
}}}
we can catch gross errors.

A full validation is also needed, and this should use the list of known sites in SiteDB.
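
The two levels of checking could be combined roughly as follows. This is a standalone sketch: the pattern is a simplified assumption and not the one used by WMCore.Lexicon, and the set of known sites is passed in rather than fetched, since the SiteDB call is not shown here.

```python
import re

# Simplified, assumed pattern: tier, two-letter country code, site label.
SITE_PATTERN = re.compile(r"^T[0-3]_[A-Z]{2}_[A-Za-z0-9_]+$")


def validate_site(name, known_sites):
    """Check the name format first, then membership in the known-site list."""
    if not SITE_PATTERN.match(name):
        raise ValueError("malformed CMS site name: %r" % name)
    if name not in known_sites:
        raise ValueError("unknown CMS site: %r" % name)
    return name
```

The format check catches typos cheaply; the membership check catches well-formed names that do not correspond to a real site.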

Adding checksum and protocol type

...to the server data API (getoutput) and to the log retrieval API.
A checksum is needed to verify that the copy was OK.
Protocol type: to bypass BDII in lcg-cp by specifying the protocol.
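
On the client side the check could be as small as the sketch below. Adler-32 is an assumption here (the ticket does not say which checksum the API would return), and both function names are hypothetical.

```python
import zlib


def adler32_of(path, chunk_size=1024 * 1024):
    """Compute the Adler-32 checksum of a file as 8 lowercase hex digits."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return "%08x" % (value & 0xFFFFFFFF)


def copy_is_ok(path, expected):
    """Compare the local file's checksum with the one reported by the server."""
    return adler32_of(path) == expected.lower()
```

Reading in chunks keeps memory use flat for large output files.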

AsyncStageout component

The agent is written in WMAgent component style and implements the main machinery of the AsyncStageout (the implementation of this machinery is described in ticket #95). This agent needs to be merged into the AsyncStageout module.

Meta-discussion on user sandbox distribution issues

Split off from #681.

We have ticket #1151, which I assumed to be a worker node issue. Job lands on a worker node with a list of URLs for sandboxes (user code, LHE files, etc) to fetch and does so, hopefully hitting a squid cache to reduce overall traffic.

We also need a discussion of how the sandbox gets off the UI to the agent or some other "sandbox cache", which is what Daniele was getting at in


(to clarify: the basic prototype we are finally going to propose will most probably not ship the user sandboxes to the agent, or at least will not implement the "final strategy", which is the matter under discussion here).

Over time we discussed three possible approaches: https, chirp, and gFtp. In principle they are not mutually exclusive but, of course, to exploit more than one of them we should have a generic interface (e.g. what in the past we called the SE API).
I personally have direct experience with the third option listed above, and I think it makes sense to evaluate the others as well.

That said, I would like to draw attention to this and come up with a plan, which IMHO means:
-- defining what we want to support (how local schedulers should be supported)
-- defining which kind of API we want for interacting with what we support (if we have more than one choice)

Evaluate ReqMgr as a frontend to CRABServer

Is ReqMgr a good fit for the request management in CRAB3? Does it align with CRAB/analysis requirements? Is it too much or not enough? What changes would be needed to make it more suitable?

Evaluate WMSpec usage to pass parameters at submission level

Split off from #638.

Once the air plugins are ready, it will be possible to understand which information is needed.

Be able to pass, in the WMSpec, some non-standard but commonly used parameters for submission and copying, such as the myproxy server used, role, and group. Please add others here and ewv will implement.

User SiteDB check failing with proper message

When the SiteDB check fails (e.g. because the user is not found in it), a proper error message should be returned. Logs of CRABInterface are below (*) (#)

(*) stderr.log
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Serving on crabas2.lnl.infn.it:8988
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Bus STARTED
INFO:cherrypy.access:[11/Apr/2011:14:33:01] crabas2.lnl.infn.it 137.138.210.236 "GET /crabinterface/crab/info HTTP/1.1" 200 [data: - in 210 out 1842 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
/home/crab/ALL_SETUP/WMAgent/install07X/CRABServer_HEAD/src/python/CRABRESTModel.py:250: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
raise cherrypy.HTTPError(500, ex.message)
DEBUG:cherrypy.error:call to POST with args: ['user'] kwargs: {} resulted in
INFO:cherrypy.access:[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]

(#) CRABInterface.log
call to POST with args: ['user'] kwargs: {} resulted in
[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
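
The log above shows the handler raising a 500 built from ex.message, which is deprecated and here appears to leave the client with no useful text. A rough sketch of an alternative, in which the exception class, the function and the 403 status code are all hypothetical choices rather than existing CRABServer code:

```python
class UserNotInSiteDB(Exception):
    """Hypothetical exception raised when the SiteDB lookup finds no user."""


def error_response(exc):
    """Turn an exception into an (HTTP status, message) pair for the client.

    str(exc) is used instead of the deprecated exc.message attribute.
    """
    if isinstance(exc, UserNotInSiteDB):
        # 403 is an assumed status for a user missing from SiteDB.
        return 403, "SiteDB check failed: %s" % exc
    return 500, "Internal server error: %s" % exc
```

The REST model could pass the resulting pair to its HTTP error mechanism, so that the client sees why the request was rejected instead of a bare 500.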
