crabserver's Issues
user proxy location
Hi,
The user proxy is needed to transfer the user files with the ftscp command. Where should its location come from: CouchDB, the WMBS DB, or somewhere else?
Cheers
Hassen
Adding checksum and protocol type
...to the server data API (getoutput) and to the log retrieval API.
The checksum is needed to verify that the copy was successful.
The protocol type allows bypassing the BDII in lcg-cp by specifying the protocol explicitly.
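A minimal sketch of the client-side half of that check, assuming the API reports an Adler32 value (the actual field name and checksum algorithm are not decided in this ticket):

```python
import zlib

def adler32_of(path, chunk=1 << 20):
    """Adler32 checksum of a transferred file, computed incrementally so
    large outputs never need to fit in memory. The result would be
    compared against the value returned by getoutput."""
    value = 1  # adler32 seed
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(chunk), b''):
            value = zlib.adler32(block, value)
    # mask to 32 bits for stable, zero-padded hex output
    return format(value & 0xffffffff, '08x')
```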
Review documentation
Things have evolved and changed, so the documentation needs to be reviewed, updated, and completed.
User ProxyManagement
API to download user proxy
Implement API to get logs file
As for the upload to the ConfigCache, the CRAB REST interface should act as a proxy to provide the client with logs (see also #1302). To allow that, an API is needed.
WMStep for FWLite jobs
At some point we need to support FWLite jobs in addition to cmsRun. CRAB2 does not really support this except through writing a custom script, which is too difficult, especially for the target FWLite user.
Not assigning a milestone to this.
Extend WMSpec for non-trivial user jobs
Ok, take the 2nd patch on top of the first.
LoadDummyData class for AsyncStageout
The LoadDummyData class should create files in site sources and transfer them, thus exercising the AsyncStageout machinery.
Identify job/tasks states beyond those provided by WMAgent stack
e.g. things like FTS harvest, /store/results
What components manage these states?
Adding job summary status
...to the task status api.
enable data discovery for analysis workflow
all details here https://svnweb.cern.ch/trac/CMSDMWM/ticket/573
this ticket aims to help me track this issue related to crab3-02 m.
AsyncStageout component
Agent written in WMAgent component style that implements the main machinery of the AsyncStageout (the implementation of this machinery is described in ticket #95). This agent needs to be merged into the AsyncStageout module.
Evaluate virtualenv for sub shell processes
Servers will have processes running in them which may have conflicting environments (this will be most notable in [wiki:CRABServer CRABServer], I think, where each "user" needs to run processes in a separate environment). Using [http://pypi.python.org/pypi/virtualenv virtualenv] or something similar should make dealing with this a lot simpler. It would also further decouple us from CMSSW and grid middleware versions.
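As a stopgap short of full virtualenv, per-process environment isolation can be sketched like this (the variable name `USER_SCRAM_ARCH` is purely illustrative; a virtualenv-based version would additionally point PATH/PYTHONHOME at the env's own interpreter):

```python
import os
import subprocess
import sys

def run_isolated(cmd, overrides):
    """Run a command with per-user overrides layered on a copy of the
    parent environment, so concurrent user processes never see or mutate
    each other's settings."""
    env = os.environ.copy()
    env.update(overrides)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Child process sees the override without it leaking into the server env.
proc = run_isolated(
    [sys.executable, '-c', "import os; print(os.environ['USER_SCRAM_ARCH'])"],
    {'USER_SCRAM_ARCH': 'slc5_amd64_gcc434'})
```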
duplicate results got from JSM database in couch
In order to have a self-contained AsyncStageout component, the results read from the JSM databases need to be duplicated into the AsyncStageout database in Couch.
Implement appropriate getoutput for GLite
gLite requires getoutput to be performed even if output, logs, etc. bypass the WMS.
integrate proxy API
The proxy APIs are now ready; their integration at the component level is missing.
ConfigCache for user jobs
BossLite and related components ready for jobs
This includes BossLite job handling with gLite and appropriate getoutput and jobstatus components.
User Input Sandbox
Use of "close" storage element for input
Get the fts server from file
The FTS server should be read from a configuration file rather than a hardcoded dictionary (see ticket #311).
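A possible shape for that lookup, assuming a simple JSON file mapping site names to FTS endpoints (the file layout, key names, and URLs below are assumptions, not the actual config format):

```python
import json

def fts_server_for(site, config_path):
    """Look up the FTS server for a site in a JSON config file instead of
    a hardcoded dictionary, falling back to a default endpoint for sites
    not listed explicitly."""
    with open(config_path) as fh:
        mapping = json.load(fh)
    return mapping.get(site, mapping['defaultServer'])
```

Keeping the mapping on disk means Ops can repoint a site at a new FTS endpoint without a code release.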
Support additional user input files
Make sure we support the CRAB client's additional input files, and that they can either be put in the right place in the user sandbox by the client OR be moved into the right place on the worker node after unpacking.
Cannot manage stop-crabserver; manage start-crab-server
Trying to just stop and start the crab-server appears to shut down the MySQL database as well. Is this intentional? As it is, stop-crabserver must be combined with stop-services, and the start sequence must be combined as well.
CRABServer support for good lumi lists
Several ways to accomplish this depending on how it's implemented:
- upload JSON to ACDC database to create a generic (no specific files/datasets) collection
- Same for URL
- Pass URL on to splitting algo
Make new ticket(s) as need arises
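For reference, a membership test against the kind of JSON lumi mask being uploaded can be sketched as follows (the dict-of-run-to-ranges layout matches the usual CMS good-lumi JSON convention):

```python
def in_lumi_mask(mask, run, lumi):
    """Membership test against a CMS-style JSON lumi mask: a dict mapping
    run number (as a string) to inclusive [first, last] lumi ranges,
    e.g. {"190645": [[10, 110], [115, 120]]}."""
    for first, last in mask.get(str(run), []):
        if first <= lumi <= last:
            return True
    return False
```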
Improve CRAB API. Add a config API which acts as a proxy and does the upload to CouchDB
Auto-increment version number of user datasets by default
We need to, by default at least, automatically increment the dataset version number for user produced datasets to avoid collisions/overwriting.
Split off from #1470. Good discussion there.
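The version-picking step could look like the sketch below, assuming user datasets carry a trailing `-vN` suffix (the naming scheme is an assumption of this sketch, not a decided convention):

```python
import re

def next_dataset_version(base, existing):
    """Pick the next free -vN suffix for a user dataset by scanning the
    names already published, so a resubmission never overwrites an
    existing dataset."""
    taken = [int(m.group(1)) for name in existing
             for m in [re.match(re.escape(base) + r'-v(\d+)$', name)] if m]
    return '%s-v%d' % (base, max(taken, default=0) + 1)
```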
User SiteDB check failing with proper message
When the SiteDB check fails (e.g. because the user is not found in it), a proper error message should be returned. Logs of CRABInterface are below (*) (#)
(*) stderr.log
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Serving on crabas2.lnl.infn.it:8988
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Bus STARTED
INFO:cherrypy.access:[11/Apr/2011:14:33:01] crabas2.lnl.infn.it 137.138.210.236 "GET /crabinterface/crab/info HTTP/1.1" 200 [data: - in 210 out 1842 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
/home/crab/ALL_SETUP/WMAgent/install07X/CRABServer_HEAD/src/python/CRABRESTModel.py:250: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
raise cherrypy.HTTPError(500, ex.message)
DEBUG:cherrypy.error:call to POST with args: ['user'] kwargs: {} resulted in
INFO:cherrypy.access:[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
(#) CRABInterface.log
call to POST with args: ['user'] kwargs: {} resulted in
[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
Implement unit test for CRABRESTModel
The unit test is totally missing and must be implemented
as from Simon's comment on #1254
- There are no unit tests for this; before going much further the test
coverage needs to increase a lot. REST_t.py in
https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/test/python/WMCore_t/WebTools_t
is probably a good place to start
User should not have to know Agent JobIDs
In trying to figure out getLog and getOutput for the CRAB client, I realize that I have to supply a range of JobIDs on the WMAgent side. There are two problems with this:
- as a user I have no idea what these are; I am just interested in my personal range, not how the jobs map on the other end
- because of work queues, etc., I don't think we can guarantee that a user's CRAB task is assigned to sequential job numbers on the Agent side.
improve and update documentation
this is the entry point for the analysis specific documentation:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/WMAgentRunAnalysis
this is already a bit obsolete in some parts and incomplete in others.
Document REST api for job submission/tracking
The REST API the server presents needs to be well defined before significant development begins. Is what we've got from Perugia sufficient?
Add API for job status summary
Need to provide an API that reports the number of jobs in each status (symmetric to ticket #1446 for CRABClient).
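The aggregation itself is straightforward; a sketch of the shape such an API could return (the per-job `state` key is an assumption about the job records, not the actual schema):

```python
from collections import Counter

def status_summary(jobs):
    """Collapse per-job state records into {state: count}, the summary a
    status API would hand back to the client."""
    return dict(Counter(job['state'] for job in jobs))
```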
Evaluate ReqMgr as a frontend to CRABServer
Is ReqMgr a good fit for the request management in CRAB3? Does it align with CRAB/analysis requirements? Is it too much or not enough? What changes would be needed to make it more suitable?
Unit test for getLog API
hostcert.pem and hostkey.pem paths should be configurable in Credential API
Currently the paths of the host key and host certificate are hardcoded in the credential API, set to $HOME/.globus/hostkey.pem and $HOME/.globus/hostcert.pem. They need to be made configurable: passed in via the input dictionary of the credential API, with those paths kept as the defaults.
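A minimal sketch of that resolution, assuming dictionary keys `serverCert`/`serverKey` (the key names are illustrative, not the actual API):

```python
import os

def resolve_host_credentials(args=None):
    """Take hostcert/hostkey paths from the credential API's input dict
    when provided, otherwise fall back to the $HOME/.globus defaults the
    ticket describes."""
    args = args or {}
    globus = os.path.join(os.path.expanduser('~'), '.globus')
    cert = args.get('serverCert', os.path.join(globus, 'hostcert.pem'))
    key = args.get('serverKey', os.path.join(globus, 'hostkey.pem'))
    return cert, key
```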
Meta-discussion on user sandbox distribution issues
Split off from #681.
We have ticket #1151, which I assumed to be a worker node issue. Job lands on a worker node with a list of URLs for sandboxes (user code, LHE files, etc) to fetch and does so, hopefully hitting a squid cache to reduce overall traffic.
We also need a discussion of how the sandbox gets off the UI to the agent or some other "sand box cache" which is what Daniele was getting at in
(to clarify: the basic prototype we are finally going to propose will most probably not ship the user sandboxes to the agent, at least not implementing the "final strategy" which is the matter of discussion here).
Over time we have discussed three possible approaches: HTTPS, Chirp, and GridFTP. In principle these are not mutually exclusive, but of course to exploit more than one of them we would need a generic interface (e.g. what in a past era we called the SE API).
I personally have direct experience with the third option listed above, and I think it makes sense to evaluate the others as well.
That said, I would like to draw attention here and come up with a plan, which IMHO means:
-- defining what we want to support (how local schedulers should be supported)
-- defining which kind of API we want for interacting with what we support (if we have more than one choice)
Implement validation of CMS names
Using the Lexicon:
{{{
from WMCore.Lexicon import cmsname
cmsname('T2_IT_PISA')
}}}
we can catch gross errors. A full validation is also needed, and this should use the list of known sites in SiteDB.
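The two-step check could be combined along these lines (the regex below is only an approximation of WMCore.Lexicon.cmsname, and the site list is assumed to come from SiteDB):

```python
import re

def validate_cms_name(name, known_sites):
    """Two-step validation: a syntactic check on the T<tier>_<CC>_<name>
    pattern first, then membership in the known-site list."""
    if not re.match(r'^T[0-3](_[A-Z]{2})(_[A-Za-z0-9]+)+$', name):
        raise ValueError('malformed CMS site name: %s' % name)
    if name not in known_sites:
        raise ValueError('site not registered in SiteDB: %s' % name)
    return True
```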
Add additional WMSpecs for user generated MC as needed.
We probably need a new spec(s) for the various generators plus user code to support the needed user MC use cases.
Review install documentation (with wider WMCore)
Improve CRAB REST interface
- the /config API needs to be fixed
- an API returning server info, to be used by the client, needs to be implemented
- an API for getting the task status, reporting the percentage of jobs per state, must be exposed
Extend ConfigCache for non-trivial user jobs
AsyncStageout should be a component
The AsyncStageout component should be written in WMAgent component style.
Add stageout parameters to WMSpec
Split off from #638.
Add parameters to the WMSpec for user output, like the remote destination and the dataset name used for publishing. Please add the parameters here and ewv will implement them.
Server side crab -report functionality
Parse the FWJRs from Couch, returning the lists of lumis successfully processed.
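The core merge step could look like this sketch (the FWJR field names `status` and `lumis` are assumptions about the document schema, not the actual Couch layout):

```python
def processed_lumis(fwjrs):
    """Merge run/lumi maps from successful framework job reports into one
    sorted summary: the core of a server-side 'crab -report'."""
    merged = {}
    for fwjr in fwjrs:
        if fwjr.get('status') != 'success':
            continue  # failed jobs contribute no processed lumis
        for run, lumis in fwjr.get('lumis', {}).items():
            merged.setdefault(run, set()).update(lumis)
    return {run: sorted(ls) for run, ls in merged.items()}
```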
Complete porting of FeederManager to wmcore
Remove the usage of ProdCommon WorkQueue from FeederManager.
Initial async stage out implementation
this ticket must be merged with https://svnweb.cern.ch/trac/CMSDMWM/ticket/95
Evaluate WMSpec usage to pass parameters at submission level
Split off from #638.
Once the air plugins are ready it will be possible to understand which pieces of information are needed.
Be able to pass in the WMSpec some non-standard but used parameters for submission and copying, like the MyProxy server used, the role, and the group. Please add others here and ewv will implement them.
documentation improvements
Based on the first round of feedback
CRABInterface communications with UserFileCache
We need CRABInterface API changes to upload the user sandbox the same way the CRABInterface interacts with the config cache.
The UserFileCache component is covered in #1400
Need new component, InputSandboxCache?
We spent a while discussing this today. All of us favor an approach where the user sandbox flow is as follows:
Client uploads the sandbox to ReqMgr/CRABInterface via http/s in the same way that the CMSSW _cfg.py is uploaded. This will be secured by X509 proxy, same as posting to the CRABInterface.
The CRABInterface uploads, via REST interface, the user sandbox to the sandbox cache which responds with an identifier for the sandbox in "the cache". This identifier is returned to the client. When the job is submitted by the client, this identifier is passed along to the various work queues and is included in the job spec.
Here the handling of the config in Couch and the sandbox in a different cache would differ. The user sandbox would not be placed in the job sandbox, but would rather be downloaded directly by the worker node once the job has started. Eventually this wget would go through a squid cache at the remote site and result in smaller network loads.
Presumably the identifier in the cache would be, or would include, a hash of the sandbox contents, so that repeated submission of the same sandbox would result in neither wasted space in the cache nor extra bandwidth between the squid and the cache.
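The identifier computation is the easy part; a sketch, assuming a plain content hash over the tarball (SHA-256 here is an illustrative choice, not a decided one):

```python
import hashlib

def sandbox_identifier(path, chunk=1 << 20):
    """Content hash of a sandbox tarball, read in chunks so large
    sandboxes need not fit in memory. Identical uploads map to the same
    cache identifier, so resubmission costs no extra cache space."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(chunk), b''):
            digest.update(block)
    return digest.hexdigest()
```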
The other option, not favored, was to have the local work queue fetch the sandbox from the cache and include it in the job sandbox. We felt this would waste too much bandwidth between the submitting machine and the remote CE.
In any case the major issue is that we need to find or build "the cache" with a REST interface. Does any such thing exist in our software stack already or do we have the option to use a third party supplied option? This would probably not be the most difficult thing to write ourselves, but we worry about doing it right. On the other hand, something we do ourselves can easily include cleanups, diagnostics for Ops, and perhaps pinning of additional sandboxes for MC generation, etc.
This whole approach has the advantage of allowing staged testing. Initially we would use a static URL as the sandbox without any upload capability but test the WN or workqueue level stuff that will have to be added to allow HTTP accessible sand boxes.
We'd like to have a discussion, both of the sandbox data flow and possible implementations of the cache before opening a couple more tickets to address all the details.