crabserver's Issues
user proxy location
Hi,
The user proxy is needed to transfer the user files with the ftscp command. Where should its location come from: CouchDB, the WMBS DB, or somewhere else?
Cheers
Hassen
Adding checksum and protocol type
...to the server data API (getoutput) and to the log retrieval API.
The checksum is needed to verify that the copy was successful.
The protocol type allows bypassing the BDII in lcg-cp by specifying the protocol explicitly.
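A minimal sketch of the client-side half of that check, assuming the API reports an Adler32 value (the actual field name and checksum algorithm are not decided in this ticket):

```python
import zlib

def adler32_of(path, chunk=1 << 20):
    """Adler32 checksum of a transferred file, computed incrementally so
    large outputs never need to fit in memory. The result would be
    compared against the value returned by getoutput."""
    value = 1  # adler32 seed
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(chunk), b''):
            value = zlib.adler32(block, value)
    # mask to 32 bits for stable, zero-padded hex output
    return format(value & 0xffffffff, '08x')
```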
Review documentation
Things have evolved and changed, so the documentation needs to be reviewed, updated, and completed.
User ProxyManagement
API to download user proxy
Implement API to get logs file
As for the upload to the ConfigCache, the CRAB REST interface should act as a proxy to provide the client with logs (see also #1302). To allow that, an API is needed.
WMStep for FWLite jobs
At some point we need to support FWLite jobs in addition to cmsRun. CRAB2 does not really support this except through writing a custom script, which is too difficult, especially for the target FWLite user.
Not assigning a milestone to this.
Extend WMSpec for non-trivial user jobs
Ok, take the 2nd patch on top of the first.
LoadDummyData class for AsyncStageout
The LoadDummyData class should create files in site sources and transfer them, thus exercising the AsyncStageout machinery.
Identify job/tasks states beyond those provided by WMAgent stack
e.g. things like FTS harvest, /store/results
What components manage these states?
Adding job summary status
...to the task status api.
enable data discovery for analysis workflow
all details here https://svnweb.cern.ch/trac/CMSDMWM/ticket/573
this ticket aims to help me track this issue related to crab3-02 m.
AsyncStageout component
Agent written in WMAgent component style that implements the main machinery of the AsyncStageout (the implementation of this machinery is described in ticket #95). This agent needs to be merged into the AsyncStageout module.
Evaluate virtualenv for sub shell processes
Servers will have processes running in them which may have conflicting environments (this will be most notable in [wiki:CRABServer CRABServer], I think, where each "user" needs to run processes in a separate environment). Using [http://pypi.python.org/pypi/virtualenv virtualenv] or something similar should make dealing with this a lot simpler. It would also further decouple us from CMSSW and grid middleware versions.
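As a stopgap short of full virtualenv, per-process environment isolation can be sketched like this (the variable name `USER_SCRAM_ARCH` is purely illustrative; a virtualenv-based version would additionally point PATH/PYTHONHOME at the env's own interpreter):

```python
import os
import subprocess
import sys

def run_isolated(cmd, overrides):
    """Run a command with per-user overrides layered on a copy of the
    parent environment, so concurrent user processes never see or mutate
    each other's settings."""
    env = os.environ.copy()
    env.update(overrides)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

# Child process sees the override without it leaking into the server env.
proc = run_isolated(
    [sys.executable, '-c', "import os; print(os.environ['USER_SCRAM_ARCH'])"],
    {'USER_SCRAM_ARCH': 'slc5_amd64_gcc434'})
```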
duplicate results got from JSM database in couch
In order to have a self-contained AsyncStageout component, the results read from the JSM databases need to be duplicated into the AsyncStageout database in Couch.
Implement appropriate getoutput for GLite
gLite requires getoutput to be performed even if output, logs, etc. bypass the WMS.
integrate proxy API
The proxy APIs are now ready; their integration at the component level is missing.
ConfigCache for user jobs
BossLite and related components ready for jobs
This includes BossLite job handling with gLite and appropriate getoutput and jobstatus components.
User Input Sandbox
Use of "close" storage element for input
Get the fts server from file
The FTS server should be read from a configuration file rather than a hardcoded dictionary (see ticket #311).
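A possible shape for that lookup, assuming a simple JSON file mapping site names to FTS endpoints (the file layout, key names, and URLs below are assumptions, not the actual config format):

```python
import json

def fts_server_for(site, config_path):
    """Look up the FTS server for a site in a JSON config file instead of
    a hardcoded dictionary, falling back to a default endpoint for sites
    not listed explicitly."""
    with open(config_path) as fh:
        mapping = json.load(fh)
    return mapping.get(site, mapping['defaultServer'])
```

Keeping the mapping on disk means Ops can repoint a site at a new FTS endpoint without a code release.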
Support additional user input files
Make sure we support the CRAB client's additional input files, and that they can either be put in the right place in the user sandbox by the client OR be moved into the right place on the worker node after unpacking.
Cannot manage stop-crabserver; manage start-crab-server
Trying to just stop and start the crab-server appears to shut down the MySQL database as well. Is this intentional? As it is, stop-crabserver must be combined with stop-services, and the start sequence must be combined as well.
CRABServer support for good lumi lists
Several ways to accomplish this depending on how it's implemented:
- upload JSON to ACDC database to create a generic (no specific files/datasets) collection
- Same for URL
- Pass URL on to splitting algo
Make new ticket(s) as need arises
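For reference, a membership test against the kind of JSON lumi mask being uploaded can be sketched as follows (the dict-of-run-to-ranges layout matches the usual CMS good-lumi JSON convention):

```python
def in_lumi_mask(mask, run, lumi):
    """Membership test against a CMS-style JSON lumi mask: a dict mapping
    run number (as a string) to inclusive [first, last] lumi ranges,
    e.g. {"190645": [[10, 110], [115, 120]]}."""
    for first, last in mask.get(str(run), []):
        if first <= lumi <= last:
            return True
    return False
```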
Improve CRAB API. Add a config API which acts as a proxy and does the upload to CouchDB
Auto-increment version number of user datasets by default
We need to, by default at least, automatically increment the dataset version number for user produced datasets to avoid collisions/overwriting.
Split off from #1470. Good discussion there.
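The version-picking step could look like the sketch below, assuming user datasets carry a trailing `-vN` suffix (the naming scheme is an assumption of this sketch, not a decided convention):

```python
import re

def next_dataset_version(base, existing):
    """Pick the next free -vN suffix for a user dataset by scanning the
    names already published, so a resubmission never overwrites an
    existing dataset."""
    taken = [int(m.group(1)) for name in existing
             for m in [re.match(re.escape(base) + r'-v(\d+)$', name)] if m]
    return '%s-v%d' % (base, max(taken, default=0) + 1)
```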
User SiteDB check failing with proper message
When the SiteDB check fails (e.g. because the user is not found in it), a proper error message should be returned. Logs of CRABInterface are below (*) (#)
(*) stderr.log
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Serving on crabas2.lnl.infn.it:8988
INFO:cherrypy.error:[11/Apr/2011:14:27:05] ENGINE Bus STARTED
INFO:cherrypy.access:[11/Apr/2011:14:33:01] crabas2.lnl.infn.it 137.138.210.236 "GET /crabinterface/crab/info HTTP/1.1" 200 [data: - in 210 out 1842 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
/home/crab/ALL_SETUP/WMAgent/install07X/CRABServer_HEAD/src/python/CRABRESTModel.py:250: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
raise cherrypy.HTTPError(500, ex.message)
DEBUG:cherrypy.error:call to POST with args: ['user'] kwargs: {} resulted in
INFO:cherrypy.access:[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
(#) CRABInterface.log
call to POST with args: ['user'] kwargs: {} resulted in
[11/Apr/2011:14:33:09] crabas2.lnl.infn.it 137.138.210.236 "POST /crabinterface/crab/user HTTP/1.1" 500 [data: - in 54 out 184336 us ] [auth: - "" "" ] [ref: "" "CRABClient/v001" ]
Implement unit test for CRABRESTModel
The unit test is totally missing and must be implemented
as from Simon's comment on #1254
- There are no unit tests for this; before going much further the test
coverage needs to increase a lot. REST_t.py in
https://svnweb.cern.ch/trac/CMSDMWM/browser/WMCore/trunk/test/python/WMCore_t/WebTools_t
is probably a good place to start
User should not have to know Agent JobIDs
In trying to figure out getLog and getOutput for the CRAB client, I realize that I have to supply a range of JobIDs on the WMAgent side. There are two problems with this:
- as a user I have no idea what these are; I am just interested in my personal range, not how the jobs map on the other end
- because of work queues, etc., I don't think we can guarantee that a user's CRAB task is assigned to sequential job numbers on the Agent side.
improve and update documentation
this is the entry point for the analysis specific documentation:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/WMAgentRunAnalysis
this is already a bit obsolete in some parts and incomplete in others.
Document REST api for job submission/tracking
The REST API the server presents needs to be well defined before significant development begins. Is what we've got from Perugia sufficient?
Add API for job status summary
Need to provide an API that reports the number of jobs in each status (symmetric to ticket #1446 for CRABClient).
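The aggregation itself is straightforward; a sketch of the shape such an API could return (the per-job `state` key is an assumption about the job records, not the actual schema):

```python
from collections import Counter

def status_summary(jobs):
    """Collapse per-job state records into {state: count}, the summary a
    status API would hand back to the client."""
    return dict(Counter(job['state'] for job in jobs))
```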
Evaluate ReqMgr as a frontend to CRABServer
Is ReqMgr a good fit for the request management in CRAB3? Does it align with CRAB/analysis requirements? Is it too much or not enough? What changes would be needed to make it more suitable?
Unit test for getLog API
hostcert.pem and hostkey.pem paths should be configurable in Credential API
Currently the paths of the host key and host certificate are hardcoded in the credential API, set to $HOME/.globus/hostkey.pem and $HOME/.globus/hostcert.pem. They need to be made configurable: passed in via the input dictionary of the credential API, with those paths kept as the defaults.
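A minimal sketch of that resolution, assuming dictionary keys `serverCert`/`serverKey` (the key names are illustrative, not the actual API):

```python
import os

def resolve_host_credentials(args=None):
    """Take hostcert/hostkey paths from the credential API's input dict
    when provided, otherwise fall back to the $HOME/.globus defaults the
    ticket describes."""
    args = args or {}
    globus = os.path.join(os.path.expanduser('~'), '.globus')
    cert = args.get('serverCert', os.path.join(globus, 'hostcert.pem'))
    key = args.get('serverKey', os.path.join(globus, 'hostkey.pem'))
    return cert, key
```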
Meta-discussion on user sandbox distribution issues
Split off from #681.
We have ticket #1151, which I assumed to be a worker node issue. Job lands on a worker node with a list of URLs for sandboxes (user code, LHE files, etc) to fetch and does so, hopefully hitting a squid cache to reduce overall traffic.
We also need a discussion of how the sandbox gets off the UI to the agent or some other "sand box cache" which is what Daniele was getting at in
(to clarify: the basic prototype we are finally going to propose will most probably not ship the user sandboxes to the agent, at least not implementing the "final strategy" which is the matter of discussion here).
Over time we have discussed three possible approaches: HTTPS, Chirp, and GridFTP. In principle these are not mutually exclusive, but of course to exploit more than one of them we would need a generic interface (e.g. what in a past era we called the SE API).
I personally have direct experience with the third option listed above, and I think it makes sense to evaluate the others as well.
That said, I would like to draw attention here and come up with a plan, which IMHO means:
-- defining what we want to support (how local schedulers should be supported)
-- defining which kind of API we want for interacting with what we support (if we have more than one choice)
Implement validation of CMS names
Using the Lexicon:
{{{
from WMCore.Lexicon import cmsname
cmsname('T2_IT_PISA')
}}}
we can catch gross errors. A full validation is also needed, and this should use the list of known sites in SiteDB.
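The two-step check could be combined along these lines (the regex below is only an approximation of WMCore.Lexicon.cmsname, and the site list is assumed to come from SiteDB):

```python
import re

def validate_cms_name(name, known_sites):
    """Two-step validation: a syntactic check on the T<tier>_<CC>_<name>
    pattern first, then membership in the known-site list."""
    if not re.match(r'^T[0-3](_[A-Z]{2})(_[A-Za-z0-9]+)+$', name):
        raise ValueError('malformed CMS site name: %s' % name)
    if name not in known_sites:
        raise ValueError('site not registered in SiteDB: %s' % name)
    return True
```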
Add additional WMSpecs for user generated MC as needed.
We probably need a new spec(s) for the various generators plus user code to support the needed user MC use cases.
Review install documentation (with wider WMCore)
Improve CRAB REST interface
- the /config API needs to be fixed
- an API returning server info, to be used by the client, needs to be implemented
- an API for getting the task status, reporting the percentage of jobs per state, must be exposed
Extend ConfigCache for non-trivial user jobs
AsyncStageout should be a component
The AsyncStageout component should be written in WMAgent component style.
Add stageout parameters to WMSpec
Split off from #638.
Add parameters to the WMSpec for user output, like the remote destination and the dataset name used for publishing. Please add the parameters here and ewv will implement them.
Server side crab -report functionality
Parse the FWJRs from Couch, returning the lists of lumis successfully processed.
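The core merge step could look like this sketch (the FWJR field names `status` and `lumis` are assumptions about the document schema, not the actual Couch layout):

```python
def processed_lumis(fwjrs):
    """Merge run/lumi maps from successful framework job reports into one
    sorted summary: the core of a server-side 'crab -report'."""
    merged = {}
    for fwjr in fwjrs:
        if fwjr.get('status') != 'success':
            continue  # failed jobs contribute no processed lumis
        for run, lumis in fwjr.get('lumis', {}).items():
            merged.setdefault(run, set()).update(lumis)
    return {run: sorted(ls) for run, ls in merged.items()}
```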
Complete porting of FeederManager to wmcore
Remove the usage of ProdCommon WorkQueue from FeederManager.
Initial async stage out implementation
this ticket must be merged with https://svnweb.cern.ch/trac/CMSDMWM/ticket/95
Evaluate WMSpec usage to pass parameters at submission level
Split off from #638.
Once the air plugins are ready it will be possible to understand which pieces of information are needed.
Be able to pass in the WMSpec some non-standard but used parameters for submission and copying, like the MyProxy server used, the role, and the group. Please add others here and ewv will implement them.
documentation improvements
Based on the first round of feedback
CRABInterface communications with UserFileCache
We need CRABInterface API changes to upload the user sandbox the same way the CRABInterface interacts with the config cache.
The UserFileCache component is covered in #1400
Need new component, InputSandboxCache?
We spent a while discussing this today. All of us favor an approach where the user sandbox flow is as follows:
Client uploads the sandbox to ReqMgr/CRABInterface via http/s in the same way that the CMSSW _cfg.py is uploaded. This will be secured by X509 proxy, same as posting to the CRABInterface.
The CRABInterface uploads, via REST interface, the user sandbox to the sandbox cache which responds with an identifier for the sandbox in "the cache". This identifier is returned to the client. When the job is submitted by the client, this identifier is passed along to the various work queues and is included in the job spec.
Here the handling of the config in Couch and the sandbox in a different cache would differ. The user sandbox would not be placed in the job sandbox, but would rather be downloaded directly by the worker node once the job has started. Eventually this wget would go through a squid cache at the remote site and result in smaller network loads.
Presumably the identifier in the cache would be, or would include, a hash of the sandbox contents, so that repeated submission of the same sandbox would result in neither wasted space in the cache nor extra bandwidth between the squid and the cache.
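The identifier computation is the easy part; a sketch, assuming a plain content hash over the tarball (SHA-256 here is an illustrative choice, not a decided one):

```python
import hashlib

def sandbox_identifier(path, chunk=1 << 20):
    """Content hash of a sandbox tarball, read in chunks so large
    sandboxes need not fit in memory. Identical uploads map to the same
    cache identifier, so resubmission costs no extra cache space."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for block in iter(lambda: fh.read(chunk), b''):
            digest.update(block)
    return digest.hexdigest()
```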
The other option, not favored, was to have the local work queue fetch the sandbox from the cache and include it in the job sandbox. We felt this would waste too much bandwidth between the submitting machine and the remote CE.
In any case the major issue is that we need to find or build "the cache" with a REST interface. Does any such thing exist in our software stack already or do we have the option to use a third party supplied option? This would probably not be the most difficult thing to write ourselves, but we worry about doing it right. On the other hand, something we do ourselves can easily include cleanups, diagnostics for Ops, and perhaps pinning of additional sandboxes for MC generation, etc.
This whole approach has the advantage of allowing staged testing. Initially we would use a static URL as the sandbox without any upload capability but test the WN or workqueue level stuff that will have to be added to allow HTTP accessible sand boxes.
We'd like to have a discussion, both of the sandbox data flow and possible implementations of the cache before opening a couple more tickets to address all the details.