Comments (4)
Comment by slacapra on Wed Jun 11 11:05:36 2008
General idea on this item:
The user provides a crab.cfg template and a new configuration file (let's call it multicrab.cfg).
In multicrab.cfg the user specifies the settings that are specific to each dataset, namely:
datasetName (obviously),
splitting keys (it might be of interest to run on the whole Higgs sample but on just 10^6 QCD background events),
storage_path (which can be "general_storage_path"+"dataset_name", so a "name" can be enough),
and these parameters are to be defined for each dataset the user wants to access.
There is also a "general" section, which specifies the crab.cfg template plus some common settings, such as "general_storage_path" or similar. Settings that are not modified by dataset-dependent keys are supposed to stay in the crab.cfg template.
E.g. SE will go into the crab.cfg template, while SE_path should go into multicrab.cfg, because the actual value (to be passed to crab) is modified by multicrab.
Then multicrab is run, with exactly the same set of commands as crab. multicrab creates N instances of crab, passes the proper configuration to each, and runs it. It might possibly be multithreaded (not at the beginning).
I would avoid a command line call
crab -cfg firstDataset
crab -cfg secondDataset
in favour of instantiating N objects of the crab class, but let's see how difficult it is. Maybe we can start with CLI-like calls.
The output of the command should be short enough that it makes sense to return it to the user when, say, 10 datasets are accessed: e.g. multicrab -status should actually return crab -report (and not crab -status, which is too long for this kind of thing).
One problem that I haven't (yet) figured out is how to format multicrab.cfg: ideally I would like to have the same syntax as crab.cfg (or similar), but since we need one section for each dataset, this can be a problem, because the names of the sections have to be known beforehand and cannot be user-defined.
[dataset1]
datasetpath = ...
total_number_of_events = -1
event_per_job = 100000
[dataset2]
datasetpath = ...
total_number_of_events = 10000000
event_per_job = 100000
Another possibility is to have a single line for each dataset
dataset1, dspath1, -1, 10000
dataset2, dspath2, 1000000, 10000
but this can be error-prone.
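As a rough illustration of the section-per-dataset layout, here is a minimal sketch using Python's standard configparser module, which discovers section names when the file is read, so user-chosen names do not have to be known in advance. This is not the multicrab code: the function name read_datasets is made up for the example, and the "..." dataset paths are left elided exactly as in the proposal.

```python
import configparser

# Sample text copied from the proposal above; the "..." values stay elided.
MULTICRAB_CFG = """
[dataset1]
datasetpath = ...
total_number_of_events = -1
event_per_job = 100000

[dataset2]
datasetpath = ...
total_number_of_events = 10000000
event_per_job = 100000
"""


def read_datasets(text):
    """Return {section_name: {key: value}} for every section found in text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # sections() lists whatever names the user wrote, in file order.
    return {name: dict(parser[name]) for name in parser.sections()}


datasets = read_datasets(MULTICRAB_CFG)
for name, params in datasets.items():
    print(name, params["total_number_of_events"], params["event_per_job"])
```

All values come back as strings, so a real tool would still need to convert and validate them.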
That's it, for the time being.
from crab2.
Comment by slacapra on Fri Jul 18 13:28:43 2008
Dear all,
I've just committed a prototype of "multicrab", an enhanced crab functionality that allows multiple crab tasks to be handled in parallel.
We discussed the idea some time ago: basically you have your code and you want to run it on N different datasets (say signal and various backgrounds). It is always the same stuff, with the same crab.cfg apart from a few changes such as datasetpath, total_number_of_events and so on.
I've created two new scripts,
multicrab and multicrab.py (wrapper and Python class, respectively), which can be used together with a new multicrab.cfg file to achieve this. multicrab reads its cfg file, uses a crab.cfg template, modifies what is dataset-specific and then issues N instances of CRAB.
The commands for multicrab are exactly the same as those of crab, since they are just passed to the latter. The only thing to do is to have a crab.cfg (as usual) plus a multicrab.cfg, which has the following syntax (with comments):
<start of multicrab.cfg>
# Section for multicrab: for now it has just the template crab.cfg, but more
# keys might appear in the future
[MULTICRAB]
cfg=crab.cfg
# Section in common for all datasets.
# General idea: you define all the parameters in the template (crab.cfg),
# but you might want to change the template values for all datasets.
# The general syntax is that you first put the crab.cfg [SECTION] and
# then the crab.cfg [key], with a "." in between, exactly as you would do
# to pass keys to CRAB via the command line.
# Any parameter can be set or changed.
[COMMON]
EDG.se_black_list=es
# Add a section for each dataset you want to access (or, more precisely,
# any task you want to create).
# The name of the section will be used as USER.ui_working_dir, so the
# stuff for this dataset will be found in the Wmunu/ directory.
# Any name is allowed (except MULTICRAB and COMMON) and any number of
# sections can be added.
# The syntax for the parameters is the one described before:
# SECTION.key=value
# and any parameter can be changed. Otherwise, the template one will be
# used.
[Wmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=10
CMSSW.number_of_jobs = 5
[Zmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=-1
CMSSW.number_of_jobs = 5
<end multicrab.cfg>
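The layering described in the file comments (template values, overridden by [COMMON], overridden by each task section) could be sketched roughly as follows. This is an illustration only, not the code in multicrab.py: the template contents are hypothetical stand-ins for a real crab.cfg, and the names resolve_tasks and RESERVED are made up for the example.

```python
import configparser

# Hypothetical template standing in for a real crab.cfg.
TEMPLATE_CFG = """
[CMSSW]
datasetpath = none
total_number_of_events = -1
number_of_jobs = 1
[EDG]
se_black_list = none
"""

# The multicrab.cfg shown above, without its comments.
MULTICRAB_CFG = """
[MULTICRAB]
cfg=crab.cfg
[COMMON]
EDG.se_black_list=es
[Wmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=10
CMSSW.number_of_jobs = 5
[Zmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=-1
CMSSW.number_of_jobs = 5
"""

RESERVED = {"MULTICRAB", "COMMON"}  # section names that are not tasks


def _parse(text):
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "CMSSW.datasetpath"
    parser.read_string(text)
    return parser


def resolve_tasks(template_text, multicrab_text):
    """Return {task_name: {"SECTION.key": value}} with task > COMMON > template."""
    template = _parse(template_text)
    multi = _parse(multicrab_text)
    # Flatten the template to "SECTION.key" so all three layers share one key space.
    base = {f"{sec}.{key}": val
            for sec in template.sections()
            for key, val in template[sec].items()}
    common = dict(multi["COMMON"]) if multi.has_section("COMMON") else {}
    tasks = {}
    for name in multi.sections():
        if name in RESERVED:
            continue
        # Later dicts win: template first, then COMMON, then the task section.
        settings = {**base, **common, **dict(multi[name])}
        settings["USER.ui_working_dir"] = name  # section name is the working dir
        tasks[name] = settings
    return tasks


tasks = resolve_tasks(TEMPLATE_CFG, MULTICRAB_CFG)
for name, settings in tasks.items():
    print(name, settings["CMSSW.total_number_of_events"], settings["EDG.se_black_list"])
```

Merging plain dictionaries makes the precedence explicit in a single line, which is easy to check against the intended order.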
What is missing:
o) for sure we want to somehow add the name of the section to the se_storage_path, so that the output for a given dataset goes to a meaningful place.
o) the order of parameter setting should be checked: namely with this priority (e.g.) DatasetSpecific/Common/Template (not 100% sure that is the case now)
o) all the code is in a single file (multicrab.py); maybe it's worth splitting
o) documentation
o) feedback :-)
You use multicrab as you would use crab:
multicrab -create
multicrab -submit
multicrab -status
multicrab -get
but, since you also have the individual crab tasks, you might want to give a command to a given task via crab, in the usual way (e.g.):
crab -kill all -c Wmunu
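To make the "commands are just passed to crab" idea concrete, here is a minimal sketch that expands one multicrab command into one crab command line per task. It only builds and prints the command lines, it does not run them. The helper name per_task_commands is hypothetical, and it assumes that "-c <task directory>" selects a task for other commands the same way it does in the -kill example; commands such as -create would need the per-task configuration as well, which is not covered here.

```python
import shlex


def per_task_commands(multicrab_args, task_names):
    """Return one crab argv list per task, e.g. ['crab', '-status', '-c', 'Wmunu']."""
    return [["crab", *multicrab_args, "-c", name] for name in task_names]


commands = per_task_commands(["-status"], ["Wmunu", "Zmunu"])
for argv in commands:
    # shlex.join quotes arguments safely for display (Python 3.8+).
    print(shlex.join(argv))
```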
That should be all, any comment or feedback is welcome!
Cheers,
Stefano
Comment by slacapra on Mon Jul 21 06:12:42 2008
o) for sure we want to add somehow the name of the section to the se_storage_path, so that the output for a given dataset goes to a meaningful place.
This is now done and in CVS.
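The thread does not show how the committed code builds the path; one simple way to append the section name to a shared storage path, given purely as an assumption-laden sketch, is below. The function name task_storage_path and the example base path are hypothetical.

```python
import posixpath


def task_storage_path(general_storage_path, task_name):
    """Append the task (section) name to a shared storage path."""
    return posixpath.join(general_storage_path, task_name)


# Hypothetical base path, for illustration only.
print(task_storage_path("/hypothetical/user/area", "Wmunu"))
```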
Closed by spiga on Sun Nov 16 10:56:03 2008