
Comments (4)

ericvaandering commented on July 28, 2024

Comment by slacapra on Wed Jun 11 11:05:36 2008

General idea on this item:

The user provides a crab.cfg template and a new configuration file (let's call it multicrab.cfg).

In multicrab.cfg the user will specify things which are specific to each dataset, namely:
datasetName (obviously),
splitting keys (it might be of interest to run on all of the Higgs sample but on just 10^6 events of QCD background)
storage_path (which can be "general_storage_path"+"dataset_name"), so a "name" can be enough

and these parameters are to be defined for each dataset the user wants to access.

There is also a "general" section in which to specify the crab.cfg template plus some common settings, such as "general_storage_path" or similar. Settings which are not to be modified by dataset-dependent keys are supposed to stay in the crab.cfg template.
E.g. SE will go into the crab.cfg template, while SE_path should go into multicrab.cfg, because the actual value (to be passed to crab) is modified by multicrab.

Then multicrab is run, with exactly the same set of commands as crab. multicrab creates N instances of crab, passes the proper configuration to each, and runs them. It might possibly be multithreaded (not at the beginning).
I would avoid command-line calls such as
crab -cfg firstDataset
crab -cfg secondDataset
in favour of instantiating N objects of the crab class, but let's see how difficult it is. Maybe we can start with CLI-like calls.

The output of the command should be short enough that it makes sense to return it to the user when, say, 10 datasets are accessed: e.g. multicrab -status should actually return crab -report (and not crab -status, which is too long for this kind of thing).

One problem that I haven't (yet) figured out is how to format multicrab.cfg: ideally I would like to have the same syntax as crab.cfg (or similar), but since we need one section for each dataset, this can be a problem, because the names of the sections must be known beforehand and cannot be user-defined.
[dataset1]
datasetpath = ...
total_number_of_events = -1
event_per_job = 100000
[dataset2]
datasetpath = ...
total_number_of_events = 10000000
event_per_job = 100000

Another possibility is to have a single line for each dataset
dataset1, dspath1, -1, 10000
dataset2, dspath2, 1000000, 10000
but this can be error-prone.
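
On the worry that section names have to be known beforehand: here is a minimal sketch, assuming the file is read with Python's standard configparser module (an assumption; nothing above says which parser crab uses). The parser can list whatever sections it finds at run time, so user-chosen names look workable. The dataset paths are made-up placeholders.

```python
import configparser

# Hypothetical multicrab.cfg content; the section names are chosen by the user
# and the dataset paths are placeholders, not real datasets.
SAMPLE = """
[dataset1]
datasetpath = /placeholder/path/one
total_number_of_events = -1
event_per_job = 100000

[dataset2]
datasetpath = /placeholder/path/two
total_number_of_events = 10000000
event_per_job = 100000
"""


def read_datasets(text):
    """Return {section_name: {key: value}} for every section found in text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # sections() reports the names actually present, so none are hard-coded.
    return {name: dict(parser[name]) for name in parser.sections()}


datasets = read_datasets(SAMPLE)
for name, params in datasets.items():
    print(name, params["datasetpath"], params["total_number_of_events"])
```

All values come back as strings, so numeric keys such as total_number_of_events would still need converting before use.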

That's it, for the time being.

from crab2.

ericvaandering commented on July 28, 2024

Comment by slacapra on Fri Jul 18 13:28:43 2008

Dear all,
I've just committed a prototype of "multicrab", namely an enhanced crab functionality that allows handling multiple crab tasks in parallel.

We discussed the idea some time ago: basically you have your code and you want to run it on N different datasets (say signal and various backgrounds). Always the same stuff, with the same crab.cfg except for some changes such as datasetpath, total_number_of_events and so on.

I've created a new script, multicrab, and multicrab.py (wrapper and Python class, respectively), which can be used together with a new multicrab.cfg file to achieve this. multicrab reads its cfg file, uses a crab.cfg template, modifies what is dataset-specific and then issues N instances of CRAB.
The commands for multicrab are exactly the same as those of crab, since they are just passed to the latter. The only thing to do is to have a crab.cfg (as usual) plus a multicrab.cfg, which has the following syntax (with comments):

<start of multicrab.cfg>
# section for multicrab: now has just the template crab.cfg, but more
# keys might appear in the future
[MULTICRAB]
cfg=crab.cfg

# Section in common for all datasets
# General idea: you define all the parameters in the template (crab.cfg),
# but you might want to change the template values for all datasets.
# The general syntax is that you first put the crab.cfg [SECTION] and
# then the crab.cfg [key], with a "." in between, exactly as you would do
# to pass keys to CRAB via the command line.
# Any parameter can be set or changed
[COMMON]
EDG.se_black_list=es

# Add a section for each dataset you want to access (or, more precisely,
# any task you want to create).
# The name of the section will be used as USER.ui_working_dir, so the
# stuff for this dataset will be found in the Wmunu/ directory.
# Any name is allowed (except MULTICRAB and COMMON) and any number of
# sections can be added
# The syntax for the parameters is the one described before:
# SECTION.key=value
# and any parameter can be changed. Otherwise, the template one will be
# used.
[Wmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=10
CMSSW.number_of_jobs = 5

[Zmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=-1
CMSSW.number_of_jobs = 5
<end multicrab.cfg>
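
To make the SECTION.key convention concrete, here is a rough sketch of how such a file could be turned into per-task overrides. It is not the code in multicrab.py: the use of configparser and the helper names split_key and task_overrides are assumptions made for illustration only.

```python
import configparser

# A shortened copy of the multicrab.cfg example above.
SAMPLE = """
[MULTICRAB]
cfg=crab.cfg

[COMMON]
EDG.se_black_list=es

[Wmunu]
CMSSW.datasetpath=/Zmumu/CSA08_CSA08_S156_v1/GEN-SIM-RECO
CMSSW.total_number_of_events=10
CMSSW.number_of_jobs = 5
"""

RESERVED = ("MULTICRAB", "COMMON")  # section names that are not tasks


def split_key(dotted):
    """Split 'CMSSW.datasetpath' into ('CMSSW', 'datasetpath')."""
    section, key = dotted.split(".", 1)
    return section, key


def task_overrides(text):
    """Return {task_name: {(crab_section, key): value}} (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; configparser lowercases by default
    parser.read_string(text)
    common = dict(parser["COMMON"]) if parser.has_section("COMMON") else {}
    tasks = {}
    for name in parser.sections():
        if name in RESERVED:
            continue
        merged = dict(common)              # COMMON values first...
        merged.update(dict(parser[name]))  # ...then the task's own values win
        merged["USER.ui_working_dir"] = name  # section name is the working dir
        tasks[name] = {split_key(k): v for k, v in merged.items()}
    return tasks


tasks = task_overrides(SAMPLE)
for (section, key), value in sorted(tasks["Wmunu"].items()):
    print(f"[{section}] {key} = {value}")
```

Keeping the override keys as (section, key) pairs makes it straightforward to apply them to the parsed crab.cfg template afterwards.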

What is missing:
o) for sure we want to add the name of the section to the se_storage_path somehow, so that the output for a given dataset goes to a meaningful place.
o) the order of parameter setting should be checked: namely with this priority (e.g.) DatasetSpecific/Common/Template (not 100% sure that is the case now)
o) all the code is in a single file (multicrab.py); maybe it's worth splitting
o) documentation
o) feedback :-)
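
The intended priority (DatasetSpecific over Common over Template) can be sketched in a few lines. This only illustrates the lookup order, not what the prototype currently does; the function name effective_settings and the template values are invented for the example.

```python
from collections import ChainMap


def effective_settings(template, common, dataset):
    """Resolve each key with priority dataset > common > template."""
    # ChainMap searches its mappings left to right and returns the first hit.
    return dict(ChainMap(dataset, common, template))


# Made-up template values, standing in for what crab.cfg would provide.
template = {
    "CMSSW.total_number_of_events": "-1",
    "CMSSW.number_of_jobs": "1",
    "EDG.se_black_list": "",
}
common = {"EDG.se_black_list": "es"}
dataset = {"CMSSW.total_number_of_events": "10", "CMSSW.number_of_jobs": "5"}

settings = effective_settings(template, common, dataset)
for key in sorted(settings):
    print(key, "=", settings[key])
```

A small check like this, run against the real parsing code, would be one way to confirm the ordering the list above is unsure about.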

You use multicrab as you would use crab:
multicrab -create
multicrab -submit
multicrab -status
multicrab -get

but, since you also have the different crab tasks, you might want to give a command to a given task via crab, in the usual way (e.g.):

crab -kill all -c Wmunu
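
As an illustration of the "CLI-like calls" approach, the sketch below builds one crab command line per task, following the -c <task> pattern of the example above. Whether the prototype forwards commands this way internally is not stated here, so treat it as an assumption; crab_commands is a hypothetical helper, and it only builds the argument lists without running anything.

```python
def crab_commands(action, task_names):
    """Build one argument list per task, e.g. ['crab', '-status', '-c', 'Wmunu'].

    action is the list of crab arguments to forward, such as ['-status']
    or ['-kill', 'all'].
    """
    return [["crab", *action, "-c", name] for name in task_names]


commands = crab_commands(["-kill", "all"], ["Wmunu", "Zmunu"])
for argv in commands:
    print(" ".join(argv))
```

Each list could then be handed to subprocess.run, or replaced later by direct instantiation of the crab class without changing the callers.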

That should be all, any comment or feedback is welcome!

Cheers,
Stefano


ericvaandering commented on July 28, 2024

Comment by slacapra on Mon Jul 21 06:12:42 2008

o) for sure we want to add somehow the name of the section to the se_storage_path, so that the output for a given dataset goes to a meaningful place.

This is now done and in CVS.


ericvaandering commented on July 28, 2024

Closed by spiga on Sun Nov 16 10:56:03 2008
