Giter Club home page Giter Club logo

hgg-coffea's Introduction

hgg-coffea

Repository for building a columnar analysis for Hgg using nanoAOD samples

Structure

Example worfkflow for drell-yan studies is included.

Each workflow can be a separate "processor" file, creating the mapping from NanoAOD to the histograms we need. Workflow processors can be passed to the runner.py script along with the fileset these should run over. Multiple executors can be chosen (for now iterative - one by one, uproot/futures - multiprocessing and dask-slurm).

To run the example, run:

python runner.py --workflow dystudies

Example plots can be found in make_some_plots.ipynb though we might want to make that more automatic in the end.

Requirements

Coffea installation with Miniconda

For installing Miniconda, see also https://hackmd.io/GkiNxag0TUmHnnCiqdND1Q#Local-or-remote

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Run and follow instructions on screen
bash Miniconda3-latest-Linux-x86_64.sh

NOTE: always make sure that conda, python, and pip point to local Miniconda installation (which conda etc.).

You can either use the default environmentbase or create a new one:

# create new environment with python 3.7, e.g. environment of name `coffea`
conda create --name coffea python=3.7
# activate environment `coffea`
conda activate coffea

Install coffea, xrootd, and more:

conda install -c conda-forge coffea # pip install git+https://github.com/CoffeaTeam/coffea.git # for bleeding edge
conda install -c conda-forge xrootd
conda install -c conda-forge ca-certificates
conda install -c conda-forge ca-policy-lcg
conda install -c conda-forge dask-jobqueue
conda install -c anaconda bokeh 
conda install -c conda-forge 'fsspec>=0.3.3'
conda install dask

Other installation options for coffea

See https://coffeateam.github.io/coffea/installation.html

Running jupyter remotely

See also https://hackmd.io/GkiNxag0TUmHnnCiqdND1Q#Remote-jupyter

  1. On your local machine, edit .ssh/config:
Host lxplus*
  HostName lxplus7.cern.ch
  User <your-user-name>
  ForwardX11 yes
  ForwardAgent yes
  ForwardX11Trusted yes
Host *_f
  LocalForward localhost:8800 localhost:8800
  ExitOnForwardFailure yes
  1. Connect to remote with ssh lxplus_f
  2. Start a jupyter notebook:
jupyter notebook --ip=127.0.0.1 --port 8800 --no-browser
  1. URL for notebook will be printed, copy and open in local browser

Scale-out (Sites)

Scale out can be notoriously tricky between different sites. Coffea's integration of slurm and dask makes this quite a bit easier and for some sites the ``native'' implementation is sufficient, e.g Condor@DESY. However, some sites have certain restrictions for various reasons, in particular Condor @CERN and @FNAL.

Condor@FNAL (CMSLPC)

Follow setup instructions at https://github.com/CoffeaTeam/lpcjobqueue, run them from within the hgg-coffea directory that you have checked out. After starting the singularity container run a test with

python runner.py --meta Era2017_legacy_v1.json --wf dystudies -d root://cmseos.fnal.gov//store/user/$USER/hgg_test/ --executor dask/lpc --samples filefetcher/dystudies.json --chunk=100000 --max=5

Condor@CERN (lxplus)

Only one port is available per node, so its possible one has to try different nodes until hitting one with 8786 being open. Other than that, no additional configurations should be necessary.

python runner.py --wf dystudies --executor dask/lxplus

Coffea-casa (Nebraska AF)

Coffea-casa is a JupyterHub based analysis-facility hosted at Nebraska. For more information and setup instuctions see https://coffea-casa.readthedocs.io/en/latest/cc_user.html

After setting up and checking out this repository (either via the online terminal or git widget utility run with

python runner.py --wf dystudies --executor dask/casa

Authentication is handled automatically via login auth token instead of a proxy. File paths need to replace xrootd redirector with "xcache", runner.py does this automatically.

Canned example on cms-lpc scaleout

python runner.py --meta Era2017_legacy_v1.json --wf dystudies -d root://cmseos.fnal.gov//store/user/$USER/hgg_test/ --executor dask/lpc --samples filefetcher/dystudies.json --chunk=100000 --scaleout 1 --limit 2 --only DYJets-M50 --ts DummyTagger1 DummyTagger2

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.