IDAP 200 Gbps with ATLAS PHYSLITE

Targeting analysis at 200 Gbps with ATLAS PHYSLITE. This repository is very much a work in progress.

ATLAS has not released OpenData, so there is no Analysis Grand Challenge (AGC) that we can copy and try to run. As a result, the main purpose of this repository is to serve as a facilities test:

  • Run from PHYSLITE
  • Load 200 Gbps off of the PHYSLITE samples
  • Push all that data downstream to Dask (or similar) workers.
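
To judge whether a run reaches the target rate, the aggregate throughput can be computed from the number of bytes read and the wall-clock time. The helper below is a minimal sketch; the function name and the example numbers are illustrative and are not taken from this repository.

```python
def throughput_gbps(bytes_read: int, seconds: float) -> float:
    """Return the average data rate in gigabits per second (1 Gb = 1e9 bits)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_read * 8 / seconds / 1e9


# Illustrative numbers only: 250 GB read in 10 s corresponds to 200 Gbps.
rate = throughput_gbps(250 * 10**9, 10.0)
print(f"{rate:.1f} Gbps")
```

In other words, 200 Gbps corresponds to 25 GB of data read per second, summed over all workers.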

We have a loosely tracked set of lessons learned.

Description of files

  • materialize_branches.ipynb: reads a list of branches; can be distributed with Dask (used for benchmarking)

Usage

When run in a Jupyter notebook on the UChicago Analysis Facility (AF), no package installs are required.

The requirements.txt file should allow this to run on a bare-bones machine (modulo the location of files, etc.).

If you use the ServiceX version, you have to pin dask_awkward==2024.2.0. Later versions have a bug that has not been fixed yet.
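
A quick way to confirm the pin in a given environment is to compare the installed version against the required one. The following is a sketch using only the standard library; the helper names `pin_status` and `installed_version` are hypothetical, and the package is looked up under its distribution name `dask-awkward`, which pip treats as equivalent to `dask_awkward`.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "2024.2.0"  # pin stated in this README


def pin_status(installed, required=REQUIRED):
    """Describe whether the installed version matches the required pin."""
    if installed is None:
        return "missing"
    return "ok" if installed == required else "mismatch"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints "ok", "mismatch", or "missing" depending on the environment.
print(pin_status(installed_version("dask-awkward")))
```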

Input file details

The folder input_files contains the list of input containers / files and related metadata plus scripts to produce these.

In total:

  • number of files: 219,029
  • size: 191.073 TB
  • number of events: 23,347,787,104

The folder contains the following files:

  • input_files/find_containers.py: queries Rucio for a list of containers, given a list of (hardcoded) DSIDs

  • input_files/container_list.txt: list of containers to run over

  • input_files/produce_container_metadata.py: queries metadata for each container: number of files, number of events, and size

  • input_files/container_metadata.json: output of input_files/produce_container_metadata.py with container metadata

  • input_files/get_file_list.py: for a given dataset, creates a txt file listing file access paths that include the appropriate XCache. The same kind of output can be obtained with:

    export SITE_NAME=AF_200
    rucio list-file-replicas mc20_13TeV:mc20_13TeV.364126.Sherpa_221_NNPDF30NNLO_Zee_MAXHTPTV500_1000.deriv.DAOD_PHYSLITE.e5299_s3681_r13145_p6026 --protocol root --pfns --rses MWT2_UC_LOCALGROUPDISK
    
  • input_files/containers_to_files.py: processes the list of containers into a list of files per container with hardcoded XCache instances and writes the result to input_files/file_lists/*.
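
The totals quoted earlier in this section give a sense of scale. The arithmetic below is a rough sketch that assumes 1 TB = 10^12 bytes and treats the full on-disk size as the data volume; reading only selected branches would change the numbers.

```python
# Totals quoted in this README.
N_FILES = 219_029
SIZE_BYTES = 191.073e12  # assumes 1 TB = 1e12 bytes
N_EVENTS = 23_347_787_104
TARGET_GBPS = 200

avg_file_mb = SIZE_BYTES / N_FILES / 1e6
avg_event_kb = SIZE_BYTES / N_EVENTS / 1e3
bytes_per_second = TARGET_GBPS * 1e9 / 8  # 200 Gbps = 25 GB/s
hours_for_full_read = SIZE_BYTES / bytes_per_second / 3600

print(f"average file size:  {avg_file_mb:.0f} MB")
print(f"average event size: {avg_event_kb:.1f} kB")
print(f"full read at {TARGET_GBPS} Gbps: {hours_for_full_read:.1f} h")
```

Under these assumptions, the average file is roughly 870 MB, the average event roughly 8 kB, and a single pass over everything at the target rate would take about two hours.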

Branch list determination

Branches to be read are determined with a 2018 data file.

  • input_files/size_per_branch.ipynb: produces a breakdown of branch sizes for a given file
  • input_files/branch_sizes.json: output of notebook above
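
A per-branch size breakdown can be used to rank branches by how much they contribute to the total. The sketch below assumes a flat JSON mapping from branch name to size; that layout, the branch names, and the helper `largest_branches` are hypothetical, and the real branch_sizes.json may be structured differently.

```python
import json

# Hypothetical excerpt; the real branch_sizes.json may be laid out differently.
example_json = """
{
  "BranchA": 1200,
  "BranchB": 300,
  "BranchC": 4500
}
"""


def largest_branches(sizes, n):
    """Return the n largest (name, size) pairs, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


sizes = json.loads(example_json)
total = sum(sizes.values())
for name, size in largest_branches(sizes, 2):
    print(f"{name}: {size} ({size / total:.1%} of total)")
```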

Acknowledgements

This work was supported by the U.S. National Science Foundation (NSF) cooperative agreements OAC-1836650 and PHY-2323298 (IRIS-HEP).

Contributors

alexander-held, gordonwatts, ivukotic
