root_pandas

root_pandas is a convenience package built around the root_numpy library. It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics.

It's modeled closely after the existing pandas API for reading and writing HDF5 files. This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa.

On top of that, root_pandas offers several features that go beyond what pandas offers with read_hdf and to_hdf.

These include:

  • Specifying multiple input filenames, in which case they are read as if they were one continuous file.
  • Selecting several columns at once using * globbing and {A,B} shell patterns.
  • Flattening source files that contain arrays: each array element is stored in its own DataFrame row, and any scalar variables are duplicated across those rows.

Reading ROOT files

This is how you can read the contents of a ROOT file into a DataFrame:

from root_pandas import read_root

df = read_root('myfile.root')

If there are several ROOT trees in the input file, you have to specify the tree key:

df = read_root('myfile.root', 'mykey')

Specific columns can be selected like this:

df = read_root('myfile.root', columns=['variable1', 'variable2'])

You can also use * in the column names to read in any matching branch:

df = read_root('myfile.root', columns=['variable*'])

In addition, you can use shell brace patterns as in

df = read_root('myfile.root', columns=['variable{1,2}'])

You can also use * and {a,b} simultaneously, and several times per string.
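One way to picture the pattern syntax is as shell-style brace expansion followed by glob matching against the branch names. The following is a minimal sketch of that idea using only the standard library. The helpers `expand_braces` and `select_columns` are hypothetical and not part of root_pandas, and the library's actual matching may differ in details (nested braces, for instance, are not handled here).

```python
import fnmatch
import itertools
import re


def expand_braces(pattern):
    """Expand every {a,b,...} group in `pattern` into all combinations."""
    # re.split with a capture group alternates literal text and group contents:
    # 'variable{1,2}' -> ['variable', '1,2', '']
    parts = re.split(r'\{([^{}]*)\}', pattern)
    choices = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            choices.append([part])            # literal text
        else:
            choices.append(part.split(','))   # alternatives inside braces
    return [''.join(combo) for combo in itertools.product(*choices)]


def select_columns(branches, patterns):
    """Return the branch names matched by any pattern, without duplicates."""
    selected = []
    for pattern in patterns:
        for expanded in expand_braces(pattern):
            for name in branches:
                if fnmatch.fnmatchcase(name, expanded) and name not in selected:
                    selected.append(name)
    return selected


# Made-up branch names, for illustration only.
branches = ['variable1', 'variable2', 'variable3', 'muon_pt', 'muon_eta']
print(select_columns(branches, ['variable{1,2}', 'muon_*']))
```

Because each brace group is expanded independently, a pattern such as `{a,b}_{x,y}` yields four candidate names in this sketch.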

Working with stored arrays can be a bit inconvenient in pandas. root_pandas makes it easy to flatten your input data, providing you with a DataFrame containing only scalars:

df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=True)

Assuming the first entry of the arrayvariable column in the ROOT file contains the array [1, 2, 3], flattening expands it into three rows, each containing one of the array elements. The scalar values of that entry are duplicated across the three rows. The automatically created __array_index column records the index that each array element had in its array before flattening.
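The effect of flattening can be illustrated without ROOT at all. The sketch below mimics the behaviour described above on plain Python dictionaries. `flatten_rows` is a hypothetical helper, not root_pandas code, and how it treats empty arrays (the row simply disappears) is an assumption of this sketch rather than a statement about the library.

```python
def flatten_rows(rows, array_column):
    """Expand each row into one row per element of row[array_column]."""
    flat = []
    for row in rows:
        for index, element in enumerate(row[array_column]):
            new_row = dict(row)                 # scalars are copied unchanged
            new_row[array_column] = element     # the array becomes one scalar
            new_row['__array_index'] = index    # position in the original array
            flat.append(new_row)
    return flat


# One entry holding the array [1, 2, 3] and a scalar (made-up values).
rows = [{'arrayvariable': [1, 2, 3], 'othervariable': 0.5}]
flat = flatten_rows(rows, 'arrayvariable')
for row in flat:
    print(row)
```

Each of the three resulting rows carries the same `othervariable` value, one array element, and that element's original index.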

There is also support for working with files that don't fit into memory: If the chunksize parameter is specified, read_root returns an iterator that yields DataFrames, each containing up to chunksize rows.

for df in read_root('bigfile.root', chunksize=100000):
    # process df here, for example by inspecting its size
    print(len(df))
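When reading in chunks, any result that depends on the whole file has to be accumulated across iterations. Here is a minimal sketch of that pattern, computing the mean of one column. `chunked_mean` is a hypothetical helper, and the column name in the usage comment is made up. It accepts any iterable of chunks that support `chunk[column]`, so it runs here on plain dictionaries and should work the same way on the DataFrames yielded by read_root.

```python
def chunked_mean(chunks, column):
    """Mean of `column` over all chunks, holding only one chunk at a time."""
    total = 0.0
    count = 0
    for chunk in chunks:
        values = chunk[column]
        total += sum(values)    # works for lists and for pandas Series
        count += len(values)
    if count == 0:
        raise ValueError('no rows were read')
    return total / count


# Stand-in chunks; with root_pandas one would pass, for example,
# read_root('bigfile.root', chunksize=100000) instead (not executed here).
fake_chunks = [{'variable1': [1.0, 2.0]}, {'variable1': [3.0]}]
print(chunked_mean(fake_chunks, 'variable1'))
```

With real DataFrames, `values.sum()` would likely be faster than the built-in `sum`. The built-in is used here only so the sketch runs without pandas.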

You can also combine any of the above options at the same time.

Writing ROOT files

root_pandas patches the pandas DataFrame to have a to_root method that allows you to save it into a ROOT file:

df.to_root('out.root', key='mytree')

You can also call the to_root function and specify the DataFrame as the first argument:

from root_pandas import to_root
to_root(df, 'out.root', key='mytree')

By default, to_root erases the existing contents of the file. Use mode='a' to append:

for df in read_root('bigfile.root', chunksize=100000):
    df.to_root('out.root', mode='a')

When doing this to stream data from one ROOT file into another, remember to delete the output file first (for example with os.remove). Otherwise, each run of your program appends more data to it.
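A small guard at the start of the program keeps repeated runs from piling data onto an old output file. The sketch below uses only the standard library. `reset_output` is a hypothetical helper name, and the root_pandas calls appear only in comments because they require the package and an input file.

```python
import os


def reset_output(path):
    """Delete `path` if it exists; return True if a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Nothing to clean up on the first run.
        return False


# Intended usage with root_pandas (not executed here):
#
#     reset_output('out.root')
#     for df in read_root('bigfile.root', chunksize=100000):
#         df.to_root('out.root', mode='a')
```

Catching FileNotFoundError avoids a separate existence check before the removal.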
