
xarray interface for bpch files

License: MIT License


xbpch's Introduction

xbpch: xarray interface for bpch files

xbpch is a simple utility for reading the proprietary binary punch format (bpch) outputs used in versions of GEOS-Chem earlier than v11-02. The utility allows a user to load this data into an xarray- and dask-powered workflow without necessarily pre-processing the data using GAMAP or IDL.

This package is maintained as part of a broader, community effort to tackle big data problems in geoscience.

What's the Deal?

The contemporary scientific Python software stack provides free, powerful tools for nearly all of your data processing, analysis, and visualization needs. These tools are well supported by a large community of heavily invested users and developers from academia, government, and industry. They are also developed (mostly) as part of community-based, open-source, and user-driven projects.

For nearly any application you might have in the geosciences, you can start using this powerful, free software stack today with minimal friction. However, one friction point that has tripped up adoption by GEOS-Chem users is that it is difficult to work with legacy bpch-format diagnostics files. xbpch solves this problem by providing a convenient and performant way to read these files into a modern Python-based analysis or workflow.

Furthermore, xbpch is 100% future-proof. In two years, when your GEOS-Chem simulations are writing NetCDF diagnostics, you won't need to change more than a single line of code in any of your scripts using xbpch. All you'll need to do is swap out xbpch's function for reading data and instead defer to its parent package (xarray). It will literally take less than 10 keystrokes to make this change in your code. Plus, you'll be backwards compatible with any legacy output you need to analyze.
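As a rough illustration of that swap, the sketch below picks a reader from the file name. The helper names `reader_for` and `open_output` are hypothetical, and the rule "legacy files have bpch in their name" is an assumption that will not hold for every run directory (files named like trac_avg.* would need a different test). The readers are imported only when called, so the file-name logic can be tried without either package installed.

```python
from pathlib import Path


def reader_for(path):
    """Return the name of the reader a given output file would need."""
    # Assumption: legacy output carries "bpch" in its file name
    # (e.g. "my_geos_chem_output.bpch" or "ctm.bpch.20000120").
    return "xbpch" if "bpch" in Path(path).name.lower() else "xarray"


def open_output(path, **kwargs):
    """Open GEOS-Chem output with whichever reader matches the file."""
    if reader_for(path) == "xbpch":
        from xbpch import open_bpchdataset  # legacy binary punch files
        return open_bpchdataset(str(path), **kwargs)
    import xarray as xr  # NetCDF diagnostics
    return xr.open_dataset(path, **kwargs)
```

Either branch returns an xarray.Dataset, so the analysis code that follows the call should not need to know which format was on disk.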

So give xbpch a try, and let me know what issues you run into! If we solve them once today, they'll be solved in perpetuity, which means more time for you to do science and less time to worry about processing data.

Installation

Requirements

xbpch is only intended for use with Python 3, although with some modifications it would likely work with Python 2.7 (Pull Requests are welcome!). As the package description implies, it requires up-to-date copies of xarray (>= version 0.9) and dask (>= version 0.14). The best way to install these packages is by using the conda package management system, or the Anaconda Python distribution.

To install xbpch and its dependencies using conda, execute from a terminal:

$ conda install -c conda-forge xbpch xarray dask

Alternatively, you can install xbpch from PyPI:

$ pip install xbpch

You can also install xbpch from its source. To do this, you can either clone the source directory and manually install:

$ git clone https://github.com/darothen/xbpch.git
$ cd xbpch
$ python setup.py install

or, you can install via pip directly from git:

$ pip install git+https://github.com/darothen/xbpch.git

Please note that if you locally clone the repository from GitHub but do not explicitly install the package using setup.py, the file xbpch/version.py will not get written properly and you will not be able to use the package. We strongly recommend you install the package using traditional techniques to ensure that all dependencies are properly added to your environment.

Quick Start

If you're already familiar with loading and manipulating data with xarray, then it's easy to dive right into xbpch. Navigate to a directory on disk which contains your .bpch output, as well as tracerinfo.dat and diaginfo.dat, and execute from a Python interpreter:

from xbpch import open_bpchdataset
fn = "my_geos_chem_output.bpch"
ds = open_bpchdataset(fn)

After a few seconds (depending on your hard-drive speed) you should be able to interact with ds just as you would any xarray.Dataset object.
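If you would rather not change directories, a small helper can locate the two metadata files next to the bpch file and pass them explicitly. This is only a sketch: `sidecar_kwargs` is a hypothetical name, and it assumes tracerinfo.dat and diaginfo.dat sit in the same directory as the output. The `tracerinfo_file` and `diaginfo_file` keywords are the ones that appear in the `open_bpchdataset` calls quoted in the issues further down.

```python
from pathlib import Path


def sidecar_kwargs(bpch_path):
    """Build keyword arguments pointing at the metadata files next to a bpch file."""
    run_dir = Path(bpch_path).resolve().parent
    kwargs = {}
    for key, name in (("tracerinfo_file", "tracerinfo.dat"),
                      ("diaginfo_file", "diaginfo.dat")):
        candidate = run_dir / name
        if not candidate.is_file():
            # Fail early with a readable message instead of a parser error later.
            raise FileNotFoundError("expected {} in {}".format(name, run_dir))
        kwargs[key] = str(candidate)
    return kwargs
```

A call would then look like `ds = open_bpchdataset(fn, **sidecar_kwargs(fn))`.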

Caveats and Future Notes

xbpch should work for most simple workflows, especially if you need a quick-and-dirty way to ingest legacy GEOS-Chem output. It is not tested against the majority of output grids, including data for the Hg model or nested models. Grid information (at least for the vertical) is hard-coded and may not be accurate for the most recent versions of GEOS-Chem.

Most importantly, xbpch does not yet solve the problem of manually scanning bpch files before producing a dataset on disk. Because the bpch format does not encode metadata about what its contents actually are, we must manually process this from any output file we wish to load. For the time being, we do not short-circuit this process because we cannot necessarily predict file position offsets in the bpch files we read. In the future, I hope to come up with an elegant solution for solving this problem.

Acknowledgments

This utility packages together a few pre-existing toolkits which have been floating around the Python-GEOS-Chem community. In particular, I would like to acknowledge the following pieces of software which I have built this utility around:

PyGChem by Benoît Bovy
gchem by Gerrit Kuhlmann

Furthermore, the strategies used to load and process binary output on disk through xarray's DataStore API are heavily inspired by Ryan Abernathey's package xmitgcm.

License

This work is licensed under a permissive MIT License. I acknowledge important contributions from Benoît Bovy, Gerrit Kuhlmann, and Christoph Keller in the form of prior work which helped create the foundation for this package.

Contact

Daniel Rothenberg - [email protected]

xbpch's People

Forkers

lizziel eklovens tsherwen sdeastham jinlx yumengch

xbpch's Issues

Missing version file causes import error

I had to add xbpch/version.py and define the version within it for my xbpch fork to import properly. Could you add this file or make an update that similarly avoids the issue? My install from conda is fine, so this should only impact users who fork the repo.

Lizzie (GEOS-Chem Support Team)
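One defensive pattern for this kind of problem is to treat the generated version module as optional. The sketch below is not xbpch's actual code: `package_version` is a hypothetical helper, and the attribute name `__version__` is an assumption about what the generated file defines.

```python
import importlib


def package_version(package, default="unknown"):
    """Return `<package>.version.__version__`, or `default` if it is unavailable."""
    try:
        module = importlib.import_module(package + ".version")
    except ImportError:
        # version.py is written at install time, so a bare clone may lack it.
        return default
    return getattr(module, "__version__", default)
```

With a fallback like this, a missing version file degrades to a placeholder string instead of breaking the import.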

Upload to PyPI

Error opening BPCH output from ND51b diagnostic

GC version 13.3.4
ND51b output diagnostic over all lat/long grids.

I'm trying to read in a file using:

import xarray as xr
import xbpch as xb

ds = xb.open_bpchdataset(filename=bpchfile, tracerinfo_file=tinfo_file, diaginfo_file=dinfo_file)

and get the following error, which seems to be related to whether xbpch/uff.py can find the prefix or suffix of a line with a specified format (fmt='i'). I tried toggling my endian input settings, but the default of big endian is the only one that doesn't give me an "OSError: ...This can happen if 'endian' is incorrect" error.

I'm not sure whether this is an error in xbpch or whether this simply isn't one of the file types xbpch is supposed to be able to read. Any help would be appreciated!

The error report:

File "c:\users\jhask\onedrive\documents\python\my_functions\gcpy\examples\bpch_to_nc\untitled0.py", line 54, in bpch_to_netcdf
    ds = xb.open_bpchdataset(filename=bpchfile,

  File "C:\Users\jhask\anaconda3\envs\sci\lib\site-packages\xbpch\core.py", line 77, in open_bpchdataset
    store = BPCHDataStore(

  File "C:\Users\jhask\anaconda3\envs\sci\lib\site-packages\xbpch\core.py", line 279, in __init__
    self._bpch._read_header()

  File "C:\Users\jhask\anaconda3\envs\sci\lib\site-packages\xbpch\bpch.py", line 227, in _read_header
    line = self.fp.readline('20sffii')

  File "C:\Users\jhask\anaconda3\envs\sci\lib\site-packages\xbpch\uff.py", line 84, in readline
    prefix_size = self._fix()

  File "C:\Users\jhask\anaconda3\envs\sci\lib\site-packages\xbpch\uff.py", line 77, in _fix
    raise EOFError

EOFError
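For readers unfamiliar with the prefix and suffix mentioned above: Fortran unformatted sequential files usually wrap each record in two 4-byte length markers, one before and one after the payload. The sketch below is a standalone illustration of that layout and is not the code in xbpch/uff.py. `read_record` is a hypothetical helper, 4-byte markers are assumed, and the '20sffii' layout is taken from the traceback only as an example payload.

```python
import io
import struct


def read_record(fp, endian=">"):
    """Read one Fortran unformatted sequential record and return its payload."""
    prefix = fp.read(4)
    if len(prefix) < 4:
        # Nothing (or too little) left to hold a length marker.
        raise EOFError("no record marker left in file")
    (size,) = struct.unpack(endian + "i", prefix)
    if size < 0:
        raise OSError("negative record length; 'endian' may be incorrect")
    payload = fp.read(size)
    suffix = fp.read(4)
    if len(payload) < size or len(suffix) < 4:
        raise EOFError("record is truncated")
    if struct.unpack(endian + "i", suffix)[0] != size:
        raise OSError("record markers disagree; 'endian' may be incorrect")
    return payload


# Build a one-record, big-endian file whose payload uses the '20sffii' layout.
fmt = ">20sffii"
payload = struct.pack(fmt, b"demo".ljust(20), 4.0, 5.0, 1, 1)
marker = struct.pack(">i", len(payload))
fp = io.BytesIO(marker + payload + marker)

fields = struct.unpack(fmt, read_record(fp))
```

Under this layout an EOFError means the reader ran out of bytes while looking for a marker, which can happen with a truncated file or with a file whose records are not where the reader expects them.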

Regression Testing

Definitely need some basic testing system to ensure against regressions, especially as GEOS-Chem evolves.

Missing names in tracerinfo.dat cause type mismatch error upon read

Hi Daniel,

In order to successfully read GEOS-Chem 12.0.3 binary benchmark output I had to manually insert short and long names into tracerinfo.dat for ND21 tracers # 3 and # 60. Otherwise I got a type mismatch error since the 2nd entry for those tracers in the absence of the name columns was float (concentration) rather than string.

These names were blank in tracerinfo.dat due to a bug in GEOS-Chem which I just pushed a fix for in dev/12.1.0. However, anyone trying to use xbpch with ND21 output from recent older versions will run into this problem. For backwards compatibility I recommend adding some handling to check if the 2nd entry is float upon read of tracerinfo.dat. If the type is float then the names are missing in the file and dummy names should be assigned, allowing the read to proceed.

Lizzie (GEOS-Chem Support Team)
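The suggested handling could be sketched as follows. This is not how xbpch parses the file: `parse_tracer_line` is a hypothetical helper, the column widths follow the File Format comment block of tracerinfo.dat (A8, 1X, A30, E10.0, I3, I9), and the `TRACER_<number>` placeholder is an invented naming scheme. Instead of testing whether the second entry is a float, the sketch checks whether the fixed-width name column is blank.

```python
def parse_tracer_line(line, name_width=8):
    """Split one fixed-width tracerinfo.dat row, naming blank tracers."""
    start = name_width + 1                      # skip the 1-character spacer
    name = line[:name_width].strip()
    full_name = line[start:start + 30].strip()
    molwt = float(line[start + 30:start + 40])
    carbons = int(line[start + 40:start + 43])
    tracer = int(line[start + 43:start + 52])
    if not name:
        # Hypothetical placeholder for rows that were written without names.
        name = "TRACER_{}".format(tracer)
        full_name = full_name or name
    return {"name": name, "full_name": full_name, "molwt": molwt,
            "C": carbons, "tracer": tracer}


# Rows built with the same widths as the file format description.
row_fmt = "{:<8s} {:<30s}{:10.3E}{:3d}{:9d}{:10.3E} {}"
named = parse_tracer_line(row_fmt.format("NO", "NO tracer", 0.03, 1, 1, 1e9, "ppbv"))
blank = parse_tracer_line(row_fmt.format("", "", 0.0, 1, 3, 1.0, "unitless"))
```

Because the columns are read by position, a blank name can never shift the numeric fields into the name slots.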

UserWarning: Duplicate names specified.

Hi --

FYI I get the following warnings when using xb.open_bpchdataset:

import os
import os.path as osp
import xarray as xr
import numpy as np
import xbpch as xb
ds = xb.open_bpchdataset('trac_avg.geosfp_4x5_POPs.201607010000.mp')

/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/bmy/miniconda/envs/bmy/lib/python3.6/site-packages/pandas/io/parsers.py:710: UserWarning: Duplicate names specified. This will raise an error in the future.
return _read(filepath_or_buffer, kwds)
/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/bmy/miniconda/envs/bmy/lib/python3.6/site-packages/xbpch/core.py:91: FutureWarning: iteration over an xarray.Dataset will change in xarray v0.11 to only include data variables, not coordinates. Iterate over the Dataset.variables property instead to preserve existing behavior in a forwards compatible manner.
for v in ds:

This might be generated in a dependent package like pandas but I'm not sure where.

I am using these versions:

xarray                    0.10.8                   py36_0    defaults
xbpch                     0.3.3                    py36_0    conda-forge
pandas                    0.23.4           py36h04863e7_0    defaults

Do you know of a workaround for this (or a package that needs to be updated)?

Thanks
Bob Y.
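Until the underlying cause is addressed, the first warning can be hidden locally with the standard library's warnings filters. This is only a stopgap sketch: `quiet_duplicate_names` is a hypothetical helper, and it suppresses the message without changing how the file is parsed.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_duplicate_names():
    """Temporarily hide pandas' 'Duplicate names specified' UserWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="Duplicate names specified",
            category=UserWarning,
        )
        yield
```

It would wrap the call that triggers the warning, for example `with quiet_duplicate_names(): ds = xb.open_bpchdataset(...)`. Other warnings, including the FutureWarning shown above, still come through.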

Incompatibility with xarray 0.12.0

Similar to issue #8: xbpch v0.3.4 seems to be incompatible with xarray v0.12.0

>>> import xbpch
Traceback (most recent call last):                                                                      
   File "~/miniconda/lib/python3.7/site-packages/xbpch/__init__.py", line 7       
      from . bpch import BPCHFile                                                                         
   File "~/miniconda/lib/python3.7/site-packages/xbpch/bpch.py", line 11 
      from xarray.core.pycompat import OrderedDict                                                      
ImportError: cannot import name 'OrderedDict' from 'xarray.core.pycompat' 
(~/miniconda/lib/python3.7/site-packages/xarray/core/pycompat.py)
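A common way to tolerate this kind of removal is to fall back to the standard library, which has provided OrderedDict on every supported Python 3 release. The snippet below is a sketch of that pattern and not necessarily the fix adopted in xbpch.

```python
try:
    # Import path used by xbpch v0.3.4; reported missing under xarray v0.12.0.
    from xarray.core.pycompat import OrderedDict
except ImportError:
    # The standard-library class is always available on Python 3.
    from collections import OrderedDict

attrs = OrderedDict([("unit", "ppbv"), ("scale", 1.0e9)])
```

On Python 3 the simpler option is to import from collections unconditionally.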

CI automation of regression tests

Following completion of #1.

xbpch couldn't read all variables in a dataset

When I read ctm.bpch with Matlab, I get several data variables.

However, when using xbpch in Jupyter, I only get the DXYP variable.
Here is my code:
import sys
import xbpch
ctm = 'C:\\Users\\dell\\Desktop\\code\\ctm.bpch.20000120'
ctm_data = xbpch.open_bpchdataset(ctm)
print(ctm_data)
Output in Python:

My question: where are the other three variables (BXHGHT-$) when using the xbpch package, and how can I fix this issue?

Error opening ND51 bpch file from GEOS-Chem v12.2.1 using xbpch 0.3.4

I am trying to use xbpch to open ND51 satellite diagnostic files from GEOS-Chem v12.2.1. I can import xbpch successfully, but when I use the package to open an ND51 file, I receive an error:

import xbpch
fn = "ts_satellite.20160101.bpch"
ds = xbpch.open_bpchdataset(fn)

KeyError Traceback (most recent call last)
~/miniconda/envs/geo/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2655 try:
-> 2656 return self._engine.get_loc(key)
2657 except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'name'

During handling of the above exception, another exception occurred:

KeyError Traceback (most recent call last)
in
1 import xbpch
2 fn = "ts_satellite.20160101.bpch"
----> 3 ds = xbpch.open_bpchdataset(fn)

~/miniconda/envs/geo/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
78 tracerinfo_file=tracerinfo_file,
79 diaginfo_file=diaginfo_file, endian=endian,
---> 80 use_mmap=memmap, dask_delayed=dask
81 )
82 ds = xr.Dataset.load_store(store)

~/miniconda/envs/geo/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
277
278 # Parse the binary file and prepare to add variables to the DataStore
--> 279 self._bpch._read_var_data()
280
281 # Create storage dicts for variables and attributes, to be used later

~/miniconda/envs/geo/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
312 var_attr['unit'] = unit
313
--> 314 vname = diag['name']
315 fullname = category_name.strip() + "_" + vname
316

~/miniconda/envs/geo/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2925 if self.columns.nlevels > 1:
2926 return self._getitem_multilevel(key)
-> 2927 indexer = self.columns.get_loc(key)
2928 if is_integer(indexer):
2929 indexer = [indexer]

~/miniconda/envs/geo/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2656 return self._engine.get_loc(key)
2657 except KeyError:
-> 2658 return self._engine.get_loc(self._maybe_cast_indexer(key))
2659 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2660 if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'name'

I'm quite new to Python, so I'm not entirely sure what this error means. Maybe someone here can help?

@JiaweiZhuang I thought I would copy you, because this occurs when I use the preinstalled xbpch package from the geo environment on the Amazon Cloud GEOS-Chem tutorial AMI.

tracerinfo.dat not correctly read in GEOS-Chem v12.2.0

Hello Daniel,

When using xbpch to read (successfully completed) GEOS-Chem output from version v12.2.0, I see the following message:

/work/home/ts551/.conda/envs/py_3_7_master/lib/python3.7/site-packages/xbpch/util/diaginfo.py:104: UserWarning: At least one row in /work/home/ts551/GC/rundirs/geosfp_4x5_tropchem.v12.2.0.test_JNIT.3600s.J025/tracerinfo.dat wasn't decoded correctly; we strongly recommend you manually check that file to see that all tracers are properly recorded.

This is due to a change in the formatting of tracerinfo.dat between v12.1.1 and v12.2.0 (see below). Would it be possible for xbpch to read both formats?

---- change in tracerinfo.dat

in v12.1.1

#==============================================================================
# tracerinfo.dat: Created by GEOS-CHEM at 2019/01/15 10:31
#
# ****** CUSTOMIZED FOR NOx-Ox-Hydrocarbon-Aerosol SIMULATION *****
#
# This file contains name weight and index information about GEOS-CHEM
# tracers.  It is read by routine "ctm_tracerinfo.pro" of the GAMAP package.
#
# File Format:
# -----------------------------------------------------------------------------
# NAME     (A8   )  Tracer name (up to 8 chars)
#  --      (1X   )  1-character spacer
# FULLNAME (A30  )  Full tracer name (up to 30 chars)
# MOLWT    (E10.0)  Molecular weight (kg/mole)
# C        (I3   )  For HC's: # moles C/moles tracer; otherwise set=1
# TRACER   (I9   )  Tracer number (up to 9 digits)
# SCALE    (E10.3)  Standard scale factor to convert to unit given below
#  --      (1X   )  1-character spacer
# UNIT     (A40  )  Unit string
#
#==============================================================================
# GEOS-CHEM tracers [ppbv]
#==============================================================================
NO       NO tracer                      3.000E-02  1        1 1.000E+09 ppbv
O3       O3 tracer                      4.800E-02  1        2 1.000E+09 ppbv
PAN      PAN tracer                     1.210E-01  1        3 1.000E+09 ppbv
CO       CO tracer                      2.800E-02  1        4 1.000E+09 ppbv
...
AVGW     Mixing ratio of H2O vapor      0.000E+00  1    24003 1.000E+00 v/v
AIRNUMDE Dry air number density         0.000E+00  1    24004 1.000E+00 molec air/cm3
T        Temperature                    0.000E+00  1    24005 1.000E+00 K
PMID     Pressure at average pressure l 0.000E+00  1    24006 1.000E+00 hPa
PEDGE    Pressure at grid box lower edg 0.000E+00  1    24007 1.000E+00 hPa
RH       Relative humidity              0.000E+00  1    24008 1.000E+00 %
#==============================================================================
# ND69 diagnostic quantities
#==============================================================================
DXYP     Grid box surface area          0.000E+00  1    25001 1.000E+00 m2

 

in v12.2.0

#==============================================================================
# tracerinfo.dat: Created by GEOS-CHEM at 2019/02/20 11:28
#
# ****** CUSTOMIZED FOR NOx-Ox-Hydrocarbon-Aerosol SIMULATION *****
#
# This file contains name weight and index information about GEOS-CHEM
# tracers.  It is read by routine "ctm_tracerinfo.pro" of the GAMAP package.
#
# File Format:
# -----------------------------------------------------------------------------
# NAME     (A31  )  Tracer name (up to 31 chars)
#  --      (1X   )  1-character spacer
# FULLNAME (A30  )  Full tracer name (up to 30 chars)
# MOLWT    (E10.0)  Molecular weight (kg/mole)
# C        (I3   )  For HC's: # moles C/moles tracer; otherwise set=1
# TRACER   (I9   )  Tracer number (up to 9 digits)
# SCALE    (E10.3)  Standard scale factor to convert to unit given below
#  --      (1X   )  1-character spacer
# UNIT     (A40  )  Unit string
#
#==============================================================================
# GEOS-CHEM tracers [ppbv]
#==============================================================================
NO                              NO tracer                      3.000E-02  1        1 1.000E+09 ppbv
O3                              O3 tracer                      4.800E-02  1        2 1.000E+09 ppbv
PAN                             PAN tracer                     1.210E-01  1        3 1.000E+09 ppbv
CO                              CO tracer                      2.800E-02  1        4 1.000E+09 ppbv
ALK4                            ALK4 tracer                    1.200E-02  4        5 1.000E+09 ppbC
...
AVGW                            Mixing ratio of H2O vapor      0.000E+00  1    24003 1.000E+00 v/v
AIRNUMDEN                       Dry air number density         0.000E+00  1    24004 1.000E+00 molec air/cm3
T                               Temperature                    0.000E+00  1    24005 1.000E+00 K
PMID                            Pressure at average pressure l 0.000E+00  1    24006 1.000E+00 hPa
PEDGE                           Pressure at grid box lower edg 0.000E+00  1    24007 1.000E+00 hPa
RH                              Relative humidity              0.000E+00  1    24008 1.000E+00 %
#==============================================================================
# ND69 diagnostic quantities
#==============================================================================
DXYP                            Grid box surface area          0.000E+00  1    25001 1.000E+00 m2
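Since both versions of the file declare the NAME width in their header comments, one way to support both formats would be to read that width before slicing the rows. The sketch below illustrates the idea only. `detect_name_width` and `split_row` are hypothetical helpers, not part of xbpch, and the sample rows are generated with the widths given in the two headers above.

```python
import re

# Matches header comments such as "# NAME     (A31  )  Tracer name ...".
NAME_SPEC = re.compile(r"^#\s*NAME\s*\(A(\d+)\s*\)")


def detect_name_width(lines, default=8):
    """Return the NAME column width declared in the file's header comments."""
    for line in lines:
        match = NAME_SPEC.match(line)
        if match:
            return int(match.group(1))
    return default


def split_row(line, name_width):
    """Return (name, full_name) from one fixed-width data row."""
    start = name_width + 1          # 1-character spacer after NAME
    return line[:name_width].strip(), line[start:start + 30].strip()


old_file = ["# NAME     (A8   )  Tracer name (up to 8 chars)",
            "{:<8s} {:<30s}{:10.3E}".format("AIRNUMDE", "Dry air number density", 0.0)]
new_file = ["# NAME     (A31  )  Tracer name (up to 31 chars)",
            "{:<31s} {:<30s}{:10.3E}".format("AIRNUMDEN", "Dry air number density", 0.0)]

old_row = split_row(old_file[1], detect_name_width(old_file))
new_row = split_row(new_file[1], detect_name_width(new_file))
```

Reading the width from the header avoids hard-coding either layout, at the cost of trusting that the comment block matches the data rows.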

xbpch.open_bpchdataset()---error: unpack requires a buffer of 36 bytes

There is a problem processing some GEOS-Chem output: 'error: unpack requires a buffer of 36 bytes'.
There are two output files: https://pan.baidu.com/s/19DhAyIpq_UDmH1Civ5v8FQ.
It works for 'ts_satellite.20080522.bpch' but fails with this error for 'ts_satellite.20080523.bpch'.

Deprecating PyGChem and redirecting users to xbpch

@darothen do you think that I can safely add a redirection note to xbpch in PyGChem's readme file?

The goal of PyGChem was to provide the minimal set of features needed to connect GEOS-Chem to the Python scientific stack. However:

PyGChem is not being maintained anymore, while xbpch is.
PyGChem provides the necessary tools to read the bpch format into iris cubes, while xbpch works with xarray. I think both are useful, but given that xarray will eventually implement conversion of xarray.DataArray objects to iris.cube.Cube objects, we probably don't need both features.
For GEOS-Chem's netCDF outputs, xarray can be used directly.
Besides reading files, the goal with PyGChem was also to provide a Python interface to HEMCO. But that feature has never fully worked, and it perhaps makes more sense for such a feature to be maintained by the GC support team.

Upload to conda-forge

conflicting sizes for dimension in reading bpch

Hi there,
When I used the open_bpchdataset function to open the ND51 output files of GEOS-Chem (v12.5.0), I got an error about a dimension mismatch.

The spatial resolution of my simulation is 2x2.5. Is there anything wrong with my tracerinfo.dat?

Thanks in advance!

Best,
rainbow

Incompatibility with Pandas 1.1.0

I ran into an error reading a GEOS-Chem bpch file after upgrading Pandas to 1.1.0. I traced the problem to this section of code in util/diaginfo.py:

    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

Before that code is executed tracer_df correctly stores tracerinfo.dat content:

       name                       full_name    molwt  C  tracer         scale  \
0      ACET                     ACET tracer  0.01200  3       1  1.000000e+09   
1      ACTA                     ACTA tracer  0.06006  1       2  1.000000e+09   
2      AERI                     AERI tracer  0.12690  1       3  1.000000e+09   
3      ALD2                     ALD2 tracer  0.01200  2       4  1.000000e+09

Following the apply, all rows are for ACET which is wrong:

      name    full_name  molwt  C  tracer         scale  unit  hydrocarbon  \
0     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
1     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
2     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False   
3     ACET  ACET tracer  0.012  3       1  1.000000e+09  ppbC        False

I was able to fix it by initializing the new column 'hydrocarbon' prior to the apply:

    tracer_df['hydrocarbon']=False                                                                     
    tracer_df = (
        tracer_df
            .apply(_assign_hydrocarbon, axis=1)
            .assign(chemical=lambda x: x['molwt'].astype(bool))
    )

I downgraded my pandas version to 0.25.1 and verified this was not necessary in that older version, but it is in the new version.

Here is the error message I got to help others find this issue via search:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2888             try:
-> 2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: 'name'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
<ipython-input-43-2d7bd5a6928f> in <module>
      2     ds = xb.open_bpchdataset(filename=gcc_bpch,
      3                              tracerinfo_file=tracerinfo_f,
----> 4                              diaginfo_file=diaginfo_f)
      5 except FileNotFoundError:
      6     print('Could not find file {}'.format(bpchfile))
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
     79         tracerinfo_file=tracerinfo_file,
     80         diaginfo_file=diaginfo_file, endian=endian,
---> 81         use_mmap=memmap, dask_delayed=dask
     82     )
     83     ds = xr.Dataset.load_store(store)
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
    278 
    279         # Parse the binary file and prepare to add variables to the DataStore
--> 280         self._bpch._read_var_data()
    281 
    282         # Create storage dicts for variables and attributes, to be used later
/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
    312             var_attr['unit'] = unit
    313 
--> 314             vname = diag['name']
    315             fullname = category_name.strip() + "_" + vname
    316 
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2897             if self.columns.nlevels > 1:
   2898                 return self._getitem_multilevel(key)
-> 2899             indexer = self.columns.get_loc(key)
   2900             if is_integer(indexer):
   2901                 indexer = [indexer]
/anaconda3/envs/gcpy/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889                 return self._engine.get_loc(casted_key)
   2890             except KeyError as err:
-> 2891                 raise KeyError(key) from err
   2892 
   2893         if tolerance is not None:
KeyError: 'name'

Incompatibility with xarray 0.10.2

xbpch 0.3.0 is not compatible with xarray 0.10.2:

In [1]: import xbpch
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-659176cbe09c> in <module>()
----> 1 import xbpch

~/Research/Computing/miniconda3/envs/geo/lib/python3.6/site-packages/xbpch/__init__.py in <module>()
      5     pass
      6
----> 7 from . bpch import BPCHFile
      8 from . core import open_bpchdataset, open_mfbpchdataset

~/Research/Computing/miniconda3/envs/geo/lib/python3.6/site-packages/xbpch/bpch.py in <module>()
     12
     13 from . uff import FortranFile
---> 14 from . util import cf
     15 from . util.diaginfo import get_diaginfo, get_tracerinfo
     16

~/Research/Computing/miniconda3/envs/geo/lib/python3.6/site-packages/xbpch/util/cf.py in <module>()
     12
     13 from xarray.core.variable import as_variable, Variable
---> 14 from xarray.core.indexing import LazilyIndexedArray
     15 from xarray.conventions import MaskedAndScaledArray
     16

ImportError: cannot import name 'LazilyIndexedArray'

Those import commands no longer work in xarray 0.10.2:

In [2]: from xarray.core.indexing import LazilyIndexedArray
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-247a23d0f2bf> in <module>()
----> 1 from xarray.core.indexing import LazilyIndexedArray

ImportError: cannot import name 'LazilyIndexedArray'

In [3]: from xarray.conventions import MaskedAndScaledArray
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-3-6f922ed71e58> in <module>()
----> 1 from xarray.conventions import MaskedAndScaledArray

ImportError: cannot import name 'MaskedAndScaledArray'

These imports reach into xarray.core, so the change is not documented in the xarray changelog.

Xarray Error while implementing bpch_to_nc script of xbpch: conflicting sizes for dimension

Hello everyone,

I am getting a "conflicting sizes for dimension 'time'" error when running the bpch_to_nc script of xbpch. If anyone knows the solution for this error, please let me know.

In the terminal after executing the command I get:

Reading in file(s)...
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py:710: UserWarning: Duplicate names specified. This will raise an error in the future.
return _read(filepath_or_buffer, kwds)
Traceback (most recent call last):
File "/usr/local/bin/bpch_to_nc", line 4, in <module>
__import__('pkg_resources').run_script('xbpch==0.3.3', 'bpch_to_nc')
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 657, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1437, in run_script
exec(code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/xbpch-0.3.3.dist-info/scripts/bpch_to_nc", line 77, in <module>
ds = open_bpchdataset(args.bpch_files[0], **open_kws)
File "/usr/local/lib/python2.7/dist-packages/xbpch/core.py", line 82, in open_bpchdataset
ds = xr.Dataset.load_store(store)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/dataset.py", line 397, in load_store
obj = cls(variables, attrs=attributes)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/dataset.py", line 365, in __init__
self._set_init_vars_and_dims(data_vars, coords, compat)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/dataset.py", line 383, in _set_init_vars_and_dims
data_vars, coords, compat=compat)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/merge.py", line 365, in merge_data_and_coords
indexes=indexes)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/merge.py", line 443, in merge_core
dims = calculate_dimensions(variables)
File "/usr/local/lib/python2.7/dist-packages/xarray/core/dataset.py", line 109, in calculate_dimensions
(dim, size, k, dims[dim], last_used[dim]))
ValueError: conflicting sizes for dimension 'time': length 38 on 'IJ-SOA-$_POA' and length 19 on 'PL-SUL=$_SO2dms'

Thank you

-Pritanjali Shende
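The ValueError says that two variables carry different numbers of time records (38 and 19), so they cannot share one 'time' dimension in a single Dataset. A first diagnostic step is to group variables by their time length. The sketch below uses only the two lengths reported in the error; `group_by_time_length` is a hypothetical helper, and how the lengths are obtained for a full file is left open.

```python
from collections import defaultdict


def group_by_time_length(time_lengths):
    """Group variable names by the number of time steps each one holds."""
    groups = defaultdict(list)
    for name, n_times in time_lengths.items():
        groups[n_times].append(name)
    return dict(groups)


# Lengths copied from the error message above.
lengths = {"IJ-SOA-$_POA": 38, "PL-SUL=$_SO2dms": 19}
groups = group_by_time_length(lengths)
```

Each group could then be loaded on its own, for example through the `fields` or `categories` arguments that appear in the `open_bpchdataset` signature in the tracebacks on this page. Whether that avoids the error for a particular file is untested here.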

Separating data from one bpch into another bpch by time before converting it into netcdf

Hi @darothen ,

I am trying to separate the data in one bpch file (two days of data) into two bpch files (one day of data each) by time, and then convert the bpch files into netCDF format. I was wondering if xbpch has a function similar to bpch_sep in GAMAP to achieve this. One option I can think of is to use xbpch to convert the bpch file into netCDF first and then separate it by time using cdo, but that is not very straightforward. Do you have any insight on this?

Cheers,
Lixu
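One possible route is to skip the intermediate bpch files and split by day on the xarray side before writing NetCDF. The sketch below is untested against real GEOS-Chem output and rests on several assumptions: that the dataset returned by `open_bpchdataset` has a decoded datetime `time` coordinate, that daily resampling groups the records as intended, and that the variables can be written with `to_netcdf` as they are. `daily_path` and `split_by_day` are hypothetical helper names.

```python
def daily_path(prefix, label):
    """Build an output file name such as 'ts.20160101.nc' from a time label."""
    day = str(label)[:10].replace("-", "")   # works for numpy.datetime64 labels
    return "{}.{}.nc".format(prefix, day)


def split_by_day(bpch_file, prefix, **open_kwargs):
    """Write one NetCDF file per calendar day found in a bpch file."""
    # Deferred import, so daily_path can be used without xbpch installed.
    from xbpch import open_bpchdataset

    ds = open_bpchdataset(bpch_file, **open_kwargs)
    written = []
    # Iterating a resample object yields (label, subset) pairs, one per day.
    for label, day_ds in ds.resample(time="1D"):
        path = daily_path(prefix, label)
        day_ds.to_netcdf(path)
        written.append(path)
    return written
```

This avoids the extra cdo step, since the grouping and the conversion happen in the same pass.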