geoschem / gcpy Goto Github PK

Python toolkit for GEOS-Chem. Contains basic plotting scripts, plus the suite of GEOS-Chem benchmarking utilities.

License: Other

Python 43.24% Jupyter Notebook 56.53% Shell 0.21% Dockerfile 0.02%

python cloud-computing visualization-tools geos-chem atmospheric-modelling atmospheric-chemistry scientific-computing plotting-in-python benchmarking python-toolkit

gcpy's People

Contributors

Stargazers

Watchers

gcpy's Issues

[BUG/ISSUE] Zonal mean plots with pressure subset are incorrect when y-axis flipped

Using the pressure subsetting option in compare_zonal_mean works correctly when the flip_ref and/or flip_dev are set to the default value (False). However, it seems that the plots are not correct if the vertical axis is flipped.

For example, here are concentration plots for Br2 mean full column (no subsetting) with the vertical axis of the Dev data flipped. This is comparing a GCC restart file with a GCHP restart file, so this flipping is necessary.

To hone in on the stratosphere I use the pressure subsetting for interval [1,100]. This is what is plotted:

I suspect the issue is the order of the subsetting and the flipping. I will investigate.

[BUG/ISSUE] Colorbar x-axis labels run into each other

The condition for switching from auto-binning of x-axis labels to forcing 4 bins in function compare_single_level is not adequate for ensuring the axis is readable. For example, the x-axis labels on this plot are very close together making it hard to read.

This is happening because the condition to switch to 4 bins is bound difference less than 0.001. However, this only catches a subset of instances where values have three digits beyond the decimal point.

[BUG/ISSUE] xbpch cannot open ND64 bpch diagnostic data

I am posting this issue on behalf of Jonathan Moch.

Describe the bug

I've been trying to implement the UVflux (ND-64) diagnostics as netcdf and have had some success (the values are not all zero!), but am having trouble now doing the comparison with the .bpch diagnostic.

When I put the .bpch output and relavant .dat files into the xpbch.open_bpchdataset function in python I keep getting a "KeyError."

I'm currently trying to rerun the test simulation with just the ND64 output to see if that helps, but if you have any suggestions on what to do I would appreciate it. The relevant output is at: /n/scratchlfs/jacob_lab/jmoch/geoE_output/geosfp_4x5_uvflxtest/OutputDir/run1/

And the full error message, which is a bit opaque to me, is below. It looks like maybe there is a missing name of the diagnostic? Was the UVFlux diagnostic not set up to be recognized in xpbch?

Error message

KeyError                                  Traceback (most recent call last)
/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'name'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-8-b267f079dc48> in <module>
      5     reftracerinfo = os.path.join(refdir,'tracerinfo.dat')
      6     refdiaginfo = os.path.join(refdir,'diaginfo.dat')
----> 7     refdata = xbpch.open_bpchdataset(refbpfile, tracerinfo_file=reftracerinfo, diaginfo_file=refdiaginfo)
      8 
      9 # Dev

/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/xbpch/core.py in open_bpchdataset(filename, fields, categories, tracerinfo_file, diaginfo_file, endian, decode_cf, memmap, dask, return_store)
     79         tracerinfo_file=tracerinfo_file,
     80         diaginfo_file=diaginfo_file, endian=endian,
---> 81         use_mmap=memmap, dask_delayed=dask
     82     )
     83     ds = xr.Dataset.load_store(store)

/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/xbpch/core.py in __init__(self, filename, fields, categories, fix_cf, mode, endian, diaginfo_file, tracerinfo_file, use_mmap, dask_delayed)
    278 
    279         # Parse the binary file and prepare to add variables to the DataStore
--> 280         self._bpch._read_var_data()
    281 
    282         # Create storage dicts for variables and attributes, to be used later

/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/xbpch/bpch.py in _read_var_data(self)
    312             var_attr['unit'] = unit
    313 
--> 314             vname = diag['name']
    315             fullname = category_name.strip() + "_" + vname
    316 

/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2973             if self.columns.nlevels > 1:
   2974                 return self._getitem_multilevel(key)
-> 2975             indexer = self.columns.get_loc(key)
   2976             if is_integer(indexer):
   2977                 indexer = [indexer]

/n/jacob_lab/Lab/python/miniconda/envs/jacob/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2890                 return self._engine.get_loc(key)
   2891             except KeyError:
-> 2892                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2893         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2894         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: 'name'

[QUESTION] compare_zonal_mean not reading in Xarray.Dataset files?

For fields where I've used gcpy.open_mfdataset to open a list of files and then .mean(dim='time',keep_attrs=True) to get an annual average, gcpy.compare_zonal_mean is rejecting the fields saying they are not an axarray DataArray, even though the fields are definitely Xarray Datasets and gcpy.compare_single_level successfully loads the same exact files and makes figures with no problem.

The specific error I am getting is:

TypeError Traceback (most recent call last)
in
1 # Zonal mean - PDF
2 pdfname = os.path.join(plotsdir,'ZonalMean_{}_{}diag.pdf'.format(desc, comparison))
----> 3 gcpy.compare_zonal_mean( refdata, refstr, devdata, devstr, varlist=varlist, pdfname=pdfname )
4 #gcpy.add_bookmarks_to_pdf( pdfname, varlist, remove_prefix='SpeciesConc')

~/Py_notebooks/gcpy/gcpy/benchmark.py in compare_zonal_mean(refdata, refstr, devdata, devstr, varlist, itime, weightsdir, pdfname, cmpres, match_cbar, pres_range, normalize_by_area, enforce_units, flip_ref, flip_dev, use_cmap_RdBu)
1050 for i in range(6):
1051 regridder = refregridder_list[i]
-> 1052 ds_ref_cmp += regridder(ds_ref_reshaped[i])
1053 else:
1054 ds_ref_cmp = ds_ref

/n/jacob_lab/Lab/python/miniconda/envs/gcpy/lib/python3.6/site-packages/xesmf/frontend.py in call(self, a)
260 regrid_func = self.regrid_dataarray
261 else:
--> 262 raise TypeError("input must be numpy array or xarray DataArray!")
263
264 return regrid_func(a)

TypeError: input must be numpy array or xarray DataArray!

[FEATURE REQUEST] Add species class for GCPy to use generally?

I would be interested in whether GCST/GCPy users thought it would good to have an all-encompassing species class to used and accessible through GCPy?

There is one of these in AC_tools (class species in AC_tools/variables.py), which allows for you access various variables (such as the subset below). These species objects can then be passed to functions and when properties are needed (e.g. what mechanism a species is used within) can be retrieved.

specs2use = ['O3', 'ALD2', 'ACET']                                                                                                          
spec_objs = [AC.species(i) for i in specs2use]                                                                                              
prt_str = 'Name = {:<5}, RMM: {:.2f}, Drydep: {}, Wetdep: {}, Formula: {}, Ox: {}, LaTeX: {}'    
for spec in spec_objs: 
    print( prt_str.format( spec.name, spec.RMM, spec.Drydep, spec.Wetdep )  )     
                                                                                                                                         
Name = O3   , RMM: 48.00, Drydep: True, Wetdep: False, Formula: O3, Ox: 1.0, LaTeX: O$_{3}$
Name = ALD2 , RMM: 44.05, Drydep: True, Wetdep: True, Formula: CH3CHO, Ox: nan, LaTeX: Acetaldehyde
Name = ACET , RMM: 58.08, Drydep: True, Wetdep: False, Formula: CH3C(O)CH3, Ox: nan, LaTeX: CH$_{3}$C(O)CH$_{3}$

The properties are currently just generated from the GEOS-Chem wiki page on species.

I saw the related commit on the GEOS-Chem repository and was thinking about updating to use this file when it started being outputted.

[FEATURE REQUEST] Use multi-threading to speed up benchmark plotting

Creating GEOS-Chem output benchmark plots using the run_1mo_benchmark.py script (which calls routines in benchmark.py) can take an extremely long time. This is especially true for GCHP benchmark plots, which require regridding from cube-sphere to lon/lat.

It would be nice to use Python's multi-threading capabilities to e.g. create all of the PDF files for each category in parallel. This could drastically reduce the amount of time needed to generate benchmark plots.

[DISCUSSION] To-do list for distributing GCPy as a package

We will need to do the following before we can ship GCPy as a Python package:

Implement semantic versioning (i.e. version numbers X.Y.Z)
Make sure all *.py files are PEP-8 compliant by running pycodestyle on them
Add more examples & scripts -- solicit input from community
Expand the existing unit tests, and implement continuous integration on TravisCI
Add documentation on readthedocs -- right now we need to get ownership of that back from D. Rothenberg.
Push gcpy to conda or pip

Any others?

[FEATURE REQUEST] Pick a license. Any license

Believe it or not, by not picking and documenting what licences you are releasing the code under can cause people legal trouble. See:

https://blog.codinghorror.com/pick-a-license-any-license/

Sorry to bring it up, but once I saw that you do not have any license I cannot even really look at it...

[FEATURE REQUEST] Add code to generate plots & tables for 1-year (fullchem) benchmark simulations

Overview

As of this writing, the following plots & tables for the 1-year (fullchem) benchmarks are currently generated by IDL code. We would to add this functionality in Python so that we can retire IDL. As this work progresses we will update this issue in the comments section below.

Already implemented in GCPy:

Plots:

Species concentration by category
Emissions by category
J-values
Column AOD

Tables:

Emission totals table for 1-year benchmark simulations
Inventory totals table for 1-year benchmark simulations
Global masses at end of simulation

Still generated by IDL (to be eventually added into GCPy):

Plots

Model vs. Observations
Aerosol vertical profiles

Tables

Aerosol burdens
Global mean AOD
O3 STE (vertical flux across 100hPa)

Please see this issue about work that is being done to migrate the code for plotting results from the 1-year TransportTracers simulation from IDL to Python: #40

[BUG/ISSUE] Zeros shown as blue in cubed sphere comparison plots

When using gcpy benchmark function compare_single_level with cubed sphere output, 5 of 6 faces show up as blue. The southern pole face is white, as expected. This only impacts the difference plots and the fractional difference plot with dynamic range. The fractional difference plot with fixed range is fine. Plots on lat-lon grids are also fine.

[FEATURE REQUEST] Abstract plotting data onto a map or zonal mean into a routine that can be reused

In routines compute_single_level and compute_zonal_mean, we end up repeating much of the same code to create a single-level map or zonal-mean plot. As time allows, we should try to abstract this plotting code into routines that can be repeatedly called. This would also facilitate development of general-purpose plotting routines for users.

[BUG/ISSUE] Striping in fractional diff plots comparing cubed sphere output

When using gcpy function compare_single_level with cubed sphere output there appears to be horizontal striping in the fractional difference plot (lower left subplot). The striping does not appear in any of the other subplots. This seems like it may be an issue with either the calculation of the fractional difference or how python is plotting cubed sphere. The horizontal striping appears to only affect some of the faces, and the boundary between the striping and what looks like correct data is the vertical centerline of a face.

$gcpy_cs_fracdiff_striping$

[BUG/ISSUE] Empty or unused directories

Directory gcpy/obs is effectively empty since it contains file init.py with no text. Some other directories contain files that are not used anywhere atm_sci. These could be cleaned up to focus GCPy on functions actively used and supported.

[BUG/ISSUE] xarray open_mfdataset cannot open multiple GCHP diagnostic/restart files created with MAPL v1.0.0

Describe the bug
The xarray open_mfdataset function dies with an error when trying to read more than one file created by GCHP using the new MAPL v1.0.0 (i.e. in GCHP 12.5.0 and later versions).

To Reproduce

import xarray as xr
filelist = [ '/path/to/GCHP/file1.nc', '/path/to/GCHP/file2.nc']
ds = xr.open_mfdataset(filelist)

Expected behavior
The returned dataset should contain the merged data from e.g. file1.nc and file2.nc.

Screenshots
Instead, this error occurs:

  File "./run_1mo_benchmark.py", line 479, in <module>
    ds = xr.open_mfdataset([gchp_vs_gchp_refspc, gchp_vs_gchp_refaod])
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/backends/api.py", line 719, in open_mfdataset
    ids=ids)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 553, in _auto_combine
    data_vars=data_vars, coords=coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 475, in _combine_nd
    compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 493, in _auto_combine_all_along_first_dim
    data_vars, coords)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/combine.py", line 514, in _auto_combine_1d
    merged = merge(concatenated, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 532, in merge
    variables, coord_names, dims = merge_core(dict_like_objects, compat, join)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 451, in merge_core
    variables = merge_variables(expanded, priority_vars, compat=compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 170, in merge_variables
    merged[name] = unique_variable(name, var_list, compat)
  File "/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/lib/python3.6/site-packages/xarray/core/merge.py", line 90, in unique_variable
    % (name, out, var))
xarray.core.merge.MergeError: conflicting values for variable 'anchor' on objects to be combined:
first value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point
second value: <xarray.Variable (nf: 6, ncontact: 4)>
dask.array<shape=(6, 4, 4), dtype=int32, chunksize=(6, 4, 4)>
Attributes:
    long_name:  anchor point

Required information:

OS: CentOS 7
python 3.6.9
xarray 0.12.1
netcdf-fortran 4.4.4
netcdf4 1.4.2
dask 2.3.0
dask-core 2.3.0
numpy 1.16.4
numpy-base 1.16.4
scipy 1.3.1

Additional context
The problem seems to be caused by a single variable called "anchor".

[QUESTION] Changing y axis on gcpy.compare_zonal_mean

For the gcpy.compare_zonal_mean the y axis is hPa, which ends up being an issue if you want to look at stratospheric changes and also gives you a distorted view of the troposphere. Is there a way to change the y-axis to log(pressure) or to km? That would give a better vertical profile where thing's are so compressed for the upper troposphere and above.

[BUG/ISSUE] compare_zonal_mean flip_ref and flip_dev not working as expected

When comparing GCHP internal state output (checkpoint file) I use flip_ref and flip_dev to flip the vertical axis for zonal mean plots using compare_zonal_mean. It appears that it has no effect, however, and the plots show concentrations that are incorrect with respect to the y-axis.

For example, here is the correctly plotted SpeciesConc collection zonal mean for H2O, which does not require flipping.

In contrast, here is the internal restart file zonal mean for H2O:

To reproduce the issue, use compare_zonal_mean function with GCHP checkpoint files (gcchem*) and pass arguments flip_ref=True and flip_dev=True in one case, and default values (False) in another. If the resulting plots are the same then flip_ref and flip_dev do not work as expected.

Expected behavior: passing flip_ref and flip_dev as True to compare_zonal_mean should result in vertically flipping relative to not passing them. The zonal mean should resemble the SpeciesConc collection zonal mean.

This issue has gone under the radar since we do not benchmark GCHP restart files and the only History collection that is vertically flipped is emissions which we do not generate zonal mean plots for.

[BUG/ISSUE] Benchmark plotting functions fail if cmpres passed as cubed sphere

The comparison resolution argument cmpres that is passed to benchmark.py function compare_single_level is supposed to be able to be either cubed sphere or lat-lon. However, if it is cubed sphere then the function fails due to a type error. This is because cmpres is passed as string but the function expects it to be integer if cubed sphere in certain places. All cubed sphere comparisons to date have not passed cmpres explicitly, and hence this bug has not been previously detected. The quick fix is to cast cmpres to integer if cubed sphere and passed explicitly.

A related issue occurrs in benchmark.py function compare_zonal_mean. If cmpres is passed as cubed sphere then the function fails with a type error. For zonal mean the comparison resolution should never be cubed sphere but this is not enforced. The quick fix is to override the passed cubed sphere resolution with lat-lon '1x1.25'.

[BUG/ISSUE] All zero plots appear as yellow with compare_single_level

If comparing datasets with all zero values using compare_single_level, the ref and dev plots (top row) are all yellow. This can be confusing when flipping through plots since 0 is supposed to correspond to the color white.

[BUG/ISSUE] Issue in gcpy.compare_single_level: Top title is not printed

I was using gcpy.compare_single_level to compare output between GCC and GCHP runs. The data variable I am using is Met_MODISLAI (which is only xy dimension). The data variables are shown here:

Ref (GCC)
<xarray.Dataset>
Dimensions:       (lat: 91, lon: 144, time: 1)
Coordinates:
  * time          (time) datetime64[ns] 2016-01-15T01:00:00
  * lon           (lon) float64 -180.0 -177.5 -175.0 ... 172.5 175.0 177.5
  * lat           (lat) float64 -89.5 -88.0 -86.0 -84.0 ... 84.0 86.0 88.0 89.5
Data variables:
    Met_MODISLAI  (time, lat, lon) float32 ...

Dev
<xarray.Dataset>
Dimensions:       (lat: 288, lon: 48, time: 1)
Coordinates:
  * lon           (lon) float64 1.0 2.0 3.0 4.0 5.0 ... 44.0 45.0 46.0 47.0 48.0
  * lat           (lat) float64 1.0 2.0 3.0 4.0 5.0 ... 285.0 286.0 287.0 288.0
  * time          (time) datetime64[ns] 2016-01-15T01:00:00
Data variables:
    Met_MODISLAI  (time, lat, lon) float32 ...

And my calling sequence is:

#!/usr/bin/env python

# Imports
import os
import xarray as xr
import gcpy

# Ref dataset
refstr  = '12.3.0 GCC'
refdir  = '~/LT/12.3.0/geosfp_2x25_TransportTracers/OutputDir'
reffile = 'GEOSChem.StateMet_inst.20160115_0100z.nc4'
refpath = os.path.join(refdir, reffile)
refdata = xr.open_dataset(refpath)

# Dev dataset
devstr  = '12.3.0 GCHP'
devdir  = '~/SP/if17/gchp_TransportTracers/OutputDir'
devfile = 'GCHP.StateMet_inst.20160115_0100z.nc4'
devpath = os.path.join(devdir, devfile)
devdata = xr.open_dataset(devpath)

varlist = ['Met_MODISLAI']

# Variables to plot

# PDF filename
pdfname = '/n/scratchlfs/jacob_lab/ryantosca/GC/rundirs/plots/pdf/MODIS_LAI_surface_nc_diag.pdf'

# Compare data
gcpy.compare_single_level(refdata, refstr, devdata, devstr,
                          varlist=varlist, pdfname=pdfname)

It produces the PDF, but for some reason the top title is not printed. The echoback to stdout is:

Reuse existing file: ./conservative_2x2.5_1x1.25.nc
/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/ ib/python3.6/site-packages/xesmf/backend.py:36: UserWarning: Input array is not F_CONTIGUUS. Will affect performance.
  warnings.warn("Input array is not F_CONTIGUOUS. "
Reuse existing file: ./conservative_c48_1x1.25_0.nc
Reuse existing file: ./conservative_c48_1x1.25_1.nc
Reuse existing file: ./conservative_c48_1x1.25_2.nc
Reuse existing file: ./conservative_c48_1x1.25_3.nc
Reuse existing file: ./conservative_c48_1x1.25_4.nc
Reuse existing file: ./conservative_c48_1x1.25_5.nc

Creating /n/scratchlfs/jacob_lab/ryantosca/GC/rundirs/plots/pdf/MODIS_LAI_surface_nc_diagpdf for 1 variables
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-ryantosca'
/net/seasasfs02/srv/export/seasasfs02/share_root/ryantosca/python/geo/miniconda/envs/geo/ib/python3.6/site-packages/xarray/core/dataarray.py:507: FutureWarning: elementwise compaison failed; returning scalar instead, but in the future will perform elementwise comparion
  return key in self.data
0 Incorrect dimensions for Met_MODISLAI!

See attached file below.

I believe that the top title is computed in this block:

        if 'lev' in refdata[varname].dims and 'lev' in devdata[varname].dims:
            if ilev == 0: levstr = 'Surface'
            elif ilev == 22: levstr = '500 hPa'
            else: levstr = 'Level ' +  str(ilev-1)
            figs.suptitle('{}, {}'.format(varname,levstr), fontsize=fontsize, y=offset)
        elif 'lat' in refdata[varname].dims and 'lat' in devdata[varname] and 'lon' in refdata[varname].dims and 'lon' in devdata[varname]: 
            figs.suptitle('{}'.format(varname), fontsize=fontsize, y=offset)
        else:
            print('Incorrect dimensions for {}!'.format(varname))

So I am not sure what is going on here.

[BUG/ISSUE] run_1mo_benchmark.py dies with an error if we restrict the types of plots that are created

Describe the bug
If we restrict the types of benchmark plots that are created (which we often do for testing purposes), then the run_1mo_benchmark.py script will die with an error:

  File "./run_1mo_benchmark.py", line 267, in <module>
     sigdiff_files=gcc_vs_gcc_sigdiff)
  File "/n/home09/ryantosca/GC/python/gcpy/gcpy/benchmark.py", line 3098, in make_benchmark_conc_plots
    for v in diff_sfc:
UnboundLocalError: local variable 'diff_sfc' referenced before assignment

To Reproduce
Add the plots=['zonalmean'] keyword to make_benchmark_conc_plots:

if gcc_vs_gcc:

    if plot_conc:
        # Concentration plots
        # (includes lumped species and separates by category)
        print('%%% Creating GCC vs. GCC concentration plots %%%')
        bmk.make_benchmark_conc_plots(gcc_vs_gcc_refspc,
                                      gcc_vs_gcc_refstr,
                                      gcc_vs_gcc_devspc,
                                      gcc_vs_gcc_devstr,
                                      dst=gcc_vs_gcc_plotsdir,
                                      overwrite=True,
                                      restrict_cats=['Oxidants'],
                                      plots=['zonalmean'],
                                      sigdiff_files=gcc_vs_gcc_sigdiff)

Expected behavior
The code should create all benchmark plots and print the list of significant differences.

Potential solution
This could be easily fixed by not trying to update the list of significant differences for e.g. surface plots when we are only creating zonal mean plots. A few if statements can resolve this.

[FEATURE REQUEST] Global mass table

The GCSC has requested global mass tables for all species as part of the benchmark output. These values can be computed from the restart file output at the end of the simulation.

[FEATURE REQUEST] Add percent change to global mass tables

Daniel Jacob has requested that we add a percent change to the benchmark global mass table output. This will involve a little bit of reformatting in such a way the we change the global mass printout but not the emissions totals printout.

[FEATURE REQUEST] Include usage example in README

The GCPy README does not include how to use the package. Adding this info, such as an example that goes through import to comparing two datasets, would be useful.

[BUG/ISSUE] Colorbar ticks are incorrectly placed if the plot is zero or undefined everywhere.

Describe the bug

It seems that newer versions of Matplotlib (since 3.0 or 3.1) cause colorbar ticks to be placed incorrectly for plots that are zero everywhere or undefined everywhere

This is probably caused in the normalization of colors in the from the min and max of the data range to the matplotlib color scale.

[BUG/ISSUE] CircleCI code not used

GCPy contains code for CircleCI but CircleCI is not currently supported. This code should be removed.

[BUG/ISSUE] zonal mean y-axis not flipped after update to matplotlib 3.1.0

Starting in matplotlib 3.1.0 the zonal mean vertical axis is not flipped despite the call to invert_yaxis. This appears to be a quirk of the new version of matplotlib where set_yscale negates any previous call to invert the y-axis. Moving invert_yaxis to after set_yscale corrects the issue.

[FEATURE REQUEST] Store constants used in GCPy in file constants.py

Constants used in GCPy are not all in file constants.py, such as values returned by function skip_these_vars(). The package would be improved by keeping all constants in that file for usage elsewhere.

[FEATURE REQUEST] Benchmark plot change request from Stratospheric Working Group

Susan Strahan, Co-Chair of GEOS-Chem Stratospheric Working Group writes:
"Dylan, Seb, and I are happy with those plots EXCEPT they’ve got to be in log-pressure and run from about 100 to 1 hPa. So this is the only change to benchmarks we are recommending at this time. Only the plots made for the stratosphere need to change."

[BUG/ISSUE] Fixed issues when reading GCHP files with MAPL v1.0.0

Description

Data variables from GCHP files created with MAPL v1.0.0 have 5 dimensions (time, lev, nf, Ydim, Xdim) -- where nf is the number of faces and Xdim = Ydim = number of points along the cube face.
This caused several issues with the prior GCPy code, which was developed to work with GCHP output from the prior version of MAPL. These output files contained 4 dimensions: (time, lev, 6*Ydim, Xdim).

In commit c7b1177, we have added several fixes in for these issues. We also skip reading a few data variables ("anchor", "ncontact", "orientation") from GCHP output created with MAPL v1.0.0, so as to avoid issues with GCPy and xarray.

[QUESTION] Trouble setting up gcpy on Mac

Hi everyone,
I'm trying to setup gcpy on my Mac. I have anaconda3 installed. I downloaded gcpy from git and put it under my Desktop. I used this command to create the environment:
conda env create -vv -n gcpy gcpy/docs/environment.yml.
However, I got the error below:

I was wondering if this is the right way to set up gcpy?
Does anybody have similar issues?

Jun Meng

[BUG/ISSUE] Matplotlib function set_clim will be deprecated in Matplotlib 3.3

Using the benchmark plotting functions with the currently recommended package configuration gives the following warning about a Matplotlib function that will be deprecated in a future version:

The set_clim function was deprecated in Matplotlib 3.1 and will be removed in 3.3. Use ScalarMappable.set_clim instead. [benchmark.py:1714]

This is not a problem right now but could become a problem in the future when Matplotlib is updated. We should switch to the recommended function before Matplotlib releases 3.3.

[FEATURE REQUEST] Return dictionary from core.compare_varnames

Current core.compare_varnames outputs 6 lists:

    Returns:
    --------
        commonvars: list of strs
            Variables that are common to both refdata and devdata,
            regardless of dimension.

        commonvarsOther: list of strs
            Variables that are common to refdata and devdata,
            and that do not have lat, lon, and level dimensions.

        commonvars2D: list of strs
            Variables that are common to refdata and devdata,
            and that have lat and lon dimensions, but not level.

        commonvars3D: list of strs
            Variables that are common to refdata and devdata,
            and that have lat, lon, and level dimensions.

        refonly: list of strs
            Plottable variables (i.e. 2D or 3D) that are only
            present in the Ref dataset.

        devonly: list of strs
            Plottable variables (i.e. 2D or 3D) that are only
            present in the Dev dataset.

This can be difficult to use because you need to know exactly what each one is. Changing the output will also break existing code (just happened to me with the latest update).

It would be great if the output was instead a single dictionary with six key and value pairs (key = string description, value = list of names). This would allow expanding the output without breaking existing code and also make the output easier to understand and use.

[FEATURE REQUEST] CS to CS regridding in gcpy

When comparing cubed sphere data sets at different resolutions using compare_single_level I ran into this error:

Traceback (most recent call last):
  File "/n/home08/elundgren/bin/compare_var.py", line 20, in <module>
    gcpy.compare_single_level(ds1, desc1, ds2, desc2, varlist=varlist, weightsdir=weightsdir)
  File "/n/home08/elundgren/GC/python/gcpy/gcpy/benchmark.py", line 294, in compare_single_level
    refregridder_list = make_regridder_C2L(refres, cmpres, weightsdir=weightsdir, reuse_weights=True)
  File "/n/home08/elundgren/GC/python/gcpy/gcpy/grid/regrid.py", line 16, in make_regridder_C2L
    llgrid = make_grid_LL(llres_out)
  File "/n/home08/elundgren/GC/python/gcpy/gcpy/grid/horiz.py", line 12, in make_grid_LL
    [dlat,dlon] = list(map(float, llres.split('x')))
AttributeError: 'int' object has no attribute 'split'

I traced this to the logic in compare_single_level that determines what regridder to make. The logic is written such that if ref or dev is CS, and the comparison grid is different from the native, a CS to lat-lon regridder is created. However, if both ref and dev are cubed sphere the comparison grid is also cubed sphere, so the CS to lat-lon regridder will not work. Unless argument cmpres is passed such that it is explicitly set as lat-lon, e.g. cmpres='1x1'.

In summary there are two problems: (1) the CS to CS regridding is missing from gcpy (make_regridder_C2C does not exist), and (2) the logic that would call for CS to CS regridding is missing and instead a CS to lat-lon regridder is used, resulting in an error.

To address the second problem I pushed an update to return with an explanatory error message if CS to CS regridding is called for. See 4a2094d.

ERROR: CS to CS regridding is not yet implemented in gcpy. Ref and dev cubed sphere grids must be the same resolution, or pass cmpres to compare_single_level as a lat-lon grid resolution.

However, this should not be necessary. CS to CS regridding needs to be implemented into gcpy.

[BUG/ISSUE] Global mass tables not recognizing new species as missing in reference version

The global mass tables in GCPy are not properly recognizing when species are missing from a version. For example, in comparisons of dev/12.7.0 with GEOS-Chem 12.6.0, new species ETNO3, MENO3, IPRNO3, and NPRNO3 are all listed with the values of the previous species in the Ref column when they should be NaN.

Here is the original message from Jenny Fisher:

I don’t think the global mass tables are pointing to the right Ref version as they show values for all the new species in the 12.6.0 version.

Here is the problematic text in the global mass tables:

###############################################################################
### Global mass (Gg) at end of simulation (Trop only)                       ###
### Ref = GC_12.6.0; Dev = GC_12.7.0_SmallAlkylNitrateChem                  ###
###############################################################################
                              Ref                 Dev      Dev - Ref    % diff
ETHLN        :           0.396240            0.395432      -0.000808    -0.204
ETNO3        :           0.396240           19.799126      19.402885  4896.751
...
IPMN         :          97.114059           96.601143      -0.512917    -0.528
IPRNO3       :          97.114059           26.614943     -70.499115   -72.594
...
MEK          :        2037.818237         2031.013916      -6.804321    -0.334
MENO3        :        2037.818237           48.085583   -1989.732666   -97.640
...
NPMN         :           0.003098            0.003046      -0.000051    -1.661
NPRNO3       :           0.003098            6.418084       6.414987  207093.115
...

[FEATURE REQUEST] 1-year Transport Tracer benchmark

The GEOS-Chem Transport Tracer benchmark code is currently written in IDL and stored at https://bitbucket.org/gcst/gc_1yr_rnpbbepasv_benchmark. The same functionality should be added to GCPy as needed to retire the dependency on IDL.

[DISCUSSION] Defining the scope & use cases of GCPy in README

Motivation

In README or somewhere there should be a list/table of what GCPy is intended to do (as well as NOT indented to do). One apparent question a new user will have is "in what cases should I download & learn GCPy?". Having a clear answer to that will bring many benefits:

By narrowing down the scope of GCPy, we can out-source many common tasks to the upstream xarray/numpy/scipy. This avoids reinventing the wheel and reduces development burden.
We can worry less about GCPy's API stability and backward compatibility. A major issue for new packages like GCPy is that the APIs can change frequently from version to version (often for good reasons like better efficiency or more functionalities). On the other hand, upstream packages have much more stable APIs. By having a small number of APIs, users will feel less disturbed by version updates.
Some general-purpose functionalities can be pushed upstream to xarray so they can sustain longer and benefit a broader community. There are a lot of model-specific packages (for example NCAR's esmlab https://github.com/NCAR/esmlab), but I feel that many functionalities are duplicating. For example, the code handling cubed-sphere might be easily adapted for other multi-tile grids -- this is a very common need in Earth science models (e.g. xgcm also has a notion of multi-tile grid)

Examples

Some apparent GCPy use cases:

Generate the standard benchmark plots
Obtain GEOS-Chem's horizontal/vertical grid information
GCHP-specific regridding functionalities

Some apparent non-GCPy use cases:

Modify a NetCDF file (crop a domain, extract some variables)
Simple 2D plotting on lat-lon grid, or 1D plotting against time
All statistical analysis tools (should be done in SciPy/sklearn)

[FEATURE REQUEST] Add OH metrics output for benchmarks

Reproduce the GAMAP metrics output in GCPy so that we do not need to generate with IDL. Ideally, we would compute and compare global mean OH, MCF lifetime, and CH4 lifetime.

[QUESTION] Trouble running gcpy

I ran into some trouble running gcpy on my Mac using Anaconda python:

I created the gcpy conda environment using the gcpy environment.yml file
I installed the gcpy directory using the command:
git clone https://github.com/geoschem/gcpy.git gcpy
in my directory '/Users/psk9/Desktop/GC'

I then ran my python gcpy analysis script from the directory '/Users/psk9/Desktop/GC'

The first problem I ran into was with the 'import xbpch' command:
Traceback (most recent call last): File "../quick_plots.py", line 29, in <module> import xbpch File "/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/__init__.py", line 7, in <module> from . bpch import BPCHFile File "/anaconda3/envs/gcpy/lib/python3.6/site-packages/xbpch/bpch.py", line 11, in <module> from xarray.core.pycompat import OrderedDict ImportError: cannot import name 'OrderedDict'

I fixed this by downgrading to xarray 11.0.3 with the command:
conda install xarray=0.11.3
in the gcpy environment

The next problem i ran into was the following:

Traceback (most recent call last): File "quick_plots.py", line 236, in <module> gcpy.compare_single_level( refdata, refstr, devdata, devstr, ilev=0, varlist=varlist) AttributeError: module 'gcpy' has no attribute 'compare_single_level'

To avoid this error and run successfully I had to set the PYTHON path variable
in my .bash_profile as follows:
export PYTHONPATH=":/Users/psk9/Desktop/GC/gcpy"

[BUG/ISSUE] Restricted range differences incorrectly all zero in some emissions column sum plots

In some emissions column sum benchmark plots that compare cubed sphere data, the restricted difference plots (middle row, right column) show all zeros when the dynamic range difference subplot and fractional difference subplot show non-zeros differences. This appears to be a bug in gcpy. The colorbar range is [-0.1,0.1], which usually indicates NaN in the array trying to be plotted. This needs further investigation.

[BUG/ISSUE] Zonal mean log P plots show 0 instead of 0.1, 0.01 for full column

When I plot zonal mean for the full column and enable log-scale for the y-axis the labels are cut off for 0.1 and 0.01. For example:

[BUG/ISSUE] Benchmark plots omit comparisons unless species/emissions are found in both Ref and Dev

When creating the benchmark plots, species and/or emissions are only plotted if they found in both Ref and Dev datasets.

For example if a species is present in Ref, but removed from Dev, then the benchmark code will totally skip that comparison. This has caused some confusion among GCSC members who are examining the benchmark results.

What we should do is to set the missing species to zero and then force the comparison.

[QUESTION] gcbm does not have plot_layer

Right now I keep encountering an attribution error that gcpy.benchmark (gcbm) does not have plot_layer. I'm working through the first tutorial of benchmark_design.ipynb--does gcbm no longer have the plotting function?

[BUG/ISSUE] Allow comparisons of CS output with different resolutions

When comparing CS output at two different resolution, GCPy will crash with "ERROR: CS to CS regridding is not yet implemented in gcpy. Ref and dev cubed sphere grids must be the same resolution, or pass cmpres to compare_single_level as a lat-lon grid resolution." To get around this, we will set the default comparison grid to 1x1.25 lat-lon when comparing two different CS resolutions.

[FEATURE REQUEST] Create a function to print out version numbers of GCPy dependent packages

Description:

The xarray package contains a function xr.show_versions(), which lists the version numbers of all dependent packages. This can be useful for debugging.

I propose that we make a similar function gcpy.show_versions(). We can ask GCPy users to cut-n-paste us the output of show_versions when they open issues.

[BUG/ISSUE] Display of all zeros in colorbar

We have had two different ways of conveying that all displayed values are zero. One is to show multiple zeroes spanning the entire colorbar.

Another is to show a single zero in the middle with the bounds of the colorbar unlabeled (current).

It would be good to get some feedback on preference. I prefer multiple zeroes since then there is no doubt about the range, but others may disagree.

[BUG/ISSUE] gcpy.open_mfdataset won't open .nc4 data

gcpy.open_mfdataset won't open .nc4 data. Instead it says it needs ".nc"

I think this is a bug since the default GCHP output is now .nc4.

The errors says: "ValueError: Found unknown file extension (.nc4); please pass a BPCH or netCDF file with extension 'bpch' or 'nc'."

geoschem / gcpy Goto Github PK

gcpy's People

Contributors

Stargazers

Watchers

Forkers

gcpy's Issues

Describe the bug

Error message

Overview

Already implemented in GCPy:

Plots:

Tables:

Still generated by IDL (to be eventually added into GCPy):

Plots

Tables

Related

Describe the bug

Description

Motivation

Examples

Description:

Recommend Projects

Recommend Topics

Recommend Org