jamal919 / pycaz

Collection of functions for data analysis, model input preparation, and post-processing.

License: Apache License 2.0

Python 68.69% Fortran 1.38% Jupyter Notebook 29.93%

toolbox utility preprocessing postprocessing

pycaz's Introduction

pycaz - a collection of analysis functions

pycaz, a play on the sound of the word "package", is a Python package gathering the analysis tools developed while modelling with SCHISM, but it has since expanded to contain all sorts of analysis and modelling functionality. The original package was called (very unoriginally) pyschism. However, following the development of a toolbox also called pyschism by SCHISM/NOAA, the name was released for their use. The name pycaz comes from a combination of py (short for Python) and caz (from the Bengali word "কাজ", which means work).

Being a collection of different types of analysis, pre-processing, and post-processing methods, some modules follow a more object-oriented pattern and some follow a more functional/procedural pattern. The documentation of the modules is not yet well developed and will be added on a best-effort basis.

Installation

For the best experience, the package should be installed in a conda environment, which can be obtained through any conda distribution such as Anaconda or Miniconda.

conda create -n pycaz -c conda-forge python numpy scipy matplotlib xarray netcdf4 utide cmocean rioxarray tqdm ipykernel pyproj cartopy geopandas shapely jupyterlab jupyter notebook

Then the toolbox can be installed from the root of the cloned repository using:

conda activate pycaz
pip install .

Contact

Feel free to use the scripts, and if you find any bugs, please report them in the repository issues.


pycaz's Issues

:bulb: Adds functionalities to mask a grid

For the Hgrid element, it would be nice to have a masking feature. This is particularly useful when plotting, for example to mask the land area. In tri-plots, simply setting NaN values does not work well, and the only option seems to be applying a mask.

The matplotlib.tri module seems to have some relevant functionality.
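A minimal sketch of how such a mask could be built with matplotlib.tri, assuming SCHISM-style depths (positive downward, so nodes with depth <= 0 are treated as land). The helper names land_mask and masked_triangulation are hypothetical; Triangulation, set_mask and get_masked_triangles are existing matplotlib APIs.

```python
import numpy as np
import matplotlib.tri as mtri


def land_mask(triangles, depth, threshold=0.0):
    """Return a boolean array, True for every triangle touching a land node.

    Assumption: depth is positive downward (SCHISM convention), so a node
    with depth <= threshold is considered land/dry.
    """
    is_land = np.asarray(depth) <= threshold
    # Look up the land flag of the 3 nodes of each triangle, then combine
    return is_land[np.asarray(triangles)].any(axis=1)


def masked_triangulation(x, y, triangles, depth, threshold=0.0):
    """Build a Triangulation whose land triangles are masked out."""
    tri = mtri.Triangulation(x, y, triangles)
    tri.set_mask(land_mask(triangles, depth, threshold))
    return tri
```

The masked triangulation can then be passed to plotting functions such as plt.tripcolor(tri, depth), which skip masked triangles instead of relying on NaN values.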

:bulb: Tidefac object

For SCHISM (as well as ADCIRC), the tidal boundary is set up using a list of frequencies. These frequencies are naive, i.e., they carry no information about time. The reference time information is provided through the equilibrium argument, which is typically computed with tide_fac.f. The tide_fac.f program also computes the nodal factors according to the start time and total run time.

One common way is to use the tide_fac program to compute the nodal factors, equilibrium arguments, etc., and update a pre-existing bctides.in. Another way is to compute the nodal arguments locally, for example using utide, and update the arguments. In either case, it would be nice to have a Tidefac object which contains the computed nodal factors and equilibrium arguments.

In addition, a method to read the output of the tide_fac program could also be useful.
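One possible shape for such an object is sketched below as plain dataclasses. Everything here (Tidefac, Constituent, the field names and units) is hypothetical; a reader for the tide_fac output is not included because that format is not described in this issue.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Constituent:
    """Nodal correction for one tidal constituent (hypothetical structure)."""
    name: str
    freq: float          # angular frequency (rad/s)
    nodal_factor: float  # nodal factor f (dimensionless)
    eq_arg: float        # equilibrium argument (degrees)


@dataclass
class Tidefac:
    """Hypothetical container for tide_fac-style results.

    start and rnday (run length in days) are the inputs the nodal factors
    and equilibrium arguments depend on; they are stored for reference.
    """
    start: datetime
    rnday: float
    constituents: dict = field(default_factory=dict)

    def add(self, c: Constituent) -> None:
        # Case-insensitive keys, so "m2" and "M2" refer to the same entry
        self.constituents[c.name.upper()] = c

    def __getitem__(self, name: str) -> Constituent:
        return self.constituents[name.upper()]

    @property
    def names(self) -> list:
        return list(self.constituents)
```

Keeping the start time and run length next to the factors makes it explicit which run the factors were computed for, which is exactly the information the naive frequency list lacks.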

:bug: Undesired behaviour for Hgrid.subset_nodes

Currently Hgrid.subset_nodes() is implemented as follows, which has undesired behaviour:

pycaz/pycaz/schism/hgrid.py, lines 92 to 99 in 85a0b76

    def subset_nodes(self, nodeid:np.ndarray):
        # check the nodeid is 1-based index
        try:
            assert np.all(nodeid >= 1)
        except:
            raise AssertionError('Node ids must start from 1')

        return self.xy[np.isin(self.nodeid, nodeid)]

Here, a logical selection is used, based on whether each node id of the grid is present in the requested list. As a result, the nodes are returned in grid order, not in the requested order. For example, if nodeid=[1, 4, 3] is passed, it returns the xy in the order [1, 3, 4]. However, more often than not the nodes are required in the requested order, e.g., [1, 4, 3]. Additionally, if a node id is out of bounds, the current implementation does not raise any error, which is unfortunate.

The proper implementation should use direct indexing, like self.xy[nodeid-1, :]. Why -1? Because in the hgrid convention node indexing starts from 1, not 0 (as in Python indexing).
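A minimal sketch of the self.xy[nodeid-1, :] approach, written as a standalone function that takes xy explicitly (inside Hgrid it would use self.xy). It also adds an explicit bounds check, because a node id of 0 would otherwise silently select the last row via Python's negative indexing.

```python
import numpy as np


def subset_nodes(xy: np.ndarray, nodeid) -> np.ndarray:
    """Return rows of xy for 1-based node ids, in the requested order.

    Ids outside 1..len(xy) raise ValueError instead of being silently
    dropped (as with np.isin) or wrapped around (an id of 0 maps to row -1).
    """
    nodeid = np.atleast_1d(np.asarray(nodeid, dtype=int))
    nnode = xy.shape[0]
    bad = (nodeid < 1) | (nodeid > nnode)
    if bad.any():
        raise ValueError(f"Node ids out of range 1..{nnode}: {nodeid[bad].tolist()}")
    # Convert from hgrid 1-based ids to 0-based row indices
    return xy[nodeid - 1, :]
```

With nodeid=[1, 4, 3], this returns rows 0, 3 and 2 in that order.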

:wastebasket: pyschism codebase is deprecated

After the introduction of the pyschism toolbox by schism-dev, the functionality of this toolbox is mostly available through the official toolbox. The jamal919/pyschism repository also releases the pyschism pip package name for schism-dev/pyschism. The logical step at this point is to pivot the toolbox.

I have come up with the idea of rewriting various parts of the toolbox, particularly the analysis, in a functional pattern so that the toolbox can continue to be used. To revamp it, a new name and a new structure for the functions are necessary.

🧐 HYCOM data

Dr. Wu, from the SCHISM group, posted a script for acquiring HYCOM data from HYCOM's OPeNDAP (DODS) server. It would be nice to implement a similar downloader in the webdata module.

https://github.com/wenfanwu/get_hycom_online/blob/main/get_hycom_online.m
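A minimal sketch of what an xarray-based downloader could look like; it is not a port of the MATLAB script. It assumes the dataset exposes coordinates named lon, lat and time, with increasing latitude and longitude stored as 0..360; the OPeNDAP URL is left to the caller. Ranges that cross the 0/360 seam are not handled.

```python
import numpy as np
import xarray as xr


def subset_hycom(ds, lon_range, lat_range, time_range, variables=None):
    """Subset a HYCOM-like dataset by lon/lat/time (assumed coordinate names)."""
    lon_min, lon_max = lon_range
    # Assumption: longitudes stored as 0..360, so map a -180..180 request onto them
    if float(ds["lon"].max()) > 180.0:
        lon_min, lon_max = (float(v) for v in np.mod(lon_range, 360.0))
    sub = ds.sel(
        lon=slice(lon_min, lon_max),
        lat=slice(*lat_range),
        time=slice(*time_range),
    )
    if variables is not None:
        sub = sub[list(variables)]
    return sub


def fetch_hycom(url, lon_range, lat_range, time_range, variables=None):
    """Open an OPeNDAP endpoint lazily and download only the requested subset."""
    with xr.open_dataset(url) as ds:
        return subset_hycom(ds, lon_range, lat_range, time_range, variables).load()
```

Because open_dataset is lazy on OPeNDAP URLs, only the selected slice is transferred when .load() is called.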

🐛 Can not read JTWC deck files containing transitions

JTWC deck files often contain transitions, for example in cyclone Ana (https://www.emc.ncep.noaa.gov/gc_wmb/vxt/DECKS/bsh072022.dat).
The additional fields in the transition line look something like this: "0, , 0, 0, 0, 0, TRANSITIONED, shC32022 to sh072022, "

The current reader pycaz.cyclone.jtwc.read_jtwc cannot read such files.

:lady_beetle: xarray fails when downloading GFS data

xarray DAP access fails intermittently, for no reproducible reason, with "OSError [Errno -73] NetCDF: Malformed or inaccessible DAP2 DATADDS or DAP4 DAP response". Running the script multiple times often fixes it, but that is hit or miss. For example, while writing this issue, I had to run the script 5 times before it finally worked!

Previously I experimented with pydap to download the data, and in my experience it always works, or throws errors that are legible. However, one issue with pydap is that I could not find a way to download a subset; I had to download the full data and subset it afterwards. This needs further investigation, and potentially a backup function based on pydap to download the data if xarray fails a preselected number of times (say 3).
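The retry-then-fallback idea can be sketched independently of the actual readers. Here primary and fallback are any zero-argument callables (for example a lambda wrapping xr.open_dataset(url), and a hypothetical pydap-based reader); open_with_retries is a hypothetical name. Only OSError, the exception seen above, triggers a retry.

```python
import time


def open_with_retries(primary, fallback=None, max_tries=3, wait=5.0):
    """Call primary() up to max_tries times, then fall back.

    primary and fallback are zero-argument callables. OSError (as raised by
    the intermittent DAP failure) triggers a retry; other errors propagate.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    last_error = None
    for attempt in range(1, max_tries + 1):
        try:
            return primary()
        except OSError as err:
            last_error = err
            if attempt < max_tries:
                time.sleep(wait)  # give the server a moment before retrying
    if fallback is None:
        raise last_error
    return fallback()
```

Usage would look like open_with_retries(lambda: xr.open_dataset(url), fallback=lambda: read_with_pydap(url)), where read_with_pydap is a placeholder for the pydap-based function still to be written.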

:bulb: Direct interface to extract nodes data in Gr3/Hgrid

Currently in Gr3/Hgrid, the nodes can be accessed through the gr3.nodes property. While this can take a slicer, the access is not convenient. More access options are needed, e.g., to access only the data values, or only the lon-lat positions.

:bulb: nml update feature

Namelist file handling (reading, updating, and modifying nml files) using f90nml, tailored for use with SCHISM and WWM.

:memo: Notebook for webdata

A demonstration notebook for the gfs.py and hwrf.py classes, showing how to download web data.

:bug: Track file with empty lines

Any JTWC track file with an empty line throws the following error:

Traceback (most recent call last):
  File "build_sflux.py", line 24, in <module>
    track = read_jtwc('track_rsmc.csv')
  File "/home/khan/MEGA/Codes/pyschism/schism/io.py", line 76, in read_jtwc
    ncyclone = int(fields[1].strip())
IndexError: list index out of range

A possible fix is to check for empty lines and skip them.
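The traceback comes from splitting an empty line: "".split(",") yields a single empty field, so fields[1] does not exist. A minimal sketch of the skip-empty-lines fix follows; read_track_fields is a hypothetical helper that accepts an open file or any iterable of lines.

```python
def read_track_fields(lines):
    """Split comma-separated track lines into field lists, skipping blank lines.

    lines can be an open file object or any iterable of strings.
    """
    records = []
    for line in lines:
        if not line.strip():
            continue  # an empty line would otherwise yield [''] and break fields[1]
        records.append([field.strip() for field in line.split(",")])
    return records
```

Usage: with open("track_rsmc.csv") as f: rows = read_track_fields(f).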

:bulb: extent method in Hgrid/Gr3 object

As the extent of the Gr3/Hgrid is often needed for data preparation, it would be nice to have an extent method. The signature could be the following:

def extent(self, buffer: int) -> tuple:  # (East, West, South, North), possibly as a named tuple
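A minimal sketch as a standalone function over an (nnode, 2) coordinate array. The issue lists (East, West, South, North); this sketch instead orders the fields as west, east, south, north, which is a choice made here because it matches the (x0, x1, y0, y1) order expected by cartopy's GeoAxes.set_extent. Extent and extent are hypothetical names, and buffer is a float in the same units as the coordinates.

```python
from typing import NamedTuple

import numpy as np


class Extent(NamedTuple):
    west: float
    east: float
    south: float
    north: float


def extent(xy: np.ndarray, buffer: float = 0.0) -> Extent:
    """Bounding box of node coordinates, padded by buffer on every side."""
    xmin, ymin = xy.min(axis=0)
    xmax, ymax = xy.max(axis=0)
    return Extent(float(xmin - buffer), float(xmax + buffer),
                  float(ymin - buffer), float(ymax + buffer))
```

Being a named tuple, the result can be used by field name (ext.west) or passed directly as ax.set_extent(ext).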

:recycle: Break data.py into individual classes

The current data.py provides functionality to download data from the NOAA-GFS DAP server and the NOAA-HWRF model website. These two classes are very different and need separate locations.

In the spirit of updating the code base, it would be nice to break this file into two separate files, one per class, and save them in a module called webdata. Each file inside can handle scraping data from a single source.

The NOAA-HWRF downloader can be moved to a file called hwrf.py, and the GFS downloader to gfs.py.

:bug: Handling 0 radial info value

In JTWC data, radial info values can be zero if the radial info is not defined, or if part of the radial quadrant is on land (set to zero). The current implementation of fradinfo interpolates over theta. Where the radial info is 0, the interpolated radinfo also becomes 0, and thus the rmax calculation raises issues.

A fix could be selective interpolation, filling the gaps with the mean of the non-zero values.

Affected file: cyclone.py
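A minimal sketch of the suggested fix: replace zero radii with the mean of the non-zero ones, then interpolate periodically over theta with np.interp(..., period=360). It assumes the four values are the NE, SE, SW and NW quadrants placed at the quadrant centres (45, 135, 225, 315 degrees clockwise from north); whether fradinfo uses the same angle convention would need checking.

```python
import numpy as np

# Assumption: NE, SE, SW, NW radii placed at the quadrant centres (degrees from north)
QUADRANT_THETA = np.array([45.0, 135.0, 225.0, 315.0])


def fill_zero_radii(radinfo):
    """Replace zero (undefined/over-land) radii with the mean of the non-zero ones."""
    r = np.asarray(radinfo, dtype=float)
    nonzero = r > 0
    if not nonzero.any():
        raise ValueError("All radial values are zero; nothing to fill from")
    return np.where(nonzero, r, r[nonzero].mean())


def interp_radius(theta, radinfo):
    """Interpolate quadrant radii to angle(s) theta in degrees, wrapping at 360."""
    return np.interp(theta, QUADRANT_THETA, fill_zero_radii(radinfo), period=360.0)
```

For radii [50, 0, 30, 40], the zero is replaced by 40 (the mean of 50, 30 and 40), so the interpolated radius never drops to zero.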

:bulb: Gr3/Hgrid data object

The current implementation of Gr3 and Hgrid is somewhat complicated. They are implemented as regular classes, not data classes. There should be a better way to access these datasets, one that is more suitable for a functional approach.
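One possible direction is a frozen dataclass with read-only views plus pure functions that return new objects. Gr3Data and with_values are hypothetical names, and only triangular elements are assumed. Note that frozen=True prevents reassigning attributes but does not make the NumPy arrays themselves immutable; eq=False avoids an auto-generated __eq__ that would be ambiguous for arrays.

```python
from dataclasses import dataclass, replace

import numpy as np


@dataclass(frozen=True, eq=False)
class Gr3Data:
    """Hypothetical container for a gr3/hgrid mesh.

    nodes: (nnode, 3) array of x, y, value (depth for hgrid); node i is row i-1.
    elements: (nelem, 3) array of 1-based node ids (triangles only, assumed).
    """
    nodes: np.ndarray
    elements: np.ndarray
    header: str = ""

    @property
    def xy(self) -> np.ndarray:
        """Node positions (lon-lat for a geographic grid)."""
        return self.nodes[:, :2]

    @property
    def values(self) -> np.ndarray:
        """Node values only (depth for hgrid)."""
        return self.nodes[:, 2]


def with_values(grid: Gr3Data, values) -> Gr3Data:
    """Return a new Gr3Data with replaced node values; grid is left unchanged."""
    values = np.asarray(values, dtype=float)
    if values.shape != (grid.nodes.shape[0],):
        raise ValueError("values must have one entry per node")
    nodes = grid.nodes.copy()
    nodes[:, 2] = values
    return replace(grid, nodes=nodes)
```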

:bulb: Add Land/Ocean boundary generation feature

In Hgrid, it would be nice to have a feature to generate land/ocean boundaries based on the depth values. Currently, the boundary segments can be generated either with SMS/ADCIRC during mesh generation, or with xmgredit after mesh generation. The matplotlib.tri module could be useful for this purpose.
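A first step could look like the sketch below: find the edges used by only one triangle (the mesh boundary), then classify them with a depth heuristic. The rule used here, "ocean if both end nodes are deeper than a threshold" with positive-downward depths, is an assumption, and chaining edges into ordered boundary segments for hgrid.gr3 is not shown. matplotlib's Triangulation.neighbors (with -1 marking a missing neighbour) would be an alternative way to find the same edges.

```python
import numpy as np


def boundary_edges(triangles) -> np.ndarray:
    """Edges used by exactly one triangle, as sorted (n0, n1) pairs of 0-based ids."""
    tri = np.asarray(triangles)
    edges = np.concatenate([tri[:, [0, 1]], tri[:, [1, 2]], tri[:, [2, 0]]])
    edges = np.sort(edges, axis=1)  # make (a, b) and (b, a) identical
    unique, counts = np.unique(edges, axis=0, return_counts=True)
    return unique[counts == 1]


def classify_boundary(triangles, depth, threshold=0.0):
    """Split boundary edges into (ocean, land) using a depth heuristic.

    Assumption: an edge is 'ocean' if both end nodes are deeper than
    threshold (depth positive downward); every other boundary edge is 'land'.
    """
    edges = boundary_edges(triangles)
    depth = np.asarray(depth)
    ocean = (depth[edges] > threshold).all(axis=1)
    return edges[ocean], edges[~ocean]
```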

:memo: Update readme file

The README file needs to be updated before Version 1 is released.

:recycle: Improve the import statements

Currently, local functions are imported as local files using relative imports, e.g., the .module notation. It would be better to update them to follow the typical Python package structure, e.g., from pycaz.tide import *

:bulb: Altimetry related features

Handling altimetry data has become a bit difficult due to various data format changes, e.g., reprocessing and the introduction of groups inside netCDF files. See here for Jason-3. I noticed this particularly while handling Sentinel-6 data.

Sentinel-3 data is also now available through the EUMETSAT Data Store. Notebooks are needed to explore those too.

:bulb: Organise the scripts into the library

The scripts folder contains multiple self-contained files, each of which does only one thing. To kick-start the project, it would be ideal to first organise these scripts into the library itself. The scripts may serve as checkpoints for further development, or as test units.

To extend the idea further, the scripts might be transformed into notebooks for demonstration purposes.

:bulb: Bctides data object and functionalities for SCHISM

bctides is the boundary condition file used in SCHISM. It is a mandatory input file.

As with the dictionary-like objects used for Hgrid and Gr3, an object structure with helper functions and useful built-in methods can be added to the library.

The following are the essential features needed:

Bctides data object
read_bctides function
update_bctides function
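A possible starting point for the data object and update_bctides, reusing the field names of the old Bctides constructor (info, ntip, tip_dp, tip, nbfr, bfr, nope, boundaries). BoundaryFrequency and the update signature are hypothetical; read_bctides is not sketched because the file layout is not described here. field(default_factory=list) gives every instance its own lists.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class BoundaryFrequency:
    """One tidal forcing frequency (hypothetical structure)."""
    name: str
    freq: float          # angular frequency (rad/s)
    nodal_factor: float
    eq_arg: float        # equilibrium argument (degrees)


@dataclass
class Bctides:
    """Hypothetical bctides data object; field names follow the old class."""
    info: str = ""
    ntip: int = 0
    tip_dp: float = 0.0
    tip: list = field(default_factory=list)
    nbfr: int = 0
    bfr: list = field(default_factory=list)
    nope: int = 0
    boundaries: list = field(default_factory=list)


def update_bctides(bc: Bctides, factors: dict) -> Bctides:
    """Return a copy of bc with new (nodal_factor, eq_arg) per constituent.

    factors maps constituent name -> (nodal_factor, eq_arg), e.g. from
    tide_fac or computed locally. Constituents not listed are kept as-is.
    """
    new_bfr = [
        replace(f, nodal_factor=factors[f.name][0], eq_arg=factors[f.name][1])
        if f.name in factors else f
        for f in bc.bfr
    ]
    return replace(bc, bfr=new_bfr)
```

Returning a new object instead of modifying bc in place keeps the original boundary file contents available for comparison.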

:bug: Sflux object

SCHISM takes meteorological inputs as so called sflux files.

Sflux has a predefined structure and a predefined naming convention. The objectives of the sflux object would be the following:

Provide an interface for creating the 3 types of sflux object: air, prc, and rad
Hold basic sflux-related information, but not process data
Provide methods for writing the data into files
Provide a method to write sflux_inputs.txt

The current schism/sflux.py is supposed to provide these functionalities, but it is not working due to a missing field called 'grid'. It needs to be re-implemented.
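A minimal sketch of the metadata side of such an object. It assumes the SCHISM naming pattern sflux_<type>_<source>.<NNNN>.nc (e.g. sflux_air_1.0001.nc), two possible sources, and the per-type variable lists given in the table below, as we understand the SCHISM manual; these should be checked against the manual. Sflux and write_sflux_inputs are hypothetical names, and writing the netCDF data itself is not included.

```python
from dataclasses import dataclass
from pathlib import Path

# Variables expected in each sflux type (as we understand the SCHISM manual)
SFLUX_VARIABLES = {
    "air": ("uwind", "vwind", "prmsl", "stmp", "spfh"),
    "prc": ("prate",),
    "rad": ("dlwrf", "dswrf"),
}


@dataclass
class Sflux:
    """Hypothetical holder of sflux metadata; it does not process data."""
    kind: str        # 'air', 'prc' or 'rad'
    source: int = 1  # 1 or 2, the two sflux sources

    def __post_init__(self):
        if self.kind not in SFLUX_VARIABLES:
            raise ValueError(f"kind must be one of {sorted(SFLUX_VARIABLES)}")
        if self.source not in (1, 2):
            raise ValueError("source must be 1 or 2")

    @property
    def variables(self):
        return SFLUX_VARIABLES[self.kind]

    def filename(self, index: int) -> str:
        """Name of the index-th file, e.g. sflux_air_1.0001.nc."""
        return f"sflux_{self.kind}_{self.source}.{index:04d}.nc"


def write_sflux_inputs(directory) -> Path:
    """Write a minimal (empty) sflux_inputs.txt namelist into directory."""
    path = Path(directory) / "sflux_inputs.txt"
    path.write_text("&sflux_inputs\n/\n")
    return path
```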

:bug: Missing import in schism.py

Several imports are missing in schism.py. They are:

datetime and timedelta from datetime
pandas
netCDF4

:bulb: Tide filters

Tide filters are convolution weight arrays for removing the tidal signal from hourly time series. They are quite efficient and are used by organizations like PSMSL and SONEL to remove the tide and compute monthly and yearly means. It would be nice to have these filters in the package.
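As one classic example, the sketch below implements the Doodson X0 filter: 39 symmetric hourly weights (zero at lag 0, half-weights for lags 1 to 19, normalised by 30). The weights are given as we recall them and should be verified against a reference before use; this sketch does not claim which filter PSMSL or SONEL actually apply. The test confirms that pure 12-hour and 24-hour oscillations are removed exactly; real constituents such as M2 and K1 are strongly attenuated but not exactly zeroed.

```python
import numpy as np


def doodson_x0_weights() -> np.ndarray:
    """Symmetric 39-point Doodson X0 weights for hourly data (sum to 1)."""
    half = np.array([2, 1, 1, 2, 0, 1, 1, 0, 2, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1], dtype=float)
    # Mirror the half-weights around a zero central weight
    return np.concatenate([half[::-1], [0.0], half]) / 30.0


def apply_tide_filter(hourly: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Convolve an hourly series with symmetric weights.

    mode='valid' drops (len(weights) - 1) // 2 values at each end, so output
    value i corresponds to input index i + (len(weights) - 1) // 2.
    """
    return np.convolve(hourly, weights, mode="valid")
```

Because the weights are symmetric, the filter introduces no phase shift; only the ends of the series are lost.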

:bug: Bctides class is initialised with empty list

The Bctides class in the old implementation was initialised with empty lists as default argument values. This is an extremely dangerous practice and should never be used. A default value such as an empty list is created only once, when the function is defined, so every call that relies on the default shares the same list object. This means that if the read command is called twice or more, it will append to the OLD referenced list (which is no longer empty after the first call). This should be fixed either by removing such an implementation entirely, or by using copy.deepcopy when assigning to the instance variables.

pycaz/scripts/pre_adapt_tidefac.py, line 15 in 85a0b76

 def __init__(self, info='', ntip=0, tip_dp=0, tip=[], nbfr=0, bfr=[], nope=0, boundaries=[]): 
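The pitfall and the usual Python fix can be shown with two tiny hypothetical classes (not the original code). The fix uses None as a sentinel and creates a fresh list per instance; list(tip) also makes a shallow copy of a caller-supplied list, while copy.deepcopy, as suggested in the issue, would be needed for nested structures.

```python
class BctidesBuggy:
    # The [] default is evaluated once, when __init__ is defined
    def __init__(self, tip=[]):
        self.tip = tip


class BctidesFixed:
    # None as a sentinel: each instance gets its own new list
    def __init__(self, tip=None):
        self.tip = [] if tip is None else list(tip)


a, b = BctidesBuggy(), BctidesBuggy()
a.tip.append("M2")
print(b.tip)  # ['M2'], because b shares a's default list

c, d = BctidesFixed(), BctidesFixed()
c.tip.append("M2")
print(d.tip)  # []
```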

:recycle: Reorganize the tide modules

A dedicated tide module is needed to handle multi-dimensional functionality: solving, interpolation, reconstruction, etc. The current folder also includes multiple unnecessary files, such as schism.py, that need to be removed.
