
gutils's Introduction

🚤 Glider Utilities (GUTILS)


🐍 + 🌊 + 🚤

A Python framework for working with data from Autonomous Underwater Vehicles (AUVs).

Supports:

  • Teledyne Webb Slocum Gliders

The main concept is to break the data from each glider deployment down into different states:

  • Raw / Binary data
    • Slocum: rt (.tbd, .sbd, .mbd, and .nbd) and delayed (.ebd and .dbd)
  • ASCII data
  • Using tools provided by vendors and/or Python code, an ASCII representation of the dataset can be analyzed using open tools and software libraries. GUTILS provides functions to convert Raw/Binary data into an ASCII representation on disk.
  • Standardized DataFrame
  • Once in an ASCII representation, GUTILS provides methods to standardize the ASCII data into a pandas DataFrame with well-known column names and metadata. All analysis and computations, such as computing profiles and other derived variables, are done in the pandas ecosystem at this stage. This is an in-memory stage.
  • NetCDF
  • After analysis and computations are complete, GUTILS can serialize the DataFrame to a netCDF file that is compatible with the IOOS Glider DAC profile netCDF format. GUTILS provides metadata templates to make sure metadata is captured correctly in the output netCDF files.
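The standardized-DataFrame stage above can be pictured with a minimal pandas sketch. The column names (`t`, `z`) and the profile computation here are illustrative only, not the actual GUTILS API:

```python
import numpy as np
import pandas as pd

# Fake ASCII-derived data: a short dive followed by a climb
df = pd.DataFrame({
    "t": pd.date_range("2020-01-01", periods=6, freq="min"),
    "z": [0.0, 10.0, 20.0, 15.0, 5.0, 0.0],  # depth in meters
})

# Assign a profile id every time the vertical direction changes
direction = np.sign(df["z"].diff().fillna(0.0))
df["profile"] = (direction != direction.shift()).cumsum()
# profiles: 1 = at surface, 2 = dive, 3 = climb
```

From here the DataFrame (with its per-profile index) would be serialized to netCDF by GUTILS; the serialization step itself is omitted.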

Resources

Installation

GUTILS is available as a Python library through conda and was designed for Python 3.8+.

$ conda create -n gutils python=3.9
$ source activate gutils
$ conda install -c conda-forge gutils

Development

Setup

$ git clone https://github.com/SECOORA/GUTILS.git

Install Anaconda (using python3): http://conda.pydata.org/docs/download.html

Read Anaconda quickstart: http://conda.pydata.org/docs/test-drive.html

It is recommended that you use mamba to install to speed up the process: https://github.com/mamba-org/mamba.

Setup a GUTILS conda environment and install the base packages:

$ mamba env create -f environment.yml
$ conda activate gutils

Update

To update the gutils environment, issue these commands from your root gutils directory:

$ git pull
$ conda deactivate
$ conda env remove -n gutils
$ mamba env create -f environment.yml
$ conda activate gutils

Testing

The tests are written using pytest. To run the tests use the pytest command.

To run the "long" tests you will need a test configuration checkout cloned somewhere. Then set the environment variable GUTILS_TEST_CONFIG_DIRECTORY to its config directory, e.g. export GUTILS_TEST_CONFIG_DIRECTORY=/data/dev/SGS/config, and run pytest -m long

To run a specific test, locate the name of the test you would like to run and run pytest -k [name_of_test], e.g. pytest -k TestEcoMetricsOne

To run the tests in Docker, you can build the image (which does not include the tests or test data to reduce image size) and volume mount the tests when running:

docker build -t gutils .
docker run -it --rm -v $(pwd)/gutils/tests:/code/gutils/tests gutils pytest -m "not long"


gutils's Issues

`gutils` not available on Anaconda, and other questions

Hey there. I was very happy to find a FOSS package to interact with data output from Teledyne Webb Slocum Gliders.

Following the setup instructions in the readme, I found that gutils isn't on conda-forge. I assume this method of installation doesn't work (yet?).

It would be great to know:

  • how feature-complete this package is
  • whether this package is in active development / what the current focuses are

If contributions are welcome, I'm happy to open some PRs tackling different areas (assuming my supervisor gives me the go-ahead).

Assign_profiles() has errors

I found two bugs in the GUTILS assign_profiles() function. Line 77 of assign_profiles reads:

inflections = np.where(np.diff(delta_depth) != 0)[0]

This line treats each extreme point as a new profile. For example, if delta_depth = [.., .., 1.0, 0.0, -1.0, -1.0, …] at indices 128-131, then inflections will be [0, 128, 129, 200, …]. The range 128-129 then becomes a tiny spurious profile at the extreme point, which looks like the graph below.

[screen shot 2018-03-12: spurious mini-profile at the turning point]

I also suspect the sampling algorithm between lines 57-66, which samples the data at a time interval (tsint), causes the offset issues in the graph below:
[screen shot 2018-03-12: offset between sampled and actual inflection]
The sampled data can find the extreme points, but cannot locate them very precisely.
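The first bug in this report can be reproduced in a few lines. The values and the proposed fix below are illustrative, not a patch against the actual GUTILS source:

```python
import numpy as np

# A zero in delta_depth at a turning point makes np.diff non-zero
# twice, so the extreme point is counted as its own tiny "profile".
delta_depth = np.array([1.0, 1.0, 1.0, 0.0, -1.0, -1.0])
inflections = np.where(np.diff(delta_depth) != 0)[0]
# inflections -> [2, 3]: two break points for one real inflection

# One possible fix (sketch only): drop zeros before differencing, so a
# single turning point yields a single inflection. A real fix would
# also need to map the filtered indices back to the original positions.
nonzero = delta_depth[delta_depth != 0]
filtered = np.where(np.diff(nonzero) != 0)[0]
# filtered -> [2]: one inflection
```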

get_decimal_degrees() has errors near equator

From Kerfoot:

I found a bug in the GUTILS get_decimal_degrees() when trying to convert
small (near the equator) GPS positions from NMEA coordinates to decimal
degrees:

https://github.com/SECOORA/GUTILS/blob/master/gutils/gbdr/methods.py#L281

I rewrote the function to do a straight mathematical conversion instead
of converting to a string, parsing, etc. Here's the code:

def get_decimal_degrees(lat_lon):
    """Converts a glider GPS coordinate in ddmm.mmm to decimal degrees dd.ddd

    Arguments:
    lat_lon - A floating point latitude or longitude in the format ddmm.mmm
        where dd is degrees and mm.mmm is decimal minutes.

    Returns decimal degrees as a float
    """
    # Absolute value of the coordinate
    try:
        pos_lat_lon = abs(lat_lon)
    except (TypeError, ValueError):
        return None

    # Calculate NMEA degrees as an integer
    nmea_degrees = int(pos_lat_lon / 100) * 100

    # Subtract the NMEA degrees from the absolute value of lat_lon and
    # divide by 60 to get the minutes in decimal format
    gps_decimal_minutes = (pos_lat_lon - nmea_degrees) / 60.0

    # Divide NMEA degrees by 100 and add the decimal minutes
    decimal_degrees = (nmea_degrees / 100) + gps_decimal_minutes

    if lat_lon < 0:
        return -decimal_degrees

    return decimal_degrees
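A quick sanity check of the arithmetic in the function above, with illustrative values (the conversion is repeated inline here so the snippet stands alone):

```python
# 4530.50 in NMEA ddmm.mmm is 45 degrees + 30.50 decimal minutes
lat_lon = 4530.50
nmea_degrees = int(abs(lat_lon) / 100) * 100                    # 4500
decimal_degrees = nmea_degrees / 100 + (abs(lat_lon) - nmea_degrees) / 60.0
# decimal_degrees -> 45.5083...

# Near the equator the degree part is 0 (e.g. 0030.25 = 0 deg 30.25 min),
# the case the string-parsing version mishandled; the math still works:
near_equator = 30.25
equator_degrees = int(near_equator / 100) * 100 / 100 + near_equator / 60.0
# equator_degrees -> 0.5041...
```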

Unstable `axiom/gutils:latest` Docker image

I saw in the push.yml workflow (which runs on push and PRs) that the image is built and uploaded to axiom/gutils:latest as part of testing. This might cause some confusion since latest in other projects usually refers to the latest stable version, rather than testing.

- name: Build and push
  uses: docker/build-push-action@v2
  with:
    push: false
    tags: axiom/gutils:latest
    cache-from: type=local,src=${ BUILDX_CACHE }
    cache-to: type=local,dest=${ BUILDX_CACHE }
    outputs: type=docker

- name: Push latest image to Docker Hub if on master or main branch of the repo
  uses: docker/build-push-action@v2
  with:
    push: true
    tags: axiom/gutils:latest
    cache-from: type=local,src=${ BUILDX_CACHE }
    cache-to: type=local,dest=${ BUILDX_CACHE }

I'd recommend building locally for testing, or using the git SHA as the tag.

The publish workflow (tagging with the version name in publish.yml) is correct though, so should be used instead for production.


Not a big issue, just an FYI for the community.

utilization of parquet for intermediate storage

There are several steps involved in migrating to parquet for intermediate processing of Slocum glider data.

  • ensure dbdreader reproduces similar results to slocum binaries (smerckel/dbdreader#18)
  • replacement of convertDbds.sh with dbdreader/parquet
  • desired storage pattern for parquet (just using tables for now)

Is there a particular storage pattern or design desired for the parquet data structures?
REF: https://arrow.apache.org/docs/python/parquet.html#parquet-file-writing-options

  • Enforce version 2.4? 2.6?
  • Ensure the structure is queryable by time for speedy subsetting?
  • If enforcing 2.6, timestamp units become less of an issue
  • Partitioning? Glider ID, Deployment ID, Process method (rt vs. delayed), QC'd (Level 0, 1, ...)

Have to nail down a potential dbdreader issue first.

Allow users to specify a CAC file if metadata is not contained in the binary files

For practical purposes, this sensor list is very rarely included in each file due to its large size relative to the actual sensor data (as an example, the binary file I'm working with is 78680 bytes and the ASCII header takes up 78481 bytes). The operator can choose to either 1) include it in every file, 2) include it in no files if there is already a copy of the sensor list on shore, or 3) include it in just the first file of the mission.

For our real-time operations, we generate this file (called a cache or cac file) on shore and then set up the glider to NEVER include it in the data files.

Once the files are on shore, the dbd2asc executable takes one or more binary data files and, using the sensor list either contained in the file or at the location specified via dbd2asc -c /PATH/TO/SENSORLIST.cac, converts the binary data to ASCII. As far as I can tell, GUTILS provides no option to specify an external cac file. My guess is that the vast majority of users who would potentially use this repo will also NOT transmit this sensor list in every binary data file, but I can't say for sure.
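A wrapper could expose dbd2asc's own -c option to callers. The function name and signature below are hypothetical, not existing GUTILS API; only the -c flag itself comes from dbd2asc:

```python
def build_dbd2asc_cmd(binary_file, cac_dir=None):
    """Build a dbd2asc invocation, optionally pointing at an on-shore
    sensor-list (.cac) cache directory via dbd2asc's -c option."""
    cmd = ["dbd2asc"]
    if cac_dir is not None:
        cmd += ["-c", cac_dir]
    cmd.append(binary_file)
    return cmd

# Usage (requires dbd2asc on PATH; file names are made up):
#   import subprocess
#   ascii_text = subprocess.run(
#       build_dbd2asc_cmd("01230000.sbd", "/data/cache"),
#       capture_output=True, text=True, check=True,
#   ).stdout
```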

Segment_id

I realized this is the wrong place to ask this question, so I'm closing it.

pyupgrade

Suggestions

  • Wait on #22 (to avoid merge conflicts)
  • Add pre-commit.ci to the repo (pre-commit is already in the push.yml GHA workflow)
    • Pros: Pre commit hooks run in CI for pull requests (independent of dev install). No associated cost for public repos.
  • Modify .pre-commit-config.yaml to add pyupgrade
    • Pros: Automatically remove outdated Python syntax, and convert code to take advantage of new features (e.g., f-strings, remove encoding comments)
    • Cons: Requires explicitly dropping support for Python versions as per arg in pyupgrade workflow

Addition to .pre-commit-config.yaml

- repo: https://github.com/asottile/pyupgrade
  rev: v3.15.0
  hooks:
    - id: pyupgrade
      args: [--py36-plus]
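As an illustration of what pyupgrade with --py36-plus does, here is the kind of rewrite it applies (the strings are made up for the example):

```python
glider = "unit_191"

# Before pyupgrade: percent formatting and str.format
old_style = "deployment for %s" % glider
format_style = "deployment for {}".format(glider)

# After pyupgrade (--py36-plus): both become an f-string
new_style = f"deployment for {glider}"

assert old_style == format_style == new_style
```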

echometrics improvements

Continued from PR cf-convention/vocabularies#186

  • Use pytest test_slocum.py to exercise echometrics/pseudogram code to produce desired netCDF results
  • Always assign echometrics variables with extras dimension even if pseudogram is missing
  • Wire acoustic sensor configuration through deployment.json using extra_kwargs
  • apply extra_kwargs to ascii to nc conversion
  • tests/test_slocum.py::TestEcoMetricsThree::test_pseudogram produces three netCDF files that need to be consistent
  • Manual running of ascii/netCDF produces one file; pytest produces three files [differences in json config files]

Deferred:

  • apply extra_kwargs to dbd to ascii conversion (no current hooks to grab extra_kwargs from deployment.json)
