noaa-ocs-hydrography / kluster
A distributed multibeam processing system built using the Pangeo ecosystem
License: Creative Commons Zero v1.0 Universal
See dask progressbar
Should support both GUI and console apps. Should not lock up the GUI; threaded progress reporting only, I think.
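A minimal sketch of the console side, assuming the local threaded scheduler: dask's diagnostics ProgressBar prints a text bar to stdout without touching any GUI event loop. A GUI bar for the distributed client would need something custom built on dask's callback machinery; the array and computation here are illustrative only.

```python
import dask.array as da
from dask.diagnostics import ProgressBar

# console case: ProgressBar hooks the local scheduler and writes a
# text progress bar to stdout, safe for non-GUI runs
arr = da.random.random((10000, 10000), chunks=(1000, 1000))
with ProgressBar():
    total = arr.sum().compute()
print(total)
```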
Allow the user to set the chunksize for a new run; reload_data would then read the chunksize and set the attribute within Fqpr
I am not able to process any multibeam file. The software was installed carefully following the provided instructions and the data could be imported without any problem, but it fails to process. I have tried on 2 different machines (Windows) and multiple Kongsberg .all files (from EM2040, EM3002, EM302, and EM122). See the attached log for more details
Hi @ericgyounkin ,
I have noticed some unexpected behaviour when using the filter plugin functionality. It has to do with running in 'Points View' mode when we have data selected over multiple lines. Our filter tool works by creating 3d 'chips' of data; you would expect that a larger area results in more chips, however as you see from the example below, when I select similar sized subsets I can get very different numbers of chips when the subset spans multiple lines:
Subset of single line - results in 12 chips in our filter
Similar sized subset crossing 2 lines - 195 chips for our model!
I continue to try and debug but was wondering if you had any insights on why this could be happening?
Many thanks.
Look at replacing with vgrid
Should support chunking of soundings, or some kind of parallelized input with a single output object
Examine cloud-friendly formats, e.g. Cloud Optimized GeoTIFF
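A minimal sketch of writing a Cloud Optimized GeoTIFF, assuming rioxarray is installed and GDAL is new enough (>= 3.1) to have the COG driver; the grid contents, CRS, and output path here are illustrative only.

```python
import numpy as np
import rioxarray  # noqa: F401, registers the .rio accessor on xarray objects
import xarray as xr

# illustrative gridded surface with projected x/y coordinates
grid = xr.DataArray(
    np.random.rand(100, 100).astype("float32"),
    dims=("y", "x"),
    coords={"y": np.arange(100.0), "x": np.arange(100.0)},
)
grid = grid.rio.write_crs("EPSG:26910")
# GDAL's COG driver handles tiling and overviews internally
grid.rio.to_raster("surface_cog.tif", driver="COG")
```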
currently have test_fqpr_generation with tests
need to configure Travis to use this and build with tests on push
Kongsberg uses a 16bit counter field. As you log data and the counter number gets to the 16bit limit, it resets and starts over. Kluster converts multiple multibeam files into one dataset, concatenating along the time dimension. You can end up with duplicate ping counter numbers this way, if the lines happen to include one of these counter resets.
Need to fix this for reform_vars related methods to work (where we use counter and time to reform pings)
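A sketch of one way to handle the wrap, assuming a numpy array of raw counter values; unwrap_counter is a hypothetical helper, not an existing Kluster function.

```python
import numpy as np

def unwrap_counter(counter: np.ndarray, bits: int = 16) -> np.ndarray:
    """Hypothetical helper: make a wrapping counter monotonic by adding
    a full period (2**bits) after every point where the raw value drops."""
    period = 2 ** bits
    values = counter.astype(np.int64)
    wraps = np.cumsum(np.diff(values, prepend=values[0]) < 0)
    return values + wraps * period

# e.g. [65534, 65535, 0, 1] -> [65534, 65535, 65536, 65537]
print(unwrap_counter(np.array([65534, 65535, 0, 1], dtype=np.uint16)))
```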
Hi, I'm getting the following error when trying to import/convert several .all files:
Using existing local cluster client...
<Client: 'tcp://127.0.0.1:55569' processes=8 threads=8, memory=63.87 GiB>
****Running Kongsberg .all converter****
1 file(s), Using 1 chunk(s) in parallel
[ ] | 0% Completed | 1.4s
Error running action multibeam
Traceback (most recent call last):
File "HSTB\kluster\gui\kluster_worker.py", line 38, in run
File "HSTB\kluster\fqpr_actions.py", line 270, in execute_action
File "HSTB\kluster\fqpr_actions.py", line 52, in execute
File "HSTB\kluster\fqpr_convenience.py", line 112, in convert_multibeam
File "HSTB\kluster\fqpr_generation.py", line 532, in read_from_source
File "HSTB\kluster\xarray_conversion.py", line 1000, in read
File "HSTB\kluster\xarray_conversion.py", line 1552, in batch_read
File "HSTB\kluster\xarray_conversion.py", line 1314, in _batch_read_sequential
File "distributed\client.py", line 1946, in gather
return self.sync(
File "distributed\utils.py", line 310, in sync
return sync(
File "distributed\utils.py", line 364, in sync
raise exc.with_traceback(tb)
File "distributed\utils.py", line 349, in f
result[0] = yield future
File "tornado\gen.py", line 762, in run
File "distributed\client.py", line 1811, in _gather
raise exception.with_traceback(traceback)
File "HSTB\kluster\xarray_conversion.py", line 110, in _run_sequential_read
File "HSTB\kluster\fqpr_drivers.py", line 173, in sequential_read_multibeam
File "HSTB\drivers\par3.py", line 879, in sequential_read_records
File "HSTB\drivers\par3.py", line 801, in _finalize_records
IndexError: index 1681 is out of bounds for axis 0 with size 1681
OS: Windows 10
Version: 0.8.8 (same error with v0.8.4)
See Fqpr.return_cast_idx_nearestintime
Should probably include a nearest-in-time/distance type method, which has been successful in operational hydro.
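A minimal sketch of the nearest-in-time idea, assuming cast times as a numpy array of epoch seconds; nearest_cast_index is a hypothetical helper, not the existing Fqpr method, and a nearest-in-distance variant would compare ping positions against cast positions the same way.

```python
import numpy as np

def nearest_cast_index(ping_time: float, cast_times: np.ndarray) -> int:
    """Hypothetical helper: index of the sound velocity cast closest
    in time to the given ping time."""
    return int(np.argmin(np.abs(cast_times - ping_time)))

casts = np.array([1616079000.0, 1616082600.0, 1616086200.0])
print(nearest_cast_index(1616083000.0, casts))  # -> 1
```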
There might be a better option out there, but GSF would probably be better than exporting to CSV
This might already support reading/writing
https://github.com/schwehr/generic-sensor-format
Spec can be found here
https://www.leidos.com/products/ocean-marine
Integrate some of the filtering/interpolation tools in scipy/numpy (see the sketch after this list)
use the basic plots type widget to select data variables/time periods
save results to disk
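A minimal sketch of the kind of integration meant in the first item above, assuming a 1d sounding variable; medfilt and interp1d are real scipy functions, the variable names are illustrative only.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import medfilt

# illustrative data: a noisy depth trace over time
times = np.linspace(0.0, 10.0, 101)
depths = np.sin(times) + np.random.normal(0.0, 0.05, times.size)

smoothed = medfilt(depths, kernel_size=5)           # despike with a median filter
resample = interp1d(times, smoothed, kind="linear")
new_times = np.linspace(0.0, 10.0, 501)
new_depths = resample(new_times)                    # resample onto a denser time base
```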
Hi, is it possible to run the UI in Linux (Ubuntu)?
I cannot build the docker image using the Dockerfile located in the root folder. It seems that the last 2 lines, besides uncommenting, need "conda run" changed to "RUN conda". But after that change, when trying to build the docker image from the Dockerfile, I am getting this error
executor failed running [conda run -n kluster_test /bin/bash -c conda -n kluster_test pip install git+https://github.com/noaa-ocs-hydrography/kluster.git#egg=hstb.kluster]: exit code: 1
See the log
(base) PS C:\Users\monoc\Downloads\kluster-kluster_0_8_9> docker build -t kluster089 .
[+] Building 1.4s (19/20)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.89kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:18.04 0.8s
=> [ 1/17] FROM docker.io/library/ubuntu:18.04@sha256:c2aa13782650aa7ade424b12008128b60034c795f25456e8eb552d0a0f447cad 0.0s
=> CACHED [ 2/17] RUN apt-get update 0.0s
=> CACHED [ 3/17] RUN apt-get install -y git 0.0s
=> CACHED [ 4/17] RUN apt-get install -y wget 0.0s
=> CACHED [ 5/17] RUN apt install libgl1-mesa-glx -y 0.0s
=> CACHED [ 6/17] RUN apt-get install ffmpeg libsm6 libxext6 -y 0.0s
=> CACHED [ 7/17] RUN adduser --disabled-password --gecos "Non-root user" --uid 1000 --gid 100 --home /ho 0.0s
=> CACHED [ 8/17] RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-py38_4.10.3-Linux-x86_64.sh -O ~/minico 0.0s
=> CACHED [ 9/17] RUN echo ". /home/eyou102/miniconda3/etc/profile.d/conda.sh" >> ~/.profile 0.0s
=> CACHED [10/17] RUN conda init bash 0.0s
=> CACHED [11/17] RUN mkdir /home/eyou102/kluster 0.0s
=> CACHED [12/17] WORKDIR /home/eyou102/kluster 0.0s
=> CACHED [13/17] RUN conda update --name base --channel defaults conda 0.0s
=> CACHED [14/17] RUN conda create -n kluster_test python=3.8.12 0.0s
=> CACHED [15/17] RUN conda install -c conda-forge qgis=3.18.3 vispy=0.9.4 pyside2=5.13.2 gdal=3.3.1 h5py python-geohash 0.0s
=> ERROR [16/17] RUN conda -n kluster_test pip install git+https://github.com/noaa-ocs-hydrography/kluster.git#egg=hstb.k 0.6s
[16/17] RUN conda -n kluster_test pip install git+https://github.com/noaa-ocs-hydrography/kluster.git#egg=hstb.kluster:
#19 0.540 ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['/bin/bash', '-c', 'conda -n kluster_test pip install git+https://github.com/noaa-ocs-hydrography/kluster.git#egg=hstb.kluster']' command failed. (See above for error)
#19 0.540
#19 0.540 CommandNotFoundError: No command 'conda kluster_test'.
#19 0.540
#19 0.540
executor failed running [conda run -n kluster_test /bin/bash -c conda -n kluster_test pip install git+https://github.com/noaa-ocs-hydrography/kluster.git#egg=hstb.kluster]: exit code: 1
This is kind of in progress; should allow the inclusion of Kongsberg/Applanix RMS error sources.
Build a dictionary attribute that has an integer key with the value being a description of the last process run.
This would let you query the whole dataset to ensure that each sounding is up to date, and also query the processing history of a single sounding.
Important if we allow selection of soundings for processing (and not just whole lines)
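A minimal sketch of the idea, assuming a plain dict attribute; the keys, descriptions, and last_process helper are illustrative, not existing Kluster API.

```python
# illustrative processing-history attribute: integer keys in run order,
# values describing each process applied to the dataset
processing_history = {
    0: "converted from raw .all files",
    1: "orientation vectors built",
    2: "georeferenced with EPSG:26910",
}

def last_process(history: dict) -> str:
    """Hypothetical helper: description of the most recent process run."""
    return history[max(history)]

print(last_process(processing_history))  # -> georeferenced with EPSG:26910
```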
Similar to Charlene, use knowledge of things like the project EPSG to ensure a user or automated process does not process with a different EPSG
Getting this error when trying to process a .all file. Unlike #79, I can import/convert the file, but it is the next processing/georeferencing stage that fails:
****Building tx/rx vectors at time of transmit/receive****
Operating on system serial number = 275
using installation params 1616079030
Traceback (most recent call last):
File "/snap/pycharm-community/267/plugins/python-ce/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
exec(exp, global_vars, local_vars)
File "<input>", line 1, in <module>
File "/home/david/kluster/HSTB/kluster/fqpr_convenience.py", line 308, in process_multibeam
fqpr_inst.get_orientation_vectors(initial_interp=orientation_initial_interpolation, subset_time=subset_time)
File "/home/david/kluster/HSTB/kluster/fqpr_generation.py", line 2208, in get_orientation_vectors
self.generate_starter_orientation_vectors(prefixes, timestmp)
File "/home/david/kluster/HSTB/kluster/fqpr_generation.py", line 623, in generate_starter_orientation_vectors
rx_heading = abs(float(self.multibeam.xyzrph[txrx[1] + '_h'][tstmp]))
KeyError: 'rx_h'
File info:
FQPR: Fully Qualified Ping Record built by Kluster Processing
-------------------------------------------------------------
Contains:
2 sonar heads, 18000 pings, version 0.8.9
Start: Thu Mar 18 14:50:30 2021 UTC
End: Thu Mar 18 14:55:30 2021 UTC
Minimum Latitude: <omitted> Maximum Latitude: <omitted>
Minimum Longitude: <omitted> Maximum Longitude: <omitted>
Minimum Northing: Unknown Maximum Northing: Unknown
Minimum Easting: Unknown Maximum Easting: Unknown
Minimum Depth: Unknown Maximum Depth: Unknown
Current Status: converted complete
Sonar Model Number: em2040_dual_rx
Primary/Secondary System Serial Number: 275/281
Horizontal Datum: 32630
Vertical Datum: waterline
Navigation Source: Unknown
Contains SBETs: False
Sound Velocity Profiles: 1
Kongsberg has launched their new MBES for shallow water, the EM 2042. Could Kluster support it? I am attaching a sample zipped file.
There is also the just-released Kmall rev J (https://www.kongsbergdiscovery.online/sis/kmall/html/index.html), although that EM2042 file was logged with the previous datagram revision.
Replace with a progress bar that increments once for each chunk to reduce text overload
WobbleTest will give you an indication of what might be wrong with your data using cross-correlation plots
Need to provide clear guidance (e.g. "you have a 9ms latency value") instead of just showing plots
"Sonar model not understood" issue with the "em124" data
It would be useful to provide the user a way to manually select the port number on which to start a dask cluster, so that we can expose that port in the docker environment.
I had a look into dask_helpers.py but am not sure which method I should modify. It seems the code needs to split the address string into IP:PORT; the IP:PORT values can then be stored in, and retrieved from, kluster_variables.py
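A minimal sketch of the split and a fixed-port startup, assuming dask's LocalCluster; host and scheduler_port are real LocalCluster keywords, while the address string and its storage location are the assumptions from the comment above.

```python
from dask.distributed import Client, LocalCluster

# split a user-supplied "IP:PORT" string, then pin the scheduler to
# that port so it can be exposed through docker
address = "127.0.0.1:8786"  # illustrative value, e.g. kept in kluster_variables.py
ip, port = address.rsplit(":", 1)

cluster = LocalCluster(host=ip, scheduler_port=int(port))
client = Client(cluster)
print(client)  # scheduler now listens on tcp://127.0.0.1:8786
```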
Support for reading .all seabed image 89 datagram (sample amplitudes)
Support for reading .kmall MRZ reflectivity, either
backscatter calibration values - different scalar values for freq/mode
order of inputs in Transformer.transform appears to depend on the epsg provided. Needs more testing.
Need to rework beam index in soundings to be based on the ping-wise beam number. Should either:
or
All kluster modules should probably use the same logger instance, I think, since we write logs to file. Fix that across all modules.
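A minimal sketch of the shared-logger idea using the standard logging module; the logger name and file path are illustrative, not what Kluster currently uses.

```python
import logging

def get_kluster_logger(logfile: str = "kluster.log") -> logging.Logger:
    """Hypothetical helper: every module calls this and receives the same
    named logger, so a single file handler collects all messages."""
    logger = logging.getLogger("kluster")
    if not logger.handlers:  # only attach the handler once
        handler = logging.FileHandler(logfile)
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(name)s - %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_kluster_logger()
log.info("same logger instance from every module")
```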
Hello,
I have been practising importing em122 files (.all), and it was going well. So I decided to import and process a larger batch (approximately 28 files) and then kept getting this error:
C:... \anaconda3\envs\kluster_test\lib\site-packages\zarr\util.py", line 526, in check_array_shape
raise ValueError('parameter {!r}: expected array with shape {!r}, got {!r}'
ValueError: parameter 'value': expected array with shape (7530,), got (7752,)
2023-10-02 16:42:26,818 - INFO - kluster_action: no data returned from action execution
Any ideas? Many thanks!
PS the software is super cool :)
Hi @ericgyounkin,
As you're aware, we are developing algorithms/models for cleaning soundings from multibeam data. I noticed #23, and we have been mainly using GSF as a format to share data, so being able to read GSF files would be great. We are always exploring tools that might help us iterate our algorithms faster. The inputs to the models are essentially dask dataframes of x,y,z data. The two things I would be looking to try first would be:
My feeling is that it shouldn't be too hard as you are using xarray datasets, which are easily converted to dask dataframes. Any pointers you can give to help us try this out, such as code structure/architecture, how to interact with the data, or how to create some basic gui elements, would be much appreciated. One specific question I have:
Thanks
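A minimal sketch of the conversion mentioned above, assuming a chunked xarray Dataset of per-sounding variables; to_dask_dataframe is a real xarray method, the variable names are illustrative.

```python
import numpy as np
import xarray as xr

# illustrative dataset: one value per sounding for x, y, z
n = 1000
ds = xr.Dataset(
    {
        "x": ("sounding", np.random.rand(n)),
        "y": ("sounding", np.random.rand(n)),
        "z": ("sounding", -30.0 * np.random.rand(n)),
    }
).chunk({"sounding": 250})

ddf = ds.to_dask_dataframe()  # dask dataframe with columns x, y, z
print(ddf.head())
```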
Found that the order of coordinates in Transformer.transform depends on whether the CRS was built from a proj4 string or an EPSG code:
```python
from pyproj import Transformer, CRS

# CRS built from a proj4 string: expects (lon, lat) input order
manual_crs = CRS.from_proj4('+proj=utm +zone=10 +ellps=GRS80 +datum=NAD83')
georef_transformer = Transformer.from_crs(manual_crs.geodetic_crs, manual_crs)
georef_transformer.transform(40, -120)
# Out[5]: (inf, inf)
georef_transformer.transform(-120, 40)
# Out[6]: (756099.6479720183, 4432069.056784666)

# the same CRS built from an EPSG code: expects (lat, lon) input order
epsg_crs = CRS.from_epsg(26910)
georef_transformer = Transformer.from_crs(epsg_crs.geodetic_crs, epsg_crs)
georef_transformer.transform(40, -120)
# Out[9]: (756099.6479720183, 4432069.056784666)
georef_transformer.transform(-120, 40)
# Out[10]: (inf, inf)

manual_crs.to_epsg()
# Out[16]: 26910
```
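One possible workaround, assuming the goal is consistent (lon, lat) input regardless of where the CRS came from: the always_xy flag on Transformer.from_crs, which is a real pyproj option.

```python
from pyproj import Transformer, CRS

# always_xy=True forces (lon, lat) in / (easting, northing) out for
# both CRS, whether built from a proj4 string or an EPSG code
epsg_crs = CRS.from_epsg(26910)
transformer = Transformer.from_crs(epsg_crs.geodetic_crs, epsg_crs, always_xy=True)
print(transformer.transform(-120, 40))
# (756099.6479720183, 4432069.056784666)
```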
kmall MRZ datagrams contain the Ifremer quality factor, which is left at 0.0 when the computation fails. Currently that drives sonar uncertainty to zero for that sounding, which is probably the opposite of what we want. Should we artificially drive up uncertainty as a way of flagging the sounding? Should we just flag it rejected in the detection type? Are they already rejected?
incorporates:
should probably upload a test data file (I have a 9MB file that we can use)
Look at using dask + binder to compose the notebook
Still need to actually test Caris SVP based profile with svcorrect. The SoundSpeedProfile will work, but there might be some underlying issues that I haven't anticipated.
see this paper for one example I've found
Hi @ericgyounkin,
This happens intermittently. I am trying to pin down the conditions under which it happens:
I apply a filter, in this case the angle filter -25+25. When the filter finishes, the points are redrawn like so:
However if I reselect the subset, the data is correct, indicating that the data written to disk is correct.
As I said, it does not happen all the time. It happens with other filters too, so it is not related to the angle filter specifically. It does seem to be related to having multiple lines selected, as I don't think it happens if all the data is from one line. I will keep trying to get insight into the problem. Thanks.
involves working on open source CUBE + Numba, integration with Bathygrid, and expanding gridding in Kluster GUI/convenience for CUBE option
If the processed SBET is a NAD83 export, we need to save that datum to the Fqpr object to then feed georeferencing.
Currently have fqpr_visualizations FqprVisualizations that can provide plots and animations
Can animate beam vectors and vessel orientation here
gui.dialog_vesselview can display a vessel as a 3d model
Combining all these things, you basically get a vessel + multibeam animation. Look at using vispy to compose these elements, as well as the sv corrected offsets, to get the full view.
Can currently use subset_time with the main processing steps in fqpr_generation to only process a subset of the data. See fqpr_convenience reprocess_sounding_selection for an example of this. This method will not allow you to write back to disk, however. The zarr write is currently based on the data index, not the time, so a write of 100 pings will write to the first 100 pings of the zarr store. Not good; writes need to be based on time for this to work.
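A minimal sketch of a time-based write, assuming the full store's time array is available in memory; xarray's to_zarr region keyword is real, everything else (names, the searchsorted lookup) is illustrative.

```python
import numpy as np
import xarray as xr

def write_subset_by_time(store_path: str, full_times: np.ndarray, subset: xr.Dataset):
    """Hypothetical helper: locate the subset's times within the full
    zarr store and write only to that region, instead of index 0."""
    start = int(np.searchsorted(full_times, subset.time.values[0]))
    end = start + subset.time.size
    subset.to_zarr(store_path, region={"time": slice(start, end)})
```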
Need a way to generate a report describing the dataset(s) at the Fqpr level and the project level