n-west / bliss

License: BSD 3-Clause "New" or "Revised" License

CMake 0.84% C++ 32.36% Jupyter Notebook 60.78% Cap'n Proto 0.14% Cuda 5.31% Python 0.57%

bliss's Introduction

Breakthrough Listen Interesting Signal Search

BLISS is a toolkit for finding narrowband Doppler-drifting signals. This kind of search is frequently used to look for technosignatures.

BLISS can use CUDA-accelerated kernels with deferred execution and memoization for flagging, noise estimation, integration, and hit search.

Installation

Running bliss requires

libhdf5-cpp-103 (double check!)
hdf5-filter-plugin
bitshuffle

Binary package

Prebuilt wheels are available for the following runtimes:

cpu only: pip install dedrift
cuda 11: pip install dedrift-cuda11x
cuda 12: pip install dedrift-cuda12x

Building from source

This project builds with cmake and is set up to be built as a python package with pyproject.toml. Building and running depend on the following libraries/tools:

cmake
gcc / clang capable of C++17 and supporting your version of cuda
libhdf5-dev
(optional) libcapnp-dev

CMake-based (dev) builds:

The standard cmake workflow should configure everything for a build:

mkdir build
cd build
cmake .. # -G Ninja # if you prefer ninja
make -j $(($(nproc)/2)) # replace with make -j CORES if you don't have nproc

The python package is partially set up in bliss/python/bliss. During the build process, the C++ extensions are built and placed in this package. In cmake development mode, the package is placed in build/bliss/python/bliss. Each file there is symlinked, so changes to existing files take effect as they occur, and new files are added at the next build.

Python package build

pyproject.toml configures the python package and uses py-build-cmake as the build backend to get the required C++ extensions built and packaged appropriately. You can build this package with standard tools such as pip install . and python -m build.

Tests

Inside build/bland/tests you will have a test_bland executable. You can run those tests to sanity check everything works as expected.

Inside build/bliss/ you will have a justrun executable. You can pass a fil file to it, but right now it is mainly a debugging and sanity check target, so the output will be underwhelming.

Inside build/bliss/python you should have a pybliss.cpython-311-x86_64-linux-gnu.so or similarly named pybliss shared library. This can be imported in python and exposes functions and classes for dedoppler searches. Inside the notebooks directory there is an rfi mitigation visual.ipynb jupyter notebook that walks through several of the functions with plots showing results. That is the best way to get a feel for functionality.

Usage

Python

The following is example usage for Voyager-1 recordings from the Green Bank Telescope:

import bliss

data_loc = "/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/"
cadence = bliss.cadence([[f"{data_loc}/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5",
                    f"{data_loc}/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5",
                    f"{data_loc}/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5"
                    ],
                    [f"{data_loc}/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5"],
                    [f"{data_loc}/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5"],
                    [f"{data_loc}/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5"]])


cadence.set_device("cuda:0")

working_cadence = cadence
working_cadence = bliss.flaggers.flag_filter_rolloff(working_cadence, .2)
working_cadence = bliss.flaggers.flag_spectral_kurtosis(working_cadence, .05, 25)


noise_est_options = bliss.estimators.noise_power_estimate_options()
noise_est_options.masked_estimate = True
noise_est_options.estimator_method = bliss.estimators.noise_power_estimator.stddev

working_cadence = bliss.estimators.estimate_noise_power(working_cadence, noise_est_options)

int_options = bliss.integrate_drifts_options()
int_options.desmear = True
int_options.low_rate = -500
int_options.high_rate = 500

working_cadence = bliss.drift_search.integrate_drifts(working_cadence, int_options)

working_cadence.set_device("cpu")

hit_options = bliss.drift_search.hit_search_options()
hit_options.snr_threshold = 10
cadence_with_hits = bliss.drift_search.hit_search(working_cadence, hit_options)

hits_dict = bliss.plot_utils.get_hits_list(cadence_with_hits)

C++

    auto voyager_cadence = bliss::cadence({{"/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_80036_DIAG_VOYAGER-1_0011.rawspec.0000.h5",
                    "/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_80672_DIAG_VOYAGER-1_0013.rawspec.0000.h5",
                    "/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_81310_DIAG_VOYAGER-1_0015.rawspec.0000.h5"
                    },
                    {"/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_80354_DIAG_VOYAGER-1_0012.rawspec.0000.h5"},
                    {"/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_80989_DIAG_VOYAGER-1_0014.rawspec.0000.h5"},
                    {"/datag/public/voyager_2020/single_coarse_channel/old_single_coarse/single_coarse_guppi_59046_81628_DIAG_VOYAGER-1_0016.rawspec.0000.h5"}});

    auto cadence = voyager_cadence;

    cadence.set_device("cuda:0");

    cadence = bliss::flag_filter_rolloff(cadence, 0.2);
    cadence = bliss::flag_spectral_kurtosis(cadence, 0.1, 25);

    cadence = bliss::estimate_noise_power(
            cadence,
            bliss::noise_power_estimate_options{.estimator_method=bliss::noise_power_estimator::STDDEV, .masked_estimate = true}); // estimate noise power of unflagged data

    cadence = bliss::integrate_drifts(
            cadence,
            bliss::integrate_drifts_options{.desmear        = true,
                                            .low_rate       = -500,
                                            .high_rate      = 500,
                                            .rate_step_size = 1});

    cadence.set_device("cpu");

    auto cadence_with_hits = bliss::hit_search(cadence, {.method=bliss::hit_search_methods::CONNECTED_COMPONENTS,
                                                        .snr_threshold=10.0f});

    auto events = bliss::event_search(cadence_with_hits);

    bliss::write_events_to_file(events, "events_output");

bliss's Issues

Make a simple user-accessible executable to find hits in a few files

For a non-API/programmer mode it's useful to have just an executable that can run and find hits. Seticore and turboseti both have an executable and any cli should adopt their good patterns that people will already be used to (but don't just copy since there's probably some legacy cruft)

The idea would be to move justrun in to something better named with a good cli so changing things doesn't require a rebuild and you don't need to open up a python env or write custom c++ to tinker with some hit finding parameters

Fix code scanning alert - Multiplication result converted to larger type

Tracking issue for:

https://github.com/n-west/bliss/security/code-scanning/1

I think the right fix for this is to not worry about the types but implement the kahan sum for cpu (we do that for gpu and it makes a nice difference)
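For reference, compensated (Kahan) summation can be sketched as follows. This is a generic Python illustration of the technique, not the bland cpu or gpu code; `kahan_sum` is a hypothetical name.

```python
import math


def kahan_sum(values):
    """Sum floats with Kahan compensation to limit accumulated rounding error."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting the
        # intended addend recovers the rounding error of this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


samples = [0.1] * 1_000_000

naive = 0.0
for value in samples:
    naive += value

exact = math.fsum(samples)  # correctly rounded reference sum
print("naive error:", abs(naive - exact))
print("kahan error:", abs(kahan_sum(samples) - exact))
```

The accumulator stays in the input precision, so no widening of the multiplication or sum type is needed to get the benefit.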

Adjust noise power for desmeared hits

Desmearing adds to the integration length so the noise floor needs to be compensated appropriately for the additional bins.

Extend coarse channel discovery up to `observation_target` and `cadence`

get_number_coarse_channels and get_coarse_channel_with_frequency would be really convenient to have at the top level of cadence and observation_target, along with verification that the result makes sense for all scans in the structure. That way users don't have to slice down to a randomish scan just to discover which coarse channel to interrogate.

Proper python package and develop install

Right now there exists a setup.py that can do develop installs and might even be able to generate a wheel that could be used.

Finish this up so it's actually easy to install with pip / other python build frontends

Add a filter shape compensation

ATA data uses a 4-tap per coarse channel pfb that leaves a pretty distinct filter shape and, as a result, a non-uniform noise floor. Add a noise-floor linearizer / filter shape compensation method.

Some sane approaches off the cuff:

Pass through the known filter shape and subtract out the expected shape to get a uniform noise floor
Estimate / fit some curve or parameter to the actual noise floor shape, and remove that. This is adaptive which might make some folks cringe

Check in with Wael to see if he has specific preferences
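A minimal sketch of the first approach, assuming the filter response is known per fine channel in linear power units (so "subtracting" the shape in dB becomes a division). `flatten_noise_floor` and the example response are hypothetical and not part of bliss.

```python
def flatten_noise_floor(power, filter_response):
    """Divide out a known per-channel filter response (linear power units)."""
    if len(power) != len(filter_response):
        raise ValueError("power and filter_response must have the same length")
    # Normalize so the mean gain is 1 and the overall power scale is preserved.
    mean_response = sum(filter_response) / len(filter_response)
    return [p * mean_response / r for p, r in zip(power, filter_response)]


# Hypothetical passband: gain rolls off toward the coarse channel edges.
response = [0.5, 0.8, 1.0, 1.0, 0.8, 0.5]
noise = [10.0 * r for r in response]  # a flat floor seen through the filter
flat = flatten_noise_floor(noise, response)
print(flat)
```

The adaptive variant would replace `filter_response` with a curve fitted to the measured floor, which is where the concern about adaptivity comes in.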

no such file or directory crypt.h in conda environment

      FAILED: bliss/python/CMakeFiles/nanobind-static.dir/__/__/_deps/nanobind-src/src/implicit.cpp.o
      /home/nwest/.conda/envs/bliss-testdist/bin/c++  -I/datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/ext/robin_map/include -I/home/nwest/.conda/envs/bliss-testdist/include/python3.8 -I/datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/include -g -std=gnu++17 -fPIC -fvisibility=hidden -fno-strict-aliasing -MD -MT bliss/python/CMakeFiles/nanobind-static.dir/__/__/_deps/nanobind-src/src/implicit.cpp.o -MF bliss/python/CMakeFiles/nanobind-static.dir/__/__/_deps/nanobind-src/src/implicit.cpp.o.d -o bliss/python/CMakeFiles/nanobind-static.dir/__/__/_deps/nanobind-src/src/implicit.cpp.o -c /datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/src/implicit.cpp
      In file included from /datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/include/nanobind/nb_python.h:21,
                       from /datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/include/nanobind/nanobind.h:38,
                       from /datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/include/nanobind/trampoline.h:13,
                       from /datax/scratch/nwest/tmp/pip-install-tr2r6zgv/dedrift_85159952386d42d3bbcbbe1c41ac2c94/build/cp38-cp38-linux_x86_64/_deps/nanobind-src/src/implicit.cpp:10:
      /home/nwest/.conda/envs/bliss-testdist/include/python3.8/Python.h:44:10: fatal error: crypt.h: No such file or directory
         44 | #include <crypt.h>
            |          ^~~~~~~~~

The fix (if you're in a conda environment) is to use

export CPATH=/home/nwest/.conda/envs/bliss-testdist/include/

This shouldn't happen and is some bad interaction of conda envs

Report of broken build without cuda

Vishal sent a report of a build failure on a macbook without cuda.

      -- Configuring done (94.2s)
      CMake Error at bliss/CMakeLists.txt:15 (target_link_libraries):
        Target "justrun" links to:
      
          CUDA::cudart_static
      
        but the target was not found.  Possible reasons include:
      
          * There is a typo in the target name.
          * A find_package call is missing for an IMPORTED target.
          * An ALIAS target is missing.

This is a bug I accidentally introduced because justrun isn't set up super flexible and I keep modifying it to do different types of experiments.

Automate releases

from CI, and also make it easy to build something that can be pip-installed on data center systems

add sigma clipping

Sigma clipping has been used in other contexts around SETI pipelines and it's a pretty logical thing to do for flagging. Add an implementation for flagging and ask around for opinions on other applications
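A minimal sketch of iterative sigma clipping used as a flagger, in plain Python. The function name, defaults, and the choice of mean and standard deviation as the statistics are assumptions for illustration, not an existing bliss API.

```python
import statistics


def sigma_clip_flags(samples, n_sigma=3.0, max_iterations=5):
    """Return a list of booleans, True where a sample is flagged as an outlier."""
    flags = [False] * len(samples)
    for _ in range(max_iterations):
        kept = [s for s, flagged in zip(samples, flags) if not flagged]
        if len(kept) < 2:
            break
        center = statistics.fmean(kept)
        spread = statistics.pstdev(kept)
        # Once flagged, a sample stays flagged; statistics use unflagged samples only.
        new_flags = [
            flagged or abs(s - center) > n_sigma * spread
            for s, flagged in zip(samples, flags)
        ]
        if new_flags == flags:
            break  # converged: no new outliers found
        flags = new_flags
    return flags


power = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 50.0, 1.01, 0.99, 1.03]
flags = sigma_clip_flags(power)
print([i for i, flagged in enumerate(flags) if flagged])
```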

handle single-node and spliced files

Right now there is an h5 filterbank file reader that makes some light assumptions about a single coarse channel. Add support to read from single node (blcXX), and spliced files.

single-node files will contain some number of coarse channels from a scan (probably 64, but depends on channelization).

spliced files are all nodes from a scan concatenated together.

A viable processing strategy is to add another layer to class hierarchy that represents coarse channels, at reading time only read the coarse channel that is being accessed and try to implement some lazy loading. Alternatively, it might make some sense to rename the filterbank class to coarse channel since that's effectively what is being captured with that class.

This requires a little bit more design thought but the goal would be to have lazy-loading to process a single coarse channel of interest from one of these files but also be able to process all coarse channels in a file (in parallel)

Allow a graph/array_deferred "reset" that reverts back to function rather than destroys it

The compute graph method of delaying execution depends on some kind of cleanup to allow enough memory for subsequent coarse channels to execute. That looks like this:

working_obs.set_device("cuda:0")
working_obs = bliss.flaggers.flag_filter_rolloff(working_obs, .2)
working_obs = bliss.flaggers.flag_spectral_kurtosis(working_obs, .05, 15)

noise_est_options = bliss.estimators.noise_power_estimate_options()
noise_est_options.masked_estimate = True
noise_est_options.estimator_method = bliss.estimators.noise_power_estimator.stddev

working_obs = bliss.estimators.estimate_noise_power(working_obs, noise_est_options)

int_options = bliss.integrate_drifts_options()
int_options.desmear = True
int_options.low_rate = -500
int_options.high_rate = 500


working_obs = bliss.drift_search.integrate_drifts(working_obs, int_options)
working_obs.set_device("cpu") # <--- This is key

hit_options = bliss.drift_search.hit_search_options()
hit_options.snr_threshold = 10

working_obs = bliss.drift_search.hit_search(working_obs, hit_options)

Iteratively requesting hits from each scan will move all temporaries (data, mask, freq-drift plane, rfi) back to cpu. That can (and will!) eventually consume a lot of memory for large numbers of coarse channels or scans. To avoid this, I added a detach_graph option that defaults to true in hit detection, so after detecting hits the freq-drift plane is purged since that takes the most memory. It would be a bit smarter to replace the std::variant with a tuple (or something similar but slightly smarter) that keeps the original function pointer around. It could then either always use the function pointer to return data, since that's probably fast enough, OR keep a cache and, if you feel like getting fancy, purge it on demand or when memory pressure builds up.
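The "keep the function around, cache optionally" idea can be sketched in Python as below. `DeferredArray` is a hypothetical stand-in for the deferred array type; the real implementation is C++ and may look quite different.

```python
class DeferredArray:
    """Holds a producer callable and, optionally, its cached result."""

    def __init__(self, producer):
        self._producer = producer  # always retained, so reset() can revert to it
        self._cache = None
        self._has_cache = False

    def get(self):
        """Return the data, running the producer only if nothing is cached."""
        if not self._has_cache:
            self._cache = self._producer()
            self._has_cache = True
        return self._cache

    def reset(self):
        """Drop the cached result; the next get() re-runs the producer."""
        self._cache = None
        self._has_cache = False


calls = []


def integrate():
    calls.append(1)  # record each real execution
    return [0.0] * 8


plane = DeferredArray(integrate)
plane.get()
plane.get()    # served from cache, no second execution
plane.reset()  # frees the cached plane but keeps the graph node usable
plane.get()    # recomputed on demand
print(len(calls))
```

Compared with detaching the graph, a reset like this frees the memory without making the object unusable afterwards.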

improve MAD SNR estimation

MAD (Median Absolute Deviation) provides a good alternative noise estimate for SNR estimation. Right now it only works on CPU for non-masked SNR. This is largely due to the median implementation being a relatively simple C++-container based method. Implement median for the general case and get masked medians + cuda version running.
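A minimal sketch of a masked median/MAD noise estimate in plain Python. The function name and the mask convention (nonzero means flagged) are assumptions; the 1.4826 factor is the usual scaling that makes MAD comparable to a standard deviation for Gaussian noise.

```python
import statistics


def masked_mad_noise(samples, mask=None, scale=1.4826):
    """Return (noise_floor, noise_deviation) from unmasked samples via median and MAD."""
    if mask is None:
        kept = list(samples)
    else:
        # A nonzero mask entry marks a flagged sample that is excluded.
        kept = [s for s, m in zip(samples, mask) if not m]
    if not kept:
        raise ValueError("all samples are masked")
    median = statistics.median(kept)
    mad = statistics.median([abs(s - median) for s in kept])
    return median, scale * mad


samples = [10.0, 11.0, 9.0, 10.5, 9.5, 1000.0, 10.0]
mask = [0, 0, 0, 0, 0, 1, 0]  # the strong sample is flagged as rfi
floor, deviation = masked_mad_noise(samples, mask)
print(floor, deviation)
```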

Improve vram usage of pipeline processing

It's currently possible to set devices for various data structures. The actual data movement can either be instant or delayed until the underlying data is requested.

The problem with this is the approach I've taken to the API is treating it more as a function library where you run flagging, SNR estimation, etc on a per-cadence/target/scan basis. This means all 6 files in an ABACAD cadence would get loaded to a GPU and processed.

At 1048576 fine channels/coarse channel with 16 slow-time steps of 4B each, the data takes 4 * 1048576 * 16 / 1e9 = .067GB. We also need the mask, which is another .0168GB. The drift range we want to search (-500...500) requires 1000 * 1048576 * 4/1e9 = 4.19GB.
That's a total of about 4.28GB. I also want to collect rfi info along drifts, so that's another 1000 * 1048576/1e9 = 1.05GB per rfi type collected, of which we currently have 3. That's a total of 7.42GB on the GPU (excluding intermediates) for a single coarse channel.
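The same budget as a short Python calculation. The sizes come from the numbers above; treating mask and rfi entries as 1 byte each is inferred from those numbers rather than stated.

```python
FINE_CHANNELS = 1_048_576  # fine channels per coarse channel
TIME_STEPS = 16            # slow-time steps per scan
DRIFTS = 1000              # drift rates searched over -500...500
BYTES_FLOAT = 4            # data and drift plane samples
BYTES_FLAG = 1             # mask and rfi entries (inferred from the text)
RFI_TYPES = 3

GB = 1e9
data_gb = FINE_CHANNELS * TIME_STEPS * BYTES_FLOAT / GB
mask_gb = FINE_CHANNELS * TIME_STEPS * BYTES_FLAG / GB
drift_plane_gb = DRIFTS * FINE_CHANNELS * BYTES_FLOAT / GB
rfi_gb = RFI_TYPES * DRIFTS * FINE_CHANNELS * BYTES_FLAG / GB

total_gb = data_gb + mask_gb + drift_plane_gb + rfi_gb
print(f"data={data_gb:.3f} mask={mask_gb:.4f} drift plane={drift_plane_gb:.2f} "
      f"rfi={rfi_gb:.2f} total={total_gb:.2f} GB")
```

The drift plane and rfi planes dominate, which is why purging them after hit detection matters most.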

Of course, after hit detection we don't need most of that data anymore and the amount on-disk is radically reduced.

So, I need to rethink the API a little bit. The options right now seem to be

1. Force users to manage it by running individual scans. This is a pretty crappy experience.
2. Use lazy eval and delay execution / device movement until the final result is requested. The hits of individual coarse channels will be requested one at a time, never all at once. For debugging we can still execute the intermediate lazy objects.
3. Use the hyperseti pipeline object approach where you don't necessarily get function calls as the primary interface but a pipeline object that lets us make all of the execution decisions internally after it's fully constructed.

I'm leaning heavily towards (2) which has a rough path of:

Start with file reading / device movement and rather than reading data / mask immediately, hold a callable inside coarse_channel rather than the actual data.
Prototype a similar concept for flagging
Figure out a method to read data only once between flagging, noise estimation, integration (Think about returning shared_ptr of data,mask and the coarse_channel holds on to a weak_ptr, but need to think carefully about the lifetimes to make sure the weak_ptr is valid for all 3 requests)
Complete the owl

Add cuda backend

Add cuda backend for bland to support all operation types on gpus. Also add bliss cuda kernels

(de)Serialize rfi information for hits

RFI information along drift integration paths is collected and can be used for filtering but is not saved when hits are serialized, so information is lost

Non-reproducible hit finding on cuda

Hit search on cuda seems to be somewhat non-reproducible. It's expected that hits may come in with a different order, but they should always be the same hits. Running a pipeline on voyager data with range -5 to 5Hz/sec then moving to cpu for hit_search gives a consistent 11 hits (I've run this a few dozen times repeatedly):

hit: .start_freq_MHz=8419.921875 (.index=524288), .drift_rate_Hz_per_second=-0.000000 (.index=500), .SNR=24762.199219, .power=3523141632, bandwidth=0.0
hit: .start_freq_MHz=8419.565228 (.index=651993), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=127.647469, .power=12842161, bandwidth=449.8
hit: .start_freq_MHz=8419.542734 (.index=659988), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=1071.168579, .power=107766488, bandwidth=61.5
hit: .start_freq_MHz=8419.520239 (.index=668097), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=122.781815, .power=12352645, bandwidth=441.4
hit: .start_freq_MHz=8419.475402 (.index=684087), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=13.124967, .power=1320456, bandwidth=5.6
hit: .start_freq_MHz=8419.475080 (.index=684202), .drift_rate_Hz_per_second=-0.357149 (.index=535), .SNR=14.723750, .power=1481304, bandwidth=5.6
hit: .start_freq_MHz=8419.610230 (.index=635887), .drift_rate_Hz_per_second=-0.387762 (.index=538), .SNR=13.594336, .power=1116704, bandwidth=329.7
hit: .start_freq_MHz=8419.565711 (.index=651764), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=14.698190, .power=1478733, bandwidth=8.4
hit: .start_freq_MHz=8419.564756 (.index=652106), .drift_rate_Hz_per_second=-0.387762 (.index=538), .SNR=13.553493, .power=1113350, bandwidth=11.2
hit: .start_freq_MHz=8419.520717 (.index=667868), .drift_rate_Hz_per_second=-0.367353 (.index=536), .SNR=13.233637, .power=1331390, bandwidth=5.6
hit: .start_freq_MHz=8419.519762 (.index=668210), .drift_rate_Hz_per_second=-0.387762 (.index=538), .SNR=13.299211, .power=1092462, bandwidth=14.0

When the cuda connected_components implementation runs, we get ....more hits... A lot of them are in the range above and below 1.3 Hz/sec. If I reduce the search range to -1 to 1 Hz/sec, there are usually 11 hits but unfortunately often we get 12 hits and sometimes even 13 hits. The extra hits can look like this:

hit: .start_freq_MHz=8419.542708 (.index=659990), .drift_rate_Hz_per_second=0.795932 (.index=22), .SNR=56.486515, .power=3594187, bandwidth=64.3
hit: .start_freq_MHz=8419.542594 (.index=660047), .drift_rate_Hz_per_second=0.612255 (.index=40), .SNR=70.239807, .power=4996826, bandwidth=290.6
hit: .start_freq_MHz=8419.921953 (.index=524253), .drift_rate_Hz_per_second=-0.540826 (.index=153), .SNR=1547.952393, .power=110120584, bandwidth=162.1

It does look like the expected hits are always there, so it's not necessarily an issue of random hits but something is randomly causing bad hits and at higher drift rate ranges there are nearly guaranteed bad hits showing up.
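One way to check "same hits, any order" is to compare hits as sets keyed on their position in the frequency-drift plane. This is a sketch with hypothetical dict-based hits and illustrative index values, not the bliss hit type.

```python
def hit_key(hit):
    """Identify a hit by its position in the frequency-drift plane."""
    return (hit["start_freq_index"], hit["drift_rate_index"])


def compare_hits(reference, candidate):
    """Return hits missing from, and extra in, candidate relative to reference."""
    ref_keys = {hit_key(h) for h in reference}
    cand_keys = {hit_key(h) for h in candidate}
    return {
        "missing": sorted(ref_keys - cand_keys),
        "extra": sorted(cand_keys - ref_keys),
    }


cpu_hits = [
    {"start_freq_index": 524288, "drift_rate_index": 500},
    {"start_freq_index": 651993, "drift_rate_index": 536},
    {"start_freq_index": 659988, "drift_rate_index": 536},
]
# Same hits in a different order, plus one spurious hit (illustrative values).
cuda_hits = [cpu_hits[2], cpu_hits[0], cpu_hits[1],
             {"start_freq_index": 660047, "drift_rate_index": 540}]

diff = compare_hits(cpu_hits, cuda_hits)
print(diff)
```

Run over repeated cuda executions, an empty "missing" list with a varying "extra" list would match the behavior described above.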

Update default unspecified coarse channelization

Parkes data contains 64M fine channels that represent a single coarse channel. Blimpy has some logic that calculates coarse channelization and would return 64 coarse channels, so turboseti winds up chunking up 1M fine channels to process at a time.

This is a good idea ™️: when there are too many fine channels to reasonably process at once, split them up into smaller work units. We can probably do better than a hard 64. Aiming for 4GB or so of work at once, with a chunk count that evenly divides the total number of fine channels, makes sense.
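A sketch of picking the chunk count from a memory budget rather than a fixed 64. `choose_chunk_count` and the bytes-per-fine-channel figures are assumptions for illustration; the real cost per channel depends on the pipeline options.

```python
def choose_chunk_count(total_fine_channels, bytes_per_fine_channel, budget_bytes=4e9):
    """Smallest divisor of total_fine_channels whose chunks fit in budget_bytes."""
    if bytes_per_fine_channel > budget_bytes:
        raise ValueError("a single fine channel exceeds the budget")
    for count in range(1, total_fine_channels + 1):
        if total_fine_channels % count:
            continue  # only accept counts that divide the channels evenly
        chunk_bytes = (total_fine_channels // count) * bytes_per_fine_channel
        if chunk_bytes <= budget_bytes:
            return count
    raise ValueError("no chunk count satisfies the budget")  # not reachable after the check above


parkes_fine_channels = 64 * 1_048_576
# Hypothetical per-channel working set sizes in bytes.
heavy = choose_chunk_count(parkes_fine_channels, bytes_per_fine_channel=4096)
light = choose_chunk_count(parkes_fine_channels, bytes_per_fine_channel=80)
print(heavy, light)
```

Because the search stops at the first divisor that fits, the chunks are as large as the budget allows.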

Not picking up HDF5 found in conda env

Report of a system that isn't finding HDF5 installed in a conda environment. The associated error messages complain about CMP0074 not being set.

Add logger

fmtlib is used for printing various INFO, WARN, ERROR kind of diagnostics and messages. fmtlib is great for formatting, but it would be nice to migrate to a proper logging mechanism that allows (easily) setting log level and/or piping logs to file

cuda hit detection

Hit detection is written to respect bland arrays, but only works on cpu. Add cuda kernels to do local_maxima and connected components.

Along the way, think through how to improve hit detection

Make full pipeline device-aware

The underlying arrays & ops work on the cuda/cpu backend through drift integration. It's possible to set a scan and its data to a device. All other large data structures need to be device-aware as well.

above scan level:

cadence
observation_target

below scan level:

integrated_flags
frequency_drift_plane

Once these are in, the full pipeline should be able to run with a split between cuda/cpu

Compute drifts and metadata outside integration kernels and pass in as an argument

The frequency_drift_plane::drift_rate is a vector of info/md about drifts. This is computed in each kernel and the structure itself has fields like:

index in plane
drift_rate_slope (unitless using bins)
drift_rate_Hz_per_sec
desmeared_bins

It's impossible to calculate drift_rate_Hz_per_sec inside the kernels because they don't have the unit information, and each kernel is doing this same work (so repeated code). It would be better to compute everything outside of the kernels then pass this vector along as an argument to the kernels.

This would also make it a bit easier to provide the drift range arguments as unitless number of drifts or as a unit Hz/sec range.
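A sketch of computing the drift metadata once, outside the kernels. The field names follow the list above; the `foff` and `tsamp` values are assumed values for this kind of data product, and the `desmeared_bins` rule is an assumption rather than the confirmed bliss behavior.

```python
from dataclasses import dataclass


@dataclass
class DriftRate:
    index_in_plane: int
    drift_rate_slope: float       # bins per time step (unitless)
    drift_rate_Hz_per_sec: float
    desmeared_bins: int


def compute_drift_rates(low_bins, high_bins, time_steps, foff_Hz, tsamp_sec, desmear=True):
    """Build drift metadata for every integer bin offset in [low_bins, high_bins]."""
    drifts = []
    for index, total_bins in enumerate(range(low_bins, high_bins + 1)):
        # total_bins is the frequency offset accumulated over the whole scan.
        slope = total_bins / (time_steps - 1)
        hz_per_sec = slope * foff_Hz / tsamp_sec
        # Assumed rule: a drift crossing N bins per step smears over about N bins.
        desmeared = max(1, round(abs(slope))) if desmear else 1
        drifts.append(DriftRate(index, slope, hz_per_sec, desmeared))
    return drifts


# Assumed header values (not given in the text above).
foff_Hz = -2.7939677238464355
tsamp_sec = 18.253611008
drifts = compute_drift_rates(-500, 500, time_steps=16, foff_Hz=foff_Hz, tsamp_sec=tsamp_sec)
print(drifts[536])
```

The resulting list would be passed to each kernel as an argument. A Hz/sec range can be converted to `low_bins` and `high_bins` with the same relation before calling it.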

Fix (de)serialization of hits w/ capnproto

In the shuffle of scan/coarse_channel class hierarchy, I think the cap'n proto (de)serialization got broken. I can now write hits to a file from a scan, but I don't think the reader makes sense and causes a crash from python.

Rethink serialization of hits from

list of hits
coarse channel
scan
observation_target
cadence

and being able to read them back in.

Also look at reverting compatibility to the older cap'n proto definitions if it makes sense.

.dat file in/out

Read/write dat files as well as capn proto

cuda binary packaging

Right now, the wheels are built for cpu-only. Add cuda to the wheels and make loading it a runtime dependency (in case cuda isn't ready). The second part can be deferred to a separate issue since most runtimes we care about will have cuda

Return and track drift integration metadata

For passing info in to drift integration it would be nice to have an option that passes drift rate ranges in physical units Hz/sec but still be able to pass discrete bins

some info that needs to come out of the drift integration process

integration length
desmearing (would be used to adjust noise threshold appropriately)
range searched

Optimize kernel launch sizes

Max useful thread count is obtainable with

    cudaDeviceProp props;
    cudaGetDeviceProperties(&props, spectrum_grid.device().device_id);
    int max_useful_threads = props.multiProcessorCount * props.maxThreadsPerMultiProcessor; // max useful thread count

Update kernel launches to use that number when determining grid sizes

parallelize the multiple-output axis argument cuda reductions

The cuda reductions are pretty decent per reduction, but the bland-impl of spectral kurtosis does a per-channel reduction (actually two) that is the longest part of processing. The reduction itself is parallelized (but at current data sizes is less than 1 warp, so not really) but the multiple reductions (1M+) are not parallelized. The easy thing to do will be add support for launching multiple blocks to parallelize reductions of multiple channels at once

Crash when requesting hits on cuda

I'm not sure if it's just triggering execution or some other effect, but requesting hits() without first setting device to cpu causes a crash even if every operation in the compute graph can happen on a gpu. Here's a backtrace:

(cuda-gdb) bt
#0  0x00007ffff0c66fd0 in cudbgReportDriverInternalError () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1  0x00007ffff0c6b800 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2  0x00007ffff0fad6f9 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3  0x00007ffff0c6b84a in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4  0x00007ffff0c6c1d6 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5  0x00007ffff0f7a492 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6  0x00007ffff0d65aa3 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7  0x0000555555cbd863 in __cudart516 ()
#8  0x0000555555cbd928 in __cudart1336 ()
#9  0x00007ffff7bc8a99 in __pthread_once_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x0000555555d0bec9 in __cudart1612 ()
#11 0x0000555555cb44c7 in __cudart514 ()
#12 0x0000555555cde6e0 in cudaSetDevice ()
#13 0x00005555555d881b in bland::detail::blandDLTensor::blandDLTensor (this=0x7fffffff87f0, shape=..., dtype=..., device=..., strides=...) at /datax/scratch/nwest/Projects/bliss/bland/bland/bland_tensor_internals.cpp:93
#14 0x00005555555db767 in bland::ndarray::ndarray (this=0x7fffffff87f0, dims=..., dtype=..., device=...) at /datax/scratch/nwest/Projects/bliss/bland/bland/ndarray.cpp:124
#15 0x00005555555ec54e in bland::to (src=..., dest_dev=...) at /datax/scratch/nwest/Projects/bliss/bland/bland/ops/ops.cpp:52
#16 0x00005555555dbb3d in bland::ndarray::to (this=0x7fffffff8b00, dest=...) at /datax/scratch/nwest/Projects/bliss/bland/bland/ndarray.cpp:257
#17 0x00005555555e9b7c in bland::ndarray_deferred::operator bland::ndarray (this=0x7fffffff8cb0) at /datax/scratch/nwest/Projects/bliss/bland/bland/ndarray_deferred.cpp:47
#18 0x00005555555a8ff9 in operator() (__closure=0x5555577a0db0) at /datax/scratch/nwest/Projects/bliss/bliss/flaggers/spectral_kurtosis.cpp:35
#19 0x00005555555a9b84 in std::__invoke_impl<bland::ndarray, bliss::flag_spectral_kurtosis(bliss::coarse_channel, float, float)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:61
#20 0x00005555555a9a68 in std::__invoke_r<bland::ndarray, bliss::flag_spectral_kurtosis(bliss::coarse_channel, float, float)::<lambda()>&>(struct {...} &) (__fn=...) at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:116
#21 0x00005555555a98ff in std::_Function_handler<bland::ndarray(), bliss::flag_spectral_kurtosis(bliss::coarse_channel, float, float)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:291
#22 0x00005555555e9eb1 in std::function<bland::ndarray ()>::operator()() const (this=0x5555577a1b90) at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:590
#23 0x00005555555e9ae0 in bland::ndarray_deferred::operator bland::ndarray (this=0x7fffffff8fa0) at /datax/scratch/nwest/Projects/bliss/bland/bland/ndarray_deferred.cpp:41
#24 0x0000555555598ef7 in operator() (__closure=0x5555577a1da0) at /datax/scratch/nwest/Projects/bliss/bliss/drift_search/integrate_drifts.cpp:43
#25 0x000055555559a376 in std::__invoke_impl<bliss::frequency_drift_plane, bliss::integrate_drifts(bliss::coarse_channel, bliss::integrate_drifts_options)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:61
#26 0x0000555555599ffa in std::__invoke_r<bliss::frequency_drift_plane, bliss::integrate_drifts(bliss::coarse_channel, bliss::integrate_drifts_options)::<lambda()>&>(struct {...} &) (__fn=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:116
#27 0x0000555555599bde in std::_Function_handler<bliss::frequency_drift_plane(), bliss::integrate_drifts(bliss::coarse_channel, bliss::integrate_drifts_options)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:291
#28 0x0000555555593b25 in std::function<bliss::frequency_drift_plane ()>::operator()() const (this=0x5555577a20f0) at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:590
#29 0x0000555555592566 in bliss::coarse_channel::integrated_drift_plane (this=0x7fffffffa930) at /datax/scratch/nwest/Projects/bliss/bliss/core/coarse_channel.cpp:279
#30 0x000055555559f4c3 in bliss::protohit_search (dedrifted_coarse_channel=..., options=...) at /datax/scratch/nwest/Projects/bliss/bliss/drift_search/protohit_search.cpp:30
#31 0x000055555559cd5d in bliss::hit_search[abi:cxx11](bliss::coarse_channel, bliss::hit_search_options) (dedrifted_scan=..., options=...) at /datax/scratch/nwest/Projects/bliss/bliss/drift_search/hit_search.cpp:19
#32 0x000055555559d2a1 in operator() (__closure=0x5555577acff0) at /datax/scratch/nwest/Projects/bliss/bliss/drift_search/hit_search.cpp:69
#33 0x000055555559dbb4 in std::__invoke_impl<std::__cxx11::list<bliss::hit>, bliss::hit_search(bliss::scan, bliss::hit_search_options)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:61
#34 0x000055555559daa5 in std::__invoke_r<std::__cxx11::list<bliss::hit>, bliss::hit_search(bliss::scan, bliss::hit_search_options)::<lambda()>&>(struct {...} &) (__fn=...) at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/invoke.h:116
#35 0x000055555559d909 in std::_Function_handler<std::__cxx11::list<bliss::hit, std::allocator<bliss::hit> >(), bliss::hit_search(bliss::scan, bliss::hit_search_options)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:291
#36 0x0000555555593795 in std::function<std::__cxx11::list<bliss::hit, std::allocator<bliss::hit> > ()>::operator()() const (this=0x5555577acfc0) at /mnt_home2/nwest/.conda/envs/bliss-dev/x86_64-conda-linux-gnu/include/c++/11.4.0/bits/std_function.h:590
#37 0x0000555555592041 in bliss::coarse_channel::hits[abi:cxx11]() const (this=0x555557785320) at /datax/scratch/nwest/Projects/bliss/bliss/core/coarse_channel.cpp:132
#38 0x0000555555589023 in bliss::scan::hits[abi:cxx11]() (this=0x7fffffffba20) at /datax/scratch/nwest/Projects/bliss/bliss/core/scan.cpp:246
#39 0x000055555556b3be in main (argc=1, argv=0x7fffffffdf28) at /datax/scratch/nwest/Projects/bliss/bliss/justrun.cpp:63

Pass channelization scheme through arguments

There's no metadata in fil h5 files that indicates the number of coarse channels or the number of fine channels per coarse channel so we use foff and tsamp to infer that according to the BL data paper.

There are other channelization schemes in use and since we only need to know the number of fine channels per coarse channel as long as the other md is present we can support them with an optional parameter for the number of fine channels per coarse channel.

Add that parameter as an optional argument and continue inferring if it's not given. This came up from some ATA data and it's a good suggestion from Carmen to just allow it to be passed through as an arg rather than force inferring the numbers
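A sketch of the "explicit argument wins, otherwise infer" behavior. The function name, the tolerance, and the single-entry lookup table are hypothetical; the real inference table would come from the BL data paper.

```python
import math


def resolve_channelization(total_fine_channels, foff_Hz, tsamp_sec,
                           fine_per_coarse=None, known_schemes=()):
    """Return (fine_per_coarse, number_coarse_channels).

    known_schemes holds (abs_foff_Hz, tsamp_sec, fine_per_coarse) tuples that are
    consulted only when fine_per_coarse is not passed explicitly.
    """
    if fine_per_coarse is None:
        for scheme_foff, scheme_tsamp, scheme_fine in known_schemes:
            if (math.isclose(abs(foff_Hz), scheme_foff, rel_tol=0.01)
                    and math.isclose(tsamp_sec, scheme_tsamp, rel_tol=0.01)):
                fine_per_coarse = scheme_fine
                break
        else:
            raise ValueError("unknown channelization; pass fine_per_coarse explicitly")
    if total_fine_channels % fine_per_coarse:
        raise ValueError("fine_per_coarse does not evenly divide the fine channels")
    return fine_per_coarse, total_fine_channels // fine_per_coarse


# Illustrative table with one assumed entry.
schemes = [(2.79, 18.25, 1_048_576)]
inferred = resolve_channelization(64 * 1_048_576, -2.7939677, 18.2536, known_schemes=schemes)
explicit = resolve_channelization(4 * 8192, 1.0, 1.0, fine_per_coarse=8192)
print(inferred, explicit)
```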

slow cuda hit search

The cuda connected components-based hit search is usually faster than the cpu version, but can be slower in some cases. We should work on some optimizations to clean this up and make it faster.

One major thing that might help a bit is to get rid of the concept of the custom neighborhood and just go for an l1 or l2 distance so looping over a neighborhood can be much faster and we can look at neighbor-of-neighbors pretty quickly.
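A cpu-side sketch of connected components with an l1-distance neighborhood in place of a custom one. It uses a plain breadth-first search in Python and is not the cuda kernel; the grid stands in for a thresholded frequency-drift plane.

```python
from collections import deque


def connected_components_l1(above_threshold, max_distance=1):
    """Label connected cells of a 2D grid using an L1 neighborhood."""
    rows = len(above_threshold)
    cols = len(above_threshold[0]) if rows else 0
    # Offsets with 0 < |dr| + |dc| <= max_distance (the cell itself is excluded).
    offsets = [(dr, dc)
               for dr in range(-max_distance, max_distance + 1)
               for dc in range(-max_distance, max_distance + 1)
               if 0 < abs(dr) + abs(dc) <= max_distance]
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not above_threshold[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and above_threshold[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


grid = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
_, tight = connected_components_l1(grid, max_distance=1)
_, loose = connected_components_l1(grid, max_distance=2)
print(tight, loose)
```

Raising `max_distance` is what gives the neighbor-of-neighbors reach without a hand-built neighborhood list.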

per-coarse channel API

Right now there's an API around cadence, observation_target, scan, to slice out a coarse channel and make a copy of the current type focused only on that coarse channel.

That works perfectly fine and might use improvement, but we also need a mechanism to run the pipeline in a "per-coarse-channel" way. With the current tooling that would require making that slice, then sending the slice through the pipeline, which is a bit cumbersome. What would be great would be something that makes it dead-simple to run the pipeline one coarse channel at a time efficiently (in both compute time and RAM/vRAM).

cuda pipeline

Add a device-aware API to pipelines.

It is currently possible to extract data and mask, send those to some of the lower-level APIs for flagging, noise estimation, etc. If you happen to have a coarse_channel with data/mask on a device, that will work transparently; however, there's no API right now to make a cadence, observation_target, scan, coarse_channel always move to the device.

Think through and add that API, demonstrate it works to send a single coarse channel through the pipeline (or some subset)

Error when reading scan hits from capnproto file

# The files can then be read into scan, cadence, observation_target objects using respective methods
read_hits = bliss.io.read_scan_hits_from_file("/home/ssheikh/bliss-test/hits_obs0-unknown_0.cp")

Produces an error:

RuntimeError: write_hits_to_file: could not open file for writing (fd=-1, error=No such file or directory)

Add spectrum plots showing hits with option for macro or zoomed in view

There's some old plotting code in the plot_utils that plots the whole spectrum with all hits.

Add the ability to

plot a single hit with some sensible spectrum bandwidth
plot the full coarse channel with all hits
plot an event with some sensible spectrum bandwidth
plot all events in a coarse channel

Most of this exists in notebook cells and just needs to be packaged up nicely