nighthawk's People

Contributors

bmvandoren, haroldmills, ses4j

nighthawk's Issues

Warning: "set on a copy of a slice"

Thank you for your work on Nighthawk. I wanted to report something I am seeing in my logs when running with Vesper, and I wonder if it would be worth fixing:

2023-06-05 23:30:43,852 INFO                 /opt/conda/envs/nighthawk-0.2.0/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py:107: SettingWithCopyWarning: 
2023-06-05 23:30:43,852 INFO                 A value is trying to be set on a copy of a slice from a DataFrame.
2023-06-05 23:30:43,852 INFO                 Try using .loc[row_indexer,col_indexer] = value instead
2023-06-05 23:30:43,852 INFO                 
2023-06-05 23:30:43,852 INFO                 See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
2023-06-05 23:30:43,852 INFO                   df_split['tmp'] = range(df_split.shape[0])

It seems to reference the middle of:

def split_long_detections(df,max_duration=5):
    df = df.reset_index(drop=True)
    is_too_long = df['end_sec']-df['start_sec'] > max_duration
    df_keep = df.loc[~is_too_long]
    df_split = df.loc[is_too_long]
    df_split['tmp'] = range(df_split.shape[0])
    df_split = df_split.groupby('tmp', group_keys=False).apply(split_long_detections_helper,max_duration=max_duration)
    df_split = df_split.drop('tmp',axis=1)
    df_split = df_split.reset_index(drop=True)
    df_out = pd.concat([df_keep,df_split])
    df_out = df_out.sort_values('start_sec').reset_index(drop=True)
    return df_out
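
One possible fix, as a minimal sketch: pandas raises this warning when a column is assigned on a frame that was itself produced by indexing another frame, and taking an explicit copy of the selection usually silences it. The function name select_long_detections is hypothetical and only reproduces the first half of split_long_detections; the helper split_long_detections_helper is not shown in this report, so it is left out.

```python
import pandas as pd


def select_long_detections(df, max_duration=5):
    """Split detections into those to keep and those longer than max_duration.

    Hypothetical, trimmed-down version of the first lines of
    split_long_detections, shown only to illustrate the .copy() change.
    """
    df = df.reset_index(drop=True)
    is_too_long = df['end_sec'] - df['start_sec'] > max_duration
    df_keep = df.loc[~is_too_long]
    # .copy() makes df_split an independent DataFrame, so adding a column
    # to it is no longer an assignment on a slice of df.
    df_split = df.loc[is_too_long].copy()
    df_split['tmp'] = range(df_split.shape[0])
    return df_keep, df_split
```

With this change the original df is left untouched and the 'tmp' column exists only on df_split.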

Allow control of "max number of processors used".

Hi! So excited for this!

(now for the complaining...)

I am running on a 12 core Windows box and trying to run Nighthawk on all my old NFC files while doing other work. I'm unable to because it pegs every CPU. It would be nice if it was "nicer", e.g. allowing me to limit it to only 4 processors or something.
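
Until there is a built-in option, a possible workaround is to launch Nighthawk from a small wrapper that sets thread-limit environment variables. This is an untested sketch: OMP_NUM_THREADS, TF_NUM_INTRAOP_THREADS and TF_NUM_INTEROP_THREADS are variables that OpenMP and TensorFlow read, but whether they are enough to cap Nighthawk's CPU use is an assumption, as is calling the nighthawk command with a single file path. The function names are made up for illustration.

```python
import os
import subprocess

# Variables read by OpenMP and TensorFlow thread pools. Whether they fully
# cap Nighthawk's CPU use is an assumption, not something verified here.
THREAD_LIMIT_VARS = (
    "OMP_NUM_THREADS",
    "TF_NUM_INTRAOP_THREADS",
    "TF_NUM_INTEROP_THREADS",
)


def thread_limited_env(max_threads, base_env=None):
    """Return a copy of the environment with thread limits applied."""
    env = dict(os.environ if base_env is None else base_env)
    for name in THREAD_LIMIT_VARS:
        env[name] = str(max_threads)
    return env


def run_nighthawk(audio_path, max_threads=4):
    """Run the nighthawk command on one file with a limited thread count."""
    return subprocess.run(
        ["nighthawk", audio_path],
        env=thread_limited_env(max_threads),
        check=True,
    )
```

For example, run_nighthawk("recording.wav", max_threads=4) would start Nighthawk with all three variables set to 4.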

Nighthawk can't process audio files less than one second in duration

Example:

Processed 0.4 seconds of audio in 1.5 seconds, 0.2 times faster than real time.
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/nighthawk/bin/nighthawk", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 63, in run_detector_on_files
    detections = _run_detector_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 145, in _run_detector_on_file
    return run_reconstructed_model.run_model_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py", line 742, in run_model_on_file
    preds, bad_inds, steps = process_file(audio_model,
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 182, in _get_model_predictions
    input_count = len(predictions[0])
IndexError: list index out of range

The model needs at least one full second of input to make predictions. (Under default settings, returning a detection actually requires a file of at least 1.2 seconds, given the 20% overlap and the requirement of two detections in a row to trigger.)

Possible solutions:

  • For input audio less than 1 s (or 1.2 s), pad with zeros and then pass to model. Pros: could take audio less than 1 s. Cons: Nighthawk will return a detection longer than the input (confusing). Under default settings, Nighthawk would also be guaranteed not to return any detections for files less than 0.20 s because that is less than the overlap size (1 s * 0.20 = 0.20 s). Could throw a warning under these conditions but still process audio.
  • Throw a more informative error message when a user provides a file less than 1 s (or 1.2 s) long.
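
Both options could be sketched in one helper, shown here under some assumptions: audio is a NumPy array with samples along the first axis, and the names check_or_pad and MIN_DURATION_SEC are hypothetical and not part of Nighthawk. By default the helper raises an informative error; with pad=True it appends zeros instead.

```python
import numpy as np

MIN_DURATION_SEC = 1.0  # model input length mentioned above


def check_or_pad(samples, sample_rate, min_duration=MIN_DURATION_SEC, pad=False):
    """Return samples at least min_duration long, or raise a clear error."""
    samples = np.asarray(samples)
    min_samples = int(round(min_duration * sample_rate))
    if samples.shape[0] >= min_samples:
        return samples
    if not pad:
        duration = samples.shape[0] / sample_rate
        raise ValueError(
            f"Audio is {duration:.2f} s long; "
            f"at least {min_duration:.2f} s is required."
        )
    # Append zeros (silence) along the first axis only.
    missing = min_samples - samples.shape[0]
    pad_width = [(0, missing)] + [(0, 0)] * (samples.ndim - 1)
    return np.pad(samples, pad_width)
```

If padding is used, any detection times reported for the padded region would still need to be clipped to the original file length to avoid the confusion described in the first option.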

Pinned scikit-learn version number slows package installation.

To avoid warnings from scikit-learn about potential probability calibration version incompatibilities, that package is currently pinned to version 1.0.1. That version is about a year and a half old, though, and using it causes installation of the Nighthawk package to trigger a build of a scikit-learn Python wheel, which slows installation considerably. It would be good to either be able to use a more recent version of scikit-learn that doesn't trigger such a build (the latest version, 1.2.2, does not), or perhaps eliminate the use of scikit-learn altogether by implementing our own version of probability calibration.

Allow output of .WAV clips and/or spectrograms of detections.

Hi - I think there's value in being able to write out clips and spectrograms from the detections file. I have a script I modified to work with the nighthawk output .csv, which is available here if anyone wants to try it:

https://github.com/ses4j/nfc-tools/blob/main/nfc-processing/getsnips.py

I didn't know the best way it might be integrated: as a separate tool? Separate but included in the package? As a subcommand of Nighthawk (like nighthawk clip ...)? With --wav-output? (My only issue with the last is then it wouldn't be possible to rerun with different options, etc, if you wanted more control...)

Thoughts?
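
For discussion, here is a minimal sketch of the clip-writing step using only the Python standard library. It is not the linked getsnips.py script. It assumes uncompressed PCM WAV input and detection times in seconds, as in the start_sec and end_sec columns; the function name write_clip is hypothetical.

```python
import wave


def write_clip(src_path, dst_path, start_sec, end_sec):
    """Copy the [start_sec, end_sec) span of a WAV file into a new WAV file.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp the requested span to the bounds of the source file.
        start = min(max(0, int(start_sec * rate)), total)
        end = min(max(start, int(end_sec * rate)), total)
        src.setpos(start)
        frames = src.readframes(end - start)
    with wave.open(dst_path, "wb") as dst:
        # Same channels, sample width and rate as the source; the frame
        # count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end - start
```

A caller would loop over the rows of the detections .csv and call write_clip once per row, which would fit either a separate tool or a subcommand.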

Crash while running: numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('float64'), dtype('<U2')) -> None

idk what caused this but it happened on a file. I can send the file if you want it, it's a 40MB recording. It's probably in your Box account.

It never generated the audacity file, but it had generated a CSV, which was empty. So maybe this is just 'what happens when you have no detections and try to generate an audacity file'?

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\wc\nfc-tools\.venv\Scripts\nighthawk.exe\__main__.py", line 7, in <module>
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 82, in run_detector_on_files
    _write_detection_audacity_label_file(output_file_path, detections)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 275, in _write_detection_audacity_label_file
    detections['pred_cat_prob'] = detections['predicted_category'] + ' (' + detections['prob'].astype(float).round(3).astype(str) + ')'
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\common.py", line 81, in new_method
    return method(self, other)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\series.py", line 6108, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\base.py", line 1348, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 232, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 171, in _na_arithmetic_op
    result = func(left, right)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('float64'), dtype('<U2')) -> None
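
The dtypes in the error message (float64 on the left, a two-character string on the right) are consistent with the 'predicted_category' column holding no strings, which is what an empty detections table could produce. A possible guard, as a sketch only, is to cast the column to str before concatenating. The function name format_labels is hypothetical; the column names come from the traceback.

```python
import pandas as pd


def format_labels(detections):
    """Build 'category (prob)' label strings; also works on an empty table."""
    # Casting to str avoids adding a string to a float64 column, which is
    # the operation that fails in the traceback above.
    category = detections['predicted_category'].astype(str)
    prob = detections['prob'].astype(float).round(3).astype(str)
    return category + ' (' + prob + ')'
```

With no detections this returns an empty Series, so the label file could be written with no rows instead of crashing.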

Installation yields wrong version of TensorFlow on M1 Macs.

On Kevin Tolan's M1 MacBook Pro, Nighthawk 0.1.0 gets installed with the regular tensorflow package instead of with Apple's tensorflow-macos package, and this causes nighthawk to fail because of an illegal machine instruction when it runs.

Nighthawk's pyproject.toml file aims to install TensorFlow in the form of the tensorflow-macos package on Apple silicon (i.e. M1 and M2) Macs and the regular tensorflow package everywhere else. It tries to do this with conditional dependencies that look at the platform_system and platform_machine environment markers, assuming that they will be "Darwin" and "arm64" if and only if the platform is an Apple silicon one. That works for my M2 Mac mini, but not for Kevin's M1 MacBook Pro. The marker values there are "Darwin" and "x86_64", which unfortunately are the same as for Intel Macs (or at least my 2019 MacBook Pro).

I do not see other markers in PEP 508 that seem like they would be useful for identifying Apple silicon Macs, and Google searches have not turned up a solution. A workaround that seemed to work in Kevin's case was to install the wrong TensorFlow package with Nighthawk and then do:

pip uninstall tensorflow
pip install tensorflow-macos

to replace it with the correct one.
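
One possible explanation, which is an assumption and not confirmed for Kevin's machine, is that the Python interpreter there is an x86_64 build running under Rosetta 2, in which case platform_machine reports "x86_64" even on Apple silicon. This cannot be expressed as a PEP 508 marker, but a runtime check could at least detect the situation and warn. The sketch below uses the macOS sysctl key sysctl.proc_translated, which reports 1 for a translated process; the function names are hypothetical.

```python
import platform
import subprocess


def running_under_rosetta():
    """Return True if this process is translated by Rosetta 2 (macOS only)."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # sysctl is missing or the key does not exist (e.g. an Intel Mac).
        return False
    return out == "1"


def is_apple_silicon(system=None, machine=None, translated=None):
    """Guess whether the hardware is Apple silicon.

    Arguments default to values detected from the running interpreter;
    they can be passed explicitly for testing.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if system != "Darwin":
        return False
    if machine == "arm64":
        return True
    if translated is None:
        translated = running_under_rosetta()
    # An x86_64 interpreter on Apple silicon runs as a translated process.
    return machine == "x86_64" and bool(translated)
```

If is_apple_silicon() is true while platform.machine() is "x86_64", Nighthawk could print a message suggesting a native arm64 Python or the manual package swap above.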
