bmvandoren / nighthawk

Nighthawk is a machine learning model for acoustic monitoring of nocturnal bird migration.

License: Other

Python 31.58% PureBasic 68.42%

nighthawk's Issues

Apply a max duration for merged detections

Don't allow the postprocessing merge function to create detections longer than 5 s (?).
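
A minimal sketch of one way such a cap could work, assuming detections are plain (start_sec, end_sec) tuples that get merged when they overlap or sit within a small gap of each other. The function name, the `max_gap` parameter and the tuple representation are hypothetical; this is not Nighthawk's actual merge code.

```python
def merge_detections(intervals, max_gap=0.0, max_duration=5.0):
    """Merge overlapping or adjacent (start_sec, end_sec) intervals, but
    never let a merged interval grow beyond max_duration seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged:
            prev_start, prev_end = merged[-1]
            close_enough = start - prev_end <= max_gap
            new_end = max(prev_end, end)
            fits = new_end - prev_start <= max_duration
            if close_enough and fits:
                # Extend the previous detection instead of starting a new one.
                merged[-1] = (prev_start, new_end)
                continue
        # Either the gap is too large or merging would exceed the cap.
        merged.append((start, end))
    return merged
```

With this approach a merge that would exceed the cap is simply refused, so two overlapping detections can remain in the output. A single input detection that is already longer than `max_duration` passes through unchanged and would still need to be split separately.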

Warning: "set on a copy of a slice"

Thank you for your work on Nighthawk. I wanted to report something I am seeing in my logs when running with Vesper, and I wonder if it would be worth fixing:

2023-06-05 23:30:43,852 INFO                 /opt/conda/envs/nighthawk-0.2.0/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py:107: SettingWithCopyWarning: 
2023-06-05 23:30:43,852 INFO                 A value is trying to be set on a copy of a slice from a DataFrame.
2023-06-05 23:30:43,852 INFO                 Try using .loc[row_indexer,col_indexer] = value instead
2023-06-05 23:30:43,852 INFO                 
2023-06-05 23:30:43,852 INFO                 See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
2023-06-05 23:30:43,852 INFO                   df_split['tmp'] = range(df_split.shape[0])

It seems to reference the middle of:

Nighthawk/nighthawk/run_reconstructed_model.py

Lines 101 to 114 in ea78866

def split_long_detections(df,max_duration=5):
    df = df.reset_index(drop=True)
    is_too_long = df['end_sec']-df['start_sec'] > max_duration
    df_keep = df.loc[~is_too_long]
    df_split = df.loc[is_too_long]

    df_split['tmp'] = range(df_split.shape[0])
    df_split = df_split.groupby('tmp', group_keys=False).apply(split_long_detections_helper,max_duration=max_duration)
    df_split = df_split.drop('tmp',axis=1)
    df_split = df_split.reset_index(drop=True)

    df_out = pd.concat([df_keep,df_split])
    df_out = df_out.sort_values('start_sec').reset_index(drop=True)
    return df_out
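
The warning most likely comes from assigning the 'tmp' column on `df_split`, which was produced by boolean indexing into `df`, so pandas cannot tell whether the assignment is meant to reach the original frame. One possible fix is to take an explicit copy before assigning. The sketch below isolates just that step; the function name `select_long_detections` is hypothetical, and the rest of `split_long_detections` would stay as it is.

```python
import pandas as pd

def select_long_detections(df, max_duration=5):
    """Return the too-long rows as an independent copy with a 'tmp' id column."""
    is_too_long = df['end_sec'] - df['start_sec'] > max_duration
    # .copy() makes df_split its own DataFrame, so the assignment below is
    # unambiguous and pandas has nothing to warn about.
    df_split = df.loc[is_too_long].copy()
    df_split['tmp'] = range(df_split.shape[0])
    return df_split
```

An equivalent alternative would be `df.loc[is_too_long].assign(tmp=...)`, which returns a new frame rather than modifying one in place.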

Add Laridae probability calibration.

There are currently sigmoid probability calibrations for all taxa other than Laridae.

Allow control of "max number of processors used".

Hi! So excited for this!

(now for the complaining...)

I am running on a 12-core Windows box and trying to run Nighthawk on all my old NFC files while doing other work. I can't, because it pegs every CPU. It would be nice if it were "nicer", e.g. by letting me limit it to only 4 processors or something.
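
A rough sketch of how a processor cap might be wired in, assuming most of the CPU load comes from TensorFlow inference (the issue does not confirm this). Both function names are hypothetical and nothing here is an existing Nighthawk option; the `tf.config.threading` calls are TensorFlow's own thread-pool settings.

```python
import os

def resolve_thread_count(requested, available=None):
    """Clamp a requested thread count to the range [1, available CPUs]."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(int(requested), available))

def limit_tensorflow_threads(requested):
    """Ask TensorFlow to use at most `requested` threads per thread pool.

    Must run before TensorFlow executes any operation; otherwise
    TensorFlow raises a RuntimeError.
    """
    import tensorflow as tf  # imported lazily so the helper above works without it
    n = resolve_thread_count(requested)
    tf.config.threading.set_intra_op_parallelism_threads(n)
    tf.config.threading.set_inter_op_parallelism_threads(n)
    return n
```

These settings bound TensorFlow's thread pools rather than imposing a hard CPU limit, so other parts of the pipeline (audio decoding, for example) could still use additional cores.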

Nighthawk can't process audio files less than one second in duration

Example:

Processed 0.4 seconds of audio in 1.5 seconds, 0.2 times faster than real time.
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/nighthawk/bin/nighthawk", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 63, in run_detector_on_files
    detections = _run_detector_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 145, in _run_detector_on_file
    return run_reconstructed_model.run_model_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py", line 742, in run_model_on_file
    preds, bad_inds, steps = process_file(audio_model,
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 182, in _get_model_predictions
    input_count = len(predictions[0])
IndexError: list index out of range

The model simply needs input of at least a full second in order to make predictions. (Actually, under default settings, in order to return a detection the model needs a file of at least 1.2 seconds given 20% overlap and a requirement of two detections in a row to trigger).

Possible solutions:

For input audio less than 1 s (or 1.2 s), pad with zeros and then pass to model. Pros: could take audio less than 1 s. Cons: Nighthawk will return a detection longer than the input (confusing). Under default settings, Nighthawk would also be guaranteed not to return any detections for files less than 0.20 s because that is less than the overlap size (1 s * 0.20 = 0.20 s). Could throw a warning under these conditions but still process audio.
Throw a more informative error message when a user provides a file less than 1 s (or 1.2 s) long.
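
Both options can be sketched in one small helper, assuming mono audio held in a 1-D NumPy array. The function name `prepare_short_audio` and its parameters are hypothetical, and the 1 s minimum is taken from the description above rather than from the Nighthawk code.

```python
import warnings
import numpy as np

def prepare_short_audio(samples, sample_rate, min_duration=1.0, pad=True):
    """Zero-pad audio shorter than min_duration, or raise a clear error."""
    samples = np.asarray(samples)
    min_samples = int(round(min_duration * sample_rate))
    if len(samples) >= min_samples:
        return samples
    duration = len(samples) / sample_rate
    if not pad:
        raise ValueError(
            f"Audio is {duration:.2f} s long; "
            f"at least {min_duration:.2f} s is required."
        )
    warnings.warn(
        f"Audio is only {duration:.2f} s long; padding with zeros to "
        f"{min_duration:.2f} s. Detections may extend past the end of the input."
    )
    # Append silence so the model receives one full input window.
    padding = np.zeros(min_samples - len(samples), dtype=samples.dtype)
    return np.concatenate([samples, padding])
```

If padding is used, clipping each detection's end time to the original file duration afterwards would avoid reporting detections longer than the input.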

Pinned scikit-learn version number slows package installation.

To avoid warnings from scikit-learn about potential probability calibration version incompatibilities, that package is currently pinned to version 1.0.1. That version is about a year and a half old, though, and using it causes installation of the Nighthawk package to trigger a build of a scikit-learn Python wheel, which slows installation considerably. It would be good to either be able to use a more recent version of scikit-learn that doesn't trigger such a build (the latest version, 1.2.2, does not), or perhaps eliminate the use of scikit-learn altogether by implementing our own version of probability calibration.
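
If the calibrations are plain sigmoid (Platt) calibrations, applying an already fitted one needs only two coefficients per taxon, which is what makes dropping the dependency plausible. The sketch below assumes coefficients `a` and `b` that were fitted offline and stored, and follows the convention p = 1 / (1 + exp(a*s + b)) used by scikit-learn's sigmoid calibration. The function name is hypothetical, and how Nighthawk actually stores or applies its calibrations is not shown in this issue.

```python
import numpy as np

def sigmoid_calibrate(scores, a, b):
    """Map raw scores s to calibrated probabilities p = 1 / (1 + exp(a*s + b))."""
    scores = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(a * scores + b))
```

With a negative `a`, higher raw scores map to higher calibrated probabilities, which is the usual case.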

Allow output of .WAV clips and/or spectrograms of detections.

Hi - I think there's value in being able to write out clips and spectrograms from the detections file. I have a script I modified to work with the nighthawk output .csv, which is available here if anyone wants to try it:

https://github.com/ses4j/nfc-tools/blob/main/nfc-processing/getsnips.py

I didn't know the best way it might be integrated: as a separate tool? Separate but included in the package? As a subcommand of Nighthawk (like nighthawk clip ...)? With --wav-output? (My only issue with the last is then it wouldn't be possible to rerun with different options, etc, if you wanted more control...)

Thoughts?
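
For discussion, here is a minimal sketch of the clip-writing step using only the standard library `wave` module, assuming uncompressed PCM WAV input and detection times given as `start_sec` and `end_sec`. The function name `write_clip` is hypothetical and this is not the linked script.

```python
import wave

def write_clip(source_path, clip_path, start_sec, end_sec):
    """Copy the audio between start_sec and end_sec into a new WAV file.

    Returns the number of frames written.
    """
    with wave.open(source_path, 'rb') as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both ends to the bounds of the recording.
        start_frame = min(total, max(0, int(start_sec * rate)))
        end_frame = min(total, max(start_frame, int(end_sec * rate)))
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(clip_path, 'wb') as dst:
        # Same channels, sample width and rate as the source; the frame
        # count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return end_frame - start_frame
```

A separate tool or subcommand could loop over the rows of the detections CSV and call something like this once per row, which would keep reruns with different options cheap.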

Crash while running: numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('float64'), dtype('<U2')) -> None

idk what caused this but it happened on a file. I can send the file if you want it, it's a 40MB recording. It's probably in your Box account.

It never generated the audacity file, but it had generated a CSV, which was empty. So maybe this is just 'what happens when you have no detections and try to generate an audacity file'?

Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\wc\nfc-tools\.venv\Scripts\nighthawk.exe\__main__.py", line 7, in <module>
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 82, in run_detector_on_files
    _write_detection_audacity_label_file(output_file_path, detections)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 275, in _write_detection_audacity_label_file
    detections['pred_cat_prob'] = detections['predicted_category'] + ' (' + detections['prob'].astype(float).round(3).astype(str) + ')'
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\common.py", line 81, in new_method
    return method(self, other)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\series.py", line 6108, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\base.py", line 1348, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 232, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op)  # type: ignore[arg-type]
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 171, in _na_arithmetic_op
    result = func(left, right)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('float64'), dtype('<U2')) -> None
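
The traceback appears consistent with that guess: the left operand of the failing addition has dtype float64, which is what `predicted_category` would have if the table contained no string values. One possible guard is to cast the category column to `str` before concatenating, as in this sketch. The function name `add_label_column` is hypothetical and the category label in the usage example is made up.

```python
import pandas as pd

def add_label_column(detections):
    """Add 'pred_cat_prob' labels of the form 'category (0.912)'."""
    detections = detections.copy()
    # Cast the category to str so an empty (float64) column cannot
    # trigger float + str arithmetic.
    category = detections['predicted_category'].astype(str)
    prob = detections['prob'].astype(float).round(3).astype(str)
    detections['pred_cat_prob'] = category + ' (' + prob + ')'
    return detections
```

Another option would be to return early and write an empty label file whenever `detections` has no rows.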

Installation yields wrong version of TensorFlow on M1 Macs.

On Kevin Tolan's M1 MacBook Pro, Nighthawk 0.1.0 gets installed with the regular tensorflow package instead of with Apple's tensorflow-macos package, and this causes nighthawk to fail because of an illegal machine instruction when it runs.

Nighthawk's pyproject.toml file aims to install TensorFlow in the form of the tensorflow-macos package on Apple silicon (i.e. M1 and M2) Macs and the regular tensorflow package everywhere else. It tries to do this with conditional dependencies that look at the platform_system and platform_machine environment markers, assuming that they will be "Darwin" and "arm64" if and only if the platform is an Apple silicon one. That works for my M2 Mac mini, but not for Kevin's M1 MacBook Pro. The marker values there are "Darwin" and "x86_64", which unfortunately are the same as for Intel Macs (or at least my 2019 MacBook Pro).

I do not see other markers in PEP 508 that seem like they would be useful for identifying Apple silicon Macs, and Google searches have not turned up a solution. A workaround that seemed to work in Kevin's case was to install the wrong TensorFlow package with Nighthawk and then do:

pip uninstall tensorflow
pip install tensorflow-macos

to replace it with the correct one.
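
One possible explanation, not confirmed here, is that the Python interpreter on that machine is an x86_64 build running under Rosetta 2, in which case `platform_machine` reports "x86_64" even on Apple silicon. Environment markers cannot see through that, but a runtime check could at least produce a clearer error message than an illegal instruction. The sketch below is a best-effort check; the function name is hypothetical, and the use of the `sysctl.proc_translated` key to detect Rosetta translation is an assumption that should be verified on the affected machine.

```python
import platform
import subprocess

def is_apple_silicon():
    """Best-effort check that also tries to see through Rosetta 2."""
    if platform.system() != 'Darwin':
        return False
    if platform.machine() == 'arm64':
        return True
    try:
        # Expected to print '1' when the current process is being
        # translated by Rosetta 2 (assumption, see lead-in).
        result = subprocess.run(
            ['sysctl', '-n', 'sysctl.proc_translated'],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False
    return result.stdout.strip() == '1'
```

At startup, Nighthawk could call something like this and suggest the uninstall/reinstall workaround when it returns True but the regular tensorflow package is installed.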
