bmvandoren / nighthawk
Nighthawk is a machine learning model for acoustic monitoring of nocturnal bird migration.
License: Other
Don't allow postprocessing merge function to create detections more than 5 s (?) long.
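One way the cap could work is sketched below. Nighthawk's actual merge function is not shown in this issue, so the function name, the (start, end) tuple representation, and the max_gap parameter are all hypothetical; only the 5 s limit comes from the title, and even that is marked as tentative there.

```python
def merge_detections(intervals, max_gap=0.0, max_duration=5.0):
    """Merge touching (start, end) intervals without exceeding max_duration.

    Hypothetical helper: intervals are (start_sec, end_sec) tuples for a
    single class. Two intervals are merged only if they overlap or lie
    within max_gap of each other AND the merged span stays within
    max_duration seconds.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged:
            prev_start, prev_end = merged[-1]
            touches = start - prev_end <= max_gap
            fits = max(end, prev_end) - prev_start <= max_duration
            if touches and fits:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        # Either nothing to merge with, or merging would exceed the cap.
        merged.append((start, end))
    return merged
```

With this approach a long run of calls is split into several detections of at most 5 s each. Adjacent pieces may still overlap where the split happens, which may or may not be acceptable downstream.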
Thank you for your work on Nighthawk. I wanted to report something I am seeing in my logs when running with Vesper, and I wonder if it would be worth fixing:
2023-06-05 23:30:43,852 INFO /opt/conda/envs/nighthawk-0.2.0/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py:107: SettingWithCopyWarning:
2023-06-05 23:30:43,852 INFO A value is trying to be set on a copy of a slice from a DataFrame.
2023-06-05 23:30:43,852 INFO Try using .loc[row_indexer,col_indexer] = value instead
2023-06-05 23:30:43,852 INFO
2023-06-05 23:30:43,852 INFO See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
2023-06-05 23:30:43,852 INFO df_split['tmp'] = range(df_split.shape[0])
It seems to reference the middle of:
Nighthawk/nighthawk/run_reconstructed_model.py
Lines 101 to 114 in ea78866
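For what it's worth, this warning usually means a column is being assigned on a DataFrame that was itself produced by slicing another DataFrame. A minimal sketch of the common fix follows. The surrounding Nighthawk code is not reproduced in this issue, so the DataFrame contents and the filter are made up; only the df_split['tmp'] assignment is taken from the log.

```python
import pandas as pd

# Made-up stand-in for the real predictions table.
df = pd.DataFrame({"class": ["a", "b", "a"], "prob": [0.9, 0.2, 0.7]})

# Taking an explicit copy makes df_split an independent DataFrame, so the
# assignment below cannot be mistaken for a write into a view of df.
df_split = df[df["class"] == "a"].copy()

# Same assignment as in the warning message.
df_split["tmp"] = range(df_split.shape[0])
```

An alternative is df_split = df_split.assign(tmp=range(len(df_split))), which returns a new DataFrame rather than modifying one in place.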
There are currently sigmoid probability calibrations for all taxa other than Laridae.
Hi! So excited for this!
(now for the complaining...)
I am running on a 12 core Windows box and trying to run Nighthawk on all my old NFC files while doing other work. I'm unable to because it pegs every CPU. It would be nice if it was "nicer", e.g. allowing me to limit it to only 4 processors or something.
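Until there is a built-in option, one thing that might help is launching Nighthawk with thread-limiting environment variables set. The sketch below only builds such an environment. OMP_NUM_THREADS, TF_NUM_INTRAOP_THREADS, and TF_NUM_INTEROP_THREADS are variables that OpenMP and TensorFlow read at startup; whether they are enough to keep Nighthawk off the other cores is an assumption I have not tested. The helper name is hypothetical.

```python
import os


def thread_limited_env(max_threads, base_env=None):
    """Return a copy of the environment with thread-count limits set.

    Hypothetical helper. It does not modify os.environ; pass the result
    as the env argument when starting the child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    value = str(int(max_threads))
    for name in ("OMP_NUM_THREADS",
                 "TF_NUM_INTRAOP_THREADS",
                 "TF_NUM_INTEROP_THREADS"):
        env[name] = value
    return env


# Example use (the command line is an assumption about how Nighthawk is run):
#   import subprocess
#   subprocess.run(["nighthawk", "recording.wav"], env=thread_limited_env(4))
```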
Example:
Processed 0.4 seconds of audio in 1.5 seconds, 0.2 times faster than real time.
Traceback (most recent call last):
  File "/home/ec2-user/miniconda3/envs/nighthawk/bin/nighthawk", line 8, in <module>
    sys.exit(main())
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 63, in run_detector_on_files
    detections = _run_detector_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 145, in _run_detector_on_file
    return run_reconstructed_model.run_model_on_file(
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/run_reconstructed_model.py", line 742, in run_model_on_file
    preds, bad_inds, steps = process_file(audio_model,
  File "/home/ec2-user/miniconda3/envs/nighthawk/lib/python3.10/site-packages/nighthawk/detector.py", line 182, in _get_model_predictions
    input_count = len(predictions[0])
IndexError: list index out of range
The model simply needs input of at least a full second in order to make predictions. (Actually, under default settings, in order to return a detection the model needs a file of at least 1.2 seconds given 20% overlap and a requirement of two detections in a row to trigger).
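The 1.2-second figure is consistent with a 1-second input window, a 0.2-second step between successive windows, and two consecutive windows needed to trigger. Those three values are my reading of the paragraph above rather than confirmed defaults, and the function names below are hypothetical. A sketch of the arithmetic, which could also back a check that skips too-short files with a clear message:

```python
def min_duration_sec(window_sec=1.0, step_sec=0.2, consecutive=1):
    """Shortest audio that yields `consecutive` successive model inputs."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    return window_sec + (consecutive - 1) * step_sec


def is_long_enough(duration_sec, **kwargs):
    """True if a file of duration_sec can produce the required inputs."""
    return duration_sec >= min_duration_sec(**kwargs)
```

Under these assumptions, min_duration_sec() gives 1.0 (enough for one prediction) and min_duration_sec(consecutive=2) gives 1.2 (enough for a detection), so the 0.4-second file in the example fails either check.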
Possible solutions:
To avoid warnings from scikit-learn about potential probability calibration version incompatibilities, that package is currently pinned to version 1.0.1. That version is about a year and a half old, though, and using it causes installation of the Nighthawk package to trigger a build of a scikit-learn Python wheel, which slows installation considerably. It would be good to either be able to use a more recent version of scikit-learn that doesn't trigger such a build (the latest version, 1.2.2, does not), or perhaps eliminate the use of scikit-learn altogether by implementing our own version of probability calibration.
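If the calibrations are sigmoid (Platt-style) ones, the prediction step of a home-grown version could be very small. The sketch below assumes each taxon's fitted calibrator reduces to two scalars, a and b, applied as p = 1 / (1 + exp(a * s + b)), which is the standard Platt scaling form. The function name and the parameter values in the test are hypothetical, and fitting a and b is not covered.

```python
import math


def sigmoid_calibrate(score, a, b):
    """Map a raw model score to a calibrated probability.

    Platt scaling form: p = 1 / (1 + exp(a * score + b)).
    a and b are per-taxon parameters fitted elsewhere (assumed here).
    """
    z = a * score + b
    if z >= 0:
        # Rewrite to avoid overflow in exp() for large positive z.
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))
```

Storing a and b per taxon in a plain text or JSON file would avoid unpickling scikit-learn objects, which is presumably where the version warnings come from.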
Hi - I think there's value in being able to write out clips and spectrograms from the detections file. I have a script I modified to work with the nighthawk output .csv, which is available here if anyone wants to try it:
https://github.com/ses4j/nfc-tools/blob/main/nfc-processing/getsnips.py
I didn't know the best way it might be integrated: as a separate tool? Separate but included in the package? As a subcommand of Nighthawk (like nighthawk clip ...)? With --wav-output? (My only issue with the last is then it wouldn't be possible to rerun with different options, etc, if you wanted more control...)
Thoughts?
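For discussion, the core of clip writing can be done with the standard library alone. This is a minimal sketch, not the linked script: the function name and the pad_sec parameter are hypothetical, it handles only WAV files, and reading start and end times from the detections CSV is left out because I am not assuming specific column names.

```python
import wave


def extract_clip(src_path, dst_path, start_sec, end_sec, pad_sec=0.0):
    """Copy [start_sec - pad_sec, end_sec + pad_sec] of a WAV file to dst_path.

    Returns the number of frames written. Times are clamped to the file.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        first = min(total, max(0, int((start_sec - pad_sec) * rate)))
        last = min(total, max(first, int((end_sec + pad_sec) * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)
        params = src.getparams()
    with wave.open(dst_path, "wb") as dst:
        # The frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return last - first
```

Keeping this as a function that takes a detections file after the fact, rather than only a flag on the detector run, would allow rerunning with different padding without redoing inference.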
idk what caused this but it happened on a file. I can send the file if you want it, it's a 40MB recording. It's probably in your Box account.
It never generated the audacity file, but it had generated a CSV, which was empty. So maybe this is just 'what happens when you have no detections and try to generate an audacity file'?
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\wc\nfc-tools\.venv\Scripts\nighthawk.exe\__main__.py", line 7, in <module>
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\run_nighthawk.py", line 14, in main
    nh.run_detector_on_files(
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 82, in run_detector_on_files
    _write_detection_audacity_label_file(output_file_path, detections)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\nighthawk\detector.py", line 275, in _write_detection_audacity_label_file
    detections['pred_cat_prob'] = detections['predicted_category'] + ' (' + detections['prob'].astype(float).round(3).astype(str) + ')'
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\common.py", line 81, in new_method
    return method(self, other)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\arraylike.py", line 186, in __add__
    return self._arith_method(other, operator.add)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\series.py", line 6108, in _arith_method
    return base.IndexOpsMixin._arith_method(self, other, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\base.py", line 1348, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 232, in arithmetic_op
    res_values = _na_arithmetic_op(left, right, op) # type: ignore[arg-type]
  File "C:\wc\nfc-tools\.venv\lib\site-packages\pandas\core\ops\array_ops.py", line 171, in _na_arithmetic_op
    result = func(left, right)
numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types (dtype('float64'), dtype('<U2')) -> None
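The error message suggests that, with no detections, the predicted_category column ends up as float64, so adding the string ' (' to it fails. A possible guard is sketched below. The column names come from the traceback; the helper name is hypothetical and the real function does more than build this one column.

```python
import pandas as pd


def add_label_column(detections):
    """Return a copy of detections with a 'pred_cat_prob' label column.

    Casting the category column to str first keeps the concatenation
    valid even when the table is empty and the column is numeric.
    """
    out = detections.copy()
    category = out["predicted_category"].astype(str)
    prob = out["prob"].astype(float).round(3).astype(str)
    out["pred_cat_prob"] = category + " (" + prob + ")"
    return out
```

Another option would be to return early when detections.empty is true and write an empty label file.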
On Kevin Tolan's M1 MacBook Pro, Nighthawk 0.1.0 gets installed with the regular tensorflow package instead of with Apple's tensorflow-macos package, and this causes nighthawk to fail because of an illegal machine instruction when it runs.
Nighthawk's pyproject.toml file aims to install TensorFlow in the form of the tensorflow-macos package on Apple silicon (i.e. M1 and M2) Macs and the regular tensorflow package everywhere else. It tries to do this with conditional dependencies that look at the platform_system and platform_machine environment markers, assuming that they will be "Darwin" and "arm64" if and only if the platform is an Apple silicon one. That works for my M2 Mac mini, but not for Kevin's M1 MacBook Pro. The marker values there are "Darwin" and "x86_64", which unfortunately are the same as for Intel Macs (or at least my 2019 MacBook Pro).
I do not see other markers in PEP 508 that seem like they would be useful for identifying Apple silicon Macs, and Google searches have not turned up a solution. A workaround that seemed to work in Kevin's case was to install the wrong TensorFlow package with Nighthawk and then do:
pip uninstall tensorflow
pip install tensorflow-macos
to replace it with the correct one.
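One possible explanation, not confirmed for Kevin's machine, is that the Python interpreter there is an x86_64 build running under Rosetta 2, in which case platform.machine() reports "x86_64" even on Apple silicon hardware. The sketch below is a diagnostic only (it cannot be used as an environment marker in pyproject.toml). It asks the OS about the hardware through the hw.optional.arm64 sysctl key, and the function name is hypothetical.

```python
import platform
import subprocess


def is_apple_silicon():
    """Best-effort check for Apple silicon hardware, even under Rosetta 2."""
    if platform.system() != "Darwin":
        return False
    if platform.machine() == "arm64":
        return True
    # An x86_64 Python may still be running on arm64 hardware via Rosetta.
    try:
        result = subprocess.run(
            ["sysctl", "-n", "hw.optional.arm64"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False
    return result.stdout.strip() == "1"
```

If this returns True while platform.machine() is "x86_64", installing an arm64 Python (for example a native arm64 conda distribution) would likely make the existing markers select tensorflow-macos.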