
orcacnn's People

Contributors

yosoyjay, zer-0-ne


orcacnn's Issues

Deciding on ChunkSize

This would be the best time to decide on the chunkSize, since we'll be starting development of the detection and classification models soon; it's better to settle this properly now.
First of all, let me show the difference between the chunkSizes of 1s, 2s and 3s:

  1. When chunkSize = 1s (shows 3 spectrograms of 1s each):
    (merged spectrogram image: merge_from_ofoct)

  2. When chunkSize = 2s:
    (spectrogram image: Field Recordings NGOS_2008 field acoustic recordings_20080614 E03_1007_0000_0000)

  3. When chunkSize = 3s:
    (spectrogram image: Field Recordings NGOS_2008 field acoustic recordings_20080614 E03_1007_0000_0000)

Why would I choose the 1s chunkSize?

Keeping in mind the further classification and template matching/extraction methods:

  • The first reason is that the calls can be seen distinctly in each spectrogram. There is a much lower chance of calls overlapping across spectrograms for 1s chunks than for 2s or 3s chunks. Since they won't overlap, the calls for the further classification models can be taken directly as the whole spectrogram, without any template-extraction step.
  • Though the 1s chunks are somewhat blurry, the input to a CNN model is often an image of around 224x224, which is smaller than each of the above chunks (around 600x400), so the 1s chunk is still a good choice.
  • The 3s chunks are also a decent choice, but much of their time-frequency area is merely noise; it's better to avoid it than to crop each of them individually.

Template matching also works out well for 1s chunks, as I just tried; we can simply loop over the 1s directory with the template for the given pod.

  • Template (I've just cut out a portion spanning the width of the call, since that should be enough, right?):

  • Matched image result:
    (template-matching result image: temp_match)
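Roughly, the loop could look like the sketch below; this assumes OpenCV is available, the 1s spectrograms live in a directory such as chunks_1s/ and the pod template is saved as template.png (all of these names are placeholders, not the actual paths):

import glob

import cv2

# Placeholder paths; the real template/chunk locations would differ.
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)

for path in glob.glob("chunks_1s/*.png"):
    chunk = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Normalized cross-correlation between the pod template and the 1s spectrogram.
    result = cv2.matchTemplate(chunk, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val > 0.6:  # illustrative threshold, would need tuning
        print(f"{path}: possible match at {max_loc} (score {max_val:.2f})")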

So, is it a go for 1s chunks? @yosoyjay
I would like to know if you have any ideas, criticism or suggestions.

Normalization of audio does not produce any significant change

Hi
I am trying to normalize the data that we have, but I see no significant change in the output waveform.
Do we need to consider this when pre-processing our data, or is it enough to just have all the samples downsampled to a particular sampling frequency and a fixed length?

import numpy as np

def audio_norm(data):
    # Min-max normalize the waveform to the range [-0.5, 0.5].
    max_data = np.max(data)
    min_data = np.min(data)
    data = (data - min_data) / (max_data - min_data + 1e-6)
    return data - 0.5

(screenshot: 2019-03-24 20-11-50)
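Just for reference, here is a minimal way to reproduce the check; it assumes librosa is used for loading and sample.wav is a stand-in path. Since audio_norm is only an affine rescale (scale plus offset), the normalized waveform is expected to keep exactly the same shape as the original, which would explain why the plots look the same:

import librosa
import numpy as np

# Stand-in path; any mono recording would do.
data, sr = librosa.load("sample.wav", sr=None)
normed = audio_norm(data)

# An affine rescale preserves the waveform shape, so the correlation
# between the original and the normalized signal is 1.
print(np.corrcoef(data, normed)[0, 1])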

Develop killer whale detection model

The goal here is to develop the killer whale detection model trained on passive acoustic data from SE Alaska and, perhaps, augmented with data from other sources.

  • 1. Develop a model to recognize kw calls
  • 2. Use the model developed in 1. to generate labeled kw samples
    (1 and 2 are likely to be iterative)
  • Research potential sources of humpback whale sounds to improve model discrimination between humpback and kw
  • Develop a test to distinguish between kw and humpback calls

Build model to distinguish between pods in SE Alaska sourced data

The goal is to build and develop a model that can identify, from a given sample call, the pod from which the killer whale call originated.

  1. Develop test and training sets from the SE Alaska call catalogue
  2. Develop the model
  3. Integrate the model into a command-line tool that can be used to label the pod of a set of killer whale calls.

Average predictions from the 4 ML models to find start and end times of orca calls

Currently, there are 4 models uploaded to help in the prediction of orca calls. Some of these are checkpoint models which were saved while training during the GSoC period.

The 4 models offer different accuracy and a different number of predicted 1-second calls for the same unsampled acoustic data. Unfortunately, at this point, there is no single model that works best in all cases.

The idea here is to average the predictions from the base models so that we can be more accurate about the predicted orca calls and their duration of occurrence in the acoustic/input data.

Since the autonomous recordings (the input data) are divided into 1-second chunks, the whole process becomes easier. For example, if 3 of the 4 base models predict that there is an orca call at the 37th second of the input data, we consider there to be a high probability of an orca call at that time. To generalize this, we consider the presence of an orca call to be high-confidence only if two or more of the base models predict it.

Writing all of the predictions to a .csv file this way seems like a good idea at this point. In addition, there can be 2 sections in the csv file:

  • predictions with high probability
  • predictions where further human assistance/intervention is required
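A minimal sketch of this voting-plus-CSV idea, assuming each base model gives a list of 0/1 flags per 1-second chunk (the function names, two-vote threshold and CSV columns below are illustrative, not final):

import csv

def vote_on_chunks(predictions_per_model, min_votes=2):
    # predictions_per_model: one list of 0/1 flags per model, where index i
    # corresponds to the i-th 1-second chunk of the input recording.
    n_chunks = len(predictions_per_model[0])
    results = []
    for i in range(n_chunks):
        votes = sum(preds[i] for preds in predictions_per_model)
        if votes >= min_votes:
            results.append((i, votes, "high probability"))
        elif votes > 0:
            results.append((i, votes, "needs human review"))
    return results

def write_predictions_csv(results, path="orca_call_predictions.csv"):
    # The two "sections" are represented here as a label column; the exact
    # layout of the csv is still open for discussion.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["second", "votes", "section"])
        writer.writerows(results)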

Evaluate model designs

The primary job of this tool is to detect orca in passive acoustic datasets. The tool is a pipeline consisting of stages to: 1) preprocess and standardize input data, 2) apply a model to identify the presence of orca within the dataset, 3) apply additional models to extract additional information about the orca call, including signal type and pod, and 4) create a summary of the pipeline's results indicating the presence of orca along with the time and any additional information.

The issue here is to develop and evaluate models for stage 2 of the pipeline. Different model designs will be described and applied to the dataset developed in #1.

Develop summarizer

The primary job of this tool is to detect orca in passive acoustic datasets. The tool is a pipeline consisting of stages to: 1) preprocess and standardize input data, 2) apply a model to identify the presence of orca within the dataset, 3) apply additional models to extract additional information about the orca call, including signal type and pod, and 4) create a summary of the pipeline's results indicating the presence of orca along with the time and any additional information.

The issue here is to develop a tool that summarizes what was learned in #3 to create stage 4 of the pipeline. The output should encapsulate all of the information extracted from the pipeline and be output in formats amenable to humans (HTML?) and APIs (JSON?).
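Purely as a strawman for the API-facing side, a JSON summary could look something like the snippet below; every field name and value here is an assumption to be discussed, not a settled schema:

import json

# Hypothetical summary for one processed recording; all fields are illustrative.
summary = {
    "recording": "example_recording.wav",
    "detections": [
        {"start_s": 37, "end_s": 38, "confidence": 0.92, "pod": "unknown", "call_type": "unknown"},
    ],
}

with open("summary.json", "w") as f:
    json.dump(summary, f, indent=2)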

Develop data pre-processing methods

The goal here is to develop the code to prepare the data for model development. Existing methods have been prototyped before the start of the project and will form the basis of the methods.

  • Develop methods to standardize data
  • Develop methods to visualize the data
  • Convert exploratory code to a Python script that can be deployed in the backend of a web app
  • Ensure documentation is sufficient for new users to quickly pick up and apply the methods

MBARI data credit

Please credit MBARI in this source code if their sound data was used in conjunction with Dan Olsen's data for validating model performance. This is per the collaboration agreement you have in place with them. Thank you!

Modification in the padding function in preprocessing.py

# Existing behaviour in preprocessing.py: when the clip is longer than
# input_length, a random input_length-sized window is cut out of it.
if len(data) > input_length:
    max_offset = len(data) - input_length
    offset = np.random.randint(max_offset)
    data = data[offset:(input_length + offset)]

For the case when input_length is less than the length of the actual data, wouldn't downsampling be better than chopping the data? Either downsample the signal through averaging or by skipping elements.

Let n = length of the actual data
Let m = the expected length
So the signal is averaged over blocks of roughly n/m elements, as below, to produce an array of size m:

def average(arr, l):
    # Average consecutive blocks of l samples, where l is roughly n // m.
    l = int(l)
    end = l * (len(arr) // l)
    return np.mean(arr[:end].reshape(-1, l), axis=1)

And then pad with some constant for the remaining values if n/m is not an integer?
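To make the idea concrete, a rough sketch of the combined shrink-and-pad step could look like this, assuming data is a 1-D NumPy array and input_length plays the role of m (the helper name shrink_to_length is just illustrative):

import numpy as np

def shrink_to_length(data, input_length):
    # Alternative to random cropping: block-average the signal down to
    # roughly input_length samples, then zero-pad any remainder.
    block = max(1, len(data) // input_length)
    end = block * (len(data) // block)
    shrunk = np.mean(data[:end].reshape(-1, block), axis=1)
    if len(shrunk) < input_length:
        shrunk = np.pad(shrunk, (0, input_length - len(shrunk)), mode="constant")
    return shrunk[:input_length]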

Create input standardizer to preprocess audio data

The primary job of this tool is to detect orca in passive acoustic datasets. The tool is a pipeline consisting of stages to: 1) preprocess and standardize input data, 2) apply a model to identify the presence of orca within the dataset, 3) apply additional models to extract additional information about the orca call, including signal type and pod, and 4) create a summary of the pipeline's results indicating the presence of orca along with the time and any additional information.

The issue here is to develop part 1 of the pipeline: both the format of the standardized data and the tool to create such a dataset.

Develop detection tool

The primary job of this tool is to detect orca in passive acoustic datasets. The tool is a pipeline consisting of stages to: 1) preprocess and standardize input data, 2) apply a model to identify the presence of orca within the dataset, 3) apply additional models to extract additional information about the orca call, including signal type and pod, and 4) create a summary of the pipeline's results indicating the presence of orca along with the time and any additional information.

The issue here is to apply the model(s) developed in #2 to create stage 2 of the pipeline.

Minor typo in README

Noticed that, to be consistent with the surrounding explanation, 67391498.180916010013 should instead be 67391498.180916010313. That makes it consistent with the subsequently stated time of 1:03 am (and 13 seconds).
