See the BirdCLEF 2021 - Birdcall Identification Kaggle competition.
Install dependencies:
$ pip install -e '.[dev]'
On Linux, you also need to run sudo apt-get install libsndfile1.
Also install TensorFlow if it is not already in your environment:
$ pip install -e .[tf]
Pull the DVC-tracked data:
$ dvc pull
Create a smaller dataset in the data/ folder:
$ python -m src.sample
$ python -m src.download
$ python -m src.train --model smoke-test --data-dir data --metadata-csv train_metadata_small.csv
Install the Jupyter kernel:
$ python -m ipykernel install --user --name bird-3.8.1 --display-name "Python (bird-3.8.1)"
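The kernel should then appear in Jupyter's kernel list as "Python (bird-3.8.1)".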
- Overview of BirdCLEF 2021
- Where to start: A collection of resources
- Best working note awards
- EfficientNet explained
- LifeCLEF2022
- freefield1010 dataset
- Birdcall Identification Using CNN and Gradient Boosting Decision Trees with Weak and Noisy Supervision
- Winning solution of BirdCLEF2021
- Bird audio detection challenge
- mixup: Beyond Empirical Risk Minimization
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- TensorFlow IO Audio
- Simple Audio Recognition with TensorFlow
Create a GCP VM with at least 100 GB of disk space and grant it write access to the Google Storage API.
SSH to the instance using:
$ gcloud compute ssh INSTANCE_NAME
Install pip, tmux, and unzip:
$ sudo apt install python3-pip tmux unzip
Install Kaggle CLI:
$ pip3 install kaggle
Make the ~/.kaggle directory on the VM and transfer kaggle.json from your machine:
$ scp ~/.kaggle/kaggle.json USERNAME@VM_IP:~/.kaggle/kaggle.json
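The scp above assumes ~/.kaggle already exists on the VM. If it does not, a one-liner sketch to create it first (using the same USERNAME@VM_IP as above):
$ ssh USERNAME@VM_IP 'mkdir -p ~/.kaggle'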
Create a new tmux session:
$ tmux new -s kimmo
Download the data to the data/ folder:
$ chmod 600 ~/.kaggle/kaggle.json
$ ./.local/bin/kaggle competitions download birdclef-2021 -p data
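The explicit ./.local/bin/ prefix is used because pip3 installs user scripts into ~/.local/bin, which is often not on PATH. One way to add it for the current session (adjust for your shell):
$ export PATH="$HOME/.local/bin:$PATH"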
Detach from the session with Ctrl+b d and attach again with tmux a -t kimmo.
Extract the data and copy it to the Google Storage bucket bird-clef-kimmo:
$ unzip data/birdclef-2021.zip -d data
$ gsutil -m rsync -r data gs://bird-clef-kimmo/data
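You can verify the copy by listing the bucket contents:
$ gsutil ls gs://bird-clef-kimmo/data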
List and stop instances:
$ gcloud compute instances list
$ gcloud compute instances stop INSTANCE_NAME
Create a user-managed notebook in Vertex AI Workbench.
SSH to the instance with the jupyter username:
$ gcloud compute ssh jupyter@bird-explore
Setup SSH configuration:
$ gcloud compute config-ssh
Switch the user in ~/.ssh/config:
# ~/.ssh/config
Host some-host
    User jupyter
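Note that gcloud compute config-ssh names hosts as INSTANCE.ZONE.PROJECT, so the edited entry might look like this (instance, zone, and project names here are illustrative):
# Edited entry for the notebook instance
Host bird-explore.us-central1-a.my-project
    User jupyter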
Connecting from VS Code using the SSH host should now use jupyter as the user, allowing you to work with files in /home/jupyter and save remotely.
You can also set up port forwarding to localhost with:
$ gcloud compute ssh jupyter@bird-explore -- -N -L 8080:localhost:8080
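Assuming the Jupyter server on the instance listens on port 8080 (the Vertex AI Workbench default), you can then open http://localhost:8080 in a local browser.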
Sign in to Kaggle. Follow the instructions to prepare the ~/.kaggle/kaggle.json file.
See the Data page.
Download the full 39 GiB dataset:
$ kaggle competitions download birdclef-2021 -p data
Download a single file:
$ kaggle competitions download birdclef-2021 -p data/train_short_audio/acafly -f train_short_audio/acafly/XC109605.ogg
List all files in CSV format:
$ kaggle competitions files birdclef-2021 --csv
Download train_metadata.csv:
$ kaggle competitions download birdclef-2021 -p data -f train_metadata.csv
$ unzip data/train_metadata.csv.zip -d data
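As a quick sanity check, you can peek at the extracted metadata with pandas (assuming pandas is installed in your environment):
$ python -c "import pandas as pd; df = pd.read_csv('data/train_metadata.csv'); print(df.shape)"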