See the BirdCLEF 2021 - Birdcall Identification Kaggle competition.
Install dependencies:
$ pip install -e '.[dev]'
On Linux, you also need to run sudo apt-get install libsndfile1.
Also install TensorFlow if it is not already in your environment:
$ pip install -e .[tf]
Pull the DVC-tracked data:
$ dvc pull
Create a smaller dataset in the data/ folder:
$ python -m src.sample
$ python -m src.download
$ python -m src.train --model smoke-test --data-dir data --metadata-csv train_metadata_small.csv
Install the Jupyter kernel:
$ python -m ipykernel install --user --name bird-3.8.1 --display-name "Python (bird-3.8.1)"
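The kernel should then appear in Jupyter's kernel list as "Python (bird-3.8.1)".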
- Overview of BirdCLEF 2021
- Where to start: A collection of resources
- Best working note awards
- EfficientNet explained
- LifeCLEF2022
- freefield1010 dataset
- Birdcall Identification Using CNN and Gradient Boosting Decision Trees with Weak and Noisy Supervision
- Winning solution of BirdCLEF2021
- Bird audio detection challenge
- mixup: Beyond Empirical Risk Minimization
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
- TensorFlow IO Audio
- Simple Audio Recognition with TensorFlow
Create a GCP VM with at least 100 GB of disk space and grant it write access to the Google Storage API.
SSH to the instance using:
$ gcloud compute ssh INSTANCE_NAME
Install pip, tmux, and unzip:
$ sudo apt install python3-pip tmux unzip
Install Kaggle CLI:
$ pip3 install kaggle
Make the ~/.kaggle directory on the VM and transfer kaggle.json from your machine:
$ scp ~/.kaggle/kaggle.json USERNAME@VM_IP:~/.kaggle/kaggle.json
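The scp above assumes ~/.kaggle already exists on the VM. If it does not, a one-liner sketch to create it first (using the same USERNAME@VM_IP as above):
$ ssh USERNAME@VM_IP 'mkdir -p ~/.kaggle'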
Create a new tmux session:
$ tmux new -s kimmo
Download the data to the data/ folder:
$ chmod 600 ~/.kaggle/kaggle.json
$ ./.local/bin/kaggle competitions download birdclef-2021 -p data
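The explicit ./.local/bin/ prefix is used because pip3 installs user scripts into ~/.local/bin, which is often not on PATH. One way to add it for the current session (adjust for your shell):
$ export PATH="$HOME/.local/bin:$PATH"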
Detach from the session with Ctrl+b d and attach again with tmux a -t kimmo.
Extract the data and copy it to the Google Storage bucket bird-clef-kimmo:
$ unzip data/birdclef-2021.zip -d data
$ gsutil -m rsync -r data gs://bird-clef-kimmo/data
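You can verify the copy by listing the bucket contents:
$ gsutil ls gs://bird-clef-kimmo/data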
List and stop instances:
$ gcloud compute instances list
$ gcloud compute instances stop INSTANCE_NAME
Create a user-managed notebook in Vertex AI Workbench.
SSH to the instance with the jupyter username:
$ gcloud compute ssh jupyter@bird-explore
Setup SSH configuration:
$ gcloud compute config-ssh
Switch the user in ~/.ssh/config:
# ~/.ssh/config
Host some-host
    User jupyter
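Note that gcloud compute config-ssh names hosts as INSTANCE.ZONE.PROJECT, so the edited entry might look like this (instance, zone, and project names here are illustrative):
# Edited entry for the notebook instance
Host bird-explore.us-central1-a.my-project
    User jupyter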
Connecting from VS Code using the SSH host should now use jupyter as the user, allowing you to work with files in /home/jupyter and save remotely.
You can also set up port forwarding to localhost with:
$ gcloud compute ssh jupyter@bird-explore -- -N -L 8080:localhost:8080
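Assuming the Jupyter server on the instance listens on port 8080 (the Vertex AI Workbench default), you can then open http://localhost:8080 in a local browser.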
Sign in to Kaggle. Follow the instructions to prepare the ~/.kaggle/kaggle.json file.
See the Data page.
Download the full 39 GiB dataset:
$ kaggle competitions download birdclef-2021 -p data
Download a single file:
$ kaggle competitions download birdclef-2021 -p data/train_short_audio/acafly -f train_short_audio/acafly/XC109605.ogg
List all files in CSV format:
$ kaggle competitions files birdclef-2021 --csv
Download train_metadata.csv:
$ kaggle competitions download birdclef-2021 -p data -f train_metadata.csv
$ unzip data/train_metadata.csv.zip -d data
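As a quick sanity check, you can peek at the extracted metadata with pandas (assuming pandas is installed in your environment):
$ python -c "import pandas as pd; df = pd.read_csv('data/train_metadata.csv'); print(df.shape)"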