lucamarini22 / marine-anomaly-detection

Semantic segmentation of marine anomalies using semi-supervised learning (FixMatch for semantic segmentation) on Sentinel-2 multispectral images.

License: GNU General Public License v3.0

Python 86.71% Jupyter Notebook 12.56% Shell 0.73%
deep-learning deeplearning marine-litter semantic-segmentation semi-supervised-learning weakly-supervised-learning fixmatchseg marida earth-observation sentinel-2

marine-anomaly-detection's Introduction

Report

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage

About the project

The marine-anomaly-detection project provides code to apply state-of-the-art semi-supervised learning techniques to marine anomaly detection, framed as a semantic segmentation problem on satellite imagery of marine regions. The considered anomalies are marine litter (marine debris), ships, clouds, and algae/organic materials.

The code builds on and extends the following two repositories:

  • FixMatch-pytorch, a PyTorch implementation of FixMatch. Compared to the original repository, this repository adapts FixMatch to semantic segmentation tasks and to multispectral images.
  • marine-debris.github.io, which provides the code to work with the MARIDA dataset.
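At its core, FixMatch produces pseudo-labels from predictions on a weakly augmented input and trains the model on a strongly augmented view of the same input; adapted to segmentation, the confidence threshold is applied per pixel. A minimal NumPy sketch of the unlabeled loss term (function and argument names are illustrative, not the repository's API):

```python
import numpy as np

def fixmatch_seg_loss(weak_probs, strong_logits, tau=0.95):
    """Unlabeled-loss sketch of FixMatch adapted to semantic segmentation.

    weak_probs:    (H, W, C) softmax predictions on the weakly augmented patch
    strong_logits: (H, W, C) raw predictions on the strongly augmented patch
    tau:           per-pixel confidence threshold for keeping a pseudo-label
    """
    pseudo_labels = weak_probs.argmax(axis=-1)   # per-pixel hard labels
    confidence = weak_probs.max(axis=-1)         # per-pixel confidence
    mask = confidence >= tau                     # keep confident pixels only
    if not mask.any():
        return 0.0
    # log-softmax over classes for the strongly augmented view
    z = strong_logits - strong_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # cross-entropy of the strong view against the pseudo-labels,
    # averaged over the confident pixels only
    h, w = pseudo_labels.shape
    ce = -log_probs[np.arange(h)[:, None], np.arange(w)[None, :], pseudo_labels]
    return float(ce[mask].mean())
```

The per-pixel mask is the segmentation-specific twist: in classification FixMatch, the threshold keeps or drops whole images, whereas here individual pixels of a patch can contribute to the loss while uncertain ones are ignored.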

Getting Started

Prerequisites

It is recommended to use conda to set up the environment; conda will take care of all requirements for you. For a detailed list of required packages, please refer to the conda environment file.

Installation

  1. Get micromamba, Miniconda, or similar. Micromamba is preferred to Miniconda because it creates the virtual environment faster.
  2. Clone the repo.
    git clone https://github.com/lucamarini22/marine-anomaly-detection.git
  3. Set up and activate the environment. This will create a conda environment called marine-anomaly-detection.
    micromamba env create -f environment.yml
    micromamba activate marine-anomaly-detection
  4. [Optional] Install the local package.
    pip install -e .
    

Set up dataset

To launch the training on MARIDA, it is necessary to download the dataset. The dataset can be downloaded here and has the following structure:

  • patches: folder containing the patches (multispectral images).
  • splits: folder containing split files of the training, validation, and test sets.

The --patches_path and --splits_path arguments in the marineanomalydetection/parse_args_train.py file must point to the patches and splits folders, respectively.
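Since a wrong path only surfaces once training starts, a quick preflight check of the dataset layout can save a failed launch. A minimal sketch (folder names taken from the structure above; the function name is hypothetical, not part of the repository):

```python
import os

def check_marida_layout(patches_path, splits_path):
    """Return the names of the expected MARIDA folders that are missing.

    parse_args_train.py expects --patches_path and --splits_path to point
    at the patches and splits folders; checking up front avoids launching
    a training run that fails on the first data access.
    """
    expected = {"patches": patches_path, "splits": splits_path}
    return [name for name, path in expected.items() if not os.path.isdir(path)]
```

Calling it with your two paths before `wandb agent` starts, and aborting if the returned list is non-empty, is usually enough.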

Usage

Training

  1. Create a Weights & Biases account to keep track of the runs.
  2. Set the values of the hyperparameters in this config.yaml file.
  3. Enter the main folder.
    cd /marine-anomaly-detection/
  4. Create a Sweep to keep track of your training runs.
    wandb sweep --project <project-name> <config-file.yaml>
    
  5. Specify values for any arguments in marineanomalydetection/parse_args_train.py that you did not set in <config-file.yaml>.
  6. Start an agent and execute $NUM training runs.
    wandb agent --count $NUM <your-entity/sweep-demo-cli/sweepID>
    
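The file passed to `wandb sweep` follows the standard W&B sweep schema (`method`, `metric`, `parameters`). As a hypothetical illustration with made-up hyperparameter names (the real ones live in the repository's config.yaml and parse_args_train.py), the same structure can also be expressed as a Python dict, which the wandb API accepts programmatically via `wandb.sweep`:

```python
# Hypothetical sweep configuration; hyperparameter names are illustrative.
sweep_config = {
    "program": "train.py",          # assumed training entry point
    "method": "grid",               # or "random" / "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-4, 1e-3]},
        "batch_size": {"values": [8, 16]},
    },
}
```

With `method: grid`, the agent started in step 6 would walk all four (lr, batch_size) combinations, logging each run under the sweep.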

Evaluation

  1. Evaluate a model.
    python evaluation.py --model_path=<path_to_model>
    
  2. Visualize the predictions of the last evaluated model by running the cells of the notebook Visualize Predictions.ipynb. Set the tile_name variable to see the predictions for the patches of that tile.

marine-anomaly-detection's People

Contributors

  • lucamarini22

marine-anomaly-detection's Issues

One thing to think about in general maybe if you want your code at all to be usable with other datasets. Because it seems to me you are writing it in a way that will make it extremely difficult to ever use any other dataset? 🤔

Originally posted by @gomezzz in #23 (comment)

I would suggest you start using cfg files and just have a debug.cfg etc. Right now it seems you are changing default values in your code all the time. This is very dangerous as you may introduce errors or forget to change things back. What I would suggest is:
  1. Have toml cfg files. One is the default, which you ideally never touch, and you can write a small test that checks that the code still works with it (what I do for this is create a mock dataset of a few images to train for one iteration, just to sanity-check I didn't break my training pipeline). Aside from that, have a "debug.toml" or whatever you want for testing; not all have to be git-tracked either.
  2. When you run a training run, copy the cfg file with your outputs, logs, etc., so you can always replicate what you did.
  3. Use the cfg to define as many of your hard-coded strings and constants as possible (I see some hard-coded paths everywhere).

Originally posted by @gomezzz in #23 (comment)

Some comments here and there. In general, I would try to break up your file structure a bit. If I read your filenames it is pretty hard to understand how things interact, and in most files you have a lot of static functions, so it seems like a file is not something that combines things with one clear purpose but just a bunch of thrown-together functions. I would try to divide by purpose/activity into subfolders such as imageprocessing, io, or datastructure, with dedicated files for specific things.
A few of the things I point out depend a bit on how much you want this to be code for one project that should be comprehensible, but doesn't have to be reused or built on later. If the latter, then you may want to consider using:

Hope it can help ✌️

Originally posted by @gomezzz in #3 (review)
