
marcel-simon / arcticseals


This project forked from microsoft/arcticseals


A deep learning project in cooperation with the NOAA Marine Mammal Lab to detect & classify arctic seals in aerial imagery to understand how they’re adapting to a changing world.

License: MIT License

Batchfile 0.01% C# 0.61% Python 13.77% Dockerfile 0.06% Shell 0.02% Jupyter Notebook 81.40% JavaScript 3.71% CSS 0.02% HTML 0.41%


Arctic Seals Hackathon Project

This is the workspace for the Microsoft 2018 OneWeek Hackathon project Find Arctic Seals with Deep Learning. Other background materials (presentations, etc.) can be found in our Arctic Seals Hackathon Team.

To get write access to this repo, submit a request here.

Data

The data directory contains the following dataset files from NOAA:

  • train.csv (5,256 records): Hotspot detection data for which we have all the corresponding imagery (see below). Currently all of these hotspots refer to images in the ArcticSealsData01 dataset.
  • test.csv (1,368 records): Same format and distribution as train.csv, suitable for cross-validation.

Each record in the CSV files refers to a hotspot that the NOAA thermal detection system picked up and that was classified by a human into either "Animal" (true positive) or "Anomaly" (false positive). Each hotspot is unique (no duplicates). The column schema is as follows:

  • hotspot_id: unique ID
  • timestamp: GMT/UTC timestamp (always corresponds to thermal image timestamp)
  • filt_thermal16: Filename of the 16-bit PNG containing the raw FLIR image data
  • filt_thermal8: Filename of the 8-bit JPG containing the annotated FLIR image data (hotspots circled)
  • filt_color: Filename of the 8-bit JPG containing a color image taken at or near the same time as the thermal image. The timestamp encoded in the filename may be different from the thermal timestamp by up to 60 seconds (but typically less than 1 second).
  • x_pos/y_pos: Location of the hotspot in the thermal image
  • thumb_*: Bounding box of the hotspot in the color image. NOTE: some of these values are negative, as the bounding box is always 512x512 even if the hotspot is at the edge of the image.
  • hotspot_type: "Animal" or "Anomaly"
  • species_id: "Bearded Seal", "Ringed Seal", "UNK Seal", "Polar Bear" or "NA" (for anomalies)
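
As a minimal sketch using only the Python standard library, the records can be loaded with `csv.DictReader`, checked against the no-duplicates guarantee, and tallied by `hotspot_type` and `species_id`. The helper names are hypothetical. `clamp_box` illustrates clipping a 512x512 thumbnail box to the image bounds; the mapping of the `thumb_*` columns onto its left/top/right/bottom arguments, and the color image width and height, are assumptions to check against the actual files.

```python
import csv
from collections import Counter


def load_hotspots(lines):
    """Read hotspot records (train.csv / test.csv format) into a list of dicts.

    `lines` can be a file opened with newline="" or any iterable of CSV lines.
    """
    return list(csv.DictReader(lines))


def summarize(rows):
    """Verify hotspot_id uniqueness, then count hotspot types and animal species."""
    ids = [row["hotspot_id"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate hotspot_id values found")
    types = Counter(row["hotspot_type"] for row in rows)
    # species_id is "NA" for anomalies, so only count it for animals.
    species = Counter(
        row["species_id"] for row in rows if row["hotspot_type"] == "Animal"
    )
    return types, species


def clamp_box(left, top, right, bottom, width, height):
    """Clip a thumbnail bounding box to a width x height image.

    Hypothetical helper: the 512x512 thumb_* box may extend past the image
    edge (e.g. negative coordinates), so each side is clipped to the image.
    CSV values are strings, so convert them to int before calling this.
    """
    return max(0, left), max(0, top), min(width, right), min(height, bottom)
```

For example, `with open("data/train.csv", newline="") as f: types, species = summarize(load_hotspots(f))`, assuming the script runs from the repository root.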

Raw Hotspot Data

In the data directory there is also a raw.csv file (15,454 records) containing all hotspot detections from the NOAA 2016 survey flights (it includes more seals, but also more types of animals, more anomalies, hotspots marked as duplicates, etc.). We do not yet have the imagery for all of these hotspots: we have only about 2.5TB of the 19TB total.

Imagery

The actual image files are located in Azure storage, grouped into datasets each containing thousands of either color or thermal images. You can get these as .tar archives or .vhdx virtual disks; each contains the same data.

  • ArcticSealsData01_Color (88GB): tar vhdx
  • ArcticSealsData02_Color (89GB): tar vhdx
  • ArcticSealsData03_Color (269GB): tar vhdx
  • ArcticSealsData04_Color (648GB): tar vhdx
  • ArcticSealsData05_Color (627GB): tar vhdx
  • ArcticSealsData06_Color (535GB): tar vhdx
  • ArcticSealsData07_Color (219GB): tar vhdx

The thermal data, since it's relatively small, has been combined into fewer files. Note that there is more thermal data than we have corresponding color data for.

  • ArcticSealsData01_Thermal (1GB): tar vhdx
  • ArcticSealsData02-07_Thermal (31GB): tar vhdx
  • ArcticSealsData08-99_Thermal (41GB): tar vhdx
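
Because these archives run to hundreds of gigabytes, one option is to stream through a .tar file once and pull out only the images you need (for example, the `filt_color` filenames listed in train.csv). The sketch below uses the stream mode of Python's `tarfile` module; `extract_selected` is a hypothetical helper, and it assumes filenames are unique within an archive, since it flattens directory paths.

```python
import shutil
import tarfile
from pathlib import Path


def extract_selected(tar_path, wanted_names, dest_dir):
    """Stream through a .tar archive once and extract only members whose base
    filename is in `wanted_names`. Returns the extracted names in archive order."""
    wanted = set(wanted_names)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # "r|*" reads the archive as a forward-only stream (any supported
    # compression), so a very large file is scanned sequentially.
    with tarfile.open(tar_path, mode="r|*") as tar:
        for member in tar:
            name = Path(member.name).name
            if member.isfile() and name in wanted:
                src = tar.extractfile(member)
                # Write under the bare filename only: this flattens directories
                # and avoids writing outside dest_dir via paths like "../x".
                with src, open(dest / name, "wb") as out:
                    shutil.copyfileobj(src, out)
                extracted.append(name)
    return extracted
```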

On Windows, you can mount the .vhdx files on your machine by double-clicking them.

The timestamp embedded in the filenames takes one of two forms; for example, you may see either 160408_020848.724 or 20160408020848.724GMT. In all cases, use the filename-embedded timestamp to sequence and correlate images, not whatever timestamp your file system reports.
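
A sketch of parsing both filename timestamp forms and pairing each thermal image with the nearest color image. It assumes the short form (`160408_020848.724`) is `yymmdd_HHMMSS.mmm` and, like the long `...GMT` form, in UTC. It uses the 60-second maximum offset described for `filt_color` as the matching tolerance. `filename_timestamp` and `match_nearest` are hypothetical names.

```python
import bisect
import re
from datetime import datetime, timedelta, timezone

# Long form, e.g. 20160408020848.724GMT
_LONG = re.compile(r"(\d{14})\.(\d{3})GMT")
# Short form, e.g. 160408_020848.724 (assumed yymmdd_HHMMSS.mmm)
_SHORT = re.compile(r"(?<!\d)(\d{6})_(\d{6})\.(\d{3})(?!\d)")


def filename_timestamp(filename):
    """Extract the embedded timestamp from an image filename as an aware UTC datetime."""
    m = _LONG.search(filename)
    if m:
        base = datetime.strptime(m.group(1), "%Y%m%d%H%M%S")
        millis = int(m.group(2))
    else:
        m = _SHORT.search(filename)
        if m is None:
            raise ValueError(f"no timestamp found in {filename!r}")
        base = datetime.strptime(m.group(1) + m.group(2), "%y%m%d%H%M%S")
        millis = int(m.group(3))
    # Assumption: the short form is UTC as well.
    return base.replace(tzinfo=timezone.utc) + timedelta(milliseconds=millis)


def match_nearest(thermal_files, color_files, tolerance=timedelta(seconds=60)):
    """Map each thermal filename to the color filename with the closest timestamp,
    or to None if the closest one is more than `tolerance` away."""
    colors = sorted((filename_timestamp(f), f) for f in color_files)
    times = [t for t, _ in colors]
    pairs = {}
    for thermal in thermal_files:
        t = filename_timestamp(thermal)
        i = bisect.bisect_left(times, t)
        # The nearest neighbor is either just before or at the insertion point.
        candidates = [colors[j] for j in (i - 1, i) if 0 <= j < len(colors)]
        best = min(candidates, key=lambda c: abs(c[0] - t), default=None)
        pairs[thermal] = best[1] if best and abs(best[0] - t) <= tolerance else None
    return pairs
```

Sorting the color timestamps once and using `bisect` keeps each lookup logarithmic, which matters when a dataset holds thousands of images.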

We also have the ArcticSealsData01 images stored as individual files in Azure storage. However, for any bulk operations it is more efficient to download the tar/vhdx files.

Finally, if you want to use Azure Storage Explorer (for example) to access the entire blob container, use this connection string:

BlobEndpoint=https://arcticseals.blob.core.windows.net/;SharedAccessSignature=sv=2017-11-09&ss=b&srt=sco&sp=rl&se=2019-06-13T07:12:17Z&st=2018-06-13T23:12:17Z&spr=https&sig=2v7zAzhq2cw1%2BWseuNAKiTp5Qc4zzBclw3LqdDnANYg%3D
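
The connection string is a `;`-separated list of `Key=Value` pairs, and the `SharedAccessSignature` value is itself a URL query string. The small standard-library sketch below inspects it. In that token, `sp=rl` grants read and list permissions, and `se` is the expiry time (here 2019-06-13T07:12:17Z), which is worth checking before debugging access errors. The helper names are hypothetical. For actual access, the `azure-storage-blob` package's `BlobServiceClient.from_connection_string` should accept a string in this format.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def parse_connection_string(conn):
    """Split an Azure Storage connection string ("Key=Value;Key=Value") into a dict."""
    parts = {}
    for item in conn.strip().split(";"):
        if item:
            # Split on the first "=" only: SAS values themselves contain "=".
            key, _, value = item.partition("=")
            parts[key] = value
    return parts


def sas_fields(sas):
    """Decode a SAS token's query string into a dict such as {"sp": "rl", ...}."""
    return {k: v[0] for k, v in parse_qs(sas).items()}


def sas_expiry(sas):
    """Return the SAS expiry time ('se') as an aware UTC datetime."""
    se = sas_fields(sas)["se"]
    return datetime.strptime(se, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sas_is_expired(sas, now=None):
    """True if the token's expiry is at or before `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return sas_expiry(sas) <= now
```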

Code

The project is meant to accommodate many different approaches, frameworks, languages, etc. Linux is the primary supported dev environment, though some GUI tools are Windows-only.

Organization

Hackathon members are welcome to add whatever code you like to this repo, but please follow these guidelines:

  • Put your source code in its own directory inside the src directory.
  • Add a README.md file to your code directory explaining what your code does and how it works.
  • If there are dependencies that need to be installed and/or build steps that need to be performed, add any necessary code to the build.bat script to run the relevant package manager commands, compile steps, etc., to ensure your code is fully runnable locally.
    • Alternatively, it is also ok if your code only builds from within an IDE; if so just make a note of that in your README.md.
  • If applicable, add a script that runs your code to the root directory. If it takes command line arguments, please show help text if it is run without arguments.
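
A minimal sketch of the help-text guideline using Python's `argparse`. The script name, description, and `--limit` option are hypothetical placeholders.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(
        prog="run_detector",  # hypothetical script name
        description="Hypothetical example: process hotspots listed in a CSV file.",
    )
    parser.add_argument("csv_path", help="path to a hotspot CSV such as data/train.csv")
    parser.add_argument("--limit", type=int, default=None, help="process at most N rows")
    return parser


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    parser = build_parser()
    if not argv:
        # Guideline: when run without arguments, print help instead of failing.
        parser.print_help()
        return 1
    args = parser.parse_args(argv)
    print(f"would process {args.csv_path} (limit={args.limit})")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```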

Notes

Additional notes from the 10/16/2018 NOAA sync meeting:

  • Filename timestamps with very small millisecond values are actually for the previous second. Metadata for a given image will have the correct timestamp.
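
One way to apply this note when correcting filename-derived timestamps, sketched in Python. The note does not define "very small", so `threshold_ms` is a hypothetical cutoff the caller must choose; when image metadata is available, its timestamp should be preferred.

```python
from datetime import datetime, timedelta


def corrected_timestamp(ts: datetime, threshold_ms: int) -> datetime:
    """Apply the note that filename timestamps with very small millisecond
    values actually belong to the previous second.

    `threshold_ms` is a hypothetical cutoff: the note does not define "very
    small", so choose it yourself and validate against image metadata, which
    carries the correct timestamp.
    """
    if ts.microsecond // 1000 < threshold_ms:
        # Keep the millisecond part, move back one whole second.
        return ts - timedelta(seconds=1)
    return ts
```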

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
