
robosat's People

Contributors

bkowshik, daniel-j-h, devinaconley, dragonemperorg, hy9be, jqtrde, manaswinidas, maning, marsbroshok, mnorelli, ocourtin


robosat's Issues

Generalize tile resolution in tools

We work with 512x512 pixel image tiles at the moment.

  • There is no guarantee this matches the user's image resolution, though.
  • At the same time, no tool is inherently bound to this resolution; everything should work with e.g. 256x256 image tiles, too.

Tasks:

  • Generalize all tools and functions to accept a tile resolution argument the user can pass in (as sketched below)
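A minimal sketch of what such an argument could look like; the --tile-size flag name is hypothetical and not part of the current tools:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--tile-size", type=int, default=512,
                    help="width and height of the image tiles in pixels")
args = parser.parse_args()

# Pass through to dataset loaders and transforms instead of hard-coding 512.
size = (args.tile_size, args.tile_size)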

Remove RandomSubsetSampler

At the moment we have a RandomSubsetSampler which can provide a random subset of the dataset for training and validation. In addition it is currently integrated into the weights and stats tool, and values are in the config.

We should remove this sampler and let users manage their datasets on their own. I no longer see the need for us to provide a random subset sampler for on-the-fly sampling.

Tasks

  • Remove RandomSubsetSampler
  • Remove integration into tools
  • Remove config values for sampler

Save all-background mask in `rs rasterize` for hard-negative mining

At the moment rs rasterize generates masks which always have a feature in them.

When we initially train on these masks and their corresponding images, we don't have background-only images in the training set. The images do contain background pixels around the features, but the model might still produce false positives on background-only tiles.

Once we have a model trained on this initial dataset, we use it to predict on tiles we know don't contain a feature, and feed the false positives back into the dataset (it's important to put them into all splits, not just the training split).

This is called hard-negative mining: we "mine" for model mistakes, put the false positives back into the dataset, re-train, and repeat a couple of times until we have a solid dataset.

Another option is to start with random negatives, which is easier to do, but the dataset can then get pretty big and often contains images which do not really help the model train.

In both cases we need background images. And we need to add them back into the dataset with a corresponding all-background mask. We don't provide such a mask at the moment.

Task

  • Write out an all-background mask for users to do hard-negative mining
  • Document the hard-negative mining process and how rs compare and rs subset help

Workaround:

import numpy as np
from PIL import Image
from robosat.colors import make_palette

# All-background mask: every pixel is class zero (background).
bg = np.zeros(shape=(512, 512), dtype=np.uint8)

# Save as a paletted PNG so it matches the masks rs rasterize writes out.
img = Image.fromarray(bg, mode='P')
img.putpalette(make_palette('denim', 'orange'))
img.save('bg.png', optimize=True)

Provide nvidia-docker Dockerfile for easy GPU environment setup

We should provide a Dockerfile with the latest NVIDIA GPU drivers, the latest CUDA and cuDNN.

These Docker containers can then be started via the nvidia-docker plugin, which abstracts away driver differences on the host and lets users get started without having to work through the painful NVIDIA driver and package setup manually.

The goal here is to provide a self-contained Docker image for users ready to run the toolchain.

Tasks:

  • Write Dockerfile for nvidia-docker plugin
  • Install NVIDIA drivers, CUDA, cuDNN
  • Install RoboSat deps. and make binaries docker run-able
  • Test
  • Write docs
  • Set up automated Docker Hub builds

Benchmark using pinned CUDA memory in data loaders

At the moment the data loaders load images from the dataset, do pre-processing (like normalization), and then convert the images into tensors. Then we copy the data from CPU memory to GPU memory. This can be made more efficient by putting the data into page-locked (pinned) memory and using DMA to transfer it onto the GPU asynchronously.

Look into PyTorch's functionality for pinning memory and for asynchronous, non-blocking data transfers.

Note: the last time we used this we ran into some PyTorch-internal deadlocks. We need to carefully evaluate this, benchmark it, and figure out if it makes sense to go this route.
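For reference, the relevant knobs look roughly like this (a sketch assuming a recent PyTorch; dataset, batch_size, and device are placeholders):

from torch.utils.data import DataLoader

# pin_memory puts batches into page-locked host memory; non_blocking copies let
# the host-to-GPU transfer overlap with computation on the GPU.
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)

for images, masks in loader:
    images = images.to(device, non_blocking=True)
    masks = masks.to(device, non_blocking=True)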

Tasks:

  • Check out docs for cuda semantics
  • Change memory copying behavior
  • Benchmark and test for both training as well as prediction

Clarify that the model toml file is a template

For a beginner, using the provided config parameters as-is can give unexpected results. This happened to me because I was unaware that some of these settings need to be edited. Below are some of my changes.


--- a/config/model-unet.toml
+++ b/config/model-unet.toml
@@ -15,10 +15,10 @@
   batch_size = 2
 
   # Image side size in pixels.
-  image_size = 512
+  image_size = 256
 
   # Directory where to save checkpoints to during training.
-  checkpoint = '/tmp/pth/'
+  checkpoint = '/data/robosat/tmp/pth/'

I propose we make it explicit that these are templates that should be edited before running a training.

Mirror tiles at borders if adjacent tiles are missing

At the moment we predict the tile segmentation probabilities by adding a border to the tile. The idea is to do prediction on the larger images to get masks and then crop out the original mask.

This border is made up of (e.g. 32) pixels from the eight adjacent tiles:

x x x
x o x
x x x

Predicting on tile o means we add a small border band from all x tiles.

There are two cases when adjacent tiles may not be present:

  • hard negative mining on a randomly sampled set of tiles
  • predicting at the border of the dataset (e.g. predicting on multiple tifs)

When adjacent tiles are missing we currently add a black border. This can lead to false predictions. We should instead mirror the image at the border when there are no adjacent tiles. This will reduce or eliminate the tile border problems when tiles are missing.
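A possible approach, sketched with numpy: reflect the tile's own pixels into the border band. In the real implementation only the sides with missing neighbours would be mirrored; available neighbours would still be pasted in as today.

import numpy as np

def mirror_pad(tile, border=32):
    # tile is an HxWxC array; add a reflected border band of `border` pixels on each side.
    return np.pad(tile, ((border, border), (border, border), (0, 0)), mode="reflect")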

Task

  • Mirror the image at the border when an adjacent tile is missing (see the sketch above)

Implement test-time augmentations for `rs predict`

We should implement optional support for test-time augmentation in rs predict.

Here is how it works: when predicting for a tile we not only predict on the tile as-is, but in addition predict e.g. on rotated and flipped versions of it. Then we undo the rotation or flipping on the resulting masks and merge these multiple predictions into one output.

Users can already do this by duplicating the slippy map directory with the tiles to predict on and rotating or flipping each copy. Then they need to run rs predict on all slippy map directories, and finally undo the transformations on the probability masks before using rs masks' support for model ensembles to get masks.

In contrast, implementing test-time augmentation directly in rs predict means transforming each tile on the fly (with our transformations) and undoing the transformations on the fly, too (see the sketch below).
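A minimal sketch with a horizontal flip only, assuming net already outputs per-class probabilities and image is a 1xCxHxW tensor; more transformations (vertical flip, rotations) work the same way:

import torch

def predict_tta(net, image):
    with torch.no_grad():
        probs = net(image)
        flipped = net(torch.flip(image, dims=[3]))   # predict on the flipped tile
        probs += torch.flip(flipped, dims=[3])       # undo the flip on the prediction
    return probs / 2.0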

Tasks

  • Implement optional switch in rs predict for test-time augmentations
  • Predict on tile and transformed tile in rs predict
  • Undo transformations and merge predictions

Implement RetinaNet for Object Detection

I see no reason why we can't add object detection to robosat for specific use-cases.

The pre-processing and post-processing need to be slightly adapted to work with bounding boxes, but otherwise we can probably re-use 90% of what's already there.

This ticket tracks the task of implementing RetinaNet as an object detection architecture.

RetinaNet, because it is a state-of-the-art single-shot object detection architecture that follows our 80/20 philosophy: we favor simplicity and maintainability and focus on the 20% of the causes responsible for 80% of the effects. It's simple, elegant, and on par with the more complex Faster R-CNN in accuracy and runtime.

Here are the three basic ideas; please read the papers for in-depth details:

  • Use a feature pyramid network (FPN) as a backbone. FPNs augment backbones like ResNet by adding top-down and lateral connections (a bit similar to what the U-Net is doing) to handle features at multiple scales.
  • On top of the FPN, build two heads: one for object classification and one for bounding box regression. There are on the order of ~100k candidate bounding boxes.
  • Use focal loss because the ratio between positive and negative bounding boxes is very skewed. Focal loss adapts the standard cross-entropy loss by reducing the loss for easy samples (based on confidence); see the sketch after this list.
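A minimal focal-loss sketch, assuming a recent PyTorch, class logits of shape (N, C), and integer targets of shape (N,):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Standard cross-entropy per sample, without reduction.
    ce = F.cross_entropy(logits, targets, reduction="none")
    # p_t is the model's confidence in the true class; easy samples have p_t close to 1.
    pt = torch.exp(-ce)
    # Down-weight easy samples by (1 - p_t)^gamma and re-balance with alpha.
    return (alpha * (1.0 - pt) ** gamma * ce).mean()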

Figures (from the papers): focal loss, feature pyramid network (FPN), RetinaNet.

Tasks

  • Read the fpn paper
  • Read the focal loss paper
  • Implement FPN
  • Implement RetinaNet
  • Spec out and handle differences in pre and post-processing

Automatically refine generated training dataset masks

At the moment we generate the segmentation masks based on OpenStreetMap geometries in rs rasterize. There is no standard for how finely or coarsely geometries are mapped in OpenStreetMap. Sometimes we get finely detailed masks, sometimes they can be very coarse.

See image/mask pair 95309 for an example of quite a good mask.

We should check if the cv2.floodFill algorithm can help us automatically refine the masks.

It works as follows: start out with a seed pixel in the image and from there grow a region as long as the neighboring pixels are "similar" by color. We probably need to experiment with different color spaces, e.g. converting RGB into HSV and then maybe only using the H channel? The problem I'm seeing here is huge color differences: think cars of different colors, lane markings, parking lot concrete. Needs experimentation.
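A rough experiment sketch, assuming OpenCV; the file name and the seed point are placeholders (the seed would come from a pixel known to lie inside the rasterized OpenStreetMap geometry):

import cv2
import numpy as np

image = cv2.imread("tile.png")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

h, w = hsv.shape[:2]
flood = np.zeros((h + 2, w + 2), dtype=np.uint8)  # floodFill needs a mask with a 1px border

seed = (w // 2, h // 2)
flags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)  # 4-connectivity, write 255 into the mask only
cv2.floodFill(hsv, flood, seed, (255, 255, 255), loDiff=(10, 40, 40), upDiff=(10, 40, 40), flags=flags)

refined = flood[1:-1, 1:-1]  # the grown region, to compare against the rasterized mask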

Tasks:

  • Look into the flood fill algorithm
  • Experiment to see if it can help refine the training dataset masks

Note: this does not depend on parking lots. The same applies e.g. for buildings, roads, etc.

Calculate shape area and add as property to extracted GeoJSON features

In order for users to prioritize the extracted shapes, filter out tiny or huge shapes, and calculate statistics, we should calculate each shape's area in m^2 and add it as a property on the extracted GeoJSON feature. This needs to happen in the rs merge tool, the last step in the pipeline.

Tasks:

  • calculate feature area in rs merge; add property

Note: make sure we calculate the area in an equal-area projection. See for example how we do this in the iou calculation for shapes currently in the rs merge tool.
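A sketch of the area calculation, assuming shapely and pyproj; the feature variable and the exact equal-area projection are placeholders:

import pyproj
from shapely.geometry import shape
from shapely.ops import transform

def area_m2(feature):
    # Reproject from WGS84 lon/lat into a cylindrical equal-area projection before measuring.
    project = pyproj.Transformer.from_crs("EPSG:4326", "+proj=cea", always_xy=True).transform
    return transform(project, shape(feature["geometry"])).area

# e.g. feature["properties"]["area"] = round(area_m2(feature), 1)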

Implement optional random feature sampling for `rs extract`

For features like buildings we want to sample OpenStreetMap when extracting geometries in rs extract.

The osmium handlers in robosat.osm should take a sampler and then, in every OpenStreetMap entity callback, ask the sampler whether they should handle this entity or not.

For the sampler we have a few options:

  1. Let the user pass a number n of samples (e.g. 20k); we take the first n and after that just drop features. Problem: we don't sample randomly from all geographical areas; not a good idea.
  2. Let the user pass a fraction f of samples (e.g. 0.1); in the osm callbacks we take a random number r in [0, 1] and keep the sample if r < f. Problem: users want a fixed number of samples (e.g. 20k), but what a fraction yields changes depending on how many features there are in osm. For example with parking lots a fraction of 0.1 is maybe a few thousand, with buildings it's millions.
  3. Do two passes over the data; in the first pass count how many features there are in osm, then come up with a fraction to keep; in the second pass use approach 2. Problem: needs two passes over the data, and two separate handlers for one feature.
  4. Use an online algorithm for random sampling: reservoir sampling. It's an algorithm for randomly sampling k items out of a stream of unknown size; see the sketch below.
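A minimal sketch of Algorithm R, the classic reservoir sampling algorithm; class and method names are placeholders:

import random

class ReservoirSampler:
    """Randomly keeps up to n items from a stream of unknown size (Algorithm R)."""

    def __init__(self, n):
        self.n = n
        self.seen = 0
        self.samples = []

    def add(self, item):
        self.seen += 1
        if len(self.samples) < self.n:
            self.samples.append(item)
        else:
            # Replace an existing sample with probability n / seen.
            i = random.randrange(self.seen)
            if i < self.n:
                self.samples[i] = item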

Tasks:

  • Implement a ReservoirSampler class; it takes the maximum number n of items to randomly sample from a stream of unknown size.
  • Let our osmium handlers take a ReservoirSampler; in the osm entity callbacks they push features into the reservoir, and in the save function they save the features from the reservoir. The reservoir is responsible for keeping or discarding features, i.e. for doing the sampling.
  • Add an optional argument to the rs extract tool for users to set the sample size; pass this argument to the sampler.

Note: now that we have the rs dedupe tool deduplicating detections against OpenStreetMap, we need to think about how to design the interface here. The dedupe tool currently reads in the OpenStreetMap features created by the extract tool. If we randomly sample features in extract we can no longer use its output for deduplication.

Unable to load model weights: `rs train` should always use DataParallel

The DataParallel wrapper slightly changes the model's internal layer names. In rs predict as well as rs serve we always wrap the model in a DataParallel wrapper, both for CUDA as well as for prediction on CPUs.

In contrast, rs train currently only uses the DataParallel wrapper when using CUDA. This can lead to the following problem: when we train on CPUs, the checkpoints are saved without the DataParallel wrapper; when we then predict (on CUDA or CPUs), loading the model fails with cryptic error messages since the layer names no longer match up.
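A minimal sketch of the proposed change; make_model is a hypothetical constructor standing in for the U-Net setup in rs train:

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = make_model(num_classes=2)   # hypothetical model constructor
net = nn.DataParallel(net)        # wrap unconditionally so checkpoints always carry the "module." prefix
net = net.to(device)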

Task

  • Always wrap model in DataParallel in rs train
  • Test training on CPUs, and then predicting on both CUDA as well as CPUs
  • Test training on GPUs, and then predicting on both CUDA as well as CPUs

Refactor robosat to be a proper library with utility tools on top of it

At the moment we almost have a clean split in place: we have command line utilities in robosat.tools and everything else split off of it. What we should do is make sure all functionality lives in packages and not in robosat.tools: all the command line tools should do is parse command line arguments and then delegate to robosat library functions.

Here are examples where this is currently not the case:

  • The convert tool has the functionality for splitting off masks
  • And the masks tool has the functionality to do the soft-voting

These features should be encapsulated in robosat packages and then only be called from the tools.


Why are we doing this? We want to create a Python robosat package folks can install and re-use. Think:

pip install robosat
from robosat.ensemble import softvote
....

Our tools should depend on this robosat package, but not the other way around.

Errors in `rs extract` when extracting .osm.pbf

1. Download Leipzig.osm.pbf from https://download.bbbike.org/osm/bbbike/Leipzig/
2. Run ./rs extract --type building Leipzig.osm.pbf /data/extracted/
3. I get the following error:

Traceback (most recent call last):
  File "shapely/speedups/_speedups.pyx", line 234, in shapely.speedups._speedups.geos_linearring_from_py
AttributeError: 'list' object has no attribute '__array_interface__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/unexpected/robosat/robosat/tools/__main__.py", line 55, in <module>
    args.func(args)
  File "/home/unexpected/robosat/robosat/tools/extract.py", line 27, in main
    handler.apply_file(filename=args.map, locations=True)
  File "/home/unexpected/robosat/robosat/osm/building.py", line 32, in way
    shape = shapely.geometry.shape(geometry)
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/geo.py", line 35, in shape
    return Polygon(ob["coordinates"][0], ob["coordinates"][1:])
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/polygon.py", line 240, in __init__
    ret = geos_polygon_from_py(shell, holes)
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/polygon.py", line 494, in geos_polygon_from_py
    ret = geos_linearring_from_py(shell)
  File "shapely/speedups/_speedups.pyx", line 321, in shapely.speedups._speedups.geos_linearring_from_py
ValueError: A LinearRing must have at least 3 coordinate tuples

Remove support for resuming from a checkpoint

At the moment we have a resume feature in place for training, where users can resume training from a checkpoint.

parser.add_argument("--resume", type=str, required=False, help="checkpoint to resume training from")

Initially we were thinking about using this feature to speed up subsequent training runs and do fine-tuning.

We actually never used it, it's not properly tested, and we don't have a use-case for it anymore. We should remove the code which comes with this feature and simplify our training script.

Task

  • remove resume feature from rs train

Automatically generate a road training set based on OpenStreetMap

We need a training set with road masks for the areas we have high-resolution satellite imagery for.

Repeat the process in #15 with roads. The tag we care about is highway=* in OpenStreetMap.

For road width, also look into the width and lane tags.

Problems and open questions:

  • Roads in OpenStreetMap are modelled as ways, not polygons. This means we can only guesstimate their width (look into road width and lane tags) to create a mask for them. We can also use the highway classification (think: primary vs. service roads).
  • The road classification does not have to be consistent. A primary road in SF could be tagged secondary in DC. Does it make sense to create our own road classes, e.g. paved / unpaved?
  • Do we want a binary road / not-road classifier, or do we e.g. want to be able to detect link roads? Can we distinguish link roads from the imagery?

Tasks:

  • Check which highway tags we want
  • Check if highway classification is the same across geographies
  • Check which tags to use for road width estimation
  • Implement osmium handler for rs extract
  • Spec out ticket for road post-processing

Warmup epochs with frozen pre-trained encoder weights to initialize decoder

At the moment we are using a pre-trained ResNet as an encoder in our encoder-decoder architecture:

robosat/robosat/unet.py

Lines 94 to 100 in 8b7566e

self.resnet = resnet50(pretrained=pretrained)
self.enc0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu, self.resnet.maxpool)
self.enc1 = self.resnet.layer1 # 256
self.enc2 = self.resnet.layer2 # 512
self.enc3 = self.resnet.layer3 # 1024
self.enc4 = self.resnet.layer4 # 2048

robosat/robosat/unet.py

Lines 123 to 134 in 8b7566e

enc0 = self.enc0(x)
enc1 = self.enc1(enc0)
enc2 = self.enc2(enc1)
enc3 = self.enc3(enc2)
enc4 = self.enc4(enc3)
center = self.center(nn.functional.max_pool2d(enc4, kernel_size=2, stride=2))
dec0 = self.dec0(torch.cat([enc4, center], dim=1))
dec1 = self.dec1(torch.cat([enc3, dec0], dim=1))
dec2 = self.dec2(torch.cat([enc2, dec1], dim=1))
dec3 = self.dec3(torch.cat([enc1, dec2], dim=1))

We are currently training the model as is with all layers unfrozen.

We should investigate if freezing the ResNet encoder and running a few warmup epochs to initialize the decoder layers (then unfreezing parts or all of the ResNet) helps.

Here is how we can freeze the encoder - unfreezing works similarly:

def freeze(self):
    for layer in (self.enc0, self.enc1, self.enc2, self.enc3, self.enc4):
        for param in layer.parameters():
            param.requires_grad = False
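A rough warmup loop, sketched under assumptions: an unfreeze method mirroring freeze above, a hypothetical train_one_epoch helper, and placeholder hyperparameters:

from torch.optim import Adam

net.freeze()
optimizer = Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=1e-4)

for epoch in range(num_epochs):
    if epoch == warmup_epochs:
        net.unfreeze()  # hypothetical counterpart setting requires_grad = True again
        optimizer = Adam(net.parameters(), lr=1e-5)  # lower rate once the encoder trains too
    train_one_epoch(net, optimizer, train_loader)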

Provide ONNX exporter for trained models

At the moment we can train models and get PyTorch-specific .pth checkpoints. We then can load these checkpoints and run prediction. There are two problems with this approach, though:

  1. The checkpoints depend on PyTorch. To predict with these models we need to deploy our Python code with PyTorch and all its dependencies. This can get quite heavy for resource-constrained environments like AWS Lambda. Ideally we want to deploy e.g. a static C++ binary.
  2. The checkpoints in addition depend on the exact Python class for the model. If we change the class even slightly we will not be able to load these models anymore. This also means model checkpoints are bound to specific Python classes: you need both the .pth file and the Python class the checkpoint was trained with for deserialization and prediction. Storing .pth files alone only gets us halfway there.

Here is what we tried already internally and what we should implement here again:

  • Provide a model exporter which traces trained PyTorch models and exports them to ONNX protobuf
  • Provide (or at least document) e.g. Caffe2 shim which loads an ONNX model again and predicts

The ONNX abstraction will allow users to export to a variety of backends e.g. Caffe2 or TensorFlow.
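A minimal export sketch, assuming the UNet class from robosat.unet and a checkpoint that stores a plain state dict (both assumptions, not the exporter's final design):

import torch
from robosat.unet import UNet  # assumed class and module path

net = UNet(num_classes=2)
chkpt = torch.load("checkpoint.pth", map_location="cpu")
net.load_state_dict(chkpt)
net.eval()

# Trace the model with a dummy batch and export the graph as ONNX protobuf.
dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(net, dummy, "model.onnx")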

Pass device to torch.load instead of map_location function

See for context: pytorch/pytorch#7178. We currently pass devices around everywhere except for torch.load, where we have to pass a map_location function argument.

In pytorch/pytorch#7339 this got changed and we can now pass the device to the torch.load function.

Note: 0.4.0 does not yet include these changes; we need to wait for another feature release.
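Once the change lands, the call would look roughly like this (checkpoint path is a placeholder):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
chkpt = torch.load("checkpoint.pth", map_location=device)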

Tasks:

  • Pass device to load function
  • Test and merge in sync with next PyTorch release

Time to download will take a year

I am trying to do building extraction for the state of Utah; the best data available is about 14,000 houses so far (Microsoft building footprint data). Almost all of the houses are on the north side of Salt Lake City, so it's really not a huge total footprint.

Currently it says it's going to take about 333 days to download the corresponding raster data from ./rs download ...

I don't even think I can use a free subscription to download all this data. Is there something else I can do to get the raster data?

BTW, I didn't understand why the cover command needs a zoom; since the building data is small, I just set it to 22:
./rs cover ../building-extraction.geojson --zoom 22 ../cover-tiles.csv

I am sure that's not a problem though...

Any ideas? Is there a program to send in a hard drive for the raster data?

Thanks,
Craig

wrong docs: `rs download` requires the URL

The Readme says for rs download:

Downloads aerial or satellite imagery from the Mapbox Maps API (by default) based on a list of tiles.

But the command requires the URL to be entered, and there is no default as far as I can see.

Validation sample folder documentation

rs train is missing documentation about validation samples. In particular, that it needs folders similar to the ones in the training folder:

  • %base%/validation/images/
  • %base%/validation/labels/

Without these folders (and actual data in them) training crashes after the first epoch with a not-so-obvious message:

Epoch: 1/10
Train: 100%|#################################| 10/10 [07:58<00:00, 47.82s/batch]
Train loss: 0.3844, mean IoU: 0.2952
Validate: 0batch [00:00, ?batch/s]
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/robosat/tools/__main__.py", line 42, in <module>
    args.func(args)
  File "/home/robosat/tools/train.py", line 105, in main
    val_hist = validate(val_loader, num_classes, device, net, criterion)
  File "/home/robosat/tools/train.py", line 186, in validate
    return {'loss': running_loss / num_samples, 'iou': iou.get()}
ZeroDivisionError: division by zero

P.S. A clearer error message would also help.

Support batch-rasterization in `rs rasterize`

When rs rasterize runs, the memory usage of the process grows and grows, and on my machine it is eventually killed by the Linux OOM killer. This is the command I use, which tries to generate 28k mask tiles:

./rs rasterize --dataset ./data/config.toml --zoom 12 ./data/ie-buildings.geojson ./data/bld-cover.csv ./data

Handle instance segmentation by adding second output channel for touching borders

At the moment we don't do instance segmentation. This allows our model to be very simple while still achieving amazing results for our current use-cases. Where it breaks down, though, is use cases like extracting buildings in very densely populated areas.

See for example a quick prototype for Tanzania where the segmentation mask alone cannot distinguish between touching buildings.

And while proper instance segmentation models are much more complicated, there is one trick we can pull off based on what the folks in https://arxiv.org/abs/1806.00844 propose.

  • Add a second channel to the output. The first channel will be the segmentation mask as it is right now. The second channel will represent touching features - and only the border between features.
  • Train with ground truth segmentation masks and compute and rasterize borders where features touch.
  • After prediction, feed the results through the watershed transform to split touching features into separate instances (see the sketch below).

As a result we will get instance segmentation and can distinguish between buildings in the use-case above.
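A rough sketch of the watershed step, assuming scipy and scikit-image and binary numpy arrays for the predicted building mask and border channel:

import numpy as np
from scipy import ndimage
from skimage.morphology import watershed  # lives in skimage.segmentation in newer releases

def split_touching(mask, borders):
    # Seeds: building pixels that are not part of a predicted touching-border.
    seeds = np.logical_and(mask > 0, borders == 0)
    markers, _ = ndimage.label(seeds)
    # Grow the seeds back over the full building mask; the border channel acts as the "elevation".
    return watershed(borders.astype(np.float32), markers, mask=mask > 0)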

Replace U-Net encoder with pre-trained ResNet encoder

At the moment our segmentation model trains a U-Net'ish architecture from scratch. And even though the results are already pretty good, both training and prediction are quite slow.

We should use a pre-trained ResNet for the encoder part and only learn the decoder from scratch.

This will also be a step towards a combined segmentation and object detection model (#12), where we can then implement a feature pyramid network on top of the ResNet encoder and add two heads: one for segmentation, one for bounding box regression. Instance segmentation should then also be possible, similar to the approach in Mask R-CNN.

training is unhappy with uneven number of inputs and target

 % ./rs train --model ./config/model-unet.toml --dataset ./config/dataset-building.toml
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/__main__.py", line 57, in <module>
    args.func(args)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/train.py", line 74, in main
    train_loader, val_loader = get_dataset_loaders(model, dataset)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/train.py", line 199, in get_dataset_loaders
    [os.path.join(path, "training", "images")], os.path.join(path, "training", "labels"), transform
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/datasets.py", line 58, in __init__
    assert len(self.target) == len(self.inputs[0]), "same number of tiles in inputs and target"
AssertionError: same number of tiles in inputs and target

I followed the guide to the letter. I use zoom 19 throughout. I created the dataset with 80% training and 20% validation.

Here is how I subset both images and rasters (masks):
./rs subset ../download-utah ../training.csv ../dataset/training/images
./rs subset ../raster-images ../training.csv ../dataset/training/labels

Do you have any suggestions on what I should tweak?

Thanks,
Craig.

Allow arbitrary number of input channels in ResNet encoder (not only RGB)

With #46 we are changing our model architecture from training the encoder and decoder from scratch to using a pre-trained ResNet for the encoder. The pre-trained ResNet uses three channels (RGB) for the input layer, though.

We need to be able to add arbitrary channels, say, RGB + water mask + elevation + lidar. To do this we need to construct a wrapper module extending the ResNet architecture, copying the pre-trained weights over, and initializing the new channels with zeros. In addition, the channel-wise mean and std dev need to be adapted. See the sketch below.
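A minimal sketch of extending the first convolution, assuming torchvision's resnet50; the helper name is a placeholder:

import torch.nn as nn
from torchvision.models import resnet50

def resnet_with_channels(num_channels, pretrained=True):
    net = resnet50(pretrained=pretrained)
    old = net.conv1
    net.conv1 = nn.Conv2d(num_channels, old.out_channels, kernel_size=7,
                          stride=2, padding=3, bias=False)
    # Copy the pre-trained RGB filters and zero-initialize any extra channels.
    net.conv1.weight.data.zero_()
    net.conv1.weight.data[:, :3] = old.weight.data
    return net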

Tasks

  • Figure out how to extend the ResNet input channels
  • Figure out how to copy over the pre-trained ResNet parts
  • Let users construct a model with arbitrary channels
  • Adapt mean and std dev

`rs rasterize` requires an undocumented `--dataset` option

rs rasterize won't work unless you pass in a --dataset option; neither the option nor its file format is documented. For the record, here's a file I've cobbled together that makes the command stop complaining:

[common]
classes = ["notbuilding", "building"]
colors = ["dark", "white"]

Simplify training: use Adam and remove the need for manual tuning

At the moment we are using stochastic gradient descent and a multi-step weight decay policy.

In this setup the user has to set

  • the initial sgd learning rate
  • the sgd momentum to use
  • the weight decay milestones
  • the weight decay factor

optimizer = SGD(net.parameters(), lr=model["opt"]["lr"], momentum=model["opt"]["momentum"])
scheduler = MultiStepLR(optimizer, milestones=model["opt"]["milestones"], gamma=model["opt"]["gamma"])

And while this allows for great flexibility and control over details, it might be too complicated for our users. We should look into replacing our current setup, e.g. with the Adam optimizer, setting only the initial learning rate and the weight decay.

We can then set these two values to reasonable defaults and users can get started without thinking too much about parameters and without having to run multiple experiments just to get basic parameters figured out.
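A possible replacement, sketched; the "decay" config key is hypothetical and not part of the current model toml:

from torch.optim import Adam

optimizer = Adam(net.parameters(), lr=model["opt"]["lr"], weight_decay=model["opt"]["decay"])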

Tasks

  • Implement Adam optimizer with learning rate and weight decay
  • Benchmark and check results; if it looks reasonable go for it
  • Remove sgd parameters from config; use learning rate and weight decay only

Generalize post-processing handlers across zoom levels

At the moment we have a parking lot handler tuned with thresholds specifically for zoom level 18:

class ParkingHandler:
    kernel_size_denoise = 20
    kernel_size_grow = 20
    simplify_threshold = 0.01

    def __init__(self):
        self.features = []

    def apply(self, tile, mask):
        if tile.z != 18:
            raise NotImplementedError('Parking lot post-processing thresholds are tuned for z18')

        # The post-processing pipeline removes noise and fills in smaller holes. We then
        # extract contours, simplify them and transform tile pixels into coordinates.
        denoised = denoise(mask, self.kernel_size_denoise)
        grown = grow(denoised, self.kernel_size_grow)

        # Contours have a hierarchy: for example an outer ring, and an inner ring for a polygon with a hole.
        #
        # The ith hierarchy entry is a tuple with (next, prev, fst child, parent) for the ith polygon with:
        # - next is the index into the polygons for the next polygon on the same hierarchy level
        # - prev is the index into the polygons for the previous polygon on the same hierarchy level
        # - fst child is the index into the polygons for the ith polygon's first child polygon
        # - parent is the index into the polygons for the ith polygon's single parent polygon
        #
        # In case of non-existent indices their index value is -1.
        multipolygons, hierarchy = contours(grown)

        # In the following we re-construct the hierarchy walking from polygons up to the top-most polygon.
        # We then create a GeoJSON polygon with a single outer ring and potentially multiple inner rings.
        #
        # Note: we currently do not handle multipolygons which are nested even deeper.
        # This seems to be a bug in the OpenCV Python bindings; the C++ interface
        # returns a vector<vec4> but here it's always wrapped in an extra list.
        assert len(hierarchy) == 1, 'always single hierarchy for all polygons in multipolygon'
        hierarchy = hierarchy[0]

        assert len(multipolygons) == len(hierarchy), 'polygons and hierarchy in sync'

        polygons = [simplify(polygon, self.simplify_threshold) for polygon in multipolygons]

        # Todo: generalize and move to features.core
        # All child ids in hierarchy tree, keyed by root id.
        features = collections.defaultdict(set)

        for i, (polygon, node) in enumerate(zip(polygons, hierarchy)):
            if len(polygon) < 3:
                print('Warning: simplified feature no longer valid polygon, skipping', file=sys.stderr)
                continue

            _, _, _, parent_idx = node
            ancestors = list(parents_in_hierarchy(i, hierarchy))

            # Only handles polygons with a nesting of two levels for now => no multipolygons.
            if len(ancestors) > 1:
                print('Warning: polygon ring nesting level too deep, skipping', file=sys.stderr)
                continue

            # A single mapping: i => {i} implies single free-standing polygon, no inner rings.
            # Otherwise: i => {i, j, k, l} implies: outer ring i, inner rings j, k, l.
            root = ancestors[-1] if ancestors else i
            features[root].add(i)

        for outer, inner in features.items():
            rings = [featurize(tile, polygons[outer], mask.shape[:2])]

            # In mapping i => {i, ..} i is not a child.
            children = inner.difference(set([outer]))

            for child in children:
                rings.append(featurize(tile, polygons[child], mask.shape[:2]))

            assert 0 < len(rings), 'at least one outer ring in a polygon'

            geometry = geojson.Polygon(rings)
            shape = shapely.geometry.shape(geometry)

            if shape.is_valid:
                self.features.append(geojson.Feature(geometry=geometry))
            else:
                print('Warning: extracted feature is not valid, skipping', file=sys.stderr)

    def save(self, out):
        collection = geojson.FeatureCollection(self.features)

        with open(out, 'w') as fp:
            geojson.dump(collection, fp)

Instead what we should do is:

  • generalize thresholds and base them on meters
  • compute meters per pixel based on the zoom level

Here is how to compute meters per pixel on a specific zoom level:

https://wiki.openstreetmap.org/wiki/Zoom_levels
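A sketch using the formula from the OSM wiki; threshold_meters, tile.z, and lat are placeholders for values the handler would already have:

import math

def meters_per_pixel(zoom, lat=0.0):
    # Web-mercator ground resolution: ~156543 m/px at zoom 0 on the equator,
    # halved with every zoom level and scaled by cos(latitude).
    return 156543.03392 * math.cos(math.radians(lat)) / (2 ** zoom)

# e.g. kernel_size_px = int(round(threshold_meters / meters_per_pixel(tile.z, lat)))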

Tasks

  • implement function to compute meters per pixel on a zoom level
  • base thresholds on meters

This will allow us to have feature post-processing independent of zoom levels.

`rs download` only downloads first handful of tiles

I am trying to download the tiles for part of Ireland, and rs download only downloads the first ~300 tiles and then stops. When I run it again, it doesn't download anything more. This is the command I used and the output (it just prints the empty progress bar and exits after ~2 seconds):

./rs download --rate 1 --ext jpg "https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=[ACCESTOKEN]" ./data/bld-cover.csv ./data/tiles/
0%| | 0/28637 [00:00<?, ?image/s]

Maybe I have hit a limit in the Mapbox API; if so, rs download should tell the user that.

Implement additional data augmentations for training

In #19 we implemented data augmentations like random rotations and random flipping. In the context of aerial and satellite imagery we should implement additional augmentations:

  • Implement random scaling (image with bicubic, mask with nearest).
  • Implement random color jitter in HSV color space (image only).
  • Implement motion blur (directional, not just gaussian blur). Mostly for drone imagery use-case.
  • Implement stitching artifacts. Simulate with shear (offset of part of the image) and blur on only one side.
  • Implement changes in contrast.

Note: implement contrast jitter as something like an additive offset in -0.2..0.2 and a gamma (x**n) in 0.8..1.2 applied to the luminance channel. That's important because the camera is doing auto-exposure and you can get very different contrast profiles under cloud vs. under sun, or for the same feature in and out of a tree's shadow, etc.
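A minimal sketch of the contrast jitter described in the note, assuming a luminance channel as a numpy array with values in [0, 1]:

import random
import numpy as np

def jitter_contrast(luminance):
    offset = random.uniform(-0.2, 0.2)
    gamma = random.uniform(0.8, 1.2)
    # Shift, clamp back into [0, 1], then apply the gamma curve.
    return np.clip(luminance + offset, 0.0, 1.0) ** gamma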

Up- or downsample probabilities between zoom levels

Right now rs masks is able to combine multiple slippy map directories with probabilities. The idea is that we can have multiple models, and this feature allows us to build model ensembles.

We should add functionality to upsample or downsample slippy map directories with probabilities based on zoom levels.

The use-case is as follows:

  • we train models to detect roads on zoom level 19, very close up with details
  • we train models to detect roads on zoom level 16, further away high-level view
  • we predict and get two slippy maps probs/19/x/y.png, probs/16/x/y.png
  • we want to combine these predictions into masks on a specific zoom level, say z18
  • we need to downsample the z19 predictions and upsample the z16 predictions

We can either extend rs masks to upsample or downsample. Or we add a new tool which can transform a slippy map directory on zoom level z0 into a slippy map directory on zoom level z1, with z0 != z1.

Implementation notes:

  • mercantile has functionality for children and parent tiles
  • pillow has functionality for up- and downsampling (both are used in the sketch below)
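A rough sketch for the downsampling direction, assuming probs is a dict mapping mercantile tiles to PIL probability images of a fixed size:

import mercantile
from PIL import Image

def downsample(tile, probs, size=512):
    # Merge the four z+1 children of `tile` into one probability image at `tile`'s zoom.
    out = Image.new("L", (2 * size, 2 * size))
    for child in mercantile.children(tile):
        dx = (child.x - 2 * tile.x) * size
        dy = (child.y - 2 * tile.y) * size
        out.paste(probs[child], (dx, dy))
    return out.resize((size, size), Image.BILINEAR)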

Implement joint transformations for data augmentation

We need to implement joint transformations for data augmentation, modifying both the image and the mask at the same time. The existing transformations work on a single input only:

http://pytorch.org/docs/master/torchvision/transforms.html

In contrast to these implementations we need to build joint transformations. A joint transformation is a transformation working on image and mask at the same time. The difference to the existing PyTorch transformations is that e.g. when we rotate the image by a random angle we also need to rotate the mask by the exact same angle.

We need at least

  • random rotations
  • random horizontal, vertical flipping

Probably later down the line: color jitter, noise, blur, scaling.
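A minimal sketch of one such joint transformation, assuming PIL images for both image and mask; class and parameter names are placeholders:

import random
from PIL import Image

class JointRandomHorizontalFlip:
    """Flips image and mask together so they stay aligned."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, mask):
        if random.random() < self.p:
            return image.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(Image.FLIP_LEFT_RIGHT)
        return image, mask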

Tasks:

  • Implement joint transformation abstraction
  • Implement specific augmentation transformations

Ignore .DS_Store from the dataset directory

Running the stack locally on a Mac, the dataset directory will contain a .DS_Store file, which throws an error during rs train.

./rs train --model config/model-unet.toml --dataset config/dataset-building.toml
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/maning/projects/git/robosat/robosat/tools/__main__.py", line 42, in <module>
    args.func(args)
  File "/Users/maning/projects/git/robosat/robosat/tools/train.py", line 85, in main
    train_loader, val_loader = get_dataset_loaders(model, dataset)
  File "/Users/maning/projects/git/robosat/robosat/tools/train.py", line 218, in get_dataset_loaders
    target_transform)
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 55, in __init__
    self.inputs = [SlippyMapTiles(inp, fn) for inp, fn in zip(inputs, input_transforms)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 55, in <listcomp>
    self.inputs = [SlippyMapTiles(inp, fn) for inp, fn in zip(inputs, input_transforms)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 26, in __init__
    self.tiles = [(tile, path) for tile, path in tiles_from_slippy_map(root)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 26, in <listcomp>
    self.tiles = [(tile, path) for tile, path in tiles_from_slippy_map(root)]
  File "/Users/maning/projects/git/robosat/robosat/tiles.py", line 77, in tiles_from_slippy_map
    for name in os.listdir(os.path.join(root, z, x)):
NotADirectoryError: [Errno 20] Not a directory: 'dataset/validation/images/21/.DS_Store'
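One possible guard, sketched here (not robosat's actual fix): skip hidden entries such as .DS_Store when walking the slippy map directory tree.

import os

def visible(directory):
    """Lists directory entries, skipping hidden files such as .DS_Store."""
    return [name for name in os.listdir(directory) if not name.startswith(".")]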

Make sure extracted polygons follow the GeoJSON spec's right-hand rule

Since August 2016 there is https://tools.ietf.org/html/rfc7946 with

Note: the [GJ2008] specification did not discuss linear ring winding
order. For backwards compatibility, parsers SHOULD NOT reject
Polygons that do not follow the right-hand rule.

and normative changes section in https://tools.ietf.org/html/rfc7946#appendix-B.1

o Polygon rings MUST follow the right-hand rule for orientation
(counterclockwise external rings, clockwise internal rings).

At the moment it seems like our extracted polygons do not follow this winding order rule.

Quickfix on the consumer side is: https://github.com/mapbox/geojson-rewind
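On the producer side, one option is shapely's orient helper, sketched here under the assumption that the extracted features are simple polygons:

from shapely.geometry import mapping, shape
from shapely.geometry.polygon import orient

def rewind(geometry):
    # Returns the geometry with counterclockwise exterior rings and clockwise
    # interior rings, as RFC 7946 requires.
    return mapping(orient(shape(geometry), sign=1.0))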

Task:

  • make sure rs features respects the right-hand rule when extracting polygon rings

Replace deconvolutional upsampling in decoder with nearest upsampling plus convs

At the moment we upsample with a scaling factor of two in the decoder by using deconvolutions (transposed convolutions). Instead we should use the following simpler approach, which should work just as well and not suffer from the segmentation mask checkerboard problem:

  • upsample in nearest neighbor mode
  • add convolutions after upsampling

We could also switch the order, doing the convolution first on the low-res feature maps, in case we need to save some memory. But I don't think that's a constraint we have right now.
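A minimal sketch of such a decoder block, assuming a recent PyTorch; the class name and channel arguments are placeholders:

import torch.nn as nn

class Upsample(nn.Module):
    """Nearest-neighbour upsampling followed by a convolution, instead of a transposed convolution."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x)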

Explore combination of cross-entropy and dice coefficient loss

Currently we optimize for the cross-entropy loss function in our segmentation model training.

We should try out a combination of cross-entropy and the dice coefficient, e.g.

loss = cross-entropy - ln(dice)

or

loss = w1 * cross-entropy - w2 * (1 - dice)  # e.g. with w1=w2=0.5
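A soft dice loss sketch, assuming foreground probabilities and binary masks of shape NxHxW; the weighting with cross-entropy would follow one of the formulas above:

import torch

def dice_loss(probs, masks, eps=1e-7):
    probs = probs.view(probs.size(0), -1)
    masks = masks.view(masks.size(0), -1).float()
    intersection = (probs * masks).sum(dim=1)
    union = probs.sum(dim=1) + masks.sum(dim=1)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()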

Tasks:

  • Implement dice loss
  • Run out experiments with a combination of cross entropy and dice loss
  • Use combination of cross entropy and dice loss if it improves results

Fetch neighbouring images when downloading

A buffer from neighbouring images is added to the image being used for training (robosat/tiles.py#buffer_tile_image). Maybe rs download should try fetching the images neighbouring the ones in the csv file?
For example: if training set is made of three images (marked as T), maybe neighbouring images (marked as N) should also be downloaded:

.NNN
NNTN
NTTN
NNNN

This would decrease (or in most cases totally eliminate) the need to add "no data" or "mirroring" buffers.
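A sketch of expanding a tile cover this way, assuming mercantile's neighbors helper (available in recent mercantile versions):

import mercantile

def with_neighbors(tiles):
    """Expands a tile cover with the adjacent tiles needed for border buffering."""
    extended = set(tiles)
    for tile in tiles:
        extended.update(mercantile.neighbors(tile))
    return extended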

Move simplification to `rs merge` pipeline step

Currently in rs feature we extract polygons and simplify them. Then in rs merge we buffer, then union, then unbuffer for merging shapes across tile boundaries. This leads to polygons that are no longer simplified.

We should investigate not simplifying in rs feature. Instead merge un-simplified geometries, then simplify in rs merge at the end of the pipeline.

Tasks:

  • Remove simplification in rs feature
  • Implement simplification in rs merge after merging shapes

Implement Feature Pyramid Network for semantic segmentation

Splitting off of #12. Eventually we want to implement an object detection architecture in addition to our current semantic segmentation architecture. The RetinaNet (ticketed in #12) is a perfect fit for our goals. It will be based on top of a ResNet feature extractor and a feature pyramid network.

We can use the ResNet feature extractor and the feature pyramid network already for semantic segmentation. Then later down the line we can extend it adding a bounding box regression head and get a single unified simple architecture for both object detection as well as semantic segmentation.

Resources:

Here is the main gist from the second PDF:

(Figure: feature pyramid network with a semantic segmentation head.)

The ResNet will give us the leftmost upward path. The downward path and the lateral connections then make up the feature pyramid network. The rightmost part is the semantic segmentation head.
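A minimal sketch of one FPN stage (the lateral connection plus the top-down path), assuming a recent PyTorch; class name and channel sizes are placeholders:

import torch.nn as nn
import torch.nn.functional as F

class Lateral(nn.Module):
    """One FPN stage: 1x1-project the encoder feature map and add the upsampled top-down path."""

    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.project = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, bottom_up, top_down):
        return self.project(bottom_up) + F.interpolate(top_down, scale_factor=2, mode="nearest")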

Tasks

  • Implement the feature pyramid network on top of ResNet
  • Implement semantic segmentation head on top of the feature pyramid network

Investigate difference in `rs cover` vs `tile-cover` tile list

When generating lists of tile ids covering geometries we are seeing a difference between the tile-cover and the rs cover generated tile ids. For the same geometries:

wc -l /tmp/tile-cover.tiles
116230 /tmp/tile-cover.tiles

wc -l /tmp/rs-cover.tiles 
115248 /tmp/rs-cover.tiles

Our rs cover tool does not include 982 tiles that the tile-cover tool gives us. We need to look into this.

Tasks:

  • Figure out difference between tile-cover and rs cover
