
robosat's People

Contributors

bkowshik, daniel-j-h, devinaconley, dragonemperorg, hy9be, jqtrde, manaswinidas, maning, marsbroshok, mnorelli, ocourtin


robosat's Issues

Generalize tile resolution in tools

We work with 512x512 pixel image tiles at the moment.

  • There is no guarantee this matches the user's image resolution, though.
  • At the same time, no tool is inherently bound to this resolution; everything should work with e.g. 256x256 image tiles, too.

Tasks:

  • Generalize all tools and functions to accept a tile resolution argument the user can pass in (as sketched below)
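A minimal sketch of what such an argument could look like; the --tile-size flag name is hypothetical and not part of the current tools:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--tile-size", type=int, default=512,
                    help="width and height of the image tiles in pixels")
args = parser.parse_args()

# Pass through to dataset loaders and transforms instead of hard-coding 512.
size = (args.tile_size, args.tile_size)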

Remove RandomSubsetSampler

At the moment we have a RandomSubsetSampler which can provide a random subset of the dataset for training and validation. In addition it is currently integrated into the weights and stats tool, and values are in the config.

We should remove this sampler and let users manage their datasets on their own. I no longer see the need for us to provide a random subset sampler for on-the-fly sampling.

Tasks

  • Remove RandomSubsetSampler
  • Remove integration into tools
  • Remove config values for sampler

Save all-background mask in `rs rasterize` for hard-negative mining

At the moment rs rasterize generates masks which always have a feature in them.

When we initially train on these masks and their corresponding images, we don't have background-only images in the training set. The images do contain background pixels around the features, but the model might still produce false positives on background-only tiles.

Once we have a model trained on this initial dataset, we use it to predict on tiles we know don't contain a feature, and feed the false positives back into the dataset (it's important to put them into all splits, not just the training split).

This is called hard-negative mining: we "mine" for model mistakes, put the false positives back into the dataset, re-train, and repeat a couple of times until we have a solid dataset.

Another option is to start with random negatives, which is easier to do, but the dataset can then get pretty big and often contains images which do not really help the model train.

In both cases we need background images. And we need to add them back into the dataset with a corresponding all-background mask. We don't provide such a mask at the moment.

Task

  • Write out an all-background mask for users to do hard-negative mining
  • Document the hard-negative mining process and how rs compare and rs subset help

Workaround:

import numpy as np
from PIL import Image
from robosat.colors import make_palette

# All-background mask: every pixel is class zero (background).
bg = np.zeros(shape=(512, 512), dtype=np.uint8)

# Save as a paletted PNG so it matches the masks rs rasterize writes out.
img = Image.fromarray(bg, mode='P')
img.putpalette(make_palette('denim', 'orange'))
img.save('bg.png', optimize=True)

Provide nvidia-docker Dockerfile for easy GPU environment setup

We should provide a Dockerfile with the latest NVIDIA GPU drivers, the latest CUDA and cuDNN.

These Docker containers can then be started via the nvidia-docker plugin, which abstracts away driver differences on the host and lets users get started without having to work through the painful NVIDIA driver and package setup manually.

The goal here is to provide a self-contained Docker image for users ready to run the toolchain.

Tasks:

  • Write Dockerfile for nvidia-docker plugin
  • Install NVIDIA drivers, CUDA, cuDNN
  • Install RoboSat deps. and make binaries docker run-able
  • Test
  • Write docs
  • Set up automated Docker Hub builds

Benchmark using pinned CUDA memory in data loaders

At the moment the data loaders load images from the dataset, do pre-processing (like normalization), and then convert the images into tensors. Then we copy the data from CPU memory to GPU memory. This can be made more efficient by putting the data into page-locked (pinned) memory and using DMA to transfer it onto the GPU asynchronously.

Look into PyTorch's functionality for pinning memory and for asynchronous, non-blocking data transfers.

Note: the last time we used this we ran into some PyTorch-internal deadlocks. We need to carefully evaluate this, benchmark it, and figure out if it makes sense to go this route.
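For reference, the relevant knobs look roughly like this (a sketch assuming a recent PyTorch; dataset, batch_size, and device are placeholders):

from torch.utils.data import DataLoader

# pin_memory puts batches into page-locked host memory; non_blocking copies let
# the host-to-GPU transfer overlap with computation on the GPU.
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=4, pin_memory=True)

for images, masks in loader:
    images = images.to(device, non_blocking=True)
    masks = masks.to(device, non_blocking=True)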

Tasks:

  • Check out docs for cuda semantics
  • Change memory copying behavior
  • Benchmark and test for both training as well as prediction

Clarify that the model toml file is a template

For a beginner, using the provided config parameters as-is can give unexpected results. This happened to me because I was unaware that some of these settings need to be edited. Below are some of my changes.


--- a/config/model-unet.toml
+++ b/config/model-unet.toml
@@ -15,10 +15,10 @@
   batch_size = 2
 
   # Image side size in pixels.
-  image_size = 512
+  image_size = 256
 
   # Directory where to save checkpoints to during training.
-  checkpoint = '/tmp/pth/'
+  checkpoint = '/data/robosat/tmp/pth/'

I propose we make it explicit that these are templates that should be edited before running a training.

Mirror tiles at borders if adjacent tiles are missing

At the moment we predict the tile segmentation probabilities by adding a border to the tile. The idea is to do prediction on the larger images to get masks and then crop out the original mask.

This border is made up of (e.g. 32) pixels from the eight adjacent tiles:

x x x
x o x
x x x

Predicting on tile o means we add a small border band from all x tiles.

There are two cases when adjacent tiles may not be present:

  • hard negative mining on a randomly sampled set of tiles
  • predicting at the border of the dataset (e.g. predicting on multiple tifs)

When adjacent tiles are missing we currently add a black border. This can lead to false predictions. We should instead mirror the image at the border when there are no adjacent tiles. This will reduce or eliminate the tile border problems when tiles are missing.
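A possible approach, sketched with numpy: reflect the tile's own pixels into the border band. In the real implementation only the sides with missing neighbours would be mirrored; available neighbours would still be pasted in as today.

import numpy as np

def mirror_pad(tile, border=32):
    # tile is an HxWxC array; add a reflected border band of `border` pixels on each side.
    return np.pad(tile, ((border, border), (border, border), (0, 0)), mode="reflect")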

Task

  • Mirror the image at the border when an adjacent tile is missing (see the sketch above)

Implement test-time augmentations for `rs predict`

We should implement optional support for test-time augmentation in rs predict.

Here is how it works: when predicting for a tile we not only predict on the tile as-is, but in addition predict e.g. on rotated and flipped versions of it. Then we undo the rotation or flipping on the resulting masks and merge these multiple predictions into one output.

Users can already do this by duplicating the slippy map directory with the tiles to predict on and rotating or flipping each copy. Then they need to run rs predict on all slippy map directories, and finally undo the transformations on the probability masks before using rs masks' support for model ensembles to get masks.

In contrast, implementing test-time augmentation directly in rs predict means transforming each tile on the fly (with our transformations) and undoing the transformations on the fly, too (see the sketch below).
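A minimal sketch with a horizontal flip only, assuming net already outputs per-class probabilities and image is a 1xCxHxW tensor; more transformations (vertical flip, rotations) work the same way:

import torch

def predict_tta(net, image):
    with torch.no_grad():
        probs = net(image)
        flipped = net(torch.flip(image, dims=[3]))   # predict on the flipped tile
        probs += torch.flip(flipped, dims=[3])       # undo the flip on the prediction
    return probs / 2.0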

Tasks

  • Implement optional switch in rs predict for test-time augmentations
  • Predict on tile and transformed tile in rs predict
  • Undo transformations and merge predictions

Implement RetinaNet for Object Detection

I see no reason why we can't add object detection to robosat for specific use-cases.

The pre-processing and post-processing need to be slightly adapted to work with bounding boxes, but otherwise we can probably re-use 90% of what's already there.

This ticket tracks the task of implementing RetinaNet as an object detection architecture.

RetinaNet, because it is a state-of-the-art single-shot object detection architecture that follows our 80/20 philosophy: we favor simplicity and maintainability and focus on the 20% of the causes responsible for 80% of the effects. It's simple, elegant, and on par with the more complex Faster R-CNN in accuracy and runtime.

Here are the three basic ideas; please read the papers for in-depth details:

  • Use a feature pyramid network (FPN) as a backbone. FPNs augment backbones like ResNet by adding top-down and lateral connections (a bit similar to what the U-Net is doing) to handle features at multiple scales.
  • On top of the FPN, build two heads: one for object classification and one for bounding box regression. There are on the order of ~100k candidate bounding boxes.
  • Use focal loss because the ratio between positive and negative bounding boxes is very skewed. Focal loss adapts the standard cross-entropy loss by reducing the loss for easy samples (based on confidence); see the sketch after this list.
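A minimal focal-loss sketch, assuming a recent PyTorch, class logits of shape (N, C), and integer targets of shape (N,):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Standard cross-entropy per sample, without reduction.
    ce = F.cross_entropy(logits, targets, reduction="none")
    # p_t is the model's confidence in the true class; easy samples have p_t close to 1.
    pt = torch.exp(-ce)
    # Down-weight easy samples by (1 - p_t)^gamma and re-balance with alpha.
    return (alpha * (1.0 - pt) ** gamma * ce).mean()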

Figures (from the papers): focal loss, feature pyramid network (FPN), RetinaNet.

Tasks

  • Read the fpn paper
  • Read the focal loss paper
  • Implement FPN
  • Implement RetinaNet
  • Spec out and handle differences in pre and post-processing

Automatically refine generated training dataset masks

At the moment we generate the segmentation masks based on OpenStreetMap geometries in rs rasterize. There is no standard for how finely or coarsely geometries are mapped in OpenStreetMap. Sometimes we get finely detailed masks, sometimes they can be very coarse.

See image/mask pair 95309 for an example of quite a good mask.

We should check if the cv2.floodFill algorithm can help us automatically refine the masks.

It works as follows: start out with a seed pixel in the image and from there grow a region as long as the neighboring pixels are "similar" by color. We probably need to experiment with different color spaces, e.g. converting RGB into HSV and then maybe only using the H channel? The problem I'm seeing here is huge color differences: think cars of different colors, lane markings, parking lot concrete. Needs experimentation.
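A rough experiment sketch, assuming OpenCV; the file name and the seed point are placeholders (the seed would come from a pixel known to lie inside the rasterized OpenStreetMap geometry):

import cv2
import numpy as np

image = cv2.imread("tile.png")
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

h, w = hsv.shape[:2]
flood = np.zeros((h + 2, w + 2), dtype=np.uint8)  # floodFill needs a mask with a 1px border

seed = (w // 2, h // 2)
flags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)  # 4-connectivity, write 255 into the mask only
cv2.floodFill(hsv, flood, seed, (255, 255, 255), loDiff=(10, 40, 40), upDiff=(10, 40, 40), flags=flags)

refined = flood[1:-1, 1:-1]  # the grown region, to compare against the rasterized mask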

Tasks:

  • Look into the flood fill algorithm
  • Experiment to see if it can help refine the training dataset masks

Note: this does not depend on parking lots. The same applies e.g. for buildings, roads, etc.

Calculate shape area and add as property to extracted GeoJSON features

In order for users to prioritize the extracted shapes, filter out tiny or huge shapes, and calculate statistics, we should calculate each shape's area in m^2 and add it as a property on the extracted GeoJSON feature. This needs to happen in the rs merge tool, the last step in the pipeline.

Tasks:

  • calculate feature area in rs merge; add property

Note: make sure we calculate the area in an equal-area projection. See for example how we do this in the iou calculation for shapes currently in the rs merge tool.
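A sketch of the area calculation, assuming shapely and pyproj; the feature variable and the exact equal-area projection are placeholders:

import pyproj
from shapely.geometry import shape
from shapely.ops import transform

def area_m2(feature):
    # Reproject from WGS84 lon/lat into a cylindrical equal-area projection before measuring.
    project = pyproj.Transformer.from_crs("EPSG:4326", "+proj=cea", always_xy=True).transform
    return transform(project, shape(feature["geometry"])).area

# e.g. feature["properties"]["area"] = round(area_m2(feature), 1)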

Implement optional random feature sampling for `rs extract`

For features like buildings we want to sample OpenStreetMap when extracting geometries in rs extract.

The osmium handlers in robosat.osm should take a sampler and then, in every OpenStreetMap entity callback, ask the sampler whether they should handle this entity or not.

For the sampler we have a few options:

  1. Let the user pass a number n of samples (e.g. 20k); we take the first n and after that just drop features. Problem: we don't sample randomly from all geographical areas; not a good idea.
  2. Let the user pass a fraction f of samples (e.g. 0.1); in the osm callbacks we take a random number r in [0, 1] and keep the sample if r < f. Problem: users want a fixed number of samples (e.g. 20k), but what a fraction yields changes depending on how many features there are in osm. For example with parking lots a fraction of 0.1 is maybe a few thousand, with buildings it's millions.
  3. Do two passes over the data; in the first pass count how many features there are in osm, then come up with a fraction to keep; in the second pass use approach 2. Problem: needs two passes over the data, and two separate handlers for one feature.
  4. Use an online algorithm for random sampling: reservoir sampling. It's an algorithm for randomly sampling k items out of a stream of unknown size; see the sketch below.
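A minimal sketch of Algorithm R, the classic reservoir sampling algorithm; class and method names are placeholders:

import random

class ReservoirSampler:
    """Randomly keeps up to n items from a stream of unknown size (Algorithm R)."""

    def __init__(self, n):
        self.n = n
        self.seen = 0
        self.samples = []

    def add(self, item):
        self.seen += 1
        if len(self.samples) < self.n:
            self.samples.append(item)
        else:
            # Replace an existing sample with probability n / seen.
            i = random.randrange(self.seen)
            if i < self.n:
                self.samples[i] = item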

Tasks:

  • Implement a ReservoirSampler class; it takes the maximum number n of items to randomly sample from a stream of unknown size.
  • Let our osmium handlers take a ReservoirSampler; in the osm entity callbacks they push features into the reservoir, and in the save function they save the features from the reservoir. The reservoir is responsible for keeping or discarding features, i.e. for doing the sampling.
  • Add an optional argument to the rs extract tool for users to set the sample size; pass this argument to the sampler.

Note: now that we have the rs dedupe tool deduplicating detections against OpenStreetMap, we need to think about how to design the interface here. The dedupe tool currently reads in the OpenStreetMap features created by the extract tool. If we randomly sample features in extract we can no longer use its output for deduplication.

Unable to load model weights: `rs train` should always use DataParallel

The DataParallel wrapper slightly changes the model's internal layer names. In rs predict as well as rs serve we always wrap the model in a DataParallel wrapper, both for CUDA as well as for prediction on CPUs.

In contrast, rs train currently only uses the DataParallel wrapper when using CUDA. This can lead to the following problem: when we train on CPUs, the checkpoints are saved without the DataParallel wrapper; when we then predict (on CUDA or CPUs), loading the model fails with cryptic error messages since the layer names no longer match up.
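A minimal sketch of the proposed change; make_model is a hypothetical constructor standing in for the U-Net setup in rs train:

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = make_model(num_classes=2)   # hypothetical model constructor
net = nn.DataParallel(net)        # wrap unconditionally so checkpoints always carry the "module." prefix
net = net.to(device)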

Task

  • Always wrap model in DataParallel in rs train
  • Test training on CPUs, and then predicting on both CUDA as well as CPUs
  • Test training on GPUs, and then predicting on both CUDA as well as CPUs

Refactor robosat to be a proper library with utility tools on top of it

At the moment we almost have a clean split in place: we have command line utilities in robosat.tools and everything else split off of it. What we should do is make sure all functionality lives in packages and not in robosat.tools: all the command line tools should do is parse command line arguments and then delegate to robosat library functions.

Here are examples where this is currently not the case:

  • The convert tool has the functionality for splitting off masks
  • And the masks tool has the functionality to do the soft-voting

These features should be encapsulated in robosat packages and then only be called from the tools.


Why are we doing this? We want to create a Python robosat package folks can install and re-use. Think:

pip install robosat
from robosat.ensemble import softvote
....

Our tools should depend on this robosat package, but not the other way around.

Errors in `rs extract` when extracting .osm.pbf

1. Download Leipzig.osm.pbf from https://download.bbbike.org/osm/bbbike/Leipzig/
2. Run ./rs extract --type building Leipzig.osm.pbf /data/extracted/
3. I get the following error:

Traceback (most recent call last):
  File "shapely/speedups/_speedups.pyx", line 234, in shapely.speedups._speedups.geos_linearring_from_py
AttributeError: 'list' object has no attribute '__array_interface__'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/unexpected/robosat/robosat/tools/__main__.py", line 55, in <module>
    args.func(args)
  File "/home/unexpected/robosat/robosat/tools/extract.py", line 27, in main
    handler.apply_file(filename=args.map, locations=True)
  File "/home/unexpected/robosat/robosat/osm/building.py", line 32, in way
    shape = shapely.geometry.shape(geometry)
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/geo.py", line 35, in shape
    return Polygon(ob["coordinates"][0], ob["coordinates"][1:])
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/polygon.py", line 240, in __init__
    ret = geos_polygon_from_py(shell, holes)
  File "/home/unexpected/.local/lib/python3.6/site-packages/shapely/geometry/polygon.py", line 494, in geos_polygon_from_py
    ret = geos_linearring_from_py(shell)
  File "shapely/speedups/_speedups.pyx", line 321, in shapely.speedups._speedups.geos_linearring_from_py
ValueError: A LinearRing must have at least 3 coordinate tuples

Remove support for resuming from a checkpoint

At the moment we have a resume feature in place for training, where users can resume training from a checkpoint.

parser.add_argument("--resume", type=str, required=False, help="checkpoint to resume training from")

Initially we were thinking about using this feature to speed up subsequent training runs and do fine-tuning.

We actually never used it, it's not properly tested, and we don't have a use-case for it anymore. We should remove the code which comes with this feature and simplify our training script.

Task

  • remove resume feature from rs train

Automatically generate a road training set based on OpenStreetMap

We need a training set with road masks for the areas we have high-resolution satellite imagery for.

Repeat the process in #15 with roads. The tag we care about is highway=* in OpenStreetMap.

For road width, also look into the width and lane tags.

Problems and open questions:

  • Roads in OpenStreetMap are modelled as ways, not polygons. This means we can only guesstimate their width (look into road width and lane tags) to create a mask for them. We can also use the highway classification (think: primary vs. service roads).
  • The road classification does not have to be consistent. A primary road in SF could be tagged secondary in DC. Does it make sense to create our own road classes, e.g. paved / unpaved?
  • Do we want a binary road / not-road classifier, or do we e.g. want to be able to detect link roads? Can we distinguish link roads from the imagery?

Tasks:

  • Check which highway tags we want
  • Check if highway classification is the same across geographies
  • Check which tags to use for road width estimation
  • Implement osmium handler for rs extract
  • Spec out ticket for road post-processing

Warmup epochs with frozen pre-trained encoder weights to initialize decoder

At the moment we are using a pre-trained ResNet as an encoder in our encoder-decoder architecture:

robosat/robosat/unet.py

Lines 94 to 100 in 8b7566e

self.resnet = resnet50(pretrained=pretrained)
self.enc0 = nn.Sequential(self.resnet.conv1, self.resnet.bn1, self.resnet.relu, self.resnet.maxpool)
self.enc1 = self.resnet.layer1 # 256
self.enc2 = self.resnet.layer2 # 512
self.enc3 = self.resnet.layer3 # 1024
self.enc4 = self.resnet.layer4 # 2048

robosat/robosat/unet.py

Lines 123 to 134 in 8b7566e

enc0 = self.enc0(x)
enc1 = self.enc1(enc0)
enc2 = self.enc2(enc1)
enc3 = self.enc3(enc2)
enc4 = self.enc4(enc3)
center = self.center(nn.functional.max_pool2d(enc4, kernel_size=2, stride=2))
dec0 = self.dec0(torch.cat([enc4, center], dim=1))
dec1 = self.dec1(torch.cat([enc3, dec0], dim=1))
dec2 = self.dec2(torch.cat([enc2, dec1], dim=1))
dec3 = self.dec3(torch.cat([enc1, dec2], dim=1))

We are currently training the model as is with all layers unfrozen.

We should investigate if freezing the ResNet encoder and running a few warmup epochs to initialize the decoder layers (then unfreezing parts or all of the ResNet) helps.

Here is how we can freeze the encoder - unfreezing works similarly:

def freeze(self):
    for layer in (self.enc0, self.enc1, self.enc2, self.enc3, self.enc4):
        for param in layer.parameters():
            param.requires_grad = False
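A rough warmup loop, sketched under assumptions: an unfreeze method mirroring freeze above, a hypothetical train_one_epoch helper, and placeholder hyperparameters:

from torch.optim import Adam

net.freeze()
optimizer = Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=1e-4)

for epoch in range(num_epochs):
    if epoch == warmup_epochs:
        net.unfreeze()  # hypothetical counterpart setting requires_grad = True again
        optimizer = Adam(net.parameters(), lr=1e-5)  # lower rate once the encoder trains too
    train_one_epoch(net, optimizer, train_loader)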

Provide ONNX exporter for trained models

At the moment we can train models and get PyTorch-specific .pth checkpoints. We then can load these checkpoints and run prediction. There are two problems with this approach, though:

  1. The checkpoints depend on PyTorch. To predict with these models we need to deploy our Python code with PyTorch and all its dependencies. This can get quite heavy for resource-constrained environments like AWS Lambda. Ideally we want to deploy e.g. a static C++ binary.
  2. The checkpoints in addition depend on the exact Python class for the model. If we change the class even slightly we will not be able to load these models anymore. This also means model checkpoints are bound to specific Python classes: you need both the .pth file and the Python class the checkpoint was trained with for deserialization and prediction. Storing .pth files alone only gets us halfway there.

Here is what we tried already internally and what we should implement here again:

  • Provide a model exporter which traces trained PyTorch models and exports them to ONNX protobuf
  • Provide (or at least document) e.g. Caffe2 shim which loads an ONNX model again and predicts

The ONNX abstraction will allow users to export to a variety of backends e.g. Caffe2 or TensorFlow.
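A minimal export sketch, assuming the UNet class from robosat.unet and a checkpoint that stores a plain state dict (both assumptions, not the exporter's final design):

import torch
from robosat.unet import UNet  # assumed class and module path

net = UNet(num_classes=2)
chkpt = torch.load("checkpoint.pth", map_location="cpu")
net.load_state_dict(chkpt)
net.eval()

# Trace the model with a dummy batch and export the graph as ONNX protobuf.
dummy = torch.randn(1, 3, 512, 512)
torch.onnx.export(net, dummy, "model.onnx")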

Pass device to torch.load instead of map_location function

See for context: pytorch/pytorch#7178. We currently pass devices around everywhere except for torch.load, where we have to pass a map_location function argument.

In pytorch/pytorch#7339 this got changed and we can now pass the device to the torch.load function.

Note: 0.4.0 does not yet include these changes; we need to wait for another feature release.
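Once the change lands, the call would look roughly like this (checkpoint path is a placeholder):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
chkpt = torch.load("checkpoint.pth", map_location=device)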

Tasks:

  • Pass device to load function
  • Test and merge in sync with next PyTorch release

Time to download will take a year

I am trying to do building extraction for the state of Utah; the best data available is about 14,000 houses so far (Microsoft building footprint data). Almost all of the houses are on the north side of Salt Lake City, so it's really not a huge total footprint.

Currently it says it's going to take about 333 days to download the corresponding raster data from ./rs download ...

I don't even think I can use a free subscription to download all this data. Is there something else I can do to get the raster data?

BTW, I didn't understand why the cover command needs a zoom; since the building data is small, I just set it to 22:
./rs cover ../building-extraction.geojson --zoom 22 ../cover-tiles.csv

I am sure that's not a problem though...

Any ideas? Is there a program to send in a hard drive for the raster data?

Thanks,
Craig

wrong docs: `rs download` requires the URL

The Readme says for rs download:

Downloads aerial or satellite imagery from the Mapbox Maps API (by default) based on a list of tiles.

But the command requires the URL to be entered, and there is no default as far as I can see.

Validation sample folder documentation

rs train is missing documentation about validation samples. In particular, that it needs folders similar to the ones in the training folder:

  • %base%/validation/images/
  • %base%/validation/labels/

Without these folders (and actual data in them) training crashes after the first epoch with a not-so-obvious message:

Epoch: 1/10
Train: 100%|#################################| 10/10 [07:58<00:00, 47.82s/batch]
Train loss: 0.3844, mean IoU: 0.2952
Validate: 0batch [00:00, ?batch/s]
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/robosat/tools/__main__.py", line 42, in <module>
    args.func(args)
  File "/home/robosat/tools/train.py", line 105, in main
    val_hist = validate(val_loader, num_classes, device, net, criterion)
  File "/home/robosat/tools/train.py", line 186, in validate
    return {'loss': running_loss / num_samples, 'iou': iou.get()}
ZeroDivisionError: division by zero

P.S. A clearer error message would also help.

Support batch-rasterization in `rs rasterize`

When rs rasterize runs, the memory usage of the process grows and grows, and on my machine it is eventually killed by the Linux OOM killer. This is the command I use, which tries to generate 28k mask tiles:

./rs rasterize --dataset ./data/config.toml --zoom 12 ./data/ie-buildings.geojson ./data/bld-cover.csv ./data

Handle instance segmentation by adding second output channel for touching borders

At the moment we don't do instance segmentation. This allows our model to be very simple while still achieving amazing results for our current use-cases. Where it breaks down, though, is use cases like extracting buildings in very densely populated areas.

See for example a quick prototype for Tanzania where the segmentation mask alone cannot distinguish between touching buildings.

And while proper instance segmentation models are much more complicated, there is one trick we can pull off based on what the folks in https://arxiv.org/abs/1806.00844 propose.

  • Add a second channel to the output. The first channel will be the segmentation mask as it is right now. The second channel will represent touching features - and only the border between features.
  • Train with ground truth segmentation masks and compute and rasterize borders where features touch.
  • After prediction, feed the results through the watershed transform to split touching features into separate instances (see the sketch below).

As a result we will get instance segmentation and can distinguish between buildings in the use-case above.
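A rough sketch of the watershed step, assuming scipy and scikit-image and binary numpy arrays for the predicted building mask and border channel:

import numpy as np
from scipy import ndimage
from skimage.morphology import watershed  # lives in skimage.segmentation in newer releases

def split_touching(mask, borders):
    # Seeds: building pixels that are not part of a predicted touching-border.
    seeds = np.logical_and(mask > 0, borders == 0)
    markers, _ = ndimage.label(seeds)
    # Grow the seeds back over the full building mask; the border channel acts as the "elevation".
    return watershed(borders.astype(np.float32), markers, mask=mask > 0)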

Replace U-Net encoder with pre-trained ResNet encoder

At the moment our segmentation model trains a U-Net'ish architecture from scratch. And even though the results are already pretty good, both training and prediction are quite slow.

We should use a pre-trained ResNet for the encoder part and only learn the decoder from scratch.

This will also be a step towards a combined segmentation and object detection model (#12), where we can then implement a feature pyramid network on top of the ResNet encoder and add two heads: one for segmentation, one for bounding box regression. Instance segmentation should then also be possible, similar to the approach in Mask R-CNN.

training is unhappy with uneven number of inputs and target

 % ./rs train --model ./config/model-unet.toml --dataset ./config/dataset-building.toml
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/__main__.py", line 57, in <module>
    args.func(args)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/train.py", line 74, in main
    train_loader, val_loader = get_dataset_loaders(model, dataset)
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/tools/train.py", line 199, in get_dataset_loaders
    [os.path.join(path, "training", "images")], os.path.join(path, "training", "labels"), transform
  File "/home/connor/Documents/Regia/building-ccn/robosat-master/robosat/datasets.py", line 58, in __init__
    assert len(self.target) == len(self.inputs[0]), "same number of tiles in inputs and target"
AssertionError: same number of tiles in inputs and target

I followed the guide to the letter. I use zoom 19 throughout. I created the dataset with 80% training and 20% validation.

Here is how I subset both images and rasters (masks):
./rs subset ../download-utah ../training.csv ../dataset/training/images
./rs subset ../raster-images ../training.csv ../dataset/training/labels

Do you have any suggestions on what I should tweak?

Thanks,
Craig.

Allow arbitrary number of input channels in ResNet encoder (not only RGB)

With #46 we are changing our model architecture from training the encoder and decoder from scratch to using a pre-trained ResNet for the encoder. The pre-trained ResNet uses three channels (RGB) for the input layer, though.

We need to be able to add arbitrary channels, say, RGB + water mask + elevation + lidar. To do this we need to construct a wrapper module extending the ResNet architecture, copying the pre-trained weights over, and initializing the new channels with zeros. In addition, the channel-wise mean and std dev need to be adapted. See the sketch below.
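A minimal sketch of extending the first convolution, assuming torchvision's resnet50; the helper name is a placeholder:

import torch.nn as nn
from torchvision.models import resnet50

def resnet_with_channels(num_channels, pretrained=True):
    net = resnet50(pretrained=pretrained)
    old = net.conv1
    net.conv1 = nn.Conv2d(num_channels, old.out_channels, kernel_size=7,
                          stride=2, padding=3, bias=False)
    # Copy the pre-trained RGB filters and zero-initialize any extra channels.
    net.conv1.weight.data.zero_()
    net.conv1.weight.data[:, :3] = old.weight.data
    return net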

Tasks

  • Figure out how to extend the ResNet input channels
  • Figure out how to copy over the pre-trained ResNet parts
  • Let users construct a model with arbitrary channels
  • Adapt mean and std dev

`rs rasterize` requires an undocumented `--dataset` option

rs rasterize won't work unless you pass in a --dataset option; neither the option nor its file format is documented. For the record, here's a file I've cobbled together that makes the command stop complaining:

[common]
classes = ["notbuilding", "building"]
colors = ["dark", "white"]

Simplify training: use Adam and remove the need for manual tuning

At the moment we are using stochastic gradient descent and a multi-step weight decay policy.

In this setup the user has to set

  • the initial sgd learning rate
  • the sgd momentum to use
  • the weight decay milestones
  • the weight decay factor

optimizer = SGD(net.parameters(), lr=model["opt"]["lr"], momentum=model["opt"]["momentum"])
scheduler = MultiStepLR(optimizer, milestones=model["opt"]["milestones"], gamma=model["opt"]["gamma"])

And while this allows for great flexibility and control over details, it might be too complicated for our users. We should look into replacing our current setup, e.g. with the Adam optimizer, setting only the initial learning rate and the weight decay.

We can then set these two values to reasonable defaults and users can get started without thinking too much about parameters and without having to run multiple experiments just to get basic parameters figured out.
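A possible replacement, sketched; the "decay" config key is hypothetical and not part of the current model toml:

from torch.optim import Adam

optimizer = Adam(net.parameters(), lr=model["opt"]["lr"], weight_decay=model["opt"]["decay"])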

Tasks

  • Implement Adam optimizer with learning rate and weight decay
  • Benchmark and check results; if it looks reasonable go for it
  • Remove sgd parameters from config; use learning rate and weight decay only

Generalize post-processing handlers across zoom levels

At the moment we have a parking lot handler tuned with thresholds specifically for zoom level 18:

class ParkingHandler:
    kernel_size_denoise = 20
    kernel_size_grow = 20
    simplify_threshold = 0.01

    def __init__(self):
        self.features = []

    def apply(self, tile, mask):
        if tile.z != 18:
            raise NotImplementedError('Parking lot post-processing thresholds are tuned for z18')

        # The post-processing pipeline removes noise and fills in smaller holes. We then
        # extract contours, simplify them and transform tile pixels into coordinates.
        denoised = denoise(mask, self.kernel_size_denoise)
        grown = grow(denoised, self.kernel_size_grow)

        # Contours have a hierarchy: for example an outer ring, and an inner ring for a polygon with a hole.
        #
        # The ith hierarchy entry is a tuple with (next, prev, fst child, parent) for the ith polygon with:
        # - next is the index into the polygons for the next polygon on the same hierarchy level
        # - prev is the index into the polygons for the previous polygon on the same hierarchy level
        # - fst child is the index into the polygons for the ith polygon's first child polygon
        # - parent is the index into the polygons for the ith polygon's single parent polygon
        #
        # In case of non-existent indices their index value is -1.
        multipolygons, hierarchy = contours(grown)

        # In the following we re-construct the hierarchy walking from polygons up to the top-most polygon.
        # We then create a GeoJSON polygon with a single outer ring and potentially multiple inner rings.
        #
        # Note: we currently do not handle multipolygons which are nested even deeper.
        # This seems to be a bug in the OpenCV Python bindings; the C++ interface
        # returns a vector<vec4> but here it's always wrapped in an extra list.
        assert len(hierarchy) == 1, 'always single hierarchy for all polygons in multipolygon'
        hierarchy = hierarchy[0]

        assert len(multipolygons) == len(hierarchy), 'polygons and hierarchy in sync'

        polygons = [simplify(polygon, self.simplify_threshold) for polygon in multipolygons]

        # Todo: generalize and move to features.core
        # All child ids in hierarchy tree, keyed by root id.
        features = collections.defaultdict(set)

        for i, (polygon, node) in enumerate(zip(polygons, hierarchy)):
            if len(polygon) < 3:
                print('Warning: simplified feature no longer valid polygon, skipping', file=sys.stderr)
                continue

            _, _, _, parent_idx = node
            ancestors = list(parents_in_hierarchy(i, hierarchy))

            # Only handles polygons with a nesting of two levels for now => no multipolygons.
            if len(ancestors) > 1:
                print('Warning: polygon ring nesting level too deep, skipping', file=sys.stderr)
                continue

            # A single mapping: i => {i} implies single free-standing polygon, no inner rings.
            # Otherwise: i => {i, j, k, l} implies: outer ring i, inner rings j, k, l.
            root = ancestors[-1] if ancestors else i
            features[root].add(i)

        for outer, inner in features.items():
            rings = [featurize(tile, polygons[outer], mask.shape[:2])]

            # In mapping i => {i, ..} i is not a child.
            children = inner.difference(set([outer]))

            for child in children:
                rings.append(featurize(tile, polygons[child], mask.shape[:2]))

            assert 0 < len(rings), 'at least one outer ring in a polygon'

            geometry = geojson.Polygon(rings)
            shape = shapely.geometry.shape(geometry)

            if shape.is_valid:
                self.features.append(geojson.Feature(geometry=geometry))
            else:
                print('Warning: extracted feature is not valid, skipping', file=sys.stderr)

    def save(self, out):
        collection = geojson.FeatureCollection(self.features)

        with open(out, 'w') as fp:
            geojson.dump(collection, fp)

Instead what we should do is:

  • generalize thresholds and base them on meters
  • compute meters per pixel based on the zoom level

Here is how to compute meters per pixel on a specific zoom level:

https://wiki.openstreetmap.org/wiki/Zoom_levels
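A sketch using the formula from the OSM wiki; threshold_meters, tile.z, and lat are placeholders for values the handler would already have:

import math

def meters_per_pixel(zoom, lat=0.0):
    # Web-mercator ground resolution: ~156543 m/px at zoom 0 on the equator,
    # halved with every zoom level and scaled by cos(latitude).
    return 156543.03392 * math.cos(math.radians(lat)) / (2 ** zoom)

# e.g. kernel_size_px = int(round(threshold_meters / meters_per_pixel(tile.z, lat)))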

Tasks

  • implement function to compute meters per pixel on a zoom level
  • base thresholds on meters

This will allow us to have feature post-processing independent of zoom levels.

`rs download` only downloads first handful of tiles

I am trying to download the tiles for part of Ireland, and rs download only downloads the first ~300 tiles and then stops. When I run it again, it doesn't download anything more. This is the command I used and the output (it just prints the empty progress bar and exits after ~2 seconds):

./rs download --rate 1 --ext jpg "https://api.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=[ACCESTOKEN]" ./data/bld-cover.csv ./data/tiles/
0%| | 0/28637 [00:00<?, ?image/s]

Maybe I have hit a limit in the Mapbox API; if so, rs download should tell the user that.

Implement additional data augmentations for training

In #19 we implemented data augmentations like random rotations and random flipping. In the context of aerial and satellite imagery we should implement additional augmentations:

  • Implement random scaling (image with bicubic, mask with nearest).
  • Implement random color jitter in HSV color space (image only).
  • Implement motion blur (directional, not just gaussian blur). Mostly for drone imagery use-case.
  • Implement stitching artifacts. Simulate with shear (offset of part of the image) and blur on only one side.
  • Implement changes in contrast.

Note: implement contrast jitter as something like an additive offset in -0.2..0.2 and a gamma (x**n) in 0.8..1.2 applied to the luminance channel. That's important because the camera is doing auto-exposure and you can get very different contrast profiles under cloud vs. under sun, or for the same feature in and out of a tree's shadow, etc.
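A minimal sketch of the contrast jitter described in the note, assuming a luminance channel as a numpy array with values in [0, 1]:

import random
import numpy as np

def jitter_contrast(luminance):
    offset = random.uniform(-0.2, 0.2)
    gamma = random.uniform(0.8, 1.2)
    # Shift, clamp back into [0, 1], then apply the gamma curve.
    return np.clip(luminance + offset, 0.0, 1.0) ** gamma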

Up- or downsample probabilities between zoom levels

Right now rs masks is able to combine multiple slippy map directories with probabilities. The idea is that we can have multiple models, and this feature allows us to build model ensembles.

We should add functionality to upsample or downsample slippy map directories with probabilities based on zoom levels.

The use-case is as follows:

  • we train models to detect roads on zoom level 19, very close up with details
  • we train models to detect roads on zoom level 16, further away high-level view
  • we predict and get two slippy maps probs/19/x/y.png, probs/16/x/y.png
  • we want to combine these predictions into masks on a specific zoom level, say z18
  • we need to downsample the z19 predictions and upsample the z16 predictions

We can either extend rs masks to upsample or downsample. Or we add a new tool which can transform a slippy map directory on zoom level z0 into a slippy map directory on zoom level z1, with z0 != z1.

Implementation notes:

  • mercantile has functionality for children and parent tiles
  • pillow has functionality for up- and downsampling (both are used in the sketch below)
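A rough sketch for the downsampling direction, assuming probs is a dict mapping mercantile tiles to PIL probability images of a fixed size:

import mercantile
from PIL import Image

def downsample(tile, probs, size=512):
    # Merge the four z+1 children of `tile` into one probability image at `tile`'s zoom.
    out = Image.new("L", (2 * size, 2 * size))
    for child in mercantile.children(tile):
        dx = (child.x - 2 * tile.x) * size
        dy = (child.y - 2 * tile.y) * size
        out.paste(probs[child], (dx, dy))
    return out.resize((size, size), Image.BILINEAR)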

Implement joint transformations for data augmentation

We need to implement joint transformations for data augmentation, modifying both the image and the mask at the same time. The existing transformations work on a single input only:

http://pytorch.org/docs/master/torchvision/transforms.html

In contrast to these implementations we need to build joint transformations. A joint transformation is a transformation working on image and mask at the same time. The difference to the existing PyTorch transformations is that e.g. when we rotate the image by a random angle we also need to rotate the mask by the exact same angle.

We need at least

  • random rotations
  • random horizontal, vertical flipping

Probably later down the line: color jitter, noise, blur, scaling.
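A minimal sketch of one such joint transformation, assuming PIL images for both image and mask; class and parameter names are placeholders:

import random
from PIL import Image

class JointRandomHorizontalFlip:
    """Flips image and mask together so they stay aligned."""

    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, image, mask):
        if random.random() < self.p:
            return image.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(Image.FLIP_LEFT_RIGHT)
        return image, mask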

Tasks:

  • Implement joint transformation abstraction
  • Implement specific augmentation transformations

Ignore .DS_Store from the dataset directory

Running the stack locally on a Mac, the dataset directory will contain a .DS_Store file, which throws an error during rs train.

./rs train --model config/model-unet.toml --dataset config/dataset-building.toml
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/maning/projects/git/robosat/robosat/tools/__main__.py", line 42, in <module>
    args.func(args)
  File "/Users/maning/projects/git/robosat/robosat/tools/train.py", line 85, in main
    train_loader, val_loader = get_dataset_loaders(model, dataset)
  File "/Users/maning/projects/git/robosat/robosat/tools/train.py", line 218, in get_dataset_loaders
    target_transform)
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 55, in __init__
    self.inputs = [SlippyMapTiles(inp, fn) for inp, fn in zip(inputs, input_transforms)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 55, in <listcomp>
    self.inputs = [SlippyMapTiles(inp, fn) for inp, fn in zip(inputs, input_transforms)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 26, in __init__
    self.tiles = [(tile, path) for tile, path in tiles_from_slippy_map(root)]
  File "/Users/maning/projects/git/robosat/robosat/datasets.py", line 26, in <listcomp>
    self.tiles = [(tile, path) for tile, path in tiles_from_slippy_map(root)]
  File "/Users/maning/projects/git/robosat/robosat/tiles.py", line 77, in tiles_from_slippy_map
    for name in os.listdir(os.path.join(root, z, x)):
NotADirectoryError: [Errno 20] Not a directory: 'dataset/validation/images/21/.DS_Store'
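One possible guard, sketched here (not robosat's actual fix): skip hidden entries such as .DS_Store when walking the slippy map directory tree.

import os

def visible(directory):
    """Lists directory entries, skipping hidden files such as .DS_Store."""
    return [name for name in os.listdir(directory) if not name.startswith(".")]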

Make sure extracted polygons follow the GeoJSON spec's right-hand rule

Since August 2016 there is https://tools.ietf.org/html/rfc7946 with

Note: the [GJ2008] specification did not discuss linear ring winding
order. For backwards compatibility, parsers SHOULD NOT reject
Polygons that do not follow the right-hand rule.

and normative changes section in https://tools.ietf.org/html/rfc7946#appendix-B.1

o Polygon rings MUST follow the right-hand rule for orientation
(counterclockwise external rings, clockwise internal rings).

At the moment it seems like our extracted polygons do not follow this winding order rule.

Quickfix on the consumer side is: https://github.com/mapbox/geojson-rewind
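On the producer side, one option is shapely's orient helper, sketched here under the assumption that the extracted features are simple polygons:

from shapely.geometry import mapping, shape
from shapely.geometry.polygon import orient

def rewind(geometry):
    # Returns the geometry with counterclockwise exterior rings and clockwise
    # interior rings, as RFC 7946 requires.
    return mapping(orient(shape(geometry), sign=1.0))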

Task:

  • make sure rs features respects the right-hand rule when extracting polygon rings

Replace deconvolutional upsampling in decoder with nearest upsampling plus convs

At the moment we upsample with a scaling factor of two in the decoder by using deconvolutions (transposed convolutions). Instead we should use the following simpler approach, which should work just as well and not suffer from the segmentation mask checkerboard problem:

  • upsample in nearest neighbor mode
  • add convolutions after upsampling

We could also switch the order, doing the convolution first on the low-res feature maps, in case we need to save some memory. But I don't think that's a constraint we have right now.
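A minimal sketch of such a decoder block, assuming a recent PyTorch; the class name and channel arguments are placeholders:

import torch.nn as nn

class Upsample(nn.Module):
    """Nearest-neighbour upsampling followed by a convolution, instead of a transposed convolution."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x)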

Explore combination of cross-entropy and dice coefficient loss

Currently we optimize for the cross-entropy loss function in our segmentation model training.

We should try out a combination of cross-entropy and the dice coefficient, e.g.

loss = cross-entropy - ln(dice)

or

loss = w1 * cross-entropy - w2 * (1 - dice)  # e.g. with w1=w2=0.5
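A soft dice loss sketch, assuming foreground probabilities and binary masks of shape NxHxW; the weighting with cross-entropy would follow one of the formulas above:

import torch

def dice_loss(probs, masks, eps=1e-7):
    probs = probs.view(probs.size(0), -1)
    masks = masks.view(masks.size(0), -1).float()
    intersection = (probs * masks).sum(dim=1)
    union = probs.sum(dim=1) + masks.sum(dim=1)
    dice = (2.0 * intersection + eps) / (union + eps)
    return 1.0 - dice.mean()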

Tasks:

  • Implement dice loss
  • Run out experiments with a combination of cross entropy and dice loss
  • Use combination of cross entropy and dice loss if it improves results

Fetch neighbouring images when downloading

A buffer from neighbouring images is added to the image being used for training (robosat/tiles.py#buffer_tile_image). Maybe rs download should try fetching the images neighbouring the ones in the csv file?
For example: if training set is made of three images (marked as T), maybe neighbouring images (marked as N) should also be downloaded:

.NNN
NNTN
NTTN
NNNN

This would decrease (or in most cases totally eliminate) the need to add "no data" or "mirroring" buffers.
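A sketch of expanding a tile cover this way, assuming mercantile's neighbors helper (available in recent mercantile versions):

import mercantile

def with_neighbors(tiles):
    """Expands a tile cover with the adjacent tiles needed for border buffering."""
    extended = set(tiles)
    for tile in tiles:
        extended.update(mercantile.neighbors(tile))
    return extended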

Move simplification to `rs merge` pipeline step

Currently in rs feature we extract polygons and simplify them. Then in rs merge we buffer, then union, then unbuffer for merging shapes across tile boundaries. This leads to polygons that are no longer simplified.

We should investigate not simplifying in rs feature. Instead merge un-simplified geometries, then simplify in rs merge at the end of the pipeline.

Tasks:

  • Remove simplification in rs feature
  • Implement simplification in rs merge after merging shapes

Implement Feature Pyramid Network for semantic segmentation

Splitting off of #12. Eventually we want to implement an object detection architecture in addition to our current semantic segmentation architecture. The RetinaNet (ticketed in #12) is a perfect fit for our goals. It will be based on top of a ResNet feature extractor and a feature pyramid network.

We can use the ResNet feature extractor and the feature pyramid network already for semantic segmentation. Then later down the line we can extend it adding a bounding box regression head and get a single unified simple architecture for both object detection as well as semantic segmentation.

Resources:

Here is the main gist from the second PDF:

(Figure: feature pyramid network with a semantic segmentation head.)

The ResNet will give us the leftmost upward path. The downward path and the lateral connections then make up the feature pyramid network. The rightmost part is the semantic segmentation head.
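A minimal sketch of one FPN stage (the lateral connection plus the top-down path), assuming a recent PyTorch; class name and channel sizes are placeholders:

import torch.nn as nn
import torch.nn.functional as F

class Lateral(nn.Module):
    """One FPN stage: 1x1-project the encoder feature map and add the upsampled top-down path."""

    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.project = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, bottom_up, top_down):
        return self.project(bottom_up) + F.interpolate(top_down, scale_factor=2, mode="nearest")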

Tasks

  • Implement the feature pyramid network on top of ResNet
  • Implement semantic segmentation head on top of the feature pyramid network

Investigate difference in `rs cover` vs `tile-cover` tile list

When generating lists of tile ids covering geometries we are seeing a difference between the tile-cover and the rs cover generated tile ids. For the same geometries:

wc -l /tmp/tile-cover.tiles
116230 /tmp/tile-cover.tiles

wc -l /tmp/rs-cover.tiles 
115248 /tmp/rs-cover.tiles

Our rs cover tool does not include 982 tiles that the tile-cover tool gives us. We need to look into this.

Tasks:

  • Figure out difference between tile-cover and rs cover
