developmentseed / label-maker

Data Preparation for Satellite Machine Learning

Home Page: http://devseed.com/label-maker/

License: MIT License

Python 99.85% Shell 0.15%

satellite-imagery data-preparation deep-learning computer-vision remote-sensing keras

label-maker's Issues

Object detection example: tfexample_decoder has no attribute 'BackupHandler'

@Geoyi Thank you for the great tutorial! I was able to follow through until the very last step to train the object detection model. Before I ran into the error, I had noticed two minor differences from your tutorial:

I only got 196 image tiles for the Mexico example instead of 385 image tiles as stated in your tutorial. I did try on two different computers, both of which got the same 196 image tiles.
The original folder name in the zip file ssd_inception_v2_coco is ssd_inception_v2_coco_2017_11_17, so I had to change the folder name to ssd_inception_v2_coco. You might want to note this in your tutorial.

I went through the Object Detection Model Setup without any problem. However, when I started to train the model, I got an error AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler', which is similar to the issue reported here. I tried the suggested workaround to download the tfexample_decoder.py and replace my local tfexample_decoder.py, but it did not work. It seems the solution is to upgrade to tensorflow v1.4 or a nightly build, which requires cuda 9.0. However, I am not yet ready to upgrade cuda. Any advice will be appreciated! See below for more information about the errors.

System information

What is the top-level directory of the model you are using:

~/tensorflow/models/research/object_detection

OS Platform and Distribution: Linux Ubuntu 16.04, Elementary OS

TensorFlow installed from (source or binary): pip install tensorflow-gpu

TensorFlow version: v1.3.0

Python version: v3.6.3

CUDA/cuDNN version: CUDA 8.0, cudnn 6.0, nVidia driver: 384.111

GPU: nVidia GeForce GTX 1050 Ti 4GB memory

CPU: Intel x86-64 Intel Core i7-4790K @ 3.60GHz x 8, 32GB memory

Exact command to reproduce:

python train.py --logtostderr \
             --train_dir=training/ \
             --pipeline_config_path=training/ssd_inception_v2_coco.config

Source code / logs:

(py36) qiusheng@wu-office:/media/hdd/Data/developmentseed/models/research/object_detection$ python train.py --logtostderr              --train_dir=training/              --pipeline_config_path=training/ssd_inception_v2_coco.config
Traceback (most recent call last):
  File "train.py", line 169, in <module>
    tf.app.run()
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "train.py", line 165, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 235, in train
    train_config.prefetch_queue_capacity, data_augmentation_options)
  File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 59, in create_input_queue
    tensor_dict = create_tensor_dict_fn()
  File "train.py", line 122, in get_next
    worker_index=FLAGS.task)).get_next()
  File "/media/hdd/Data/developmentseed/models/research/object_detection/builders/dataset_builder.py", line 140, in build
    label_map_proto_file=label_map_proto_file)
  File "/media/hdd/Data/developmentseed/models/research/object_detection/data_decoders/tf_example_decoder.py", line 153, in __init__
    label_handler = slim_example_decoder.BackupHandler(
AttributeError: module 'tensorflow.contrib.slim.python.slim.data.tfexample_decoder' has no attribute 'BackupHandler'

Tried the suggested workaround to replace the local tfexample_decoder.py and got the following error:

(py36) qiusheng@wu-office:/media/hdd/Data/developmentseed/models/research/object_detection$ python train.py --logtostderr              --train_dir=training/              --pipeline_config_path=training/ssd_inception_v2_coco.config
Traceback (most recent call last):
  File "train.py", line 49, in <module>
    from object_detection import trainer
  File "/media/hdd/Data/developmentseed/models/research/object_detection/trainer.py", line 32, in <module>
    from object_detection.utils import variables_helper
  File "/media/hdd/Data/developmentseed/models/research/object_detection/utils/variables_helper.py", line 23, in <module>
    slim = tf.contrib.slim
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/lazy_loader.py", line 53, in __getattr__
    module = self._load()
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/util/lazy_loader.py", line 42, in _load
    module = importlib.import_module(self.__name__)
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/__init__.py", line 60, in <module>
    from tensorflow.contrib import slim
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/slim/__init__.py", line 44, in <module>
    from tensorflow.contrib.slim.python.slim.data import tfexample_decoder
  File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/data/tfexample_decoder.py", line 7
    <!DOCTYPE html>
    ^
SyntaxError: invalid syntax
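
The `<!DOCTYPE html>` line in the second traceback suggests that the replacement file was saved from a GitHub web page rather than from the raw file URL. That is an inference from the traceback, not something the reporter confirmed. A minimal sketch of a sanity check to run on a downloaded file before overwriting an installed module follows; `looks_like_html` is a hypothetical helper, not part of TensorFlow or label-maker.

```python
def looks_like_html(source_text):
    """Return True if the text starts like an HTML page rather than Python source."""
    # Leading blank lines are skipped: the traceback reports the doctype on line 7.
    stripped = source_text.lstrip().lower()
    return stripped.startswith("<!doctype html") or stripped.startswith("<html")


# Illustrative inputs, not the real file contents.
html_page = "\n\n\n<!DOCTYPE html>\n<html lang=\"en\">"
python_file = "# Copyright notice\nimport abc\n"

print(looks_like_html(html_page))
print(looks_like_html(python_file))
```

If the check reports HTML, downloading the file again through the "Raw" view of the repository is the likely fix.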

Is resnet.py intended to work with the example configuration?

Running through the README verbatim and using the togo configuration successfully produces data.npz, but running the resnet.py example with that data produces this error:

~/p/maps〉python3 resnet.py
Using TensorFlow backend.
x_train shape: (8, 256, 256, 3)
8 train samples
3 test samples
class_weight {0: 1, 1: 0.0}
2018-04-13 10:58:18.219381: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "resnet.py", line 83, in <module>
    workers=4)
  File "/usr/local/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 2160, in fit_generator
    val_x, val_y, val_sample_weight)
  File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 1480, in _standardize_user_data
    exception_prefix='target')
  File "/usr/local/lib/python3.6/site-packages/keras/engine/training.py", line 123, in _standardize_input_data
    str(data_shape))
ValueError: Error when checking target: expected fc1000 to have shape (2,) but got array with shape (3,)

I'm not sure whether resnet.py is intended to work with the togo configuration. If not, is there a configuration that produces data that works with resnet.py?

Handle human-generated non-OSM labels

I work on Raster Vision at Azavea and this library seems to handle some of the same functionality. We create object detection training chips from GeoTIFFs and human-generated GeoJSON label files, and I'm thinking about how to extend label-maker to do this. Do you think this would be in the scope of label-maker, or is the focus just on generating labels from OSM data? Handling human-generated label files would make the library more general and usable with pre-existing benchmark datasets, for instance. I saw that there's an issue (#13) for handling GeoTIFFs instead of TMS URLs, which would be step one. The next step would be handling GeoJSON labels, or GeoTIFFs in the case of segmentation. Or maybe the right approach is to convert our data into vector tiles which this library already handles.

About label-maker for building detection in AWS

Hallo there,

I was able to run the building detection demo on my local CPU, but the procedure takes a very long time. Is it possible to run label-maker together with the TensorFlow Object Detection API on AWS? Do you have any suggestions or a rough idea of how I could do that?

Regards

Downloading fails with large QA Tile files

Running label-maker download to fetch the mbtiles for united_states_of_america causes an Invalid argument error, as follows:

File "/Users/xxx/xxxx/label-maker/label_maker/download.py", line 32, in download_mbtiles
    w.write(r.read())
OSError: [Errno 22] Invalid argument

This happens after the mbtiles file appears to have finished downloading.
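
The traceback points at `w.write(r.read())`, which reads the whole download into memory and writes it in one call. On some platforms a single write of more than about 2 GB has been reported to fail with Errno 22; whether that is the cause here is an assumption. A minimal sketch of streaming the response in chunks instead, where `copy_in_chunks` and `CHUNK_SIZE` are hypothetical names and in-memory buffers stand in for the HTTP response and the output file:

```python
import io
import shutil

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB per write; an arbitrary illustrative value


def copy_in_chunks(response, out_file, chunk_size=CHUNK_SIZE):
    """Stream a file-like response to out_file without holding it all in memory."""
    shutil.copyfileobj(response, out_file, chunk_size)


# Stand-ins for the HTTP response and the destination file.
fake_response = io.BytesIO(b"x" * 1000)
destination = io.BytesIO()
copy_in_chunks(fake_response, destination, chunk_size=64)
print(len(destination.getvalue()))
```

Chunked copying also keeps memory use flat, which matters for country-sized QA Tile files.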

add mxnet and sagemaker image classification use case

I'm trying to build a quick image classification example with MXNet using label-maker.

create image class(es) to classify.
- what class are we interested in classifying? @drewbo, @wronk any priority here?
- understand how we usually generate class config file and fetch data from OSM;
generate training and test dataset for image classification from label-maker.
pick an image classification neural net from mxnet;
- a quick look/pick of image classification from mxnet incubator;
- a quick assessment of the net that we end up picking
train the model
- on my local machine;
- on a cloud machine;
- document the process and give a valid assessment;
a final write-up about:
- Internal: how well does this data prep repository work and what can we do to improve it
- External: here's how to use this repo with MXNet (cost, training time, accuracy)
use Amazon SageMaker to train the mx-lenet;
- start an Amazon p2 instance to train the building classifier example;
- format the code mx-lenet.py to run with SageMaker;
- Document the training process;

Error when running "pip install label_maker"

rasterio/_base.c(522): fatal error C1083: Cannot open include file: 'cpl_conv.h': No such file or directory

Preview should show background class for classification problems

Handling objects that straddle tiles

Given my understanding of how label-maker works, tile extents are static (and based on the mbtile files), and with ml_type=='object-detection', objects that straddle tile boundaries will be split up in the training data. Ideally, the tile bounds would be generated dynamically so that it could generate tiles that contain the entire object, which should help the model learn better. However, fixing this would probably complicate the implementation (which I think is elegant, btw), and might not be worth it assuming there aren't many objects that straddle tiles, or if you want to be able to detect partial, clipped objects. I was just wondering if this is something you've considered.

Filter out cloudy tiles

Using the Vietnam example ("bounding_box": [105.42,20.75,106.41,21.53]), I noticed that there are approximately 200 tiles (out of 2296 downloaded tiles) with very high cloud cover. At least 100 tiles appear to be completely white (100% cloud cover) and have the smallest file sizes (1.7 kb - 3.3 kb). Most image tiles without cloud cover are larger than 5.0 kb. Using these high cloud cover image tiles for training could reduce classification accuracy. I know this problem is caused by the Mapbox satellite images, but would it be possible for label-maker to identify these high cloud cover image tiles and exclude them from training?
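
The file-size observation above can be turned into a rough pre-filter. The sketch below is not cloud detection and not a label-maker feature: it only splits tiles by size, using a hypothetical cutoff (`MIN_TILE_BYTES`) taken from the roughly 5 kb figure in the report. The demo writes two fake tiles to a temporary directory.

```python
import os
import tempfile

MIN_TILE_BYTES = 5000  # hypothetical cutoff based on the ~5 kb figure above


def split_tiles_by_size(tile_dir, min_bytes=MIN_TILE_BYTES):
    """Return (kept, suspect) lists of .jpg paths, split by file size."""
    kept, suspect = [], []
    for name in sorted(os.listdir(tile_dir)):
        if not name.endswith(".jpg"):
            continue
        path = os.path.join(tile_dir, name)
        if os.path.getsize(path) >= min_bytes:
            kept.append(path)
        else:
            suspect.append(path)
    return kept, suspect


# Demo with two fake tiles: one tiny (cloud-like), one larger.
demo_dir = tempfile.mkdtemp()
for name, size in [("1-1-17.jpg", 2000), ("1-2-17.jpg", 8000)]:
    with open(os.path.join(demo_dir, name), "wb") as f:
        f.write(b"\0" * size)

kept, suspect = split_tiles_by_size(demo_dir)
print(len(kept), len(suspect))
```

Uniform non-cloud tiles such as open water also compress to small files, so the suspect list is best reviewed rather than deleted automatically.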

BTW, I trained the Vietnam example on my 64-bit Linux machine (i7-4790 CPU, 32 GB RAM, NVIDIA EVGA GeForce GTX 1050 Ti GPU) using the resnet.py script. The training took approximately 76 minutes. The test accuracy is 83.22%.

Thank you for developing such a great tool! It makes my life much easier!

Include a pre-trained model example?

Hi folks!

Over at @observablehq we've been having fun using TensorFlow.js to run lightweight ML models in-browser, like in this recent example.

I'd love to tinker with the same idea with a map-trained model, and wonder if the project might include a TensorFlow SavedModel or Keras pre-trained model as a starting point, or as part of the detailed walk-throughs. Would this be in-scope of the project, as a way to play around with the model's activations without needing to setup & train it from scratch?

Move documentation and examples to external docs site

label-maker has grown a lot in terms of options and examples, and all the new content has made it difficult to parse the README. We should move the docs and examples to one of the autogenerated docs platforms so it's better organized.

We first need to decide on a platform. Python offers some guidance here and the ReadTheDocs platform is also pretty popular. Any other good options?

preview fails with: missing 1 required positional argument: 'imagery_offset'

I am running label-maker version 0.3.1.

When I run label-maker preview -n 3, the following error is thrown:

Traceback (most recent call last):
  File "/opt/conda/bin/label-maker", line 11, in <module>
    sys.exit(cli())
  File "/opt/conda/lib/python3.6/site-packages/label_maker/main.py", line 87, in cli
    preview(dest_folder=dest_folder, number=number, **config)
TypeError: preview() missing 1 required positional argument: 'imagery_offset'

I don't see anything in the README or the cli help mentioning imagery_offset in regard to the preview command.

Some initial digging shows that preview() does require imagery_offset as a positional argument. I'm not sure why that isn't being populated.
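
The error pattern can be reproduced without label-maker. When a config dictionary is unpacked with `**config` and lacks a key that the function requires, Python raises exactly this kind of TypeError. The two functions below are hypothetical stand-ins, not the real `preview()` signature; the second shows one possible remedy, a default value for the optional setting.

```python
def preview_strict(dest_folder, number, classes, imagery, imagery_offset):
    """Stand-in with imagery_offset as a required argument."""
    return imagery_offset


def preview_lenient(dest_folder, number, classes, imagery, imagery_offset=None):
    """Same stand-in, but the offset defaults to None when the config omits it."""
    return imagery_offset


# A config without the optional key, as a user might write it (placeholder URL).
config = {"classes": [], "imagery": "http://example.com/{z}/{x}/{y}.jpg"}

try:
    preview_strict(dest_folder="data", number=3, **config)
    strict_error = None
except TypeError as err:
    strict_error = str(err)

lenient_result = preview_lenient(dest_folder="data", number=3, **config)
print(strict_error)
print(lenient_result)
```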

preview creates n+1 tiles

label-maker preview -n 3 generates 4 example tiles for each class, not 3.

probable fix:
preview.py line 60
if n > number: -> if n >= number
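
The off-by-one can be illustrated with a small loop. This is a hypothetical reconstruction of the counting logic, not the actual contents of preview.py: it assumes the counter is incremented before the comparison, which is the arrangement under which `>` yields one extra tile.

```python
def take_examples(tiles, number, fixed):
    """Collect preview tiles, stopping on the chosen comparison."""
    picked = []
    n = 0
    for tile in tiles:
        picked.append(tile)
        n += 1
        # Reported behaviour uses ">", the proposed fix uses ">=".
        stop = (n >= number) if fixed else (n > number)
        if stop:
            break
    return picked


tiles = ["a", "b", "c", "d", "e"]
buggy = take_examples(tiles, 3, fixed=False)
patched = take_examples(tiles, 3, fixed=True)
print(len(buggy), len(patched))
```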

Object detection prints the incorrect label summary

Determining labels for each tile
---
Buildings: 0 features in 0 tiles
Total tiles: 462

Support multiple training areas

Expand either the country or bounding_box property of config.json (or both) to allow for specifying multiple areas to pull training data from

`python setup.py install` causes problems

When I install the cloned package using
python setup.py install
there were issues importing new submodules I was working on (i.e., python couldn't find them).

This happened as I was trying to create label_maker/utils/plot_utils.py even though I put an empty __init__.py file in place and uninstalled/reinstalled the package. Without any changes to the code, pip install -e . worked just fine.

Error installing six 1.4.1

On my computer, I use Python 2.7. I've installed Python 3 using brew and set up alias python=python3 in my ~/.bashrc. However, the install throws the following error:

vir $ pip install label-maker
Collecting label-maker
Collecting mapbox-vector-tile==1.2.0 (from label-maker)
Collecting olefile==0.44 (from label-maker)
Collecting Pillow==4.3.0 (from label-maker)
  Using cached Pillow-4.3.0-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting numpy==1.13.3 (from label-maker)
  Using cached numpy-1.13.3-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting pyclipper==1.0.6 (from label-maker)
  Using cached pyclipper-1.0.6-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting Shapely==1.6.3 (from label-maker)
  Using cached Shapely-1.6.3-cp27-cp27m-macosx_10_9_intel.macosx_10_9_x86_64.whl
Collecting pycurl==7.43.0.1 (from label-maker)
Collecting rasterio==1.0a12 (from label-maker)
  Using cached rasterio-1.0a12-cp27-cp27m-macosx_10_9_intel.macosx_10_9_x86_64.whl
Collecting six==1.10.0 (from label-maker)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting mercantile==1.0.0 (from label-maker)
  Using cached mercantile-1.0.0-py2-none-any.whl
Collecting pyproj==1.9.5.1 (from label-maker)
Requirement already satisfied: geojson==2.3.0 in /Library/Python/2.7/site-packages (from label-maker)
Requirement already satisfied: click==6.7 in /Library/Python/2.7/site-packages (from label-maker)
Requirement already satisfied: Cerberus==1.1 in /Library/Python/2.7/site-packages (from label-maker)
Collecting tilepie==0.2.1 (from label-maker)
  Using cached tilepie-0.2.1-py2.py3-none-any.whl
Collecting protobuf==3.5.0.post1 (from label-maker)
  Using cached protobuf-3.5.0.post1-py2.py3-none-any.whl
Collecting requests==2.11.0 (from label-maker)
  Using cached requests-2.11.0-py2.py3-none-any.whl
Requirement already satisfied: humanize==0.5.1 in /Library/Python/2.7/site-packages (from label-maker)
Collecting mbutil==0.3.0 (from label-maker)
  Using cached mbutil-0.3.0-py2.py3-none-any.whl
Collecting homura==0.1.5 (from label-maker)
Requirement already satisfied: setuptools in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from mapbox-vector-tile==1.2.0->label-maker)
Collecting future (from mapbox-vector-tile==1.2.0->label-maker)
Requirement already satisfied: affine in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: snuggs>=1.4.1 in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: enum34 in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Collecting attrs (from rasterio==1.0a12->label-maker)
  Using cached attrs-17.4.0-py2.py3-none-any.whl
Requirement already satisfied: click-plugins in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: cligj in /Library/Python/2.7/site-packages (from rasterio==1.0a12->label-maker)
Requirement already satisfied: certifi in /Library/Python/2.7/site-packages (from homura==0.1.5->label-maker)
Requirement already satisfied: pyparsing in /System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python (from snuggs>=1.4.1->rasterio==1.0a12->label-maker)
Installing collected packages: six, protobuf, future, Shapely, pyclipper, mapbox-vector-tile, olefile, Pillow, numpy, pycurl, attrs, rasterio, mercantile, pyproj, tilepie, requests, mbutil, homura, label-maker
  Found existing installation: six 1.4.1
    DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/basecommand.py", line 215, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/commands/install.py", line 342, in run
    prefix=options.prefix_path,
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_set.py", line 778, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_install.py", line 754, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/req/req_uninstall.py", line 115, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip-9.0.1-py2.7.egg/pip/utils/__init__.py", line 267, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/var/folders/6v/__4yg2p950382cycwndc2cf80000gn/T/pip-23SqDP-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'

Don't save full labels.npz for sparse datasets

For very large areas with sparse data, it can be computationally burdensome to save this enormous file, much of which may be "thrown out" when doing the eventual training (if it is background). I'd like to add a --sparse option, mimicking the existing behaviour for images, that saves only a handful of non-class (background) tiles.
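
A minimal sketch of what such a filter could do, under stated assumptions: labels are classification-style vectors keyed by tile id, with index 0 acting as the background flag. `sparse_filter` and `n_background` are hypothetical names, not label-maker's implementation.

```python
import random


def sparse_filter(tile_labels, n_background, seed=0):
    """Keep every class tile plus at most n_background background tiles.

    tile_labels maps a tile id to a class vector whose index 0 is 1 for
    background tiles and 0 otherwise (an assumed format).
    """
    kept = {k: v for k, v in tile_labels.items() if v[0] == 0}
    background_ids = sorted(k for k, v in tile_labels.items() if v[0] == 1)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    sample_size = min(n_background, len(background_ids))
    for tile_id in rng.sample(background_ids, sample_size):
        kept[tile_id] = tile_labels[tile_id]
    return kept


labels = {
    "1-1-17": [0, 1],  # contains the class of interest
    "1-2-17": [1, 0],  # background
    "1-3-17": [1, 0],  # background
    "1-4-17": [1, 0],  # background
}
sparse = sparse_filter(labels, n_background=1)
print(sorted(sparse))
```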

Support GeoTIFF reading

Add support for reading from local GeoTIFF (rather than a tiled imagery endpoint)
Add support for reading from remote Cloud Optimized GeoTIFF (reading each tile as a window into the tif)

Color blind safe colors for object-detection/segmentation labels

In object detection and segmentation, we have to color bounding boxes and color fills once we get beyond binary labels. We should be colorblind-friendly when choosing those colors.

Options so far:

Colorbrewer2, which had a previous python implementation called brewer2mpl and is now Palettable
Optimized color palette by Paul Tol
Tableau 10 color palette that is color blind friendly and was recently added to matplotlib. Hex vals are available in the PR.

The most complete solution would be to find/write a function that takes number of colors as input, and spits back an idealized list of RGB/Hex vals. This might be overkill though -- a simpler solution would be to cycle through a fixed set of colors that are known to be color blind friendly (like Tableau 10 or something analogous). The latter solution is also nice because we won't add a dependency.
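
The simpler option, cycling through a fixed palette, needs only the standard library. The sketch below uses the ten hex values of the Tableau colorblind palette as shipped in matplotlib's tableau-colorblind10 style; they should be checked against the PR mentioned above before being relied on. `class_colors` is a hypothetical helper name.

```python
from itertools import cycle, islice

# Tableau colorblind 10 hex values (as in matplotlib's tableau-colorblind10
# style); verify against the referenced PR before use.
COLORBLIND_SAFE = [
    "#006BA4", "#FF800E", "#ABABAB", "#595959", "#5F9ED1",
    "#C85200", "#898989", "#A2C8EC", "#FFBC79", "#CFCFCF",
]


def class_colors(n_classes, palette=COLORBLIND_SAFE):
    """Return n_classes hex colors, repeating the palette when it runs out."""
    return list(islice(cycle(palette), n_classes))


colors = class_colors(12)
print(colors[0], colors[10])
```

With more classes than palette entries the colors repeat, which is the trade-off of avoiding a dependency.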

OSError: cannot identify image file <_io.BytesIO object at 0x7fc87f2ec678>

I'm following the Example Use: A building detector with TensorFlow API, and ran into an error when trying to generate the training/eval data.

System information

What is the top-level directory of the model you are using:
~/tensorflow/models/research/object_detection

OS Platform and Distribution: Linux Ubuntu 16.04, Elementary OS

TensorFlow installed from (source or binary): pip install tensorflow-gpu

TensorFlow version: v1.5.0

Python version: v3.6.0

CUDA/cuDNN version: CUDA 9.0, cudnn 7.0, nVidia driver: 384.111

GPU: nVidia GeForce GTX 1070 Ti 8GB memory

CPU: Intel x86-64 Intel Core i5-6400K @ 2.70GHz x 4, 16GB memory

Exact command to reproduce:

python tf_record_generation.py --label_input=labels.npz              --train_rd_path=data/train_buildings.record              --test_rd_path=data/test_buildings.record

Sample code / logs:

python tf_record_generation.py --label_input=labels.npz              --train_rd_path=data/train_buildings.record              --test_rd_path=data/test_buildings.record
You have 221 training tiles and 144 test tiles ready
Traceback (most recent call last):
  File "tf_record_generation.py", line 172, in <module>
    tf.app.run()
  File "/home/xban/.pyenv/versions/general/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "tf_record_generation.py", line 143, in main
    tf_example = create_tf_example(group, train_dir)
  File "tf_record_generation.py", line 56, in create_tf_example
    image = Image.open(encoded_jpg_io)
  File "/home/xban/.pyenv/versions/general/lib/python3.6/site-packages/PIL/Image.py", line 2572, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7fc87f2ec678>
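
Pillow raises this error when the bytes it receives are not a recognizable image. One possible cause (an assumption, since the report does not show the tile contents) is a tile file that is empty or holds an error response instead of a JPEG. A minimal sketch of checking the leading magic bytes before handing data to `Image.open`; `sniff_image_type` is a hypothetical helper.

```python
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_type(data):
    """Return 'jpeg', 'png', or None based on the leading magic bytes."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None


# Illustrative byte strings, not real tiles.
jpeg_like = b"\xff\xd8\xff\xe0" + b"\0" * 16
error_body = b'{"message": "error"}'

print(sniff_image_type(jpeg_like))
print(sniff_image_type(error_body))
print(sniff_image_type(b""))
```

Running such a check over the tiles folder would show which files to download again before generating the TFRecords.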

Store intermediate results as pickle

I was wondering what the thinking was behind choosing np.save/np.load instead of pickle. I was preparing data for a fairly small region and ended up with 156250 tiles in total, of which about 50000 are in one of my classes. Running label-maker images took a long time, so I started investigating. After switching from .npz to a pickle, it starts downloading tiles more or less immediately. Before, I had to wait a very long time (minutes or more?) before it reached the point where it starts downloading.

If this sounds like something you'd be open to I'd contribute a PR.

Make rasterization improvements

The current rasterization code for segmentation labels doesn't support any additional options. For starters, it would be nice if there were an option to buffer Points and LineStrings so that they had a width greater than 1 pixel.

Proposed syntax would look like this in config.json:

"classes": [
    { "name": "Roads", "filter": ["has", "highway"], "buffer": 3 }
]

where 3 is the number of pixels to buffer the geometry by before rasterizing.

Create example with non-API restricted imagery

Create an example using non-API restricted imagery so users can start using the tool more easily:

Test reorganization

Create a better folder structure to separate unit and validation tests, use unittest library for both

Brainstorming replacing QA-Tiles

I need to rework my https://github.com/jremillard/images-to-osm project to use Mapbox tiles. The problem that label-maker is attempting to solve is right at the center of the planned rework. I just wanted to communicate what label-maker would look like if it was a perfect fit for my needs.

The input data (training ) to label-maker should be a set of geojson files. There is a rich and mature existing infrastructure of generating them from OSM and other data sources. They are easy to write code against in any language. Let other tools deal with it.

Label maker config would be

output zoom level OR a metric output (.5 m/pixel).
output image size for the training network (say 800x800), not an even tile boundary.
data augmentation options (center object, randomly slide object around, up/down, left/right flips, % scale change, edge buffer zone, allow clipped features, etc).
How many sample images to make.
training/validation split %.
Sat image TMS URL (someday support Bing when they can change the license).
Max sat image cache size, directory, also need max age of sat image cache (mapbox is 30 days).
% of images to create that are negative samples (no objects in them).

The final output would be intermediate files (training, and validating), not the training images.

When the network is training, the intermediate files can be opened up, and single images can be generated on the fly from a python module. The python module would handle either fetching and forming the training images or getting them from the sat image cache. It would stitch the sat images together, crop them correctly, and output bounding boxes, segmentation masks, and instance masks. The one image at a time would allow data sets that don't fit into memory to be used, keep performance good, and not violate sat image caching licensing restrictions.

If you want to be really nice to people, have an option to write out MS COCO files, since basically everyone is using that data set right now for benchmarking.

Sagemaker 'NoneType object' issue with data in 'walkthrough-classification-mxnet-sagemaker' example

I've been following the walkthrough found here (albeit with a smaller bounding box), and have initiated a SageMaker Notebook instance. The data.npz file is sitting in the sagemaker folder, and I have no problem reading it when running the relevant sections of mx_lenet_sagemaker.py in a new notebook on the instance. However, when I run the second cell of SageMaker_mx-lenet, I hit the following error:

ValueError: Error training sagemaker-mxnet-2018-07-08-18-12-13-217: Failed Reason: AlgorithmError: uncaught exception during training: 'NoneType' object has no attribute 'read'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/mxnet_container/train.py", line 191, in train
    model = user_module.train(**kwargs_to_pass)
  File "/opt/ml/code/mx_lenet_sagemaker.py", line 92, in train
    train_iter, val_iter = prep_data(data_path)
  File "/opt/ml/code/mx_lenet_sagemaker.py", line 14, in prep_data
    data = np.load(find_file(data_path, 'data.npz'))
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 402, in load
    magic = fid.read(N)
AttributeError: 'NoneType' object has no attribute 'read'

After several hours trying different fixes I'm having little to no luck debugging, but was hoping you could check the example to ensure it runs fine when you attempt it?
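
The traceback shows `np.load` receiving the result of `find_file(data_path, 'data.npz')`, and the failure on `fid.read` is consistent with that result being `None`, meaning the file was not found under the training data path inside the container. The real `find_file` is not shown in the report, so the version below is a hypothetical sketch (written for Python 3) that fails early with a clear message instead of returning `None`.

```python
import os
import tempfile


def find_file(root, file_name):
    """Return the first path under root whose basename is file_name.

    Raises FileNotFoundError instead of silently returning None.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        if file_name in filenames:
            return os.path.join(dirpath, file_name)
    raise FileNotFoundError("{} not found under {}".format(file_name, root))


# Demo: a temporary tree with data.npz one level down.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "training"))
open(os.path.join(demo_root, "training", "data.npz"), "wb").close()

found = find_file(demo_root, "data.npz")
print(os.path.relpath(found, demo_root))
```

If the early failure fires, the next thing to check is whether data.npz was actually uploaded to the location the training job reads from.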

Problem when creating TFRecords for model training

Hello there,

I got a NotFoundError when I tried to create train_buildings.record, as shown below:

hd_hao@DESKTOP-T5OM5NG:/mnt/d/DeepVGI/models/research/object_detection$ python tf_records_generation.py --label_input=labels.npz
You have 211 training tiles and 157 test tiles ready
Traceback (most recent call last):
  File "tf_records_generation.py", line 175, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "tf_records_generation.py", line 142, in main
    writer = tf.python_io.TFRecordWriter(FLAGS.train_rd_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/tf_record.py", line 111, in __init__
    compat.as_bytes(path), compat.as_bytes(compression_type), status)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory

Do you have any suggestions about this? I am running this on Ubuntu 16.04 with Python 3.5.

Hao
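
The command in this report passes only --label_input, whereas the command in the "cannot identify image file" report on this page also passes --train_rd_path=data/train_buildings.record and --test_rd_path=data/test_buildings.record. A NotFoundError from `TFRecordWriter` usually means the output path is empty or its directory does not exist; that this is the cause here is an assumption. A minimal sketch of creating the output directory first, with `ensure_parent_dir` as a hypothetical helper:

```python
import os
import tempfile


def ensure_parent_dir(path):
    """Create the directory that will contain path, if it is missing."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    return parent


# Demo in a temporary location; the walkthrough writes to data/*.record.
base = tempfile.mkdtemp()
record_path = os.path.join(base, "data", "train_buildings.record")
parent = ensure_parent_dir(record_path)
print(os.path.isdir(parent))
```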

Speed up tile downloads

Right now, tile downloads happen in series/synchronously. This is quite slow for larger data sets. We should take advantage of improved Python 3.6 asyncio support or an existing library to parallelize the downloads.
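
One way to parallelize without restructuring the code around asyncio is a thread pool from the standard library, which suits I/O-bound work such as HTTP requests. This swaps in threads for the asyncio approach suggested above. `download_all` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for a real function that would request and save one tile.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(tile_ids, fetch, max_workers=8):
    """Run fetch(tile_id) for every tile on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, tile_ids))


def fake_fetch(tile_id):
    """Stand-in for an HTTP request; a real version would download and save the tile."""
    return "{}.jpg".format(tile_id)


results = download_all(["1-1-17", "1-2-17", "1-3-17"], fake_fetch)
print(results)
```

The worker count should respect any rate limits of the imagery provider.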

Sparse flag doesn't work for segmentation

The test for whether a tile is background is whether it matches the zero/background class:

class_match(ml_type, tile_results[k], 0)

For classification and object detection this works fine; however for segmentation it will return true for every tile that isn't completely non-background
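
A minimal sketch of a background test that treats segmentation separately, requiring every pixel to be background. The label formats are assumptions made for illustration (a 2D grid of class indices for segmentation, a list of boxes for object detection, a class vector with background at index 0 for classification), and `is_background` is a hypothetical function, not label-maker's `class_match`.

```python
def is_background(ml_type, label):
    """Return True only when a tile contains no non-background labels."""
    if ml_type == "segmentation":
        # label is a 2D grid of class indices; 0 means background.
        return all(value == 0 for row in label for value in row)
    if ml_type == "object-detection":
        # label is a list of bounding boxes; an empty list means background.
        return len(label) == 0
    # classification: label is a class vector with background at index 0.
    return label[0] == 1


empty_mask = [[0, 0], [0, 0]]
partial_mask = [[0, 2], [0, 0]]
print(is_background("segmentation", empty_mask))
print(is_background("segmentation", partial_mask))
```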

Don't render objects outside of the tile

When creating segmentation labels, for each matching OSM feature in a tile, we convert the coordinates to pixel coordinates and then rasterize. Unfortunately, because some coordinates are slightly outside of the tile (although the feature is partially in), when we clamp the coordinates to our tile pixel bounds, we create some odd visual artifacts

About the precision of example (walkthrough-tensorflow-object-detection)

Hallo there,
I tried the example "A building detector with TensorFlow API" for Mexico City, keeping all the default settings for the bounding box and the training model. I trained my model and got test results. My question is about this statement on the blog page: "There are 227 buildings in the test dataset, and 191 buildings are predicted correctly by the model (84%)."
Does this "84%" refer to precision? If not, what does the number mean? I trained my own model for around 20000 steps and selected one of the checkpoints in the "training" folder, but got a precision of only around 20%. How should we select the best checkpoint during training? Or do we just monitor the loss and pick a checkpoint manually?

Best Regards,

Hao

errors occur when packaging for the two examples (Tanzania and Vietnam)

I ran into problems when testing the two building classification examples (Tanzania and Vietnam) on my Linux machine. All label-maker commands (i.e., download, labels, preview, images) worked fine except the label-maker package command.

For the Vietnam example, the label-maker images command did download 2296 tiles. However, when using the label-maker package command, I got the following errors. I checked the filenames of the tiles shown in the error list below and found that they are not among the 2296 tiles downloaded. I had similar issues with the Tanzania example.

Any help will be appreciated!

************************************* errors *******************************************
(py36) qiusheng@office:/tmp$ label-maker images --dest Vietnam_building --config Vietnam.json
Downloading 2296 tiles to Vietnam_building/tiles
(py36) qiusheng@office:/tmp$ label-maker package --dest Vietnam_building --config Vietnam.json
Couldn't open Vietnam_building/tiles/103935-57783-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104132-57663-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103998-57519-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104007-57744-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103990-57746-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103973-57637-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104061-57529-17.jpg, skipping
Couldn't open Vietnam_building/tiles/103977-57720-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104009-57794-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104224-57747-17.jpg, skipping
......
Couldn't open Vietnam_building/tiles/104194-57708-17.jpg, skipping
Couldn't open Vietnam_building/tiles/104265-57705-17.jpg, skipping
Traceback (most recent call last):
File "/home/qiusheng/anaconda3/envs/py36/bin/label-maker", line 11, in <module>
sys.exit(cli())
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/label_maker/main.py", line 87, in cli
package_directory(dest_folder=dest_folder, **config)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/label_maker/package.py", line 61, in package_directory
img = Image.open(image_file)
File "/home/qiusheng/anaconda3/envs/py36/lib/python3.6/site-packages/PIL/Image.py", line 2572, in open
% (filename if filename else fp))
OSError: cannot identify image file 'Vietnam_building/tiles/104173-57771-17.jpg'
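One way to narrow this down is to check whether the files that fail to open are real JPEGs. The sketch below only inspects the first three bytes of each file for the JPEG signature. It assumes, without confirmation from the log above, that some downloads may have saved a non-image response (for example an error message) under a .jpg name. `find_invalid_jpegs` is a hypothetical helper, not a label-maker function.

```python
from pathlib import Path

# Every JPEG file starts with the SOI marker (FF D8) followed by another marker (FF ..).
JPEG_MAGIC = b"\xff\xd8\xff"


def find_invalid_jpegs(tiles_dir):
    """Return the names of *.jpg files in tiles_dir that lack the JPEG signature."""
    bad = []
    for tile_path in sorted(Path(tiles_dir).glob("*.jpg")):
        with open(tile_path, "rb") as f:
            if f.read(3) != JPEG_MAGIC:
                bad.append(tile_path.name)
    return bad
```

Calling `find_invalid_jpegs("Vietnam_building/tiles")` and opening one of the reported files in a text editor would show whether the content is an error response rather than imagery.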

About example: walkthrough-tensorflow-object-detection

Hello there,
I am opening this issue because I have a question about the train/test split step in this example. For the Mexico demo area, the area of interest consists of 227 tiles in total, so if I understand correctly, these 227 tiles should be split into 181 training samples and 46 test samples (using a split index of 0.8). But the output of tf_records_generation.py from the example does not seem to work like this.
I had a look at the code in tf_records_generation.py:

    for tile in tiles:
        bboxes = labels[tile].tolist()
        width = 256
        height = 256
        if bboxes:
            for bbox in bboxes:
                if bbox[4] == 1:
                    cl_str = "building"
                    bbox = [max(0, min(255, x)) for x in bbox[0:4]]
                    y = ["{}.jpg".format(tile), width, height, cl_str, bbox[0], bbox[1], bbox[2], bbox[3]]
                    tf_tiles_info.append(y)

    split_index = int(len(tf_tiles_info) * 0.8)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    df = pd.DataFrame(tf_tiles_info, columns=column_name)
    # shuffle the dataframe
    df = df.sample(frac=1)
    train_df = df[:split_index]
    test_df = df[split_index:]

The line split_index = int(len(tf_tiles_info) * 0.8) splits by the number of building features, right? From my point of view, we should split the tiles first, and the building features should then follow the tiles they belong to. Otherwise, the test tiles are not guaranteed to be independent of the training tiles, and the test results are no longer meaningful. If I have made a mistake or misunderstood something, please feel free to point it out.
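A sketch of the tile-level split proposed here, written with plain lists so it stands alone. It assumes each record has the tile filename as its first element, as in tf_tiles_info above; `split_rows_by_tile` and the `seed` parameter are hypothetical.

```python
import random


def split_rows_by_tile(rows, train_fraction=0.8, seed=0):
    """Split bounding-box records so that no tile appears in both sets.

    rows: list of records whose first element is the tile filename.
    """
    tiles = sorted({row[0] for row in rows})
    random.Random(seed).shuffle(tiles)  # shuffle tiles, not individual boxes
    split_index = int(len(tiles) * train_fraction)
    train_tiles = set(tiles[:split_index])
    train = [row for row in rows if row[0] in train_tiles]
    test = [row for row in rows if row[0] not in train_tiles]
    return train, test
```

Because the split index is computed over tiles, the number of boxes in each set will generally not be an exact 80/20 ratio, but every box from a given tile lands on the same side.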

Best;

Hao

Add an option for imagery offsets

When downloading images from a GeoTIFF, we can apply an offset to our "false tiles" in case the imagery isn't aligned with OpenStreetMap data. This option may help address part of #31 as well

Running through the readme sequentially results in a warning

Currently, if you run through the label-maker readme example with the Togo country config.json, running these commands in order:

label-maker download
label-maker labels
label-maker preview -n 10
label-maker images
label-maker package

The last step produces a warning for a missing file:

Saving QA tiles to data/togo.mbtiles
   100%     26.6 MiB       1.9 MiB/s            0:00:00 ETA
Retiling QA Tiles to zoom level 12 (takes a bit)
174704 features, 10870707 bytes of geometry, 4 bytes of separate metadata, 2597853 bytes of string pool
  99.9%  12/2060/1976
Determining labels for each tile
---
Roads: 11 tiles
Buildings: 9 tiles
Total tiles: 12
Writing out labels to data/labels.npz
Writing example images to data/examples
Downloading at most 10 tiles for class Roads
Downloading at most 10 tiles for class Buildings
Downloading 11 tiles to data/tiles
Couldn't open data/tiles/2063-1978-12.jpg, skipping
Saving packaged file to data/data.npz

Admittedly I haven't done enough digging yet, but it would look like there's possibly an off-by-one error here.

Better OpenStreetMap QA data fetching

Right now the process of obtaining and preparing OSM QA tiles is a bit slow because we:

Download an entire country QA file
Use tippecanoe-decode + stream-filter.js to create a GeoJSON of all features in just the desired bounding box.
Retile this data using tippecanoe to the desired zoom level.

In theory, this could be done much faster (although with more dependencies) if there were a way to fetch arbitrary tiles (or bounding boxes) from OSM QA data through another method (possibly a third-party service).

Standardize TIF extensions

This is part bug, part standardization, and part feature request. Right now preview doesn't work with TIF extensions. We also use two different lists for deciding if an imagery source should be handled differently. Finally, we should add .vrt to this list because we can read from Virtual Rasters just like GeoTIFF.

My recommended approach would be adding a list or function in utils.py and importing that in other scripts.
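A sketch of what such a shared helper might look like. The names `RASTER_EXTENSIONS` and `is_raster_source` are hypothetical, and the exact extension list is an assumption based on the extensions mentioned above.

```python
from os import path
from urllib.parse import urlparse

# Single list shared by every command that needs to special-case raster sources.
RASTER_EXTENSIONS = ('.tif', '.tiff', '.vrt')


def is_raster_source(imagery):
    """True if the imagery path or URL ends in a known raster extension."""
    # urlparse drops any query string, so '?access_token=...' does not hide the extension.
    ext = path.splitext(urlparse(imagery).path)[1].lower()
    return ext in RASTER_EXTENSIONS
```

Lower-casing the extension means `.TIF` and `.tif` are treated the same; a URL with no extension at all would still return False and would need a separate override.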

downloading using a COG and download endpoint results in tiles without an extension

I am attempting to use label-maker images with a COG download endpoint. The tiles that are downloaded have no extension (and are pretty big). I believe this is due to the COG not being identified as a TIF.

An example of the download endpoint url is:

'https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJRanpYU2t0cENBUjJKR0tZQmdXQ0k4SGtKMlkwY3VPTXdtbVNqRW1xQXlxRVBIYmNDZVNPTXBKd3dYQUFFdDF6MENYbng4bVVXMlVUNG1ubmxDUERtdz09IiwiaXRlbV90eXBlX2lkIjoiUFNPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTUyNDg3NzIyMiwiaXRlbV9pZCI6Ijc2MDgxOF80ODQ4NzE4XzIwMTctMDktMTdfMGUyZiIsImFzc2V0X3R5cGUiOiJ2aXN1YWwifQ.W1Zcqw4MJ2A5anAuzkYW5UWS0C8jqjccg62YCSj7mjFtGTCCUsMOZ9HxhUxg9_KpLzt8_GGXn0YHdnvCLxqKew'

The function determining if an image is TIF (here) looks at the extension of the imagery entry in the config file. The URL given above, which is the imagery entry in my config file, has no extension. Therefore, the image is not identified as TIF. Because the image is not identified as a TIF, it is downloaded as a TMS (here), which again looks for an extension (and finds none) and uses that to apply to the downloaded tile.

Is there a way I can force label-maker images to download my COG from the endpoint as a TIF?

Commands will fail if terminal pwd isn't in `/label-maker`

This fails:
~/Builds/label-maker/examples $ label-maker download
Breaking on this relative filepath here

This works:
~/Builds/label-maker $ label-maker download

We probably need to at least add something to the docs, but ideally we should fix it so the commands are agnostic of the pwd.

Installing label-maker on Ubuntu

I am attempting to install label-maker on Ubuntu and the following error message came up. Are the dependencies compatible with Linux systems?

Using cached https://files.pythonhosted.org/packages/77/d9/d272b38e6e25d2686e22f6058820298dadead69340b1c57ff84c87ef81f0/pycurl-7.43.0.1.tar.gz
  Complete output from command python setup.py egg_info:
  Traceback (most recent call last):
    File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 104, in configure_unix
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config': 'curl-config'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 841, in <module>
    ext = get_extension(sys.argv, split_extension_source=split_extension_source)
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 508, in get_extension
    ext_config = ExtensionConfiguration(argv)
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 72, in __init__
    self.configure()
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 108, in configure_unix
    raise ConfigurationError(msg)
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 104, in configure_unix
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'curl-config': 'curl-config'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 841, in <module>
    ext = get_extension(sys.argv, split_extension_source=split_extension_source)
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 508, in get_extension
    ext_config = ExtensionConfiguration(argv)
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 72, in __init__
    self.configure()
  File "/tmp/pip-install-_pmpop_f/pycurl/setup.py", line 108, in configure_unix
    raise ConfigurationError(msg)
__main__.ConfigurationError: Could not run curl-config: [Errno 2] No such file or directory: 'curl-config': 'curl-config'

----------------------------------------
  Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_pmpop_f/pycurl/
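The traceback shows pycurl's setup.py failing because it cannot run `curl-config`. A small pre-install check for that tool can be sketched as follows; `has_curl_config` is a hypothetical helper. On Debian/Ubuntu, `curl-config` normally ships with a libcurl development package (for example libcurl4-openssl-dev), but treat the exact package name as an assumption for your release.

```python
import shutil


def has_curl_config():
    """True if the curl-config script that pycurl's build needs is on PATH."""
    return shutil.which("curl-config") is not None


if not has_curl_config():
    # Assumption: installing a libcurl development package provides curl-config.
    print("curl-config not found; install the libcurl development headers first")
```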

U-Net segmentation example

Implement vanilla U-net model in Keras (carry simplified code from skynet-deeptrain)
Test it
Write a .md showing how to train the model on Google's cloud
Write simple utility function code to plot a few segmentation predictions

Ideally, run this example on Google's cloud. TPUs (Google's accelerated hardware) don't look to be ready for prime time yet, but I've signed up for alerts.

Update pylintrc so pylint is less annoying

In the pylintrc, we should tell pylint to ignore some of the warnings that we've been manually overriding frequently and are okay with ignoring.

So far, I'm thinking these:

global-statement
too-many-locals
too-many-arguments
too-many-branches

Better generator integration/example

Creating a data.npz file may result in too large a file for some projects. At that point it may be better to read the training data in with a generator (example: Keras flow_from_directory). The easiest way to do this now would be to skip the final step (label-maker package) and read directly from the labels.npz and the tiles directory. We should create either (1) an example showing how to do this or (2) additional scripts which perform this for the user depending upon configuration (or a combination of the two).
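A sketch of option (1): a plain Python generator that yields batches without building data.npz. `batch_generator`, `tile_names`, and `load_image` are hypothetical; `labels` is assumed to be a mapping from tile name to label array (such as the object returned by `np.load` on labels.npz), and `load_image` would typically open the matching file in the tiles directory.

```python
import numpy as np


def batch_generator(tile_names, labels, load_image, batch_size=32):
    """Yield (images, labels) batches forever, loading images lazily.

    tile_names: tiles to use (only those that were actually downloaded).
    labels: mapping of tile name -> label array.
    load_image: hypothetical callable, tile name -> image array.
    """
    while True:
        for start in range(0, len(tile_names), batch_size):
            batch = tile_names[start:start + batch_size]
            x = np.stack([load_image(tile) for tile in batch])
            y = np.stack([labels[tile] for tile in batch])
            yield x, y
```

Only one batch of images is held in memory at a time. The last batch of each pass may be smaller than `batch_size`, which most training loops accept.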

an error occurs while packaging the training dataset for object-detection

An error:
Traceback (most recent call last):
File "/Users/xxxx/.virtualenvs/ml_data_gen_py3/bin/label-maker", line 11, in <module>
load_entry_point('label-maker', 'console_scripts', 'label-maker')()
File "/Users/xxx/Documents/Development_Seed/Sat_Deeptrain/label-maker/label_maker/main.py", line 87, in cli
package_directory(dest_folder=dest_folder, **config)
File "/Users/xxx/Documents/Development_Seed/Sat_Deeptrain/label-maker/label_maker/package.py", line 48, in package_directory
for tile in tiles.files:
AttributeError: 'numpy.ndarray' object has no attribute 'files'
The original code there is:

    if ml_type == 'object-detection':
        max_features = 0
        for tile in tiles.files:
            features = len(tiles[tile][0])
            if features > max_features:
                max_features = features

and I fixed it with:

    if ml_type == 'object-detection':
        max_features = 0
        for tile in labels.files:
            features = len(labels[tile])
            if features > max_features:
                max_features = features

This seems to work after the fix, if I understood the workflow correctly.

Improve bounding box edge handling

The current behavior clips the label to the bounding box but not the image.

These should be made consistent (probably both clipped).

Avoid forcing data type for image arrays

In some places we force the dtype to be uint8 (i.e., an integer on the interval [0, 255]). See package.py for an example.

In some situations, we might want to package TIFs or raw satellite imagery where the data either isn't made up of integers or doesn't fall in this [0, 255] range. For example, it might be pre-scaled to [0, 1] or extend beyond the standard range of RGB values.
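A short NumPy illustration of the problem, plus a sketch of a conversion that only casts when asked. `to_image_array` is a hypothetical helper, not the code in package.py.

```python
import numpy as np


def to_image_array(data, dtype=None):
    """Convert to an array, casting only when a dtype is explicitly requested."""
    arr = np.asarray(data)
    return arr if dtype is None else arr.astype(dtype)


scaled = [0.25, 0.5, 0.99]                        # imagery pre-scaled to [0, 1]
forced = to_image_array(scaled, dtype=np.uint8)   # truncates every value to 0
kept = to_image_array(scaled)                     # keeps the float values intact
```

Forcing uint8 on pre-scaled data silently discards the signal, so making the dtype an explicit option (with no cast by default) would keep such inputs usable.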

Creating labels is endless on big mbtiles

I tried to create labels for Russia:

{
    "country": "russia",
    "bounding_box": [36.72353152233891, 55.3729665317045, 36.74095515209966, 55.38046857198313],
    "zoom": 12,
    "classes": [
        { "name": "Buildings", "filter": ["has", "building"] }
    ],
    "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=TOKEN",
    "background_ratio": 1,
    "ml_type": "classification"
}

and US:

{
    "country": "united_states_of_america",
    "bounding_box": [-99.79180928271484,32.42732216399054,-99.67628117602538,32.51812432193046],
    "zoom": 12,
    "classes": [
        { "name": "Buildings", "filter": ["has", "building"] }
    ],
    "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=TOKEN",
    "background_ratio": 1,
    "ml_type": "classification"
}

When running label-maker download, the process seems endless.
I left the process running overnight for Russia; in the morning it was still active with no results (the GeoJSON file was 0 bytes).


